Subject: Port Information Systems and Platforms. Proposed by: Prof Tali
Introduction
1. Definition
2. The sources of big data
Conclusion
Webographie
Introduction
In today's world, every tiny gadget is a potential data source, adding to an ever-growing data bank. Every bit of data generated has practical value, whether it is enterprise or personal data, historical or transactional. The data produced by large volumes of customer transactions and by social networking sites is varied, voluminous, and generated rapidly, and all of it poses a storage and processing challenge for enterprises. The data generated by massive web logs, healthcare data sources, point-of-sale systems, and satellite imagery needs to be stored and handled well; handled carefully, this huge amount of data proves to be a very useful knowledge bank. Hence, big companies are investing heavily in researching and harnessing this data. Given all the enthusiasm for Big Data today, one can easily call it the next best thing to learn; the attention it has received over the past decade is due to the overwhelming need for it in industry. So how can we define big data? Where does it come from? And how can we handle it and benefit from it?
1. Definition
While the term “big data” is relatively new, the act of gathering and storing large
amounts of information for eventual analysis is ages old. The concept gained momentum in
the early 2000s when industry analyst Doug Laney articulated the now-mainstream definition
of big data as the three Vs:
Volume: Organizations collect data from a variety of sources, including business
transactions, social media and information from sensor or machine-to-machine data. In the
past, storing it would’ve been a problem – but new technologies (such as Hadoop) have eased
the burden.
Velocity: Data streams in at an unprecedented speed and must be dealt with in a timely
manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of
data in near-real time.
Variety: Data comes in all types of formats – from structured, numeric data in
traditional databases to unstructured text documents, email, video, audio, stock ticker data and
financial transactions.
When it comes to big data, we can consider two additional dimensions:
Variability: In addition to the increasing velocities and varieties of data, data flows
can be highly inconsistent with periodic peaks. Is something trending in social media? Daily,
seasonal and event-triggered peak data loads can be challenging to manage. Even more so
with unstructured data.
Complexity: Today's data comes from multiple sources, which makes it difficult to
link, match, cleanse and transform data across systems. However, it’s necessary to connect
and correlate relationships, hierarchies and multiple data linkages or your data can quickly
spiral out of control.
2. The sources of big data
Big data comes from many sources; one notable example is broadcast media:
Broadcastings: this mainly refers to video and audio produced in real time. Extracting statistical data from the content of this kind of electronic data is, for now, very complex and demands large computational and communication resources. Once the problem of converting "digital-analog" content into "digital-data" content is solved, processing it will raise complications similar to those we find in social interactions.
3. Tools used to store and analyse big data
1. Apache Hadoop
The Apache Hadoop software is a framework that allows for the distributed processing
of large data sets across clusters of computers using simple programming models. It is
designed to scale up from single servers to thousands of machines, each offering local
computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thus delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
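The programming model Hadoop popularized is MapReduce. The sketch below is a pure-Python stand-in for the classic word-count job, runnable locally with no cluster (an assumption made for illustration; on a real cluster the map and reduce phases would run distributed, for example via Hadoop Streaming):

```python
from collections import defaultdict

def map_phase(line):
    # Emit a (word, 1) pair for every word in the input line.
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(pairs):
    # Sum the counts for each word, as Hadoop does after shuffling.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data is big", "data is valuable"]
pairs = [p for line in lines for p in map_phase(line)]
print(reduce_phase(pairs))  # {'big': 2, 'data': 2, 'is': 2, 'valuable': 1}
```

Because the map phase is independent per line and the reduce phase is independent per word, Hadoop can spread both over thousands of machines.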
2. Microsoft HDInsight
This is a Big Data solution from Microsoft, powered by Apache Hadoop and available as a service in the cloud. HDInsight uses Windows Azure Blob storage as the default file system, and also provides high availability at low cost.
3. NoSQL
While traditional SQL can be effectively used to handle large amounts of structured data, we need NoSQL (Not Only SQL) to handle unstructured data. NoSQL databases store unstructured data with no particular schema: each row can have its own set of column values. NoSQL gives better performance when storing massive amounts of data, and many open-source NoSQL databases are available for analyzing big data.
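To make "no particular schema" concrete, here is a minimal sketch using a plain in-memory Python list as a stand-in for a document-style NoSQL store (an assumption for illustration; real systems such as MongoDB additionally persist and index the documents):

```python
# Each "row" (document) carries its own set of fields; no table schema is enforced.
store = []

def insert(document):
    store.append(document)

insert({"user": "alice", "age": 30})
insert({"user": "bob", "tweets": 120, "verified": True})  # different columns, same store

# Query: documents that have a "tweets" field
active = [d for d in store if "tweets" in d]
print(active)  # [{'user': 'bob', 'tweets': 120, 'verified': True}]
```

Note how the two documents share no common layout beyond the "user" field, something a fixed relational schema would not allow without NULL-padded columns.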
4. Hive
This is a distributed data warehouse system for Hadoop. It supports an SQL-like query language, HiveQL (HQL), to access big data, and is primarily used for data mining purposes. It runs on top of Hadoop.
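As an illustration of the SQL-like interface, the snippet below builds a typical HiveQL aggregation. The table web_logs and its columns are hypothetical, and submitting the query through a client library (for example PyHive) is assumed and not shown; the query string is only constructed and printed:

```python
# A typical HiveQL aggregation query. On a real cluster this string would be
# handed to a Hive client; the table and column names here are hypothetical.
query = """
SELECT page, COUNT(*) AS visits
FROM web_logs
GROUP BY page
ORDER BY visits DESC
LIMIT 10
""".strip()

print(query)
```

Hive translates a query like this into Hadoop jobs behind the scenes, which is why analysts can mine big data with familiar SQL idioms.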
5. Sqoop
This is a tool that connects Hadoop with various relational databases to transfer data. It can be effectively used to transfer structured data into Hadoop or Hive.
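To make the transfer concrete, the sketch below assembles a typical sqoop import command line. The JDBC URL, table name, and target directory are hypothetical placeholders; the flags are Sqoop's standard import options. The command is only constructed here, not executed:

```python
# Assemble a standard `sqoop import` command (not executed here; on a real
# cluster this list would be passed to subprocess.run). The database URL,
# table, and target directory below are hypothetical.
cmd = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://db.example.com/sales",  # source relational DB
    "--table", "orders",                               # table to import
    "--target-dir", "/data/orders",                    # HDFS destination
    "--num-mappers", "4",                              # parallel map tasks
]
print(" ".join(cmd))
```

Sqoop runs the import as parallel map tasks, which is what makes it practical for moving large relational tables into Hadoop.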
6. PolyBase
This works on top of SQL Server 2012 Parallel Data Warehouse (PDW) and is used to
access data stored in PDW. PDW is a data warehousing appliance built for processing any
volume of relational data and provides an integration with Hadoop allowing us to access non-
relational data as well.
7. Presto
Facebook developed and recently open-sourced its SQL-on-Hadoop query engine, Presto, which is built to handle petabytes of data. Unlike Hive, Presto does not depend on the MapReduce technique and can retrieve data quickly.
Product Development
Companies like Netflix and Procter & Gamble use big data to anticipate customer demand.
They build predictive models for new products and services by classifying key attributes of
past and current products or services and modeling the relationship between those attributes
and the commercial success of the offerings. In addition, P&G uses data and analytics from
focus groups, social media, test markets, and early store rollouts to plan, produce, and launch
new products.
Predictive Maintenance
Factors that can predict mechanical failures may be deeply buried in structured data,
such as the equipment year, make, and model of a machine, as well as in unstructured data
that covers millions of log entries, sensor data, error messages, and engine temperature. By
analyzing these indications of potential issues before the problems happen, organizations can
deploy maintenance more cost effectively and maximize parts and equipment uptime.
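As a toy illustration of mining sensor data for early warnings (a simple statistical threshold is assumed here; production systems use far richer models over much larger logs), the sketch below flags engine-temperature readings that drift well above the historical norm:

```python
import statistics

def flag_anomalies(readings, k=2.0):
    # Flag readings more than k standard deviations above the historical
    # mean -- a crude early-warning indicator of a developing fault.
    mean = statistics.mean(readings)
    stdev = statistics.stdev(readings)
    return [r for r in readings if r > mean + k * stdev]

temps = [71, 70, 72, 69, 70, 71, 73, 70, 95]  # hypothetical sensor log
print(flag_anomalies(temps))  # [95]
```

Catching the outlier before it becomes a breakdown is exactly the cost saving the paragraph above describes, just at the scale of millions of log entries instead of nine.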
Customer Experience
The race for customers is on. A clearer view of customer experience is more possible
now than ever before. Big data enables you to gather data from social media, web visits, call
logs, and other data sources to improve the interaction experience and maximize the value
delivered. Start delivering personalized offers, reduce customer churn, and handle issues
proactively.
Fraud and Compliance
When it comes to security, it’s not just a few rogue hackers; you’re up against entire
expert teams. Security landscapes and compliance requirements are constantly evolving. Big
data helps you identify patterns in data that indicate fraud and aggregate large volumes of
information to make regulatory reporting much faster.
Machine Learning
Machine learning is a hot topic right now, and data, specifically big data, is one of the reasons why. We are now able to teach machines instead of programming them, and the availability of big data to train machine-learning models is what makes that happen.
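In the spirit of teaching machines from data rather than programming them, the sketch below learns a linear model purely from example points (ordinary least squares on made-up numbers; real machine-learning pipelines apply the same idea at vastly larger scale):

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = a*x + b, learned purely from examples.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Hypothetical data: hours of machine use vs. observed wear.
xs, ys = [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.0, 9.8]
a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))  # 1.95 0.15
```

No rule about wear was ever written down; the relationship was inferred from the data, which is the essence of the "teach, don't program" shift that big data enables.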
Operational Efficiency
Operational efficiency may not always make the news, but it’s an area in which big
data is having the most impact. With big data, you can analyze and assess production,
customer feedback and returns, and other factors to reduce outages and anticipate future
demands. Big data can also be used to improve decision-making in line with current market
demand.
Drive Innovation
Big data can help you innovate by studying interdependencies between humans, institutions, entities, and processes, and then determining new ways to use those insights. Use
data insights to improve decisions about financial and planning considerations. Examine
trends and what customers want to deliver new products and services. Implement dynamic
pricing. There are endless possibilities.
The computing power of big data analytics enables us to decode entire DNA strings in
minutes and will allow us to find new cures and better understand and predict disease
patterns. Just think of what happens when all the individual data from smart watches and wearable devices can be applied to millions of people and their various diseases. The
clinical trials of the future won't be limited by small sample sizes but could potentially include
everyone!
Apple's new health app, called ResearchKit, has effectively just turned your phone into
a biomedical research device. Researchers can now create studies through which they collect data and input from users' phones to compile data for health studies. Your phone might track
how many steps you take in a day, or prompt you to answer questions about how you feel
after your chemo, or how your Parkinson's disease is progressing. It's hoped that making the
process easier and more automatic will dramatically increase the number of participants a
study can attract as well as the fidelity of the data.
Most elite sports have now embraced big data analytics. We have the IBM
SlamTracker tool for tennis tournaments; we use video analytics that track the performance of
every player in a football or baseball game, and sensor technology in sports equipment such as basketballs or golf clubs allows us to get feedback (via smartphones and cloud servers) on
our game and how to improve it. Many elite sports teams also track athletes outside of the
sporting environment - using smart technology to track nutrition and sleep, as well as social
media conversations to monitor emotional wellbeing.
Science and research are currently being transformed by the new possibilities big data brings. Take, for example, CERN, the nuclear physics lab with its Large Hadron Collider, the
world's largest and most powerful particle accelerator. Experiments to unlock the secrets of
our universe - how it started and works - generate huge amounts of data.
The CERN data center has 65,000 processors to analyze its 30 petabytes of data.
However, it uses the computing powers of thousands of computers distributed across 150 data
centers worldwide to analyze the data. Such computing powers can be leveraged to transform
so many other areas of science and research.
Conclusion
The increase in the amount of data available presents both opportunities and problems.
In general, having more data on one’s customers (and potential customers) should allow
companies to better tailor their products and marketing efforts in order to create the highest
level of satisfaction and repeat business.
Companies that are able to collect large amounts of data have the opportunity to conduct deeper and richer analysis. This data can be collected from publicly
shared comments on social networks and websites, voluntarily gathered from personal
electronics and apps, through questionnaires, product purchases, and electronic check-ins. The
presence of sensors and other inputs in smart devices allows for data to be gathered across a
broad spectrum of situations and circumstances.
Webographie:
https://www.bernardmarr.com/default.asp?contentID=1076
https://www.investopedia.com/terms/b/big-data.asp
http://www.hadoopadmin.co.in/sources-of-bigdata/
https://bigdata-madesimple.com/top-big-data-tools-used-to-store-and-analyse-data/
https://www.oracle.com/big-data/guide/what-is-big-data.html