Lecture 3 Streaming Data Systems and Applications


Real time systems

Pravin Y Pawar
Responding in Real Time

Few Examples
• World is now operating more and more in the now
• Ability to process data as it arrives (processing data in the present)

• A Twitter friend or celebrity posts a tweet, and you see it almost immediately…
• You are tracking flights around New Delhi live…
• You are using the nseindia.com tracker page to track your favourite stocks…
Classification of Real Time Systems

Source : Streaming Data by Andrew Psaltis


A generic real-time system with consumers
Consumers are connected to the service online

Source : Adapted from Streaming Data by Andrew Psaltis


A generic real-time system with consumers (2)
Examples discussion
• In each of the examples, is it reasonable to conclude that
  - the time delay may only last for seconds?
  - no life is at risk?
  - an occasional delay for minutes would not cause total system failure?
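
The three questions above can be read as a rough decision rule. The Python sketch below is one possible way to write it down. The `Scenario` fields, the `classify` function and the one-second cut-off between soft and near real time are assumptions made for illustration only; they are not definitions taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical description of how a system tolerates delay."""
    typical_delay_seconds: float   # usual end-to-end delay seen by consumers
    life_at_risk_on_delay: bool    # could a missed deadline endanger life?
    fails_totally_on_delay: bool   # does a missed deadline break the whole system?


def classify(s: Scenario) -> str:
    # Any risk to life or total failure on delay puts the system in the hard class.
    if s.life_at_risk_on_delay or s.fails_totally_on_delay:
        return "hard"
    # Assumed cut-off: up to about a second is "soft", longer is "near".
    if s.typical_delay_seconds <= 1.0:
        return "soft"
    return "near"


print(classify(Scenario(0.5, False, False)))    # soft
print(classify(Scenario(30.0, False, False)))   # near
print(classify(Scenario(0.001, True, True)))    # hard
```

Because the boundary between the last two classes is only a convention, changing the cut-off changes the label but not the behaviour of the system.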
A generic real-time system without consumers
Consumers are not connected to the service but service continues its operation

Source : Adapted from Streaming Data by Andrew Psaltis
A generic real-time system without consumers (2)

Examples discussion
• A tweet is posted on Twitter
• The live flight tracking service is tracking flights
• The real-time quotes monitoring application is tracking stock quotes

• Does focusing on the data processing and taking the consumers of the data out of the picture change your answer?

• The difference between soft real time and near real time is becoming blurry
• Together they are simply termed real time
Difference between real-time and streaming systems

Real-time systems again
Soft or near real time systems
• Classified by the delay experienced by consumers: soft or near real-time systems
• The difference between them is blurring and hard to distinguish
• Two parts are present in the larger picture

Source : Adapted from Streaming Data by Andrew Psaltis


Breaking the real time system

• Let's divide the real-time system into two parts
  - Left part: non-hard real-time system
  - Right part: client consuming data

Source : Adapted from Streaming Data by Andrew Psaltis


Streaming Data Systems
Defined
• The computation part of a real-time system operates in a non-hard real-time manner
• But the client may not be consuming the processed data
  - Due to network issues, application requirements, or no application running
• Clients consume data when they need it

• Streaming data system
  - A non-hard real-time system with clients that consume data when they need it
First view of Streaming Data System

Source : Adapted from Streaming Data by Andrew Psaltis


Streaming Data Systems (2)
Examples revisited
• Let's divide the examples discussed earlier into two parts and identify the streaming part of each

• Twitter
  - A streaming system that processes tweets and allows clients to request the tweets when needed
• Flight tracking system
  - A streaming system that processes the most recent flight status data and allows clients to request the latest data for a particular flight
• Real-time quotes tracking system
  - A streaming system that processes the price quotes of stocks and allows clients to request the latest quote for a stock
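
The split between "processing continues regardless" and "clients ask when they need it" can be sketched with a small in-memory store. This is only an illustration: `LatestQuoteStore`, its method names, and the ticker symbols and prices are all hypothetical and do not describe how any of the services above are actually built.

```python
from typing import Dict, Optional


class LatestQuoteStore:
    """Hypothetical stand-in for the streaming side of a quote tracker."""

    def __init__(self) -> None:
        self._latest: Dict[str, float] = {}

    def on_quote(self, symbol: str, price: float) -> None:
        # Runs for every incoming event, whether or not any client is connected.
        self._latest[symbol] = price

    def latest(self, symbol: str) -> Optional[float]:
        # Called by a client only when it needs the data.
        return self._latest.get(symbol)


store = LatestQuoteStore()
for symbol, price in [("INFY", 1500.0), ("TCS", 3900.0), ("INFY", 1502.5)]:
    store.on_quote(symbol, price)

print(store.latest("INFY"))   # 1502.5
print(store.latest("WIPRO"))  # None (no quote has been processed for it yet)
```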
Difference between Batch Processing and Stream Processing

Batch Processing System
Defined
• An efficient way of processing high volumes of data, in which a group of transactions is collected over a period of time
• Data is collected, entered, processed and then the batch results are produced
  - Hadoop is focused on batch data processing
• Requires separate programs for input, process and output
• A large volume of storage is required
• Data is sorted and then processed in a sequential manner
• No hard timelines are defined
• Sequential jobs are executed repeatedly at a fixed interval
• Examples are payroll and billing systems
Streaming Data Systems
Defined
• Involves continual input, processing and output of data
• Data must be processed within a small time period (or in near real time)
  - Apache Storm and Spark Streaming are frameworks meant for this
• Gives an organization the ability to take immediate action when acting within seconds or minutes is significant
• No storage is required if events need not be stored
• Data is processed as and when it is made available to the system
• Processing has to happen within fixed / hard timelines
• Examples: radar systems, customer services and bank ATMs
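
To make the contrast with batch processing concrete, the sketch below computes the same average in both styles. It is a toy illustration in plain Python, not an example of Hadoop, Storm or Spark; the function names and sample readings are made up.

```python
from typing import Iterable, Iterator, List


def batch_mean(values: List[float]) -> float:
    # Batch style: the whole data set is collected first, then processed.
    return sum(values) / len(values)


def streaming_mean(values: Iterable[float]) -> Iterator[float]:
    # Stream style: update the result as each value arrives, keeping only
    # a count and a running total instead of the full history.
    count = 0
    total = 0.0
    for value in values:
        count += 1
        total += value
        yield total / count


readings = [10.0, 20.0, 30.0, 40.0]
print(batch_mean(readings))             # 25.0
print(list(streaming_mean(readings)))   # [10.0, 15.0, 20.0, 25.0]
```

The streaming version has an answer available after every event and needs no storage for past events, which matches the "no storage required" point above.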
Difference

Source : https://aws.amazon.com/streaming-data/
Frameworks

Source : https://www.datasciencecentral.com/profiles/blogs/batch-vs-real-time-data-processing
Streaming Data Applications

Why is stream Processing needed?

Reason 1
• Some data naturally comes as a never-ending stream of events. To do batch processing, you need to store it, stop data collection at some point, and process the data.
• Then you have to do the next batch and worry about aggregating across multiple batches.
• In contrast, streaming handles never-ending data streams gracefully and naturally.
• You can detect patterns, inspect results, look at multiple levels of focus, and easily look at data from multiple streams simultaneously.

Source : https://medium.com/stream-processing/what-is-stream-processing-1eadfca11b97
Why is stream Processing needed? (2)

Reason 2
• Stream processing fits naturally with time series data and detecting patterns over time.
• For example, consider detecting the length of a web session in a never-ending stream (an example of detecting a sequence).
• This is very hard to do with batches, as some sessions will fall into two batches.
• Stream processing can handle this easily.
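
The web-session case can be sketched as follows. The code keeps one open session per user and closes it when the user has been inactive for longer than a gap, so a session is never cut in two by a batch boundary. The 30-minute gap, the `(user, timestamp)` event shape and the sample clicks are assumptions for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple

SESSION_GAP = 30 * 60  # seconds of inactivity that end a session (assumed value)


def session_lengths(
    events: Iterable[Tuple[str, int]], gap: int = SESSION_GAP
) -> Iterator[Tuple[str, int]]:
    """Yield (user, session_length_seconds) as sessions close.

    `events` is a time-ordered stream of (user, timestamp_seconds) pairs.
    """
    open_sessions: Dict[str, Tuple[int, int]] = {}  # user -> (start, last_seen)
    for user, ts in events:
        if user in open_sessions:
            start, last_seen = open_sessions[user]
            if ts - last_seen > gap:
                # The previous session ended; emit it and start a new one.
                yield user, last_seen - start
                open_sessions[user] = (ts, ts)
            else:
                open_sessions[user] = (start, ts)
        else:
            open_sessions[user] = (ts, ts)
    # A real stream never ends; the demo input does, so flush what is open.
    for user, (start, last_seen) in open_sessions.items():
        yield user, last_seen - start


clicks = [("alice", 0), ("bob", 100), ("bob", 400), ("alice", 600), ("alice", 4000)]
print(list(session_lengths(clicks)))
# [('alice', 600), ('alice', 0), ('bob', 300)]
```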
Why is stream Processing needed? (3)
Reason 3
• Batch processing lets the data build up and tries to process it all at once, while stream processing processes data as it comes in and hence spreads the processing over time.
• Hence stream processing can work with a lot less hardware than batch processing.
• Furthermore, stream processing also enables approximate query processing via systematic load shedding.
• Hence stream processing fits naturally into use cases where approximate answers are sufficient.
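
As a rough illustration of trading accuracy for load, the sketch below drops events at random and scales the result back up. Random (Bernoulli) sampling is only one simple form of load shedding and is used here as an assumption; the source does not say which scheme is meant. The function name, keep probability and seed are likewise illustrative.

```python
import random
from typing import Iterable


def approximate_count(events: Iterable[object], keep_probability: float,
                      rng: random.Random) -> float:
    """Estimate the number of events while processing only a sample of them."""
    kept = 0
    for _ in events:
        # Load shedding: drop each event with probability 1 - keep_probability.
        if rng.random() < keep_probability:
            kept += 1
    # Scale the sampled count back up to estimate the true total.
    return kept / keep_probability


rng = random.Random(42)  # fixed seed so the demo is repeatable
estimate = approximate_count(range(100_000), keep_probability=0.1, rng=rng)
print(round(estimate))   # close to, but usually not exactly, 100000
```

Only about one event in ten does any work here, and the answer is approximate rather than exact.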
Why is stream Processing needed? (4)
Reason 4
• Finally, there is a lot of streaming data available (e.g. customer transactions, activities, website visits), and it will grow faster with IoT use cases (all kinds of sensors).
• Streaming is a much more natural model for thinking about and programming those use cases.
Streaming Data Sources

Source : https://mapr.com/blog/real-time-streaming-data-pipelines-apache-apis-kafka-spark-streaming-and-hbase/
Streaming Data Examples

• Sensors in transportation vehicles, industrial equipment, and farm machinery send data to a streaming application. The application monitors performance, detects any potential defects in advance, and places a spare part order automatically, preventing equipment downtime.
• A financial institution tracks changes in the stock market in real time, computes value-at-risk, and automatically rebalances portfolios based on stock price movements.
• A real-estate website tracks a subset of data from consumers’ mobile devices and recommends, in real time, properties to visit based on their geo-location.

• A solar power company has to maintain power throughput for its customers or pay penalties. It implemented a streaming data application that monitors all of the panels in the field and schedules service in real time, thereby minimizing the periods of low throughput from each panel and the associated penalty payouts.

• A media publisher streams billions of click stream records from its online properties, aggregates and
enriches the data with demographic information about users, and optimizes content placement on its
site, delivering relevancy and better experience to its audience.
• An online gaming company collects streaming data about player-game interactions, and feeds the
data into its gaming platform. It then analyzes the data in real-time, offers incentives and dynamic
experiences to engage its players.

Source : https://aws.amazon.com/streaming-data/
Who is using Stream Processing?

Streaming Data Use cases


• Algorithmic Trading, Stock Market Surveillance
• Smart Patient Care
• Monitoring a production line
• Supply chain optimizations
• Intrusion, Surveillance and Fraud Detection
• Most Smart Device Applications: Smart Car, Smart Home, etc.
• Smart Grid (e.g. load prediction and outlier plug detection; see Smart grids, 4 billion events, throughput in the range of 100Ks)
• Traffic Monitoring, Geofencing, Vehicle and Wildlife Tracking (e.g. TFL London Transport Management System)
• Sports analytics: augment sports with real-time analytics
• Context-aware promotions and advertising
• Computer system and network monitoring
• Predictive Maintenance (e.g. Machine Learning Techniques for Predictive Maintenance)
• Geospatial data processing
Source : https://medium.com/stream-processing/what-is-stream-processing-1eadfca11b97
Usage of Stream Processing

Uses of Stream Processing

• Example Applications
  - Fraud detection applications monitor the usage of credit cards to detect unexpected patterns
  - Stock trading systems look for opportunities in stock price movements and execute trades based on those patterns
  - Manufacturing systems keep an eye on the assembly / manufacturing process to spot defective or malfunctioning machines
  - Intelligence systems keep track of the activities of aggressors and raise an alarm if something unusual is noticed about them
  - Etc.

Concepts : Designing Data-Intensive Applications by Martin Kleppmann


Other Applications
Complex Event Processing (CEP)
• Developed in the 1990s
• Rules can be specified to search for certain patterns of events in streams
• Programming constructs or GUIs can be used to specify the conditions for patterns
• Outcomes can be displayed visually, or alerts can be raised when a condition is fulfilled
• Outcomes are complex events in nature, hence the name
• A continuous query model is used for identifying patterns
  - The query is always running
  - Events flow through it

• Examples
  - Esper
  - IBM InfoSphere
  - TIBCO StreamBase
  - Oracle CEP
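
The continuous-query idea can be sketched in plain Python as a generator that stays attached to the stream and emits a derived event whenever its rule matches. This does not use the API of Esper or any other CEP product listed above; the rule (three failed logins by the same user within 60 seconds), the event shape and the names are assumptions chosen for illustration.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

Event = Tuple[int, str, str]  # (timestamp_seconds, user, event_type)


def repeated_failures(events: Iterable[Event], threshold: int = 3,
                      window: int = 60) -> Iterator[Tuple[int, str]]:
    """Emit (timestamp, user) whenever `threshold` failed logins by the
    same user fall within `window` seconds."""
    recent: Dict[str, Deque[int]] = defaultdict(deque)
    for ts, user, kind in events:
        if kind != "login_failed":
            continue
        times = recent[user]
        times.append(ts)
        # Forget failures that have slid out of the time window.
        while times and ts - times[0] > window:
            times.popleft()
        if len(times) >= threshold:
            yield ts, user   # the derived "complex event"
            times.clear()    # start looking for the next occurrence


stream = [
    (0, "eve", "login_failed"),
    (10, "bob", "login_ok"),
    (20, "eve", "login_failed"),
    (45, "eve", "login_failed"),
    (50, "bob", "login_failed"),
]
print(list(repeated_failures(stream)))  # [(45, 'eve')]
```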
Other Applications (2)
Stream Analytics
• A more recent application area
• The difference between CEP and stream analytics is diminishing
• More oriented towards aggregations and metrics over a large number of events
  - Measuring the rate at which system heartbeat messages are transmitted
  - Calculating rolling averages over certain periods
  - Comparing the statistics of two or more streams
• Usually windows of time are used
• The techniques used can produce exact or probabilistic outcomes

• Examples
  - Apache Storm
  - Spark Streaming
  - Flink
  - Apache Samza
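
A rolling average over a window of time can be sketched without any framework. The class below produces an exact (not probabilistic) result; the class name, the 10-second window and the sample readings are illustrative assumptions, and the frameworks listed above provide their own windowing operators.

```python
from collections import deque
from typing import Deque, Tuple


class RollingAverage:
    """Exact average of the values seen in the last `window` seconds."""

    def __init__(self, window: float) -> None:
        self.window = window
        self._items: Deque[Tuple[float, float]] = deque()  # (timestamp, value)
        self._total = 0.0

    def add(self, timestamp: float, value: float) -> float:
        self._items.append((timestamp, value))
        self._total += value
        # Evict readings that have fallen out of the window.
        while self._items[0][0] <= timestamp - self.window:
            _, old = self._items.popleft()
            self._total -= old
        return self._total / len(self._items)


avg = RollingAverage(window=10)
print(avg.add(0, 4.0))    # 4.0
print(avg.add(5, 8.0))    # 6.0
print(avg.add(12, 6.0))   # 7.0  (the reading at t=0 has left the window)
```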
Other Applications (3)
Materialized Views
• Materialized views are a kind of derived application, such as caches, search indexes, data warehouses, etc.
• When something changes in an underlying system such as a database, all dependents need to be updated as well
• This can be achieved by tracking what changes in the source applications and updating the derived applications based on the tracked changes
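
Keeping a derived view up to date from tracked changes can be sketched as replaying a change log. The `(operation, key, value)` record format, the function name and the sample keys are hypothetical; real databases expose changes in their own formats.

```python
from typing import Dict, Iterable, Optional, Tuple

Change = Tuple[str, str, Optional[str]]  # (operation, key, value)


def apply_changes(view: Dict[str, str], changes: Iterable[Change]) -> None:
    """Keep a derived view (for example a cache) in step with its source
    by replaying the source's stream of changes."""
    for op, key, value in changes:
        if op == "upsert" and value is not None:
            view[key] = value
        elif op == "delete":
            view.pop(key, None)


cache: Dict[str, str] = {}
change_log = [
    ("upsert", "user:1", "Asha"),
    ("upsert", "user:2", "Ravi"),
    ("upsert", "user:1", "Asha K."),
    ("delete", "user:2", None),
]
apply_changes(cache, change_log)
print(cache)  # {'user:1': 'Asha K.'}
```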
Other Applications (4)
Stream Searching
• CEP mostly monitors multiple events together, looking for correlations between the incoming events
• Sometimes individual events also need to be monitored for certain complex conditions
• For example,
  - A user wants to be informed when a property matching their requirements and budget is listed for sale / rent
  - A user wants to be informed when products that are currently unavailable for sale become available
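
The property example can be sketched as stored searches that every new listing is checked against. The `SavedSearch` and `Listing` types, their fields and the sample data are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class SavedSearch:
    user: str
    city: str
    max_price: int


@dataclass(frozen=True)
class Listing:
    listing_id: str
    city: str
    price: int


def notifications(listings: Iterable[Listing],
                  searches: List[SavedSearch]) -> Iterator[Tuple[str, str]]:
    # The queries are stored and the events stream past them,
    # which is the reverse of a conventional search.
    for listing in listings:
        for search in searches:
            if listing.city == search.city and listing.price <= search.max_price:
                yield search.user, listing.listing_id


searches = [SavedSearch("meera", "Pune", 5_000_000)]
incoming = [
    Listing("L1", "Pune", 6_500_000),
    Listing("L2", "Mumbai", 4_000_000),
    Listing("L3", "Pune", 4_800_000),
]
print(list(notifications(incoming, searches)))  # [('meera', 'L3')]
```

Checking every listing against every saved search is fine for a sketch; with many stored queries, some form of indexing over the queries would be needed.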
Sources of Streaming Data

Streaming Data Sources
1. Operational Monitoring
• Operational Monitoring
  - Example: tracking the performance of the physical systems that power the Internet
  - Temperature of the processor, speed of the fan, voltage draw of the power supplies, state of the disk drives, and basic operating metrics (processor load, network activity, storage access times), etc.
  - Data is collected and aggregated in real time

Source : Adapted from Real-Time Analytics, Byron Ellis


Streaming Data Sources (2)
2. Web Analytics
• Web analytics: tracking activity on a website
• Comparable to the circulation numbers of a newspaper: the number of unique visitors to a webpage, etc.
• Structure of websites and its impact on various metrics of interest: A/B testing, done in sequence vs. in parallel
• Applications: aggregation for billing, product recommendations (Netflix)
Streaming Data Sources (3)
3. Online Advertising

• Contributes a large and growing portion of traffic

1. A visitor arrives at a website via a modern advertising exchange
2. A call is made to a number of bidding agencies (perhaps 30 or 40 at a time), who place bids on the page view in real time
3. An auction is run, and the advertisement from the winning party is displayed
• Steps (1) to (3) happen while the rest of the page is loading; the elapsed time is less than about 100 milliseconds
• A page with several advertisements needs this effort simultaneously for each advertisement
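
The bid-collect-decide cycle under a time budget can be sketched with `asyncio`. The bidder names, bid amounts and latencies are made up, the network call is simulated with a sleep, and picking the single highest bid is an assumption; the source does not describe the auction rules that exchanges actually use.

```python
import asyncio
from typing import Dict, Optional, Tuple


async def bidder(name: str, bid: float, latency: float) -> Tuple[str, float]:
    # Stand-in for a network call to a bidding agency.
    await asyncio.sleep(latency)
    return name, bid


async def run_auction(bidders: Dict[str, Tuple[float, float]],
                      deadline: float = 0.1) -> Optional[Tuple[str, float]]:
    # Ask all bidders at once and wait at most `deadline` seconds.
    tasks = [asyncio.create_task(bidder(name, bid, latency))
             for name, (bid, latency) in bidders.items()]
    done, pending = await asyncio.wait(tasks, timeout=deadline)
    for task in pending:
        task.cancel()  # bids that miss the deadline are ignored
    bids = [task.result() for task in done]
    return max(bids, key=lambda b: b[1]) if bids else None


winner = asyncio.run(run_auction({
    "agency_a": (1.20, 0.01),
    "agency_b": (2.50, 0.50),   # highest bid, but too slow to count
    "agency_c": (1.75, 0.02),
}))
print(winner)  # ('agency_c', 1.75)
```

A page with several advertisement slots would run one such auction per slot concurrently.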
Streaming Data Sources (4)
4. Social Media
• Sources like Twitter, Facebook, Instagram, Google+, Flickr…
• Data is collected and disseminated in real time
• “In 2011, Twitter users in New York City received information about an earthquake outside of Washington, D.C. about 30 seconds before the tremors struck New York itself.”
• Data is highly unstructured, often some form of “natural language” data that must be parsed and processed and is not well understood by automated systems
Streaming Data Sources (5)
5. Mobile data and IoT
• Measure the physical world
• Wristband that measures sleep activity
• Trigger an automated coffee maker when the user gets a poor night’s sleep and needs to be
alert the next day.
Thank You!
In our next session: Difference between Batch Processing and Stream Processing
