Lecture 3 Streaming Data Systems and Applications
Pravin Y Pawar
Responding in Real Time
Few Examples
• The world increasingly operates in the now
• Ability to process data as it arrives (processing data in the present)
• Your Twitter friend / celebrity posts a tweet, and you see it almost immediately…
• You are tracking flights around New Delhi Live…
• You are using the nseindia.com tracker page to track your favourite stocks…
Classification of Real Time Systems
Examples discussion
• A tweet is posted on Twitter
• A live flight tracking service is tracking flights
• A real-time quote monitoring application is tracking stock quotes
• Does focusing on the data processing, and taking the consumers of the data out of the picture,
change your answer?
• The difference between soft real time and near real time is becoming blurry
• Together they are simply termed real time
Difference between real-time and streaming systems
Real-time systems again
Soft or near real time systems
• Classified by the delay experienced by consumers as soft or near real time systems
• The difference between them is blurring and is hard to distinguish
• Two parts are present in the larger picture
• Twitter
A streaming system that processes tweets and allows clients to request the tweets when needed
• Flight Tracking System
A streaming system that processes the most recent flight status data and allows clients to request
the latest data for a particular flight
• Real time Quotes Tracking System
A streaming system that processes the price quotes of stocks and allows clients to request the latest
quote for a stock
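The two parts can be sketched as one small Python class: a streaming side that ingests events, and a serving side that clients query on demand. This is only an illustrative sketch, not how any of the services above are built; the names `LatestValueStore`, `ingest` and `latest`, and the sample quotes, are hypothetical.

```python
from typing import Dict, Optional, Tuple


class LatestValueStore:
    """Hypothetical two-part system: streaming ingest plus on-demand reads."""

    def __init__(self) -> None:
        # key -> (event timestamp, value)
        self._latest: Dict[str, Tuple[float, float]] = {}

    def ingest(self, key: str, timestamp: float, value: float) -> None:
        """Streaming part: called once for every incoming event."""
        current = self._latest.get(key)
        # Keep only the most recent event per key; older late arrivals are ignored.
        if current is None or timestamp >= current[0]:
            self._latest[key] = (timestamp, value)

    def latest(self, key: str) -> Optional[float]:
        """Client part: return the most recent value for a key, if any."""
        entry = self._latest.get(key)
        return None if entry is None else entry[1]


store = LatestValueStore()
# Made-up stock quotes: (symbol, timestamp in seconds, price)
quotes = [("ABC", 1.0, 100.0), ("XYZ", 1.5, 50.0), ("ABC", 2.0, 101.5), ("ABC", 1.2, 99.0)]
for symbol, ts, price in quotes:
    store.ingest(symbol, ts, price)

abc_price = store.latest("ABC")  # the client asks only when it needs the value
```

The client never sees the stream itself; it only reads whatever the streaming side has most recently processed.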
Difference between Batch Processing and Stream Processing
Batch Processing System
Defined
• An efficient way of processing high volumes of data, in which a group of transactions is collected
over a period of time
• Data is collected, entered, processed and then the batch results are produced
Hadoop is focused on batch data processing
• Requires separate programs for input, process and output
• A large volume of storage is required
• Data is sorted and then processed sequentially
• No hard timelines defined
• Sequential jobs are executed repeatedly at fixed intervals
• Examples are payroll and billing systems
Streaming Data Systems
Defined
• Involves a continual input, process and output of data
• Data must be processed in a small time period (or near real time)
Apache Storm and Spark Streaming are frameworks built for this kind of processing
• Gives an organization the ability to take immediate action when acting within seconds or
minutes matters
• No storage is required if events need not be retained
• Data is processed as and when it is made available to the system
• Processing has to happen within fixed / hard timelines
• Examples: radar systems, customer services and bank ATMs
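The contrast between the two definitions can be illustrated with a small Python sketch. The function names and the transaction amounts are made up for illustration; real batch and streaming frameworks do far more than this.

```python
from typing import Iterable, Iterator, List


def batch_total(collected: List[float]) -> float:
    """Batch style: all data is collected first, then processed in one pass."""
    return sum(collected)


def streaming_totals(events: Iterable[float]) -> Iterator[float]:
    """Streaming style: update the result as each event arrives."""
    running = 0.0
    for amount in events:
        running += amount
        yield running  # an up-to-date result exists after every event


transactions = [10.0, 20.0, 5.0]  # made-up amounts
batch_result = batch_total(transactions)  # available only after collection ends
stream_results = list(streaming_totals(iter(transactions)))  # one result per event
```

The batch function needs the complete list before it can answer, while the streaming generator holds only the running total and never needs to store past events.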
Difference
Source : https://aws.amazon.com/streaming-data/
Frameworks
Source : https://www.datasciencecentral.com/profiles/blogs/batch-vs-real-time-data-processing
Streaming Data Applications
Why is stream Processing needed?
Reason 1
• Some data naturally comes as a never-ending stream of events. To do batch processing, you
need to store it, stop data collection at some point and process the data.
• Then you have to do the next batch and then worry about aggregating across multiple batches.
• In contrast, streaming handles never ending data streams gracefully and naturally.
• You can detect patterns, inspect results, look at multiple levels of focus, and also easily look at
data from multiple streams simultaneously.
Source : https://medium.com/stream-processing/what-is-stream-processing-1eadfca11b97
Why is stream Processing needed? (2)
Reason 2
• Stream processing naturally fits time series data and detecting patterns over time.
• For example, suppose you are trying to detect the length of a web session in a never-ending
stream (this is an example of trying to detect a sequence).
• It is very hard to do this with batches, as some sessions will fall into two batches.
• Stream processing can handle this easily.
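A minimal Python sketch of the web session example follows. It assumes a session ends after 30 minutes of inactivity and that events arrive in time order; the timeout, the function name and the click data are all illustrative assumptions.

```python
from typing import Dict, Iterable, Iterator, Tuple

SESSION_GAP = 30 * 60  # assumed inactivity timeout, in seconds


def session_lengths(events: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Yield (user, session length in seconds) whenever a session closes.

    `events` is a time-ordered stream of (user, timestamp) page views.
    """
    open_sessions: Dict[str, Tuple[int, int]] = {}  # user -> (start, last seen)
    for user, ts in events:
        if user in open_sessions:
            start, last = open_sessions[user]
            if ts - last > SESSION_GAP:
                # The gap is too long: close the old session, open a new one.
                yield user, last - start
                open_sessions[user] = (ts, ts)
            else:
                open_sessions[user] = (start, ts)
        else:
            open_sessions[user] = (ts, ts)


clicks = [("u1", 0), ("u2", 100), ("u1", 600), ("u1", 1200), ("u2", 4000), ("u1", 5000)]
closed = list(session_lengths(clicks))
```

Because the per-user state lives across the whole stream, a session is never cut in two by a batch boundary. Sessions that are still open at the end of the input are simply not reported yet.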
Why is stream Processing needed? (3)
Reason 3
• Batch processing lets the data build up and tries to process it all at once, while stream
processing handles data as it comes in and so spreads the processing over time.
• Hence stream processing can work with far less hardware than batch processing.
• Furthermore, stream processing also enables approximate query processing via systematic load
shedding.
• Hence stream processing fits naturally into use cases where approximate answers are sufficient.
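One simple reading of load shedding is to process only every k-th event and scale the result up. The Python sketch below takes that reading; the function name, the `keep_every` parameter and the synthetic readings are assumptions for illustration, and real systems choose what to drop more carefully.

```python
from typing import Iterable


def approximate_sum(events: Iterable[float], keep_every: int) -> float:
    """Estimate the sum of a stream while processing only 1 in `keep_every` events."""
    sampled_sum = 0.0
    for index, value in enumerate(events):
        if index % keep_every == 0:  # every other event is shed (dropped)
            sampled_sum += value
    return sampled_sum * keep_every  # scale up to estimate the full sum


readings = [float(i % 10) for i in range(1000)]  # synthetic data
exact = sum(readings)
estimate = approximate_sum(readings, keep_every=3)
relative_error = abs(estimate - exact) / exact
```

Only a third of the events are processed, and the answer is approximate rather than exact. The quality of the estimate depends on the data: a fixed sampling step can be biased if the stream has a period that lines up with it.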
Why is stream Processing needed? (4)
Reason 4
• Finally, a lot of streaming data is available (e.g. customer transactions, activities,
website visits), and it will grow faster with IoT use cases (all kinds of sensors).
• Streaming is a much more natural model for thinking about and programming those use cases.
Streaming Data Sources
Source : https://mapr.com/blog/real-time-streaming-data-pipelines-apache-apis-kafka-spark-streaming-and-hbase/
Streaming Data Examples
• Sensors in transportation vehicles, industrial equipment, and farm machinery send data to a
streaming application. The application monitors performance, detects any potential defects in
advance, and places a spare part order automatically, preventing equipment downtime.
• A financial institution tracks changes in the stock market in real time, computes value-at-risk, and
automatically rebalances portfolios based on stock price movements.
• A real-estate website tracks a subset of data from consumers’ mobile devices and makes real-time
recommendations of properties to visit based on their geo-location.
• A solar power company has to maintain power throughput for its customers, or pay penalties. It
implemented a streaming data application that monitors all of the panels in the field, and schedules
service in real time, thereby minimizing the periods of low throughput from each panel and the
associated penalty payouts.
• A media publisher streams billions of click stream records from its online properties, aggregates and
enriches the data with demographic information about users, and optimizes content placement on its
site, delivering relevancy and better experience to its audience.
• An online gaming company collects streaming data about player-game interactions, and feeds the
data into its gaming platform. It then analyzes the data in real-time, offers incentives and dynamic
experiences to engage its players.
Source : https://aws.amazon.com/streaming-data/
Who is using Stream Processing?
Uses of Stream Processing
• Example Applications
Fraud detection applications monitor credit card usage to detect unexpected
patterns
Stock trading systems look for opportunities in stock price patterns and execute trades
based on those patterns
Manufacturing systems keep an eye on the assembly / manufacturing process for
defective or malfunctioning machines
Intelligence systems can keep track of the activities of aggressors and raise an alarm if
something unusual is noticed about them
Etc.
• Examples
Esper
IBM InfoSphere
TIBCO StreamBase
Oracle CEP
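The fraud detection case above can be sketched in Python as a pattern over correlated events. The rule used here (the same card appearing in two different cities within ten minutes) is an assumed example pattern, not one taken from any of the listed products, and the card data is made up.

```python
from typing import Dict, Iterable, Iterator, Tuple

WINDOW = 600  # assumed: seconds within which two different cities look suspicious


def suspicious_cards(
    transactions: Iterable[Tuple[str, int, str]]
) -> Iterator[Tuple[str, str, str]]:
    """Yield (card, previous city, current city) for suspicious pairs.

    `transactions` is a time-ordered stream of (card, timestamp, city).
    """
    last_seen: Dict[str, Tuple[int, str]] = {}  # card -> (timestamp, city)
    for card, ts, city in transactions:
        previous = last_seen.get(card)
        if previous is not None:
            prev_ts, prev_city = previous
            if city != prev_city and ts - prev_ts <= WINDOW:
                yield card, prev_city, city
        last_seen[card] = (ts, city)


stream = [
    ("card-1", 0, "Pune"),
    ("card-2", 50, "Delhi"),
    ("card-1", 120, "Mumbai"),    # different city two minutes later
    ("card-2", 5000, "Chennai"),  # different city, but long after
]
alerts = list(suspicious_cards(stream))
```

The alert depends on the relationship between two events rather than on either event alone, which is what separates this kind of processing from filtering single events.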
Other Applications (2)
Stream Analytics
• Recent application area
• The difference between CEP and Streaming Analytics is diminishing
• More oriented towards aggregations and metrics over a large number of events
Measuring rate at which system heart-beat messages are transmitted
Calculating rolling averages for certain periods
Comparing the statistics about two or more streams
• Usually windows of time are used
• Techniques used can produce exact outcomes or probabilistic outcomes
• Examples
Apache Storm
Spark Streaming
Flink
Apache Samza
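A rolling average over a window of time can be sketched in plain Python as follows. This is an exact (not probabilistic) computation under the assumption that events arrive in time order; the function name and the sample values are illustrative, and the frameworks listed above provide their own windowing operators.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def rolling_average(
    events: Iterable[Tuple[int, float]], window_seconds: int
) -> Iterator[Tuple[int, float]]:
    """Yield (timestamp, average of the values seen in the last `window_seconds`)."""
    window: Deque[Tuple[int, float]] = deque()
    total = 0.0
    for ts, value in events:
        window.append((ts, value))
        total += value
        # Evict events that have fallen out of the time window.
        while window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total / len(window)


samples = [(0, 10.0), (1, 20.0), (2, 30.0), (5, 40.0)]  # (timestamp, value)
averages = list(rolling_average(samples, window_seconds=3))
```

Only the events inside the current window are kept in memory, so the cost per event stays small however long the stream runs.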
Other Applications (3)
Materialized Views
• Materialized views are dependent applications such as caches, search indexes, data
warehouses etc.
• When something changes in an underlying system such as a database, all dependents need to be
updated as well
• This can be achieved by tracking what changes in the source applications, so that derived
applications can be updated based on the tracked changes
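The idea of updating a derived application from tracked changes can be sketched in Python like this. The change event format, the function name and the sample records are hypothetical; real change capture tools define their own event formats.

```python
from typing import Dict, Iterable, Optional, Tuple

# A change event: (operation, key, value); value is None for deletes.
Change = Tuple[str, str, Optional[str]]


def apply_changes(view: Dict[str, str], changes: Iterable[Change]) -> None:
    """Keep a derived view (for example a cache) in sync with tracked changes."""
    for op, key, value in changes:
        if op in ("insert", "update") and value is not None:
            view[key] = value
        elif op == "delete":
            view.pop(key, None)


cache: Dict[str, str] = {}
# Made-up changes captured from an underlying database, in order.
change_log = [
    ("insert", "user:1", "Asha"),
    ("insert", "user:2", "Ravi"),
    ("update", "user:1", "Asha K"),
    ("delete", "user:2", None),
]
apply_changes(cache, change_log)
```

The cache is never rebuilt from scratch; it only replays the stream of changes, so it stays current as long as every change is tracked and applied in order.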
Other Applications (4)
Stream Searching
• CEP mostly monitors multiple events together and looks for correlations between incoming events
• Sometimes individual events also need to be monitored for certain complex conditions
• For example,
A user wants to be informed when a property matching their requirements and
budget is listed for sale / rent
A user wants to be informed when products that are currently unavailable
become available for sale
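The property example can be sketched in Python as stored searches that are checked against each incoming listing. The classes, field names and sample data are all hypothetical, and a real system would index the searches rather than scan them one by one.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Listing:
    city: str
    bedrooms: int
    price: int


@dataclass
class SavedSearch:
    user: str
    city: str
    min_bedrooms: int
    max_price: int

    def matches(self, listing: Listing) -> bool:
        """Check one incoming event against this user's stored condition."""
        return (
            listing.city == self.city
            and listing.bedrooms >= self.min_bedrooms
            and listing.price <= self.max_price
        )


def notifications(
    listings: Iterable[Listing], searches: List[SavedSearch]
) -> Iterator[Tuple[str, Listing]]:
    """Yield (user, listing) for every stored search an incoming listing satisfies."""
    for listing in listings:
        for search in searches:
            if search.matches(listing):
                yield search.user, listing


searches = [
    SavedSearch("asha", "Pune", 2, 5_000_000),
    SavedSearch("ravi", "Pune", 3, 9_000_000),
]
incoming = [
    Listing("Pune", 2, 4_500_000),
    Listing("Delhi", 3, 4_000_000),
    Listing("Pune", 3, 8_000_000),
]
alerts = list(notifications(incoming, searches))
```

Here the queries are stored and the data flows past them, the reverse of a conventional search where the data is stored and the queries arrive.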
Sources of Streaming Data
Streaming Data Sources
1. Operational Monitoring
• Operational Monitoring
Ex: Tracking the performance of the physical systems that power the Internet
Temperature of the processor, speed of the fan, the voltage draw of their power supplies,
state of their disk drives (processor load, network activity, and storage access times), etc.
Data is collected and aggregated in real time