Streaming Data: Understanding the real-time pipeline

Ebook, 398 pages, 4 hours

Name: Streaming Data: Understanding the real-time pipeline
Author: Andrew Psaltis
ISBN: 9781638357247

About this ebook

Summary

Streaming Data introduces the concepts and requirements of streaming and real-time data systems. The book is an idea-rich tutorial that teaches you to think about how to efficiently interact with fast-flowing data.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the Technology

As humans, we're constantly filtering and deciphering the information streaming toward us. In the same way, streaming data applications can accomplish amazing tasks like reading live location data to recommend nearby services, tracking faults with machinery in real time, and sending digital receipts before your customers leave the shop. Recent advances in streaming data technology and techniques make it possible for any developer to build these applications if they have the right mindset. This book will let you join them.

About the Book

Streaming Data is an idea-rich tutorial that teaches you to think about efficiently interacting with fast-flowing data. Through relevant examples and illustrated use cases, you'll explore designs for applications that read, analyze, share, and store streaming data. Along the way, you'll discover the roles of key technologies like Spark, Storm, Kafka, Flink, RabbitMQ, and more. This book offers the perfect balance between big-picture thinking and implementation details.

What's Inside

The right way to collect real-time data
Architecting a streaming pipeline
Analyzing the data
Which technologies to use and when

About the Reader

Written for developers familiar with relational database concepts. No experience with streaming or real-time applications required.

About the Author

Andrew Psaltis is a software engineer focused on massively scalable real-time analytics.

Table of Contents

Introducing streaming data
Getting data from clients: data ingestion
Transporting the data from collection tier: decoupling the data pipeline
Analyzing streaming data
Algorithms for data analysis
Storing the analyzed or collected data
Making the data available
Consumer device capabilities and limitations accessing the data
Analyzing Meetup RSVPs in real time

Language: English
Publisher: Manning
Release date: May 31, 2017
ISBN: 9781638357247





Macworld UK
Article
How Apple Sweats The Security Details – And Sometimes Gets It Wrong
Jan 15, 2021
3 min read
Opinion
Linux Format
Article
Opinion
Aug 20, 2024
Italo Vignoli is one of the founders of LibreOffice and the Document Foundation. “Think about the personal and confidential information in your office suite documents; it’s essential your office suite respects user privacy. LibreOffice does not ask y
3 min read
“Reputations Are Going To Be Staked On How ‘The Computer’ Goes About Making Decisions”
PC Pro Magazine
Article
“Reputations Are Going To Be Staked On How ‘The Computer’ Goes About Making Decisions”
Jun 10, 2021
We live lonely lives here sometimes. The type of critic who sees patterns in everything loves to tell me that I’m in the pockets of PC Pro advertisers, and that we all toe the party line – most recently over systems such as the Raspberry Pi 400 or an
6 min read
“I Made The Mistake Of Telling The Truth And Got An Inbox Full Of Emails From Conspiracy Crazies”
PC Pro Magazine
Article
“I Made The Mistake Of Telling The Truth And Got An Inbox Full Of Emails From Conspiracy Crazies”
Sep 5, 2024
7 min read
“I Made The Mistake Of Telling The Truth And Got An Inbox Full Of Emails From Conspiracy Crazies”
PC Pro Magazine
Article
“I Made The Mistake Of Telling The Truth And Got An Inbox Full Of Emails From Conspiracy Crazies”
Sep 5, 2024
7 min read

Related categories

Skip carousel

Reviews for Streaming Data

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Streaming Data - Andrew Psaltis

Copyright

For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department

Manning Publications Co.

20 Baldwin Road

PO Box 761

Shelter Island, NY 11964

Email:

[email protected]

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Development editor: Karen Miller

Technical development editor: Gregor Zurowski

Project editor: Janet Vail

Copyeditor: Corbin Collins

Proofreader: Elizabeth Martin

Technical proofreader: Al Krinker

Typesetter: Dennis Dalinnik

Cover designer: Marija Tudor

ISBN: 9781617292286

Printed in the United States of America

1 2 3 4 5 6 7 8 9 10 – EBM – 22 21 20 19 18 17

Brief Table of Contents

Copyright

Brief Table of Contents

Table of Contents

Preface

Acknowledgments

About this Book

1. A new holistic approach

Chapter 1. Introducing streaming data

Chapter 2. Getting data from clients: data ingestion

Chapter 3. Transporting the data from collection tier: decoupling the data pipeline

Chapter 4. Analyzing streaming data

Chapter 5. Algorithms for data analysis

Chapter 6. Storing the analyzed or collected data

Chapter 7. Making the data available

Chapter 8. Consumer device capabilities and limitations accessing the data

2. Taking it real world

Chapter 9. Analyzing Meetup RSVPs in real time

The streaming data architectural blueprint

Index

List of Figures

List of Tables

List of Listings

Table of Contents

Copyright

Brief Table of Contents

Table of Contents

Preface

Acknowledgments

About this Book

1. A new holistic approach

Chapter 1. Introducing streaming data

1.1. What is a real-time system?

1.2. Differences between real-time and streaming systems

1.3. The architectural blueprint

1.4. Security for streaming systems

1.5. How do we scale?

1.6. Summary

Chapter 2. Getting data from clients: data ingestion

2.1. Common interaction patterns

2.1.1. Request/response pattern

2.1.2. Request/acknowledge pattern

2.1.3. Publish/subscribe pattern

2.1.4. One-way pattern

2.1.5. Stream pattern

2.2. Scaling the interaction patterns

2.2.1. Request/response optional pattern

2.2.2. Scaling the stream pattern

2.3. Fault tolerance

2.3.1. Receiver-based message logging

2.3.2. Sender-based message logging

2.3.3. Hybrid message logging

2.4. A dose of reality

2.5. Summary

Chapter 3. Transporting the data from collection tier: decoupling the data pipeline

3.1. Why we need a message queuing tier

3.2. Core concepts

3.2.1. The producer, the broker, and the consumer

3.2.2. Isolating producers from consumers

3.2.3. Durable messaging

3.2.4. Message delivery semantics

3.3. Security

3.4. Fault tolerance

3.5. Applying the core concepts to business problems

Finance: fraud detection

Internet of Things: a tweeting Coke machine

E-commerce: product recommendations

3.6. Summary

Chapter 4. Analyzing streaming data

4.1. Understanding in-flight data analysis

4.2. Distributed stream-processing architecture

A generalized architecture

Apache Spark Streaming

Apache Storm

Apache Flink

Apache Samza

4.3. Key features of stream-processing frameworks

4.3.1. Message delivery semantics

State management

Fault tolerance

4.4. Summary

Chapter 5. Algorithms for data analysis

5.1. Accepting constraints and relaxing

5.2. Thinking about time

Stream time vs. event time

Windows of time

5.2.1. Sliding window

Example usage

Framework support

5.2.2. Tumbling window

Example use

Framework support

5.3. Summarization techniques

5.3.1. Random sampling

5.3.2. Counting distinct elements

5.3.3. Frequency

5.3.4. Membership

5.4. Summary

Chapter 6. Storing the analyzed or collected data

6.1. When you need long-term storage

Direct writing

Indirect writing

6.2. Keeping it in-memory

6.2.1. Embedded in-memory/flash-optimized

6.2.2. Caching system

Read-through

Refresh-ahead

Write-through

Write-around

Write-back (write-behind)

6.2.3. In-memory database and in-memory data grid

6.3. Use case exercises

6.3.1. In-session personalization

Embedded in-memory/flash-optimized

Caching system

IMDB or IMDG

Taking it to the next level

6.3.2. Next-generation energy company

6.4. Summary

Chapter 7. Making the data available

7.1. Communications patterns

7.1.1. Data Sync

Benefits

Drawbacks

7.1.2. Remote Method Invocation and Remote Procedure Call

Benefits

Drawbacks

7.1.3. Simple Messaging

Benefits

Drawbacks

7.1.4. Publish-Subscribe

Benefits

Drawbacks

7.2. Protocols to use to send data to the client

7.2.1. Webhooks

7.2.2. HTTP Long Polling

7.2.3. Server-sent events

7.2.4. WebSockets

7.3. Filtering the stream

7.3.1. Where to filter

7.3.2. Static vs. dynamic filtering

7.4. Use case: building a Meetup RSVP streaming API

7.5. Summary

Chapter 8. Consumer device capabilities and limitations accessing the data

8.1. The core concepts

UI/end-user application

Integration with third-party/stream processors

8.1.1. Reading fast enough

Third-party streaming API

Your streaming API

8.1.2. Maintaining state

8.1.3. Mitigating data loss

8.1.4. Exactly-once processing

8.2. Making it real: SuperMediaMarkets

8.3. Introducing the web client

8.3.1. Integrating with the streaming API service

8.4. The move toward a query language

8.5. Summary

2. Taking it real world

Chapter 9. Analyzing Meetup RSVPs in real time

9.1. The collection tier

9.1.1. Collection service data flow

9.2. Message queuing tier

9.2.1. Installing and configuring Kafka

9.2.2. Integrating the collection service and Kafka

9.3. Analysis tier

9.3.1. Installing Storm and preparing Kafka

9.3.2. Building the top n Storm topology

9.3.3. Integrating analysis

9.4. In-memory data store

9.5. Data access tier

9.5.1. Taking it to production

9.6. Summary

The streaming data architectural blueprint

Index

List of Figures

List of Tables

List of Listings

Preface

For as long as I can remember, I have been fascinated with speed as it relates to computing and am always trying to find a way to do something faster. In the late 1990s, when I spent most of my time writing software in C++, my favorite keyword was __asm, which marks the block of code that follows as assembly language, and I understood what was happening at the machine level. I worked on mobile software in the early 2000s, and again the question was how we could sync data faster or make things run faster on the PalmPilots and Windows CE devices we were using. At the time we had huge (by that day’s standards, anyway) medical databases (around 25–50 MB in size) that required external cards on a PalmPilot to store, and several applications that needed to provide interactive speed when searching and browsing the data.

As data volumes started to grow in the industries I was working in, I found myself at the perfect intersection of large data sets and speed to business insight. The data was growing in volume and being generated at faster and faster speeds, and business wanted answers to questions in shorter and shorter timeframes from the time data was being generated. To me, it was the perfect marriage: large data and a need for speed. Around 2001 I began to work on marketing analytics and e-commerce applications, where data was continuously being updated and we needed to provide insight into it in near real time. In 2009 I started working at Webtrends, where my love for speed and delivering insight at speed really matured. At Webtrends, analytics was our core business, and the idea of real-time analytics was just starting to catch the interest of our customers. The first project I worked on aimed to deliver key metrics in a dashboard within five minutes of a clickstream event happening anywhere in the world. At the time, that was pushing the envelope.

In 2011 I was part of an emerging products team. Our mission was to continue to push the idea of real-time analytics and try to disrupt our industry. After spending time researching, prototyping, and thinking through our next step, a perfect storm occurred. We had been looking at Apache Kafka, and then in September 2011 Apache Storm was open sourced. We immediately started to run like crazy with it. By winter we had early-adopter customers looking at what we were building. At that point we never looked back and set our sights on delivering on a Service Level Agreement (SLA) that was, in essence: From click to dashboard in three seconds or less, globally! After many months and a lot of work by what became a much larger team, we delivered on our promise and won the Digital Analytics New Technology of the Year award (www.digitalanalyticsassociation.org/awards2013). I was deeply involved in building and architecting all aspects of this solution, from the data collection to the initial UI (which was affectionately called Bare Bones, due to my lack of UI skills).

We continued our pursuit and began looking at Spark Streaming when it was still part of the Berkeley AMPLab. Since those days I have continued to build more and more streaming systems that deliver on the ultimate goal: insights at the speed of thought. Today I continue to speak internationally on the topic and work with companies across the globe in designing, building, and solving streaming problems.

Even today I still see a widespread lack of understanding of all the pieces that go into building and delivering a streaming system. You can usually find references to pieces of the stack, but rarely do you find out how to think through the entire stack and understand each of the tiers.

It is therefore with great pleasure that I have tried in this book to share and distill this real-world experience and knowledge. My goal has been to provide a solid foundation from which you can build and explore a complete streaming system.

Acknowledgments

First, I want to thank my family for their support during the writing of this book. There were many weekends and nights of “Sorry, I can’t help with the garden (or play lacrosse or go to the get-together). I need to write.” I’m sure that wasn’t easy for my children to hear; nor was it always easy for my wife to buffer and pick up my slack. Through all the highs and lows that go into this process their support never wavered and they remained a constant source of encouragement and inspiration. For this I owe a tremendous debt of gratitude to my wife and children; a simple thank you cannot express it enough.

Thanks to Karen, my development editor, for her endless patience, understanding, and willingness to always talk things through with me throughout this entire journey. To Robin, my acquisition editor, for believing in me, nurturing the idea of this book, and being a sounding board to make sure the train was staying on the tracks during some rough patches in the early days. To Bert, for his teachings on how to tell a story, how to find the right level of depth with a narrative, and pedagogical insight into the construction of a technical book. To my technical development editor Gregor, whose very thoughtful and insightful feedback helped craft this book into what it is today. Lastly, but certainly not least, thanks to the entire Manning team for the fantastic effort to finally get us to this point.

Thanks also to all the people who bought and read early versions of the manuscript through the MEAP early access program, to those who contributed to the Author Online forum, and to the countless reviewers for their invaluable feedback, including Andrew Gibson, Dr. Tobias Bürger, Jake McCrary, Rodrigo Abreu, Andy Keffalas, John Guthrie, Kosmas Chatzimichalis, Giuliano Bertoti, Carlos Curotto, Andy Kirsch, Douglas Duncan, Jeff Smith, Sergio Fernández González, Jaromir D.B. Nemec, Jose Samonte, Jan Nonnen, Romit Singhai, Chris Allan, Jonathan Thoms, Steven Jenkins, Lee Gilbert, Amandeep Khurana, and Charlie Gaines. Without all of you, this book wouldn’t be what it is today.

Many others contributed in various ways. I can’t mention everyone by name because the acknowledgments would just roll on and on, but a big thank you goes out to everyone else who had a hand in helping make this possible!

About this Book

The world of real-time systems has been around for a long time; for many years real-time and/or streaming was solely the domain of hard real-time systems. Those are systems where if an SLA isn’t met, there is potential loss of life. Over the last decade near-real-time systems have emerged and grown at an amazing rate. Everywhere you look you can find examples of data streaming: social media, games, smart cities, smart meters, your new washing machine, and the list goes on. Consider the following: Today if a byte of data were a gallon of water, an average home would be filled within 10 seconds; by the year 2020, it will only take 2 seconds. Making sense of and using such a deluge of data means building streaming systems.

Focusing on the big ideas of streaming and real-time data, the goals of this book are twofold. The first is to teach you how to think about the entire pipeline so you’re equipped with the skills to not only build a streaming system but also understand the tradeoffs at every tier. The second is to provide a solid launching point for you to delve deeper into each tier, as your business needs require or as your interest pulls you.

How to use this book

Although this book was designed to be read from start to finish, each chapter provides enough information that you can read and understand it on its own. If you want to understand a particular tier, you should feel comfortable jumping straight to that chapter and then using what you learned there as your base for deeper exploration of the other chapters.

Who should read this book

This book is perfect for developers or architects and has been written to be easily accessible to technical managers and business decision makers; no prior experience with streaming or real-time data systems is required. The only technical requirement this book makes is that you should feel comfortable reading Java. The source code is written in Java, as is the example code that accompanies chapter 9.

Roadmap

The roadmap of this book is represented in figure 1. A synopsis of each chapter follows.

Figure 1. Architectural blueprint with chapter mappings

Chapter 1 introduces the architectural blueprint of the book, which tells you where we are in the pipeline and serves as a great map if you need to jump from tier to tier. After laying out this blueprint, chapter 1 defines a real-time system, explores the differences between real-time and in-the-moment systems, and briefly touches on the importance of security (which could be its own book).

Chapter 2 explores all aspects of collecting data for a streaming system, from the interaction patterns through scaling and fault-tolerance techniques. This chapter covers all the relevant aspects of the collection tier and prepares you to build a scalable and reliable tier.

Chapter 3 is all about how to decouple the data being collected from the data being analyzed by using a message queuing tier in the middle. You will learn why you need a message queuing tier, how to understand message durability and different message delivery semantics, and how to choose the right technology for your business problem.

Chapter 4 dives into the common architectural patterns of distributed stream-processing frameworks, covering topics such as what message delivery semantics mean for this tier, how state is commonly handled, and what fault tolerance is and why we need it.

Chapter 5 jumps from discussing architecture to querying a stream, the problems with time, and four popular summarization techniques. If chapter 4 is the “what” for distributed stream-processing engines, chapter 5 is the “how.”

Chapter 6 discusses options for storing data in-memory during and post analysis. It doesn’t spend much time discussing disk-based long-term storage solutions because they’re often used out of band of a streaming analysis and don’t offer the performance of the in-memory stores.

Chapter 7 is where we start to discuss what to do with the data we have collected and analyzed. It talks about communications patterns and protocols used for sending data to a streaming client. Along the way we’ll find out how to match up our business requirements to the various protocols and how to choose the right one.

Chapter 8 explores concepts to keep in mind when building a streaming client. This is not a chapter on just building an HTML web app; it goes much deeper into lower-level things to consider when designing the client side of a streaming system.

Chapter 9 . . . at this point, if you have read all the way through, congrats! A lot of material is covered in the first eight chapters. Chapter 9 is where we make it all come to life. Here we build a complete streaming data pipeline and discuss taking our sample to production.

About the code

All the code shown in the final chapter of this book can be found in the sample source code that accompanies this book. You can download the sample code free of charge from the Manning website at www.manning.com/books/streaming-data. You may also find the code on GitHub at https://github.com/apsaltis/StreamingData-Book-Examples.

The sample code is structured as separate Maven projects, one for each of the tiers we walk through in chapter 9. Instructions for building and running the software are provided during the walkthrough in chapter 9.

All source code in listings or in the text is in a fixed-width font like this to separate it from ordinary text. In some listings, the code is annotated to point out the key concepts.

About the author

Andrew Psaltis is deeply entrenched in streaming systems and obsessed with delivering insight at the speed of thought. He spends most of his waking hours thinking about, writing about, and building streaming systems. He helps customers of all sizes build and/or fix complex streaming systems, speaks around the globe about streaming, and teaches others how to build streaming systems. When he’s not busy being busy, he’s spending time with his lovely wife and two kids and watching as much lacrosse as possible.

Author Online

The purchase of Streaming Data includes free access to a private forum run by Manning Publications where you can make comments about the book, ask technical questions, and receive help from the author and other users. To access and subscribe to the forum, point your browser to www.manning.com/books/streaming-data. This page provides information on how to get on the forum once you’re registered, what kind of help is available, and the rules of conduct in the forum.

Manning’s commitment to our readers is to provide a venue where meaningful dialogue between individual readers and between readers and the author can take place. It’s not a commitment to any specific amount of participation on the part of the author, whose contribution to the book’s forum remains voluntary (and unpaid). We suggest you try asking him challenging questions, lest his interest stray!

The Author Online forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.

About the cover illustration

The figure on the cover of Streaming Data is captioned Habit of a Moor of Morrocco in winter in 1695. The illustration is taken from Thomas Jefferys’ A Collection of the Dresses of Different Nations, Ancient and Modern (four volumes), London, published between 1757 and 1772. The title page states that these are hand-colored copperplate engravings, heightened with gum arabic. Thomas Jefferys (1719–1771) was called Geographer to King George III. He was an English cartographer who was the leading map supplier of his day. He engraved and printed maps for government and other official bodies and produced a wide range of commercial maps and atlases, especially of North America. His work as a mapmaker sparked an interest in local dress customs of the lands he surveyed and mapped, which are brilliantly displayed in this collection.

Fascination with faraway lands and travel for pleasure were relatively new phenomena in the late 18th century and collections such as this one were popular, introducing both the tourist as well as the armchair traveler to the inhabitants of other countries. The diversity of the drawings in Jefferys’ volumes speaks vividly of the uniqueness and individuality of the world’s nations some 200 years ago. Dress codes have changed since then and the diversity by region and country, so rich at the time, has faded away. It is now often hard to tell the inhabitant of one continent from another. Perhaps, trying to view it optimistically, we have traded a cultural and visual diversity for a more varied personal life. Or a more varied and interesting intellectual and technical life.

At a time when it is hard to tell one computer book from
