StreamSets
By: Avleen Kaur
StreamSets is a system for creating, executing, and operating continuous dataflows that connect
the various parts of a data infrastructure.
Its main components are:
• Data Collector
• Control Hub
• Data Protector
• Transformer
Data Collector provides the crucial connection between the hops in the stream of data.
JDBC Property            Description
JDBC Connection String   Connection string used to connect to the database.
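To make the JDBC Connection String property concrete, here is a minimal Java sketch that opens a connection with such a string. The MySQL URL, database name, and credentials are assumptions for illustration only, not values from StreamSets.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class JdbcConnectionExample {
    public static void main(String[] args) throws SQLException {
        // Hypothetical JDBC connection string: vendor scheme, host, port,
        // and database name are all assumed for this example.
        String connectionString = "jdbc:mysql://dbhost.example.com:3306/sales";

        // Opening a connection from a URL plus credentials; a JDBC stage
        // is configured with the same kind of string. Requires the
        // vendor's JDBC driver (here, MySQL) on the classpath.
        try (Connection conn = DriverManager.getConnection(
                connectionString, "reporting_user", "secret")) {
            System.out.println("Connected to: " + conn.getMetaData().getURL());
        }
    }
}
```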
Property            Description
Name                Stage name.
Description         Optional description.
Produce Events      Generates event records when events occur. Use for event handling.
On Record Error     Error record handling for the stage:
                    • Discard - discards the record.
                    • Send to Error - sends the record to the pipeline for error handling.
                    • Stop Pipeline - stops the pipeline.
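The On Record Error options describe a common error-handling pattern. The sketch below is a hypothetical plain-Java illustration of that pattern, not StreamSets code; the DataRecord type, the errorSink list, and the handler are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class OnRecordErrorSketch {
    // The three policies from the table above.
    enum OnRecordError { DISCARD, SEND_TO_ERROR, STOP_PIPELINE }

    record DataRecord(String payload) {}

    // Stand-in for a pipeline's error stream.
    static final List<DataRecord> errorSink = new ArrayList<>();

    static void handleBadRecord(DataRecord rec, OnRecordError policy) {
        switch (policy) {
            case DISCARD:
                // Discard - drop the record silently and keep processing.
                break;
            case SEND_TO_ERROR:
                // Send to Error - route the record to the error stream
                // for later inspection or reprocessing.
                errorSink.add(rec);
                break;
            case STOP_PIPELINE:
                // Stop Pipeline - abort processing entirely.
                throw new IllegalStateException(
                        "Pipeline stopped on bad record: " + rec.payload());
        }
    }

    public static void main(String[] args) {
        handleBadRecord(new DataRecord("malformed-row"), OnRecordError.SEND_TO_ERROR);
        System.out.println("Records routed to error stream: " + errorSink.size());
    }
}
```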
• Zero downtime for upgrades; direct integration with big data governance tools
• Flexible deployment; 100% in-memory operation for high throughput and low latency
• Simplified problem diagnosis, as historical metrics are used to compare dataflow performance over time
• Integrates with StreamSets Dataflow Performance Manager for live dataflow metrics and SLA enforcement
Data Protector provides software as a service to discover, secure, and govern the movement of
sensitive data as it arrives from a source or moves between compute platforms.
SDC Edge:
• Used to read data from an edge device, or to receive data from another pipeline and then act on that data to control an edge device
• Written in Go, SDC Edge provides a single solution across a broad range of edge hardware platforms
• Can act as a simple data forwarder, or be configured to perform transformations and analytics on the edge (a sketch of the forwarder role follows this list)
• Less than 5 MB installation footprint; low memory and CPU (1-2%) utilization
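As a rough illustration of the "simple data forwarder" role, the following Java sketch reads one (pretend) sensor value and forwards it over HTTP to a downstream pipeline. The device ID, payload, and collector URL are all hypothetical; a real SDC Edge pipeline is configured rather than hand-coded, and runs as a Go binary.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EdgeForwarderSketch {
    public static void main(String[] args) throws Exception {
        // Pretend sensor reading; on a real device this might come from
        // a local file, serial port, or MQTT broker (all assumptions).
        String reading = "{\"deviceId\":\"edge-42\",\"tempC\":21.5}";

        // Forward the reading to a downstream pipeline's HTTP endpoint
        // (the URL is invented for this example).
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://collector.example.com:8000/ingest"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(reading))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Forwarded, HTTP status " + response.statusCode());
    }
}
```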
StreamSets DPM lets you manage real-time pipeline and dataflow topology performance.
Transformer:
• Enables users to solve their core business problems by abstracting away the complexity of operating the Spark cluster
• Can execute both batch and streaming operations, mixing and matching as required (see the sketch after this list)
• Easy-to-use interface and rich tools democratize the process of data transformation
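Transformer runs its pipelines on Spark. As a point of comparison, here is a hand-written Spark batch transformation in Java; the input path, column name, and output path are assumptions, and the spark-sql dependency is required.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

public class BatchTransformSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("batch-transform-sketch")
                .master("local[*]") // local mode; use a cluster master URL in production
                .getOrCreate();

        // Hypothetical input: a CSV of orders with an 'amount' column.
        Dataset<Row> orders = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("/data/orders.csv");

        // A simple batch transformation: keep large orders only. The same
        // filter could run in streaming mode by reading from a streaming
        // source with spark.readStream() instead.
        Dataset<Row> largeOrders = orders.filter(col("amount").gt(100));

        largeOrders.write().mode("overwrite").parquet("/data/large_orders");
        spark.stop();
    }
}
```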
The platform manages the complete dataflow lifecycle, including how to build, execute, and
operate enterprise dataflows at scale. Developers can design batch and streaming pipelines
with a minimum of code, while operators can aggregate dataflows into topologies for
centralized provisioning and performance management.