Jitney, Kafka at Airbnb


Zhang Zhen (张振), Kafka Meetup

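About Airbnb

Founded in August 2008 and headquartered in San Francisco, California, Airbnb is a trusted community marketplace where people can list, discover, and book unique accommodations around the world through the website, a mobile phone, or a tablet.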
Jitney?!
A bus carrying passengers for a low fare.
Some Kafka Facts
• 1 Production cluster
• v0.8.2
• 90 “small” brokers, d2.2xlarge
• 70 topics
• Replication Factor of 3
• 5 billion events / day
• IN: 80MB / second
• OUT: 1.5GB / second
• Network bound
• Super stable
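
To put the figures above in proportion, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes decimal units (1 MB = 10^6 bytes) and treats the slide's numbers as steady averages, neither of which the slide states.

```python
# Rough arithmetic on the cluster figures quoted on the slide.
# Assumptions (not from the slide): decimal units, steady average rates.
SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = 5_000_000_000
bytes_in_per_sec = 80 * 10**6      # IN: 80 MB / second
bytes_out_per_sec = 1.5 * 10**9    # OUT: 1.5 GB / second

events_per_sec = events_per_day / SECONDS_PER_DAY
avg_event_bytes = bytes_in_per_sec / events_per_sec
read_fan_out = bytes_out_per_sec / bytes_in_per_sec

print(f"{events_per_sec:,.0f} events per second")        # about 57,870
print(f"{avg_event_bytes:,.0f} bytes per event on average")  # about 1,382
print(f"{read_fan_out:.2f}x outbound vs inbound traffic")    # 18.75x
```

Outbound traffic that is roughly 19 times the inbound traffic is consistent with the "network bound" observation: under these assumptions, each byte written is read many times over.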
Why Jitney?
Pick any metric
Standardization!
What Use Cases?

[Diagram: Classic Message Bus, showing Jitney, Jitney Client & Schemas, Monorail, MySQL, MySQL, Elasticsearch]


Message Bus
• Decouple Services
• Standard Events
• At-least once delivery
• Standard clients, for Java and Ruby
• Easy to use
• Conventions over configuration
User Activity Logging
• Site and image load times, OOM events
• Searches, requests, bookings, etc.
• Experiment assignments

• Event data is critical for building data products


• Data ingestion should be reliable: timely and complete
Challenges
• JSON events without schemas
• Easy to break events during evolution/code changes
• One topic overall for 800+ event types
• Improper producer configs
• Lack of monitoring
• Lead to:
  • Too many data outages, data loss incidents
  • Lack of trust in data systems
Data Stability
A Year Ago

CEO dashboard and Magical booking dashboards were regularly broken.
Data Stability
A Year Ago

ERF was unstable and experimentation culture was weak.

"Hi team,

This is partly a PSA to let you know ERF dashboard data hasn't been up to date/accurate for several weeks now. Do not rely on the ERF dashboard for information about your experiment."
Jitney Components

Schema Repository
Thrift Schema Repository

Why Thrift?
• Easy syntax
• Good performance in Ruby
• Ubiquitous
Advantages of schema repo?
• Great Catalyst for communication, documentation, etc
• It ships JARs and gems

• Will developers hate you for this? no


Schema Evolution

• Standard Field in the event schema
• Managed Explicitly
• Use Semantic Versioning:
1.0.0 = MODEL . REVISION . ADDITION

MODEL is a change which breaks the rules of backward compatibility.
Example: changing the type of a field.
REVISION is a change which is backward compatible but not forward compatible.
Example: adding a new field to a union type.
ADDITION is a change which is both backward compatible and forward compatible.
Example: adding a new optional field.
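
The MODEL . REVISION . ADDITION rules can be turned into a small compatibility check. The sketch below is not Jitney code: `SchemaVersion` and `can_read` are hypothetical names, and it assumes that a reader never handles events from a different MODEL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaVersion:
    """Hypothetical holder for a MODEL.REVISION.ADDITION version string."""
    model: int
    revision: int
    addition: int

    @classmethod
    def parse(cls, text: str) -> "SchemaVersion":
        model, revision, addition = (int(part) for part in text.split("."))
        return cls(model, revision, addition)


def can_read(reader: SchemaVersion, writer: SchemaVersion) -> bool:
    """Return True if code built against `reader` can decode a `writer` event.

    MODEL:    incompatible change, so both sides must match (assumption).
    REVISION: backward but not forward compatible, so the reader must be
              at least as new as the writer.
    ADDITION: compatible in both directions, so it is ignored.
    """
    return reader.model == writer.model and reader.revision >= writer.revision


print(can_read(SchemaVersion.parse("1.2.0"), SchemaVersion.parse("1.1.5")))
print(can_read(SchemaVersion.parse("1.1.0"), SchemaVersion.parse("1.2.0")))
```

The first call is a newer reader decoding an older revision, which the rules allow; the second is an older reader facing a newer revision, which they do not.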
Example of Thrift Event

because the event is your API


Jitney Components

Schema Repository, Topic Repository
Topic Repository

• Declare all Jitney topics


• Aggregate all characteristics of a topic:
  • name
  • ordering (partitioning function)
  • whitelist of accepted schemas

• Great for documentation purposes


• DRY
Example of a Topic
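
The original slide's example is not reproduced in this text. As a stand-in, here is a minimal sketch of what a topic declaration with the three listed characteristics could look like. Everything in it is hypothetical (the `TopicDeclaration` class, the topic name, the schema names, and the CRC32-based partitioner); it is not the Topic Repository's actual format.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class TopicDeclaration:
    """Hypothetical declaration: name, ordering, and schema whitelist."""
    name: str
    partition_key: Callable[[dict], str]   # ordering (partitioning function)
    accepted_schemas: FrozenSet[str]       # whitelist of accepted schemas

    def accepts(self, schema_name: str) -> bool:
        return schema_name in self.accepted_schemas

    def partition_for(self, event: dict, num_partitions: int) -> int:
        # Events with the same key land in the same partition, so they
        # keep their relative order.
        key = self.partition_key(event).encode("utf-8")
        return zlib.crc32(key) % num_partitions


# Hypothetical topic: order booking events per user.
bookings = TopicDeclaration(
    name="jitney_bookings",
    partition_key=lambda event: str(event["user_id"]),
    accepted_schemas=frozenset({"BookingCreated", "BookingCancelled"}),
)

print(bookings.accepts("BookingCreated"))
print(bookings.accepts("SearchPerformed"))
```

Declaring the partitioning function and the whitelist next to the name keeps producers and consumers from redefining them independently, which is the DRY point made above.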
Jitney Components

Schema Repository, Topic Repository, Clients
Jitney Clients

• Kafka clients are hard to use correctly
  • It's better with 0.9
• Committing offsets is tricky, someone will get it wrong
  • Even with 0.9
• Configuration is a mess
Jitney Clients

it provides:
• metrics reporting: github.com/airbnb/kafka-statsd-metrics2
• configuration for default clusters
• built-in support for Schema Repository and Topic Repository

Consumer:
• offset management to implement at-least once delivery
• polymorphic dispatching to event handler
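
The two consumer features can be illustrated without a real broker. The sketch below is an assumption-laden model, not the Jitney client: `InMemoryPartition` stands in for a single Kafka partition, and `DispatchingConsumer` and the event classes are hypothetical. It commits an offset only after the handler returns, which is what yields at-least-once delivery, and it picks the handler from the event's type.

```python
from dataclasses import dataclass


@dataclass
class SearchEvent:
    query: str


@dataclass
class BookingEvent:
    listing_id: int


class InMemoryPartition:
    """Stand-in for one Kafka partition with a committed offset."""
    def __init__(self, events):
        self.events = list(events)
        self.committed_offset = 0

    def fetch_from(self, offset):
        return list(enumerate(self.events))[offset:]


class DispatchingConsumer:
    def __init__(self, partition):
        self.partition = partition
        self.handlers = {}

    def on(self, event_type, handler):
        self.handlers[event_type] = handler

    def run(self):
        for offset, event in self.partition.fetch_from(self.partition.committed_offset):
            self.handlers[type(event)](event)          # polymorphic dispatch
            self.partition.committed_offset = offset + 1  # commit after success


seen = []
failures = {"remaining": 1}


def handle_search(event):
    seen.append(("search", event.query))


def handle_booking(event):
    seen.append(("booking", event.listing_id))
    if failures["remaining"] > 0:
        failures["remaining"] -= 1
        raise RuntimeError("simulated crash after the side effect, before the commit")


partition = InMemoryPartition([SearchEvent("beijing"), BookingEvent(42)])
consumer = DispatchingConsumer(partition)
consumer.on(SearchEvent, handle_search)
consumer.on(BookingEvent, handle_booking)

try:
    consumer.run()      # crashes while handling the booking event
except RuntimeError:
    pass
consumer.run()          # restarts from the last committed offset

print(seen)
```

After the restart the booking event is handled a second time while the search event is not, so handlers in an at-least-once setup need to tolerate duplicates.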
Example of a Java Producer
Example of a Java Consumer
Jitney Components

Schema Repository, Topic Repository, Clients, HTTP Proxy
Jitney Components

Schema Repository, Topic Repository, Clients, HTTP Proxy, Warehouse Integration
Data Ingestion Pipeline

• Stack: Jitney, Spark Streaming, HBase, HDFS


• Spark Streaming 1.5 with Kafka “direct” connect
• Process 1 minute batches
• Write to HBase after deserializing with the right schema
• Dump data to HDFS every hour (with dedup) and add a Hive partition
• But live data can be queried via “current” partition
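
The hourly dump with dedup can be sketched in plain Python. This is an illustration only: the `uuid` and `ts` field names and the `ds=.../hr=...` partition naming are assumptions, and the real pipeline uses Spark Streaming, HBase, and HDFS rather than in-memory dictionaries.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hour_partition(ts_epoch_seconds: int) -> str:
    """Name of the hourly partition an event belongs to (hypothetical layout)."""
    moment = datetime.fromtimestamp(ts_epoch_seconds, tz=timezone.utc)
    return moment.strftime("ds=%Y-%m-%d/hr=%H")


def dedup_by_partition(events):
    """Group events by hour and keep one copy per event id.

    With at-least-once delivery the same event can arrive twice, so the
    dump keeps only the first copy seen for each `uuid`.
    """
    partitions = defaultdict(dict)
    for event in events:
        partitions[hour_partition(event["ts"])].setdefault(event["uuid"], event)
    return {name: list(rows.values()) for name, rows in partitions.items()}


events = [
    {"uuid": "a", "ts": 0},
    {"uuid": "a", "ts": 0},      # duplicate delivery of the same event
    {"uuid": "b", "ts": 3600},
]
for name, rows in dedup_by_partition(events).items():
    print(name, rows)
```

Keying rows by event id is also why a keyed store fits as the intermediate step: writing a redelivered event again overwrites the same row instead of adding a second one.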
Data Ingestion Pipeline
end to end
Audit

[Figure: two event sequences, "4 3 2 1" and "4 2 1"]
Event Schema for Audit Metadata
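
The audit metadata schema itself is not reproduced in this text. One plausible reading of the Audit sequences "4 3 2 1" and "4 2 1" is that the audit compares what was produced with what arrived. The sketch below assumes, purely for illustration, that each producer attaches an increasing sequence number to its events; the function name is hypothetical.

```python
def missing_sequence_numbers(received):
    """Return the sequence numbers absent between the lowest and highest seen."""
    if not received:
        return []
    seen = set(received)
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


print(missing_sequence_numbers([4, 3, 2, 1]))  # []
print(missing_sequence_numbers([4, 2, 1]))     # [3]
```

A check like this only finds gaps inside the observed range; detecting a lost final event would need an extra signal such as a count reported by the producer.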
How is Jitney used in the org?

Use cases currently powered
• User activity ingestion
• Cache invalidation
• DB change ingestion
• Experimentation
• Payment processing via pub/sub
Key takeaways

1. Standardization!
2. Auditing Pipeline
3. Huge Advantage for the organization
Join Airbnb

Learn More
https://www.airbnb.com/careers/locations/beijing-china
