Tuning Kafka Pipelines
Kafka is a high-throughput, fault-tolerant, scalable platform for building high-volume, near-real-time data pipelines. This presentation is about tuning Kafka pipelines for high performance.
The presenter will begin by introducing the basic concepts of Kafka, including producers, consumers, and brokers. Kafka pipelines will be discussed next, along with the role of MirrorMaker in creating global data pipelines that span multiple datacenters. The presenter will then discuss select configuration parameters and deployment topologies essential to achieving high throughput and low latency across the pipeline.
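As an illustration of the kind of producer-side parameters typically tuned for throughput and latency, the sketch below shows a few well-known Kafka producer settings. The values are illustrative assumptions only, not the presenter's actual configuration:

```properties
# Batch more records per request to raise throughput
# (larger batches amortize per-request overhead).
batch.size=65536

# Wait up to 10 ms to fill a batch; trades a little
# latency for better batching.
linger.ms=10

# Compress batches to reduce network and disk I/O.
compression.type=lz4

# acks=1 waits only for the partition leader; lower
# latency than acks=all at the cost of durability.
acks=1
```

The right values depend on message size, partition count, and durability requirements, so settings like these are normally validated against a representative workload.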
In the second part, the presenter will discuss anecdotes based on real-world experience running Kafka at LinkedIn. Specifically, lessons learned in troubleshooting and optimizing a truly global data pipeline that replicates 100 GB of data in under 25 minutes will be discussed.
No prior knowledge of Kafka is necessary; however, familiarity with publish-subscribe messaging, big data technologies, and sharding/partitioning will come in handy.