My Learnings - Kafka as a Platform

The following are my learnings for the day:

Link : https://www.youtube.com/watch?v=y4B3rLbXIAY&t=2s

Learnt about Avro format
Implemented Sliding Timewindows in Apache Beam
CAP - The tradeoff between consistency and availability (when the network partitions) is widely acknowledged in the industry
Cassandra - Use it when you need to work with a large database whose load cannot be handled merely by adding read replicas
Cassandra uses a hashing mechanism to distribute load across the cluster
Cassandra replicates each partition across several nodes; as far as I can tell, it has no leader-follower roles for a partition the way Kafka does
I have still not understood the internals of Cassandra - Is it a pure peer-to-peer database?
Kafka
- Messaging
- Partitioned Messages
- Leader-Follower architecture
- Kafka Connect API
- Kafka SQL API called KSQL that can be used to query Kafka
- Replace the old way of doing ETL with streams at the center
- NY Times stores all of its published content, going back to 1851, in Kafka
I think getting a connection to ERT is a wonderful way to explore real-time streaming analytics
- Should make the most of it
Some of the projects that I can think of
- Pump ERT data into Kafka and then use KSQL to fire SQL queries
- Pump ERT to pubsub and then move the data to GCP
Tim Berglund: Kafka as a Platform: the Ecosystem from the Ground Up
- Kafka is a log of events
- In Kafka - Topic is a persistent log of events - Events are key value pairs
- Logs have strict ordering
- Constant time reads and writes in Kafka
- What separates Kafka from other messaging systems is the idea of partitions
- Split the topic into pieces (partitions)
- Run the key through a hashing algorithm; based on the output, the event goes into a specific partition
- Key selection becomes a data modeling concern
- Ordering is within a partition only
- Replication - Key part of Kafka
- Producers
- Consumers
- Reads and writes go to the leader replica of a partition
- Schema registry
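
The partitioning notes above (hash the key, pick a partition, ordering only within a partition) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `partition_for`, `append_events`, and `NUM_PARTITIONS` are hypothetical names, and MD5 stands in for the hash (Kafka's default partitioner uses murmur2 on the key bytes).

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for this sketch


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Hash the key and take the remainder to choose a partition."""
    # MD5 is a stand-in hash here, not what Kafka itself uses.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def append_events(events, num_partitions: int = NUM_PARTITIONS):
    """Append (key, value) events to per-partition logs in arrival order."""
    logs = defaultdict(list)
    for key, value in events:
        logs[partition_for(key, num_partitions)].append((key, value))
    return logs


events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "click"),
    ("user-1", "logout"),
]
logs = append_events(events)

# Every event for "user-1" sits in one partition, in the order it was produced.
user1_partition = partition_for("user-1")
user1_values = [v for k, v in logs[user1_partition] if k == "user-1"]
```

Because the same key always hashes to the same partition, events for one key keep their relative order, while events for different keys may land in different partitions with no ordering between them. That is why key selection ends up being a data modeling decision.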
JobTracker and TaskTracker in the context of the Hadoop ecosystem
NameNode and DataNode
You should spend at least three times as much time on output as you do on input
Consistent Hashing - That’s how Cassandra works and makes it a distributed database
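
To make the consistent hashing idea concrete for myself, here is a minimal sketch of a hash ring in plain Python. It is not Cassandra's implementation: `HashRing`, `node_for`, and the `vnodes` parameter are hypothetical names, and MD5 stands in for the partitioner's hash. The assumption is simply that each node owns several tokens on a ring and a key belongs to the node owning the next token clockwise.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a position on the ring (MD5 is a stand-in hash)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring: each node owns several tokens on the ring."""

    def __init__(self, nodes, vnodes: int = 8):
        # One (token, node) entry per virtual node, sorted by token.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._tokens = [token for token, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Return the node owning the first token after the key's hash."""
        idx = bisect.bisect_right(self._tokens, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"key-{i}" for i in range(200)]
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])

# Keys whose owner changed after adding node-d.
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
```

The point of the sketch is the property at the end: when a node joins, the only keys that change owner are the ones that move to the new node. The rest stay where they were, which is what lets a cluster grow without reshuffling all of its data.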
