ADR-0008: Kafka Event Bus Integration¶
Status¶
Accepted
Date¶
2025-12-08
Context¶
The eventsource library already supports two distributed event bus implementations:
- RedisEventBus: Using Redis Streams for event distribution with consumer groups
- RabbitMQEventBus: Using AMQP 0-9-1 for message-based event distribution
Users have requested Apache Kafka support for scenarios requiring:
- Higher throughput (10,000+ events/second)
- Long-term event retention (weeks/months vs. Redis's typical hours/days)
- Integration with existing Kafka infrastructure in enterprise environments
- Cross-datacenter replication using Kafka's built-in mirroring
- Ecosystem integration with Kafka Connect, Kafka Streams, and KSQL
The challenge is implementing Kafka support while maintaining consistency with the existing EventBus interface and established patterns from Redis and RabbitMQ implementations.
Decision¶
We will implement a KafkaEventBus class that:
- Implements the EventBus interface exactly as RedisEventBus and RabbitMQEventBus do
- Uses aiokafka as the async Kafka client library
- Follows established patterns from existing implementations for configuration, statistics, error handling, and lifecycle management
- Provides Kafka-specific features including partition-based ordering, consumer groups, and DLQ with replay capability
Key Design Decisions¶
1. Topic Naming Convention¶
Topics follow the pattern {topic_prefix}.stream for the main topic and {topic_prefix}.stream.dlq for the dead letter queue. This mirrors the Redis stream naming convention.
2. Partitioning Strategy¶
Events are partitioned by aggregate_id to ensure ordering within an aggregate while allowing parallel processing across aggregates. This is a critical design choice that maintains event sourcing consistency guarantees.
3. Consumer Group Management¶
Consumer groups are managed via Kafka's built-in consumer group protocol, similar to Redis consumer groups but with Kafka-native partition assignment and rebalancing.
4. Dead Letter Queue¶
Failed messages (after configurable retries) are sent to a separate DLQ topic with metadata headers including: - Original topic - Error message - Retry count - Timestamp
A replay_dlq_message method allows republishing DLQ messages to the main topic.
5. Security¶
Full support for Kafka security protocols: - PLAINTEXT (development) - SSL (TLS encryption) - SASL_PLAINTEXT (authentication without encryption) - SASL_SSL (authentication with encryption) - mTLS (mutual TLS)
6. OpenTelemetry Integration¶
Tracing spans are created for publish and consume operations, with trace context propagated through Kafka message headers.
Consequences¶
Positive¶
- Enterprise adoption: Organizations with existing Kafka infrastructure can now use eventsource-py
- High throughput: Kafka's architecture enables significantly higher event throughput than Redis or RabbitMQ
- Long-term retention: Events can be retained for weeks/months for replay and audit purposes
- Consistent API: Users can switch between Redis, RabbitMQ, and Kafka with minimal code changes
- Ecosystem integration: Access to Kafka Connect, Kafka Streams, KSQL, and other Kafka tools
Negative¶
- Operational complexity: Kafka requires more operational expertise than Redis
- Topic management: Topics should be pre-created in production (unlike Redis streams)
- Latency: Kafka optimizes for throughput, not latency; sub-millisecond requirements may be better served by Redis
- Dependency size: aiokafka adds approximately 500KB to the package
Neutral¶
- Optional dependency: Kafka support is optional via
pip install eventsource-py[kafka] - Learning curve: Users familiar with Redis/RabbitMQ patterns will need to learn Kafka-specific concepts (partitions, offsets, rebalancing)
Alternatives Considered¶
1. confluent-kafka-python¶
Pros: - Official Confluent Python client - Full feature parity with librdkafka - Schema Registry integration
Cons: - Synchronous API requiring complex async wrappers - Larger dependency footprint - More complex installation (requires librdkafka)
Decision: Rejected in favor of aiokafka for its native async support and simpler installation.
2. kafka-python¶
Pros: - Pure Python implementation - No native dependencies
Cons: - Synchronous only - Less actively maintained - Missing some newer Kafka features
Decision: Rejected due to lack of async support.
3. Multiple Topic Pattern¶
Alternative: Create separate topics per event type instead of a single stream topic.
Pros: - More granular consumer subscription - Better alignment with Kafka conventions
Cons: - Topic proliferation - Deviates from Redis/RabbitMQ patterns - Complicates subscriber management
Decision: Rejected to maintain consistency with existing implementations.
Implementation Notes¶
Test Coverage¶
The implementation includes: - 51 unit tests covering all configuration options, subscription patterns, and error handling - 31 integration tests using testcontainers-python with a real Kafka broker
Configuration Dataclass¶
@dataclass
class KafkaEventBusConfig:
bootstrap_servers: str = "localhost:9092"
topic_prefix: str = "events"
consumer_group: str = "default"
consumer_name: str | None = None
acks: str = "all"
compression_type: str | None = "gzip"
# ... additional fields
Statistics Dataclass¶
@dataclass
class KafkaEventBusStats:
events_published: int = 0
events_consumed: int = 0
events_processed_success: int = 0
events_processed_failed: int = 0
messages_sent_to_dlq: int = 0
handler_errors: int = 0
reconnections: int = 0
rebalance_count: int = 0