🧠 Architecture Design Choices
This page provides a behind-the-scenes look at the technical decisions made while building Speech Coach — what technologies were chosen and why.
💡 Why FastAPI + Redis Streams + Worker?
I previously worked with Celery, but found it heavy and configuration-intensive for lean prototypes. Instead, I chose FastStream with Redis Streams, which felt like a better fit for a lightweight, real-time system.
Redis Streams keep messages after delivery (unlike pub/sub), much as Kafka does, but with far lower operational overhead. That makes them a good fit for business-layer task messaging.
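To make the history point concrete, here is a toy in-memory model (not Redis itself) that contrasts fire-and-forget pub/sub with an append-only stream. The class names, method names, and integer entry IDs are invented for illustration; real Redis Stream IDs look like `1700000000000-0`.

```python
from collections import defaultdict
from typing import Callable


class ToyPubSub:
    """Fire-and-forget: a message reaches only subscribers connected right now."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[dict], None]) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: dict) -> int:
        # Nothing is stored; with no subscribers the message is simply lost.
        for callback in self._subscribers[channel]:
            callback(message)
        return len(self._subscribers[channel])


class ToyStream:
    """Append-only log: entries stay readable after they are written."""

    def __init__(self) -> None:
        self._entries = []

    def add(self, message: dict) -> int:
        entry_id = len(self._entries) + 1
        self._entries.append((entry_id, message))
        return entry_id

    def read_after(self, last_seen_id: int = 0) -> list:
        # A consumer that starts late can still replay everything it missed.
        return [(i, m) for i, m in self._entries if i > last_seen_id]


pubsub, stream = ToyPubSub(), ToyStream()
task = {"user_id": "42", "audio_key": "demo.ogg"}

delivered = pubsub.publish("audio_tasks", task)  # no worker is listening yet
stream.add(task)

late_worker_inbox = []
pubsub.subscribe("audio_tasks", late_worker_inbox.append)  # too late for pub/sub
replayed = stream.read_after(0)  # the stream still holds the task
```

A worker that starts after the task was published sees nothing over pub/sub, but can still read the task from the stream.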
I considered using Redpanda for business messaging too, but decided to separate responsibilities:
- Redis Streams handle core task distribution
- Redpanda handles analytics pipelines
Redis will also be used for caching, for instance mapping telegram_id → user_id, so that all internal logic uses a unified user ID. This abstraction lets me swap Telegram for WhatsApp, Discord, etc. in the future.
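A minimal sketch of that lookup, assuming a cache object that exposes `get` and `set(..., ex=...)` the way a redis-py client does. The key layout, the `load_from_db` callback, and the TTL are hypothetical choices for illustration, not the project's actual code.

```python
from typing import Callable, Optional, Protocol


class KeyValueCache(Protocol):
    """Subset of a Redis client used here (redis-py's get/set have this shape)."""

    def get(self, name: str): ...
    def set(self, name: str, value: str, ex: Optional[int] = None): ...


def resolve_user_id(
    cache: KeyValueCache,
    platform: str,
    external_id: str,
    load_from_db: Callable[[str, str], str],
    ttl_seconds: int = 3600,
) -> str:
    """Map a platform-specific ID (e.g. a Telegram ID) to the internal user ID."""
    key = f"user_id:{platform}:{external_id}"  # hypothetical key layout
    cached = cache.get(key)
    if cached is not None:
        # redis-py returns bytes unless decode_responses=True is set.
        return cached.decode() if isinstance(cached, bytes) else cached
    # Cache miss: fall back to the source of truth, then remember the answer.
    user_id = load_from_db(platform, external_id)
    cache.set(key, user_id, ex=ttl_seconds)
    return user_id
```

Because the platform name is part of the key, adding WhatsApp or Discord later only means passing a different `platform` value; everything downstream keeps working with the internal ID.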
🔁 Why Separate FastAPI and Worker?
The architecture is split for performance and scalability:
- FastAPI handles routing and distributes tasks — think of it as the "message switchboard"
- Workers process audio and generate feedback
This keeps the API responsive and allows independent scaling of workers — even on separate machines. It also reduces coupling and simplifies deployment.
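The split can be sketched with raw Redis Streams commands. This is an illustration of what happens underneath, not the project's FastStream code: it assumes a redis-py-style client (with the default RESP2 reply shape) passed in as `client`, and the stream name, group name, and payload layout are hypothetical.

```python
import json
from typing import Any, Callable

STREAM = "audio_tasks"      # hypothetical stream name
GROUP = "feedback_workers"  # hypothetical consumer group


def enqueue_audio_task(client: Any, user_id: str, audio_key: str) -> Any:
    """API side: append a task to the stream and return right away."""
    payload = json.dumps({"user_id": user_id, "audio_key": audio_key})
    return client.xadd(STREAM, {"payload": payload})


def _text(value: Any) -> str:
    # redis-py returns bytes unless decode_responses=True is set.
    return value.decode() if isinstance(value, bytes) else value


def process_pending(client: Any, consumer: str, handle: Callable[[dict], None]) -> int:
    """Worker side: read new entries for this consumer, handle them, acknowledge."""
    batches = client.xreadgroup(GROUP, consumer, {STREAM: ">"}, count=10)
    done = 0
    for _stream_name, entries in batches or []:
        for entry_id, fields in entries:
            decoded = {_text(k): _text(v) for k, v in fields.items()}
            handle(json.loads(decoded["payload"]))
            # Acknowledge only after the handler returned without raising.
            client.xack(STREAM, GROUP, entry_id)
            done += 1
    return done
```

With a real server the consumer group has to exist first, for example via `client.xgroup_create(STREAM, GROUP, id="0", mkstream=True)`. Several workers that use the same group but different consumer names then share the stream's entries, which is what allows workers to scale independently of the API.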
📤 Why Redis Streams Instead of Kafka?
Kafka is powerful, but overkill for this project. Redpanda is a great Kafka alternative (faster, simpler, C++-based), but even that felt like too much for core messaging.
Redis is already in the stack for caching, and its Streams API is a natural fit for low-latency queuing.
📦 Why MinIO for Audio?
MinIO is a self-hosted, S3-compatible object store: fast, cheap, and local. It avoids Internet upload delays during audio ingestion, and files can be replicated to S3 later as a cold backup.
This hybrid setup gives both performance and portability, since the same S3-style client code works against either backend, while keeping full control over the data.
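A rough sketch of the upload path, assuming a boto3-style S3 client is passed in as `s3_client`. The bucket name, the key layout, and the content-hash naming are assumptions made for illustration.

```python
import hashlib
from typing import Any

AUDIO_BUCKET = "speech-audio"  # hypothetical bucket name


def audio_object_key(user_id: str, audio_bytes: bytes, extension: str = "ogg") -> str:
    """Build a content-derived key so identical uploads map to the same object."""
    digest = hashlib.sha256(audio_bytes).hexdigest()[:16]
    return f"users/{user_id}/{digest}.{extension}"


def store_audio(s3_client: Any, user_id: str, audio_bytes: bytes) -> str:
    """Upload to any S3-compatible endpoint and return the object key."""
    key = audio_object_key(user_id, audio_bytes)
    # put_object(Bucket=..., Key=..., Body=...) is the boto3 S3 client call.
    s3_client.put_object(Bucket=AUDIO_BUCKET, Key=key, Body=audio_bytes)
    return key
```

With boto3, pointing the client at MinIO is a matter of passing `endpoint_url` (for example `boto3.client("s3", endpoint_url="http://localhost:9000", ...)`, where the address is an assumed local default); dropping that argument targets AWS S3 instead, and `store_audio` itself does not change.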
📊 Why Bytewax for Analytics?
Bytewax was chosen for two reasons:
- It's built in Python, just like the rest of the backend
- Analysts can prototype in notebooks, then deploy to production with minimal changes
Other options (like Flink) are JVM-first, built around Java/Scala, and that extra complexity wasn't justified for a Python-native stack.