Real-Time Sensor Data at Scale: MQTT, Kafka, and Modern OT/IT Integration
From connected cars to smart grids, real-time sensor data requires robust streaming infrastructure. We examine how Kafka and MQTT work together for industrial IoT.
From seismic sensors scattered across remote Australia to connected cars streaming telematics, real-time sensor data at scale requires robust streaming infrastructure. The answer isn't a single technology but a carefully designed combination: MQTT for edge communication, Apache Kafka for enterprise integration, and specialized processing for domain-specific analytics.
MQTT and Kafka: Complementary, Not Competing
MQTT was designed for constrained devices and unreliable networks. It's lightweight (a client can run on an 8-bit microcontroller), offers three QoS levels (at most once, at least once, and exactly once) so delivery guarantees can be matched to each data stream, and handles the reality of sensors that go offline and reconnect. Kafka, conversely, was built for enterprise data streaming: high throughput, durability, exactly-once semantics, and integration with the broader data platform.
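To make the QoS point concrete, here is a toy model of the sending side of QoS 1 (at-least-once) delivery. It is a sketch of the bookkeeping only, not a client implementation: the class and method names are hypothetical, and a real device would rely on an MQTT client library. The idea is that the sender keeps each message until the broker acknowledges it with a PUBACK, and re-sends whatever is still unacknowledged after a reconnect.

```python
from dataclasses import dataclass, field


@dataclass
class Qos1Sender:
    """Toy model of MQTT QoS 1 bookkeeping on the publishing side (hypothetical class)."""

    next_packet_id: int = 1
    in_flight: dict[int, bytes] = field(default_factory=dict)

    def publish(self, payload: bytes) -> int:
        """Record the message as in flight and return its packet id."""
        packet_id = self.next_packet_id
        self.next_packet_id += 1
        self.in_flight[packet_id] = payload
        return packet_id

    def on_puback(self, packet_id: int) -> None:
        """The broker acknowledged the message, so it can be forgotten."""
        self.in_flight.pop(packet_id, None)

    def to_resend(self) -> list[tuple[int, bytes]]:
        """Messages to re-send after a reconnect, in original order."""
        return sorted(self.in_flight.items())


sender = Qos1Sender()
first = sender.publish(b'{"temp_c": 21.4}')
second = sender.publish(b'{"temp_c": 21.6}')
sender.on_puback(first)    # the link drops before the second PUBACK arrives
print(sender.to_resend())  # only the unacknowledged message remains
```

Because a re-sent message may already have reached the broker, at-least-once delivery can produce duplicates, so downstream consumers should tolerate them.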
Since Kafka was not built for IoT communication at the edge, the combination of Apache Kafka and MQTT is a match made in heaven for building scalable, reliable, and secure IoT infrastructures.
Each technology's strengths cover the other's weaknesses. MQTT handles the last mile to devices; Kafka handles everything after. Running Kafka clients directly on constrained devices is impractical: they are resource-intensive and assume reliable connectivity that IoT environments can't guarantee.
The Integration Architecture
Typical IoT Streaming Stack
- Edge Layer: Sensors publish to an MQTT broker (EMQX, Mosquitto, HiveMQ)
- Bridge Layer: An MQTT-Kafka connector subscribes and produces to Kafka topics
- Processing Layer: Kafka Streams or Apache Flink for real-time analytics
- Storage Layer: Time-series database (TimescaleDB, InfluxDB) for hot data
- Analytics Layer: Data warehouse (BigQuery, Snowflake) for historical analysis
The MQTT broker handles the complexity of device connections—authentication, session state, last-will messages for detecting disconnected sensors. The Kafka connector creates a clean interface between edge chaos and enterprise order.
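The bridge step can be pictured as a pure translation function. The sketch below is illustrative only: the MQTT topic layout (`devices/<device_id>/telemetry` and `devices/<device_id>/status`), the Kafka topic names, and the `KafkaRecord` type are all assumptions made for this example, and no broker or Kafka client is involved. It shows how a telemetry reading and a last-will "offline" notice might each become a keyed Kafka record.

```python
import json
from typing import NamedTuple


class KafkaRecord(NamedTuple):
    topic: str
    key: bytes
    value: bytes


def bridge(mqtt_topic: str, payload: bytes) -> KafkaRecord:
    """Translate one MQTT message into a Kafka record (hypothetical layout)."""
    parts = mqtt_topic.split("/")
    if len(parts) != 3 or parts[0] != "devices" or parts[2] not in ("telemetry", "status"):
        raise ValueError(f"unexpected MQTT topic: {mqtt_topic!r}")
    device_id, kind = parts[1], parts[2]
    if kind == "telemetry":
        body = {"device_id": device_id, "reading": json.loads(payload)}
        kafka_topic = "sensor_data"
    else:
        # The broker publishes a device's last-will payload when its
        # connection drops without a clean disconnect.
        body = {"device_id": device_id, "status": payload.decode("utf-8")}
        kafka_topic = "device_status"
    # Keying by device id keeps each device's records together downstream.
    return KafkaRecord(kafka_topic, device_id.encode("utf-8"), json.dumps(body).encode("utf-8"))


print(bridge("devices/station-042/telemetry", b'{"pga": 0.013}'))
print(bridge("devices/station-042/status", b"offline"))
```

In practice a connector or the broker's built-in integration does this work; the point is that everything device-specific is resolved before a record reaches Kafka.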
Real-World Implementations
The pattern is proven across industries:
Production Deployments
- Audi — Connected car infrastructure for real-time ingestion and analysis
- Deutsche Bahn — Real-time train information systems across Germany
- E.ON — IoT cloud platform for smart homes and energy grids
- Bosch Power Tools — Real-time alerting dashboards for industrial equipment
- Severstal — Edge analytics for predictive maintenance in steel production
Quarterhill's intelligent traffic system for tolling shows what this enables. Dynamic pricing, which adjusts toll rates based on real-time congestion, is only possible with data streaming that maintains sub-second latency from sensor to decision.
Modern OT/IT Integration
Industrial IoT is undergoing an architectural shift. The traditional OT (Operational Technology) middleware—vendor-locked, polling-based systems—is giving way to event-driven architectures built on Kafka, MQTT, and OPC-UA.
This isn't just a technology upgrade; it's an integration pattern. Kafka serves as the central event backbone, MQTT enables lightweight device communication, and OPC-UA ensures secure industrial data exchange. Together, they allow organizations to scale dynamically without vendor lock-in.
Edge-Cloud Synchronization
Kafka's advantage in IoT isn't just throughput; it's resilience. The platform handles network partitions gracefully. If connectivity between edge and cloud is interrupted, records already written to Kafka stay persisted on disk, so they aren't lost and can be delivered once the connection is reestablished.
For our seismic sensor networks, where remote stations may lose satellite connectivity during storms, this durability is essential. Sensors buffer locally, the edge gateway buffers to disk, and when connectivity returns, everything flows through to the central platform without data loss.
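A minimal sketch of the gateway's buffer-to-disk step, assuming a SQLite file as the local store. The `DiskBuffer` class and the `send` callback are hypothetical names for illustration, not part of any system described here. The key design choice is that a row is deleted only after `send` returns without error, so an outage leaves the backlog intact.

```python
import sqlite3


class DiskBuffer:
    """Store-and-forward queue backed by SQLite (hypothetical helper)."""

    def __init__(self, path: str) -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, payload BLOB NOT NULL)"
        )
        self.db.commit()

    def append(self, payload: bytes) -> None:
        """Persist a reading before any attempt to transmit it."""
        self.db.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))
        self.db.commit()

    def flush(self, send) -> int:
        """Send buffered rows oldest first; delete each row only after send() succeeds."""
        sent = 0
        rows = self.db.execute("SELECT id, payload FROM outbox ORDER BY id").fetchall()
        for row_id, payload in rows:
            try:
                send(payload)
            except ConnectionError:
                break  # link is still down; keep this row and everything after it
            self.db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            self.db.commit()
            sent += 1
        return sent

    def pending(self) -> int:
        return self.db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]


def link_down(_payload: bytes) -> None:
    raise ConnectionError("satellite link down")


buffer = DiskBuffer(":memory:")  # a real gateway would pass a file path
buffer.append(b"reading-1")
buffer.append(b"reading-2")
delivered = []
print(buffer.flush(link_down), buffer.pending())         # 0 2
print(buffer.flush(delivered.append), buffer.pending())  # 2 0
```

If the gateway crashes after a send but before the delete, that reading is sent again on restart, so this gives at-least-once rather than exactly-once delivery.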
EMQX: The Scalable MQTT Broker
    # EMQX Kafka integration configuration
    bridges:
      kafka:
        servers: "kafka:9092"
        topic: sensor_data
        message_key: "${clientid}"
        value_encoder: json
        ssl:
          enable: true
          cacertfile: /etc/emqx/certs/ca.crt

EMQX has emerged as the enterprise MQTT broker of choice, offering native Kafka integration without custom connectors. Its clustering capability handles millions of concurrent connections, making it suitable for large-scale IoT deployments.
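The `message_key: "${clientid}"` setting in the configuration matters because Kafka assigns keyed records to partitions by hashing the key, and ordering is only preserved within a partition. The sketch below illustrates that property under one loud assumption: it uses CRC32 as a stand-in hash, whereas Kafka's default partitioner uses murmur2, so the actual partition numbers will differ. The function name and the partition count are hypothetical.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Pick a partition from the record key.

    Stand-in hash (CRC32), not Kafka's murmur2; the property illustrated
    is the same: equal keys always map to the same partition.
    """
    return zlib.crc32(key) % num_partitions


NUM_PARTITIONS = 12  # assumed partition count for the sensor_data topic

# Every record keyed by the same client id lands on one partition,
# so that device's readings are consumed in the order they were produced.
readings = [(b"station-042", 1), (b"station-007", 1), (b"station-042", 2)]
for client_id, seq in readings:
    print(client_id, seq, partition_for(client_id, NUM_PARTITIONS))
```

One consequence worth noting: the mapping depends on the partition count, so adding partitions later changes which partition a given device's new records go to.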
Challenges and Limitations
Kafka has limitations for IoT-specific patterns. Managing a large number of topics (common when each device has multiple data streams) creates overhead. The recommendation is to design topic hierarchies carefully—perhaps one topic per sensor type rather than per device.
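To see how quickly the two designs diverge, the sketch below counts topics for a hypothetical fleet. The device ids, sensor type names, and topic naming scheme are invented for illustration.

```python
def per_device_topics(device_ids, sensor_types):
    """One topic per device stream: the topic count grows with the fleet."""
    return {f"{device}.{sensor}" for device in device_ids for sensor in sensor_types}


def per_type_topics(sensor_types):
    """One topic per sensor type: the count stays fixed and the device id moves into the record key."""
    return {f"sensor_data.{sensor}" for sensor in sensor_types}


devices = [f"station-{n:04d}" for n in range(5000)]
sensor_types = ["seismic", "temperature", "battery"]

print(len(per_device_topics(devices, sensor_types)))  # 15000
print(len(per_type_topics(sensor_types)))             # 3
```

With the per-type layout, adding devices adds keys rather than topics, and consumers that need a single device's data filter on the key.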
The other challenge is cost. Kafka clusters aren't cheap to operate, especially in managed cloud offerings. For smaller deployments, evaluating whether the full Kafka ecosystem is necessary versus simpler alternatives (Redis Streams, NATS) is worthwhile.
Our Perspective
Having built sensor data pipelines for Geoscience Australia's earthquake monitoring network, we now treat the MQTT-Kafka pattern as our default architecture for IoT projects. The separation of concerns is clean: MQTT handles the messiness of device communication; Kafka provides the enterprise integration layer.
But we'd caution against over-architecting. For projects with fewer than a hundred sensors and straightforward analytics requirements, simpler stacks (MQTT direct to TimescaleDB, for example) may be more appropriate. Kafka shines at scale; at smaller scales, it's overhead.
References & Further Reading
Kafka for IoT: Key Capabilities and Top Use Cases in 2025
Instaclustr guide to Kafka for IoT applications
https://www.instaclustr.com/education/apache-kafka/kafka-for-iot-4-key-capabilities-and-top-use-cases-in-2025/
IoT and Event Streaming at Scale with Kafka & MQTT
Confluent's architecture guide for IoT streaming
https://www.confluent.io/blog/iot-with-kafka-connect-mqtt-and-rest-proxy/
MQTT to Kafka: Benefits, Use Cases & Quick Guide
EMQX integration guide for MQTT-Kafka bridging
https://www.emqx.com/en/blog/mqtt-and-kafka
IoT (MQTT) and Data Streaming for Tolling Traffic System
Real-world case study of Kafka for intelligent transportation
https://www.kai-waehner.de/blog/2024/11/01/iot-and-data-streaming-with-kafka-for-a-tolling-traffic-system-with-dynamic-pricing/
Modernizing OT Middleware: The Shift to Open Industrial IoT Architectures
Kai Waehner's analysis of industrial IoT evolution
https://www.kai-waehner.de/blog/2025/03/17/modernizing-ot-middleware-the-shift-to-open-industrial-iot-architectures-with-data-streaming/