Infrastructure · 15 November 2024 · 12 min read

Real-Time Sensor Data at Scale: MQTT, Kafka, and Modern OT/IT Integration

From connected cars to smart grids, real-time sensor data requires robust streaming infrastructure. We examine how Kafka and MQTT work together for industrial IoT.

Tags: Kafka, MQTT, IoT, Streaming, Sensors
Image: Industrial sensors and monitoring equipment (ThisisEngineering RAEng on Unsplash)

From seismic sensors scattered across remote Australia to connected cars streaming telematics, real-time sensor data at scale requires robust streaming infrastructure. The answer isn't a single technology but a carefully designed combination: MQTT for edge communication, Apache Kafka for enterprise integration, and specialized processing for domain-specific analytics.

MQTT and Kafka: Complementary, Not Competing

MQTT was designed for constrained devices and unreliable networks. It is lightweight (a client can run on an 8-bit microcontroller), offers three QoS levels (at most once, at least once, exactly once) so delivery guarantees can be matched to each kind of message, and handles the reality of sensors that go offline and reconnect. Kafka, by contrast, was built for enterprise data streaming: high throughput, durability, exactly-once semantics, and integration with the broader data platform.

"Since Kafka was not built for IoT communication at the edge, the combination of Apache Kafka and MQTT together are a match made in heaven for building scalable, reliable, and secure IoT infrastructures."
Confluent IoT Architecture Guide

Each tool's strengths cover the other's weaknesses. MQTT handles the last mile to devices; Kafka handles everything after. Running Kafka clients directly on constrained devices is impractical: they are resource-intensive and assume reliable connectivity that IoT environments can't guarantee.

The Integration Architecture

Typical IoT Streaming Stack

  • Edge Layer: Sensors publish to an MQTT broker (EMQX, Mosquitto, HiveMQ)
  • Bridge Layer: An MQTT-Kafka connector subscribes and produces to Kafka topics
  • Processing Layer: Kafka Streams or Apache Flink for real-time analytics
  • Storage Layer: Time-series database (TimescaleDB, InfluxDB) for hot data
  • Analytics Layer: Data warehouse (BigQuery, Snowflake) for historical analysis

The MQTT broker handles the complexity of device connections—authentication, session state, last-will messages for detecting disconnected sensors. The Kafka connector creates a clean interface between edge chaos and enterprise order.
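To make the bridge step concrete, here is a minimal sketch of the mapping a connector performs for each message. It is not the code of any particular connector. The MQTT topic scheme `sensors/<sensor_type>/<device_id>`, the `sensor_data.<sensor_type>` Kafka topic naming, and the `KafkaRecord` type are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class KafkaRecord:
    """Hypothetical container for what a bridge would hand to a Kafka producer."""
    topic: str
    key: bytes
    value: bytes


def mqtt_to_kafka(mqtt_topic: str, payload: bytes) -> KafkaRecord:
    """Map one MQTT message to one Kafka record.

    Assumes the illustrative topic scheme sensors/<sensor_type>/<device_id>
    and a JSON payload.
    """
    parts = mqtt_topic.split("/")
    if len(parts) != 3 or parts[0] != "sensors" or not all(parts):
        raise ValueError(f"unexpected MQTT topic: {mqtt_topic!r}")
    _, sensor_type, device_id = parts

    # Wrap the reading so downstream consumers still see where it came from.
    envelope = {
        "device_id": device_id,
        "sensor_type": sensor_type,
        "mqtt_topic": mqtt_topic,
        "reading": json.loads(payload),
    }
    return KafkaRecord(
        topic=f"sensor_data.{sensor_type}",   # one Kafka topic per sensor type
        key=device_id.encode("utf-8"),        # same device -> same partition
        value=json.dumps(envelope, sort_keys=True).encode("utf-8"),
    )


record = mqtt_to_kafka("sensors/seismic/station-042", b'{"pga": 0.013}')
print(record.topic, record.key)
```

Keying by device ID means all readings from one device land in the same partition, so their relative order is preserved for downstream processing.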

Real-World Implementations

The pattern is proven across industries:

Production Deployments

  • Audi — Connected car infrastructure for real-time ingestion and analysis
  • Deutsche Bahn — Real-time train information systems across Germany
  • E.ON — IoT cloud platform for smart homes and energy grids
  • Bosch Power Tools — Real-time alerting dashboards for industrial equipment
  • Severstal — Edge analytics for predictive maintenance in steel production

Quarterhill's intelligent traffic system for tolling shows what this enables. Dynamic pricing, which adjusts toll rates based on real-time congestion, depends on data streaming that keeps latency from sensor to decision under a second.
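As a rough illustration of the kind of computation behind congestion-based pricing, the sketch below averages speed readings in tumbling windows and feeds the result into a pricing rule. The event shape, the window size, and the linear `toll_rate` formula are hypothetical and do not describe Quarterhill's system; in production this logic would more likely be a windowed aggregation in Kafka Streams or Flink.

```python
from collections import defaultdict
from statistics import mean


def tumbling_window_avg(events, window_s):
    """Average speed per (sensor_id, window_start).

    events: iterable of (timestamp_s, sensor_id, speed_kmh) tuples.
    """
    buckets = defaultdict(list)
    for ts, sensor_id, speed in events:
        # Align each event to the start of its fixed-size window.
        window_start = int(ts // window_s) * window_s
        buckets[(sensor_id, window_start)].append(speed)
    return {key: mean(speeds) for key, speeds in buckets.items()}


def toll_rate(avg_speed_kmh, base=2.00, free_flow_kmh=100.0, max_multiplier=3.0):
    """Hypothetical rule: the rate rises linearly as traffic slows below free flow."""
    congestion = min(max(1.0 - avg_speed_kmh / free_flow_kmh, 0.0), 1.0)
    return round(base * (1.0 + (max_multiplier - 1.0) * congestion), 2)


events = [(0, "gantry-1", 100), (5, "gantry-1", 80), (12, "gantry-1", 40)]
for (sensor_id, start), avg in sorted(tumbling_window_avg(events, 10).items()):
    print(sensor_id, start, avg, toll_rate(avg))
```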

Modern OT/IT Integration

Industrial IoT is undergoing an architectural shift. Traditional OT (Operational Technology) middleware, typically vendor-locked and polling-based, is giving way to event-driven architectures built on Kafka, MQTT, and OPC UA.

This isn't just a technology upgrade; it's an integration pattern. Kafka serves as the central event backbone, MQTT enables lightweight device communication, and OPC UA provides secure, standardized industrial data exchange. Together, they let organizations scale without vendor lock-in.

Edge-Cloud Synchronization

Kafka's advantage in IoT isn't just throughput; it's resilience. Kafka persists every record to a replicated on-disk log and tracks consumer progress by offset, so an interrupted link between edge and cloud does not lose data. Once the connection is reestablished, replication or consumption resumes from the last committed offset, provided the outage is shorter than the topic's retention period.

For our seismic sensor networks, where remote stations may lose satellite connectivity during storms, this durability is essential. Sensors buffer locally, the edge gateway buffers to disk, and when connectivity returns, everything flows through to the central platform without data loss.
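The gateway-side "buffer to disk, forward on reconnect" behaviour can be sketched as a small store-and-forward queue. This is an illustrative sketch rather than our production gateway code: the `DiskBuffer` class, the `outbox` table, and the `send` callback are hypothetical names, and SQLite stands in for whatever durable store the gateway actually uses.

```python
import sqlite3


class DiskBuffer:
    """Disk-backed FIFO queue; a row is deleted only after a confirmed send."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, payload BLOB NOT NULL)"
        )
        self.db.commit()

    def append(self, payload: bytes) -> None:
        """Persist a reading before any attempt to forward it."""
        self.db.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))
        self.db.commit()

    def flush(self, send) -> int:
        """Forward buffered rows in order; stop at the first failure and keep the rest."""
        sent = 0
        rows = self.db.execute("SELECT id, payload FROM outbox ORDER BY id").fetchall()
        for row_id, payload in rows:
            try:
                send(payload)
            except ConnectionError:
                break  # link is still down; retry on the next flush
            self.db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            self.db.commit()
            sent += 1
        return sent

    def pending(self) -> int:
        """Number of readings still waiting to be forwarded."""
        return self.db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

Because a row is deleted only after `send` returns, a crash between the send and the delete results in a duplicate rather than a loss. That is at-least-once delivery, so downstream consumers should be able to tolerate or deduplicate repeated readings.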

EMQX: The Scalable MQTT Broker

emqx-kafka-bridge.yaml
# EMQX Kafka integration configuration (illustrative; exact keys depend on the EMQX version)
bridges:
  kafka:
    servers: "kafka:9092"
    topic: sensor_data
    message_key: "${clientid}"
    value_encoder: json
    ssl:
      enable: true
      cacertfile: /etc/emqx/certs/ca.crt

EMQX has become a popular choice for enterprise MQTT deployments, offering native Kafka integration without custom connectors. Its clustering is designed to handle millions of concurrent connections, making it suitable for large-scale IoT deployments.

Challenges and Limitations

Kafka has limitations for IoT-specific patterns. Managing a large number of topics (common when each device has multiple data streams) creates overhead, because every topic partition adds metadata, file handles, and replication work on the brokers. Design the topic hierarchy carefully: for example, one topic per sensor type rather than per device, with the device ID as the message key.
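A quick sketch shows how the two naming schemes scale. The fleet, the naming conventions, and the helper functions are made up for illustration. `partition_for` is a stand-in that uses CRC32; Kafka's default partitioner for keyed records uses a murmur2 hash, so the actual partition numbers will differ, but the property that one key always maps to one partition is the same.

```python
import zlib


def topic_per_device(devices):
    """One topic per device stream: the topic count grows with the fleet."""
    return {f"{d['type']}.{d['id']}.{s}" for d in devices for s in d["streams"]}


def topic_per_type(devices):
    """One topic per sensor type: the topic count stays small and stable."""
    return {f"sensor_data.{d['type']}" for d in devices}


def partition_for(device_id: str, num_partitions: int) -> int:
    """Stand-in key-hash partitioner (Kafka's default uses murmur2, not CRC32)."""
    return zlib.crc32(device_id.encode("utf-8")) % num_partitions


# Hypothetical fleet: 1,000 devices of two types, two streams each.
fleet = [
    {"id": f"dev-{i:04d}", "type": t, "streams": ["temp", "vibration"]}
    for i, t in enumerate(["pump", "motor"] * 500)
]
print(len(topic_per_device(fleet)), len(topic_per_type(fleet)))
print(partition_for("dev-0042", 12))
```

With per-type topics, per-device ordering comes from the message key rather than from a dedicated topic, and consumers filter by key or payload field when they only care about a subset of devices.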

The other challenge is cost. Kafka clusters aren't cheap to operate, especially in managed cloud offerings. For smaller deployments, evaluating whether the full Kafka ecosystem is necessary versus simpler alternatives (Redis Streams, NATS) is worthwhile.

Our Perspective

Having built sensor data pipelines for Geoscience Australia's earthquake monitoring network, we now use the MQTT-Kafka pattern as our default architecture for IoT projects. The separation of concerns is clean: MQTT handles the messiness of device communication; Kafka provides the enterprise integration layer.

But we'd caution against over-architecting. For projects with fewer than a hundred sensors and straightforward analytics requirements, simpler stacks (MQTT direct to TimescaleDB, for example) may be more appropriate. Kafka shines at scale; at smaller scales, it's overhead.
