
Redefining the Real-Time Drone Data Pipeline at TSAW

Technology

by Sudhanshu Mishra

08 April 2025

At TSAW Drones, every second in the air counts—and every data point matters. Drones, especially in autonomous or semi-autonomous missions, generate a constant stream of telemetry: position, velocity, altitude, battery status, signal quality, system faults, and payload-specific information. This data isn't just useful for retrospective analysis; it’s critical in the moment. Pilots rely on it for situational awareness. Ground systems use it to enforce geofencing, monitor airspace, and make safety decisions. Our infrastructure must handle this stream of high-frequency, real-time data without lag, loss, or downtime.

In our early systems, we approached this with direct WebSocket-based communication between each drone and the backend. It seemed natural—WebSockets offered low-latency, persistent communication, and let us stream telemetry to our dashboards as it was emitted. But as our operations scaled, so did the complexity. Maintaining thousands of concurrent WebSocket sessions over mobile networks introduced reliability issues. Intermittent drops due to signal fluctuations caused data gaps. Each drone’s stream was tightly coupled with the processing and visualization stack, so any failure downstream risked breaking the full pipeline. There was no clean separation of concerns, no inherent message durability, and recovery mechanisms were brittle.

We needed a system that didn’t just work—it had to scale, self-heal, and guarantee delivery without compromising latency. That’s when we redesigned the entire streaming architecture around Apache Kafka.

The Shift: Kafka as the Real-Time Data Bus

Kafka gave us exactly what the old system lacked: a decoupled, durable, and horizontally scalable event backbone. Instead of streaming telemetry directly to processing engines or pilot dashboards, each drone now pushes structured telemetry packets to a Kafka topic. Each packet contains GPS data, altitude, drone ID, timestamp, and contextual metadata encoded in a compact binary or JSON format.
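A telemetry packet in the JSON variant can be sketched as follows. The field names and units here are illustrative, not our exact production schema, but they show the shape of what each drone publishes to its topic:

```python
import json
import time

def build_telemetry_packet(drone_id: str, lat: float, lon: float,
                           altitude_m: float, battery_pct: float) -> bytes:
    """Assemble one telemetry packet as compact JSON (field names illustrative)."""
    packet = {
        "drone_id": drone_id,
        "ts": int(time.time() * 1000),  # emission time, epoch milliseconds
        "gps": {"lat": lat, "lon": lon},
        "altitude_m": altitude_m,
        "battery_pct": battery_pct,
    }
    # separators=(",", ":") strips whitespace for a compact wire format
    return json.dumps(packet, separators=(",", ":")).encode("utf-8")

pkt = build_telemetry_packet("TSAW-042", 28.6139, 77.2090, 120.5, 87.0)
decoded = json.loads(pkt)
```

In practice this payload becomes the value of a Kafka produce call, with the drone ID as the message key.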

This change transformed the architecture. Telemetry ingestion became an append-only log, resilient to downstream failures. Kafka’s partitioning model allowed us to scale across fleets by routing drone streams to partitions by ID or region. If a service reading from Kafka goes down, the drone continues publishing—nothing is lost. When the service comes back up, it simply resumes from the last committed offset.
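The routing idea is simple: keying every message by drone ID guarantees that one drone's packets always land on the same partition, preserving per-drone ordering. Kafka's default partitioner does this with a murmur2 hash of the key; the sketch below uses a stable digest to show the same one-drone-one-partition property (the partition count is illustrative):

```python
import hashlib

NUM_PARTITIONS = 12  # illustrative; real topic sizing depends on fleet scale

def partition_for(drone_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a drone ID to a fixed partition so its telemetry stays ordered.

    Kafka's default partitioner hashes the message key (murmur2); any
    stable digest of the drone ID gives the same routing behaviour.
    """
    digest = hashlib.md5(drone_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

Because the mapping is deterministic, a consumer that restarts and resumes from its last committed offset picks up exactly where it left off within each drone's ordered stream.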

Processing at Scale: Spark Structured Streaming

Raw telemetry on its own isn’t always useful—it needs to be validated, ordered, enriched, and routed. For this, we rely on Apache Spark Structured Streaming. Spark consumers pick up events from Kafka in near real time, enforce schemas, correct timestamp skew, and enrich events with mission metadata or external flags like no-fly zones or fleet ownership tags.

Spark allows us to define end-to-end workflows using familiar data frame APIs, but executed continuously over streaming data. It gives us strong guarantees on processing, supports exactly-once semantics, and integrates well with our internal alerting and notification systems. This is the layer where mission logic can be embedded, flight anomalies can be flagged, and data is fanned out to multiple storage systems.
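The per-event logic of that layer can be sketched in plain Python. In production this runs as a Structured Streaming query over Kafka micro-batches; here the skew bound, the no-fly-zone lookup, and all field names are illustrative assumptions:

```python
MAX_SKEW_MS = 5_000  # illustrative bound on acceptable drone clock drift

# Illustrative coarse grid of restricted cells, keyed by rounded lat/lon
NO_FLY_ZONES = {("28.61", "77.21")}

def validate_and_enrich(event: dict, server_ts_ms: int):
    """Drop malformed events, clamp skewed timestamps, flag no-fly-zone entries."""
    required = {"drone_id", "ts", "gps", "altitude_m"}
    if not required <= event.keys():
        return None  # schema violation: route to a dead-letter topic in production
    # Correct drone clocks that drift too far from the server clock
    if abs(event["ts"] - server_ts_ms) > MAX_SKEW_MS:
        event["ts"] = server_ts_ms
        event["ts_corrected"] = True
    # Enrich with a coarse no-fly-zone flag based on the GPS cell
    cell = (f'{event["gps"]["lat"]:.2f}', f'{event["gps"]["lon"]:.2f}')
    event["in_no_fly_zone"] = cell in NO_FLY_ZONES
    return event
```

Expressed as a Spark job, each of these steps becomes a data frame transformation, which is what lets the same logic scale across partitions without change.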

Streaming to the Pilot: DCIS Real-Time Interface

Once processed, telemetry is streamed to our internal DCIS (Drone Cloud Intelligence System) platform, which provides pilots and operations teams with a live dashboard of their fleet. Unlike our initial architecture where the drone was directly connected to the UI, we now fan out data to DCIS from backend stream relays subscribed to Spark outputs.

Each DCIS session initiates a secure WebSocket handshake and subscribes only to the drones the operator is authorized to monitor. Thanks to Kafka and Spark, data arrives on the pilot's dashboard within ~150–200 milliseconds of being emitted by the drone—even under peak load. The separation of ingestion, processing, and delivery has made the system both fast and resilient.
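The per-session authorization check can be sketched as a simple filter between the stream relay and the operator's WebSocket session. The function and field names are hypothetical; DCIS's actual access model is richer than a flat allow-list:

```python
def authorized_stream(events, operator_drones):
    """Yield only telemetry for drones this operator may monitor.

    Illustrative: in DCIS, a check like this sits between the backend
    stream relay and each authenticated WebSocket session.
    """
    allowed = set(operator_drones)
    for event in events:
        if event["drone_id"] in allowed:
            yield event

events = [
    {"drone_id": "TSAW-001", "altitude_m": 80.0},
    {"drone_id": "TSAW-002", "altitude_m": 95.0},
]
visible = list(authorized_stream(events, {"TSAW-001"}))
```

Keeping the filter on the relay side, rather than in the client, means a compromised dashboard session still never receives telemetry it was not granted.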

Storage and Historical Access

Telemetry isn't only consumed in real time—it must also be stored for auditing, analytics, and replay. We store structured telemetry in two layers: a fast time-series database for recent missions and a long-term archival system in Amazon S3 for historical retention.

The time-series DB supports fast queries, dashboard filtering, and incident review. S3, on the other hand, holds partitioned, encrypted mission logs grouped by drone ID and flight window. This storage model supports ML training on flight data, enables regulatory compliance, and gives our analytics teams access to months of high-fidelity telemetry without bloating operational storage costs.
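A partitioned archive key of that shape might be built as follows. The prefix layout is an illustrative assumption, not our exact bucket structure, but it shows how grouping by drone ID and flight window keeps historical lookups cheap:

```python
from datetime import datetime, timezone

def s3_key(drone_id: str, flight_start: datetime, flight_end: datetime) -> str:
    """Build a partitioned archive key grouped by drone ID and flight window.

    Illustrative layout: Hive-style prefixes (drone_id=..., date=...) let
    query engines prune partitions without listing the whole bucket.
    """
    day = flight_start.strftime("%Y/%m/%d")
    window = f'{flight_start.strftime("%H%M%S")}-{flight_end.strftime("%H%M%S")}'
    return f"telemetry/drone_id={drone_id}/date={day}/window={window}.json.gz"

key = s3_key(
    "TSAW-042",
    datetime(2025, 4, 8, 9, 30, 0, tzinfo=timezone.utc),
    datetime(2025, 4, 8, 10, 15, 0, tzinfo=timezone.utc),
)
```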

What Changed: From Fragile Pipes to Reliable Infrastructure

The transition from a WebSocket-based setup to a Kafka-Spark-backed pipeline fundamentally improved our system. We moved from a fragile, tightly coupled architecture to an event-driven model that is resilient to failures, easily scalable across thousands of drones, and flexible enough to support multiple consumers independently. Most importantly, it ensures durable, lossless data transmission—even in the face of network or processing disruptions.

Just as important, this system has held up in the real world. We’ve streamed telemetry from thousands of drones, both in live missions and simulated load environments, without recording a single incident of pipeline downtime. Operators have access to live, accurate information at all times, and our backend teams have complete visibility into the system’s performance and state.

Conclusion

What started as a reactive system tied closely to UI needs has become a mature, production-grade streaming platform. The use of Kafka and Spark has given us the confidence to scale, and the flexibility to innovate. As TSAW continues to push forward in autonomous operations and scaled drone logistics, this backbone will serve not just as a telemetry pipeline—but as the real-time nervous system that connects our drones, operators, and cloud intelligence in a single, seamless loop.
