Built the computer vision and ML stack behind an industrial drone analytics platform for construction, mining, and energy. A near-real-time activity recognition pipeline fuses YOLO aerial detection, DeepSORT multi-object tracking, and LSTM temporal modeling to classify what equipment and crews are actually doing (loading, hauling, dumping, idling) directly from moving drone video. Paired with 3D point-cloud segmentation for stockpile volumetrics and TensorRT-optimized edge inference for remote off-grid sites.
The Challenge
Construction and open-pit mining sites are dynamic: stakeholders need to know not just where equipment and crews are, but what they are actually doing. Extracting that temporal intelligence is hard because a moving drone camera introduces shifting perspective, occlusion, and scale variance, and the systems running it have to live at the edge, at remote off-grid locations where cloud processing is not an option.
Our Approach
We built the CV and ML stack end-to-end. A custom-trained YOLO model detects excavators, haul trucks, dozers, and personnel from aerial perspectives; DeepSORT maintains stable track identities through drone motion and crossing paths; and an LSTM head classifies activities (loading, hauling, dumping, idling) over short temporal windows of bounding-box and feature sequences per track.

A parallel 3D pipeline runs PointNet- and VoxelNet-based semantic segmentation over LiDAR and photogrammetry point clouds for ground separation and automated stockpile volumetrics. A CNN-based safety layer flags PPE non-compliance and dynamic-proximity breaches into machinery danger zones.

The full suite is FP16/INT8-quantized and TensorRT-optimized to run under 250 ms per frame on on-board edge servers, deployed via Docker to remote sites worldwide.
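To make the temporal fusion concrete, the recurrent head can be pictured as consuming, per DeepSORT track, a short window of per-frame features (normalized box geometry plus an appearance embedding) and emitting one of the four activity labels. A minimal PyTorch sketch; the dimensions, window length, and the `ActivityLSTM` / `classify_track` names are illustrative assumptions, not the production model:

```python
import torch
import torch.nn as nn

ACTIVITIES = ["loading", "hauling", "dumping", "idling"]

class ActivityLSTM(nn.Module):
    """Hypothetical temporal head: classifies a short window of
    per-frame track features into an equipment activity."""

    def __init__(self, feat_dim=132, hidden=128, n_classes=len(ACTIVITIES)):
        super().__init__()
        # feat_dim = 4 box coords + 128-d appearance embedding (assumed sizes)
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):           # x: (batch, window, feat_dim)
        _, (h_n, _) = self.lstm(x)  # h_n: (layers, batch, hidden)
        return self.head(h_n[-1])   # logits over activities

def classify_track(model, window):
    """window: (T, feat_dim) sequence of one track's per-frame features."""
    with torch.no_grad():
        logits = model(window.unsqueeze(0))  # add batch dimension
        return ACTIVITIES[int(logits.argmax(-1))]

# Example: a 16-frame window for a single haul-truck track.
model = ActivityLSTM().eval()
print(classify_track(model, torch.randn(16, 132)))
```

Feeding box geometry alongside appearance lets the head pick up motion cues (a hauling truck translates frame to frame; an idling one does not) even as the drone itself moves, which is why the tracker's stable identities matter upstream.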
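On the 3D side, once segmentation has separated ground from stockpile points, volume reduces to rasterizing the stockpile points into a height grid over the ground plane and integrating cell heights. A NumPy-only sketch under a flat-ground assumption; the real pipeline fits the actual terrain surface, and `stockpile_volume` and the 0.25 m cell size are illustrative:

```python
import numpy as np

def stockpile_volume(points, cell=0.25):
    """Estimate stockpile volume (m^3) from segmented points of shape (N, 3).

    Assumes ground points were removed and z=0 is the local ground plane.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Bin points into a 2D grid, keeping the max height seen per cell.
    ix = ((x - x.min()) / cell).astype(int)
    iy = ((y - y.min()) / cell).astype(int)
    heights = np.zeros((ix.max() + 1, iy.max() + 1))
    np.maximum.at(heights, (ix, iy), np.clip(z, 0.0, None))
    # Volume = sum over cells of (cell area * cell height).
    return float(heights.sum() * cell * cell)

# Example: a synthetic cone, radius 10 m, height 4 m (~419 m^3 analytically).
rng = np.random.default_rng(0)
xy = rng.uniform(-10, 10, size=(200_000, 2))
r = np.linalg.norm(xy, axis=1)
xy, r = xy[r <= 10], r[r <= 10]
z = 4.0 * (1.0 - r / 10.0)
print(stockpile_volume(np.column_stack([xy, z])))  # close to the cone volume
```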
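The sub-250 ms edge budget comes from exporting each network to ONNX and compiling it with TensorRT at reduced precision. A minimal FP16 build sketch, assuming TensorRT 8.x Python bindings; `yolo_detector.onnx` is a hypothetical export name, and the INT8 paths additionally require a calibration dataset:

```python
import tensorrt as trt

def build_fp16_engine(onnx_path, engine_path):
    """Compile an ONNX model into a serialized FP16 TensorRT engine."""
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))
    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)  # reduced precision where the GPU supports it
    engine = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(engine)

# Hypothetical export of the aerial detector for the on-board edge server.
build_fp16_engine("yolo_detector.onnx", "yolo_detector_fp16.plan")
```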
Results
Raw drone video becomes near-real-time activity intelligence across mining and construction sites: equipment cycle-time analytics, automated stockpile volumetrics, and continuous safety monitoring on a single edge pipeline that runs reliably at off-grid locations. What previously took hours of manual point-cloud cropping and reporting now flows out of the live drone feed as structured, actionable data.