System-Level Bottlenecks in Edge-Based Video Analytics Workloads
Abstract
AI acceleration in edge data centers is often evaluated in terms of model throughput alone. Yet practical deployments reveal that inference accounts for only a fraction of end-to-end latency: preprocessing, video decoding, storage I/O, and metadata synchronization introduce substantial overhead. We instrumented a face-recognition service pipeline deployed across a three-node edge cluster and analyzed resource utilization at millisecond granularity. As GPU inference latency decreased through hardware acceleration, network traffic and SSD queue depth emerged as the primary limiting factors, and this imbalance produced diminishing returns from further accelerator scaling. We introduce a redesigned storage-aware scheduling scheme that co-locates preprocessing and inference tasks while limiting redundant frame transfers. The resulting system achieved more stable throughput under burst workloads. These findings reinforce the view that accelerator-centric evaluation overlooks infrastructure coupling effects.
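The co-location idea underlying the storage-aware scheme can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the class name, node labels, and data structures are assumptions, and the sketch reduces the policy to its core placement rule, i.e. run preprocessing and inference on the node that already holds a frame, and fall back to the least-loaded node (accepting one transfer) only when no node has it.

```python
from collections import defaultdict

class StorageAwareScheduler:
    """Illustrative sketch (hypothetical API, not the paper's code):
    place tasks on the node already holding the frame to avoid
    redundant frame transfers across the edge cluster."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.load = defaultdict(int)   # outstanding tasks per node
        self.frame_location = {}       # frame_id -> node holding the frame

    def ingest(self, frame_id, node):
        # Record where a frame first landed (e.g. its capture/decode node).
        self.frame_location[frame_id] = node

    def schedule(self, frame_id):
        # Prefer the node that already has the frame: no network transfer.
        node = self.frame_location.get(frame_id)
        if node is None:
            # Unknown frame: pick the least-loaded node and transfer once.
            node = min(self.nodes, key=lambda n: self.load[n])
            self.frame_location[frame_id] = node
        self.load[node] += 1
        return node

sched = StorageAwareScheduler(["edge-0", "edge-1", "edge-2"])
sched.ingest("frame-42", "edge-1")
placed = sched.schedule("frame-42")   # co-located on "edge-1", no transfer
```

A real scheduler would also bound per-node load and evict location entries as frames age out of the SSD cache; the sketch shows only the locality-first placement decision.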