Adaptive Scheduling for Multi-Model Collaborative Distributed Inference under Resource Heterogeneity and Dynamic Workloads

Sijia Li

Abstract

This paper addresses the scheduling complexity of distributed inference systems in which multiple models collaborate under heterogeneous resources, diverse model characteristics, and dynamically changing request loads. It proposes an adaptive scheduling method for such systems. From a system-level perspective, the method integrates model selection and resource allocation into a single scheduling decision process, and it coordinates multi-model inference workflows by drawing on a comprehensive characterization of system state. The scheduling mechanism adjusts its decisions dynamically as the operating environment changes and maintains a reasonable execution order under shared resource constraints, which improves overall system performance. Comparative experiments in multi-model collaborative inference settings demonstrate clear advantages for the proposed method in resource utilization efficiency, balance of load distribution, system stability, and scheduling responsiveness. The method also mitigates the performance degradation that resource contention and model heterogeneity cause when multiple models execute in parallel. The results indicate that a unified adaptive scheduling design is important for supporting complex intelligent inference services and offers practical guidance for engineering and optimizing distributed inference systems.
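To make the idea of a unified decision concrete, the following is a minimal illustrative sketch, not the method proposed in the paper. It assumes a simple greedy rule in which each request is assigned a model variant and a node in one joint step, and in which the recorded node load plays the role of the system state. All names and parameters (`Node`, `ModelVariant`, `schedule`, `quality_weight`, `load_weight`) and the scoring function are hypothetical, since the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity: float      # hypothetical compute units available on this node
    load: float = 0.0    # units already committed (the tracked system state)


@dataclass
class ModelVariant:
    name: str
    cost: float          # hypothetical compute units needed per request
    quality: float       # hypothetical quality score in [0, 1]


def schedule(requests, variants, nodes, quality_weight=1.0, load_weight=1.0):
    """Greedily pick a (variant, node) pair for each request in one joint step.

    Returns a list of (request, variant name, node name) tuples. A request
    that fits on no node is returned with None for both names.
    """
    plan = []
    for req in requests:
        best = None
        for variant in variants:
            for node in nodes:
                if node.load + variant.cost > node.capacity:
                    continue  # shared resource constraint: node would overflow
                # Assumed score: reward quality, penalize resulting utilization.
                utilization = (node.load + variant.cost) / node.capacity
                score = quality_weight * variant.quality - load_weight * utilization
                if best is None or score > best[0]:
                    best = (score, variant, node)
        if best is None:
            plan.append((req, None, None))
            continue
        _, variant, node = best
        node.load += variant.cost  # update state so later decisions adapt to it
        plan.append((req, variant.name, node.name))
    return plan


if __name__ == "__main__":
    nodes = [Node("gpu-0", capacity=8.0), Node("cpu-0", capacity=2.0)]
    variants = [ModelVariant("large", cost=4.0, quality=0.9),
                ModelVariant("small", cost=1.0, quality=0.6)]
    for entry in schedule(["r1", "r2", "r3"], variants, nodes):
        print(entry)
```

Because model choice and placement are scored together, a heavily loaded node can push the decision toward a cheaper variant or a different node. A learned or optimization-based policy could replace the greedy score without changing this overall structure.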
