Intelligent Backend Failure Detection with Uncertainty Quantification and Asymmetric Risk Optimization


Yixue Liu
Zizhao Zhang
Shuyuan Liang

Abstract

Backend systems in cloud computing and microservice environments exhibit highly dynamic behavior, strong coupling, and substantial noise. These characteristics make failure detection prone to prediction instability and expose it to asymmetric risk. Traditional approaches rely on deterministic decisions, do not represent predictive confidence, and ignore differences in error costs. To address these limitations, this paper proposes a backend failure detection method that integrates uncertainty estimation with cost-sensitive learning. The method models system states probabilistically, so that each failure decision is accompanied by an explicit estimate of predictive uncertainty. This uncertainty is then incorporated into a risk-weighted mechanism that suppresses the influence of noisy samples and low-confidence predictions during model updates. In addition, a cost-sensitive learning objective is constructed so that, during training, the model explicitly accounts for the different impacts that missed failures and false alarms have on system stability and service continuity. This design enables risk-aware decisions that better align with practical operational requirements. The proposed framework is evaluated under a unified dataset and evaluation protocol and compared with several representative methods. The results show consistent advantages over these baselines in overall discriminative capability, decision stability, and risk control. The study demonstrates that jointly modeling predictive uncertainty and error cost improves the reliability and engineering applicability of backend failure detection in complex environments, and it provides an effective modeling approach for intelligent failure detection systems in real operational scenarios.
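The abstract does not give the exact formulation, so the following is only a minimal sketch of how the three ingredients it names can fit together, under stated assumptions. Uncertainty is assumed to be the variance of failure probabilities across several stochastic forward passes (as with MC dropout or an ensemble). The risk weight is a hypothetical exp(-scale * variance) down-weighting. The cost-sensitive objective is a binary cross-entropy in which a missed failure costs c_fn and a false alarm costs c_fp. All function names and the default costs are illustrative, not taken from the paper.

```python
import math
from statistics import mean, pvariance


def predictive_stats(mc_probs):
    """Mean failure probability and its variance over stochastic passes.
    Assumption: uncertainty = variance across MC-dropout/ensemble outputs."""
    return mean(mc_probs), pvariance(mc_probs)


def uncertainty_weight(variance, scale=10.0):
    """Hypothetical risk weight: uncertain (high-variance) samples count less."""
    return math.exp(-scale * variance)


def cost_sensitive_bce(y, p, c_fn=5.0, c_fp=1.0, eps=1e-7):
    """Cross-entropy where errors on true failures (y=1) are scaled by c_fn
    and errors on healthy samples (y=0) are scaled by c_fp."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(c_fn * y * math.log(p) + c_fp * (1 - y) * math.log(1.0 - p))


def batch_objective(batch, c_fn=5.0, c_fp=1.0):
    """Average uncertainty-weighted, cost-sensitive loss.
    batch: list of (label, [probabilities from several stochastic passes])."""
    total = 0.0
    for y, mc_probs in batch:
        p, var = predictive_stats(mc_probs)
        total += uncertainty_weight(var) * cost_sensitive_bce(y, p, c_fn, c_fp)
    return total / len(batch)


def risk_aware_decision(p, c_fn=5.0, c_fp=1.0):
    """Flag a failure when the expected cost of missing it exceeds the
    expected cost of a false alarm: p * c_fn > (1 - p) * c_fp,
    i.e. p > c_fp / (c_fp + c_fn)."""
    return p * c_fn > (1.0 - p) * c_fp


if __name__ == "__main__":
    batch = [(1, [0.7, 0.8, 0.75]), (0, [0.1, 0.6, 0.3])]
    print("objective:", batch_objective(batch))
    print("flag p=0.2:", risk_aware_decision(0.2))
```

With c_fn = 5 and c_fp = 1 the decision threshold drops from 0.5 to 1/6, so a predicted failure probability of 0.2 is already flagged. This is the standard minimum-expected-cost rule for asymmetric costs. The exponential weight is just one simple choice; any monotonically decreasing function of the uncertainty would play the same role of damping noisy, low-confidence samples during updates.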
