Explainable GNN-based Approach to Fault Forecasting in Cloud Service Debugging

Unyi, Dániel and Rigó, Ernő and Gyires-Tóth, Bálint Pál and Lovas, Róbert (2025) Explainable GNN-based Approach to Fault Forecasting in Cloud Service Debugging. IEEE Transactions on Network and Service Management, 22 (6). pp. 5640-5657. ISSN 1932-4537. DOI: 10.1109/TNSM.2025.3602223

Abstract

Debugging cloud services is increasingly challenging due to their distributed, dynamic, and scalable nature. Traditional methods struggle to handle large state spaces and the complex interactions between microservices, making it difficult to diagnose failures and identify critical components. This paper presents a Graph Neural Network (GNN)-based approach that enhances cloud service debugging by predicting system-level fault probabilities and providing interpretable insights into failure propagation. Our method models microservice interactions as graphs, where failures propagate probabilistically. Using Markov Decision Processes (MDPs), we simulate failure behaviors, capturing the probabilistic dependencies that influence system reliability. The trained GNN not only predicts fault probabilities but also identifies the most failure-prone microservices and explains their impact. We evaluate our approach on various service mesh structures, including feature-enriched, tree-structured, and general directed acyclic graph (DAG) architectures. Results indicate that our method is effective in the operational phase of cloud services, enabling proactive debugging and targeted optimization. This work represents a step toward more interpretable, reliable, and maintainable cloud infrastructures.
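The abstract gives no implementation details, so the following is only an illustrative sketch and not the authors' method. It swaps the MDP simulation and the GNN for a plain Monte Carlo estimate, simply to show what "failures propagate probabilistically" over a service graph can mean. The service names, the per-service failure probabilities, and the single propagation probability are all hypothetical values chosen for the example.

```python
import random

# Hypothetical service mesh (a DAG): each service lists the services it calls.
CALLS = {
    "gateway": ["orders", "users"],
    "orders": ["db"],
    "users": ["db"],
    "db": [],
}
# Assumed probability that a service fails on its own.
BASE_FAIL = {"gateway": 0.01, "orders": 0.02, "users": 0.02, "db": 0.05}
# Assumed probability that a failed dependency takes its caller down.
PROPAGATE = 0.8


def sample_failures(rng):
    """Sample one scenario and return {service: failed?} for every service."""
    state = {}

    def fails(service):
        # Memoize so a shared dependency (here "db") is sampled only once.
        if service not in state:
            down = rng.random() < BASE_FAIL[service]
            for dep in CALLS[service]:
                if fails(dep) and rng.random() < PROPAGATE:
                    down = True
            state[service] = down
        return state[service]

    for service in CALLS:
        fails(service)
    return state


def estimate_fault_probability(target, trials=20000, seed=0):
    """Estimate the failure probability of `target` by repeated sampling."""
    rng = random.Random(seed)
    hits = sum(sample_failures(rng)[target] for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for name in CALLS:
        print(name, estimate_fault_probability(name))
```

In a setup like the one the abstract describes, estimates of this kind could serve as training targets for a model that predicts fault probabilities directly from the graph, but that link is an assumption of this sketch rather than something stated in the record.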

Item Type: Article
Subjects: Q Science > QA Mathematics and Computer Science > QA75 Electronic computers. Computer science
Divisions: Department of Network Security and Internet Technologies; Laboratory of Parallel and Distributed Systems
SWORD Depositor: MTMT Injector
Depositing User: MTMT Injector
Date Deposited: 13 Jan 2026 07:26
Last Modified: 13 Jan 2026 07:26
URI: https://eprints.sztaki.hu/id/eprint/11032
