Explainable Deep Learning for Credit Risk Prediction via Multi-Modal Financial Representation Learning

Alaric Wrenford

Abstract

Accurate credit risk prediction is critical for financial institutions that manage loan portfolios and must mitigate systemic risk. This paper proposes an explainable deep learning framework that integrates structured financial records with unstructured text, including transaction descriptions and credit reports, through a multi-modal representation learning architecture. The model combines a Wide & Deep network with a transformer-based text encoder, enabling joint feature extraction and modeling of nonlinear feature interactions. Experiments are conducted on a real-world dataset of over 1.2 million loan records from U.S.-based lending platforms, with an observed default rate of 13.7%. The proposed model achieves an AUC of 0.921 and a Kolmogorov-Smirnov (KS) statistic of 0.68, outperforming logistic regression (AUC 0.842) and gradient boosting (AUC 0.895). An integrated SHAP-based interpretability module identifies key risk drivers such as debt-to-income ratio, transaction volatility, and semantic indicators of financial distress in borrower narratives. The results indicate that incorporating multi-modal information improves predictive performance over these baselines (AUC gains of 0.079 and 0.026, respectively) while maintaining model transparency, making the approach suitable for financial decision-making systems subject to regulatory requirements.
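
The abstract reports two ranking metrics, AUC and the KS statistic. The sketch below shows one common way to compute both from binary default labels and model risk scores. It is an illustration only: the function name `auc_and_ks` and the toy inputs are hypothetical, and the abstract does not state how the authors computed their metrics.

```python
from typing import Sequence, Tuple


def auc_and_ks(labels: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Return (AUC, KS) for binary labels (1 = default) and risk scores.

    Hypothetical helper for illustration; higher score means higher risk.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one default and one non-default")

    # Sort by descending score so the threshold sweeps from strict to loose.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    auc = 0.0
    ks = 0.0
    i = 0
    while i < len(ranked):
        # Consume every record tied at the current score as one threshold step.
        current = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr = tp / n_pos  # share of defaulters flagged so far
        fpr = fp / n_neg  # share of non-defaulters flagged so far
        # Trapezoidal area under the ROC curve for this step.
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        # KS is the largest gap between the two cumulative distributions.
        ks = max(ks, abs(tpr - fpr))
        prev_tpr, prev_fpr = tpr, fpr
    return auc, ks


if __name__ == "__main__":
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print(auc_and_ks(toy_labels, toy_scores))
```

AUC measures how well the scores rank defaulters above non-defaulters across all thresholds, while KS captures the single largest separation between the two groups' cumulative score distributions, which is why credit scoring work often reports both.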
