Project snapshot
This case study describes how we helped a prominent international banking group dramatically improve credit scoring accuracy by redesigning its neural network architecture around a unified multi-modal approach, supporting its entry into a high-risk international market.
Client
International banking group with headquarters in NYC, offering digital banking, payments, wealth management, and personal and business loans across multiple markets.
Solution
Unified multi-modal neural network that processes embeddings from multiple data sources instead of separate mono-modal networks, enabling higher accuracy predictions for loan default assessment.
Business function
Risk assessment and credit scoring
Industry
Finance & banking
Challenge
Improve the reliability of AI-based credit scoring when entering a high-risk international market with limited credit histories and different regulatory requirements.
Result
A 1.8-point Gini uplift using the existing data sources, and a 2.6-point uplift after new data sources were integrated.
Client background
The client is a prominent international banking group with headquarters in New York City, offering digital banking, payments, wealth management, and personal and business loans across multiple markets.
The bank was planning to enter the Indian market with retail banking products, facing several critical challenges:
The company needed an advanced AI-driven credit scoring solution to accurately assess borrower reliability and improve loan default prediction in markets with limited credit data, while maintaining profitability.
Potential threat: If the solution was not implemented, the client would face higher loan default rates, increased financial losses, and potential failure to establish successful operations in the new market.
The unified multi-modal neural network presented challenges in integrating three independent data sources and optimizing the model architecture. The Xenoss team developed solutions to improve training efficiency and prediction accuracy.
The original approach treated three data sources (card transactions, debit account transactions, credit history) as mutually independent, using separate neural networks and logistic regression for final scoring.
Solution: We developed a unified multi-modal architecture that processes embeddings from all data sources simultaneously, allowing the model to identify correlations between different data types that were previously missed in the independent approach.
Initial experiments with direct raw data input to a unified model resulted in long training times and difficulty identifying the impact of individual improvements under time constraints.
Solution: We implemented an embedding-based approach where each data source is transformed into vectorized representations before feeding to the main neural network, enabling faster training and better parallelization of development processes.
Not all clients have complete data across all three sources (card transactions, debit accounts, credit history), creating gaps in the scoring process.
Solution: We designed constant embedding mechanisms that use average embeddings for missing data sources, ensuring the model can still make accurate predictions even when some client data is unavailable.
Supporting multiple models instead of a single end-to-end solution created development and maintenance complexity while requiring fast iteration cycles.
Solution: Our modular architecture allows for independent improvement of individual data source models while maintaining the unified prediction layer, enabling parallel development and faster experimentation with new approaches.
The original architecture made it difficult to incorporate additional data sources without major system redesign, limiting future expansion capabilities.
Solution: We built a flexible framework where new data sources can be easily added as additional embedding inputs, allowing the bank to enhance scoring accuracy by incorporating new data types without architectural changes.
Balancing model complexity with inference speed requirements while maintaining high accuracy standards for real-time credit scoring decisions.
Solution: We optimized the multilayer perceptron architecture for fast inference while preserving the quality improvements from multi-modal data fusion, ensuring production-ready performance with significant accuracy gains.
Discovery phase findings
The bank used three data sources to calculate the client’s default probability:
1. Card transactions: Credit score calculation based on card transaction data
2. Debit account transactions: Credit score calculation based on debit account data
3. Credit history: Credit score calculation based on credit history
Each data source had a separate neural network model to calculate credit scores based on the input data sequence. The bank team used logistic regression to mix the scores received from these three methods and get the final credit score:
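The original blending step can be sketched as follows. This is a minimal illustration, not the bank's actual model: the scores, weights, and bias below are hypothetical, and the real logistic regression was fitted on historical outcomes.

```python
import math

def logistic_blend(scores, weights, bias):
    """Combine per-source credit scores into a final default probability
    via logistic regression: sigmoid(w . scores + b)."""
    z = sum(w * s for w, s in zip(weights, scores)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical per-source scores (card, debit account, credit history)
scores = [0.62, 0.48, 0.71]
# Hypothetical coefficients of the fitted logistic regression
weights = [1.2, 0.8, 1.5]
bias = -1.7

probability_of_default = logistic_blend(scores, weights, bias)
print(probability_of_default)
```

Each source contributes only a single scalar to this blend, which is the limitation the unified approach below addresses.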
Improvement idea
In the original approach, the data sources were treated as mutually independent: each source produced its own score through a separate model. For example, one score was calculated from card transactions, another from credit history, and so on.
What if a combination of factors from different data sources correlates with the likelihood of a user defaulting on a loan?
If this hypothesis is correct, treating all existing data sources as a unified input for a single neural network can give us higher precision.
Unified multi-modal neural network with embedded input
The alternative idea was to transform source data sets into embeddings and feed these embeddings to the main fully connected neural network. An embedding contains much more information than a single scalar, and with enough data, this approach allows for better model training.
Pros & Cons of the approach
Pros
Fast training: Such a model is trained quite quickly due to its architectural simplicity and the absence of recurrent layers
Parallelization: This approach accelerates development of the overall solution, since the individual per-source models can be improved in parallel
Cons
Multi-model support: The challenge is that we must support a set of models instead of just one end-to-end model.
Architectural highlights
Vectorized data source as input
The model takes the embeddings of each data source (card transactions, debit account transactions, credit history, etc.) as input. There can be any number of sources.
Constant embedding for missed data sources
If there is no embedding for one of the data sources for the client, the model takes the average constant embedding corresponding to the data source as input.
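This fallback can be sketched as a simple lookup with a default. The sketch below is illustrative: the embedding dimension, source names, and zero-valued averages are assumptions, not the production values; in practice the constant embeddings would be averages precomputed over the training population.

```python
EMBED_DIM = 8

# Hypothetical per-source average ("constant") embeddings,
# precomputed over the training population (zeros here for illustration).
AVERAGE_EMBEDDINGS = {
    "card_transactions": [0.0] * EMBED_DIM,
    "debit_account": [0.0] * EMBED_DIM,
    "credit_history": [0.0] * EMBED_DIM,
}

def resolve_embeddings(client_embeddings):
    """Return one embedding per data source, substituting the source's
    average embedding when the client has no data for that source."""
    return {
        source: client_embeddings.get(source, average)
        for source, average in AVERAGE_EMBEDDINGS.items()
    }

# A client with card data only: the other two sources fall back to averages.
resolved = resolve_embeddings({"card_transactions": [1.0] * EMBED_DIM})
```

Because the model always receives a full set of embeddings, no special missing-data branch is needed downstream.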
MLP classification
The embedding of each data source is fed to the input of a multilayer perceptron (MLP). Then, their outputs are concatenated to obtain a generalized vector representation of the client. This resulting embedding is fed to the input of the MLP classification layer to form the final prediction.
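The forward pass described above can be sketched as follows. All layer sizes, the number of sources, and the randomly initialized weights are illustrative assumptions; the real model's architecture and trained parameters are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, weights, biases):
    """Forward pass through a small MLP with ReLU hidden activations."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(x @ W + b, 0.0)
    return x @ weights[-1] + biases[-1]

def make_mlp(sizes):
    """Randomly initialized layer parameters, for illustration only."""
    weights = [rng.normal(0.0, 0.1, (a, b)) for a, b in zip(sizes, sizes[1:])]
    biases = [np.zeros(b) for b in sizes[1:]]
    return weights, biases

EMBED_DIM, HIDDEN, SOURCES = 16, 8, 3

# One MLP per data source maps its embedding to a hidden vector.
source_mlps = [make_mlp([EMBED_DIM, HIDDEN]) for _ in range(SOURCES)]
# The classification head consumes the concatenated hidden vectors.
head = make_mlp([HIDDEN * SOURCES, HIDDEN, 1])

def predict_default(source_embeddings):
    """Per-source MLPs -> concatenation -> MLP classification layer."""
    hidden = [mlp(e, *m) for e, m in zip(source_embeddings, source_mlps)]
    client_vector = np.concatenate(hidden)   # generalized client embedding
    logit = mlp(client_vector, *head)[0]
    return 1.0 / (1.0 + np.exp(-logit))      # default probability

p = predict_default([rng.normal(size=EMBED_DIM) for _ in range(SOURCES)])
```

The concatenation step is what lets the classification layer pick up cross-source correlations that the original independent-score design could not see.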
Target metric: Gini
When evaluating a credit scoring model, we used the Gini metric. It shows us how well the model ranks clients in predicting client default.
Gini = 100% × (2 × ROC AUC − 1)
Increasing the Gini value allows the bank to issue more loans at the same risk level. Since a larger loan volume directly increases profit, each Gini point is converted into profit when calculating the model's financial effect. Depending on the scale of the bank's lending operations, one additional Gini point can be worth anywhere from hundreds of thousands to hundreds of millions of dollars in extra profit.
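The metric can be computed directly from ROC AUC. Here is a self-contained sketch in pure Python; the labels and scores below are hypothetical examples, not client data.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison: the probability that a randomly
    chosen positive (defaulter) is scored higher than a randomly chosen
    negative, counting ties as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

def gini(labels, scores):
    """Gini = 100% * (2 * ROC AUC - 1)."""
    return 100.0 * (2.0 * roc_auc(labels, scores) - 1.0)

# 1 = defaulted, 0 = repaid; scores are hypothetical model outputs.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.3, 0.5, 0.4, 0.2, 0.8, 0.6, 0.1]
print(gini(labels, scores))
```

A Gini of 100 means perfect ranking of defaulters above non-defaulters; 0 means the model ranks no better than chance.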
1.8-point Gini metric uplift (existing data)
The unified multi-modal neural network achieved significant improvement in credit scoring accuracy using the same three data sources (card transactions, debit account data, credit history) without adding new information.
2.6-point Gini uplift (new data sources)
When new data sources were integrated into the modular architecture, the model demonstrated even higher performance gains, showcasing the scalability and flexibility of the embedding-based approach.
Improved loan default prediction accuracy
The embedding model preserves the quality of individual models on data sources while producing higher output quality by mixing input data from different sources, enabling better risk assessment for high-risk markets.
Want to build your own solution?
Contact us