Technical Report · Emerging Market Credit Infrastructure

Credit Risk Assessment Engine: Alternative Data Scoring for SME Lending in Sub-Saharan Africa

Abstract

Formal credit remains inaccessible to most small businesses in Sub-Saharan Africa, not because they are high-risk, but because traditional credit scoring relies on bureau history that most applicants do not have. This report describes an ML-based credit scoring engine that replaces bureau scores with alternative behavioural signals: mobile money transaction frequency, utility payment patterns, and business operating history. The model achieves 88% precision on held-out test data and serves decisions with a P95 latency under 120 milliseconds via a secured REST API.

Performance Summary

Precision: 88% (risk classification, held-out test set)
API latency (P95): < 120 ms (FastAPI endpoint, Docker-containerised)
False positive rate: 7.4% (low-risk applicants misclassified as high-risk)
Figure: Precision, recall and F1 by risk tier. LightGBM classifier on held-out validation data.
Figure: Predicted risk distribution across applicants. Share of applicants falling in each tier.
The false positive rate is monitored across demographic segments to guard against discriminatory outcomes. No statistically significant disparity was detected in the current validation set.
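The per-segment check can be sketched as follows. This is a minimal illustration, not the project's monitoring code: the record layout, the segment labels, and the function name `false_positive_rate_by_segment` are assumptions. "Positive" here means predicted high-risk, so a false positive is a low-risk applicant flagged as high-risk.

```python
from collections import defaultdict

def false_positive_rate_by_segment(records):
    """Return {segment: FPR} where FPR = FP / (FP + TN).

    Each record is a (segment, actual_high_risk, predicted_high_risk) tuple.
    Only applicants who are actually low-risk enter the denominator.
    """
    false_pos = defaultdict(int)
    actual_neg = defaultdict(int)
    for segment, actual_high_risk, predicted_high_risk in records:
        if not actual_high_risk:
            actual_neg[segment] += 1
            if predicted_high_risk:
                false_pos[segment] += 1
    return {seg: false_pos[seg] / n for seg, n in actual_neg.items() if n > 0}

# Toy records with hypothetical segment labels (not project data).
records = [
    ("segment_a", False, False),
    ("segment_a", False, True),
    ("segment_a", True, True),
    ("segment_b", False, False),
    ("segment_b", False, False),
    ("segment_b", True, False),
]
rates = false_positive_rate_by_segment(records)
print(rates)
```

Comparing the resulting rates still requires a significance test sized to each segment; the sketch only produces the per-segment rates that such a test would consume.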
Figure: Estimated default rate, baseline underwriting vs. model-guided decisions.
Modelled across seven quarters assuming incremental adoption. Values are simulated from test-set performance extrapolated to a representative loan portfolio of 500 SME applications per quarter.
* Simulated projection. Validation against live loan outcomes is required before quoting these figures in regulatory or investor documents.
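One way to read a projection of this kind is as a blend of two default rates, weighted by the share of decisions that follow the model in each quarter. The sketch below is an assumption about the form of such a simulation, not the report's actual method. The linear adoption ramp and both rates are placeholders chosen only to exercise the code; only the seven-quarter horizon comes from the report.

```python
QUARTERS = 7  # horizon stated in the report

def blended_default_rate(baseline_rate, model_rate, adoption_share):
    """Default rate when `adoption_share` of decisions follow the model."""
    if not 0.0 <= adoption_share <= 1.0:
        raise ValueError("adoption_share must be in [0, 1]")
    return (1.0 - adoption_share) * baseline_rate + adoption_share * model_rate

def project(baseline_rate, model_rate, quarters=QUARTERS):
    """Assumed linear adoption ramp: 0% in the first quarter, 100% in the last."""
    rows = []
    for q in range(quarters):
        adoption = q / (quarters - 1)
        rate = blended_default_rate(baseline_rate, model_rate, adoption)
        rows.append((q + 1, adoption, rate))
    return rows

# Placeholder rates, NOT results from the report.
projection = project(baseline_rate=0.20, model_rate=0.10)
for quarter, adoption, rate in projection:
    print(f"Q{quarter}: adoption={adoption:.0%} blended_rate={rate:.3f}")
```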

Context & Problem

The credit gap as an infrastructure problem

Across Sub-Saharan Africa, over 60% of small business owners lack access to formal credit. The underlying cause is not credit risk but a data infrastructure gap: bureau-based scoring requires a payment history that applicants outside the formal banking system cannot provide.

The result is a credit market that excludes precisely the applicants it should serve: high-potential smallholder businesses with demonstrated cash flow, but no bureau footprint.

This engine replaces the bureau dependency with signals that already exist in the applicant's digital behaviour: mobile money transaction cadence, utility payment regularity, and business tenure. These signals are predictive of repayment behaviour and available for the majority of SME applicants in Kenya and similar markets.
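As an illustration of how such signals might be turned into model inputs, the sketch below derives three features from raw records. The field names, the feature definitions, and the `build_features` helper are all hypothetical; the actual feature schema is the one in the source repository.

```python
from datetime import date

def build_features(txn_dates, utility_bills, business_start, as_of):
    """Derive illustrative alternative-data features for one applicant.

    txn_dates      -- dates of mobile money transactions
    utility_bills  -- list of (due_date, paid_date or None) pairs
    business_start -- date the business began operating
    as_of          -- scoring date
    """
    # Transaction cadence: transactions per week over the observed window.
    if txn_dates:
        window_days = max((as_of - min(txn_dates)).days, 1)
        txns_per_week = len(txn_dates) / (window_days / 7)
    else:
        txns_per_week = 0.0

    # Utility payment regularity: share of bills paid on or before the due date.
    on_time = sum(1 for due, paid in utility_bills if paid is not None and paid <= due)
    on_time_share = on_time / len(utility_bills) if utility_bills else 0.0

    # Business tenure in months (30-day approximation).
    tenure_months = (as_of - business_start).days / 30.0

    return {
        "txns_per_week": txns_per_week,
        "utility_on_time_share": on_time_share,
        "business_tenure_months": tenure_months,
    }

# Toy applicant (not project data).
features = build_features(
    txn_dates=[date(2024, 1, 1), date(2024, 1, 8), date(2024, 1, 15), date(2024, 1, 22)],
    utility_bills=[(date(2024, 1, 10), date(2024, 1, 9)), (date(2024, 2, 10), None)],
    business_start=date(2023, 1, 29),
    as_of=date(2024, 1, 29),
)
print(features)
```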

Methodology

Feature design and model architecture

The model is a gradient-boosted classifier (LightGBM) trained on tabular alternative-data features. It is trained as a binary classifier and outputs a default probability for each applicant. Decisions are bucketed into three tiers (Approve, Review, Decline) by applying thresholds to the calibrated probability; the thresholds are chosen using a backtesting framework.
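The tiering step can be sketched as a pair of cut-offs on the default probability. The threshold values and the `assign_tier` name below are placeholders for illustration; the report does not state the cut-offs used.

```python
def assign_tier(default_probability, approve_below=0.10, decline_from=0.30):
    """Map a calibrated default probability to Approve, Review, or Decline.

    The two cut-offs are placeholders; in practice they would come from
    backtesting against historical outcomes.
    """
    if not 0.0 <= default_probability <= 1.0:
        raise ValueError("default_probability must be in [0, 1]")
    if approve_below >= decline_from:
        raise ValueError("approve_below must be smaller than decline_from")
    if default_probability < approve_below:
        return "Approve"
    if default_probability >= decline_from:
        return "Decline"
    return "Review"

tiers = [assign_tier(p) for p in (0.04, 0.10, 0.29, 0.30, 0.85)]
print(tiers)
```

Keeping a middle Review band means borderline probabilities are routed to a human underwriter rather than forced into an automatic decision.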

Figure: Feature importance, LightGBM SHAP mean absolute values.
Averaged across the validation set. Higher values indicate stronger contribution to predicted default probability.
Feature names simplified for readability. Full feature schema available in the source repository.
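The aggregation behind this kind of chart is simple: take the absolute SHAP value of each feature for each applicant and average over applicants. The sketch below shows that step on toy numbers. It assumes the per-applicant SHAP values have already been computed (for example with the `shap` library's `TreeExplainer`), and the feature names are hypothetical.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value.

    shap_rows     -- one list of per-feature SHAP values per applicant
    feature_names -- names in the same column order
    """
    if not shap_rows:
        raise ValueError("shap_rows is empty")
    n = len(shap_rows)
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        if len(row) != len(feature_names):
            raise ValueError("row length does not match feature_names")
        for j, value in enumerate(row):
            totals[j] += abs(value)
    importance = {name: total / n for name, total in zip(feature_names, totals)}
    # Sort so the most influential feature comes first.
    return dict(sorted(importance.items(), key=lambda kv: kv[1], reverse=True))

# Toy SHAP values (not model output) for three hypothetical features.
names = ["txns_per_week", "utility_on_time_share", "business_tenure_months"]
rows = [
    [-0.50, 0.25, 0.00],
    [0.50, -0.25, 0.25],
]
ranking = mean_abs_shap(rows, names)
print(ranking)
```

Because the absolute value is taken before averaging, a feature that pushes some applicants up and others down still ranks as important, which is why the first toy feature scores highest despite its values cancelling out.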

Business Impact

Stakeholder | Problem addressed | Measurable outcome | Status
Microfinance institutions | High default rate on unsecured SME loans | Est. 25–35% reduction in non-performing loan ratio | Live
SME borrowers | Rejection despite demonstrated cash flow | Up to 40% of previously excluded applicants safely approvable | Live
Fintech partners | Manual underwriting bottlenecks | Sub-120ms decisioning via REST API, embeddable in any loan workflow | Live
Regulatory bodies | Black-box model opacity | Per-decision SHAP audit trail satisfies explainability requirements | In review
Impact investors | Difficulty measuring financial inclusion outcomes | Dashboard tracking SMEs approved per quarter, default rates, coverage expansion | Planned


Conversational Analytics

Talk to the data

This interface demonstrates how stakeholders can query credit risk data using natural language, powered by an MCP server backend. In production, an LLM agent connects to the MCP server to parse questions, select the right tool, and return data-driven answers.

The MCP server (MCP-CREDIT) exposes six tools.
Architecture: Stakeholder question → LLM Agent (parses intent) → MCP Server (selects tool: score_applicant, explain_decision, risk_distribution, etc.) → Data Layer → Formatted response. The MCP server code is available at mcp_credit.py.
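The tool-selection step in this flow can be sketched as a name-to-function registry. This is a plain-Python illustration of the dispatch idea only: it does not use an MCP SDK and is not the contents of mcp_credit.py. The three tool names come from the architecture description; their signatures, the stub return values, and the `call_tool` helper are assumptions.

```python
def score_applicant(applicant_id):
    """Stub: a real tool would call the scoring model."""
    return {"applicant_id": applicant_id, "tier": "Review"}

def explain_decision(applicant_id):
    """Stub: a real tool would return per-decision SHAP contributions."""
    return {"applicant_id": applicant_id, "top_factors": []}

def risk_distribution():
    """Stub: a real tool would aggregate tiers across applicants."""
    return {"Approve": 0, "Review": 0, "Decline": 0}

# Registry mapping tool names to callables (hypothetical layout).
TOOLS = {
    "score_applicant": score_applicant,
    "explain_decision": explain_decision,
    "risk_distribution": risk_distribution,
}

def call_tool(name, arguments=None):
    """Dispatch a tool request of the kind an LLM agent would emit."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**(arguments or {}))

result = call_tool("score_applicant", {"applicant_id": "A-001"})
print(result)
```

Rejecting unknown tool names explicitly keeps a misparsed question from silently falling through to the data layer.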
Projected impact figures (default rate reduction, approval rate expansion) are derived from held-out test set performance extrapolated to a modelled loan portfolio. They are not audited outcomes. Regulatory validation and live loan monitoring are required before these projections are cited in funding applications or compliance documentation.