Technical Report · Emerging Market Credit Infrastructure

Credit Risk Assessment Engine: Alternative Data Scoring for SME Lending in Sub-Saharan Africa

Author: Christopher Okech
Domain: Emerging Market Fintech
Date: March 2026
Status: Operational

Abstract

Formal credit remains inaccessible to most small businesses in Sub-Saharan Africa, not because they are inherently high-risk, but because traditional credit scoring relies on bureau history that most applicants do not have. This report describes an ML-based credit scoring engine that replaces bureau scores with alternative behavioural signals: mobile money transaction frequency, utility payment patterns, and business operating history. The model achieves 88% precision on held-out test data and serves decisions in under 120 milliseconds (P95) via a secured REST API.

Performance Summary

Precision: 88% (risk classification, held-out test set)

API Latency (P95): < 120ms (FastAPI endpoint, Docker-containerised)

False Positive Rate: 7.4% (low-risk applicants misclassified as high-risk)

Precision, Recall and F1 by risk tier

LightGBM classifier on held-out validation data.

Predicted risk distribution across applicants

Share of applicants falling in each tier.

The false positive rate is monitored across demographic segments to guard against discriminatory outcomes. No statistically significant disparity was detected in the current validation set.
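As a rough illustration of the per-segment monitoring described above, the sketch below computes the false positive rate (non-defaulting applicants flagged as high-risk, divided by all non-defaulting applicants) separately for each segment. The function name, the record layout, and the segment labels are assumptions made for this example and are not taken from the source repository. The toy records are invented purely to exercise the code.

```python
from collections import defaultdict


def false_positive_rate_by_segment(records):
    """Return {segment: FPR} from (segment, defaulted, flagged_high_risk) tuples.

    FPR here means: applicants who did NOT default but were flagged
    high-risk, divided by all applicants who did not default.
    """
    false_positives = defaultdict(int)  # non-defaulters flagged high-risk
    negatives = defaultdict(int)        # all non-defaulters
    for segment, defaulted, flagged_high_risk in records:
        if not defaulted:
            negatives[segment] += 1
            if flagged_high_risk:
                false_positives[segment] += 1
    # Segments with no non-defaulters are skipped to avoid dividing by zero.
    return {seg: false_positives[seg] / n for seg, n in negatives.items() if n > 0}


# Hypothetical toy records: (segment, actually defaulted, flagged high-risk).
sample_records = [
    ("urban", False, False),
    ("urban", False, True),
    ("urban", True, True),
    ("rural", False, False),
    ("rural", False, False),
    ("rural", False, True),
    ("rural", False, False),
]

fpr_by_segment = false_positive_rate_by_segment(sample_records)
```

This only produces the per-segment rates. Deciding whether a gap between segments is statistically significant needs a separate test on the underlying counts, which the sketch does not attempt.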

Estimated default rate — baseline underwriting vs. model-guided decisions

Modelled across seven quarters assuming incremental adoption. Values are simulated from test-set performance extrapolated to a representative loan portfolio of 500 SME applications per quarter.

* Simulated projection. Validation against live loan outcomes is required before quoting these figures in regulatory or investor documents.
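One simple way to build a projection of this kind is to blend the baseline and model-guided default rates by the share of decisions that follow the model in each quarter. The sketch below assumes a linear adoption ramp from 0% to 100% over the seven quarters; that ramp, the function names, and the input values in the usage lines are all placeholders, not the figures or the method behind the chart. Only the seven-quarter horizon and the 500 applications per quarter come from the text above.

```python
def blended_default_rate(baseline_rate, relative_reduction, adoption):
    """Default rate when a share `adoption` of decisions follow the model.

    `relative_reduction` is the assumed fractional drop in default rate
    for model-guided decisions (0.30 means 30% lower than baseline).
    """
    model_rate = baseline_rate * (1.0 - relative_reduction)
    return (1.0 - adoption) * baseline_rate + adoption * model_rate


def project_defaults(baseline_rate, relative_reduction,
                     quarters=7, loans_per_quarter=500):
    """Return one (quarter, adoption, rate, expected_defaults) row per quarter."""
    rows = []
    for q in range(quarters):
        adoption = q / (quarters - 1)  # ASSUMPTION: linear ramp from 0 to 1
        rate = blended_default_rate(baseline_rate, relative_reduction, adoption)
        rows.append((q + 1, adoption, rate, rate * loans_per_quarter))
    return rows


# Placeholder inputs for illustration only; not measured values.
projection = project_defaults(baseline_rate=0.10, relative_reduction=0.30)
for quarter, adoption, rate, expected_defaults in projection:
    print(f"Q{quarter}: adoption={adoption:.2f} rate={rate:.3f} "
          f"expected defaults={expected_defaults:.1f}")
```

Because every input is an assumption, the output inherits the same caveat as the chart: it is a simulation, not an observed outcome.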

Context & Problem

The credit gap as an infrastructure problem

Across Sub-Saharan Africa, over 60% of small business owners lack access to formal credit. The underlying cause is not credit risk — it is a data infrastructure gap. Bureau-based scoring requires payment history that formally unbanked applicants cannot provide.

The result is a credit market that excludes precisely the applicants it should serve: high-potential smallholder businesses with demonstrated cash flow, but no bureau footprint.

This engine replaces the bureau dependency with signals that already exist in the applicant's digital behaviour: mobile money transaction cadence, utility payment regularity, and business tenure. These signals are predictive of repayment behaviour and available for the majority of SME applicants in Kenya and similar markets.

Methodology

Feature design and model architecture

The model is a gradient-boosted classifier (LightGBM) trained on tabular alternative-data features. The binary classifier outputs a default probability; decisions are bucketed into three tiers (Approve, Review, Decline) using calibrated probability thresholds that are validated through a backtesting framework.
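The tiering step can be pictured as two cut-offs on the predicted default probability. The sketch below is a minimal illustration: the threshold values (0.15 and 0.40) and the function name are hypothetical, since the report does not state the calibrated thresholds actually used.

```python
def assign_tier(default_probability,
                approve_below=0.15,         # hypothetical threshold
                decline_at_or_above=0.40):  # hypothetical threshold
    """Map a predicted default probability to Approve, Review, or Decline."""
    if not 0.0 <= default_probability <= 1.0:
        raise ValueError("default_probability must be between 0 and 1")
    if default_probability < approve_below:
        return "Approve"
    if default_probability >= decline_at_or_above:
        return "Decline"
    # Everything between the two cut-offs goes to a human reviewer.
    return "Review"


tiers = [assign_tier(p) for p in (0.05, 0.25, 0.60)]
```

Keeping the thresholds as parameters makes it straightforward to re-run a backtest with different cut-offs and compare approval and default rates.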

Mobile money transaction frequency is the single highest-weight feature — outperforming income stated on application forms in backtesting
Utility payment regularity captures a proxy for household stability that is strongly predictive for microenterprise borrowers
Business age introduces a non-linear risk floor: enterprises under 6 months old receive a manual review flag regardless of other signals
SHAP values are computed per decision to provide a regulatory audit trail, so each decision is individually explainable
A CAPIE-style API perimeter handles input validation, rate-limiting, and adversarial probe detection before any features reach the model
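The business-age rule in the list above can be sketched as a flag applied after the model has produced its tier. The `Decision` type and function name are assumptions for this example. The sketch also leaves the model's tier unchanged and only sets the flag, because the report does not say whether the flag overrides the tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    tier: str            # "Approve", "Review", or "Decline" from the model
    manual_review: bool  # True when a human must look at the application


def apply_business_age_floor(tier, business_age_months, min_age_months=6):
    """Flag enterprises younger than `min_age_months` for manual review.

    The flag depends only on business age, so it is set regardless of
    how favourable the other signals (and hence the tier) are.
    """
    if business_age_months < 0:
        raise ValueError("business_age_months must be non-negative")
    return Decision(tier=tier, manual_review=business_age_months < min_age_months)


young = apply_business_age_floor("Approve", business_age_months=4)
established = apply_business_age_floor("Approve", business_age_months=18)
```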

Feature importance — LightGBM SHAP mean absolute values

Averaged across the validation set. Higher values indicate stronger contribution to predicted default probability.

Feature names simplified for readability. Full feature schema available in the source repository.
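The aggregation behind a chart like this is the mean of the absolute SHAP values per feature across all scored rows. The sketch below shows that aggregation in plain Python on an already-computed matrix of SHAP values. The feature names and numbers are illustrative placeholders, and computing the SHAP values themselves (for example with the `shap` library) is outside the sketch.

```python
def mean_absolute_shap(shap_rows, feature_names):
    """Return {feature: mean |SHAP|}, ordered from most to least important.

    `shap_rows` holds one list of per-feature SHAP values per applicant.
    """
    if not shap_rows:
        raise ValueError("shap_rows must contain at least one row")
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        if len(row) != len(feature_names):
            raise ValueError("each row needs one SHAP value per feature")
        for i, value in enumerate(row):
            totals[i] += abs(value)  # the sign is dropped: only magnitude counts
    importance = {name: total / len(shap_rows)
                  for name, total in zip(feature_names, totals)}
    return dict(sorted(importance.items(), key=lambda kv: kv[1], reverse=True))


# Placeholder names and values, for illustration only.
names = ["mobile_money_txn_frequency", "utility_payment_regularity", "business_age_months"]
rows = [
    [-0.5, 0.25, 0.125],
    [1.0, -0.25, 0.125],
]
importance = mean_absolute_shap(rows, names)
```

Taking absolute values first means a feature that pushes some applicants strongly up and others strongly down still ranks as important, rather than averaging out to zero.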

Business Impact

Stakeholder	Problem addressed	Measurable outcome	Status
Microfinance institutions	High default rate on unsecured SME loans	Est. 25–35% reduction in non-performing loan ratio	Live
SME borrowers	Rejection despite demonstrated cash flow	Up to 40% of previously excluded applicants safely approvable	Live
Fintech partners	Manual underwriting bottlenecks	Sub-120ms decisioning via REST API — embeddable in any loan workflow	Live
Regulatory bodies	Black-box model opacity	Per-decision SHAP audit trail intended to satisfy explainability requirements	In review
Impact investors	Difficulty measuring financial inclusion outcomes	Dashboard tracking SMEs approved per quarter, default rates, coverage expansion	Planned

Resources

Live scoring console: huggingface.co/spaces/okechobonyo/sme-credit-scoring
Source repository: github.com/okech-christopher/-sme-credit-scoring
API is containerised via Docker and deployable to any cloud provider without modification
SHAP explainability reports can be exported per application batch for regulatory submission

Conversational Analytics

Talk to the data

This interface demonstrates how stakeholders can query credit risk data using natural language, powered by an MCP server backend. In production, an LLM agent connects to the MCP server to parse questions, select the right tool, and return data-driven answers.

MCP-CREDIT · AGENT READY · 6 TOOLS AVAILABLE

Architecture: Stakeholder question → LLM Agent (parses intent, selects tool) → MCP Server (executes the selected tool: score_applicant, explain_decision, risk_distribution, etc.) → Data Layer → Formatted response. The MCP server code is in mcp_credit.py.
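The server-side half of this flow amounts to a registry of named tools and a dispatcher that runs whichever tool the agent requested. The sketch below illustrates that pattern with the standard library only. It does not use the MCP SDK or its wire protocol, it is not the contents of mcp_credit.py, and the handler bodies are placeholders that echo their inputs rather than query any data. Only the three tool names come from the architecture description.

```python
from typing import Callable, Dict

TOOLS: Dict[str, Callable[..., dict]] = {}


def tool(name):
    """Decorator that registers a handler under a tool name."""
    def register(handler):
        TOOLS[name] = handler
        return handler
    return register


@tool("score_applicant")
def score_applicant(applicant_id: str) -> dict:
    # Placeholder: a real handler would call the scoring model.
    return {"tool": "score_applicant", "applicant_id": applicant_id}


@tool("explain_decision")
def explain_decision(applicant_id: str) -> dict:
    # Placeholder: a real handler would return per-decision SHAP values.
    return {"tool": "explain_decision", "applicant_id": applicant_id}


@tool("risk_distribution")
def risk_distribution() -> dict:
    # Placeholder: a real handler would aggregate tiers across applicants.
    return {"tool": "risk_distribution"}


def call_tool(name, **arguments):
    """Run the tool the agent selected, rejecting unknown names."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)


response = call_tool("score_applicant", applicant_id="A-001")
```

In this arrangement the agent decides which tool name and arguments fit the stakeholder's question, while the server stays a thin, auditable layer that validates the name and executes the handler.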

Projected impact figures (default rate reduction, approval rate expansion) are derived from held-out test set performance extrapolated to a modelled loan portfolio. They are not audited outcomes. Regulatory validation and live loan monitoring are required before these projections are cited in funding applications or compliance documentation.