Differentially private synthetic data generation for mobile money fraud detection
| dc.contributor.author | Azamuke, Denish | |
| dc.date.accessioned | 2026-01-02T12:32:38Z | |
| dc.date.available | 2026-01-02T12:32:38Z | |
| dc.date.issued | 2025 | |
| dc.description | A thesis submitted to the Directorate of Graduate Training in fulfillment of the requirements for the award of the Degree of Doctor of Philosophy in Computer Science of Makerere University | |
| dc.description.abstract | We live in an era where routine transactions, from paying domestic bills to buying groceries, are carried out using mobile financial services. However, the rapid growth and uptake of these services have amplified security and privacy risks, including SIM swap attacks, identity fraud, data theft, refund fraud, and unauthorized fees. Advances in machine learning (ML) show potential for detecting financial fraud in mobile money transactions, yet this requires access to large volumes of transaction data. Research on mobile money fraud has been hindered by data sensitivity and privacy concerns that restrict access to such datasets. In addition, real mobile money datasets are class-imbalanced, with far fewer fraudulent than legitimate transactions, which biases ML models against the minority class. This thesis presents a differentially private synthetic data generation approach for mobile money transaction datasets to support financial modeling and fraud detection. Developing a synthetic data generation model for tabular data that preserves the intricate, high-order correlations that drive fraud while guaranteeing differential privacy remains notoriously difficult. This challenge stems from calibration fragility in high-dimensional spaces and a parameter search space that expands exponentially, requiring thousands of stochastic runs for model convergence. Existing synthetic data generation methods do not accurately model sparse, event-driven features, while simpler resampling techniques risk leaking private information and struggle to capture evolving fraud tactics in real mobile money ecosystems. This thesis develops synthetic data generation techniques to address these limitations. It introduces MoMTSim, a multi-agent-based simulation model that simulates interactions among clients, merchants, and banks. MoMTSim is calibrated using transaction aggregates derived from a real mobile money transaction dataset.
Its fidelity is assessed using the sum of squared errors, Kolmogorov-Smirnov tests, and visual diagnostics such as Bland-Altman plots and kernel density estimates. The results show a close resemblance to real data, with a total error of 2.0010 at the 100 000-client benchmark. We present MoMTSimDP, a differentially private extension of MoMTSim that applies the Gaussian mechanism and satisfies an (ε, δ) = (1.0, 10^-6) differential privacy guarantee. MoMTSimDP maintains high fidelity, achieving a comparable total error of 2.0070 at 100 000 clients. Inference fidelity analysis shows that ML models trained on MoMTSim and MoMTSimDP data preserve key structural and multivariate relationships. Random forest and XGBoost maintain high feature-importance agreement with real-data models, even under differential privacy. Classification results also show that both models remain resilient, achieving AUCs of at least 0.79. The simulation model is embodied in MoMTLab, a graph-based platform built to enable visual analysis of mobile money transaction patterns. | |
| dc.description.sponsorship | Center for Effective Global Action (CEGA), JPMorgan Chase & Company, Google PhD Fellowship Program, and Makerere University. | |
| dc.identifier.citation | Azamuke, D. (2025). Differentially private synthetic data generation for mobile money fraud detection; Unpublished PhD Thesis, Makerere University, Kampala | |
| dc.identifier.uri | https://makir.mak.ac.ug/handle/10570/16129 | |
| dc.language.iso | en | |
| dc.publisher | Makerere University | |
| dc.title | Differentially private synthetic data generation for mobile money fraud detection | |
| dc.type | Thesis |
Files
Original bundle
- Name: AZAMUKE-CoCIS-PhD-2025.pdf
- Size: 5.3 MB
- Format: Adobe Portable Document Format
- Description: PhD Thesis
License bundle
- Name: license.txt
- Size: 462 B
- Format: Item-specific license agreed upon to submission