Differentially private synthetic data generation for mobile money fraud detection

dc.contributor.author Azamuke, Denish
dc.date.accessioned 2026-01-02T12:32:38Z
dc.date.available 2026-01-02T12:32:38Z
dc.date.issued 2025
dc.description A thesis submitted to the Directorate of Graduate Training in fulfillment of the requirements for the award of the Degree of Doctor of Philosophy in Computer Science of Makerere University
dc.description.abstract We live in an era where routine transactions ranging from paying domestic bills to buying groceries are carried out using mobile financial services. However, the rapid growth and uptake of these services has led to amplified security and privacy risks, including SIM swap attacks, identity fraud, data theft, refund fraud, and unauthorized fees. Advances in machine learning (ML) show potential for detecting financial fraud in mobile money transactions, yet this requires access to large volumes of transaction data. Research on mobile money fraud has been hindered by data sensitivity and privacy concerns that restrict access to such datasets. In addition, real mobile money datasets are class-imbalanced, with far fewer frauds than legitimate transactions, biasing ML models against the minority class. This thesis presents a differentially private synthetic data generation approach for mobile money transaction datasets to support financial modeling and fraud detection. Developing a synthetic data generation model for tabular data that preserves the intricate, high-order correlations that drive fraud while guaranteeing differential privacy remains notoriously difficult. This challenge stems from calibration fragility in high-dimensional spaces and a parameter search space that expands exponentially, requiring thousands of stochastic runs for model convergence. Existing synthetic data generation methods do not accurately model sparse, event-driven features, while simpler resampling techniques risk leaking private information and struggle to capture evolving fraud tactics in real mobile money ecosystems. This thesis develops synthetic data generation techniques to investigate these limitations. This study introduces a multi-agent-based simulation model, MoMTSim, which simulates interactions among clients, merchants, and banks. MoMTSim is calibrated using transaction aggregates derived from a real mobile money transaction dataset. 
Its fidelity is assessed using the sum of squared errors, Kolmogorov-Smirnov tests, and visual diagnostics such as Bland-Altman plots and kernel density estimates. The results show a close resemblance to real data, with a total error of 2.0010 at the 100 000-client benchmark. We present MoMTSimDP, a differentially private extension of MoMTSim that applies the Gaussian mechanism and satisfies an (ε, δ)-differential privacy guarantee with ε = 1.0 and δ = 10^-6. MoMTSimDP maintains high fidelity, achieving a comparable total error of 2.0070 at 100 000 clients. Inference fidelity analysis shows that ML models trained on MoMTSim and MoMTSimDP data preserve key structural and multivariate relationships. Random forest and XGBoost maintain high feature-importance agreement with real-data models, even under differential privacy. Classification results also show that both models remain resilient, achieving AUCs of at least 0.79. The simulation model is embodied in MoMTLab, a graph-based platform built to enable visual analysis of mobile money transaction patterns.
dc.description.sponsorship Center for Effective Global Action (CEGA), JPMorgan Chase & Company, Google PhD Fellowship Program, and Makerere University.
dc.identifier.citation Azamuke, D. (2025). Differentially private synthetic data generation for mobile money fraud detection; Unpublished PhD Thesis, Makerere University, Kampala
dc.identifier.uri https://makir.mak.ac.ug/handle/10570/16129
dc.language.iso en
dc.publisher Makerere University
dc.title Differentially private synthetic data generation for mobile money fraud detection
dc.type Thesis
Files
Original bundle
Name: AZAMUKE-CoCIS-PhD-2025.pdf
Size: 5.3 MB
Format: Adobe Portable Document Format
Description: PhD Thesis