School of Computing and Informatics Technology (CIT) Collection

Recent Submissions

  • Item
    Differentially private synthetic data generation for mobile money fraud detection
    (Makerere University, 2025) Azamuke, Denish
    We live in an era where routine transactions ranging from paying domestic bills to buying groceries are carried out using mobile financial services. However, the rapid growth and uptake of these services has led to amplified security and privacy risks, including SIM swap attacks, identity fraud, data theft, refund fraud, and unauthorized fees. Advances in machine learning (ML) show potential for detecting financial fraud in mobile money transactions, yet this requires access to large volumes of transaction data. Research on mobile money fraud has been hindered by data sensitivity and privacy concerns that restrict access to such datasets. In addition, real mobile money datasets are class-imbalanced, with far fewer frauds than legitimate transactions, biasing ML models against the minority class. This thesis presents a differentially private synthetic data generation approach for mobile money transaction datasets to support financial modeling and fraud detection. Developing a synthetic data generation model for tabular data that preserves the intricate, high-order correlations that drive fraud while guaranteeing differential privacy remains notoriously difficult. This challenge stems from calibration fragility in high-dimensional spaces and a parameter search space that expands exponentially, requiring thousands of stochastic runs for model convergence. Existing synthetic data generation methods do not accurately model sparse, event-driven features, while simpler resampling techniques risk leaking private information and struggle to capture evolving fraud tactics in real mobile money ecosystems. This thesis develops synthetic data generation techniques to investigate these limitations. This study introduces a multi-agent-based simulation model, MoMTSim, which simulates interactions among clients, merchants, and banks. MoMTSim is calibrated using transaction aggregates derived from a real mobile money transaction dataset. 
    Its fidelity is assessed using the sum of squared errors, Kolmogorov-Smirnov tests, and visual diagnostics such as Bland-Altman plots and kernel density estimates. The results show a close resemblance to real data, with a total error of 2.0010 at the 100 000-client benchmark. We also present MoMTSimDP, a differentially private extension of MoMTSim that applies the Gaussian mechanism and satisfies an (ε, δ) = (1.0, 10^-6) differential privacy guarantee. MoMTSimDP maintains high fidelity, achieving a comparable total error of 2.0070 at 100 000 clients. Inference fidelity analysis shows that ML models trained on MoMTSim and MoMTSimDP data preserve key structural and multivariate relationships. Random forest and XGBoost maintain high feature-importance agreement with real-data models, even under differential privacy. Classification results also show that both models remain resilient, achieving AUCs of at least 0.79. The simulation model is embodied in MoMTLab, a graph-based platform built to enable visual analysis of mobile money transaction patterns.
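To make the Gaussian mechanism mentioned in the abstract above concrete, the following is a minimal sketch of how noise could be added to transaction aggregates. It is not MoMTSimDP's implementation: the function names, the sensitivity value, and the sample aggregates are all assumptions for illustration. It uses the classical calibration sigma = sensitivity * sqrt(2 ln(1.25/δ)) / ε, which is stated for 0 < ε < 1; the abstract's ε = 1.0 sits at the edge of that range, so the demo uses ε = 0.5, and the thesis may well rely on a different (for example, analytic) calibration.

```python
import math
import random


def gaussian_sigma(sensitivity: float, epsilon: float, delta: float) -> float:
    """Classical Gaussian-mechanism noise scale (bound stated for 0 < epsilon < 1)."""
    if sensitivity <= 0 or epsilon <= 0 or not 0 < delta < 1:
        raise ValueError("need sensitivity > 0, epsilon > 0 and 0 < delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def privatize(aggregates, sensitivity, epsilon, delta, rng=None):
    """Add independent zero-mean Gaussian noise to each aggregate value."""
    rng = rng or random.Random()
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    return [value + rng.gauss(0.0, sigma) for value in aggregates]


if __name__ == "__main__":
    # Hypothetical per-category transaction counts; NOT data from the thesis.
    counts = [1200.0, 340.0, 58.0]
    # Assumed sensitivity of 1 (one client changes each count by at most 1).
    noisy = privatize(counts, sensitivity=1.0, epsilon=0.5, delta=1e-6,
                      rng=random.Random(42))
    print("sigma:", gaussian_sigma(1.0, 0.5, 1e-6))
    print("noisy counts:", noisy)
```

A smaller ε or δ yields a larger sigma, which is the privacy/fidelity trade-off the abstract reports through the two total-error figures.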
  • Item
    Classification of waste using region-based convolutional neural network
    (Makerere University, 2025) Rwothomio, Innocent Kercan
    The rapid increase in municipal solid waste poses significant environmental and public health challenges, particularly in rapidly urbanizing regions such as Kampala, Uganda. Traditional manual classification methods remain inefficient, error-prone, and costly, leading to low recycling rates and over-reliance on landfills. This study addresses these challenges by developing and evaluating a machine learning framework for automated waste classification. A secondary dataset of waste images was preprocessed and subjected to a two-phase training pipeline, integrating convolutional feature extraction with optimized classifiers. To ensure fairness and robustness, the study implemented bias mitigation techniques such as the Synthetic Minority Oversampling Technique (SMOTE) and applied hyperparameter optimization to improve model generalization. Benchmark comparisons with alternative architectures (VGG16, ResNet50, EfficientNet-B0) and classifiers (Support Vector Machines, Random Forests, Neural Networks) were conducted. Results demonstrate that the optimized Support Vector Machine achieved the best classification accuracy at 99.55%, outperforming other models across accuracy, F1-score, and real-world validation. The system was deployed through an interactive Streamlit interface, providing real-time prediction, visual performance analysis, and production-ready usability. These findings confirm the viability of machine learning-driven waste classification in resource-constrained contexts, offering a scalable solution to improve recycling efficiency, reduce operational costs, and support Uganda's transition toward sustainable waste management practices.
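The SMOTE step named in the abstract above balances classes by interpolating between minority-class samples. The sketch below shows only that core idea in plain Python; it is an illustration under assumptions, not the study's pipeline, which presumably used an existing library implementation on extracted image features. The function name `smote_like` and the sample feature vectors are hypothetical.

```python
import math
import random


def smote_like(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points by interpolating toward near minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = rng or random.Random()
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors for an under-represented waste class.
    minority = [[0.10, 0.90], [0.20, 0.80], [0.15, 0.95], [0.30, 0.70]]
    for point in smote_like(minority, n_new=3, rng=random.Random(7)):
        print(point)
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies.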
  • Item
    A model for streamlining and systemising the management of data in KCCA Primary Schools
    (Makerere University, 2025) Nabutto, Josephine
    Public primary schools in Kampala face a significant challenge of fragmented and inconsistent data management practices. This challenge undermines effective planning, service delivery, and policy implementation, and limits progress toward achieving the Sustainable Development Goals, particularly inclusive and equitable quality education. The study aimed to develop a data management model to streamline and systemise planning processes and enhance service delivery in Kampala’s public primary schools. The objective was to address inefficiencies arising from varying data management practices across schools and to propose a coherent system that supports reliable decision-making. A mixed-methods research design was adopted, combining surveys and interviews with teachers, headteachers, and officials from the Ministry of Education and the Kampala Capital City Authority (KCCA). This approach enabled a comprehensive understanding of existing data management practices, challenges, and stakeholder requirements for a model for streamlining and systemising the management of data. Findings revealed significant barriers to effective data management, including low ICT literacy, inadequate infrastructure, insufficient training, limited financing, and weak policy enforcement at the school level. In response, the study developed a three-tier data management model comprising People, Process, and Technology. The People tier emphasises stakeholder roles, collaboration, and trust; the Process tier introduces a structured workflow for planning, data collection, analysis, sharing, and archiving; and the Technology tier focuses on enabling infrastructure, policy frameworks, and data security. Validation by education stakeholders confirmed the model’s clarity, relevance, and potential impact, while noting challenges related to resources and capacity. Overall, the study demonstrates that streamlined and systemised data management is both necessary and feasible for KCCA primary schools and provides a practical foundation for improving educational planning, policy execution, and outcomes, with potential for replication across Uganda’s education sector.
  • Item
    Application of weak supervision in breast cancer detection using ultrasound images
    (Makerere University, 2025) Tibingana, Winfred
    Breast cancer, the most frequently diagnosed cancer and the fifth leading cause of cancer-related deaths worldwide, underscores the importance of early detection for improved survival rates and treatment outcomes. Mammography, though highly regarded, faces limitations in accessibility, particularly in resource-limited settings and for younger patients with dense breast tissue. Ultrasound imaging emerges as a promising alternative due to its accessibility, cost-effectiveness, and real-time imaging capabilities. However, the manual interpretation of ultrasound images is challenging and prone to errors, highlighting the need for automated methods. This research explored the application of weak supervision for the detection and classification of breast cancer using breast ultrasound images, comparing fully supervised (FS) and weakly supervised (WS) learning approaches. Models such as YOLOv8, ResNet50, MobileNet, and VGG19 were trained and evaluated on public and local datasets. The weakly supervised models yielded competitive and commendable results, nearly matching those of the fully supervised models. The ResNet50 model emerged as the top performer for both the FS and WS classification tasks. The fully supervised ResNet50 model achieved an AUC score of 64%, precision of 66.67%, and recall of 57.1%. The weakly supervised ResNet50 model achieved an AUC score of 56%, precision of 58.1%, and recall of 51.4%. The fully supervised model thus scored 8 percentage points higher in AUC, 8.6 higher in precision, and 5.74 higher in recall, but the performance differences were not statistically significant. These weakly supervised performance metrics for the ResNet50 model outpaced those of other weakly supervised models and were either comparable to or marginally different from the fully supervised results of YOLOv8, MobileNet, and VGG19. This highlighted weak supervision as a viable alternative when exhaustive annotations are impractical. This research also incorporated advanced visualization techniques like Grad-CAM to interpret model predictions, enhancing the understanding of model decision-making processes. Key limitations included challenges in parameter optimization due to computational resource constraints and the insufficient exploration of alternative weak supervision techniques. These approaches aimed to improve the robustness, accuracy, and applicability of deep learning models in breast cancer detection and classification.
  • Item
    Explainable machine learning for antimalarial activity prediction in drug discovery
    (Makerere University, 2025) Namulinda, Hellen
    Malaria remains a significant global health burden, causing substantial morbidity and mortality, particularly in tropical and subtropical regions. While effective antimalarial drugs exist, such as quinine, chloroquine, antifolates and artemisinin, the emergence of drug-resistant strains of Plasmodium falciparum emphasises the need for ongoing drug discovery efforts. One of the primary challenges in drug discovery is the high failure rate, with over 90% of candidate drugs failing to reach clinical trials. To address these challenges, the pharmaceutical industry and research institutions have explored alternative approaches to drug discovery, including artificial intelligence (AI) and machine learning (ML) techniques. Despite the growing array of ML methods for drug discovery, these techniques often demand expertise. Furthermore, there is a limited exploration into the rationale behind predictions, which is essential for understanding why a specific compound shows potential as an antimalarial agent. Understanding the types of molecular representations and relationships between chemical structure and activity prediction is necessary for researchers to refine molecules and design more effective drugs. This dissertation explored the application of ML models for predicting antimalarial activity in chemical compounds, with a focus on enhancing the interpretability of these models through Explainable Artificial Intelligence (XAI) techniques. A key contribution of this work is the development of the XAI4Chem tool, which integrates interpretability into the ML workflow for cheminformatics, allowing researchers to better understand the factors influencing predictions. Using data from the ChEMBL database, models were trained on molecular descriptors (RDKit, Datamol, and Mordred) and fingerprints (RDKit and Morgan) to predict the percentage inhibition and classify compounds as active or inactive. 
    Models trained on RDKit descriptors with 64 selected features achieved higher performance in regression (R² of 0.563), outperforming Morgan fingerprints (R² of 0.5012). Both RDKit descriptors and Morgan fingerprints achieved 97% test accuracy in classification. SHAP (SHapley Additive exPlanations) value analysis identified key molecular features such as the compound’s lipophilicity (MolLogP), polar surface area (TPSA), number of amide functional groups (fr_amide), and the estimated drug-likeness (QED) as significant drivers of predictions.
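SHAP attributions of the kind described above rest on Shapley values: a feature's attribution is its average marginal contribution over all orders in which features can be switched from a baseline to the instance's values. The sketch below computes these exactly for a tiny toy model. Everything in it is an assumption for illustration: `toy_model`, its coefficients, and the descriptor values are hypothetical and unrelated to the dissertation's trained models or to XAI4Chem, and practical SHAP tooling approximates this computation rather than enumerating every ordering.

```python
from itertools import permutations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values: mean marginal contribution over all feature orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)  # start every ordering from the baseline input
        previous = model(current)
        for name in order:
            current[name] = instance[name]  # switch one feature to its real value
            value = model(current)
            totals[name] += value - previous
            previous = value
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in totals.items()}


def toy_model(x):
    """Hypothetical activity score with one interaction term; NOT a fitted model."""
    return 2.0 * x["MolLogP"] - 0.1 * x["TPSA"] + 0.5 * x["MolLogP"] * x["fr_amide"]


if __name__ == "__main__":
    # Made-up descriptor values for one compound and for a reference compound.
    compound = {"MolLogP": 3.0, "TPSA": 40.0, "fr_amide": 2.0}
    reference = {"MolLogP": 1.0, "TPSA": 60.0, "fr_amide": 0.0}
    for name, phi in shapley_values(toy_model, compound, reference).items():
        print(f"{name}: {phi:+.3f}")
```

The attributions always sum to the difference between the model's output for the compound and for the reference, which is the property that lets per-feature contributions be read as a decomposition of a single prediction.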