Whole genome assembly and functional significance of nonsynonymous genetic variants of Mycobacterium tuberculosis Uganda genotype isolated from Kampala, Uganda.
Abstract
Background: The Mycobacterium tuberculosis complex (MTC) was initially regarded as a highly homogeneous population; however, recent data suggest that the causative agents of tuberculosis are more genetically and functionally diverse than previously appreciated. This genetic diversity may be responsible for phenotypic differences in drug susceptibility, pathogenicity, host tropism, transmissibility, immune response and geographical distribution of the strains. In light of this, the MTC is classified into seven globally distributed lineages, some of which predominate in particular geographical regions. In Uganda, the M. tuberculosis Uganda genotype has been reported to be the most predominant strain. It comprises the M. tuberculosis Uganda I and Uganda II subfamilies, with the latter being more prevalent. However, little is known about its actual genetic diversity, which may have implications for diagnostics and vaccines.
Materials and Methods: In this cross-sectional study, bacterial genomic DNA was extracted and libraries for Illumina sequencing were prepared using the Nextera XT DNA Library Preparation Kit. FASTQ files were quality-checked using FastQC, and KvarQ was used to screen for the M. tuberculosis Uganda genotype using the gyrA T80A SNP. Using customized bash scripts, we then differentiated the two subfamilies based on the presence of the G>A SNP at H37Rv chromosomal position 874787 for UG-I and the G>C SNP in Rv1332 for UG-II. Genome assembly was done using Unicycler (a SPAdes optimiser), followed by functional annotation using BASys. The FASTQ reads were also aligned to the indexed reference to enable variant calling. In addition, the top statistically significant nonsynonymous variants were analyzed further using SIFT for potential functional consequences.
Results: Functional annotation of the draft genomes revealed a total of 4,583 genes and a sequence length of 4,262,180 base pairs for M. tuberculosis Uganda genotype I, while M. tuberculosis Uganda genotype II had a total of 4,701 genes and a sequence length of 4,353,558 base pairs. Most notably, variant analysis revealed a novel nonsynonymous mutation in the eccD1 gene of M. tuberculosis Uganda genotype I that was deleterious, and a novel nonsynonymous SNP in the adenylate cyclase gene that was predicted to be tolerated by SIFT.
Conclusions: The novel nonsynonymous SNP in the adenylate cyclase gene was found exclusively in Uganda genotype II strains, occurring in 97% of them. It may therefore offer a potential genetic marker for genotyping Uganda genotype II.
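The SNP-based screening described in the Materials and Methods can be illustrated with a minimal Python sketch. It is not the authors' bash pipeline, only one possible way to express the same decision logic. It assumes variant calls for each isolate are already available as a mapping from H37Rv coordinate to a (reference, alternate) base pair. Only the UG-I marker (position 874787, G>A) is taken from the text. The coordinates used for the gyrA T80A and Rv1332 markers, and the A>G base change used for gyrA, are hypothetical placeholders that a user would have to replace, and the names `Marker`, `has_marker` and `classify_isolate` are illustrative.

```python
from typing import Dict, NamedTuple, Tuple


class Marker(NamedTuple):
    """A single-nucleotide marker on the H37Rv reference."""
    position: int  # 1-based H37Rv coordinate
    ref: str       # reference base
    alt: str       # alternate base that defines the genotype


# Variant calls for one isolate: H37Rv coordinate -> (ref base, alt base)
Calls = Dict[int, Tuple[str, str]]


def has_marker(calls: Calls, marker: Marker) -> bool:
    """Return True if the isolate carries exactly this ref>alt change."""
    return calls.get(marker.position) == (marker.ref, marker.alt)


def classify_isolate(calls: Calls, uganda: Marker,
                     ug1: Marker, ug2: Marker) -> str:
    """Two-step screen: Uganda genotype first, then subfamily."""
    if not has_marker(calls, uganda):
        return "non-Uganda"
    in_ug1 = has_marker(calls, ug1)
    in_ug2 = has_marker(calls, ug2)
    if in_ug1 and not in_ug2:
        return "UG-I"
    if in_ug2 and not in_ug1:
        return "UG-II"
    # Neither or both subfamily markers present: leave unresolved.
    return "Uganda (unresolved)"


# UG-I marker as stated in the text.
UG1_MARKER = Marker(position=874787, ref="G", alt="A")
# HYPOTHETICAL placeholders: the text gives no coordinates for these markers,
# and the gyrA A>G base change is likewise assumed.
GYRA_T80A_MARKER = Marker(position=1, ref="A", alt="G")
UG2_MARKER = Marker(position=2, ref="G", alt="C")

example_calls: Calls = {1: ("A", "G"), 874787: ("G", "A")}
print(classify_isolate(example_calls, GYRA_T80A_MARKER, UG1_MARKER, UG2_MARKER))
```

Returning an explicit unresolved label when neither or both subfamily markers are present keeps ambiguous isolates visible instead of forcing them into one subfamily.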