Non-word error corrector for Luganda using neural networks
Abstract
People in Uganda use electronic devices to keep records, write comments, and send chat messages. Spelling errors occur persistently during writing, which calls for spelling aids that simplify information processing. In this research, a non-word error corrector model for Luganda was developed using neural networks to aid spelling correction. A study of non-word error correction techniques for Luganda and other languages was carried out first. The approach adopted for detection and correction combined word2vec embeddings with a trained neural network model. The model was developed and then trained on a Luganda word dataset of non-word errors mapped to their correct words. Before training, data cleansing, tokenization, and sequence padding were performed on the dataset, which was then split into training, validation, and test sets. The words were arranged in sequences and represented as word vectors before being input to the model. A Recurrent Neural Network with Long Short-Term Memory (RNN-LSTM) was used for model training. Evaluation results showed that the RNN-LSTM achieved correction rates of 99% on the trained dataset and 0.005% on the untrained dataset.
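The preprocessing steps named above (cleansing, tokenization, sequence padding, and the three-way split) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the word pairs are made-up examples and not the paper's dataset, tokenization is done at character level, the 80/10/10 split ratio is arbitrary, and the helper names `clean`, `build_vocab`, `encode`, and `split` are hypothetical.

```python
import random

PAD = 0  # index reserved for padding positions (assumption)

# Illustrative (misspelling, correct word) pairs; not from the paper's dataset.
RAW_PAIRS = [
    (" Webare", "webale"),
    ("omwna", "omwana"),
    ("amazi ", "amazzi"),
    ("ekitbo", "ekitabo"),
    ("omutu", "omuntu"),
]


def clean(word):
    """Cleansing step: trim surrounding whitespace and lowercase."""
    return word.strip().lower()


def build_vocab(words):
    """Map each distinct character to an index starting at 1 (0 is PAD)."""
    chars = sorted({ch for w in words for ch in w})
    return {ch: i for i, ch in enumerate(chars, start=1)}


def encode(word, vocab, max_len):
    """Tokenize a word into character indices and right-pad to max_len."""
    ids = [vocab[ch] for ch in word[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def split(items, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle reproducibly, then cut into train, validation, and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


pairs = [(clean(src), clean(tgt)) for src, tgt in RAW_PAIRS]
vocab = build_vocab([w for pair in pairs for w in pair])
max_len = max(len(w) for pair in pairs for w in pair)
encoded = [(encode(s, vocab, max_len), encode(t, vocab, max_len)) for s, t in pairs]
train_set, val_set, test_set = split(encoded)
```

Each encoded pair is two equal-length integer sequences, which is the shape an embedding layer followed by an LSTM would typically expect. A word2vec-based pipeline would replace the integer indices with learned vectors.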