SNMP alert & prediction system for bandwidth utilization of critical data links
Abstract
The steady growth of internet usage in Uganda has led to a healthy, competitive internet business market, which in turn has driven continuous improvement in the reliability and uptime of the service. Service providers have set up redundant links and highly available, fault-tolerant systems to mitigate outages and degradation of service. End-to-end monitoring of various parameters, particularly bandwidth utilization, has become critical to detecting and mitigating the faults that cause such degradation or outages. In this project we address the problem of anomaly detection in bandwidth utilization, since an anomaly is often indicative of a fault that needs to be addressed immediately. Traditional monitoring tools do not fully solve this problem because they rely on thresholds to determine a fault: an upper and a lower bound are set, and an alert is triggered when utilization rises above the upper bound or falls below the lower bound. This approach is not exhaustive, because anomalies can also occur within the bounds of those thresholds. As an alternative, a machine learning approach is employed: datasets of bandwidth utilization are collected from production devices such as Internet edge switches and routers and used to train a model for anomaly detection. The main problem encountered was low accuracy in the form of false positives. Multiple iterations were carried out, with various optimization techniques applied to the model, which greatly improved the results.
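The limitation of fixed thresholds described above can be illustrated with a minimal sketch. It is not the model used in this project: a simple rolling z-score stands in for the learned detector, and the utilization values, the 10%/90% bounds, the window size, and the z-score limit are all made-up assumptions chosen for illustration.

```python
from statistics import mean, stdev

def threshold_alerts(samples, lower, upper):
    """Return indices whose utilization falls outside the fixed [lower, upper] band."""
    return [i for i, x in enumerate(samples) if x < lower or x > upper]

def zscore_anomalies(samples, window=6, z_limit=3.0):
    """Return indices whose value deviates strongly from the preceding `window` samples.

    Illustrative stand-in for a trained anomaly detector, not the project's model.
    """
    flagged = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            continue  # flat history: skip to avoid division by zero
        if abs(samples[i] - mu) / sigma > z_limit:
            flagged.append(i)
    return flagged

# Hypothetical link utilization in percent, one sample per polling interval.
# The dip to 21% stays inside the 10-90% band but breaks the recent pattern.
utilization = [52, 54, 53, 55, 54, 53, 54, 21, 53, 54, 55, 53]

print(threshold_alerts(utilization, lower=10, upper=90))  # no sample leaves the band
print(zscore_anomalies(utilization))                      # the dip at index 7 is flagged
```

In this toy series the threshold check raises no alert, while the deviation-based check flags the sudden drop, which is the kind of in-bounds anomaly the abstract refers to.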