Today, COVID-19 information provided by organizations can influence the decisions and actions of groups and individuals. While reported COVID-19 data is assumed to occur randomly, many countries have been questioned on the validity and legitimacy of their reports. By applying Benford's Law alongside multivariate regressions, extensive COVID-19 datasets were analyzed for manipulation. Benford's Law gives the expected probabilities of leading digits in large, naturally occurring datasets; comparing the observed leading-digit frequencies against these probabilities with chi-square tests indicates whether the data occurs randomly. A Python script was written to calculate the Benford's Law values and to build machine learning models and multivariate regressions. COVID-19 case and death data from WHO and JHU were investigated for over 195 countries. The Benford's Law analysis indicated large improvements in COVID-19 reporting from the previous year for a majority of countries. A few countries, such as Russia and Belarus, produced significant results, with p-values under 0.05 indicating a significant chance of manipulation. On the other end, over 75% of countries reported p-values greater than 0.90 for both cases and deaths, demonstrating a low probability of manipulation in their reports. For the linear and multivariate regressions, factors including population, density, age distribution, HDI, vaccination percentage, and more were plotted against the p-values. The regressions found correlations and trends between reliability and these factors, with age distribution and vaccination records being examples, whether considered alone or together. The methods of this project can be applied to past and future outbreaks such as influenza or Ebola, extended to examine election data, or used as evidence in court and to validate information worldwide.
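The leading-digit comparison described above can be illustrated with a minimal sketch. It is not the project's actual script. It assumes that each country's input is a list of positive counts, that only the first digit is tested, and that the chi-square test uses 8 degrees of freedom (nine digit classes minus one). The function names `leading_digit` and `benford_chi_square` are hypothetical. Benford's Law gives the expected probability of leading digit d as log10(1 + 1/d).

```python
import math
from collections import Counter

# Expected Benford probabilities for leading digits 1..9: log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(n):
    """Return the first digit of a positive integer."""
    return int(str(int(n))[0])


def benford_chi_square(counts):
    """Chi-square goodness-of-fit of leading digits against Benford's Law.

    Returns (chi2_statistic, p_value). Zero and negative entries are
    skipped because they have no meaningful leading digit (an assumption
    of this sketch, not a rule taken from the paper).
    """
    values = [int(v) for v in counts if int(v) > 0]
    n = len(values)
    if n == 0:
        raise ValueError("no positive values to test")

    observed = Counter(leading_digit(v) for v in values)

    chi2 = 0.0
    for d, p in BENFORD.items():
        expected = n * p
        chi2 += (observed.get(d, 0) - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 8 degrees of
    # freedom. For even df = 2m it has the closed form
    # exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!, here with m = 4.
    half = chi2 / 2.0
    p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(4))
    return chi2, p_value


if __name__ == "__main__":
    # Synthetic illustration only: powers of 2 are known to follow
    # Benford's Law closely, while evenly spread values do not.
    print(benford_chi_square([2 ** k for k in range(1, 1001)]))
    print(benford_chi_square(list(range(100, 1000))))
```

Under the reading used in this abstract, a p-value below 0.05 flags a dataset whose leading digits deviate significantly from Benford's Law, while a p-value near 1 indicates close agreement. The synthetic inputs in the usage lines are illustrative and are not drawn from the WHO or JHU data.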
Research Paper by Abhishek Tripathi
#Benford’sLaw #Manipulation #Data #COVID-19 #Countries #Validity #Regression #CDC