CORD-19 Research Dataset Challenge (Kaggle).

Project overview and inspiration.

The main inspiration for this project came from one of my modules at the University of Hull on Data Mining, together with a wish to stretch my understanding of Natural Language Processing using machine learning whilst taking part in solving a real-world problem. The main goal of the project is to extract relevant information about the coronavirus from the research papers in the dataset hosted on the Kaggle website. Several data cleaning techniques will be applied, along with toolkits such as NLTK. The project goals are broken down into several tasks, as stated on the Kaggle website found here. I picked 3 of these tasks to carry out, which are as follows:

  • Range of incubation periods of the disease in humans across age/health status and how long individuals stay contagious even after recovery.
  • Prevalence of transmission.
  • Seasonality of transmission.
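
As a rough illustration of the cleaning step and the three tasks above, here is a minimal sketch that uses only the Python standard library in place of NLTK. The stopword list, the keyword sets, the sample abstract and the function names `clean_text` and `match_tasks` are all assumptions made for illustration; they are not taken from the project's code or from the dataset.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real run would likely
# use a fuller list such as the one that ships with NLTK.
STOPWORDS = {"the", "of", "in", "a", "an", "and", "is", "to", "for",
             "was", "were", "on", "with"}

# Hypothetical keyword sets, one per chosen task.
TASK_KEYWORDS = {
    "incubation": {"incubation", "contagious", "recovery"},
    "prevalence": {"prevalence", "asymptomatic"},
    "seasonality": {"seasonality", "seasonal", "temperature", "humidity"},
}


def clean_text(text):
    """Lowercase the text, keep alphabetic tokens only, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def match_tasks(text):
    """Return the sorted names of tasks whose keywords appear in the text."""
    tokens = set(clean_text(text))
    return sorted(name for name, keywords in TASK_KEYWORDS.items()
                  if tokens & keywords)


# Made-up sentence standing in for a paper abstract.
abstract = ("The median incubation period was 5.1 days, "
            "and seasonal humidity may affect spread.")
print(clean_text(abstract))
print(match_tasks(abstract))
```

Plain keyword matching like this would only give a first pass over the papers; stemming or lemmatisation (both available in NLTK) would probably be needed so that variants such as "incubating" are also caught.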

How it operates.

This project is currently underway, and I will post an update as soon as it is complete! The progress of the project can be found here.