TWEET-BASED SENTIMENT ANALYSIS AND FORECASTING FOR THE COVID-19 PANDEMIC

Authors

Maninder Singh

Keywords: COVID-19; sentiment analysis; machine learning; neural network; natural language processing

Abstract

The novel coronavirus disease (COVID-19) has significantly affected people worldwide. With limited vaccine access and no straightforward, effective therapy available, each nation has implemented precautions against this highly contagious disease. Consequently, individuals are increasingly turning to online social networking platforms (e.g., Facebook, Reddit, LinkedIn, and Twitter) to share their perspectives on COVID-19. This study analyzes user sentiment about COVID-19 using a Twitter dataset. I obtained from Kaggle a dataset of COVID-19-related Twitter posts covering a period of 36 days (25 July to 29 August 2020). The posts were first labeled with one of three sentiment classes (positive, negative, or neutral), and the labeled data were then used to train several machine learning (ML) algorithms to predict user sentiment and concerns regarding COVID-19. Word2Vec and TF-IDF were used for feature extraction. The results indicate that Word2Vec, coupled with a random forest classifier, yielded superior outcomes.
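
To make the TF-IDF feature extraction named in the abstract concrete, the following is a minimal pure-Python sketch. It is not the study's implementation: the abstract does not say which TF-IDF variant or library was used, so the unsmoothed weighting tf × log(N / df), the whitespace tokenizer, and the three toy tweets are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting (not confirmed by the source):
      tf  = term count / number of tokens in the document
      idf = log(N / df), where df counts documents containing the term
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy tweets, not drawn from the study's dataset.
tweets = [
    "vaccine news gives hope",
    "lockdown news is exhausting",
    "hope the lockdown ends soon",
]
vectors = tfidf(tweets)

# Terms unique to one tweet receive more weight than shared terms.
for term, weight in sorted(vectors[0].items()):
    print(f"{term}: {weight:.4f}")
```

In this sketch, a term that occurs in every document gets weight zero, while a term confined to one tweet gets the largest weight. The resulting sparse vectors are the kind of input a classifier such as a random forest would be trained on; that step is not shown here.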

Published

2022-04-30

Issue

Vol. 41 No. 04 (2022)