Corpus of African Digital News from 600 Websites Formatted for Text Mining / Computational Text Analysis [Data set].
This dataset includes a corpus of more than 200,000 news articles published by 600 African news organizations between December 4, 2020 and January 3, 2021. The texts have been pre-processed (punctuation and English stopwords removed; features lowercased, lemmatized, and POS-tagged) and stored in commonly used formats for text mining and computational text analysis. Users are advised to read the documentation for an explanation of the data collection process.
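
For readers unfamiliar with this kind of pre-processing, the following minimal sketch illustrates three of the steps named above (lowercasing, punctuation removal, and English stopword removal) using only the Python standard library. It is not the pipeline used to build this dataset: the function name `preprocess` and the short `STOPWORDS` set are hypothetical, and lemmatization and POS tagging are left out because they require an NLP library and language model.

```python
import string

# Tiny illustrative stopword set (an assumption for this sketch);
# the list actually used for the dataset is not given in this description.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "has", "have", "been", "by"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text):
    """Lowercase the text, strip punctuation, and drop stopwords.

    Returns a list of remaining tokens. Lemmatization and POS tagging
    are intentionally omitted from this sketch.
    """
    lowered = text.lower()
    no_punct = lowered.translate(_PUNCT_TABLE)
    return [tok for tok in no_punct.split() if tok not in STOPWORDS]


if __name__ == "__main__":
    sample = "The Minister has announced new measures, effective in January."
    print(preprocess(sample))
```

Because these steps discard casing, punctuation, and function words, the processed texts suit bag-of-words style analyses better than tasks that depend on the original sentence structure.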