Natural Language Processing
552 learners
Collecting and Preparing Textual Data for Classification
Learn how to collect and prepare text datasets for a text classification project. You'll practice gathering and cleaning text data, then move on to further processing techniques: stop word removal, stemming, n-gram generation, part-of-speech tagging, and named entity recognition.
NLTK
Python
Scikit-learn
5 lessons
25 practices
5 hours
Text Data Collection and Preparation
Lessons and practices
Explore More of the 20 Newsgroups Dataset
Uncover the End of 20 Newsgroups Dataset
Fetch Specific Categories from Dataset
Fetching the Third Article from Dataset
Exploring Text Length in Newsgroups Dataset
Update String and Clean Text
Filling in Python Functions and Regex Patterns
Mastering Text Cleaning with Python Regex
Implement Text Cleaning on Dataset
Mastering Text Cleaning with Python Regex on a Dataset
Switch from LancasterStemmer to PorterStemmer
Removing Stop Words and Punctuation from Text
Stemming Words with PorterStemmer
Implementing Stopword Removal and Stemming Function
Cleaning and Processing the First Newsgroup Article
Generating Bigrams and Trigrams with NLP
Generating Bigrams and Trigrams from Text Data
Generating Bigrams and Trigrams from Two Texts
Creating Bigrams from Preprocessed Text Data
Unigrams and Bigrams from Clean 20 Newsgroups Dataset
Changing the Sentence for Named Entity Recognition
Implementing Tokenization and POS Tagging
Applying Named Entity Recognition to a Sentence
Implementing a Named Entity Extraction Function
Applying NER and POS Tagging to Dataset
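As a rough illustration of the regex cleaning, stop word removal, and bigram/trigram practices listed above, here is a minimal sketch using only the Python standard library. It is not the course's own code: the course works with NLTK and Scikit-learn, while the helper names (`clean_text`, `tokenize`, `ngrams`), the tiny `STOP_WORDS` set, and the sample sentence are all hypothetical stand-ins chosen for this example.

```python
import re
from typing import List, Tuple

# Tiny illustrative stop word set (an assumption for this sketch;
# NLTK provides a much fuller English list).
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "on"}


def clean_text(text: str) -> str:
    """Lowercase the text and strip email addresses, URLs, digits, and punctuation."""
    text = text.lower()
    text = re.sub(r"\S+@\S+", " ", text)       # email addresses
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # digits and punctuation
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace


def tokenize(text: str) -> List[str]:
    """Clean the text, split on whitespace, and drop stop words."""
    return [tok for tok in clean_text(text).split() if tok not in STOP_WORDS]


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-token windows as tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))


# Hypothetical sample sentence, not taken from the 20 Newsgroups dataset.
sample = "The quick brown fox jumps over the lazy dog! Mail: fox@example.com"
tokens = tokenize(sample)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(tokens)
print(bigrams[:3])
print(trigrams[:2])
```

In the course itself, steps like these would typically be applied to articles from the 20 Newsgroups dataset, with a stemmer such as NLTK's PorterStemmer run on the tokens before the n-grams are built.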