Exploring NLP Preprocessing Techniques: Stopwords, Bag of Words, and Word Cloud
Natural Language Processing (NLP) is a fascinating field that bridges the gap between human communication and machine understanding. One of the fundamental steps in NLP is text preprocessing, which transforms raw text data into a format that can be effectively analyzed and utilized by algorithms. In this blog, we’ll delve into three essential NLP preprocessing techniques: stopwords removal, bag of words, and word cloud generation. We’ll explore what each technique is, why it’s used, and how to implement it using Python. Let’s get started!
Stopwords Removal: Filtering Out the Noise
What Are Stopwords?
Stopwords are common words that carry little meaningful information and are often removed from text data during preprocessing. Examples include “the,” “is,” “in,” “and,” etc. Removing stopwords helps in focusing on the more significant words that contribute to the meaning of the text.
Why Remove Stopwords?
Stopwords are removed to:
- Reduce the dimensionality of the text data.
- Improve the efficiency and performance of NLP models.
- Enhance the relevance of features extracted from the text.
Pros and Cons
Pros:
- Simplifies the text data.
- Reduces computational complexity.
- Focuses on meaningful words.
Cons:
- Risk of removing words that may carry context-specific importance.
- Some NLP tasks may require stopwords for better understanding.
Implementation
Let’s see how we can remove stopwords using Python:
import nltk
from nltk.corpus import stopwords
# Download the stopwords dataset
nltk.download('stopwords')
# Sample text
text = "This is a simple example to demonstrate stopword removal in NLP."
# Load the set of English stopwords
stop_words = set(stopwords.words('english'))
# Tokenize the text into individual words
words = text.split()
# Remove stopwords from the text
filtered_text = [word for word in words if word.lower() not in stop_words]
print("Original Text:", text)
print("Filtered Text:", " ".join(filtered_text))
Code Explanation
Importing Libraries:
import nltk
from nltk.corpus import stopwords
We import the nltk library and the stopwords module from nltk.corpus.
Downloading Stopwords:
nltk.download('stopwords')
This line downloads the stopwords dataset from the NLTK library, which includes a list of common stopwords for multiple languages.
Sample Text:
text = "This is a simple example to demonstrate stopword removal in NLP."
We define a sample text that we want to preprocess by removing stopwords.
Loading Stopwords:
stop_words = set(stopwords.words('english'))
We load the set of English stopwords into the variable stop_words.
Tokenizing Text:
words = text.split()
The split() method tokenizes the text into individual words.
Removing Stopwords:
filtered_text = [word for word in words if word.lower() not in stop_words]
We use a list comprehension to filter out stopwords from the tokenized words. The lower() method ensures case insensitivity.
Printing Results:
print("Original Text:", text) print("Filtered Text:", ""). join(filtered_text))
Finally, we print the original text and the filtered text after removing stopwords.
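One caveat: because split() does not separate punctuation, the token "NLP." survives the filter with its period attached. A slightly more robust variant is sketched below using NLTK's word_tokenize; this is an assumption-laden sketch (it presumes the 'punkt' tokenizer models download successfully), not part of the original example:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
nltk.download('punkt')
nltk.download('stopwords')
text = "This is a simple example to demonstrate stopword removal in NLP."
stop_words = set(stopwords.words('english'))
# word_tokenize splits punctuation into separate tokens ('NLP', '.')
words = word_tokenize(text)
# Keep alphabetic tokens that are not stopwords
filtered_text = [word for word in words if word.isalpha() and word.lower() not in stop_words]
print("Filtered Text:", " ".join(filtered_text))
# Expected output: Filtered Text: simple example demonstrate stopword removal NLP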
Bag of Words: Representing Text Data as Vectors
What Is Bag of Words?
The Bag of Words (BoW) model is a technique to represent text data as vectors of word frequencies. Each document is represented as a vector where each dimension corresponds to a unique word in the corpus, and the value indicates the word’s frequency in the document.
Why Use Bag of Words?
Bag of Words is used to:
- Convert text data into numerical format for machine learning algorithms.
- Capture the frequency of words, which can be useful for text classification and clustering tasks.
Pros and Cons
Pros:
- Simple and easy to implement.
- Effective for many text classification tasks.
Cons:
- Ignores word order and context.
- Can result in high-dimensional sparse vectors.
Implementation
Here’s how to implement the Bag of Words model using Python:
from sklearn.feature_extraction.text import CountVectorizer
# Sample documents
documents = [
'This is the first document',
'This document is the second document',
'And this is the third document.',
'Is this the first document?'
]
# Initialize CountVectorizer
vectorizer = CountVectorizer()
# Fit and transform the documents
X = vectorizer.fit_transform(documents)
# Convert the result to an array
X_array = X.toarray()
# Get the feature names
feature_names = vectorizer.get_feature_names_out()
# Print the feature names and the Bag of Words representation
print("Feature Names:", feature_names)
print("Bag of Words:\n", X_array)
Code Explanation
Importing Libraries:
from sklearn.feature_extraction.text import CountVectorizer
We import the CountVectorizer from the sklearn.feature_extraction.text module.
Sample Documents:
documents = ['This is the first document', 'This document is the second document', 'And this is the third document.', 'Is this the first document?']
We define a list of sample documents to be processed.
Initializing CountVectorizer:
vectorizer = CountVectorizer()
We create an instance of CountVectorizer.
Fitting and Transforming:
X = vectorizer.fit_transform(documents)
The fit_transform method fits the vectorizer on the documents and transforms them into a bag-of-words matrix.
Converting to an Array:
X_array = X.toarray()
We convert the sparse matrix result to a dense array for easy viewing.
Getting Feature Names:
feature_names = vectorizer.get_feature_names_out()
The get_feature_names_out method retrieves the unique words identified in the corpus.
Printing Results:
print("Feature Names:", feature_names)
print("Bag of Words:\n", X_array)

Finally, we print the feature names and the bag of words.
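For reference, the output here is deterministic, since CountVectorizer lowercases the text, strips punctuation, and sorts the vocabulary alphabetically. It should look roughly like this (note the 2 in the second row, because "document" appears twice in the second document):
Feature Names: ['and' 'document' 'first' 'is' 'second' 'the' 'third' 'this']
Bag of Words:
[[0 1 1 1 0 1 0 1]
 [0 2 0 1 1 1 0 1]
 [1 1 0 1 0 1 1 1]
 [0 1 1 1 0 1 0 1]]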
Word Cloud: Visualizing Text Data
What Is a Word Cloud?
A word cloud is a visual representation of text data where the size of each word indicates its frequency or importance. It provides an intuitive and appealing way to understand the most prominent words in a text corpus.
Why Use Word Cloud?
Word clouds are used to:
- Quickly grasp the most frequent terms in a text.
- Visually highlight important keywords.
- Present text data in a more engaging format.
Pros and Cons
Pros:
- Easy to interpret and visually appealing.
- Highlights key terms effectively.
Cons:
- Can oversimplify the text data.
- May not be suitable for detailed analysis.
Implementation
Here’s how to create a word cloud using Python:
import numpy as np
import pandas as pd
import requests
from PIL import Image
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
# Load the dataset of Amazon reviews
df = pd.read_csv('/content/AmazonReview.csv')
comment_words = ""
stopwords = set(STOPWORDS)
# Build one long string from all reviews, lowercased
for val in df.Review:
    val = str(val)
    tokens = val.split()
    for i in range(len(tokens)):
        tokens[i] = tokens[i].lower()
    comment_words += " ".join(tokens) + " "
# Download a raster image to use as the word cloud mask
pic = np.array(Image.open(requests.get('https://www.clker.com/cliparts/a/c/3/6/11949855611947336549home14.svg.med.png', stream=True).raw))
# Generate the word cloud
wordcloud = WordCloud(width=800, height=800, background_color='white', stopwords=stopwords, mask=pic, min_font_size=12).generate(comment_words)
# Display the word cloud
plt.figure(figsize=(8, 8), facecolor=None)
plt.imshow(wordcloud)
plt.axis('off')
plt.tight_layout(pad=0)
plt.show()
Code Explanation
Importing Libraries:
import numpy as np
import pandas as pd
import requests
from PIL import Image
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
We import the WordCloud class and the STOPWORDS list from the wordcloud library, matplotlib.pyplot for displaying the word cloud, pandas for loading the review dataset, and numpy, requests, and PIL for preparing the image mask.
Loading and Preparing the Text:
We read the reviews from the CSV file, lowercase every token, and concatenate all the reviews into one long string, comment_words. We also download a raster image and convert it to a NumPy array so it can serve as the mask that shapes the cloud.
Generating the Word Cloud:
wordcloud = WordCloud(width=800, height=800, background_color='white', stopwords=stopwords, mask=pic, min_font_size=12).generate(comment_words)
We create an instance of WordCloud with the specified dimensions, background color, stopword list, and mask, and generate the word cloud from the aggregated review text.
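If you do not have the AmazonReview.csv dataset handy, a minimal sketch like the following is enough to try the technique; the sample string here is a made-up stand-in for the review text:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# A short in-memory sample replaces the CSV for a quick test
sample_text = ("natural language processing text preprocessing stopwords "
               "bag of words word cloud tokenization text analysis text")
wordcloud = WordCloud(width=400, height=400, background_color='white').generate(sample_text)
plt.figure(figsize=(4, 4))
plt.imshow(wordcloud)
plt.axis('off')
plt.show()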

Conclusion
In this blog, we’ve explored three essential NLP preprocessing techniques: stopwords removal, bag of words, and word cloud generation. Each technique serves a unique purpose in the text preprocessing pipeline, contributing to the overall effectiveness of NLP tasks. By understanding and implementing these techniques, we can transform raw text data into meaningful insights and powerful features for machine learning models. Happy coding and exploring the world of NLP!
This brings us to the end of this article. I hope you have understood everything clearly. Make sure you practice as much as possible.
If you wish to check out more resources related to Data Science, Machine Learning and Deep learning, you can refer to my Github account.
You can connect with me on LinkedIn — RAVJOT SINGH.
I hope you like my article. Going forward, you can try other techniques or different parameter values and see how the results change. Please feel free to share your thoughts and ideas.
P.S. Claps and follows are highly appreciated.
Top Free AI Chatbots: The Best Free ChatGPT Alternatives
I’ve tested dozens of AI chatbots since ChatGPT’s debut. Here’s my new top pick

Since the launch of ChatGPT, AI chatbots have been all the rage because of their ability to handle a wide range of tasks in both your personal and work life.
The list details everything you need to know before choosing your next AI assistant, including what it’s best for, pros and cons, cost, its large language model (LLM), and more.
Not only that, most of these tools are free, make great alternatives to ChatGPT, and even outperform it in certain cases.
I have spent weeks and months using almost all of these AI bots, so you don’t have to waste time trying them yourself.
But first, let me share the top tools you can leverage to improve brainstorming and content writing.
Miro is an AI-native app designed to streamline the process of brainstorming, studying, organizing, note-taking and presenting ideas.
Create stunning visual content (mind-maps, flowcharts, presentations, etc.) simply by chatting.
Miro helps convert your notes and structured essays into beautiful mind maps. It can create an easy-to-understand visual presentation from any idea or prompt.
Just enter a prompt, and you get a beautiful chart of your choice from among the 2,500+ free concept map templates. It helps me and my team understand everything faster and more efficiently, and it saves a tonne of time.
I use it to create stunning mind maps, brainstorm visually, and build flowcharts and other presentations from my unorganized notes and ideas, especially for my work and studies.
As someone who enjoys taking notes and jotting down every idea, I find this app has completely revolutionized the way I record them; it is truly a game-changer.
It is another value-for-money tool that is dirt cheap compared to the amazing features it provides. Trust me, you will absolutely fall in love with this app’s simplicity, user experience, and ease of use.
Pricing: Freemium
I strongly recommend it to everyone. Definitely a must-have visual productivity tool in your list.
MIRO is truly your perfect day-to-day visual study/brainstorming/ideation buddy.

One great AI productivity writing tool I recently started using for day-to-day writing tasks such as plagiarism checking, grammar checking, QuillBot Flow, the QuillBot AI Content Detector, paraphrasing, summarising, and translation is QuillBot.
It is a great paraphrasing tool and can easily beat all the AI-content detectors out there.
I wanted to try something similar to, and cheaper than, Grammarly ($12 per month).
I took up its yearly premium for around $4/month (58% off). The price was dirt cheap compared to other writing tools I have used in the past.
I personally love QuillBot Flow, and the whole set of amazing writing tools it offers.
Personally, I find its UI and UX very simple and easy to use, so I just wanted to share this awesome, productive tool with you all. Do check it out and use it in your day-to-day writing tasks.
It is literally a one-stop shop writing productivity tool for everyone.

I really encourage you to try the above tools. Trust me, you won’t regret using them and will thank me later.
Let’s get started and check out these amazing AI bots that are the best alternatives to ChatGPT —
INDEX
- Miro
- Claude
- Taskade
- Perplexity
- Notion
- Jasper
- ChatSonic
1) Miro

MIRO helps convert your notes, ideas, and structured essays into beautiful mind maps. It can create easy-to-understand visual content from any idea or prompt. Create stunning visual content (mind-maps, flowcharts, graphs for data analysis, presentations, etc) simply by chatting.
Pros
- Visual Tools: Excellent for brainstorming, flowcharts, and presentations. One of the best out there in my opinion.
- Templates: Thousands of free concept map templates are available.
- Note-Taking: Revolutionizes note-taking and idea recording.
- Versatility: Ideal for work, research, brainstorming, and study-related projects.
Cons
- None found, to be honest. It’s an excellent tool overall, a great visual content creation tool, and an awesome alternative to ChatGPT.
Try it here — https://miro.com/brainstorming/
2) Claude

Best AI chatbot for image interpretation. I think the biggest advantage of this chatbot is its visual assistance. Even though ChatGPT can accept image and document inputs, I noticed that Claude can assist with interpreting images in a much faster manner.
Pros
- Upload document support
- Chat controls
- Light and dark mode
Cons
- Unclear usage cap
- Knowledge cutoff
Try it here — https://claude.ai/
3) Taskade

All-in-one AI productivity, ideation, writing, coding, mind-mapping, and task/project management app. Free to use with a value-for-money pro plan.
Pros
- Productivity Tool: Comprehensive AI-everything tool for writing and task management.
- AI Prompt Templates: Over 1000 templates for academic and productivity tasks.
- Versatile AI Agents: Research, coding, summarizing, tutoring, and content creation.
- One-Stop Shop: Integrated tool for all writing and productivity needs.
Cons
- Learning Curve: Requires time to explore and utilize all features.
Try it here — https://www.taskade.com/
4) Perplexity

Focused on providing accurate and detailed answers, Perplexity AI is a go-to for research-based queries and in-depth explanations. It is a great tool for research, with very few hallucinations, though free usage is limited.
Pros
- Links to sources
- Access to the internet
- Simple UI
- Provides prompt suggestions to get chats started
Cons
- Paid subscription required for GPT-4 access
- Some irrelevant suggestions
Try it here — https://www.perplexity.ai/
5) Notion

Pros
- All-in-One Tool: Comprehensive productivity and task management in one place.
- AI Integration: Competes with Google Docs and Microsoft Office, enhancing productivity.
- Knowledge Management: The industry leader in combining knowledge management and AI.
- Cost-Effective: Affordable with a wide range of features.
- Top-Ranked: Recognized as a leading productivity tool.
Cons
- Complexity: It may have a steep learning curve due to extensive features.
- Overwhelming: It can be overwhelming for new users to navigate all functionalities.
Try it here — https://www.notion.so/
6) Jasper AI

Jasper offers extensive tools to produce better results. It can check for grammar and plagiarism and write in over 50 templates, including blog posts, Twitter threads, video scripts, and more. It also offers SEO insights and can even remember your brand voice.
Pros
- 50 different writing templates
- Copyediting features
- Plagiarism checker
Cons
- Need a subscription to try
- Steep cost
Try it here — https://www.jasper.ai/
7) ChatSonic

The Writesonic platform offers tools that help generate stories, including Instant Article Writer, which creates an article from a single click; Article Rewriter, which rephrases existing content; and Article Writer 5 & 6, which generates articles using ranking competitors and are SEO optimized.
Pros
- SEO tools — SEO Checker and Optimizer inbuilt
- Integration with Google Search
- Multiple Templates
- Creative Capabilities
- AI Personalities
Cons
- Word Limits
- Image Quality
Try it here — https://writesonic.com/chat
CONCLUSION
I hope you enjoyed reading this blog about some amazing free alternatives to ChatGPT out there, which can help you save some bucks and be super productive.
Do check out these AI chatbots, save this post to your reading list, and bookmark them.
Awesome: you have reached the end and have already become smarter, more effective, and more productive just by learning about these ChatGPT alternatives and tools. The next step is to use them. Good luck!
Here is the cheat sheet; save it and keep it as a reference:
https://medium.com/media/0d0ac6f70de97d8c6f92bad88f4abd20/href
Please take something of value from this blog post and this cheat sheet.
Let’s harness the power of AI and technology to create a better future.
Understanding Tokenization, Stemming, and Lemmatization in NLP
Natural Language Processing (NLP) involves various techniques to handle and analyze human language data. In this blog, we will explore three essential techniques: tokenization, stemming, and lemmatization. These techniques are foundational for many NLP applications, such as text preprocessing, sentiment analysis, and machine translation. Let’s delve into each technique, understand its purpose, pros and cons, and see how they can be implemented using Python’s NLTK library.

1. Tokenization
What is Tokenization?
Tokenization is the process of splitting a text into individual units, called tokens. These tokens can be words, sentences, or subwords. Tokenization helps break down complex text into manageable pieces for further processing and analysis.
Why is Tokenization Used?
Tokenization is the first step in text preprocessing. It transforms raw text into a format that can be analyzed. This process is essential for tasks such as text mining, information retrieval, and text classification.
Pros and Cons of Tokenization
Pros:
- Simplifies text processing by breaking text into smaller units.
- Facilitates further text analysis and NLP tasks.
Cons:
- Can be complex for languages without clear word boundaries.
- May not handle special characters and punctuation well.
Code Implementation
Here is an example of tokenization using the NLTK library:
# Install NLTK library
!pip install nltk
Explanation:
- !pip install nltk: This command installs the NLTK library, which is a powerful toolkit for NLP in Python.
# Sample text
tweet = "Sometimes to understand a word's meaning you need more than a definition. you need to see the word used in a sentence."
Explanation:
- tweet: This is a sample text we will use for tokenization. It contains multiple sentences and words.
# Importing required modules
import nltk
nltk.download('punkt')
Explanation:
- import nltk: This imports the NLTK library.
- nltk.download(‘punkt’): This downloads the ‘punkt’ tokenizer models, which are necessary for tokenization.
from nltk.tokenize import word_tokenize, sent_tokenize
Explanation:
- from nltk.tokenize import word_tokenize, sent_tokenize: This imports the word_tokenize and sent_tokenize functions from the NLTK library for word and sentence tokenization, respectively.
# Word Tokenization
text = "Hello! how are you?"
word_tok = word_tokenize(text)
print(word_tok)
Explanation:
- text: This is a simple sentence we will tokenize into words.
- word_tok = word_tokenize(text): This tokenizes the text into individual words.
- print(word_tok): This prints the list of word tokens. Output: [‘Hello’, ‘!’, ‘how’, ‘are’, ‘you’, ‘?’]
# Sentence Tokenization
sent_tok = sent_tokenize(tweet)
print(sent_tok)
Explanation:
- sent_tok = sent_tokenize(tweet): This tokenizes the tweet into individual sentences.
- print(sent_tok): This prints the list of sentence tokens. Output: [‘Sometimes to understand a word’s meaning you need more than a definition.’, ‘you need to see the word used in a sentence.’]
2. Stemming
What is Stemming?
Stemming is the process of reducing a word to its base or root form. It involves removing suffixes and prefixes from words to derive the stem.
Why is Stemming Used?
Stemming helps in normalizing words to their root form, which is useful in text mining and search engines. It reduces inflectional forms and derivationally related forms of a word to a common base form.
Pros and Cons of Stemming
Pros:
- Reduces the complexity of text by normalizing words.
- Improves the performance of search engines and information retrieval systems.
Cons:
- Can lead to incorrect base forms (e.g., ‘running’ to ‘run’, but ‘flying’ to ‘fli’).
- Different stemming algorithms may produce different results.
Code Implementation
Let’s see how to perform stemming using different algorithms:
Porter Stemmer:
from nltk.stem import PorterStemmer
stemming = PorterStemmer()
word = 'danced'
print(stemming.stem(word))
Explanation:
- from nltk.stem import PorterStemmer: This imports the PorterStemmer class from NLTK.
- stemming = PorterStemmer(): This creates an instance of the PorterStemmer.
- word = ‘danced’: This is the word we want to stem.
- print(stemming.stem(word)): This prints the stemmed form of the word ‘danced’. Output: danc
word = 'replacement'
print(stemming.stem(word))
Explanation:
- word = ‘replacement’: This is another word we want to stem.
- print(stemming.stem(word)): This prints the stemmed form of the word ‘replacement’. Output: replac
word = 'happiness'
print(stemming.stem(word))
Explanation:
- word = ‘happiness’: This is another word we want to stem.
- print(stemming.stem(word)): This prints the stemmed form of the word ‘happiness’. Output: happi
Lancaster Stemmer:
from nltk.stem import LancasterStemmer
stemming1 = LancasterStemmer()
word = 'happily'
print(stemming1.stem(word))
Explanation:
- from nltk.stem import LancasterStemmer: This imports the LancasterStemmer class from NLTK.
- stemming1 = LancasterStemmer(): This creates an instance of the LancasterStemmer.
- word = ‘happily’: This is the word we want to stem.
- print(stemming1.stem(word)): This prints the stemmed form of the word ‘happily’. Output: happy
Regular Expression Stemmer:
from nltk.stem import RegexpStemmer
stemming2 = RegexpStemmer('ing$|s$|e$|able$|ness$', min=3)
word = 'raining'
print(stemming2.stem(word))
Explanation:
- from nltk.stem import RegexpStemmer: This imports the RegexpStemmer class from NLTK.
- stemming2 = RegexpStemmer(‘ing$|s$|e$|able$|ness$’, min=3): This creates an instance of the RegexpStemmer with a regular expression pattern to match suffixes and a minimum stem length of 3 characters.
- word = ‘raining’: This is the word we want to stem.
- print(stemming2.stem(word)): This prints the stemmed form of the word ‘raining’. Output: rain
word = 'flying'
print(stemming2.stem(word))
Explanation:
- word = ‘flying’: This is another word we want to stem.
- print(stemming2.stem(word)): This prints the stemmed form of the word ‘flying’. Output: fly
word = 'happiness'
print(stemming2.stem(word))
Explanation:
- word = ‘happiness’: This is another word we want to stem.
- print(stemming2.stem(word)): This prints the stemmed form of the word ‘happiness’. Output: happi (the regex can only strip the ‘ness’ suffix; it cannot change the remaining ‘i’ back to ‘y’)
Snowball Stemmer:
nltk.download("snowball_data")
from nltk.stem import SnowballStemmer
stemming3 = SnowballStemmer("english")
word = 'happiness'
print(stemming3.stem(word))
Explanation:
- nltk.download(“snowball_data”): This downloads the Snowball stemmer data.
- from nltk.stem import SnowballStemmer: This imports the SnowballStemmer class from NLTK.
- stemming3 = SnowballStemmer(“english”): This creates an instance of the SnowballStemmer for the English language.
- word = ‘happiness’: This is the word we want to stem.
- print(stemming3.stem(word)): This prints the stemmed form of the word ‘happiness’. Output: happi
stemming3 = SnowballStemmer("arabic")
word = 'تحلق'
print(stemming3.stem(word))
Explanation:
- stemming3 = SnowballStemmer(“arabic”): This creates an instance of the SnowballStemmer for the Arabic language.
- word = ‘تحلق’: This is an Arabic word we want to stem.
- print(stemming3.stem(word)): This prints the stemmed form of the word ‘تحلق’. Output: تحل
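To make the earlier point that different stemming algorithms produce different results concrete, here is a small comparison sketch; exact outputs can vary across NLTK versions, with Lancaster generally the most aggressive:
from nltk.stem import PorterStemmer, LancasterStemmer, SnowballStemmer
porter = PorterStemmer()
lancaster = LancasterStemmer()
snowball = SnowballStemmer("english")
# Print each word's stem under all three algorithms side by side
for word in ["running", "happiness", "generously", "replacement"]:
    print(word, "->", porter.stem(word), "|", lancaster.stem(word), "|", snowball.stem(word))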
3. Lemmatization
What is Lemmatization?
Lemmatization is the process of reducing a word to its base or dictionary form, known as a lemma. Unlike stemming, lemmatization considers the context and converts the word to its meaningful base form.
Why is Lemmatization Used?
Lemmatization provides more accurate base forms compared to stemming. It is widely used in text analysis, chatbots, and NLP applications where understanding the context of words is essential.
Pros and Cons of Lemmatization
Pros:
- Produces more accurate base forms by considering the context.
- Useful for tasks requiring semantic understanding.
Cons:
- Requires more computational resources compared to stemming.
- Dependent on language-specific dictionaries.
Code Implementation
Here is how to perform lemmatization using the NLTK library:
# Download necessary data
nltk.download('wordnet')
Explanation:
- nltk.download(‘wordnet’): This command downloads the WordNet corpus, which is used by the WordNetLemmatizer for finding the lemmas of words.
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
Explanation:
- from nltk.stem import WordNetLemmatizer: This imports the WordNetLemmatizer class from NLTK.
- lemmatizer = WordNetLemmatizer(): This creates an instance of the WordNetLemmatizer.
print(lemmatizer.lemmatize('going', pos='v'))
Explanation:
- lemmatizer.lemmatize(‘going’, pos=’v’): This lemmatizes the word ‘going’ with the part of speech (POS) tag ‘v’ (verb). Output: go
# Lemmatizing a list of words with their respective POS tags
words = [("eating", 'v'), ("playing", 'v')]
for word, pos in words:
print(lemmatizer.lemmatize(word, pos=pos))
Explanation:
- words = [(“eating”, ‘v’), (“playing”, ‘v’)]: This is a list of tuples where each tuple contains a word and its corresponding POS tag.
- for word, pos in words: This iterates through each tuple in the list.
- print(lemmatizer.lemmatize(word, pos=pos)): This prints the lemmatized form of each word based on its POS tag. Outputs: eat, play
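In practice, you rarely know the POS tags in advance. A common pattern is to derive them automatically with nltk.pos_tag and map the Penn Treebank tags to WordNet POS constants; the helper below is a sketch of that mapping (it assumes the ‘punkt’ and ‘averaged_perceptron_tagger’ resources download successfully):
import nltk
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
lemmatizer = WordNetLemmatizer()
def wordnet_pos(treebank_tag):
    # Map Penn Treebank tags (JJ*, VB*, RB*, NN*) to WordNet POS constants
    if treebank_tag.startswith('J'):
        return wordnet.ADJ
    if treebank_tag.startswith('V'):
        return wordnet.VERB
    if treebank_tag.startswith('R'):
        return wordnet.ADV
    return wordnet.NOUN  # the lemmatizer's default POS
words = nltk.word_tokenize("The children were eating sweeter apples")
for word, tag in nltk.pos_tag(words):
    print(word, "->", lemmatizer.lemmatize(word, pos=wordnet_pos(tag)))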
Applications in NLP
- Tokenization is used in text preprocessing, sentiment analysis, and language modeling.
- Stemming is useful for search engines, information retrieval, and text mining.
- Lemmatization is essential for chatbots, text classification, and semantic analysis.
Conclusion
Tokenization, stemming, and lemmatization are crucial techniques in NLP. They transform the raw text into a format suitable for analysis and help in understanding the structure and meaning of the text. By applying these techniques, we can enhance the performance of various NLP applications.
Feel free to experiment with the provided code snippets and explore these techniques further. Happy coding!
This brings us to the end of this article. I hope you have understood everything clearly. Make sure you practice as much as possible.
If you wish to check out more resources related to Data Science, Machine Learning and Deep Learning you can refer to my Github account.
You can connect with me on LinkedIn — RAVJOT SINGH.
I hope you like my article. Going forward, you can try other techniques or different parameter values and see how the results change. Please feel free to share your thoughts and ideas.
P.S. Claps and follows are highly appreciated.