Sentiment Analysis of App Reviews: A Comparison of BERT, spaCy, TextBlob, and NLTK

BERT vs spaCy vs TextBlob vs NLTK in Sentiment Analysis for App Reviews
Sentiment analysis is the process of identifying and extracting opinions or emotions from text. It is a widely used technique in natural language processing (NLP) with applications in a variety of domains, including customer feedback analysis, social media monitoring, and market research.
There are a number of different NLP libraries and tools that can be used for sentiment analysis, including BERT, spaCy, TextBlob, and NLTK. Each of these libraries has its own strengths and weaknesses, and the best choice for a particular task will depend on a number of factors, such as the size and complexity of the dataset, the desired level of accuracy, and the available computational resources.
In this post, we will compare and contrast the four NLP libraries mentioned above in terms of their performance on sentiment analysis for app reviews.
BERT (Bidirectional Encoder Representations from Transformers)
BERT is a pre-trained language model that has proven very effective for a wide range of NLP tasks, including sentiment analysis. It is a deep transformer model pre-trained on a massive corpus of unlabeled text, which allows it to learn the contextual relationships between words and phrases that accurate sentiment analysis depends on. For a specific task such as classifying app reviews, the pre-trained model is then fine-tuned on labeled examples.
BERT-based models have been shown to outperform lexicon-based and classical machine learning approaches on standard sentiment benchmarks such as the Stanford Sentiment Treebank (SST-2) and IMDb movie reviews. However, BERT is also the most computationally expensive of the four libraries discussed in this post, and fine-tuning or large-scale inference typically calls for a GPU.
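As a concrete illustration (not part of the benchmarks mentioned above), a BERT-style classifier can be applied to app reviews through the Hugging Face transformers pipeline. The checkpoint below is the pipeline's default DistilBERT model fine-tuned on SST-2, and the review strings are invented; a minimal sketch:

```python
# Minimal sketch: scoring app reviews with a pre-trained BERT-style
# sentiment model via the Hugging Face `transformers` pipeline.
# Any fine-tuned BERT checkpoint could be substituted for the model below.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

reviews = [
    "Love the new update, the app feels much faster now.",
    "Keeps crashing on startup, completely unusable.",
]

for review, result in zip(reviews, classifier(reviews)):
    # Each result is a dict like {"label": "POSITIVE", "score": 0.99}.
    print(f"{result['label']:8s} {result['score']:.3f}  {review}")
```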
spaCy
spaCy is a general-purpose NLP library that provides a wide range of features, including tokenization, lemmatization, part-of-speech tagging, named entity recognition, and text classification, which is how sentiment analysis is handled. spaCy is also fast and memory-efficient, making it a good choice for tasks where performance and scalability matter.
spaCy does not ship a pre-trained sentiment model. Instead, you either train its TextCategorizer component on a labeled dataset (for example, app reviews annotated as positive or negative) or add a community extension such as spacytextblob to the pipeline. A TextCategorizer trained on in-domain app reviews can reach strong accuracy while remaining fast at inference time.
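As a minimal sketch of that workflow (the toy reviews, labels, and number of epochs below are invented for illustration; a real model needs a much larger annotated dataset and proper evaluation):

```python
# Minimal sketch: training spaCy's TextCategorizer on toy labeled reviews.
# The example texts and labels are invented; real training needs a much
# larger annotated dataset and a train/dev split.
import random
import spacy
from spacy.training import Example

train_data = [
    ("Great app, works perfectly", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("Terrible, crashes every time", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
    ("Really useful and easy to use", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("Waste of money, full of ads", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
]

nlp = spacy.blank("en")
textcat = nlp.add_pipe("textcat")
textcat.add_label("POSITIVE")
textcat.add_label("NEGATIVE")

optimizer = nlp.initialize()
for _ in range(20):
    random.shuffle(train_data)
    for text, annotations in train_data:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        nlp.update([example], sgd=optimizer)

doc = nlp("The latest version is fantastic")
print(doc.cats)  # e.g. {"POSITIVE": 0.97, "NEGATIVE": 0.03}
```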
TextBlob
TextBlob is a Python library for NLP that provides a variety of features, including tokenization, part-of-speech tagging, noun phrase extraction, lemmatization, and sentiment analysis. TextBlob is also very easy to use, making it a good choice for beginners and non-experts.
TextBlob’s sentiment analysis model is based on a simple lexicon-based approach. This means that TextBlob uses a dictionary of words and phrases that are associated with positive and negative sentiment to identify the sentiment of a piece of text.
TextBlob’s sentiment analysis model is not as accurate as the models offered by BERT and spaCy, but it is much faster and easier to use.
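A minimal sketch of TextBlob's lexicon-based API (the review strings are invented for illustration):

```python
# Minimal sketch: lexicon-based sentiment scoring with TextBlob.
# polarity ranges from -1.0 (negative) to 1.0 (positive);
# subjectivity ranges from 0.0 (objective) to 1.0 (subjective).
from textblob import TextBlob

reviews = [
    "This app is wonderful and very easy to use.",
    "Horrible update, nothing works anymore.",
]

for review in reviews:
    sentiment = TextBlob(review).sentiment
    print(f"polarity={sentiment.polarity:+.2f} "
          f"subjectivity={sentiment.subjectivity:.2f}  {review}")
```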
NLTK (Natural Language Toolkit)
NLTK is a Python library for NLP that provides a wide range of features, including tokenization, lemmatization, part-of-speech tagging, named entity recognition, and sentiment analysis. NLTK is a mature library with a large community of users and contributors.
NLTK's most commonly used sentiment analyzer is VADER, a lexicon- and rule-based model tuned for short, informal text such as social media posts and reviews. It is not as accurate as a fine-tuned BERT model or a classifier trained on in-domain data, but it requires no training, runs quickly, and is easy to use.
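A minimal sketch using VADER (the review strings are invented; the lexicon is downloaded once):

```python
# Minimal sketch: rule/lexicon-based sentiment scoring with NLTK's VADER.
# The compound score ranges from -1.0 (most negative) to 1.0 (most positive).
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of the VADER lexicon

sia = SentimentIntensityAnalyzer()

reviews = [
    "Five stars, the new design is beautiful!",
    "Ads everywhere, uninstalled after a day.",
]

for review in reviews:
    scores = sia.polarity_scores(review)
    print(f"compound={scores['compound']:+.3f}  {review}")
```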
The best NLP library for sentiment analysis of app reviews depends on the size and complexity of the dataset, the desired level of accuracy, and the available computational resources.
BERT is the most accurate of the four approaches discussed in this post, but it is also the most computationally expensive. spaCy suits production workloads where speed and scalability matter, provided you can train a text classifier on labeled reviews. TextBlob and NLTK's VADER trade accuracy for simplicity: both produce a usable baseline in a few lines of code with no training.
Recommendation
If you are looking for the most accurate sentiment analysis results and have the compute budget, fine-tune a BERT-based model. If you are working with a large dataset or need to score reviews in near real time, train a spaCy TextCategorizer. If you are a beginner, want a quick baseline, or need something lightweight with no training step, TextBlob or NLTK's VADER will give you useful results in a few lines of code.
CInA: A New Technique for Causal Reasoning in AI Without Needing Labeled Data

Causal reasoning has been described as the next frontier for AI. While today’s machine learning models are proficient at pattern recognition, they struggle with understanding cause-and-effect relationships. This limits their ability to reason about interventions and make reliable predictions. For example, an AI system trained on observational data may learn incorrect associations like “eating ice cream causes sunburns,” simply because people tend to eat more ice cream on hot sunny days. To enable more human-like intelligence, researchers are working on incorporating causal inference capabilities into AI models. Recent work by Microsoft Research Cambridge and Massachusetts Institute of Technology has shown progress in this direction.
About the paper
Recent foundation models have shown promise for human-level intelligence on diverse tasks, but complex reasoning such as causal inference remains challenging, requiring intricate steps and high precision. The researchers take a first step toward building causally aware foundation models for such tasks. Their novel Causal Inference with Attention (CInA) method uses multiple unlabeled datasets for self-supervised causal learning and then enables zero-shot causal inference on new tasks and data. This rests on their theoretical finding that optimal covariate balancing is equivalent to regularized self-attention, which lets CInA read causal quantities off the final layer of a trained transformer. Experiments show that CInA generalizes to new distributions and real datasets, matching or beating traditional causal inference methods. Overall, CInA is a building block for causally aware foundation models.
Key takeaways from this research paper:
- The researchers proposed a new method called CInA (Causal Inference with Attention) that can learn to estimate the effects of treatments by looking at multiple datasets without labels.
- They showed mathematically that finding the optimal covariate-balancing weights for estimating treatment effects is equivalent to a form of regularized self-attention, the mechanism at the core of modern transformer models. This is what allows CInA to generalize to new datasets without retraining (a toy sketch of this idea follows the list).
- In experiments, CInA performed as well as or better than traditional methods that require per-dataset retraining, while taking much less time to produce estimates on new data.
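To make the balancing-as-attention idea concrete, here is a toy, self-contained sketch. It is not the paper's CInA algorithm; the synthetic data, variable names, and the simple softmax weighting are all invented for illustration. Each treated unit attends over control units via a softmax of scaled dot products of their covariates, and the resulting weights build a counterfactual outcome for an average-treatment-effect-on-the-treated (ATT) estimate:

```python
# Toy illustration (not the paper's CInA algorithm): attention-style
# covariate balancing for treatment effect estimation. All data and
# names are synthetic and invented for this sketch.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic observational data: X are covariates, t is a binary
# treatment confounded by X[:, 0], y is the outcome. True effect = 2.
n, d = 200, 5
X = rng.normal(size=(n, d))
t = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)
y = X[:, 0] + 2.0 * t + rng.normal(scale=0.5, size=n)

X_treated, X_control = X[t == 1], X[t == 0]
y_treated, y_control = y[t == 1], y[t == 0]

# Attention-style balancing: each treated unit attends over control
# units, so controls with similar covariates receive higher weight.
scores = X_treated @ X_control.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)

# Weighted control outcomes stand in for each treated unit's
# counterfactual; the weighting counteracts confounding from X[:, 0].
att = float(np.mean(y_treated - weights @ y_control))
naive = float(y_treated.mean() - y_control.mean())
print(f"naive difference in means: {naive:.2f}")
print(f"attention-weighted ATT estimate: {att:.2f} (true effect: 2.0)")
```

In CInA itself, the balancing weights come from the self-attention of a transformer trained across multiple datasets rather than a fixed softmax over raw covariates, which is what enables zero-shot transfer to new data.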
My takeaway on Causal Foundation Models:
- Being able to generalize to new tasks and datasets without retraining is an important ability for advanced AI systems. CInA demonstrates progress towards building this into models for causality.
- CInA shows that unlabeled data from multiple sources can be used in a self-supervised way to teach models useful skills for causal reasoning, like estimating treatment effects. This idea could be extended to other causal tasks.
- The connection between causal inference and self-attention provides a theoretically grounded way to build AI models that understand cause and effect relationships.
- CInA’s results suggest that models trained this way could serve as a basic building block for developing large-scale AI systems with causal reasoning capabilities, similar to natural language and computer vision systems today.
- There are many opportunities to scale up CInA to more data, and apply it to other causal problems beyond estimating treatment effects. Integrating CInA into existing advanced AI models is a promising future direction.
This work lays the groundwork for foundation models with more human-like intelligence by incorporating self-supervised causal learning and reasoning abilities.