Shark Tank India Startup Sees 10x Revenue Increase, Sparks Judges’ Dispute

Shark Tank India’s recent episode was exciting and exceeded expectations, with the judges caught in a fierce bidding war over sanitary pad disposal startup Padcare. Founder Ajinkya Dhariya led his company to a tenfold increase in revenue in just 15 months. The annual revenue […]

How Is the Sports Industry Utilising Blockchain Tech?

Everyone is using blockchain tech – can you name an industry that is not? If you can, we can almost guarantee they will begin using it soon. One of the industries starting to use blockchain tech heavily is sports. Read on to learn how. Sports Industry Overview According to The Business Research Company’s Sports Global […]

Sentiment Analysis of App Reviews: A Comparison of BERT, spaCy, TextBlob, and NLTK

Kenyan Bank Sentiment Analysis Dashboard — Tableau

BERT vs spaCy vs TextBlob vs NLTK in Sentiment Analysis for App Reviews

Sentiment analysis is the process of identifying and extracting opinions or emotions from text. It is a widely used technique in natural language processing (NLP) with applications in a variety of domains, including customer feedback analysis, social media monitoring, and market research.

Several NLP tools can be used for sentiment analysis, including BERT (a model rather than a library), spaCy, TextBlob, and NLTK. Each has its own strengths and weaknesses, and the best choice for a particular task depends on factors such as the size and complexity of the dataset, the desired level of accuracy, and the available computational resources.

In this post, we will compare and contrast the four NLP libraries mentioned above in terms of their performance on sentiment analysis for app reviews.

BERT (Bidirectional Encoder Representations from Transformers)

BERT is a pre-trained language model that has proven very effective for a variety of NLP tasks, including sentiment analysis. It is a deep learning model pre-trained on a large corpus of unlabeled text (BooksCorpus and English Wikipedia in the original release). This pre-training lets BERT learn the contextual relationships between words and phrases, which is essential for accurate sentiment analysis. To classify sentiment, it is typically fine-tuned on labeled examples.

Fine-tuned BERT models have reported strong results on sentiment benchmarks such as the Stanford Sentiment Treebank (SST-2 and SST-5), generally ahead of lexicon-based tools. However, BERT is also the most computationally expensive of the four options discussed in this post.
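As a rough sketch of how a BERT-based classifier could be applied to app reviews, the snippet below uses the Hugging Face transformers pipeline. The model name is an assumption (a publicly shared BERT checkpoint fine-tuned to predict 1 to 5 star ratings), and the mapping from stars to sentiment labels is a hypothetical choice for illustration, not something taken from this post.

```python
# Assumption: a public BERT checkpoint fine-tuned on product reviews that
# predicts labels of the form "1 star" ... "5 stars".
MODEL_NAME = "nlptown/bert-base-multilingual-uncased-sentiment"


def stars_to_sentiment(label):
    """Map a label such as '4 stars' to negative / neutral / positive.

    The cut-offs (1-2 negative, 3 neutral, 4-5 positive) are a
    hypothetical convention chosen for this example.
    """
    stars = int(label.split()[0])
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"


def classify_reviews(reviews):
    """Return one sentiment label per review text."""
    # Imported lazily so the helper above works without the library.
    # Third-party: pip install transformers torch
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis", model=MODEL_NAME)
    return [stars_to_sentiment(result["label"]) for result in classifier(reviews)]
```

Calling classify_reviews(["Love this app", "Crashes on login"]) downloads the model on first use, which is part of the computational cost mentioned above.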

spaCy

spaCy is a general-purpose NLP library that provides a wide range of features, including tokenization, lemmatization, part-of-speech tagging, named entity recognition, and text classification. spaCy is also relatively efficient, making it a good choice for tasks where performance and scalability are important.

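spaCy does not ship a pre-trained sentiment model. Sentiment analysis is usually done by training its text classification component (textcat) on a dataset of labeled app reviews, or by adding an extension such as spacytextblob. A classifier trained this way can be accurate on app reviews, but its quality depends on the amount and quality of the labeled data.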
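One way to get a sentiment classifier in spaCy is to train its textcat component on labeled reviews. The following is a minimal sketch assuming spaCy v3; the label names, the number of iterations, and the helper names are hypothetical, and a real project would add a held-out evaluation set.

```python
import random

LABELS = ("POSITIVE", "NEGATIVE")  # hypothetical label set


def to_cats(label):
    """Build the category dict spaCy expects for one gold label."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label}")
    return {name: 1.0 if name == label else 0.0 for name in LABELS}


def train_textcat(train_data, n_iter=10):
    """Train a blank English pipeline on (text, label) pairs."""
    # Imported lazily; third-party: pip install spacy
    import spacy
    from spacy.training import Example

    nlp = spacy.blank("en")
    textcat = nlp.add_pipe("textcat")
    for label in LABELS:
        textcat.add_label(label)

    examples = [
        Example.from_dict(nlp.make_doc(text), {"cats": to_cats(label)})
        for text, label in train_data
    ]
    optimizer = nlp.initialize(lambda: examples)
    for _ in range(n_iter):
        random.shuffle(examples)
        nlp.update(examples, sgd=optimizer, losses={})
    return nlp


def predict_label(nlp, text):
    """Return the highest-scoring category for a review."""
    cats = nlp(text).cats
    return max(cats, key=cats.get)
```

With only a handful of training pairs the predictions will be unreliable; the sketch is meant to show the shape of the workflow rather than a tuned model.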

TextBlob

TextBlob is a Python library for NLP that provides a variety of features, including tokenization, lemmatization, part-of-speech tagging, noun phrase extraction, and sentiment analysis. TextBlob is also relatively easy to use, making it a good choice for beginners and non-experts.

TextBlob’s default sentiment analyzer uses a simple lexicon-based approach: a dictionary of words associated with positive and negative sentiment is used to score a piece of text. It returns a polarity score between -1 and 1 and a subjectivity score between 0 and 1.
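To make the lexicon idea concrete, here is a toy scorer written with the standard library only. The word list and scores are invented for illustration and are not TextBlob's actual lexicon or algorithm, which also handles things like intensifiers and negation.

```python
import re

# Hypothetical mini-lexicon; real lexicons hold thousands of scored entries.
LEXICON = {
    "great": 0.8,
    "love": 0.6,
    "good": 0.5,
    "slow": -0.3,
    "bad": -0.6,
    "crashes": -0.7,
}


def lexicon_polarity(text):
    """Average the scores of known words; 0.0 if none are found."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[word] for word in words if word in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0
```

A review with no known words scores 0.0, which shows a typical weakness of lexicon methods: vocabulary outside the dictionary is invisible to them.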

TextBlob’s sentiment analysis is generally not as accurate as a fine-tuned BERT model or a trained spaCy classifier, but it is much faster and easier to use.
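The ease of use is visible in how little code a TextBlob call needs. In this sketch the 0.1 threshold and the helper names are assumptions made for the example; TextBlob itself only returns the numeric polarity and subjectivity scores.

```python
def label_from_polarity(polarity, threshold=0.1):
    """Turn a polarity in [-1, 1] into a coarse label.

    The threshold is a hypothetical cut-off, not a TextBlob default.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def textblob_label(text, threshold=0.1):
    """Score a review with TextBlob's default analyzer."""
    # Imported lazily; third-party: pip install textblob
    from textblob import TextBlob

    polarity = TextBlob(text).sentiment.polarity
    return label_from_polarity(polarity, threshold)
```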

NLTK (Natural Language Toolkit)

NLTK is a Python library for NLP that provides a wide range of features, including tokenization, lemmatization, part-of-speech tagging, named entity recognition, and sentiment analysis. NLTK is a mature library with a large community of users and contributors.

NLTK’s most commonly used sentiment tool is VADER, a lexicon- and rule-based analyzer designed for short social media text. NLTK also includes classifiers such as Naive Bayes that can be trained on a dataset of labeled app reviews. These approaches are generally not as accurate as a fine-tuned BERT model, but they are lightweight and easy to use.
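A common way to do sentiment analysis with NLTK is its bundled VADER analyzer, sketched below. The cut-offs of 0.05 and -0.05 on the compound score are a widely used convention and are treated here as an assumption; the helper names are hypothetical.

```python
def label_from_compound(compound, cutoff=0.05):
    """Map VADER's compound score in [-1, 1] to a coarse label."""
    if compound >= cutoff:
        return "positive"
    if compound <= -cutoff:
        return "negative"
    return "neutral"


def vader_label(text):
    """Score a review with NLTK's VADER analyzer."""
    # Imported lazily; third-party: pip install nltk
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    # Fetches the lexicon on first use; a no-op once it is present.
    nltk.download("vader_lexicon", quiet=True)
    scores = SentimentIntensityAnalyzer().polarity_scores(text)
    return label_from_compound(scores["compound"])
```

In a batch job the analyzer would normally be created once and reused rather than rebuilt for every review.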

The best NLP library for sentiment analysis of app reviews will depend on a number of factors, such as the size and complexity of the dataset, the desired level of accuracy, and the available computational resources.

BERT is the most accurate of the four libraries discussed in this post, but it is also the most computationally expensive. spaCy is a good choice for tasks where performance and scalability are important. TextBlob is a good choice for beginners and non-experts, while NLTK is a good choice for tasks where efficiency and ease of use are important.

Recommendation

If you are looking for the most accurate sentiment analysis results, then BERT is the best choice. However, if you are working with a large dataset or you need to perform sentiment analysis in real time, then spaCy is a better choice. If you are a beginner or non-expert, then TextBlob is a good choice. If you need a library that is efficient and easy to use, then NLTK is a good choice.
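Because the right choice depends on your own data, it can help to measure each option on a small labeled sample of your reviews. The helper below is a minimal sketch: it accepts any function that maps a review text to a label and reports accuracy and wall-clock time. The function name and the result keys are hypothetical.

```python
import time


def evaluate(predict, labeled_reviews):
    """Measure accuracy and elapsed time of a predictor.

    predict: callable taking a review text and returning a label.
    labeled_reviews: list of (text, gold_label) pairs.
    """
    if not labeled_reviews:
        raise ValueError("labeled_reviews must not be empty")
    start = time.perf_counter()
    correct = sum(1 for text, gold in labeled_reviews if predict(text) == gold)
    elapsed = time.perf_counter() - start
    return {"accuracy": correct / len(labeled_reviews), "seconds": elapsed}
```

Running the same sample through each tool gives a like-for-like view of the accuracy and speed trade-off described in this post.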


Sentiment Analysis of App Reviews: A Comparison of BERT, spaCy, TextBlob, and NLTK was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

CInA: A New Technique for Causal Reasoning in AI Without Needing Labeled Data

Causal reasoning has been described as the next frontier for AI. While today’s machine learning models are proficient at pattern recognition, they struggle with understanding cause-and-effect relationships. This limits their ability to reason about interventions and make reliable predictions. For example, an AI system trained on observational data may learn incorrect associations like “eating ice cream causes sunburns,” simply because people tend to eat more ice cream on hot sunny days. To enable more human-like intelligence, researchers are working on incorporating causal inference capabilities into AI models. Recent work by Microsoft Research Cambridge and Massachusetts Institute of Technology has shown progress in this direction.

About the paper

Recent foundation models have shown promise for human-level intelligence on diverse tasks. But complex reasoning like causal inference remains challenging, needing intricate steps and high precision. The researchers take a first step to build causally-aware foundation models for such tasks. Their novel Causal Inference with Attention (CInA) method uses multiple unlabeled datasets for self-supervised causal learning. It then enables zero-shot causal inference on new tasks and data. This works based on their theoretical finding that optimal covariate balancing equals regularized self-attention. This lets CInA extract causal insights through the final layer of a trained transformer model. Experiments show CInA generalizes to new distributions and real datasets. It matches or beats traditional causal inference methods. Overall, CInA is a building block for causally-aware foundation models.

Key takeaways from this research paper:

  • The researchers proposed a new method called CInA (Causal Inference with Attention) that can learn to estimate the effects of treatments by looking at multiple datasets without labels.
  • They showed mathematically that finding the optimal weights for estimating treatment effects is equivalent to using self-attention, an algorithm commonly used in AI models today. This allows CInA to generalize to new datasets without retraining.
  • In experiments, CInA performed as well as or better than traditional methods requiring retraining, while taking much less time to estimate effects on new data.
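To illustrate what "weights for estimating treatment effects" means in the takeaways above, here is a minimal sketch of a weighted difference-in-means estimator. It is not the authors' method: how CInA derives the weights through self-attention is not reproduced, and the function name and example numbers are hypothetical.

```python
def weighted_ate(outcomes, treated, weights):
    """Weighted difference in mean outcome, treated minus control.

    outcomes: observed outcome per unit.
    treated: truthy if the unit received the treatment.
    weights: non-negative balancing weight per unit (assumed given).
    """
    sum_t = norm_t = sum_c = norm_c = 0.0
    for y, t, w in zip(outcomes, treated, weights):
        if t:
            sum_t += w * y
            norm_t += w
        else:
            sum_c += w * y
            norm_c += w
    if norm_t == 0 or norm_c == 0:
        raise ValueError("both groups need positive total weight")
    return sum_t / norm_t - sum_c / norm_c
```

With equal weights this reduces to the plain difference in group means; balancing methods choose unequal weights so that the treated and control groups look alike on their covariates.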

My takeaway on Causal Foundation Models:

  • Being able to generalize to new tasks and datasets without retraining is an important ability for advanced AI systems. CInA demonstrates progress towards building this into models for causality.
  • CInA shows that unlabeled data from multiple sources can be used in a self-supervised way to teach models useful skills for causal reasoning, like estimating treatment effects. This idea could be extended to other causal tasks.
  • The connection between causal inference and self-attention provides a theoretically grounded way to build AI models that understand cause and effect relationships.
  • CInA’s results suggest that models trained this way could serve as a basic building block for developing large-scale AI systems with causal reasoning capabilities, similar to natural language and computer vision systems today.
  • There are many opportunities to scale up CInA to more data, and apply it to other causal problems beyond estimating treatment effects. Integrating CInA into existing advanced AI models is a promising future direction.

This work lays the groundwork for foundation models with more human-like intelligence by incorporating self-supervised causal learning and reasoning abilities.


CInA: A New Technique for Causal Reasoning in AI Without Needing Labeled Data was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Hyderabad’s Hyperleap AI Unveils Generative AI Platform After $225K Pre-Seed Round

Hyderabad-based startup Hyperleap AI has launched an advanced Generative AI platform designed specifically for businesses. Prior to the release, it secured $225,000 in a pre-seed funding round. The investment came from several angel investors, including Anil Kommineni, a Senior Vice President at Zenoti. The […]

AbleCredit, an AI credit company, raises seed funding led by Merak Ventures

Merak Ventures, a sector-agnostic early-stage venture capital (VC) firm, has announced its fourth significant investment this year, participating in a $1.25Mn (~INR 10 CR) round in AbleCredit, an AI credit underwriting company. The seed round was led by Merak Ventures, with participation from Venture Catalysts and Helios Holdings (Suraj Nalin). There is a credit gap […]

WiMi Hologram Cloud Enhances IoT Security with New Blockchain-Based Data System

WiMi Hologram Cloud has recently introduced a new system to enhance Internet of Things (IoT) data security. It uses blockchain technology to offer decentralized storage, data verification, automated management with smart contracts, and secure data exchange. WiMi Hologram Cloud is a global leader in Hologram Augmented Reality (AR) technology. […]

Insurtech Startup CoverSure Secures $4 Million in Pre-Series A Funding Round

The insurance sector is undergoing a major transformation driven by ever-evolving technology. A new term has been coined to merge the two: insurtech. CoverSure is one of the newest players in this sector and recently raised $4 million in a pre-Series A funding round led by Enam Holdings. The […]

Four Dimensions of Responsible AI and How They Can Address the Global Challenges

AI has moved beyond its narrow roots and penetrated every aspect of human life. The recent convergence of copious data and extraordinary processing power has sparked an AI revolution, catapulting it from computer science laboratories to the forefront of global innovation. As data storage and transit technologies progress, the amount of data handled by AI […]

How Experiential Etc is Shaping the Future with AI

Digital marketing was first introduced by Google in the late 1990s, and the industry has followed similar practices ever since. However, things are changing now. The rise of Artificial Intelligence (AI) is the catalyst for a major change that will soon be seen. User experiences are set to witness a big transformation and leading […]