Python Text Mining: Perform Text Processing, Word Embedding, Text Classification and Machine Translation
- Length: 320 pages
- Edition: 1
- Language: English
- Publisher: BPB Publications
- Publication Date: 2022-03-26
- ISBN-10: 9389898781
- ISBN-13: 9789389898781
- Sales Rank: #1200796
Make use of the most advanced machine learning techniques to perform NLP and feature extraction
Key Features
- Learn about pre-trained models, deep learning, and transfer learning for NLP applications.
- All-in-one knowledge guide for feature engineering, NLP models, and pre-processing techniques.
- Includes use cases, enterprise deployments, and a range of Python-based demonstrations.
Description
Natural Language Processing (NLP) has proven useful in a wide range of applications, and extracting information from text datasets demands careful attention to methods, techniques, and approaches.
‘Python Text Mining’ includes a number of application cases, demonstrations, and approaches that will help you deepen your understanding of feature extraction from datasets. You will gain an understanding of effective information retrieval, a critical step in many machine learning tasks. You will learn to classify text into discrete segments based solely on model properties, not on user-supplied criteria. The book walks you through many methodologies, such as classification, that will enable you to rapidly build recommendation engines, topic segmentation, and sentiment analysis applications. Toward the end, it also covers machine translation and transfer learning.
By the end of this book, you’ll know exactly how to gather web-based text, process it, and then apply it to the development of NLP applications.
What you will learn
- Learn how to process raw data and transform it into a usable format.
- Apply the best techniques for converting text to vectors and then into word embeddings.
- Unleash ML and DL techniques to perform sentiment analysis.
- Build modern recommendation engines using classification techniques.
Who this book is for
With its examples, explanations, and exercises, this book is a good starting point for anyone interested in learning more about advanced text mining and natural language processing techniques. Prior programming experience is suggested but not required.
Table of Contents
Front matter: Cover Page; Title Page; Copyright Page; Dedication Page; About the Author; About the Reviewer; Acknowledgement; Preface; Errata; Table of Contents
1. Basic Text Processing Techniques: Introduction; Structure; Objectives; Data preparation; Project 1: Twitter data analysis (scraping the data; data pre-processing; importing necessary packages; HTML parsing; removing accented characters; expanding contractions; lemmatization and stemming; fail case; removing special characters; removing stop words; handling emojis or emoticons; emoji removal; text acronym abbreviation; Twitter data processing; extracting usertags and hashtags); Project 2: In-shots data pre-processing (importing the necessary packages; setting the URLs for data extraction; function to scrape data from the URLs; importing packages); Conclusion; Questions; Multiple choice questions; Answers
2. Text to Numbers: Introduction; Structure; Objectives; Feature encoding or engineering; One-hot encoding (corpus; code; creating the text corpus; some basic pre-processings; min_df; max_df; limitations); Bag of words (code; performing bag-of-words using sklearn; difference between one-hot encoding and bag of words; limitations); N-gram model (limitations); TF-IDF (code; performing TF-IDF using sklearn); Project 1 (solution; loading the dataset; some basic pre-processings; one-hot encoding; bag of words; bag-of-N-grams model); Project 2 (loading the dataset; some basic pre-processings; TF-IDF); Comparison of one-hot, BOW, and TF-IDF; Conclusion; Questions; Multiple choice questions; Answers
3. Word Embeddings: Introduction; Structure; Objective; Word vectors or word embeddings; Difference between word embeddings and TF-IDF; Feature engineering with word embeddings; Word2Vec (code; t-SNE; word similarity dataframe); Global Vector (GloVe) model (the GloVe model using spaCy; loading the downloaded vector model; word vector dataframe; t-SNE visualization; word similarity dataframe); fastText (fastText using Gensim; t-SNE visualization; finding the odd word out using fastText); Difference between Word2Vec, GloVe, and fastText; Using pre-trained word embeddings (importing necessary libraries; loading the Word2Vec model; sample data initialization; pre-processings and word tokenizations; extracting the list of unique words; t-SNE visualization); Project (solution; importing necessary libraries; loading the Word2Vec model; scraping data from In-shots; pre-processings and word tokenizations; extracting the list of unique words; removing words not in vocab; t-SNE visualization); Conclusion; Project
4. Topic Modeling: Introduction; Structure; Objectives; Topic modeling; Identity matrix; Unitary matrix; Eigenvalues and eigenvectors; Singular value decomposition; Latent semantic indexing (TF-IDF vectorization; building an SVD model; looking at the topics and the words contributing to each topic; advantages and disadvantages of LSI); Latent Dirichlet Allocation (introduction; working; about the data; some pre-processing; looking at the top 20 frequently used words; some EDA; generating bi-grams (BoW); LDA model fitting); LDA using Gensim and its visualization (importing the data; some pre-processing; extending stop words and building n-gram models; creating term-document frequency and the LDA model; dominant topic identification; pyLDAvis; disadvantages of LDA); Non-Negative Matrix Factorization (NMF) (importing necessary libraries; some pre-processing; looking at the top 20 frequently used words; some EDA; generating bi-grams (BoW); building a TF-IDF vectorizer; visualizing ranks with the TF-IDF weights; NMF modelling; disadvantages of NMF); Conclusion; Questions; Answers; Projects
5. Unsupervised Sentiment Classification: Introduction; Structure; Objective; Lexicon-based approach (about the dataset; loading necessary libraries; importing the dataset; some pre-processings; defining a function to perform the following); Opinion lexicon (importing the opinion lexicon; tokenizing the reviews into sentences and forming the sentence and review IDs; sentiment classification; converting the sentiments to review level; converting the sentiment codes from the dataset to sentiments); SentiWordNet lexicon (function to perform SentiWordNet sentiment classification; evaluation); TextBlob (importing libraries; predicting the sentiment of sample reviews; prediction and evaluation); AFINN (importing necessary libraries; sentiment classification and evaluation); VADER (importing necessary libraries; sentiment classification and evaluation; sample prediction); Drawbacks of lexicon-based sentiment classification; Conclusion; Questions; Answers
6. Text Classification Using ML: Introduction; Structure; Objectives; Supervised learning; About the dataset; Loading the necessary libraries; Importing the dataset; Pre-processings; Performing TF-IDF; Model fitting (logistic regression; Lasso regularization; Ridge regularization; elastic-net classifier; Naïve Bayes algorithm; K-Nearest Neighbors; decision tree; random forest; AdaBoost; gradient boosting machine; XGBoost; grid search); Conclusion; Questions; Answers; Project
7. Text Classification Using Deep Learning: Introduction; Structure; Objectives; Learning about neural networks; Neural networks for sentiment classification; Neural networks with TF-IDF (installing libraries; importing libraries; importing the dataset; pre-processings; train, test, and validation sets; performing TF-IDF; model building; linear regression; increasing the dimensionality; activation functions; model fitting; cross-validation); Neural networks with Word2Vec (data splitting; creating a Word2Vec model; Word2Vec model fitting; creating word vectors; padding sequences; ANN model building; model fitting; cross-validation); Sentiment analysis using LSTM (importing the dataset; pre-processings; data splitting and padding; LSTM model building; cross-validation); Comparison of results; Conclusion; Questions; Answers
8. Recommendation Engine: Introduction; Structure; Objective; Applications; Classification of recommendation systems; Simple rule-based recommenders (about the dataset; installing and loading necessary libraries; importing the dataset; building a simple rule-based recommendation system; weighted ratings calculation; applying the calculation on the filtered records); Content-based (using document similarity; about the dataset; installing and loading necessary libraries; importing the dataset; some pre-processing; extracting TF-IDF features; computing pairwise document similarity; building a movie recommender; using word embeddings; fastText; generating document-level embeddings); Collaborative-based (user-based; about the dataset; installing and loading necessary libraries; importing the dataset); Advantages of a recommendation system; Conclusion; Questions; Answers
9. Machine Translation: Introduction; Structure; Objectives; Application; Types of MT; Readily available libraries (TextBlob; LangDetect; fastText); Sequence-to-sequence modeling (about the dataset; installing and loading necessary libraries; importing the dataset; preprocessing; model building using LSTM); Conclusion; Exercise; Questions; Answers
10. Transfer Learning: Introduction; Structure; Objectives; Universal Sentence Encoder (goal; what is a transformer and do we need it?; Deep Averaging Network (DAN); about the data; data pre-processing); Bidirectional Encoder Representations from Transformers (BERT) (what is the necessity of BERT?; the main idea behind BERT; why is BERT so powerful?; BERT architecture; text processing; pre-training tasks; fine-tuning; drawbacks); Conclusion; Multiple choice questions; Answers; Project
Index
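Chapters 5 through 7 center on sentiment classification. As a flavor of the lexicon-based approach covered in Chapter 5, here is a toy sketch; the word sets below are made-up stand-ins for real opinion lexicons such as AFINN or VADER, not the book's code:

```python
# Toy lexicon-based sentiment classifier: count positive vs. negative hits.
POSITIVE = {"good", "great", "excellent", "love", "amazing"}
NEGATIVE = {"bad", "terrible", "poor", "hate", "awful"}

def lexicon_sentiment(review: str) -> str:
    """Label a review by comparing counts of positive and negative lexicon words."""
    tokens = review.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_sentiment("great movie I love it"))            # positive
print(lexicon_sentiment("terrible plot and awful acting"))   # negative
```

Real lexicons add valence weights, negation handling, and intensity modifiers, which is exactly where the drawbacks discussed in Chapter 5 come in.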