
Web Data Mining with Python: Discover and extract information from the web using Python
- Length: 304 pages
- Edition: 1
- Language: English
- Publisher: BPB Publications
- Publication Date: 2023-01-31
- ISBN-10: 9355513631
- ISBN-13: 9789355513632
Explore different web mining techniques to discover patterns, structures, and information from the web
Key Features
- A complete overview of the basic and advanced concepts of Web mining.
- Work with easy-to-use open-source Python libraries for Web mining.
- Get familiar with the various beneficial areas and applications of Web mining.
Description
Data Science is the fastest-growing job area across the globe and is predicted to create 11.5 million jobs by 2026, so job seekers with this skill set have plenty of opportunities. One of the most sought-after areas in Data Science is mining information from the web. If you are an aspiring Data Scientist looking to learn different Web mining techniques, this book is for you.
This book starts by covering the key concepts of Web mining and its taxonomy. It then explores the basics of Web scraping, including its uses and components, followed by topics such as the legal aspects of scraping, data extraction and pre-processing, scraping dynamic websites, and dealing with CAPTCHA. The book also introduces the concepts of Opinion mining and Web structure mining. Furthermore, it covers Web graph mining, Web information extraction, Web search and hyperlinks, Hyperlink Induced Topic Search (HITS), and the partitioning algorithms used in Web mining. Towards the end, the book teaches you different mining techniques for discovering interesting usage patterns in Web data.
By the end of the book, you will master the art of data extraction using Python.
What you will learn
- Learn how to scrape data from any website with Python.
- Get familiar with the concepts of Opinion Mining and Sentiment Analysis.
- Use Web structure mining to discover structural information from the web.
- Learn how to collect and analyze social media data using Python.
- Use Web usage mining for predicting users’ browsing behaviors.
Who this book is for
The book is for anyone who wants to learn Web mining. Aspiring Data Scientists, Data Engineers, and Data Analysts who want to master Web mining will find this book very helpful.
Table of Contents
- Cover Page; Title Page; Copyright Page; Dedication Page; About the Authors; About the Reviewer; Acknowledgements; Preface; Errata; Table of Contents
- 1. Web Mining—An Introduction: Introduction; Structure; Objectives; Introduction to Web mining; World Wide Web; Evolution of the World Wide Web; Internet and Web 2.0; An overview of data mining, modeling, and analysis; Basics of Web mining; Categories of Web mining; Difference between data mining and Web mining; Applications of Web mining; Web mining and Python; Essential Python libraries for Web mining; How Python is helpful in Web mining?; Conclusion; Points to Remember; Multiple Choice Questions; Answers; Questions; Key terms
- 2. Web Mining Taxonomy: Introduction; Structure; Objectives; Introduction to Web mining; Web content mining; Basic application areas of Web content mining; Contents of a web page; Content pre-processing; Web content analysis; Web structure mining; Web usage mining; Key concepts; Ranking metrics; Page rank; Hubs and Authorities; Web Robots; Information Scent; User Profile; Online bibliometrics; Types of Bibliometric measures; Conclusion; Points to remember; Multiple Choice Questions; Answers; Questions; Key terms
- 3. Prominent Applications with Web Mining: Introduction; Structure; Objectives; Personalized customer applications—E-commerce; Web search; Most common methods of website tracking; IP tracking; Cookies; Fingerprinting; Tracking pixels; Personalized portal and Web; Web service performance optimization; Bounce rate; Average time on page; Unique visitors; Process mining; Concepts of association rules; Association rule mining; Components of Apriori algorithm; Support and frequent itemsets; Confidence; Lift; Steps in apriori algorithm; Concepts of sequential pattern; Sequence database; Subsequence versus supersequence; Minimum support; Prefix and suffix; Projection; Association rule mining and Python libraries; Pandas; Mlxtend; Conclusion; Points to remember; Multiple Choice Questions; Answers; Questions; Key terms
- 4. Python Fundamentals: Introduction; Structure; Objectives; Introduction to Python; Basics of Python; Python programming; Writing “Hello World”, the first Python script; Conditional/selection statements; Looping/iterative constructs; While loop; For Loop; Functions; Lists; Basics of HTML: inspecting a Web page; Basics of Python libraries; Installation of Python; Unix and Linux platform; Windows Platform; Macintosh; Introduction to commonly used IDEs and PDE; Integrated development learning environment (IDLE); Atom; Sublime text; PyDev; Spyder (the scientific Python development environment); PyCharm; Google Colab; Installation of Anaconda; Conclusion; Points to remember; Multiple choice questions; Answers
- 5. Web Scraping: Introduction; Structure; Objectives; Introduction to Web scraping; Web scraping; Uses of Web scraping; Working of Web scraper; Challenges of Web scraping; Python modules used for scraping; Legality of Web scraping; Robots.txt; Public content; Terms of use; Crawl delay; Authentication rules; Data extraction and preprocessing; Handling text, image, and videos; Handling text; Handling images; Extracting videos from a Web page; Scraping dynamic websites; Dealing with CAPTCHA; Case study: Implementing Web scraping to develop a scraper for finding the latest news; Conclusion; Points to remember; Multiple choice questions; Answers; Questions; Key terms
- 6. Web Opinion Mining: Introduction; Structure; Objectives; Concepts of opinion mining; NLTK for sentiment analysis; Opinion Mining/Sentiment Analysis at different levels; Word level; Document level; Sentence level; Feature-based; Collection of review; Data sources for opinion mining; Blogs; Review sites; Forums; Social networking sites; Working with data; Pre-processing of data; Tokenization; Sentence tokenization; Word tokenization; Part of Speech tagging; Feature extraction; Bag-of-Words; TF-IDF; Case study for Sentiment Analysis; Conclusion; Points to remember; Multiple choice questions; Answers; Questions; Key terms
- 7. Web Structure Mining: Introduction; Structure; Objectives; Introduction to Web structure mining; Concepts of Web structure mining; Web structure mining; Web graph mining; Web information extraction; Deep Web mining; Web Search and Hyperlinks; Hyperlink analysis on the Web; Hyperlink Induced Topic Search (HITS); Partitioning algorithm; Implementation in Python; Conclusion; Points to remember; MCQs; Answers; Questions; Key terms
- 8. Social Network Analysis in Python: Introduction; Structure; Objectives; Introduction to Social Network Analysis; Creating a network; Types of graphs; Symmetric/undirected networks; Asymmetric/directed networks; Signed networks; Weighted networks; Multigraphs; Analyzing network; Distance measures in network connectivity; Distance; Average distance; Eccentricity; Diameter; Radius; Periphery; Center; Network influencers; Case study on Facebook dataset; Conclusion; Points to remember; Multiple choice questions; Answers; Questions; Key terms
- 9. Web Usage Mining: Introduction; Structure; Objectives; Process of Web usage mining; Sources of data; Types of data; Usage data; Content data; Structure data; User data; Key elements of Web usage data pre-processing; Data cleaning; User identification; Session identification; Path identification; Data modeling; Association rule mining; Sequential pattern; Clustering; Classification mining; Discovery and analysis of pattern; Association rule for knowledge discovery; Pattern discovery through clustering; Sequential pattern mining for knowledge discovery; Learning through classification; Pattern analysis; Predictions on transaction pattern; Building a content-based recommendation system; Item profile; User profile; Conclusion; Points to remember; Multiple choice questions; Answers; Questions; Key terms
- Index