Capitalizing Data Science: A Guide to Unlocking the Power of Data for Your Business and Products

by Mathangi Sri Ramachandran

Length: 254 pages
Edition: 1
Language: English
Publisher: BPB Publications
Publication Date: 2022-12-03
ISBN-10: 9355511582
ISBN-13: 9789355511584

Unlock the Potential of Data Science and Machine Learning for Your Business and Organization

Key Features

Includes today’s most popular applications powered by data science and machine learning technology.
A solid primer on the entire data science lifecycle, detailed with examples.
An integrated approach to demonstrating the use of Image Processing, Natural Language Processing, and Neural Networks in business.

Description

Can you foresee how your company and its products will benefit from data science? How can the results of using AI and ML in business be tracked and questioned? Do questions like ‘how do you build a data science team?’ keep popping into your head?

All these strategic concerns and challenges are addressed in this book.

First, the book explores the evolution of decision-making based on empirical evidence. It then compares the data-supported era with the current data-led era. It also discusses how to run a data science project successfully and what the lifecycle of such a project looks like. The book dives fairly deep into a variety of today’s data-led applications, highlights example datasets, discusses obstacles, and explains machine learning models and algorithms intuitively.

The book covers structural and organizational considerations for building a data science team. It recommends an optimal data science organization structure based on the company’s level of development. Finally, the book helps technology leaders understand data science’s effects on businesses.

What you will learn

Learn the entire data science lifecycle and become fluent in each phase.
Discover the world of supervised and unsupervised learning applications and structured and unstructured datasets.
Discuss NLP’s function, its potential, and the application of well-known methods like BERT and GPT-3.
Explain practical applications like automatic captioning, machine translation, and emotion recognition.
Provide a framework for evaluating your team’s data science skills and resources.

Who this book is for

Startups, investors, small businesses, product management teams, CxOs, and all developing businesses looking to leverage a data science team will gain the most from this book. The book also discusses the potential of practical applications of machine learning and AI for the future of businesses in banking and e-commerce.

Cover Page
Title Page
Copyright Page
Dedication Page
About the Author
About the Reviewer
Acknowledgement
Preface
Errata
Table of Contents
1. Data-Driven Decisions from Beginning to Now
    Introduction
    Data-driven decisions and their phases
        Human-led and data-supported decisions
        Data-led and human-guided
    Applications of data science
        Initiation
        Acquisition
        Maintenance and expansion
        Retention
        Exit
        Regain
    Challenges that need to be solved
    Conclusion
2. Data Science Life Cycle—Part 1
    Introduction
    What is considered a data science project?
        Explanatory
        Predictive
        The key players
    The stages of a data science project
        Phase 1—The business problem phase
        Phase 2—The math problem phase
        Ownership of the overall solution
        Problem understanding stage
            Key stakeholders and accountability
        Consulting stage
            Typical outputs of this stage
            Key stakeholders and accountability
        Solution blueprint
            Power ABC
            Key stakeholders and accountability
            Typical outputs of this stage
        Finalizing success criteria
            Typical outputs of this stage
            Key stakeholders and accountability
            Finalizing reporting requirements
            Key stakeholders and accountability
            Implementation decisions
            Typical outputs of this stage
            Key stakeholders and accountability
        Roll-out roadmap
            Typical outputs of this stage
            Key stakeholders and accountability
    Conclusion
3. Data Science Life Cycle—Part 2
    Introduction
    Data understanding
        Key stakeholders and accountability
        Typical outputs of this stage
    Data validation
        Accuracy
        Availability
        Completeness
        Reliability
        Key stakeholders and accountability
        Typical outputs of this stage
    Algorithmic solution
        Training and validation
        Data processing
        Missing value treatment
        Outlier treatment
        Variable or feature transformations
        Feature selection
        Algorithm development
        Validation
        Estimating business impact
        Key stakeholders and accountability
        Typical outputs of this stage
    Data and model governance
        Technical robustness
        Data robustness
        Compliance criteria
        Experimentation methodology
        Model governance teams
        Key stakeholders and accountability
        Typical outputs of this stage
    Model deployment and go-live
        Role of ML engineers
        Latency
            Throughput
            Memory
            CPU
            Network latencies
            Latencies across different steps
        Alerting and monitoring mechanisms
        Key stakeholders and accountability
        Typical outputs of this stage
    Measurement of production results
        Key stakeholders and accountability
        Typical outputs of this stage
    Optimization
        Optimizing algorithms
        Key stakeholders and accountability
        Typical outputs of this stage
    Roll-out
        Key stakeholders and accountability
        Typical outputs of this stage
    Conclusion
4. Deep Dive into AI
    Introduction
    Difference between AI, data science, and machine learning
        UX & Interfaces
        Experimenter
        Data collectors/storage
        Decisioning system
        Intervention system
        Machine learning
            Supervised machine learning
            Unsupervised learning
            Reinforcement learning
        Data classification
    Conclusion
    References
5. Applying AI with Structured Data—Banking
    Introduction
    Structure
    Banking
    Credit scoring
    Credit bureaus
        Data in the credit bureaus
        Using the credit score from the bureau by the financial institutions
    Fraud detection systems
        Account takeover frauds
        Anomaly detection and the need for machine learning
        Example of anomaly dataset
        Anomaly detection with unsupervised
        Anomalies using a supervised method
        Offline validations
        Interventions
    Anti-Money Laundering (AML)
        AML with heuristics
        Problem with heuristics
        AML dataset
        Machine Learning for AML
    Conclusion
    References
6. Applying AI with Structured Data—Ecommerce
    Introduction
    Personalization
        Types of personalization
            Personalization types based on presentation
            Personalization types based on components
            Deep dive on content personalization
    Build an ML-based ranker
        Demand forecasting
    Conclusion
    References
7. Applying AI with Structured Data—On-Demand Deliveries
    Introduction
    On-demand hyperlocal deliveries
    AI use-cases
        Predicting ETA
            Predicting time component (t1)
            Predicting time component (t2)
            Predicting time component (t3)
            The metrics
        Surge pricing
            Supply-demand curves
            Solving for dynamic surge
            ML surge pricing
    Conclusion
8. AI in Natural Language Processing
    Introduction
    NLP overview
    Popular NLP use cases
        Searches
        Machine translations
        Reviews
        Social media listening
        Intelligent agents
        Content recommendations
        Automated insights
    NLP applications by verticals
        NLP in law firms
        NLP in e-commerce
        NLP in journalism
        NLP in customer service
    Algorithms and linguistics
        ML algorithms for NLP
        Supervised algorithms
            Term-document matrix with a bag of words
            Pre-trained embeddings
        Unsupervised algorithms
            Rule-based
            Distance-based
            Topic modeling
    Common NLP techniques
        Opinion mining
        Named Entity Recognition (NER)
        Information retrieval (finding the needle in the haystack)
            Crawling
            Indexing
            Retrieval
            Ranking
        Text summarization
        Intent mining
            Intent mining in speech
        Dialog systems
            General purpose dialog systems
            Goal-oriented dialog systems
            NLP components
            State of the art
        A word of caution on sequence-sequence models
    Current challenges in NLP
        General context or “Knowledge of the world”
        Ambiguity in language
            Lexical ambiguity
            Syntax level ambiguity
            Referential ambiguity
            Sarcasm
        NLP for non-English
    Conclusion
    References
9. Bringing It All Together
    Introduction
    Where should you start the AI journey for your organization?
        Complexity
        The scale of data
        Actionability
    The team
        Data engineering
        Business intelligence (BI)
        Analytics
        Data science (DS)
        ML engineering
        The skillmap
    Building the data organization
        When AI is the backbone
            Special case of analytics services organization
    When AI is not the backbone
    The org structure
        Central data teams
        The pod approach
        The hybrid approach
    Common pitfalls
        Applying AI to everything
        Believing in the black box
        Tweaking it too much
        Delivery versus research
    Hesitation to experiment
    The success of data science solutions
    Conclusion
Index