The Enterprise Data Catalog: Improve Data Discovery, Ensure Data Governance, and Enable Innovation
- Length: 200 pages
- Edition: 1
- Language: English
- Publisher: O'Reilly Media
- Publication Date: 2023-03-28
- ISBN-10: 149209871X
- ISBN-13: 9781492098713
- Sales Rank: #288017
Combing the web is simple, but how do you search for data at work? It’s difficult and time-consuming, and can sometimes seem impossible. This book introduces a practical solution: the data catalog. Data analysts, data scientists, and data engineers will learn how to create true data discovery in their organizations, making the catalog a key enabler for data-driven innovation and data governance.
Author Ole Olesen-Bagneux explains the benefits of implementing a data catalog. You’ll learn how to organize data for your catalog, search for what you need, and manage data within the catalog. Written from a data management perspective and from a library and information science perspective, this book helps you:
- Learn what a data catalog is and how it can help your organization
- Organize data and its sources into domains and describe them with metadata
- Search data using techniques that range from simple to complex, and learn to browse domains, data lineage, and graphs
- Manage the data in your company via a data catalog
- Implement a data catalog in a way that exactly matches the strategic priorities of your organization
- Understand what the future has in store for data catalogs
Preface
- Who Should Read This Book
- Navigating This Book
- Conventions Used in This Book
- O’Reilly Online Learning
- How to Contact Us
- Acknowledgments
I. Organizing Data So You Can Search for It
1. Introduction to Data Catalogs
- The Core Functionality of a Data Catalog
- Create an Overview of the IT Landscape
- Organize Data
- Enable Search of Company Data
- Data Discovery
- The Data Discovery Team
- Data Architects
- Data Engineers
- Data Discovery Team Setup
- End-User Roles and Responsibilities
- Summary
2. Organize Data: Design a Robust Architecture for Search
- Organizing Domains in the Data Catalog
- Domain Architecture in a Data Catalog
- Understanding Domains
- Processes and Capabilities
- Data Sources
- Getting Assets into the Data Catalog
- Pull
- Push
- Organizing Assets in the Domains
- Asset Metadata
- Metadata derived from the data source
- Metadata added in the data catalog
- Metadata derived from the data source or added in the data catalog
- Metadata Quality
- Classification
- Summary
3. Understand Search: Concepts, Features, and Mechanics
- Why Do You Search in a Data Catalog?
- Search Features in a Data Catalog
- Searching in Data Versus Searching for Data
- How Do You Search a Data Catalog?
- Data Catalog Query Language
- The Search Features in a Data Catalog Explained
- Simple search
- Browsing
- Complex search
- Searching for Everything?
- The Mechanics of Search
- Recall and Precision
- Zipf’s Law
- Serendipity
- Summary
4. Apply Search: From Simple to Advanced Patterns
- Search Like Librarians—Not Like Data Scientists
- Search Patterns
- Basic Simple Search
- Detailed Simple Search
- Flexible Simple Search
- Range Search
- Block Search
- Statement Search
- Browsing Patterns
- Glossary Browsing
- Domain Browsing
- Lineage Browsing
- Graph Browsing
- Searching a Graph-Based Data Catalog
- Summary
II. Democratizing Data with a Data Catalog
5. Discover Data: Empower End Users and Engage Stakeholders
- A Data Catalog Is a Social Network
- Active Metadata
- Ensure Stakeholder Engagement
- Engage Data Governance Leaders
- Engage Data Analytics Leaders
- Engage Domain Leaders
- Seeing All Data Through One Lens
- The Operational Backbone and the Data Platform
- Summary
6. Access Data: The Keys to Successful Implementation
- Choosing a Data Catalog
- Vendor Analysis
- Some Key Vendors
- Catalog of Catalogs
- How to Access Data
- Data Providers and Data Consumers
- Centralized Approach
- Decentralized Approach
- Combined Approach
- Building Domains
- Questionnaire No. 1: Domain Owner Description of Domain and Assets
- Questionnaire No. 2: Asset Steward Description of Assets in the Domain
- Questionnaire No. 3: Asset Steward Description of the Glossary Terms of Their Assets
- Summary
7. Manage Data: Improve Lifecycle Management
- The Value of Data Lifecycle Management and Why the Data Catalog Is a Game Changer
- Various Lifecycles
- Data Lifecycle
- Using the Data Catalog for Data Lifecycle Management
- The Data Asset Lifecycle in the Data Catalog
- Glossary Term Lifecycle
- Data Source Lifecycle
- Lifecycle Influence and Support
- Applied Search Based on Lifecycles
- Applied Search for Regulatory Compliance
- Maintenance Best Practices
- Maintenance of the Data Outside the Data Catalog
- Maintenance of Metadata Inside the Data Catalog
- Improved Data Lifecycle Management
- Summary
III. Envisioning the Future of Data Catalogs
8. Looking Ahead: The Company Search Engine and Improved Data Management
- The Company Search Engine
- The Company Search Engine in Hugin & Munin
- From Data to Knowledge
- A Medium Theoretical Take on the Company Search Engine
- Is the Company Search Engine New?
- Will the Company Search Engine Become Reality?
- An ever-growing set of connectors
- The emergence of the data platform
- Summary
Afterword
- Consider Implementing a Data Catalog
- Follow Me
A. Data Catalog Query Language
Index
How to download source code?
1. Go to: https://www.oreilly.com/
2. Search for the book title: The Enterprise Data Catalog: Improve Data Discovery, Ensure Data Governance, and Enable Innovation. If the full title returns no results, search for the main title only.
3. Click the book title in the search results.
4. In the Publisher resources section, click Download Example Code.