Data Engineering with Alteryx: Helping data engineers apply DataOps practices with Alteryx
- Length: 366 pages
- Edition: 1
- Language: English
- Publisher: Packt Publishing
- Publication Date: 2022-06-30
- ISBN-10: 1803236485
- ISBN-13: 9781803236483
- Sales Rank: #356655
Build and deploy data pipelines with Alteryx by applying practical DataOps principles
Key Features
- Learn DataOps principles to build data pipelines with Alteryx
- Build robust data pipelines with Alteryx Designer
- Use Alteryx Server and Alteryx Connect to share and deploy your data pipelines
Book Description
Alteryx is a GUI-based development platform for data analytic applications.
Data Engineering with Alteryx will help you leverage Alteryx’s code-free aspects, which increase development speed, while still enabling you to make the most of the code-based skills you have.
This book will teach you the principles of DataOps and how they can be used with the Alteryx software stack. You’ll build data pipelines with Alteryx Designer and incorporate the error handling and data validation needed for reliable datasets. Next, you’ll take the data pipeline from raw data, transform it into a robust dataset, and publish it to Alteryx Server following a continuous integration process.
By the end of this Alteryx book, you’ll be able to build systems for validating datasets, monitoring workflow performance, managing access, and promoting the use of your data sources.
What you will learn
- Build a working pipeline to integrate an external data source
- Develop monitoring processes for the pipeline example
- Understand and apply DataOps principles to an Alteryx data pipeline
- Gain skills for data engineering with the Alteryx software stack
- Work with spatial analytics and machine learning techniques in an Alteryx workflow
- Explore Alteryx workflow deployment strategies using metadata validation and continuous integration
- Organize content on Alteryx Server and secure user access
Who this book is for
If you’re a data engineer, data scientist, or data analyst who wants to set up a reliable process for developing data pipelines using Alteryx, this book is for you. You’ll also find this book useful if you are trying to make the development and deployment of datasets more robust by following the DataOps principles. Familiarity with Alteryx products will be helpful but is not necessary.
Table of Contents
Data Engineering with Alteryx
- Contributors
- About the author
- About the reviewer
Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example workflow files
- Download the color images
- Conventions used
- Get in touch
- Share Your Thoughts
Part 1: Introduction
Chapter 1: Getting Started with Alteryx
- Understanding the Alteryx platform
- The software that makes the Alteryx platform
- Using the Alteryx platform in a business scenario
- How Alteryx benefits data engineers
- Using Alteryx Designer
- Why is Alteryx Designer suitable for data engineering?
- Building a workflow in Designer
- What can the InDB tools do?
- Best practices for Designer workflows
- Leveraging Alteryx Server and Alteryx Connect
- How can you use Alteryx Server to orchestrate a data pipeline?
- How does Connect help with discoverability?
- Using this book in your data engineering work
- How does the Alteryx platform come together for data engineering?
- Examples where Alteryx is used for data engineering
- Summary
Chapter 2: Data Engineering with Alteryx
- What is a data engineer?
- Using Alteryx products as a data engineer
- Creating with Designer for data engineers
- Automating with Server for data engineers
- Connecting end users with Alteryx Connect
- Applying DataOps as an Alteryx data engineer
- Summary
Chapter 3: DataOps and Its Benefits
- The benefits the DataOps framework brings to your organization
- Faster cycle times
- Faster access to actionable insights
- Improved robustness of data processes
- Provides an overview of the entire data flow
- Strong security and conformance
- Understanding DataOps principles
- The People pillar
- The Delivery pillar
- The Confidence pillar
- Applying DataOps to Alteryx
- Supporting the People pillar with Alteryx
- Using Alteryx to deliver data pipelines
- Building confidence with Alteryx
- Using Alteryx software with DataOps
- Alteryx Designer
- Alteryx Server
- Alteryx Connect
- General steps for deploying DataOps in your environment
- Summary
Part 2: Functional Steps in DataOps
Chapter 4: Sourcing the Data
- Technical requirements
- Accessing internal data sources
- Data source types
- Using the Alteryx Input Data tool
- Integrating public data sources with Download tool use
- Identifying whether the data structure has changed
- Creating our first validation test
- Leveraging external data sources from authenticated APIs
- Connecting to an API with URL parameters
- Connecting to an API in call headers
- Initial cleansing of datasets
- A simple cleansing process
- A consolidated cleansing process
- Constructing a data pipeline in Alteryx Designer
- Configuring the annotations and names
- Adding initial documentation in the workflow
- Calculating the first set of statistical values
- Saving the processed dataset
- Summary
Chapter 5: Data Processing and Transformations
- Technical requirements
- The data cleansing process
- Selecting columns
- Filtering to relevant rows
- Generating features and modifying columns with formulas
- Summarizing the dataset
- Profiling data with summary and statistical aggregations
- Investigating the variation and size range of your dataset
- Investigating distributions in your dataset
- Correcting missing values in your dataset
- Transforming our data pipeline
- Transforming the downloaded data
- Profiling our dataset
- Summary
Chapter 6: Destination Management
- Technical requirements
- Writing to destinations
- Writing to files
- Managing database connections
- Using standard connections
- How to load data faster with bulk loaders
- Leveraging a database's custom tools
- Accessing more data sources with custom connections
- Integrating data pipelines across environments
- Using a secrets file or environment variable
- Creating a DSN in the Windows ODBC manager
- Getting the most from a connection
- Publishing the external data to a Snowflake destination
- Installing the drivers
- Creating the Snowflake ODBC DSN
- Summary
Chapter 7: Extracting Value
- Technical requirements
- Exploratory data analysis in Alteryx and surfacing the datasets for BI tools
- Identifying missing values and summarizing fields
- Understanding your value distribution
- Finding relationships between fields
- Identifying any outliers in the dataset
- Understanding the difference between the Interactive Chart tool and the Insights tool
- Making our datasets available for other BI tools
- Using Alteryx to deliver standard reports
- Creating a formatted table
- Adding visualizations to the report
- Adding styling to the report
- Outputting the report for consumption
- Summary
Chapter 8: Beginning Advanced Analytics
- Technical requirements
- Implementing spatial analytics with Alteryx
- Creating a spatial point
- Geocoding addresses to make spatial points
- Generating trade areas for analysis
- Combining data streams with spatial information
- Summarizing spatial information
- Beginning the ML process in Alteryx
- Using the Intelligence Suite
- Building workflows with R-based predictive tools
- Creating a custom Python or R script in a workflow
- Summary
Part 3: Governance of DataOps
Chapter 9: Testing Workflows and Outputs
- Technical requirements
- Workflow tests and messages
- Monitoring workflows with the Message tool
- Monitoring workflows with the Test tool
- Using the Community CREW test macros
- Validating data outputs
- Automating the result monitoring actions
- Running tests on the output dataset
- Confirming the country of our place search
- Centralizing the monitoring outputs with Insights
- Building a control chart monitoring system
- Using insights on your Alteryx server
- Summary
Chapter 10: Monitoring DataOps and Managing Changes
- Technical requirements
- Using the Alteryx Server monitoring workflow
- Accessing and installing the server monitoring workflow
- Reading the PDF report
- Using the data output
- Creating an insight dashboard for workflow monitoring
- Creating a monitoring dashboard
- Exporting the data output for external reports
- Exporting the MongoDB database for custom analysis
- The MongoDB schema
- Modifying the Server Monitoring workflow
- Using Git and GitHub Actions for continuous integration
- Saving workflow changes with Git
- Verifying the XML workflow
- Applying standards with GitHub Actions
- Summary
Chapter 11: Securing and Managing Access
- Technical requirements
- Organizing content on Alteryx Server
- My Workspace
- Collections
- Districts
- Managing collections
- Creating collections
- Securing the data environment
- Alteryx Server architecture
- Summary
Chapter 12: Making Data Easy to Use and Discoverable with Alteryx
- Technical requirements
- What is Alteryx Connect, and how does it help DataOps?
- What is Connect?
- Areas of the Connect interface
- Using Connect for DataOps
- Publishing the data lineage to Alteryx Connect
- Loading metadata directly from Connect
- Using the prebuilt workflow apps
- Creating a custom-built data source with the Connect APIs
- Data nexus
- Syncing the Connect data dictionary with other data catalogs
- Using the Connect API methods
- Tableau Data Dictionary API example
- Summary
Chapter 13: Conclusion
- The Alteryx data engineer
- The functional steps in DataOps
- Sourcing the data
- Data processing and transformations
- Destination management
- Extracting value from data
- Beginning advanced analytics
- Governance of DataOps with Alteryx
- Testing workflows and outputs
- Monitoring DataOps and managing changes
- Securing and managing access
- Making data easy to use and discoverable with Alteryx
- Our Alteryx data pipeline
- Final summary
Why subscribe?
Other Books You May Enjoy
- Packt is searching for authors like you
- Share Your Thoughts
How to download the source code
1. Go to: https://github.com/PacktPublishing
2. In the Find a repository… box, search for the book title: Data Engineering with Alteryx: Helping data engineers apply DataOps practices with Alteryx. If the full title returns no results, search for the main title only.
3. Click the book title in the search results.
4. Click Code to download.