What is a Data Pipeline?
A data pipeline is a sequence of steps that collect, process, and move data from one system to another for storage, analytics, machine learning, or other uses. For example, data pipelines are often used to send data from applications to storage systems such as data warehouses or data lakes. Data pipelines are also frequently used to pull data from storage and transform it into a format that is useful for analytics.
Data pipelines generally consist of three overall steps: extraction, transformation, and loading. These steps can be executed in different orders, such as ETL or ELT. In either case, pipelines are used to extract and transform data to achieve business goals. The series of transformations required to execute an effective pipeline can be very complex. For that reason, specialized software is often used to aid and automate the data pipelining process.
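The extract, transform, and load steps above can be sketched as three small functions chained together. This is a minimal, hypothetical example: the sample CSV data, the field names, and the in-memory list standing in for a warehouse table are all illustrative assumptions, not part of any particular product's API.

```python
import csv
import io

# Hypothetical raw export from an application (an in-memory CSV for illustration).
RAW_CSV = """order_id,amount,currency
1001,19.99,usd
1002,5.00,USD
1003,42.50,usd
"""

def extract(source: str) -> list[dict]:
    """Extract: read raw rows from the source system."""
    return list(csv.DictReader(io.StringIO(source)))

def transform(rows: list[dict]) -> list[dict]:
    """Transform: normalize types and values into an analytics-friendly shape."""
    return [
        {
            "order_id": int(r["order_id"]),
            "amount_cents": round(float(r["amount"]) * 100),
            "currency": r["currency"].upper(),
        }
        for r in rows
    ]

def load(rows: list[dict], warehouse: list) -> None:
    """Load: append the cleaned rows to the destination (a list stands in for a warehouse table)."""
    warehouse.extend(rows)

warehouse: list[dict] = []
load(transform(extract(RAW_CSV)), warehouse)
print(warehouse[0])  # {'order_id': 1001, 'amount_cents': 1999, 'currency': 'USD'}
```

An ELT pipeline would simply reorder the calls: load the raw rows into the destination first, then run the transformation there.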