Overview

What is Kubeflow Pipelines?

Kubeflow Pipelines (KFP) is a platform for building and deploying portable and scalable machine learning (ML) workflows using containers on Kubernetes-based systems. With KFP you can author components and pipelines using the KFP Python SDK, compile pipelines to an intermediate representation YAML, and submit the pipeline to run on a KFP-conformant backend such as the open source KFP backend, Google Cloud Vertex AI Pipelines, or KFP local.

The open source KFP backend is available as a core component of Kubeflow or as a standalone installation.

Why Kubeflow Pipelines?

KFP enables data scientists and machine learning engineers to:

  • Author end-to-end ML workflows natively in Python

  • Create fully custom ML components or leverage an ecosystem of existing components

  • Easily pass parameters and ML artifacts between pipeline components

  • Easily manage, track, and visualize pipeline definitions, runs, experiments, and ML artifacts

  • Efficiently use compute resources through parallel task execution and through caching to eliminate redundant executions

  • Keep experimentation and iteration light and Python-centric, minimizing the need to (re)build and maintain containers

  • Maintain cross-platform pipeline portability through a platform-neutral IR YAML pipeline definition

  • Abstract Kubernetes complexity while running pipelines on your organization’s existing infrastructure investments (on-prem, cloud, or hybrid)

What is a pipeline?

A pipeline is a definition of a workflow that composes one or more components together to form a computational directed acyclic graph (DAG). At runtime, each component execution corresponds to a single container execution, which may create ML artifacts. Pipelines may also feature control flow.

What is a component?

Components are the building blocks of KFP pipelines. A component is a remote function definition; it specifies inputs, has user-defined logic in its body, and can create outputs. When the component template is instantiated with input parameters, we call it a task.

KFP provides two high-level ways to author components: Python Components and Container Components.

Python Components are a convenient way to author components implemented in pure Python. There are two specific types of Python components: Lightweight Python Components and Containerized Python Components.

Container Components expose a more flexible, advanced authoring approach by allowing you to define a component using an arbitrary container definition. This is the recommended approach for components that are not implemented in pure Python.

Importer Components are a special “pre-baked” component provided by KFP which allows you to import an artifact into your pipeline when that artifact was not created by tasks within the pipeline.

What is a compiled pipeline?

A compiled pipeline, often referred to as an IR YAML, is an intermediate representation (IR) of a compiled pipeline or component. The IR YAML is not intended to be written directly.

While IR YAML is not intended to be easily human-readable, you can still inspect it if you know a bit about its contents: