kfp.components package

class kfp.components.ComponentStore(local_search_paths=None, url_search_prefixes=None, auth=None, uri_search_template=None)[source]

Bases: object

Component store.

Enables external components to be loaded by name and digest/tag.

local_search_paths

A list of local directories to include in the search.

url_search_prefixes

A list of URL prefixes to include in the search.

uri_search_template

A URI template for components, which may include {name}, {digest} and {tag} variables.
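
For illustration, the template substitution can be sketched with plain string formatting (the template URL below is hypothetical, not an official component host):

```python
# Sketch of how a uri_search_template might be resolved (hypothetical template).
# The {name}, {digest} and {tag} placeholders are filled in per lookup.
template = 'https://example.com/components/{name}/versions/sha256/{digest}'

def resolve_uri(template: str, **parts: str) -> str:
    """Substitute the known placeholders into the template."""
    return template.format(**parts)

uri = resolve_uri(template, name='XGBoost/Train', digest='abc123')
```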

default_store = <kfp.components._component_store.ComponentStore object>
list()[source]
load_component(name, digest=None, tag=None)[source]

Loads a component from a local file or URL and creates a task factory function.

Search locations:

  • <local-search-path>/<name>/component.yaml
  • <url-search-prefix>/<name>/component.yaml

If the digest is specified, then the search locations are:

  • <local-search-path>/<name>/versions/sha256/<digest>
  • <url-search-prefix>/<name>/versions/sha256/<digest>

If the tag is specified, then the search locations are:

  • <local-search-path>/<name>/versions/tags/<tag>
  • <url-search-prefix>/<name>/versions/tags/<tag>
Parameters:
  • name – Component name used to search and load the component artifact containing the component definition. Component name usually has the following form: group/subgroup/component
  • digest – Strict component version. SHA256 hash digest of the component artifact file. Can be used to load a specific component version so that the pipeline is reproducible.
  • tag – Version tag. Can be used to load a component version from a specific branch. The version of the component referenced by a tag can change in the future.
Returns:

A factory function with a strongly-typed signature. Once called with the required arguments, the factory constructs a pipeline task instance (ContainerOp).
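
The search order above can be sketched as a small helper that generates candidate locations from the configured prefixes (an illustrative sketch, not the actual implementation):

```python
from typing import List, Optional

def candidate_locations(
    prefixes: List[str],
    name: str,
    digest: Optional[str] = None,
    tag: Optional[str] = None,
) -> List[str]:
    """Generate candidate component locations in the documented search order."""
    if digest is not None:
        suffix = f'{name}/versions/sha256/{digest}'
    elif tag is not None:
        suffix = f'{name}/versions/tags/{tag}'
    else:
        suffix = f'{name}/component.yaml'
    return [prefix.rstrip('/') + '/' + suffix for prefix in prefixes]

# One candidate per configured search prefix:
candidate_locations(['https://example.com/components/'], 'XGBoost/Train')
```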

load_component_from_file(path)[source]

Loads a component from a path.

Parameters:path – The path of the component specification.
Returns:A factory function with a strongly-typed signature.
load_component_from_url(url)[source]

Loads a component from a URL.

Parameters:url – The URL of the component specification.
Returns:A factory function with a strongly-typed signature.
search(name: str)[source]

Searches for components by name in the configured component store.

Prints the component name and URL for components that match the given name. Only components on GitHub are currently supported.

Example:

kfp.components.ComponentStore.default_store.search('xgboost')

# Returns results:
#     Xgboost train   https://raw.githubusercontent.com/.../components/XGBoost/Train/component.yaml
#     Xgboost predict https://raw.githubusercontent.com/.../components/XGBoost/Predict/component.yaml
class kfp.components.InputArtifact(type: Optional[str] = None)[source]

Bases: object

InputArtifact function parameter annotation.

When creating a component from a Python function, indicates to the system that a function parameter with this annotation should be passed as a RuntimeArtifact.

class kfp.components.InputBinaryFile(type=None)[source]

Bases: object

When creating a component from a function, InputBinaryFile should be used as a function parameter annotation to tell the system to pass a binary data stream object (io.BytesIO) to the function instead of passing the actual data.

class kfp.components.InputPath(type=None)[source]

Bases: object

When creating a component from a function, InputPath should be used as a function parameter annotation to tell the system to pass the data file path to the function instead of passing the actual data.

class kfp.components.InputTextFile(type=None)[source]

Bases: object

When creating a component from a function, InputTextFile should be used as a function parameter annotation to tell the system to pass a text data stream object (io.TextIOWrapper) to the function instead of passing the actual data.

class kfp.components.OutputArtifact(type: Optional[str] = None)[source]

Bases: object

OutputArtifact function parameter annotation.

When creating a component from a function, OutputArtifact indicates that the associated parameter should be treated as an MLMD artifact whose underlying content, together with its metadata, will be updated by this component.

class kfp.components.OutputBinaryFile(type=None)[source]

Bases: object

When creating a component from a function, OutputBinaryFile should be used as a function parameter annotation to tell the system that the function wants to output data by writing it into a given binary file stream (io.BytesIO) instead of returning the data from the function.

class kfp.components.OutputPath(type=None)[source]

Bases: object

When creating a component from a function, OutputPath should be used as a function parameter annotation to tell the system that the function wants to output data by writing it into a file with the given path instead of returning the data from the function.
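
At run time, parameters annotated with InputPath or OutputPath receive plain path strings, so a component function body can be exercised as ordinary Python. A minimal sketch (the KFP annotations are shown only in the comment, so the snippet runs without the SDK installed):

```python
import os
import tempfile

# In a real component this would be declared as
#   def double_file(numbers_path: InputPath('CSV'), doubled_path: OutputPath('CSV')):
# Inside the container, both parameters arrive as plain file paths.
def double_file(numbers_path: str, doubled_path: str):
    """Read numbers from the input file and write their doubles to the output file."""
    with open(numbers_path) as f:
        numbers = [float(line) for line in f if line.strip()]
    with open(doubled_path, 'w') as f:
        for n in numbers:
            f.write(f'{n * 2}\n')

# Local exercise of the function body with temporary files:
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, 'in.txt')
    dst = os.path.join(d, 'out.txt')
    with open(src, 'w') as f:
        f.write('1\n2\n3\n')
    double_file(src, dst)
```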

class kfp.components.OutputTextFile(type=None)[source]

Bases: object

When creating a component from a function, OutputTextFile should be used as a function parameter annotation to tell the system that the function wants to output data by writing it into a given text file stream (io.TextIOWrapper) instead of returning the data from the function.
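
At run time, the stream annotations deliver open file-like objects, so a component function body can be tested with in-memory streams. A sketch (the KFP annotations appear only in the comment, so the snippet runs without the SDK; the type names passed to the annotations are illustrative):

```python
import io

# In a real component this would be declared as
#   def uppercase(src: InputTextFile('Text'), dst: OutputTextFile('Text')):
# The system passes open text streams instead of the data itself.
def uppercase(src, dst):
    """Copy the input stream to the output stream, uppercasing each line."""
    for line in src:
        dst.write(line.upper())

# Local exercise with in-memory streams:
src = io.StringIO('hello\nworld\n')
dst = io.StringIO()
uppercase(src, dst)
```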

kfp.components.create_component_from_airflow_op(op_class: type, base_image: str = 'apache/airflow:master-python3.6-ci', variable_output_names: List[str] = None, xcom_output_names: List[str] = None, modules_to_capture: List[str] = None)[source]

Creates component function from an Airflow operator class. The inputs of the component are the same as the operator constructor parameters. By default the component has the following outputs: “Result”, “Variables” and “XComs”. “Variables” and “XComs” are serialized JSON maps of all variables and xcoms produced by the operator during the execution. Use the variable_output_names and xcom_output_names parameters to output individual variables/xcoms as separate outputs.

Parameters:
  • op_class – Reference to the Airflow operator class (e.g. EmailOperator or BashOperator) to convert to a component.
  • base_image – Optional. The container image to use for the component. Default is apache/airflow. The container image must have the same python version as the environment used to run create_component_from_airflow_op. The image should have python 3.5+ with airflow package installed.
  • variable_output_names – Optional. A list of Airflow “variables” produced by the operator that should be returned as separate outputs.
  • xcom_output_names – Optional. A list of Airflow “XComs” produced by the operator that should be returned as separate outputs.
  • modules_to_capture – Optional. A list of names of additional modules that the operator depends on. By default only the module containing the operator class is captured. If the operator class uses the code from another module, the name of that module can be specified in this list.
kfp.components.create_component_from_func(func: Callable, output_component_file: Optional[str] = None, base_image: Optional[str] = None, packages_to_install: List[str] = None, annotations: Optional[Mapping[str, str]] = None)[source]

Converts a Python function to a component and returns a task factory (a function that accepts arguments and returns a task object).

Parameters:
  • func – The python function to convert
  • base_image – Optional. Specify a custom Docker container image to use in the component. For lightweight components, the image needs to have python 3.5+. Default is the python image corresponding to the current python environment.
  • output_component_file – Optional. Write a component definition to a local file. The produced component file can be loaded back by calling load_component_from_file or load_component_from_url.
  • packages_to_install – Optional. List of [versioned] python packages to pip install before executing the user function.
  • annotations – Optional. Allows adding arbitrary key-value data to the component specification.
Returns:

A factory function with a strongly-typed signature taken from the python function. Once called with the required arguments, the factory constructs a task instance that can run the original function in a container.

Examples

The function name and docstring are used as component name and description. Argument and return annotations are used as component input/output types:

def add(a: float, b: float) -> float:
    """Returns sum of two arguments"""
    return a + b

# add_op is a task factory function that creates a task object when given arguments
add_op = create_component_from_func(
    func=add,
    base_image='python:3.7', # Optional
    output_component_file='add.component.yaml', # Optional
    packages_to_install=['pandas==0.24'], # Optional
)

# The component spec can be accessed through the .component_spec attribute:
add_op.component_spec.save('add.component.yaml')

# The component function can be called with arguments to create a task:
add_task = add_op(1, 3)

# The resulting task has output references, corresponding to the component outputs.
# When the function only has a single anonymous return value, the output name is "Output":
sum_output_ref = add_task.outputs['Output']

# These task output references can be passed to other component functions, constructing a computation graph:
task2 = add_op(sum_output_ref, 5)

create_component_from_func function can also be used as decorator:

@create_component_from_func
def add_op(a: float, b: float) -> float:
    """Returns sum of two arguments"""
    return a + b

To declare a function with multiple return values, use the NamedTuple return annotation syntax:

from typing import NamedTuple

def add_multiply_two_numbers(a: float, b: float) -> NamedTuple('Outputs', [('sum', float), ('product', float)]):
    """Returns sum and product of two arguments"""
    return (a + b, a * b)

add_multiply_op = create_component_from_func(add_multiply_two_numbers)

# The component function can be called with arguments to create a task:
add_multiply_task = add_multiply_op(1, 3)

# The resulting task has output references, corresponding to the component outputs:
sum_output_ref = add_multiply_task.outputs['sum']

# These task output references can be passed to other component functions, constructing a computation graph:
task2 = add_multiply_op(sum_output_ref, 5)

Bigger data should be read from files and written to files. Use the kfp.components.InputPath parameter annotation to tell the system that the function wants to consume the corresponding input data as a file. The system will download the data, write it to a local file and then pass the path of that file to the function. Use the kfp.components.OutputPath parameter annotation to tell the system that the function wants to produce the corresponding output data as a file. The system will prepare and pass the path of a file where the function should write the output data. After the function exits, the system will upload the data to the storage system so that it can be passed to downstream components.

You can specify the type of the consumed/produced data by passing the type argument to kfp.components.InputPath and kfp.components.OutputPath. The type can be a Python type or an arbitrary type name string. OutputPath('CatBoostModel') means that the function states that the data it writes to the file has type CatBoostModel. InputPath('CatBoostModel') means that the function expects the data it reads from the file to have type CatBoostModel. When the pipeline author connects inputs to outputs, the system checks whether the types match. Every kind of data can be consumed as a file input. Conversely, bigger data should not be consumed by value, since all value inputs pass through the command line.

Example of a component function declaring file input and output:

def catboost_train_classifier(
    training_data_path: InputPath('CSV'),            # Path to input data file of type "CSV"
    trained_model_path: OutputPath('CatBoostModel'), # Path to output data file of type "CatBoostModel"
    number_of_trees: int = 100,                      # Small output of type "Integer"
) -> NamedTuple('Outputs', [
    ('Accuracy', float),  # Small output of type "Float"
    ('Precision', float), # Small output of type "Float"
    ('JobUri', 'URI'),    # Small output of type "URI"
]):
    """Trains CatBoost classification model"""
    ...

    return (accuracy, precision, job_uri)
kfp.components.create_component_from_func_v2(func: Callable, base_image: Optional[str] = None, packages_to_install: List[str] = None, output_component_file: Optional[str] = None, install_kfp_package: bool = True, kfp_package_path: Optional[str] = None)[source]

Converts a Python function to a v2 lightweight component.

A lightweight component is a self-contained Python function that includes all necessary imports and dependencies.

Parameters:
  • func – The python function to create a component from. The function should have type annotations for all its arguments, indicating how it is intended to be used (e.g. as an input/output Artifact object, a plain parameter, or a path to a file).
  • base_image – The image to use when executing func. It should contain a default Python interpreter that is compatible with KFP.
  • packages_to_install – A list of optional packages to install before executing func.
  • install_kfp_package – Specifies whether to add the KFP Python package to packages_to_install. Lightweight Python functions always require an installation of KFP in base_image to work. If you specify a base_image that already contains KFP, you can set this to False.
  • kfp_package_path – Specifies the location from which to install KFP. By default, this will try to install from PyPI using the same version as the one used when this component was created. KFP developers can override this to point to a GitHub pull request or another pip-compatible location when testing changes to lightweight Python functions.
Returns:

A component task factory that can be used in pipeline definitions.
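
Because a lightweight component must be self-contained, all imports go inside the function body. The decorated function remains callable as plain Python, which makes local testing straightforward; in the sketch below the decorator line is commented out so the snippet runs without the KFP SDK installed:

```python
# @create_component_from_func_v2   # would wrap this function into a v2 component
def mean_of(values: str) -> float:
    """Self-contained lightweight function: the import lives inside the body."""
    import json
    numbers = json.loads(values)
    return sum(numbers) / len(numbers)

# The function can be called directly for local testing before compilation:
mean_of('[1, 2, 3, 4]')
```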

kfp.components.create_graph_component_from_pipeline_func(pipeline_func: Callable, output_component_file: str = None, embed_component_specs: bool = False, annotations: Optional[Mapping[str, str]] = None) → Callable[source]

Creates a graph component definition from a Python pipeline function. The component file can be published for sharing.

A pipeline function is a function that only calls component functions and passes outputs to inputs. This feature is experimental and lacks support for some DSL features such as conditions and loops. Only pipelines consisting of loaded components or Python components are currently supported (no manually created ContainerOps or ResourceOps).

Warning

Please note this feature is considered experimental!

Parameters:
  • pipeline_func – Python function to convert
  • output_component_file – Path of the file where the component definition will be written. The component.yaml file can then be published for sharing.
  • embed_component_specs – Whether to embed component definitions or just reference them. Embedding makes the graph component self-contained. Default is False.
  • annotations – Optional. Allows adding arbitrary key-value data to the component specification.
Returns:

A function representing the graph component. The component spec can be accessed using the .component_spec attribute. The function has the same parameters as the original function. When called, it returns a task object corresponding to the graph component. To reference the outputs of the task, use task.outputs['Output name'].

Example:

producer_op = load_component_from_file('producer/component.yaml')
processor_op = load_component_from_file('processor/component.yaml')

def pipeline1(pipeline_param_1: int):
    producer_task = producer_op()
    processor_task = processor_op(pipeline_param_1, producer_task.outputs['Output 2'])

    return OrderedDict([
        ('Pipeline output 1', producer_task.outputs['Output 1']),
        ('Pipeline output 2', processor_task.outputs['Output 2']),
    ])

create_graph_component_from_pipeline_func(pipeline1, output_component_file='pipeline.component.yaml')
kfp.components.func_to_component_text(func, extra_code='', base_image: str = None, packages_to_install: List[str] = None, modules_to_capture: List[str] = None, use_code_pickling=False)[source]

Converts a Python function to a component definition and returns its textual representation.

Function docstring is used as component description. Argument and return annotations are used as component input/output types.

To declare a function with multiple return values, use the NamedTuple return annotation syntax:

from typing import NamedTuple
def add_multiply_two_numbers(a: float, b: float) -> NamedTuple('DummyName', [('sum', float), ('product', float)]):
    """Returns sum and product of two arguments"""
    return (a + b, a * b)
Parameters:
  • func – The python function to convert
  • base_image – Optional. Specify a custom Docker container image to use in the component. For lightweight components, the image needs to have python 3.5+. Default is python:3.7
  • extra_code – Optional. Extra code to add before the function code. Can be used as workaround to define types used in function signature.
  • packages_to_install – Optional. List of [versioned] python packages to pip install before executing the user function.
  • modules_to_capture – Optional. List of module names that will be captured (instead of just referenced) during the dependency scan. By default only func.__module__ is captured. The actual algorithm: starting with the initial function, traverse its dependencies. If a dependency's __module__ is in the modules_to_capture list, it is captured and its dependencies are traversed in turn. Otherwise the dependency is only referenced instead of captured, and its dependencies are not traversed.
  • use_code_pickling – Specifies whether the function code should be captured using pickling, as opposed to source code manipulation. Pickling has better support for capturing dependencies, but is sensitive to version mismatches between the Python environment used for component creation and the runtime image.
Returns:

Textual representation of a component definition
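
The capture-versus-reference rule for modules_to_capture described above can be sketched as a recursive scan over a dependency graph (illustrative pseudologic with a hypothetical graph, not the actual implementation):

```python
from typing import Dict, List, Set

def scan(obj: str, module_of: Dict[str, str], deps: Dict[str, List[str]],
         modules_to_capture: Set[str], captured: Set[str], referenced: Set[str]):
    """Capture dependencies whose module is listed; otherwise only reference them."""
    for dep in deps.get(obj, []):
        if module_of[dep] in modules_to_capture:
            if dep not in captured:
                captured.add(dep)
                # Captured dependencies have their own dependencies traversed.
                scan(dep, module_of, deps, modules_to_capture, captured, referenced)
        else:
            # Referenced dependencies are not traversed further.
            referenced.add(dep)

# Hypothetical graph: func uses helper (same module) and np_mean (from numpy).
module_of = {'func': 'my_mod', 'helper': 'my_mod', 'np_mean': 'numpy'}
deps = {'func': ['helper', 'np_mean'], 'helper': []}
captured, referenced = set(), set()
scan('func', module_of, deps, {'my_mod'}, captured, referenced)
```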

kfp.components.func_to_container_op(func: Callable, output_component_file: Optional[str] = None, base_image: Optional[str] = None, extra_code: Optional[str] = '', packages_to_install: List[str] = None, modules_to_capture: List[str] = None, use_code_pickling: bool = False, annotations: Optional[Mapping[str, str]] = None)[source]
Converts a Python function to a component and returns a task (kfp.dsl.ContainerOp) factory.

Function docstring is used as component description. Argument and return annotations are used as component input/output types.

To declare a function with multiple return values, use the NamedTuple return annotation syntax:

from typing import NamedTuple
def add_multiply_two_numbers(a: float, b: float) -> NamedTuple('DummyName', [('sum', float), ('product', float)]):
    """Returns sum and product of two arguments"""
    return (a + b, a * b)
Parameters:
  • func – The python function to convert
  • base_image – Optional. Specify a custom Docker container image to use in the component. For lightweight components, the image needs to have python 3.5+. Default is tensorflow/tensorflow:1.13.2-py3
  • output_component_file – Optional. Write a component definition to a local file. Can be used for sharing.
  • extra_code – Optional. Extra code to add before the function code. Can be used as workaround to define types used in function signature.
  • packages_to_install – Optional. List of [versioned] python packages to pip install before executing the user function.
  • modules_to_capture – Optional. List of module names that will be captured (instead of just referenced) during the dependency scan. By default only func.__module__ is captured. The actual algorithm: starting with the initial function, traverse its dependencies. If a dependency's __module__ is in the modules_to_capture list, it is captured and its dependencies are traversed in turn. Otherwise the dependency is only referenced instead of captured, and its dependencies are not traversed.
  • use_code_pickling – Specifies whether the function code should be captured using pickling, as opposed to source code manipulation. Pickling has better support for capturing dependencies, but is sensitive to version mismatches between the Python environment used for component creation and the runtime image.
  • annotations – Optional. Allows adding arbitrary key-value data to the component specification.
Returns:

A factory function with a strongly-typed signature taken from the python function. Once called with the required arguments, the factory constructs a pipeline task instance (kfp.dsl.ContainerOp) that can run the original function in a container.

kfp.components.load_component(filename=None, url=None, text=None, component_spec=None)[source]

Loads component from text, file or URL and creates a task factory function.

Exactly one of the arguments should be specified.

Parameters:
  • filename – Path of local file containing the component definition.
  • url – The URL of the component file data.
  • text – A string containing the component file data.
  • component_spec – A ComponentSpec containing the component definition.
Returns:

A factory function with a strongly-typed signature. Once called with the required arguments, the factory constructs a pipeline task instance (ContainerOp).
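
The "exactly one argument" rule can be sketched as a simple validation step (illustrative only, not the actual implementation):

```python
def check_single_source(filename=None, url=None, text=None, component_spec=None):
    """Raise unless exactly one component source argument was provided."""
    provided = [v for v in (filename, url, text, component_spec) if v is not None]
    if len(provided) != 1:
        raise ValueError(
            'Exactly one of filename, url, text or component_spec must be specified.')
    return provided[0]
```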

kfp.components.load_component_from_file(filename)[source]

Loads component from file and creates a task factory function.

Parameters:filename – Path of local file containing the component definition.
Returns:A factory function with a strongly-typed signature. Once called with the required arguments, the factory constructs a pipeline task instance (ContainerOp).
kfp.components.load_component_from_text(text)[source]

Loads component from text and creates a task factory function.

Parameters:text – A string containing the component file data.
Returns:A factory function with a strongly-typed signature. Once called with the required arguments, the factory constructs a pipeline task instance (ContainerOp).
kfp.components.load_component_from_url(url: str, auth=None)[source]

Loads component from URL and creates a task factory function.

Parameters:
  • url – The URL of the component file data.
  • auth – Optional. Auth object for the requests library, used when fetching the component from an authenticated URL.
Returns:

A factory function with a strongly-typed signature. Once called with the required arguments, the factory constructs a pipeline task instance (ContainerOp).