kfp.components package

class kfp.components.ComponentStore(local_search_paths=None, url_search_prefixes=None)[source]

Bases: object

default_store = <kfp.components._component_store.ComponentStore object>
load_component(name, digest=None, tag=None)[source]

Loads a component from a local file or URL and creates a task factory function

Search locations: <local-search-path>/<name>/component.yaml <url-search-prefix>/<name>/component.yaml

If the digest is specified, then the search locations are: <local-search-path>/<name>/versions/sha256/<digest> <url-search-prefix>/<name>/versions/sha256/<digest>

If the tag is specified, then the search locations are: <local-search-path>/<name>/versions/tags/<tag> <url-search-prefix>/<name>/versions/tags/<tag>

Parameters:
  • name – Component name used to search and load the component artifact containing the component definition. Component name usually has the following form: group/subgroup/component
  • digest – Strict component version. SHA256 hash digest of the component artifact file. Can be used to load a specific component version so that the pipeline is reproducible.
  • tag – Version tag. Can be used to load a component version from a specific branch. The version of the component referenced by a tag can change in the future.
Returns:

A factory function with a strongly-typed signature. Once called with the required arguments, the factory constructs a pipeline task instance (ContainerOp).
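The resolution order described above can be sketched in plain Python. This is a hypothetical re-implementation for illustration only; the function name and the way prefixes are joined are assumptions, not the library's actual code:

```python
def candidate_locations(name, local_search_paths, url_search_prefixes,
                        digest=None, tag=None):
    """Hypothetical sketch of the search-location resolution described above."""
    if digest is not None:
        suffix = f"{name}/versions/sha256/{digest}"
    elif tag is not None:
        suffix = f"{name}/versions/tags/{tag}"
    else:
        suffix = f"{name}/component.yaml"
    # Local search paths are tried before URL prefixes.
    return ([f"{path}/{suffix}" for path in local_search_paths]
            + [f"{prefix}{suffix}" for prefix in url_search_prefixes])

# Example (hypothetical store layout):
locations = candidate_locations(
    "group/subgroup/component",
    local_search_paths=["."],
    url_search_prefixes=["https://example.com/components/"],
)
```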

load_component_from_file(path)[source]
load_component_from_url(url)[source]
class kfp.components.InputBinaryFile(type=None)[source]

Bases: object

When creating a component from a function, InputBinaryFile should be used as a function parameter annotation to tell the system to pass a binary data stream object (io.BytesIO) to the function instead of the actual data.

class kfp.components.InputPath(type=None)[source]

Bases: object

When creating a component from a function, InputPath should be used as a function parameter annotation to tell the system to pass the data file path to the function instead of the actual data.

class kfp.components.InputTextFile(type=None)[source]

Bases: object

When creating a component from a function, InputTextFile should be used as a function parameter annotation to tell the system to pass a text data stream object (io.TextIOWrapper) to the function instead of the actual data.

class kfp.components.OutputBinaryFile(type=None)[source]

Bases: object

When creating a component from a function, OutputBinaryFile should be used as a function parameter annotation to tell the system that the function wants to output data by writing it into a given binary file stream (io.BytesIO) instead of returning the data from the function.

class kfp.components.OutputPath(type=None)[source]

Bases: object

When creating a component from a function, OutputPath should be used as a function parameter annotation to tell the system that the function wants to output data by writing it into a file with the given path instead of returning the data from the function.

class kfp.components.OutputTextFile(type=None)[source]

Bases: object

When creating a component from a function, OutputTextFile should be used as a function parameter annotation to tell the system that the function wants to output data by writing it into a given text file stream (io.TextIOWrapper) instead of returning the data from the function.
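The stream-based contract above can be exercised without a pipeline at all. The sketch below defines minimal stand-in classes (hypothetical; in real code these are kfp.components.InputTextFile and OutputTextFile) so that the example runs without kfp installed, then calls the function directly with ordinary text streams, mimicking what the system does at runtime:

```python
import io

# Hypothetical stand-ins so this sketch runs without kfp installed:
class InputTextFile:
    def __init__(self, type=None):
        self.type = type

class OutputTextFile:
    def __init__(self, type=None):
        self.type = type

def repeat_lines(source: InputTextFile(str), sink: OutputTextFile(str), count: int = 2):
    # At pipeline runtime the system opens the streams for the function;
    # the function body only reads and writes text.
    for line in source:
        sink.write(line * count)

# Direct call with ordinary text streams:
src = io.StringIO("hello\n")
dst = io.StringIO()
repeat_lines(src, dst, count=3)
```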

kfp.components.create_component_from_airflow_op(op_class: type, base_image: str = 'apache/airflow:master-python3.6-ci', variable_output_names: List[str] = None, xcom_output_names: List[str] = None, modules_to_capture: List[str] = None)[source]

Creates a component function from an Airflow operator class. The inputs of the component are the same as the operator constructor parameters. By default the component has the following outputs: “Result”, “Variables” and “XComs”. “Variables” and “XComs” are serialized JSON maps of all variables and XComs produced by the operator during execution. Use the variable_output_names and xcom_output_names parameters to output individual variables/XComs as separate outputs.

Parameters:
  • op_class – Reference to the Airflow operator class (e.g. EmailOperator or BashOperator) to convert to a component.
  • base_image – Optional. The container image to use for the component. Default is apache/airflow:master-python3.6-ci. The container image must have the same Python version as the environment used to run create_component_from_airflow_op. The image should have Python 3.5+ with the airflow package installed.
  • variable_output_names – Optional. A list of Airflow “variables” produced by the operator that should be returned as separate outputs.
  • xcom_output_names – Optional. A list of Airflow “XComs” produced by the operator that should be returned as separate outputs.
  • modules_to_capture – Optional. A list of names of additional modules that the operator depends on. By default only the module containing the operator class is captured. If the operator class uses the code from another module, the name of that module can be specified in this list.
kfp.components.create_component_from_func(func: Callable, output_component_file: str = None, base_image: str = None, packages_to_install: List[str] = None)[source]

Converts a Python function to a component and returns a task factory (a function that accepts arguments and returns a task object).

Function name and docstring are used as component name and description. Argument and return annotations are used as component input/output types. Example:

def add(a: float, b: float) -> float:
    """Returns sum of two arguments"""
    return a + b

# add_op is a task factory function that creates a task object when given arguments
add_op = create_component_from_func(
    func=add,
    base_image='python:3.7', # Optional
    output_component_file='add.component.yaml', # Optional
    packages_to_install=['pandas==0.24'], # Optional
)

# The component spec can be accessed through the .component_spec attribute:
add_op.component_spec.save('add.component.yaml')

# The component function can be called with arguments to create a task:
add_task = add_op(1, 3)

# The resulting task has output references, corresponding to the component outputs.
# When the function only has a single anonymous return value, the output name is "Output":
sum_output_ref = add_task.outputs['Output']

# These task output references can be passed to other component functions, constructing a computation graph:
task2 = add_op(sum_output_ref, 5)

The create_component_from_func function can also be used as a decorator:

@create_component_from_func
def add_op(a: float, b: float) -> float:
    """Returns sum of two arguments"""
    return a + b

To declare a function with multiple return values, use the NamedTuple return annotation syntax:

from typing import NamedTuple

def add_multiply_two_numbers(a: float, b: float) -> NamedTuple('Outputs', [('sum', float), ('product', float)]):
    """Returns sum and product of two arguments"""
    return (a + b, a * b)

add_multiply_op = create_component_from_func(add_multiply_two_numbers)

# The component function can be called with arguments to create a task:
add_multiply_task = add_multiply_op(1, 3)

# The resulting task has output references, corresponding to the component outputs:
sum_output_ref = add_multiply_task.outputs['sum']

# These task output references can be passed to other component functions, constructing a computation graph:
task2 = add_multiply_op(sum_output_ref, 5)

Bigger data should be read from files and written to files. Use the InputPath parameter annotation to tell the system that the function wants to consume the corresponding input data as a file. The system will download the data, write it to a local file and then pass the path of that file to the function. Use the OutputPath parameter annotation to tell the system that the function wants to produce the corresponding output data as a file. The system will prepare and pass the path of a file where the function should write the output data. After the function exits, the system will upload the data to the storage system so that it can be passed to downstream components.

You can specify the type of the consumed/produced data by passing the type argument to InputPath and OutputPath. The type can be a Python type or an arbitrary type name string. OutputPath('CatBoostModel') states that the data the function writes to the file has type 'CatBoostModel'; InputPath('CatBoostModel') states that the function expects the data it reads from the file to have type 'CatBoostModel'. When the pipeline author connects inputs to outputs, the system checks whether the types match.

Every kind of data can be consumed as a file input. Conversely, bigger data should not be consumed by value, as all value inputs pass through the command line.

Example of a component function declaring file input and output:

def catboost_train_classifier(
    training_data_path: InputPath('CSV'),            # Path to input data file of type "CSV"
    trained_model_path: OutputPath('CatBoostModel'), # Path to output data file of type "CatBoostModel"
    number_of_trees: int = 100,                      # Small output of type "Integer"
) -> NamedTuple('Outputs', [
    ('Accuracy', float),  # Small output of type "Float"
    ('Precision', float), # Small output of type "Float"
    ('JobUri', 'URI'),    # Small output of type "URI"
]):
    """Trains CatBoost classification model"""
    ...

    return (accuracy, precision, job_uri)
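The path-based contract can also be exercised in plain Python. The sketch below is a hypothetical illustration without kfp: a real component would annotate the parameters with InputPath/OutputPath, but the function body's responsibility is the same, namely doing its own file I/O against the paths it receives:

```python
import os
import tempfile

# Hypothetical function following the InputPath/OutputPath contract:
# it receives file *paths* and reads/writes the files itself.
def count_lines(text_path: str, count_path: str):
    with open(text_path) as f:
        n = sum(1 for _ in f)
    # At pipeline runtime, whatever is written to count_path is uploaded
    # by the system after the function exits.
    with open(count_path, "w") as f:
        f.write(str(n))

# Direct call with local files, mimicking what the system does:
with tempfile.TemporaryDirectory() as d:
    in_path = os.path.join(d, "data.txt")
    out_path = os.path.join(d, "count.txt")
    with open(in_path, "w") as f:
        f.write("a\nb\nc\n")
    count_lines(in_path, out_path)
    with open(out_path) as f:
        result = f.read()
```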
Parameters:
  • func – The python function to convert
  • base_image – Optional. Specify a custom Docker container image to use in the component. For lightweight components, the image needs to have python 3.5+. Default is the python image corresponding to the current python environment.
  • output_component_file – Optional. Write a component definition to a local file. The produced component file can be loaded back by calling load_component_from_file or load_component_from_url.
  • packages_to_install – Optional. List of [versioned] python packages to pip install before executing the user function.
Returns:

A factory function with a strongly-typed signature taken from the python function. Once called with the required arguments, the factory constructs a task instance that can run the original function in a container.

kfp.components.create_graph_component_from_pipeline_func(pipeline_func: Callable, output_component_file: str = None, embed_component_specs: bool = False) → Callable[source]

Experimental! Creates a graph component definition from a Python pipeline function. The component file can be published for sharing. A pipeline function is a function that only calls component functions and passes outputs to inputs. This feature is experimental and lacks support for some DSL features such as conditions and loops. Only pipelines consisting of loaded components or Python components are currently supported (no manually created ContainerOps or ResourceOps).

Parameters:
  • pipeline_func – Python function to convert
  • output_component_file – Path of the file where the component definition will be written. The component.yaml file can then be published for sharing.
  • embed_component_specs – Whether to embed component definitions or just reference them. Embedding makes the graph component self-contained. Default is False.
Returns:

A function representing the graph component. The component spec can be accessed using the .component_spec attribute. The function will have the same parameters as the original function. When called, the function will return a task object, corresponding to the graph component. To reference the outputs of the task, use task.outputs[“Output name”].

Example

producer_op = load_component_from_file('producer/component.yaml')
processor_op = load_component_from_file('processor/component.yaml')

def pipeline1(pipeline_param_1: int):
    producer_task = producer_op()
    processor_task = processor_op(pipeline_param_1, producer_task.outputs['Output 2'])

    return OrderedDict([
        ('Pipeline output 1', producer_task.outputs['Output 1']),
        ('Pipeline output 2', processor_task.outputs['Output 2']),
    ])

create_graph_component_from_pipeline_func(pipeline1, output_component_file='pipeline.component.yaml')

kfp.components.func_to_component_text(func, extra_code='', base_image: str = None, packages_to_install: List[str] = None, modules_to_capture: List[str] = None, use_code_pickling=False)[source]

Converts a Python function to a component definition and returns its textual representation

Function docstring is used as component description. Argument and return annotations are used as component input/output types. To declare a function with multiple return values, use the NamedTuple return annotation syntax:

from typing import NamedTuple

def add_multiply_two_numbers(a: float, b: float) -> NamedTuple('DummyName', [('sum', float), ('product', float)]):
    """Returns sum and product of two arguments"""
    return (a + b, a * b)

Parameters:
  • func – The python function to convert
  • base_image – Optional. Specify a custom Docker container image to use in the component. For lightweight components, the image needs to have python 3.5+. Default is tensorflow/tensorflow:1.13.2-py3
  • extra_code – Optional. Extra code to add before the function code. Can be used as workaround to define types used in function signature.
  • packages_to_install – Optional. List of [versioned] python packages to pip install before executing the user function.
  • modules_to_capture – Optional. List of module names that will be captured (instead of just referenced) during the dependency scan. By default only func.__module__ is captured. The actual algorithm: starting with the initial function, traverse its dependencies. If a dependency's __module__ is in the modules_to_capture list, it is captured and its dependencies are traversed in turn. Otherwise the dependency is only referenced and its dependencies are not traversed.
  • use_code_pickling – Specifies whether the function code should be captured using pickling as opposed to source code manipulation. Pickling has better support for capturing dependencies, but is sensitive to version mismatch between python in component creation environment and runtime image.
Returns:

Textual representation of a component definition
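The capture-versus-reference traversal described for modules_to_capture can be sketched as follows. This is a hypothetical helper, not the library's implementation; the imports mapping stands in for the real dependency scan:

```python
def plan_capture(root_module, imports, modules_to_capture):
    """Sketch of the dependency scan: modules in modules_to_capture
    (plus the root) are captured and traversed; everything else is
    only referenced and not traversed further.

    `imports` maps a module name to the names of modules it imports
    (a stand-in for the real dependency scan).
    """
    captured, referenced = set(), set()

    def visit(module):
        if module in captured or module in referenced:
            return
        if module == root_module or module in modules_to_capture:
            captured.add(module)
            for dep in imports.get(module, []):
                visit(dep)
        else:
            referenced.add(module)

    visit(root_module)
    return captured, referenced
```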

kfp.components.func_to_container_op(func, output_component_file=None, base_image: str = None, extra_code='', packages_to_install: List[str] = None, modules_to_capture: List[str] = None, use_code_pickling=False)[source]

Converts a Python function to a component and returns a task (ContainerOp) factory

Function docstring is used as component description. Argument and return annotations are used as component input/output types. To declare a function with multiple return values, use the NamedTuple return annotation syntax:

from typing import NamedTuple

def add_multiply_two_numbers(a: float, b: float) -> NamedTuple('DummyName', [('sum', float), ('product', float)]):
    """Returns sum and product of two arguments"""
    return (a + b, a * b)

Parameters:
  • func – The python function to convert
  • base_image – Optional. Specify a custom Docker container image to use in the component. For lightweight components, the image needs to have python 3.5+. Default is tensorflow/tensorflow:1.13.2-py3
  • output_component_file – Optional. Write a component definition to a local file. Can be used for sharing.
  • extra_code – Optional. Extra code to add before the function code. Can be used as workaround to define types used in function signature.
  • packages_to_install – Optional. List of [versioned] python packages to pip install before executing the user function.
  • modules_to_capture – Optional. List of module names that will be captured (instead of just referenced) during the dependency scan. By default only func.__module__ is captured. The actual algorithm: starting with the initial function, traverse its dependencies. If a dependency's __module__ is in the modules_to_capture list, it is captured and its dependencies are traversed in turn. Otherwise the dependency is only referenced and its dependencies are not traversed.
  • use_code_pickling – Specifies whether the function code should be captured using pickling as opposed to source code manipulation. Pickling has better support for capturing dependencies, but is sensitive to version mismatch between python in component creation environment and runtime image.
Returns:

A factory function with a strongly-typed signature taken from the python function. Once called with the required arguments, the factory constructs a pipeline task instance (ContainerOp) that can run the original function in a container.

kfp.components.load_component(filename=None, url=None, text=None)[source]

Loads a component from text, a file or a URL and creates a task factory function

Only one argument should be specified.

Parameters:
  • filename – Path of local file containing the component definition.
  • url – The URL of the component file data
  • text – A string containing the component file data.
Returns:

A factory function with a strongly-typed signature. Once called with the required arguments, the factory constructs a pipeline task instance (ContainerOp).

kfp.components.load_component_from_file(filename)[source]

Loads a component from a file and creates a task factory function

Parameters:
  • filename – Path of local file containing the component definition.
Returns:

A factory function with a strongly-typed signature. Once called with the required arguments, the factory constructs a pipeline task instance (ContainerOp).

kfp.components.load_component_from_text(text)[source]

Loads a component from text and creates a task factory function

Parameters:
  • text – A string containing the component file data.
Returns:

A factory function with a strongly-typed signature. Once called with the required arguments, the factory constructs a pipeline task instance (ContainerOp).

kfp.components.load_component_from_url(url)[source]

Loads a component from a URL and creates a task factory function

Parameters:
  • url – The URL of the component file data.
Returns:

A factory function with a strongly-typed signature. Once called with the required arguments, the factory constructs a pipeline task instance (ContainerOp).