Error Handling

About

This article covers error handling in a pipeline, i.e. what happens when an error occurs.

Traffic Analogy

This documentation uses a car traffic analogy with the following concepts:

Handling Process

When an error happens:

Stop

stop is the default action and will stop the pipeline.

Discard

discard discards the error and takes no other action.

Park

park will gather the data resources implicated in the accident and move them to the parking.

Note that for a batch intermediate step, it is not possible to determine which data resource of the batch caused the error. The whole current batch is therefore parked.
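
The three actions can be sketched as a small dispatch function. This is only an illustration of the behavior described above, not the pipeline's actual implementation: the names `OnErrorAction`, `PipelineStopped` and `handle_error` are hypothetical, and plain lists stand in for the batch of data resources and for the parking.

```python
from enum import Enum


class OnErrorAction(Enum):
    STOP = "stop"
    PARK = "park"
    DISCARD = "discard"


class PipelineStopped(Exception):
    """Hypothetical exception raised when the pipeline stops on an error."""


def handle_error(action, error, current_batch, parking):
    """Apply the configured action to the batch that was being processed.

    `current_batch` and `parking` are plain lists used as stand-ins for
    data resources and the parking container (an assumption of this sketch).
    """
    if action is OnErrorAction.STOP:
        # Default action: the pipeline stops.
        raise PipelineStopped(str(error)) from error
    if action is OnErrorAction.PARK:
        # The culprit inside a batch is unknown, so the whole batch is parked.
        parking.extend(current_batch)
        current_batch.clear()
        return
    if action is OnErrorAction.DISCARD:
        # The error is discarded and nothing else happens.
        return
    raise ValueError(f"Unknown action: {action!r}")
```

With `park`, every resource of the current batch ends up in the parking, which mirrors the note above about batch steps.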

Pipeline Arguments

Name | Default | Description
on-error-action | stop | The action to take if an error occurs: stop (stop the pipeline), park (move the data resources to the parking), or discard (discard the error)
parking-data-uri | See Parking | A target URI that defines the location where data resources implicated in an accident (i.e. an error) are moved. By default, the parking directory is located in the data-home directory
is-strict | true | Strict mode: fail on conditions that are ambiguous
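
As a rough illustration of how these defaults combine, the sketch below resolves a dictionary of user-supplied arguments against the documented defaults. The function name `resolve_error_arguments` and the dictionary representation are assumptions made for this example; the real pipeline may read its arguments differently.

```python
# Defaults as documented in the table above and in the Parking section.
DEFAULT_PARKING_DATA_URI = "parking/${pipeline_name}/${pipeline_start_time}@data-home"
ALLOWED_ACTIONS = ("stop", "park", "discard")


def resolve_error_arguments(arguments):
    """Return the error-handling arguments with documented defaults applied.

    `arguments` is a plain dict keyed by argument name (hypothetical shape).
    """
    resolved = {
        "on-error-action": "stop",
        "parking-data-uri": DEFAULT_PARKING_DATA_URI,
        "is-strict": True,
    }
    unknown = set(arguments) - set(resolved)
    if unknown:
        raise KeyError(f"Unknown argument(s): {sorted(unknown)}")
    resolved.update(arguments)
    if resolved["on-error-action"] not in ALLOWED_ACTIONS:
        raise ValueError(
            f"on-error-action must be one of {ALLOWED_ACTIONS}, "
            f"got {resolved['on-error-action']!r}"
        )
    return resolved
```

For example, `resolve_error_arguments({"on-error-action": "park"})` keeps the default parking location and strict mode while switching the action to `park`.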

Parking

Parking Data Uri

The parking data URI is a target data URI that defines a container data resource (i.e. a directory or schema) where the data resources implicated in the accident are placed.

By default, the value is:

parking/${pipeline_name}/${pipeline_start_time}@data-home

where:

In the target template, you can use the following variable names:
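
To make the template concrete, here is a minimal sketch that expands the default value with Python's standard `string.Template`, which happens to use the same `${name}` placeholder syntax. The function name `expand_parking_uri` and the timestamp format in the usage note are assumptions; only the two variables that appear in the default value are used.

```python
from string import Template

# Default value as documented above.
DEFAULT_TEMPLATE = "parking/${pipeline_name}/${pipeline_start_time}@data-home"


def expand_parking_uri(template, pipeline_name, pipeline_start_time):
    """Replace the ${...} variables of a parking data URI template.

    `pipeline_start_time` is passed as an already formatted string because
    the exact timestamp format is not specified here (assumption).
    A missing variable raises KeyError rather than being left unexpanded.
    """
    return Template(template).substitute(
        pipeline_name=pipeline_name,
        pipeline_start_time=pipeline_start_time,
    )
```

Calling `expand_parking_uri(DEFAULT_TEMPLATE, "orders", "2024-01-01T00-00-00")` yields a parking location under `parking/orders/` in `data-home`, so each pipeline run gets its own container.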

Parking Name

The name of the parking data resource depends on the system type of the input and of the parking:

Input Data Resource System Type | Parking System Type | Parking Target Name | Example
file | file | input name | File foo.csv → foo.csv
database | file | logical name and the extension defined in the tabular file type parameter (by default, csv) | Table or record foo → foo.csv
file | database | logical name | File foo.csv → Table foo
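
The naming rules in the table can be sketched as a single function. This is a hypothetical helper, not the pipeline's own code: `parking_target_name` and its parameters are invented for illustration, and the logical name of a file is assumed to be its name without the extension, as the `foo.csv` example suggests.

```python
from pathlib import PurePosixPath


def parking_target_name(input_name, input_system, parking_system,
                        tabular_file_type="csv"):
    """Derive the parking target name for the combinations listed above."""
    if input_system == "file" and parking_system == "file":
        # The input name is kept as-is.
        return input_name
    if input_system == "database" and parking_system == "file":
        # Logical name plus the tabular file type extension (csv by default).
        return f"{input_name}.{tabular_file_type}"
    if input_system == "file" and parking_system == "database":
        # Logical name only: the file name without its extension (assumption).
        return PurePosixPath(input_name).stem
    # Other combinations are not described in the table.
    raise ValueError(
        f"Unsupported combination: {input_system!r} -> {parking_system!r}"
    )
```

The function deliberately raises for combinations the table does not list, rather than guessing a rule for them.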

Parking Operation

A parking will move the data resource: