This article describes error handling in a pipeline (i.e., what happens when an error occurs).
It uses a car traffic analogy, built on the following concepts:
When an error happens:
* stop is the default action and stops the pipeline.
* discard discards the error and does nothing else.
* park gathers the data resources implicated in the accident and moves them to the parking.
Note that for a batch intermediate step, it is not possible to determine which data resource in the batch caused the error. In that case, every data resource of the current batch is parked.
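The three actions can be sketched as a small dispatch function. This is only an illustrative sketch, not the pipeline's actual implementation: the names `OnErrorAction`, `PipelineStopped`, and `handle_error` are hypothetical, and the parking is modeled as a plain list.

```python
from enum import Enum


class OnErrorAction(Enum):
    """Hypothetical enum mirroring the three documented actions."""
    STOP = "stop"
    PARK = "park"
    DISCARD = "discard"


class PipelineStopped(Exception):
    """Hypothetical exception raised when the pipeline must stop."""


def handle_error(action, current_batch, parking, error):
    """Apply the on-error action to the batch that was being processed."""
    if action is OnErrorAction.STOP:
        # Default action: the error stops the whole pipeline.
        raise PipelineStopped(str(error)) from error
    if action is OnErrorAction.PARK:
        # The culprit inside a batch cannot be identified,
        # so every data resource of the current batch is parked.
        parking.extend(current_batch)
        return
    if action is OnErrorAction.DISCARD:
        # The error is dropped and the pipeline continues.
        return
    raise ValueError(f"Unknown on-error action: {action!r}")
```

With `park`, a failing batch of `["a.csv", "b.csv"]` ends up entirely in the parking, even if only one of the two resources triggered the error.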
| Name | Default | Description |
|---|---|---|
| on-error-action | stop | The action to take if an error occurs. stop: stop the pipeline; park: move the data resources to the parking; discard: discard the error |
| parking-data-uri | See parking | A target URI that defines where the data resources implicated in an accident (i.e., an error) are moved. By default, the parking directory is located in the data-home directory |
| is-strict | true | Strict mode: fail on conditions that are ambiguous |
The parking data URI is a target data URI that defines a container data resource (i.e., a directory or a schema) where the data resources implicated in the accident are placed.
By default, the value is:
parking/${pipeline_name}/${pipeline_start_time}@data-home
where:
In the target template, you can use the following variable names:
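As an illustration of how the default value expands, here is a minimal sketch that uses Python's `string.Template`, which happens to share the `${name}` placeholder syntax. The function name `expand_parking_uri` and the timestamp format are assumptions; the source does not state how `pipeline_start_time` is rendered.

```python
from datetime import datetime
from string import Template

# Default value taken from the documentation above.
DEFAULT_PARKING_TEMPLATE = "parking/${pipeline_name}/${pipeline_start_time}@data-home"


def expand_parking_uri(pipeline_name, pipeline_start_time,
                       template=DEFAULT_PARKING_TEMPLATE):
    """Expand the parking data URI template for one pipeline run."""
    # Assumption: the start time is rendered in a file-system-friendly form.
    start = pipeline_start_time.strftime("%Y-%m-%dT%H-%M-%S")
    return Template(template).substitute(
        pipeline_name=pipeline_name,
        pipeline_start_time=start,
    )


uri = expand_parking_uri("orders", datetime(2024, 1, 15, 9, 30, 0))
print(uri)
```

Because the start time is part of the path, each pipeline run gets its own parking container under this default.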
The name of the parking data resource depends on the system type of the input and of the parking:
| Input Data Resource System Type | Parking System Type | Parking Target Name | Example |
|---|---|---|---|
| file | file | input name | File foo.csv → foo.csv |
| database | file | logical name and the extension defined in the tabular file type parameter (by default, csv) | Table or record foo → foo.csv |
| file | database | logical name | File foo.csv → Table foo |
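The naming rules in the table can be sketched as a small function. This is a hypothetical helper, not the tool's API: `parking_target_name` and its parameters are illustrative names, and the logical name of a file is assumed to be its name without the extension, as the `foo.csv` → `foo` example suggests.

```python
from pathlib import PurePosixPath


def parking_target_name(input_name, input_system, parking_system,
                        tabular_extension="csv"):
    """Return the parking target name for the combinations listed in the table."""
    if input_system == "file" and parking_system == "file":
        # File to file: the input name is kept as-is.
        return input_name
    if input_system == "database" and parking_system == "file":
        # Table or record to file: logical name plus the tabular file extension.
        return f"{input_name}.{tabular_extension}"
    if input_system == "file" and parking_system == "database":
        # File to table: assumed logical name, i.e. the name without extension.
        return PurePosixPath(input_name).stem
    # Other combinations are not described in the table.
    raise ValueError(
        f"Unsupported combination: {input_system!r} -> {parking_system!r}"
    )
```

For example, `parking_target_name("foo", "database", "file")` gives `foo.csv`, matching the second row of the table.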
A parking will move the data resource: