Error Handling
About
This article describes error handling in a pipeline, i.e. what happens when an error occurs.
Traffic Analogy
Error handling is described with a car traffic analogy that uses the following concepts:
- traffic: the data resources that flow in a pipeline
- accident: an error
- parking: the location where the data resources implicated in an accident are parked. You can then create an additional error handling pipeline that uses this parking as its source.
Handling Process
When an error happens:
- the step error counter is incremented
- an action is taken according to the on-error-action argument.
Stop
stop is the default action and will stop the pipeline.
Discard
discard discards the error: apart from incrementing the step error counter, nothing is done.
Park
park gathers the data resources implicated in the accident and moves them to the parking.
Note that for a batch intermediate step, it is not possible to determine which data resource of the batch caused the error. In that case, all data resources of the current batch are parked.
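The handling process and the three actions described above can be sketched as follows. This is only an illustrative model, not the actual implementation: the `Step` class, the `PipelineStopped` exception and the in-memory `parking` list are hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field


class PipelineStopped(Exception):
    """Hypothetical exception raised when the on-error-action is 'stop'."""


@dataclass
class Step:
    # Hypothetical step state: the action, an error counter and a parking list.
    on_error_action: str = "stop"
    error_count: int = 0
    parking: list = field(default_factory=list)

    def handle_error(self, batch, error):
        # The step error counter is incremented for every error.
        self.error_count += 1
        if self.on_error_action == "stop":
            # Default action: the pipeline stops.
            raise PipelineStopped(str(error)) from error
        if self.on_error_action == "discard":
            # The error is discarded, nothing else is done.
            return
        if self.on_error_action == "park":
            # The culprit inside a batch is unknown, so the whole batch is parked.
            self.parking.extend(batch)
            return
        raise ValueError(f"unknown on-error-action: {self.on_error_action}")
```

With `on_error_action="park"`, a failing batch of two resources ends up entirely in `parking`, even if only one of them caused the error.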
Pipeline Arguments
| Name | Default | Description |
|---|---|---|
| on-error-action | stop | The action to take if an error occurs: stop (stop the pipeline), park (move the data resources to the parking), or discard (discard the error) |
| parking-data-uri | See parking | A target uri that defines the location where the data resources implicated in an accident (i.e. an error) are moved. By default, the parking directory is located in the data-home directory |
| is-strict | true | Strict mode: fail when a condition is ambiguous |
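
As a rough illustration of the defaults in the table above, the arguments could be modelled as below. The `ErrorHandlingArguments` class and its field names are assumptions made for this sketch (the real arguments use dashes, e.g. on-error-action), and `None` stands for "use the default parking".

```python
from dataclasses import dataclass
from typing import Optional

# The three actions listed for on-error-action.
ALLOWED_ACTIONS = ("stop", "park", "discard")


@dataclass(frozen=True)
class ErrorHandlingArguments:
    # Hypothetical container; the defaults mirror the table above.
    on_error_action: str = "stop"
    parking_data_uri: Optional[str] = None  # None: use the default parking
    is_strict: bool = True

    def __post_init__(self):
        # Reject any action that is not documented.
        if self.on_error_action not in ALLOWED_ACTIONS:
            raise ValueError(
                f"on-error-action must be one of {ALLOWED_ACTIONS}, "
                f"got {self.on_error_action!r}"
            )
```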
Parking
- The parking is a container (schema, directory)
- The parking name is the name of the resource created in the parking
Parking Data Uri
The parking data uri is a target data uri that defines a container data resource (i.e. a directory or a schema) where the data resources implicated in the accident are placed.
By default, the value is:
parking/${pipeline_name}/${pipeline_start_time}@data-home
where:
- data-home is the data-home connection
In the target template, you can use the following variable names:
- all pipeline derived attributes/arguments with the prefix pipeline_
- all data resource attributes with the prefix resource_
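The expansion of such a template can be sketched with Python's `string.Template`, which happens to use the same `${name}` placeholder syntax. The function `expand_parking_uri`, the attribute dictionaries and the start time format are assumptions for illustration only; the actual expansion logic and value formats may differ.

```python
from string import Template

# Default value given above.
DEFAULT_PARKING_URI = "parking/${pipeline_name}/${pipeline_start_time}@data-home"


def expand_parking_uri(template, pipeline=None, resource=None):
    """Expand a template and split the result into (path, connection)."""
    variables = {}
    # Pipeline attributes are exposed with the pipeline_ prefix.
    variables.update({f"pipeline_{k}": v for k, v in (pipeline or {}).items()})
    # Data resource attributes are exposed with the resource_ prefix.
    variables.update({f"resource_{k}": v for k, v in (resource or {}).items()})
    expanded = Template(template).substitute(variables)
    # The part after the last @ names the connection (data-home by default).
    path, separator, connection = expanded.rpartition("@")
    if not separator:
        return expanded, ""
    return path, connection
```

For example, `expand_parking_uri(DEFAULT_PARKING_URI, pipeline={"name": "load", "start_time": "2024-01-01T00-00-00"})` yields the path `parking/load/2024-01-01T00-00-00` on the `data-home` connection.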
Parking Name
The name of the parking data resource depends on the system type of the input and of the parking:
| Input Data Resource System Type | Parking System Type | Parking Target Name | Example |
|---|---|---|---|
| file | file | input name | File foo.csv → foo.csv |
| database | file | logical name and the extension defined in the tabular file type parameter (by default, csv) | Table or record foo → foo.csv |
| file | database | logical name | File foo.csv → Table foo |
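
The naming rules in the table above can be expressed as a small function. This is a sketch under assumptions: `parking_target_name` is a hypothetical helper, the logical name of a file is taken to be its name without the extension (as the foo.csv → foo example suggests), and combinations that the table does not list are rejected rather than guessed.

```python
from pathlib import PurePosixPath


def parking_target_name(input_name, input_type, parking_type, extension="csv"):
    """Return the parking target name for an input data resource."""
    # Assumption: a file's logical name is its name without the extension.
    if input_type == "file":
        logical_name = PurePosixPath(input_name).stem
    else:
        logical_name = input_name

    combination = (input_type, parking_type)
    if combination == ("file", "file"):
        return input_name  # the input name is kept as-is
    if combination == ("database", "file"):
        # Logical name plus the tabular file extension (csv by default).
        return f"{logical_name}.{extension}"
    if combination == ("file", "database"):
        return logical_name
    # Combinations not listed in the table are not covered by this sketch.
    raise ValueError(f"unsupported combination: {combination}")
```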
Parking Operation
A parking will move the data resource: