Tabulify - Pipeline
Pipeline
A pipeline defines a series of steps where:
- the first step, called the supplier step:
  - selects data resources
  - and sends them to the next step
- the following steps, called intermediate steps:
  - accept the output (target) of the previous step as their source
  - and produce target (output) resources.
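The chaining described above can be sketched in a few lines of Python. This is only a conceptual model, not Tabulify code: the names `run_pipeline`, `select`, `to_upper` and `add_suffix` are hypothetical, and data resources are reduced to plain strings.

```python
from typing import Callable, Iterable, List

# Hypothetical types for illustration only; these are not Tabulify APIs.
Resource = str
Supplier = Callable[[], Iterable[Resource]]
Intermediate = Callable[[List[Resource]], List[Resource]]


def run_pipeline(supplier: Supplier, steps: List[Intermediate]) -> List[Resource]:
    """Run the supplier, then pass each step's targets to the next step as sources."""
    resources = list(supplier())      # supplier step: selects the data resources
    for step in steps:                # intermediate steps, in declaration order
        resources = step(resources)   # targets of the previous step become sources
    return resources


def select() -> List[Resource]:
    """Hypothetical supplier step returning two resource names."""
    return ["a.csv", "b.csv"]


def to_upper(sources: List[Resource]) -> List[Resource]:
    """Hypothetical intermediate step."""
    return [s.upper() for s in sources]


def add_suffix(sources: List[Resource]) -> List[Resource]:
    """Hypothetical intermediate step."""
    return [s + ".bak" for s in sources]


result = run_pipeline(select, [to_upper, add_suffix])
print(result)
```

Each intermediate step only sees what the step before it returned, which is why the order of the steps matters.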
Syntax
A pipeline is a manifest file with the following syntax:
kind: pipeline
spec:
  # A comment
  comment: "A comment"
  # Pipeline Arguments
  arguments:
    # strictness
    strict-execution: true
    xxx: xxx
  # Pipeline steps
  steps:
    # the first operation of a pipeline is a supplier step
    - name: supplierStep1
      operation: xxx
      arguments:
        xxx: xxx
        .....
    # the next operations are called intermediate steps
    # They work on the resources returned from the upstream step (a supplier or intermediate step)
    - name: intermediateStep2
      operation: xxx
      arguments:
        xxx: xxx
        .....
Arguments
Arguments are the parameter attributes of a Tabulify pipeline.
Duration Control
Duration Control Parameters control the duration of a pipeline.
| Name | Default | Description |
|---|---|---|
| max-cycle-count | Unlimited | The maximum number of cycles (i.e. the number of data paths sent through the pipeline). |
| timeout | Unlimited | The timeout duration. |
| timeout-type | Error | The timeout type (duration or error). With duration, reaching the timeout does not throw an error; with error, it does. |
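How the three duration parameters interact can be illustrated with a small Python model. It is a sketch under stated assumptions, not the Tabulify implementation: `run_cycles`, `PipelineTimeout` and `make_clock` are hypothetical names, the timeout is expressed in seconds, and the clock is injected so that the example is deterministic.

```python
import itertools
from typing import Callable, Iterable, Optional


class PipelineTimeout(Exception):
    """Raised in this sketch when the timeout is reached with timeout-type 'error'."""


def make_clock() -> Callable[[], float]:
    """Hypothetical fake clock: returns 0.0, 1.0, 2.0, ... on successive calls."""
    ticks = itertools.count()
    return lambda: float(next(ticks))


def run_cycles(
    data_paths: Iterable[str],
    process: Callable[[str], None],
    clock: Callable[[], float],
    max_cycle_count: Optional[int] = None,  # None models the unlimited default
    timeout: Optional[float] = None,        # seconds (assumption); None is unlimited
    timeout_type: str = "error",            # 'error' is the documented default
) -> int:
    """Process data paths until a limit is reached; return the number of cycles run."""
    start = clock()
    cycles = 0
    for path in data_paths:
        if max_cycle_count is not None and cycles >= max_cycle_count:
            break
        if timeout is not None and clock() - start >= timeout:
            if timeout_type == "error":
                raise PipelineTimeout(f"timeout reached after {cycles} cycles")
            break  # timeout-type 'duration': stop without an error
        process(path)
        cycles += 1
    return cycles


paths = ["a", "b", "c", "d"]
by_count = run_cycles(paths, print, make_clock(), max_cycle_count=2)
by_duration = run_cycles(paths, print, make_clock(), timeout=3, timeout_type="duration")
```

With the fake clock advancing one second per call, both runs stop after two cycles. The same timeout with the default `timeout_type="error"` raises `PipelineTimeout` instead of returning.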
Stream
Error Control
See Error Handling
Strictness
The strict-execution argument sets the execution strictness.
Derived Attributes
Derived attributes are retrieved or computed.
| Name | Description |
|---|---|
| logical-name | The name of the pipeline (i.e. the file name without its extension) |
| processing-type | The processing type, as determined by the type of the supplier (i.e. batch or stream) |
| start-time | The start time of the pipeline execution |
Steps
The steps are a series of steps, each defined by an operation and its arguments.
Execution
You can execute a pipeline with the flow execute command.
Attributes
A pipeline supports two kinds of attributes: arguments and derived attributes.
Metrics
Pipeline Metrics
A pipeline reports the following metrics:
| Name | Description |
|---|---|
| Total Elapsed Time | The total elapsed time of the pipeline execution and completion |
| Execution Elapsed Time | The elapsed time of the pipeline execution up to the timeout, excluding the completion (i.e. the last cycle) |
| For Stream Pipeline | |
| Total Poll Wait Time | The total time that the pipeline was waiting to poll due to poll-interval |
| Total Push Wait Time | The total time that the pipeline was waiting to push due to push-interval |
Notes:
- Total Elapsed Time and timeout: the timeout is the maximum duration of the main execution. Because the pipeline still needs to close and complete the pending intermediate step operations, the total elapsed time is always slightly greater than the timeout.
- Total Poll Wait Time and timeout: the Total Poll Wait Time can be greater than the timeout because the pipeline may also wait during the completion (i.e. the last cycle).
Pipeline Step Metrics
- Input Counter: the number of data resources received
- Output Counter: the number of data resources supplied
- Execution Counter: the number of step executions. In a stream pipeline, an execution is:
  - for a batch intermediate step, each time the window interval was exhausted and the step was executed
  - for a stream intermediate step, each step execution (i.e. one data resource, one execution)
- Error Counter: the number of errors
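The difference between the counters of a stream step and a batch step can be illustrated with a short Python model. This is a sketch, not Tabulify code: `StepMetrics` and `run_step` are hypothetical names, a window is modelled as the list of resources collected during one window interval, and a failed execution is assumed to still count as an execution.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepMetrics:
    """Hypothetical container mirroring the four step counters."""
    input_counter: int = 0
    output_counter: int = 0
    execution_counter: int = 0
    error_counter: int = 0


def run_step(
    windows: List[List[str]],
    operation: Callable[[List[str]], List[str]],
) -> StepMetrics:
    """Execute the operation once per window and collect the counters."""
    metrics = StepMetrics()
    for window in windows:
        metrics.input_counter += len(window)   # resources received
        metrics.execution_counter += 1         # one execution per window
        try:
            outputs = operation(window)
        except Exception:
            metrics.error_counter += 1         # assumption: a failure is one error
            continue
        metrics.output_counter += len(outputs)  # resources supplied downstream
    return metrics


def upper(batch: List[str]) -> List[str]:
    """Hypothetical operation producing one target per source."""
    return [r.upper() for r in batch]


resources = ["a", "b", "c", "d"]
# Stream step: every resource is its own window (one resource, one execution).
stream_metrics = run_step([[r] for r in resources], upper)
# Batch step: resources are grouped per window interval (here 3 and then 1).
batch_metrics = run_step([resources[:3], resources[3:]], upper)
```

Both runs receive and supply four resources, but the stream step records four executions while the batch step records two, one per exhausted window.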