Tabulify - Pipeline


Pipeline

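A pipeline defines a series of steps where the first step is a supplier step and the following steps are intermediate steps that work on the resources returned by the upstream step.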

Syntax

A pipeline is a manifest file with the following syntax:

```yaml
kind: pipeline
spec:
  # A comment
  comment: "A comment"
  # Pipeline arguments
  arguments:
    # strictness
    strict-execution: true
    xxx: xxx
  # Pipeline steps
  steps:
      # The first operation of a pipeline is a supplier step
      - name: supplierStep1
        operation: xxx
        arguments:
          xxx: xxx
          # .....
      # The next operations are called intermediate steps.
      # They work on the resources returned by the upstream step (a supplier or intermediate step)
      - name: intermediateStep2
        operation: xxx
        arguments:
          xxx: xxx
          # .....
```
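To make the manifest structure concrete, here is a minimal Python sketch that checks a manifest after it has been parsed into a dictionary (for example by a YAML library). check_manifest is a hypothetical helper, not a Tabulify API: it only verifies the keys shown in the syntax above (kind, spec, steps, name, operation) and knows nothing about which operations exist.

```python
from typing import Any


def check_manifest(manifest: dict[str, Any]) -> list[str]:
    """Return the structural problems found in a parsed pipeline manifest.

    Hypothetical helper (not part of Tabulify): it only checks the shape
    described in the Syntax section.
    """
    problems: list[str] = []
    if manifest.get("kind") != "pipeline":
        problems.append("kind must be 'pipeline'")
    spec = manifest.get("spec")
    if not isinstance(spec, dict):
        return problems + ["spec must be a mapping"]
    steps = spec.get("steps")
    if not isinstance(steps, list) or not steps:
        return problems + ["spec.steps must be a non-empty list"]
    for index, step in enumerate(steps):
        if not isinstance(step, dict):
            problems.append(f"step {index} must be a mapping")
            continue
        # every step needs at least a name and an operation
        for key in ("name", "operation"):
            if key not in step:
                problems.append(f"step {index} is missing '{key}'")
    return problems


# The manifest from the Syntax section, as a YAML parser would return it
manifest = {
    "kind": "pipeline",
    "spec": {
        "comment": "A comment",
        "arguments": {"strict-execution": True},
        "steps": [
            {"name": "supplierStep1", "operation": "xxx"},
            {"name": "intermediateStep2", "operation": "xxx"},
        ],
    },
}
print(check_manifest(manifest))  # prints []
```

The checker returns a list of messages instead of raising on the first problem, so all structural issues of a manifest can be reported at once.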

Arguments

Arguments are the parameters (attributes) of a Tabulify pipeline. They are set in the arguments node of the spec.

Duration Control

The duration control arguments limit how long a pipeline runs.

| Name | Default | Description |
|---|---|---|
| max-cycle-count | Unlimited | The maximum number of cycles (ie the number of data paths sent into the pipeline) |
| timeout | Unlimited | The maximum duration of the main execution |
| timeout-type | error | The timeout type: duration or error. With duration, reaching the timeout does not throw an error; with error, it does. |
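The interplay of max-cycle-count, timeout and timeout-type can be modelled with a small loop. This is a hypothetical Python sketch, not Tabulify's implementation: run_cycles and PipelineTimeout are illustrative names, None stands for the Unlimited default, and the timeout is assumed to be expressed in seconds (the actual duration format Tabulify accepts is not described here).

```python
import time
from typing import Iterable, Optional


class PipelineTimeout(Exception):
    """Raised when the timeout is reached and timeout_type is 'error'."""


def run_cycles(
    data_paths: Iterable[str],
    max_cycle_count: Optional[int] = None,  # None models "Unlimited"
    timeout: Optional[float] = None,        # seconds (assumption); None models "Unlimited"
    timeout_type: str = "error",            # 'duration' or 'error'
) -> list[str]:
    """Hypothetical model of the duration control arguments (not Tabulify code)."""
    start = time.monotonic()
    processed: list[str] = []
    for data_path in data_paths:
        if max_cycle_count is not None and len(processed) >= max_cycle_count:
            break  # cycle limit reached: stop sending data paths
        if timeout is not None and time.monotonic() - start >= timeout:
            if timeout_type == "error":
                raise PipelineTimeout(f"timeout of {timeout}s reached")
            break  # 'duration': reaching the timeout is a normal end
        processed.append(data_path)  # one cycle = one data path sent
    return processed


print(run_cycles(["a.csv", "b.csv", "c.csv"], max_cycle_count=2))  # ['a.csv', 'b.csv']
```

With timeout_type="duration" the timeout behaves like a planned run time, so a caller gets the paths processed so far; with "error" the caller must handle PipelineTimeout.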

Stream

See Stream pipeline Arguments

Error Control

See Error Handling

Strictness

The strict-execution argument sets the execution strictness.

Derived Attributes

Derived attributes are retrieved or computed by Tabulify.

| Name | Description |
|---|---|
| logical-name | The name of the pipeline (ie the manifest file name without its extension) |
| processing-type | The processing type (batch or stream), determined by the type of the supplier |
| start-time | The start time of the pipeline execution |
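The three derived attributes can be illustrated with a short Python sketch. derive_attributes and its supplier_is_stream flag are hypothetical: how Tabulify actually detects a stream supplier is not described here, and whether it strips only the last extension of a name such as x.pipeline.yml is not stated (PurePath.stem strips only the last one).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import PurePath


@dataclass(frozen=True)
class DerivedAttributes:
    logical_name: str      # logical-name
    processing_type: str   # processing-type: 'batch' or 'stream'
    start_time: datetime   # start-time


def derive_attributes(manifest_path: str, supplier_is_stream: bool) -> DerivedAttributes:
    """Hypothetical sketch: compute the derived attributes of a pipeline run."""
    return DerivedAttributes(
        # PurePath.stem drops only the last extension: 'daily-load.yml' -> 'daily-load'
        logical_name=PurePath(manifest_path).stem,
        # the processing type follows the type of the supplier step
        processing_type="stream" if supplier_is_stream else "batch",
        # recorded when the execution starts, in UTC
        start_time=datetime.now(timezone.utc),
    )


attributes = derive_attributes("pipelines/daily-load.yml", supplier_is_stream=False)
print(attributes.logical_name, attributes.processing_type)  # daily-load batch
```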

Steps

The steps are a series of steps, each defined by an operation and its arguments.
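The supplier/intermediate chaining can be sketched in Python by modelling each step as a function: the supplier returns data resources (here just names) and each intermediate step receives the output of its upstream step. run_steps, list_files, keep_csv and to_upper are hypothetical illustrations, not Tabulify operations.

```python
from typing import Callable, Iterable

Resources = Iterable[str]                 # data resources, modelled as names
Supplier = Callable[[], Resources]
Intermediate = Callable[[Resources], Resources]


def run_steps(supplier: Supplier, intermediates: list[Intermediate]) -> list[str]:
    """Chain a supplier step with intermediate steps (hypothetical model)."""
    resources = supplier()                # the first step supplies the resources
    for step in intermediates:
        resources = step(resources)       # each step works on its upstream output
    return list(resources)


def list_files() -> Resources:            # hypothetical supplier step
    return ["sales.csv", "notes.txt", "stock.csv"]


def keep_csv(resources: Resources) -> Resources:   # hypothetical intermediate step
    return (name for name in resources if name.endswith(".csv"))


def to_upper(resources: Resources) -> Resources:   # hypothetical intermediate step
    return (name.upper() for name in resources)


print(run_steps(list_files, [keep_csv, to_upper]))  # ['SALES.CSV', 'STOCK.CSV']
```

Because the intermediate steps return generators, each resource flows through the whole chain one at a time when the final list is built, which is closer to a streaming model than materializing every intermediate result.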

Execution

You can execute a pipeline with the flow execute command.

Attributes

A pipeline supports 2 kinds of attributes: arguments and derived attributes.

Metrics

Pipeline Metrics

A pipeline reports the following metrics:

| Name | Description |
|---|---|
| Total Elapsed Time | The total elapsed time of the pipeline, execution and completion included |
| Execution Elapsed Time | The elapsed time of the main execution up to the timeout, excluding the completion (ie the last cycle) |
| Total Poll Wait Time (stream pipelines only) | The total time the pipeline waited to poll because of the poll-interval |
| Total Push Wait Time (stream pipelines only) | The total time the pipeline waited to push because of the push-interval |

Notes:

  • Total Elapsed Time and timeout: the timeout is the maximum duration of the main execution. Because the pipeline needs to close and complete the pending operations of the intermediate steps, the total elapsed time is always slightly greater than the timeout.
  • Total Poll Wait Time and timeout: the Total Poll Wait Time can be greater than the timeout because the pipeline may also wait during the completion (ie the last cycle).
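Why the total elapsed time exceeds the execution elapsed time can be shown with a timed Python sketch. run_with_timeout is a hypothetical model, not Tabulify code: the cycle and completion callbacks stand in for the main execution and the last cycle, and the timeout is assumed to be in seconds.

```python
import time
from typing import Callable


def run_with_timeout(
    cycle: Callable[[], None],     # one cycle of the main execution
    complete: Callable[[], None],  # completion: finish pending intermediate steps
    timeout: float,                # seconds (assumption)
) -> dict[str, float]:
    """Hypothetical model of the elapsed-time metrics (not Tabulify code)."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        cycle()
    execution_elapsed = time.monotonic() - start   # Execution Elapsed Time
    complete()                                     # last cycle, after the timeout
    total_elapsed = time.monotonic() - start       # Total Elapsed Time
    return {"execution_elapsed": execution_elapsed, "total_elapsed": total_elapsed}


metrics = run_with_timeout(
    cycle=lambda: time.sleep(0.01),
    complete=lambda: time.sleep(0.02),
    timeout=0.05,
)
print(metrics)
```

The main loop only stops once the timeout has been reached, so the execution elapsed time is at least the timeout, and the completion then adds its own duration on top to give the total elapsed time.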

Pipeline Step Metrics

  • Input Counter: the number of data resources received
  • Output Counter: the number of data resources supplied
  • Execution Counter: the number of step executions. In a stream pipeline, the execution is:
  • Error Counter: the number of errors
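The four step counters can be modelled with a small Python sketch. StepMetrics, execute_step and to_upper_csv are hypothetical names; counting one error per failed resource (and catching only ValueError) is an assumption of the sketch, not documented Tabulify behavior.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepMetrics:
    input_count: int = 0       # Input Counter: data resources received
    output_count: int = 0      # Output Counter: data resources supplied
    execution_count: int = 0   # Execution Counter: step executions
    error_count: int = 0       # Error Counter: errors


def execute_step(operation: Callable[[str], str], inputs: list[str], metrics: StepMetrics) -> list[str]:
    """Run one step execution and update its counters (hypothetical model)."""
    metrics.execution_count += 1
    metrics.input_count += len(inputs)
    outputs: list[str] = []
    for resource in inputs:
        try:
            outputs.append(operation(resource))
        except ValueError:
            metrics.error_count += 1   # assumption: one error per failed resource
    metrics.output_count += len(outputs)
    return outputs


def to_upper_csv(name: str) -> str:  # hypothetical operation
    if not name.endswith(".csv"):
        raise ValueError(f"not a csv file: {name}")
    return name.upper()


step_metrics = StepMetrics()
execute_step(to_upper_csv, ["a.csv", "b.txt", "c.csv"], step_metrics)
print(step_metrics)  # StepMetrics(input_count=3, output_count=2, execution_count=1, error_count=1)
```

In a stream pipeline the same step would be executed many times, so the counters accumulate across executions instead of being reset.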



Related HowTo
How to add information about the selected resources with the Enrich operation?

enrich is an intermediate operation that will add virtual columns to its inputs thanks to data supplier. Enrich accepts only one argument data-def where you can define extra columns called virtual columns...
How to diff a SQL table ?

This howto shows you how to perform a diff operation between 2 SQL tables with the data diff command. In this step, we load the csv resources that we want to compare into sqlite Load the original...
How to send data resources

This howto shows you how to send an email with a data resource attached via the sendmail operation. In the below pipeline, we use the following steps: select to select data resources that will be...
How to use the define operation?

This howto shows you how to use the define data operation to create data resources in the pipeline file (ie inline data resources) This operation is only available as step in a pipeline (ie not in a tabul...
