Step Operation - Transfer

Undraw Data Processing

About

transfer is a intermediate operation that executes all this data resources operations:

In general, if you need to move or process data, you will transfer it.

The data transfer action permits you to

Example

In a Pipeline

pipeline:
# ...
  - name: "Store"
    operation: transfer
    args:
      target-data-uri: '${logicalName}.csv@tmp'
      transfer-operation: insert
      target-operation: replace
      ...

Cli

The transfer operation is performed by the tabul data transfer command

Arguments

Name Default Description
target-data-uri - A template data uri (Mandatory)
target-name-sanitization true If true, the target name is sanitized to comply with the naming convention of the target system.
Example: for sql, the name foo-bar would become foo_bar
target-operation - A resource operation that will happens on the target before the transfer
source-operation - A resource operation that will happens on the source after the transfer
Pipeline Properties
output-type target The output
- targets, the target resources
- inputs, the inputs resources are passed
- results, the results of the transfer step is passed
processing-type batch if the target is not a pattern
stream otherwise
The type of processing
Cross Transfer Properties
source-fetch-size 10000 The number of record from the source for one fetch (ie network message size from the source system)
buffer-size 2 x target-worker x fetch-size The maximum number of record from stored in transit between the source and the target
target-batch-size 1000 The number of record send to the target system at once in a batch (ie network message size to the target system)
target-commit-frequency <math>2^{31}</math> (Infinite) The number of batch sends that will trigger a commit to the target system
target-worker 1 The number of thread against the target system that will send batches
with-parameters true If the target system is a sql database, SQL bind variables will be used
metrics-data-uri A data uri where the transfer metrics will be saved
Transfer Properties
transfer-operation copy for a file system
insert otherwise
A transfer operation (copy, insert, upsert, update, …)
transfer-upsert-type merge The transfer upsert type (merge, insert-update, update-insert, …)
transfer-mapping-method position if the target has no structure
name otherwise
how the source and target columns are mapped (by name, position, map-by-name or map-by-position)
transfer-mapping-columns the map of source target column when the mapping method is map-by-name or map-by-position
transfer-mapping-strict true if true, the transfer will fail if a source column cannot be mapped to a target column with the mapping method name or position

Transfer Operation

The transfer operation supports the following values for the transfer-operation argument.

Basic Operations

Name Source/Target with
Same Structure
Same Data
Additional Constraint
copy Yes target does not exist
or is empty
insert No -
upsert No -
update No -
delete No -

Alias Operations

We support also aliases operation. They are operation derived from the basic one.

Name Alias
concat alias of insert with only one target (ie a target-data-uri without target pattern)
move alias of copy with source drop
rename alias of move on the same connection (Metadata Operation)
replace alias of copy with target drop

Default Transfer Operation

The default transfer operation is system dependent:

Transfer mapping method

The transfer-mapping-method argument defines how the source columns are mapped to the target columns.

There are 4 values:

  • name, the column name of the source should match a column name of the target
    • by default, if a column of the source is not found in the target, the execution will stop
    • if you want a lose mapping without any error, you can set the transfer-mapping-strict attribute to false. The source column will then not be added to the transfer.
  • position, the column position of the source will match the column position of the target
    • by default, if the number of column in the source does not match the number of column in the target, the execution will stop
    • if you want a lose mapping without any error, you can set the transfer-strict-mapping attribute to false.
  • map-by-name, the argument transfer-mapping-columns should contains a map of column name where
    • the keys are the source column names
    • the values are the target column names
  • map-by-position, the argument transfer-mapping-columns should contains a map of column position where:
    • the keys are the source column positions
    • the values are the target column positions

Resource Operation

The following values may be used for the target-operation and source-operation arguments.

Value Description Operation Equivalence
Truncate Truncate the resource
Drop Drop the resource On the source, move
On the target, replace

Transfer Upsert Type

We support the upsert operation via 3 methods

Type Batch Mode Definition
merge Yes The merge represents an unique statement (generally a merge but if not supported, it may be an insert on conflict)
insert Yes When the target has no unique constraint, an insert is performed
insert-update No An insert is executed, on error an update is performed
update-insert No An update is executed, on error an insert is performed

Note that a batch mode cannot be applied to a insert-update or update-insert because we need to get the result of the first operation (update or insert) to apply the second conditionally.

If the merge statement is not implemented by the database or the driver, we switch to:

  • insert-update if the target resource is empty
  • update-insert if the target resource is not empty

Target Batch Size

The target-batch-size defines the size of a batch.

Batch mode

If the target system supports batch update, we don't send each SQL statement one by one but grouped in a unit called a batch.

This mode is supported only when

  • with-parameters is set to true
  • when the transfer operation is performed in one statement (not with upsert type of a insert/update or update/insert mode)

Reporting

Results

If you set the output arguments to results, a result data path will be returned (in place of targets) and it will contain the following columns:

  • input: the input
  • target: the target
  • latency: the latency in duration
  • record_count: the count of records transfered
  • error_code: the error code / exit status if any
  • error_message: the error message if any

Metrics

If the argument metrics-data-uri is given, a metrics data resource will be created with the following columns:

  • run - the run id
  • timestamp - the metrics timestamp
  • metric - the name of the metrics
  • value - the value of the metrics
  • worker - the name of the worker (thread)

The following metrics will be recorded:

  • BufferSize: the size of the memory buffer between the source and the target
  • BufferMaxSize: the maximum size of the buffer
  • BufferRatio: the ratio from Buffer Size against Buffer MaxSize
  • Commits: the number of commits
  • Records: the number of records

Cross Transfer

A transfer can happen:

Tabulify Optimization: If two connections share the same credentials and the same system URL, the transfer will be optimized and considered local. This way, the transfer is applied only on the system metadata and it happens faster because the data is not moved.




Related HowTo
Undraw Data Processing
How to load an HTLM table in a database?

This howto shows you how to load an HTML table in a SQL Table. Because the HTML resource is a file resource, it can be used as any source in any type of transfer. Example with the upsert operation...
Undraw Data Processing
Learning Tabulify - Step 6 - How to transfer Data Resources

In Tabulify, when you want to manipulate data, it's called a transfer. You want to move data, you transfer You want to copy data, you transfer You want to insert data, you transfer You want to...
Undraw Data Processing
Learning Tabulify - Step 8 - How to modify the content of a table

Tabulify learning guidetransfer step The transfer page should have made you familiar with the transfer concept where we have introduced the copy operation. The copy and move operation does not modify...

Task Runner