Table of Contents

Step Operation - Transfer

About

transfer is a intermediate operation that executes all this data resources operations:

In general, if you need to move or process data, you will transfer it.

The data transfer action permits you to

Example

In a Pipeline

pipeline:
# ...
  - name: "Store"
    operation: transfer
    args:
      target-data-uri: '${logicalName}.csv@tmp'
      transfer-operation: insert
      target-operation: replace
      ...

Cli

The transfer operation is performed by the tabul data transfer command

Arguments

Name Default Description
target-data-uri - A template data uri (Mandatory)
target-name-sanitization true If true, the target name is sanitized to comply with the naming convention of the target system.
Example: for sql, the name foo-bar would become foo_bar
target-operation - A resource operation that will happens on the target before the transfer
source-operation - A resource operation that will happens on the source after the transfer
Pipeline Properties
output-type target The output
- targets, the target resources
- inputs, the inputs resources are passed
- results, the results of the transfer step is passed
processing-type batch if the target is not a pattern
stream otherwise
The type of processing
Cross Transfer Properties
source-fetch-size 10000 The number of record from the source for one fetch (ie network message size from the source system)
buffer-size 2 x target-worker x fetch-size The maximum number of record from stored in transit between the source and the target
target-batch-size 1000 The number of record send to the target system at once in a batch (ie network message size to the target system)
target-commit-frequency <math>2^{31}</math> (Infinite) The number of batch sends that will trigger a commit to the target system
target-worker 1 The number of thread against the target system that will send batches
with-parameters true If the target system is a sql database, SQL bind variables will be used
metrics-data-uri A data uri where the transfer metrics will be saved
Transfer Properties
transfer-operation copy for a file system
insert otherwise
A transfer operation (copy, insert, upsert, update, …)
transfer-upsert-type merge The transfer upsert type (merge, insert-update, update-insert, …)
transfer-mapping-method position if the target has no structure
name otherwise
how the source and target columns are mapped (by name, position, map-by-name or map-by-position)
transfer-mapping-columns the map of source target column when the mapping method is map-by-name or map-by-position
transfer-mapping-strict true if true, the transfer will fail if a source column cannot be mapped to a target column with the mapping method name or position

Transfer Operation

The transfer operation supports the following values for the transfer-operation argument.

Basic Operations

Name Source/Target with
Same Structure
Same Data
Additional Constraint
copy Yes target does not exist
or is empty
insert No -
upsert No -
update No -
delete No -

Alias Operations

We support also aliases operation. They are operation derived from the basic one.

Name Alias
concat alias of insert with only one target (ie a target-data-uri without target pattern)
move alias of copy with source drop
rename alias of move on the same connection (Metadata Operation)
replace alias of copy with target drop

Default Transfer Operation

The default transfer operation is system dependent:

Transfer mapping method

The transfer-mapping-method argument defines how the source columns are mapped to the target columns.

There are 4 values:

Resource Operation

The following values may be used for the target-operation and source-operation arguments.

Value Description Operation Equivalence
Truncate Truncate the resource
Drop Drop the resource On the source, move
On the target, replace

Transfer Upsert Type

We support the upsert operation via 3 methods

Type Batch Mode Definition
merge Yes The merge represents an unique statement (generally a merge but if not supported, it may be an insert on conflict)
insert Yes When the target has no unique constraint, an insert is performed
insert-update No An insert is executed, on error an update is performed
update-insert No An update is executed, on error an insert is performed

Note that a batch mode cannot be applied to a insert-update or update-insert because we need to get the result of the first operation (update or insert) to apply the second conditionally.

If the merge statement is not implemented by the database or the driver, we switch to:

Target Batch Size

The target-batch-size defines the size of a batch.

Batch mode

If the target system supports batch update, we don't send each SQL statement one by one but grouped in a unit called a batch.

This mode is supported only when

Reporting

Results

If you set the output arguments to results, a result data path will be returned (in place of targets) and it will contain the following columns:

Metrics

If the argument metrics-data-uri is given, a metrics data resource will be created with the following columns:

The following metrics will be recorded:

Cross Transfer

A transfer can happen:

Tabulify Optimization: If two connections share the same credentials and the same system URL, the transfer will be optimized and considered local. This way, the transfer is applied only on the system metadata and it happens faster because the data is not moved.