transfer is an intermediate operation that executes the following data resource operations:
In general, if you need to move or process data, you will transfer it.
The data transfer action permits you to:
In a Pipeline
```yaml
pipeline:
  # ...
  - name: "Store"
    operation: transfer
    args:
      target-data-uri: '${logicalName}.csv@tmp'
      transfer-operation: insert
      target-operation: replace
  # ...
```
The transfer operation is performed by the `tabul data transfer` command.
| Name | Default | Description |
|---|---|---|
| target-data-uri | - | A template data URI (mandatory) |
| target-name-sanitization | true | If true, the target name is sanitized to comply with the naming convention of the target system. Example: for SQL, the name foo-bar would become foo_bar |
| target-operation | - | A resource operation that will happen on the target before the transfer |
| source-operation | - | A resource operation that will happen on the source after the transfer |
| Pipeline Properties | | |
| output-type | targets | The output passed on: targets (the target resources), inputs (the input resources), or results (the results of the transfer step) |
| processing-type | batch if the target is not a pattern, stream otherwise | The type of processing |
| Cross Transfer Properties | | |
| source-fetch-size | 10000 | The number of records fetched from the source at once (i.e. the network message size from the source system) |
| buffer-size | 2 x target-worker x fetch-size | The maximum number of records stored in transit between the source and the target |
| target-batch-size | 1000 | The number of records sent to the target system at once in a batch (i.e. the network message size to the target system) |
| target-commit-frequency | 2^31 (infinite) | The number of batch sends that will trigger a commit to the target system |
| target-worker | 1 | The number of threads that send batches to the target system |
| with-parameters | true | If the target system is a SQL database, SQL bind variables will be used |
| metrics-data-uri | - | A data URI where the transfer metrics will be saved |
| Transfer Properties | | |
| transfer-operation | copy for a file system, insert otherwise | A transfer operation (copy, insert, upsert, update, …) |
| transfer-upsert-type | merge | The transfer upsert type (merge, insert-update, update-insert, …) |
| transfer-mapping-method | position if the target has no structure, name otherwise | How the source and target columns are mapped (name, position, map-by-name or map-by-position) |
| transfer-mapping-columns | - | The map of source to target columns when the mapping method is map-by-name or map-by-position |
| transfer-mapping-strict | true | If true, the transfer will fail when a source column cannot be mapped to a target column with the name or position mapping method |
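For illustration, a cross-system transfer step might combine several of the arguments above. This is a sketch only: the warehouse connection name and the sizes are placeholders, not recommended values.

```yaml
pipeline:
  # ...
  - name: "Load into the warehouse"
    operation: transfer
    args:
      target-data-uri: '${logicalName}@warehouse'
      source-fetch-size: 20000   # records fetched from the source per round trip
      target-batch-size: 5000    # records sent to the target per batch
      target-worker: 4           # threads sending batches to the target
      with-parameters: true      # use SQL bind variables on a SQL target
```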
The transfer operation supports the following values for the transfer-operation argument.
We also support alias operations, which are derived from the basic operations.
| Name | Definition |
|---|---|
| concat | alias of insert with only one target (i.e. a target-data-uri without a target pattern) |
| move | alias of copy with source drop |
| rename | alias of move on the same connection (Metadata Operation) |
| replace | alias of copy with target drop |
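As a sketch, the move alias could be used in a pipeline step as follows; the archive connection name is illustrative.

```yaml
pipeline:
  # ...
  - name: "Archive"
    operation: transfer
    args:
      target-data-uri: '${logicalName}.csv@archive'
      transfer-operation: move   # copy to the target, then drop the source
```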
The default transfer operation is system dependent: copy for a file system, insert otherwise.
The transfer-mapping-method argument defines how the source columns are mapped to the target columns.
There are 4 values: name, position, map-by-name and map-by-position.
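Below is a sketch of an explicit mapping; the key/value form used for transfer-mapping-columns (source column as key, target column as value) and the column names are assumptions for illustration only.

```yaml
pipeline:
  # ...
  - name: "Store"
    operation: transfer
    args:
      target-data-uri: '${logicalName}@tmp'
      transfer-mapping-method: map-by-name
      transfer-mapping-columns:        # assumed syntax: source column -> target column
        customer_id: id
        customer_name: name
      transfer-mapping-strict: false   # don't fail on unmapped source columns
```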
The following values may be used for the target-operation and source-operation arguments.
We support the upsert operation via 3 methods:
| Type | Batch Mode | Definition |
|---|---|---|
| merge | Yes | A single statement is executed (generally a merge; if not supported, it may be an insert on conflict) |
| insert | Yes | When the target has no unique constraint, an insert is performed |
| insert-update | No | An insert is executed; on error, an update is performed |
| update-insert | No | An update is executed; on error, an insert is performed |
Note that batch mode cannot be applied to insert-update or update-insert because we need the result of the first operation (update or insert) to apply the second conditionally.
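For example, an upsert transfer using the merge method might be declared as follows; the warehouse connection name is illustrative.

```yaml
pipeline:
  # ...
  - name: "Upsert"
    operation: transfer
    args:
      target-data-uri: '${logicalName}@warehouse'
      transfer-operation: upsert
      transfer-upsert-type: merge   # single statement, batch mode supported
```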
If the merge statement is not implemented by the database or the driver, we switch to:
The target-batch-size defines the size of a batch.
If the target system supports batch updates, we don't send each SQL statement one by one; instead, statements are grouped in a unit called a batch.
This mode is supported only when
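As a sketch, the batching arguments can be set directly on the transfer step; the values below are illustrative, not recommended settings.

```yaml
pipeline:
  # ...
  - name: "Store"
    operation: transfer
    args:
      target-data-uri: '${logicalName}@warehouse'
      target-batch-size: 1000         # statements grouped into one batch
      target-commit-frequency: 10     # commit after every 10 batches
```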
If you set the output-type argument to results, a result data path will be returned (in place of targets) and it will contain the following columns:
If the argument metrics-data-uri is given, a metrics data resource will be created with the following columns:
The following metrics will be recorded:
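A step that exposes the results output and records its metrics might look like the following sketch; the metrics data URI is illustrative.

```yaml
pipeline:
  # ...
  - name: "Store"
    operation: transfer
    args:
      target-data-uri: '${logicalName}.csv@tmp'
      output-type: results                       # pass the transfer results to the next step
      metrics-data-uri: 'transfer_metrics@tmp'   # save the transfer metrics in this resource
```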
A transfer can happen:
Tabulify Optimization: If two connections share the same credentials and the same system URL, the transfer is optimized and considered local. In that case, the transfer is applied only to the system metadata and runs faster because the data is not moved.
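For example, assuming staging and prod are two connections defined with the same credentials and system URL (an assumption for this sketch), the step below would be executed as a local, metadata-only transfer:

```yaml
pipeline:
  # ...
  - name: "Publish"
    operation: transfer
    args:
      target-data-uri: '${logicalName}@prod'   # prod points to the same system as the source connection
      transfer-operation: replace              # copy with target drop, applied on metadata only
```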