About
A data definition is a manifest file property or an operation argument that defines the metadata of a data resource (i.e. its attributes and its data structure).
Example
Manifest
Example in a resource manifest file (here, of kind csv):
kind: csv
spec:
  data-def:
    logical-name: favorite_books
    header-row-id: 1
    delimiter-character: ','
    columns:
      - name: asin
        type: varchar
        precision: 20
      - name: description
        type: varchar
      - name: price
        type: double
      - name: group
        type: varchar
Operation Argument
Example as argument of the define step in a pipeline manifest
kind: pipeline
spec:
  steps:
    - operation: "define"
      arguments:
        # The data def argument
        data-def:
          logical-name: "colors"
          columns: ["id","color"]
CLI Options
You can also set data definition attributes with the tabul CLI options.
For example, to set a semicolon as the delimiter of a CSV file:
tabul data head \
  --attribute delimiter-character=';' \
  books-semicolon.csv@howto
Usage
Resource Manifest
A data definition may be set in any resource manifest (i.e. a YAML file that defines a resource).
Operation Argument
This format is used in the data-def argument of a pipeline step.
Examples of operations that accept it:
- the define operation
- the select operation
- the enrich operation
Tabul CLI Option
You can set the data definition with the following tabul options:
- source-attribute or target-attribute in a transfer command
- attribute in the other data commands
Format
The following sections show the structure common to all data definitions: the name of the tabular structure and its columns.
Scalar
At the root, you can set any scalar attribute, such as the common attributes.
Example:
logical-name: LogicalName
where
- logical-name is the logical name of the resource (defaults to the name of the file without structure information).
Relational Structure
Columns
columns:
  - name: column_name1
    type: date
    ansi-type: date
    precision:
    scale:
    comment:
    position: 1
  - name: column_name2
    type: varchar
    precision:
    scale: 0
    comment: A comment
where:
- columns defines the columns
- name is the name of the column
- type is the data type name as known by the connection (default: varchar)
- ansi-type is the ANSI type of the data (by default, derived from the type)
- precision is the precision of the data type (defaults to the default precision of the data type)
- scale is the scale of the data type (defaults to the default scale of the data type)
- comment is a comment on the column
- position is the physical column position
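As an illustration of how precision and scale fit together, the sketch below declares a fixed-point amount and a bounded text column. The column names are hypothetical, and the decimal type name is an assumption: the accepted type names depend on the connection.

```yaml
# Hypothetical columns: names and the decimal type name are assumptions
columns:
  - name: amount
    type: decimal      # assumed to be a type name known by the connection
    precision: 10      # total number of digits
    scale: 2           # digits after the decimal point
    comment: Amount with two decimal places
  - name: label
    type: varchar
    precision: 50      # for a varchar, the maximum length
```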
Primary Columns
primary-columns: [ "column_name1", "column_name2" ]
where primary-columns defines a list of column names that compose the primary key.
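Put together with the columns above, a data definition with a primary key might look like the following sketch. The names are borrowed from the favorite_books example for illustration, and the choice of asin as the key is an assumption.

```yaml
# Illustrative data definition: the key choice is an assumption
logical-name: favorite_books
columns:
  - name: asin
    type: varchar
    precision: 20
  - name: price
    type: double
# Each listed name is expected to refer to a column declared above
primary-columns: [ "asin" ]
```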
Extra Attributes
Each data resource type may need additional information about a table or a column. This information can be added at either level (table or column) as an attribute (i.e. a property).
DataResourceAttribute1: value1
columns:
  - name: column_name
    ColumnProperty1: value1
    ....
where:
- DataResourceAttribute1 is a resource attribute
- ColumnProperty1 is a column attribute
A property value may be:
- a scalar (ie single value)
- a list
- or a mapping
The column data generators use extra column attributes to add the data-supplier argument.
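The three value shapes listed above can be sketched as follows. Every attribute name here is hypothetical and only illustrates the shape of the value, not an attribute that a resource type actually defines.

```yaml
# Hypothetical attribute names, used only to show the three value shapes
scalar-attribute: value1            # scalar: a single value
list-attribute: [ "a", "b" ]        # list: a sequence of values
mapping-attribute:                  # mapping: nested key/value pairs
  key1: value1
  key2: value2
columns:
  - name: column_name
    # Column-level attributes take the same three shapes
    column-mapping-attribute:
      key1: value1
```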