A data definition is a manifest file property or argument that defines the metadata of a data resource (i.e. its attributes and data structure).
Example in a resource manifest file of kind csv:
kind: csv
spec:
  data-def:
    logical-name: favorite_books
    header-row-id: 1
    delimiter-character: ','
    columns:
      - name: asin
        type: varchar
        precision: 20
      - name: description
        type: varchar
      - name: price
        type: double
      - name: group
        type: varchar
Example as the argument of the define step in a pipeline manifest:
kind: pipeline
spec:
  steps:
    - operation: "define"
      arguments:
        # The data def argument
        data-def:
          logical-name: "colors"
          columns: ["id","color"]
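The columns argument above uses a shorthand list of names, while the csv manifest example lists each column as a mapping. As a sketch only, and assuming both forms are interchangeable (the source does not confirm this, nor which type a column gets when none is given), the same argument in the long form might read:

```yaml
# Assumption: a name-only mapping is equivalent to the shorthand string entry
data-def:
  logical-name: "colors"
  columns:
    - name: "id"
    - name: "color"
```

The long form leaves room for per-column attributes such as type or precision, which the shorthand cannot carry.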
You can also set attributes with the tabul command-line options.
For example, to set a semicolon as the delimiter of a CSV file:
tabul data head \
  --attribute delimiter-character=';' \
  books-semicolon.csv@howto
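For comparison, the same attribute could be written in a manifest instead of on the command line. This is a minimal sketch: it assumes the attribute name passed to --attribute matches the manifest key, as delimiter-character does in the csv manifest example. The nesting under spec and data-def is inferred from that example's key order.

```yaml
kind: csv
spec:
  data-def:
    # Same attribute as --attribute delimiter-character=';'
    delimiter-character: ';'
```

A manifest keeps the setting with the resource, whereas the command-line option applies only to that invocation.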
A data definition may be set in any resource manifest (i.e. a YAML file that defines a resource).
The same format is used in the data-def argument of a pipeline step.
Example:
You can set the data definition with the following tabul options
The following data definition file shows the structure common to all data definition files: it defines the name of the tabular structure and its columns.
At the root, you can set any scalar attribute, such as the common attributes.
Example:
logical-name: LogicalName
where
columns:
  - name: column_name1
    type: date
    ansi-type: date
    precision:
    scale:
    comment:
    position: 1
  - name: column_name2
    type: varchar
    precision:
    scale: 0
    comment: A comment
where:
primary-columns: [ "column_name1", "column_name2" ]
where primary-columns defines a list of column names that compose the primary key.
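Putting the pieces together, a complete data definition might look like the sketch below. The table and column names are hypothetical and chosen for illustration. Only attributes that already appear in the examples above are used.

```yaml
# Hypothetical data definition: names and values are illustrative only
logical-name: book_price
columns:
  - name: asin
    type: varchar
    precision: 20
    comment: Product identifier
  - name: price
    type: double
# The primary key is composed of the asin column alone
primary-columns: [ "asin" ]
```

Every name listed in primary-columns should match the name of a column declared under columns.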
Each data resource type may need additional information about a table or a column. This information can be added at each level (table or column) as an attribute (i.e. a property).
DataResourceAttribute1: value1
columns:
  - name: column_name
    ColumnProperty1: value1
    ....
where:
A Property value may be:
The column data generators use them to add the data-supplier argument.