---json
{
"page_id": "tinebxd98hfmbtsflf7eu"
}
---
====== Data Definition (DataDef.yml) ======
===== About =====
A ''data definition'' is a [[#manifest|manifest file property]] or an [[#argument|argument]] that defines the [[docs:resource:metadata|metadata]] of a [[resource|data resource]], i.e. its [[attribute|attributes]] and [[structure|data structure]].
===== Example =====
==== Manifest====
Example in a [[docs:resource:manifest#kind|kind resource manifest file]]:
<code yaml>
kind: csv
spec:
  data-def:
    logical-name: favorite_books
    header-row-id: 1
    delimiter-character: ','
    columns:
      - name: asin
        type: varchar
        precision: 20
      - name: description
        type: varchar
      - name: price
        type: double
      - name: group
        type: varchar
</code>
==== Operation Argument ====
Example as an argument of the [[:docs:op:define|define step]] in a [[:docs:flow:pipeline|pipeline manifest]]:
<code yaml>
kind: pipeline
spec:
  steps:
    - operation: "define"
      arguments:
        # The data def argument
        data-def:
          logical-name: "colors"
          columns: ["id","color"]
</code>
==== Cli Options ====
You can also set data definition attributes with a [[#tabul cli option]].
For example, setting a semicolon delimiter on a [[:docs:resource:csv|CSV]]:
<code bash>
tabul data head \
  --attribute delimiter-character=';' \
  books-semicolon.csv@howto
</code>
===== Usage =====
==== Resource Manifest ====
A ''data definition'' may be set in any [[docs:resource:manifest|resource manifest]] (i.e. a YAML file that defines a resource).
==== Operation Argument ====
This format is used as the value of the ''data-def'' argument of a pipeline step.
For example, in:
* the [[docs:op:define|define operation]]
* the [[docs:op:select|select operation]]
* the [[:docs:op:enrich|enrich operation]]
==== Tabul Cli Option ====
You can set the data definition with the following [[docs:tabul:option|tabul options]]:
* ''source-attribute'' or ''target-attribute'' in a [[docs:tabul:data:transfer|transfer command]]
* ''attribute'' for the other [[docs:tabul:data:start|data commands]]
===== Format =====
All ''data definition'' files share the common structure described below: the name of the tabular structure and its columns.
==== Scalar ====
At the root, you can set any scalar attribute, such as the [[docs:resource:attribute|common attributes]].
Example:
<code yaml>
logical-name: LogicalName
</code>
where:
* ''logical-name'' is the [[logical_name|logical name]] of the [[resource|resource]] (defaults to the name of the file without structure information).
==== Relational Structure ====
=== Columns ===
<code yaml>
columns:
  - name: column_name1
    type: date
    ansi-type: date
    precision:
    scale:
    comment:
    position: 1
  - name: column_name2
    type: varchar
    precision:
    scale: 0
    comment: A comment
</code>
where:
* ''columns'' defines the [[docs:resource:column|columns]]
* ''name'' is the name of the column
* ''type'' is the [[docs:data_type:data_type|data type name]] of the connection (default: ''varchar'')
* ''ansi-type'' is the [[:docs:data_type:data_type#ansi|ANSI data type]] (by default, derived from the ''type'')
* ''precision'' is the precision of the data type (defaults to the default precision of the data type)
* ''scale'' is the scale of the data type (defaults to the default scale of the data type)
* ''comment'' is a comment on the column
* ''position'' is the physical column position
=== Primary Columns ===
<code yaml>
primary-columns: [ "column_name1", "column_name2" ]
</code>
where ''primary-columns'' defines a list of column names that compose the primary key.
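Putting the scalar, ''columns'' and ''primary-columns'' parts together, a complete data definition could look like the following minimal sketch. The resource and column names are borrowed from the manifest example above for illustration only, and the choice of ''asin'' as the primary key is an assumption.

<code yaml>
# Illustrative sketch: the names and the primary key choice are assumptions
logical-name: favorite_books
columns:
  - name: asin
    type: varchar
    precision: 20
  - name: price
    type: double
# The primary key is made of the asin column only
primary-columns: [ "asin" ]
</code>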
===== Extra Attributes =====
Each data resource type may need additional information about a table or a column. This information can be added at each level (table or column) as an attribute (i.e. a property).
<code yaml>
DataResourceAttribute1: value1
columns:
  - name: column_name
    ColumnProperty1: value1
    # ...
</code>
where:
* ''DataResourceAttribute1'' is a [[attribute|resource attribute]]
* ''ColumnProperty1'' is a column attribute
A property value may be:
* a scalar (ie single value)
* a list
* or a mapping
The [[docs:generator:data-supplier|column data generators]] rely on this mechanism: their ''data-supplier'' argument is added as a column attribute.
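As an illustration of the three kinds of property value, the sketch below sets a scalar, a list and a mapping on a column. The property names (''ScalarProperty'', ''ListProperty'', ''MappingProperty'') and their values are hypothetical placeholders, not attributes of any real resource type.

<code yaml>
# Hypothetical property names, for illustration only
columns:
  - name: column_name
    # a scalar: a single value
    ScalarProperty: value1
    # a list: several values
    ListProperty: [ "value1", "value2" ]
    # a mapping: nested key/value pairs
    MappingProperty:
      key1: value1
      key2: value2
</code>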