Data Definition (DataDef.yml)


About

A data definition is a manifest file property or an operation argument that defines the metadata of a data resource (i.e. its attributes and data structure).

Example

Manifest

Example in a kind resource manifest file

kind: csv
spec:
  data-def:
    logical-name: favorite_books
    header-row-id: 1
    delimiter-character: ','
    columns:
      - name: asin
        type: varchar
        precision: 20
      - name: description
        type: varchar
      - name: price
        type: double
      - name: group
        type: varchar


Operation Argument

Example as argument of the define step in a pipeline manifest

kind: pipeline
spec:
  steps:
    - operation: "define"
      arguments:
        # The data def argument
        data-def:
          logical-name: "colors"
          columns: ["id","color"]

Cli Options

You can also set data definition attributes with Tabul CLI options.

For example, to set a semicolon as the delimiter of a CSV file:

tabul data head \
  --attribute delimiter-character=';' \
  books-semicolon.csv@howto

Usage

Resource Manifest

A data definition may be set in any resource manifest (i.e. a YAML file that defines a resource).

Operation Argument

This format is used in a pipeline step, as the value of the data-def argument (see the define step in the Operation Argument example above).
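As a hedged sketch of the same argument with columns written as mappings rather than as a plain list of names, the define step could look like the following. It assumes that the column attributes described in the Format section (type, precision, comment) are accepted here unchanged; the column values themselves are illustrative only.

```yaml
kind: pipeline
spec:
  steps:
    - operation: "define"
      arguments:
        data-def:
          logical-name: "colors"
          # Assumption: columns accept the mapping form of the Format section
          columns:
            - name: id
              type: varchar
              precision: 10
            - name: color
              type: varchar
              comment: The name of the color
```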

Tabul Cli Option

You can set data definition attributes on the command line with the --attribute option, as in the delimiter-character example above.

Format

The following sections show the structure common to all data definition files: the name of the tabular structure and its columns.

Scalar

At the root, you can set any scalar attribute, such as the common attributes.

Example:

logical-name: LogicalName

where

  • logical-name is the logical name of the resource (defaults to the name of the file without structure information).

Relational Structure

Columns

columns:
  - name: column_name1
    type: date
    ansi-type: date
    precision:
    scale:
    comment:
    position: 1
  - name: column_name2
    type: varchar
    precision:
    scale: 0
    comment: A comment

where:

  • columns defines the columns
    • name is the name of the column
    • type is the data type name of the connection (Default: varchar)
    • ansi-type is the ansi type of data (By default, derived from the type)
    • precision is the precision of the data type (Default value of the data type)
    • scale is the scale of the data type (Default value of the data type)
    • comment is a comment on the column
    • position is the physical column position

Primary Columns

primary-columns: [ "column_name1", "column_name2" ]

where primary-columns defines a list of column names that compose the primary key.
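To show how these pieces fit together, here is a minimal sketch of a resource manifest that declares columns and a primary key. It reuses the csv manifest from the Example section and assumes that primary-columns sits at the same level as columns inside data-def; the comment text is illustrative.

```yaml
kind: csv
spec:
  data-def:
    logical-name: favorite_books
    columns:
      - name: asin
        type: varchar
        precision: 20
        comment: Identifier of the book
      - name: price
        type: double
    # Assumption: asin alone composes the primary key
    primary-columns: [ "asin" ]
```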

Extra Attributes

Each data resource type may need additional information about a table or a column. This information can be added at either level (table or column) as an attribute (i.e. a property).

DataResourceAttribute1: value1
columns:
    - name: column_name
      ColumnProperty1: value1
....

where a property value may be:

  • a scalar (ie single value)
  • a list
  • or a mapping

The column data generators use them to add the data-supplier argument.
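The three value shapes can be sketched as follows. The attribute names are hypothetical placeholders in the style of the snippet above, not attributes that any resource type is known to define.

```yaml
# Table level: a scalar (single value)
DataResourceAttribute1: value1
# Table level: a list
DataResourceAttribute2: [ "value1", "value2" ]
columns:
  - name: column_name
    # Column level: a mapping
    ColumnProperty1:
      key1: value1
      key2: value2
```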




Related HowTo
Tabul - How to fill a table with a data generation file

This how-to will show you how to define the generation of data via a data definition file and load it into a table via the tabul data fill operation.
