Tabulify - How to fill a database relational schema with generated data and Tabul

About

This how-to will show you how to use the tabul fill command to fill all the tables of a schema with generated data.

Steps

Prerequisites

You should have Tabulify installed on your computer.

Learning Tabulify - Step 1 - Installation

Creation of the schema

You could:

# drop all tables in sqlite
tabul data drop --no-strict-selection *@sqlite
# Create the store_sales star schema (ie the store_sales and its dependencies tables)
tabul data create --with-dependencies store_sales@tpcds 'tpc_${input_logicalName}@sqlite'
# in bash, you need to escape the $ with \ sign or quote the argument so that it's not seen as a bash variable

Schema Listing

You should now see in the Sqlite HowTo connection the following tables.

tabul data list --with-dependencies tpc_store_sales@sqlite
path                         media_type
--------------------------   ----------
tpc_customer                 table
tpc_customer_address         table
tpc_customer_demographics    table
tpc_date_dim                 table
tpc_household_demographics   table
tpc_income_band              table
tpc_item                     table
tpc_promotion                table
tpc_store                    table
tpc_store_sales              table
tpc_time_dim                 table

Schema Filling

Let's fill the table with the tabul data fill command that applies the fill operation.

tabul data fill --with-dependencies tpc_store_sales@sqlite
Transfer results
input                               target                              latency   record_count   error_code   error_message
---------------------------------   ---------------------------------   -------   ------------   ----------   -------------
tpc_customer@memgen                 tpc_customer@sqlite                 0.18s              100
tpc_customer_address@memgen         tpc_customer_address@sqlite         0.23s              100
tpc_customer_demographics@memgen    tpc_customer_demographics@sqlite    0.39s              100
tpc_date_dim@memgen                 tpc_date_dim@sqlite                 0.48s              100
tpc_household_demographics@memgen   tpc_household_demographics@sqlite   0.12s              100
tpc_income_band@memgen              tpc_income_band@sqlite              0.10s              100
tpc_item@memgen                     tpc_item@sqlite                     0.40s              100
tpc_promotion@memgen                tpc_promotion@sqlite                0.20s              100
tpc_store@memgen                    tpc_store@sqlite                    0.34s              100
tpc_store_sales@memgen              tpc_store_sales@sqlite              0.23s              100
tpc_time_dim@memgen                 tpc_time_dim@sqlite                 0.15s              100

Looking at the result

The fill operation will by default enforce:

tabul data head --limit 10 date_dim@sqlite

Controlling the generated data with Generator

A generator is a file that contains the data generation definition.

For each column, a column data generator is defined that control the data generated.

The below generator generates one year of data with two columns:

kind: generator
spec:
  LogicalName: date_dim
  Columns:
    - name: d_date
      type: date
      comment: A column with a sequence generator that generates a date sequence from 2001-01-01 and after
      data-supplier:
        type: sequence
        arguments:
          start: 2001-01-01
          step: 1
    - name: d_year
      type: integer
      precision: 4
      comment: A column with a expression generator that extracts the year from the date column
      data-supplier:
        type: expression
        arguments:
          column-variable: d_date
          expression: "d_date.getFullYear()"


This generator is also a content resource and therefore you can use it as any tabular resource and take a look at the data generated

tabul data head date_dim_2001--generator.yml@howto
The first 10 rows of the data resource (date_dim_2001@memgen):
d_date       d_year
----------   ------
2001-01-01     2001
2001-01-02     2001
2001-01-03     2001
2001-01-04     2001
2001-01-05     2001
2001-01-06     2001
2001-01-07     2001
2001-01-08     2001
2001-01-09     2001
2001-01-10     2001

Fill with generators

After having created a generator for the date_dim table, we can pass it to the data fill command with the –generator-selector option to make the data generation more controlled.

tabul data fill --generator-selector date_dim_2001--generator.yml@howto   *@sqlite

As the option generator-selector is a resource selector, you can create a generator for each table where you want to customize the generated data and select them with the glob pattern.

Output:

Transfer results
input                               target                              latency   record_count   error_code   error_message
---------------------------------   ---------------------------------   -------   ------------   ----------   -------------
tpc_customer@memgen                 tpc_customer@sqlite                 0.18s              100
tpc_customer_address@memgen         tpc_customer_address@sqlite         0.22s              100
tpc_customer_demographics@memgen    tpc_customer_demographics@sqlite    0.35s              100
tpc_date_dim@memgen                 tpc_date_dim@sqlite                 0.42s              100
tpc_household_demographics@memgen   tpc_household_demographics@sqlite   0.9s               100
tpc_income_band@memgen              tpc_income_band@sqlite              0.9s               100
tpc_item@memgen                     tpc_item@sqlite                     0.25s              100
tpc_promotion@memgen                tpc_promotion@sqlite                0.19s              100
tpc_store@memgen                    tpc_store@sqlite                    0.29s              100
tpc_store_sales@memgen              tpc_store_sales@sqlite              0.23s              100
tpc_time_dim@memgen                 tpc_time_dim@sqlite                 0.11s              100




Related Pages
Tabul - How to copy the data definition of all tables from a star/snowflake schema

This howto will show you how to copy the data definition of all SQL tables from a star/snowflake schema.
Tabul - How to copy the data definition of all tables from a star/snowflake schema

This howto will show you how to copy the data definition of all SQL tables from a star/snowflake schema.

Task Runner