About
This how-to will show you how to use the tabul fill command to fill all the tables of a schema with generated data.
Steps
Prerequisites
You should have Tabulify installed on your computer.
Creation of the schema
You could:
- follow this howto: How to copy a schema
- or execute the following statements:
# drop all tables in sqlite
tabul data drop --no-strict-selection *@sqlite
# Create the store_sales star schema (ie the store_sales and its dependencies tables)
tabul data create --with-dependencies store_sales@tpcds 'tpc_${input_logicalName}@sqlite'
# in bash, you need to escape the $ with \ sign or quote the argument so that it's not seen as a bash variable
Schema Listing
You should now see in the Sqlite HowTo connection the following tables.
tabul data list --with-dependencies tpc_store_sales@sqlite
path media_type
-------------------------- ----------
tpc_customer table
tpc_customer_address table
tpc_customer_demographics table
tpc_date_dim table
tpc_household_demographics table
tpc_income_band table
tpc_item table
tpc_promotion table
tpc_store table
tpc_store_sales table
tpc_time_dim table
Schema Filling
Let's fill the table with the tabul data fill command that applies the fill operation.
tabul data fill --with-dependencies tpc_store_sales@sqlite
Transfer results
input target latency record_count error_code error_message
--------------------------------- --------------------------------- ------- ------------ ---------- -------------
tpc_customer@memgen tpc_customer@sqlite 0.18s 100
tpc_customer_address@memgen tpc_customer_address@sqlite 0.23s 100
tpc_customer_demographics@memgen tpc_customer_demographics@sqlite 0.39s 100
tpc_date_dim@memgen tpc_date_dim@sqlite 0.48s 100
tpc_household_demographics@memgen tpc_household_demographics@sqlite 0.12s 100
tpc_income_band@memgen tpc_income_band@sqlite 0.10s 100
tpc_item@memgen tpc_item@sqlite 0.40s 100
tpc_promotion@memgen tpc_promotion@sqlite 0.20s 100
tpc_store@memgen tpc_store@sqlite 0.34s 100
tpc_store_sales@memgen tpc_store_sales@sqlite 0.23s 100
tpc_time_dim@memgen tpc_time_dim@sqlite 0.15s 100
Looking at the result
The fill operation will by default enforce:
tabul data head --limit 10 date_dim@sqlite
Controlling the generated data with Generator
A generator is a file that contains the data generation definition.
For each column, a column data generator is defined that control the data generated.
The below generator generates one year of data with two columns:
- d_date that has a date sequence generator to generate date data from 2001-01-01 and upwards
- dyear that has a expression generator that extracts the year of the d_date column.
kind: generator
spec:
LogicalName: date_dim
Columns:
- name: d_date
type: date
comment: A column with a sequence generator that generates a date sequence from 2001-01-01 and after
data-supplier:
type: sequence
arguments:
start: 2001-01-01
step: 1
- name: d_year
type: integer
precision: 4
comment: A column with a expression generator that extracts the year from the date column
data-supplier:
type: expression
arguments:
column-variable: d_date
expression: "d_date.getFullYear()"
This generator is also a content resource and therefore you can use it as any tabular resource and take a look at the data generated
tabul data head date_dim_2001--generator.yml@howto
The first 10 rows of the data resource (date_dim_2001@memgen):
d_date d_year
---------- ------
2001-01-01 2001
2001-01-02 2001
2001-01-03 2001
2001-01-04 2001
2001-01-05 2001
2001-01-06 2001
2001-01-07 2001
2001-01-08 2001
2001-01-09 2001
2001-01-10 2001
Fill with generators
After having created a generator for the date_dim table, we can pass it to the data fill command with the –generator-selector option to make the data generation more controlled.
tabul data fill --generator-selector date_dim_2001--generator.yml@howto *@sqlite
As the option generator-selector is a resource selector, you can create a generator for each table where you want to customize the generated data and select them with the glob pattern.
Output:
Transfer results
input target latency record_count error_code error_message
--------------------------------- --------------------------------- ------- ------------ ---------- -------------
tpc_customer@memgen tpc_customer@sqlite 0.18s 100
tpc_customer_address@memgen tpc_customer_address@sqlite 0.22s 100
tpc_customer_demographics@memgen tpc_customer_demographics@sqlite 0.35s 100
tpc_date_dim@memgen tpc_date_dim@sqlite 0.42s 100
tpc_household_demographics@memgen tpc_household_demographics@sqlite 0.9s 100
tpc_income_band@memgen tpc_income_band@sqlite 0.9s 100
tpc_item@memgen tpc_item@sqlite 0.25s 100
tpc_promotion@memgen tpc_promotion@sqlite 0.19s 100
tpc_store@memgen tpc_store@sqlite 0.29s 100
tpc_store_sales@memgen tpc_store_sales@sqlite 0.23s 100
tpc_time_dim@memgen tpc_time_dim@sqlite 0.11s 100