Tabulify ships with a built-in data generator.
You can generate realistic production data and start working on your project right away.
Because the data is fake but realistic, you don't need to:
The data fill operation selects target data resources and fills them with data.
Tabulify supports two modes:
The fill operation is supported by the data fill command.
Let's first delete all data with the data truncate command to get a clean schema.
tabul data truncate *@sqlite
The fill command below fills all tables with auto-generated data:
tabul data fill *@sqlite
Transfer results
input                         target                        latency record_count error_code error_message
----------------------------- ----------------------------- ------- ------------ ---------- -------------
call_center@memgen            call_center@sqlite            0.33s   100
catalog_page@memgen           catalog_page@sqlite           0.11s   100
catalog_sales@memgen          catalog_sales@sqlite          0.34s   100
customer@memgen               customer@sqlite               0.17s   100
customer_address@memgen       customer_address@sqlite       0.15s   100
customer_demographics@memgen  customer_demographics@sqlite  0.18s   100
date_dim@memgen               date_dim@sqlite               0.54s   100
household_demographics@memgen household_demographics@sqlite 0.9s    100
income_band@memgen            income_band@sqlite            0.9s    100
item@memgen                   item@sqlite                   0.25s   100
promotion@memgen              promotion@sqlite              0.18s   100
ship_mode@memgen              ship_mode@sqlite              0.13s   100
store@memgen                  store@sqlite                  0.27s   100
store_sales@memgen            store_sales@sqlite            0.25s   100
time_dim@memgen               time_dim@sqlite               0.13s   100
warehouse@memgen              warehouse@sqlite              0.16s   100
web_page@memgen               web_page@sqlite               0.12s   100
web_sales@memgen              web_sales@sqlite              0.26s   100
web_site@memgen               web_site@sqlite               0.23s   100
The data fill command loads 100 records into each table because 100 is the default value of the max-record-count option, which sets the number of records generated.
If we run query 11 (from the query lesson), we get no data back.
tabul data print '(sqlite/query_11.sql@tpcds_query)@sqlite'
# The quotes are mandatory only in bash, where parentheses are shell tokens (i.e. they start a subshell)
(sqlite/query_11.sql@tpcds_query)@sqlite
customer_id customer_first_name customer_last_name customer_email_address
----------- ------------------- ------------------ ----------------------
Why? Because query 11 relies on date data for the year 2001, and the auto-generated data contains no 2001 value in the d_year column.
tabul data head --limit 10 date_dim@sqlite
The first 10 rows of the data resource (date_dim@sqlite):
d_date_sk d_date_id d_date d_month_seq d_week_seq d_quarter_seq d_year d_dow d_moy d_dom d_qoy d_fy_year d_fy_quarter_seq d_fy_week_seq d_day_name d_quarter_name d_holiday d_weekend d_following_holiday d_first_dom d_last_dom d_same_day_ly d_same_day_lq d_current_day d_current_week d_current_month d_current_quarter d_current_year
--------- --------- ---------- ----------- ---------- ------------- ------ ----- ----- ----- ----- --------- ---------------- ------------- ---------- -------------- --------- --------- ------------------- ----------- ---------- ------------- ------------- ------------- -------------- --------------- ----------------- --------------
1 a 2025-12-25 4 0 10 4 4 0 4 3 5 1 10 k u j b v 4 3 8 0 u h r t n
2 b 2026-01-01 6 8 7 2 3 6 6 6 6 4 0 u h f s o 6 2 7 5 l w k h a
3 c 2025-12-24 7 6 7 6 5 10 5 0 0 5 9 i c l e g 10 6 4 5 k d l j w
4 d 2025-12-30 0 9 0 0 6 1 6 8 2 3 4 m e o b z 1 2 9 2 h n z c i
5 e 2025-12-31 9 5 4 5 8 9 0 10 3 1 8 j j m j g 8 7 7 8 p x s d h
6 f 2025-12-24 6 8 2 0 8 3 3 3 3 9 3 p o x l h 0 3 3 7 g x o j v
7 g 2026-01-01 9 0 8 1 1 2 8 0 3 5 0 z f l m i 2 9 2 3 d z d m q
8 h 2025-12-27 9 4 9 2 7 7 5 3 5 2 5 l p p r x 0 8 0 3 f g i l j
9 i 2026-01-01 7 6 6 2 1 4 6 2 8 8 1 c z k j g 4 1 1 3 f t h j m
10 j 2026-01-02 4 3 2 0 4 4 9 2 0 9 7 x n z s u 1 9 9 0 n q v l u
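The empty result can be reproduced in miniature with Python's standard sqlite3 module. The sketch below is only an illustration under stated assumptions: the two-table schema, the join and the function name are simplified, hypothetical stand-ins and not the actual TPC-DS query 11; only the idea of restricting on d_year = 2001 comes from the explanation above. The random d_year values are those of the first five sample rows.

```python
import sqlite3


def customers_with_sales_in_year(d_years, year=2001):
    """Return the customer keys that have a sale dated in `year`.

    Hypothetical, simplified stand-in for a query that filters on d_year.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE date_dim (d_date_sk INTEGER PRIMARY KEY, d_year INTEGER)")
    con.execute("CREATE TABLE store_sales (ss_customer_sk INTEGER, ss_sold_date_sk INTEGER)")
    # One date row per given d_year value, keyed 1, 2, 3, ...
    con.executemany("INSERT INTO date_dim VALUES (?, ?)", list(enumerate(d_years, start=1)))
    # One sale per date row: customer i buys on date i.
    sales = [(i, i) for i in range(1, len(d_years) + 1)]
    con.executemany("INSERT INTO store_sales VALUES (?, ?)", sales)
    rows = con.execute(
        "SELECT ss_customer_sk FROM store_sales "
        "JOIN date_dim ON ss_sold_date_sk = d_date_sk "
        "WHERE d_year = ? ORDER BY ss_customer_sk",
        (year,),
    ).fetchall()
    con.close()
    return [row[0] for row in rows]


auto_generated_years = [4, 2, 6, 0, 5]  # d_year values of the first five sample rows
realistic_years = [2001, 2001, 2002]    # assumed values, for contrast

print(customers_with_sales_in_year(auto_generated_years))  # no row matches 2001
print(customers_with_sales_in_year(realistic_years))
```

With random small integers in d_year, the filter matches nothing, so the join returns no rows, whatever the other tables contain.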
To fill the d_year column with data from the year 2001, we will use a generator, described in the next section.
A generator is a file that contains the data generation definition.
For each column, it defines a column data generator that controls the generated data.
The generator below produces 1000 daily records starting on 2001-01-01, enough to cover the whole year 2001, with two columns:
kind: generator
spec:
  LogicalName: date_dim
  # 1000 records to be sure that we have one record per day, the default is 100
  max-record-count: 1000
  Columns:
    - name: d_date
      type: date
      comment: A column with a sequence generator that generates a date sequence from 2001-01-01 onward
      data-supplier:
        type: sequence
        arguments:
          start: 2001-01-01
          step: 1
    - name: d_year
      type: integer
      precision: 4
      comment: A column with an expression generator that extracts the year from the date column
      data-supplier:
        type: expression
        arguments:
          column-variable: d_date
          expression: "d_date.getFullYear()"
This generator is also a content resource, so you can use it like any tabular resource and look at the generated data:
tabul data head date_dim_2001--generator.yml@howto
The first 10 rows of the data resource (date_dim_2001@memgen):
d_date     d_year
---------- ------
2001-01-01 2001
2001-01-02 2001
2001-01-03 2001
2001-01-04 2001
2001-01-05 2001
2001-01-06 2001
2001-01-07 2001
2001-01-08 2001
2001-01-09 2001
2001-01-10 2001
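To check the record count, the two column generators can be mimicked in a few lines of Python. This is only an illustration of the sequence and expression behavior as described above, not Tabulify's implementation; the function name and its parameters are hypothetical.

```python
from datetime import date, timedelta


def generate_date_dim(start=date(2001, 1, 1), step_days=1, max_record_count=1000):
    """Mimic a date sequence column plus a year column derived from it."""
    rows = []
    for i in range(max_record_count):
        d_date = start + timedelta(days=i * step_days)    # sequence: start + i * step
        rows.append({"d_date": d_date, "d_year": d_date.year})  # expression: year of d_date
    return rows


rows = generate_date_dim()
days_in_2001 = sum(1 for row in rows if row["d_year"] == 2001)
print(rows[0]["d_date"], rows[-1]["d_date"], days_in_2001)
# 2001-01-01 2003-09-27 365

# With the default of 100 records, the sequence would stop in April 2001.
print(generate_date_dim(max_record_count=100)[-1]["d_date"])
# 2001-04-10
```

With a daily step, 1000 records cover every day of 2001, whereas the default of 100 records would cover only the first 100 days.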
After creating a generator for the date_dim table, we can pass it to the data fill command with the --generator-selector option to control the data generation more tightly.
tabul data fill --generator-selector date_dim_2001--generator.yml@howto *@sqlite
Because the generator-selector option is a resource selector, you can create a generator for each table whose generated data you want to customize, and select them all with a glob pattern.
Output:
Transfer results
input                         target                        latency record_count error_code error_message
----------------------------- ----------------------------- ------- ------------ ---------- -------------
call_center@memgen            call_center@sqlite            0.33s   100
catalog_page@memgen           catalog_page@sqlite           0.11s   100
catalog_sales@memgen          catalog_sales@sqlite          0.34s   100
customer@memgen               customer@sqlite               0.16s   100
customer_address@memgen       customer_address@sqlite       0.20s   100
customer_demographics@memgen  customer_demographics@sqlite  0.15s   100
date_dim@memgen               date_dim@sqlite               0.968s  1000
household_demographics@memgen household_demographics@sqlite 0.9s    100
income_band@memgen            income_band@sqlite            0.10s   100
item@memgen                   item@sqlite                   0.24s   100
promotion@memgen              promotion@sqlite              0.18s   100
ship_mode@memgen              ship_mode@sqlite              0.10s   100
store@memgen                  store@sqlite                  0.23s   100
store_sales@memgen            store_sales@sqlite            0.23s   100
time_dim@memgen               time_dim@sqlite               0.12s   100
warehouse@memgen              warehouse@sqlite              0.13s   100
web_page@memgen               web_page@sqlite               0.15s   100
web_sales@memgen              web_sales@sqlite              0.36s   100
web_site@memgen               web_site@sqlite               0.22s   100
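The glob-based selection of several generators mentioned above can be pictured with Python's fnmatch module. This is a rough sketch assuming shell-style glob semantics; it does not reproduce Tabulify's selector matching, and apart from date_dim_2001--generator.yml the file names and the pattern are hypothetical.

```python
from fnmatch import fnmatchcase


def select_generators(names, pattern="*--generator.yml"):
    """Keep the names that match a shell-style glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


files = [
    "date_dim_2001--generator.yml",  # from this page
    "customer--generator.yml",       # hypothetical second generator
    "notes.txt",                     # hypothetical unrelated file
]
print(select_generators(files))
# ['date_dim_2001--generator.yml', 'customer--generator.yml']
```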
Query 11 now returns a result. The generated data is minimal and should be refined further.
tabul data print '(sqlite/query_11.sql@tpcds_query)@sqlite'
# The quotes are mandatory only in bash, where parentheses are shell tokens (i.e. they start a subshell)
(sqlite/query_11.sql@tpcds_query)@sqlite
customer_id customer_first_name customer_last_name customer_email_address
----------- ------------------- ------------------ ----------------------
bg          o                   v                  o
cj          t                   a                  d
Learn how to compare data resources.