Learning Tabulify - Step 9 - How to fill a data resource with generated data ?

Data Generation

Tabulify integrates natively a data generator.

You can generate realistic production data and start working on your project right away.

Because the data is fake but realistic, you don't need to:

  • anonymize production data in your development environment because of the privacy laws
  • or create any acceptance environment.

The fill command

The data fill operation is an operation that will select target data resource and fill them with data.

Tabulify supports two mode:

  • auto - the data generated is automatically chosen
  • generator - the data generated is defined in a file called the generator

The fill operation is supported by the data fill command.

Auto Fill

Let's first delete all data with the data truncate command to get a clean schema.

tabul data truncate *@sqlite

The below fill command will fill all tables with auto-generated data

tabul data fill *@sqlite
Transfer results
input                           target                          latency   record_count   error_code   error_message
-----------------------------   -----------------------------   -------   ------------   ----------   -------------
call_center@memgen              call_center@sqlite              0.37s              100
catalog_page@memgen             catalog_page@sqlite             0.12s              100
catalog_sales@memgen            catalog_sales@sqlite            0.36s              100
customer@memgen                 customer@sqlite                 0.16s              100
customer_address@memgen         customer_address@sqlite         0.21s              100
customer_demographics@memgen    customer_demographics@sqlite    0.15s              100
date_dim@memgen                 date_dim@sqlite                 0.82s              100
household_demographics@memgen   household_demographics@sqlite   0.10s              100
income_band@memgen              income_band@sqlite              0.10s              100
item@memgen                     item@sqlite                     0.32s              100
promotion@memgen                promotion@sqlite                0.19s              100
ship_mode@memgen                ship_mode@sqlite                0.12s              100
store@memgen                    store@sqlite                    0.30s              100
store_sales@memgen              store_sales@sqlite              0.23s              100
time_dim@memgen                 time_dim@sqlite                 0.13s              100
warehouse@memgen                warehouse@sqlite                0.17s              100
web_page@memgen                 web_page@sqlite                 0.16s              100
web_sales@memgen                web_sales@sqlite                0.34s              100
web_site@memgen                 web_site@sqlite                 0.26s              100

The data fill command loads 100 records for each table because this is the default value of the max-record-count option (This option defines the number of records generated).

Query 11

By running the query 11 (of the query lesson), we don't get any data back.

tabul data print '(sqlite/query_11.sql@tpcds_query)@sqlite'
# The quotes are only mandatory in bash because parenthesis are a bash token (ie subshell)
(sqlite/query_11.sql@tpcds_query)@sqlite
customer_id   customer_first_name   customer_last_name   customer_email_address
-----------   -------------------   ------------------   ----------------------

Why ? Because the query 11 is based on time data of the year 2001 and unfortunately the auto-generated data does not contain 2001 in the d_year column.

tabul data head --limit 10 date_dim@sqlite
The first 10 rows of the data resource (date_dim@sqlite):
d_date_sk   d_date_id   d_date       d_month_seq   d_week_seq   d_quarter_seq   d_year   d_dow   d_moy   d_dom   d_qoy   d_fy_year   d_fy_quarter_seq   d_fy_week_seq   d_day_name   d_quarter_name   d_holiday   d_weekend   d_following_holiday   d_first_dom   d_last_dom   d_same_day_ly   d_same_day_lq   d_current_day   d_current_week   d_current_month   d_current_quarter   d_current_year
---------   ---------   ----------   -----------   ----------   -------------   ------   -----   -----   -----   -----   ---------   ----------------   -------------   ----------   --------------   ---------   ---------   -------------------   -----------   ----------   -------------   -------------   -------------   --------------   ---------------   -----------------   --------------
        1   a           2025-10-31             7            2               4        7       5       1       8       0           0                  6               4   k            t                q           x           x                              10            6               8               4   u               p                u                 d                   t
        2   b           2025-11-05             7            3               0        1       6       8       9       5           3                  0              10   p            m                u           f           j                               8            2              10               9   m               m                z                 x                   h
        3   c           2025-11-02             6            6               4        2       1       6       3       1           3                  7               7   w            k                p           i           c                               3            2               4               5   l               l                q                 m                   m
        4   d           2025-11-04             3            1               1        5       1       1       5       0          10                  4               5   r            g                k           f           b                               0            1               2               8   b               w                h                 p                   x
        5   e           2025-11-04             5            8               8        2      10       7       1       4           1                  2               6   n            x                l           w           p                               7            7               0               3   x               o                f                 h                   l
        6   f           2025-11-06             7            3               0        3       5      10       8       5          10                  6              10   a            c                n           l           w                               7            1               7              10   e               k                h                 t                   m
        7   g           2025-11-10             3            0               4        7       0       0       4       4           1                  0               0   o            y                r           f           q                               0            8              10               7   l               d                c                 l                   u
        8   h           2025-11-02             8           10               8        7       2      10       7       8           5                  7               0   i            o                n           a           i                               9            2               5              10   k               z                c                 x                   k
        9   i           2025-11-10             6            2               2        8       6       3       3       9          10                  0               1   w            z                a           m           h                               4            3               8               8   n               w                n                 l                   y
       10   j           2025-11-09             2           10               5        8       3       4      10       9           8                  4               7   p            h                u           m           i                              10            8               2               0   o               u                o                 r                   w

To update the column dyear with data from the year 2001, we will use a generator in the next section.

Generator

A generator is a file that contains the data generation definition.

For each column, a column data generator is defined that control the data generated.

The below generator generates one year of data with two columns:

kind: generator
spec:
  LogicalName: date_dim
  Columns:
    - name: d_date
      type: date
      comment: A column with a sequence generator that generates a date sequence from 2001-01-01 and after
      data-supplier:
        type: sequence
        arguments:
          start: 2001-01-01
          step: 1
    - name: d_year
      type: integer
      precision: 4
      comment: A column with a expression generator that extracts the year from the date column
      data-supplier:
        type: expression
        arguments:
          column-variable: d_date
          expression: "d_date.getFullYear()"


This generator is also a content resource and therefore you can use it as any tabular resource and take a look at the data generated

tabul data head date_dim_2001--generator.yml@howto
The first 10 rows of the data resource (date_dim_2001@memgen):
d_date       d_year
----------   ------
2001-01-01     2001
2001-01-02     2001
2001-01-03     2001
2001-01-04     2001
2001-01-05     2001
2001-01-06     2001
2001-01-07     2001
2001-01-08     2001
2001-01-09     2001
2001-01-10     2001

Fill with generators

After having created a generator for the date_dim table, we can pass it to the data fill command with the –generator-selector option to make the data generation more controled.

tabul data fill --generator-selector date_dim_2001--generator.yml@howto   *@sqlite

As the option generator-selector is a resource selector, you can create a generator for each table where you want to customize the generated data and select them with a glob pattern.

Output:

Transfer results
input                           target                          latency   record_count   error_code   error_message
-----------------------------   -----------------------------   -------   ------------   ----------   -------------
call_center@memgen              call_center@sqlite              0.37s              100
catalog_page@memgen             catalog_page@sqlite             0.13s              100
catalog_sales@memgen            catalog_sales@sqlite            0.37s              100
customer@memgen                 customer@sqlite                 0.19s              100
customer_address@memgen         customer_address@sqlite         0.16s              100
customer_demographics@memgen    customer_demographics@sqlite    0.13s              100
date_dim@memgen                 date_dim@sqlite                 0.351s             100
household_demographics@memgen   household_demographics@sqlite   0.11s              100
income_band@memgen              income_band@sqlite              0.9s               100
item@memgen                     item@sqlite                     0.35s              100
promotion@memgen                promotion@sqlite                0.18s              100
ship_mode@memgen                ship_mode@sqlite                0.11s              100
store@memgen                    store@sqlite                    0.29s              100
store_sales@memgen              store_sales@sqlite              0.23s              100
time_dim@memgen                 time_dim@sqlite                 0.15s              100
warehouse@memgen                warehouse@sqlite                0.16s              100
web_page@memgen                 web_page@sqlite                 0.12s              100
web_sales@memgen                web_sales@sqlite                0.28s              100
web_site@memgen                 web_site@sqlite                 0.28s              100

And the query 11 is now giving back a result. The generated data is minimal and should be further defined.

tabul data print '(sqlite/query_11.sql@tpcds_query)@sqlite'
# The quotes are only mandatory in bash because parenthesis are a bash token (ie subshell)
(sqlite/query_11.sql@tpcds_query)@sqlite
customer_id   customer_first_name   customer_last_name   customer_email_address
-----------   -------------------   ------------------   ----------------------

Next

Learn how to compare data resource.

Data Resource Diff




Related Pages
Generator - MaxRecordCount attribute

maxRecordCount is a attribute of a generator data resource (ie the maximum of the count attribute) It defines the maximum number of record that a generator data resource can generate. MaxRecordCount...
How to read and write an Excel file?

This howto will demonstrate you how to read and write to an Excel resource. To following this howto, you should have followed the getting started guide to have knowledge of: Resource creation...
Learning Tabulify - Step 8 - How to modify the content of a table

Tabulify learning guidetransfer step The transfer page should have made you familiar with the transfer concept where we have introduced the copy operation. The copy and move operation does not modify...

Task Runner