---json
{
    "page_id": "ognjto9n01l621jzyrzrq"
}
---
====== Learning Tabulify - Step 9 - How to fill a data resource with generated data ? ======


===== Data Generation =====
''Tabulify'' integrates natively a [[docs:generator:generator|data generator]].

You can generate realistic production data and start working on your project right away.


<note tip>
Because the data is fake but realistic, you don't need to:
  * anonymize production data in your development environment because of the privacy laws
  * or create any acceptance environment.
</note>


===== The fill command =====

The [[docs:op:fill|data fill operation]] is an operation that will [[docs:op:select|select target data resource]] and fill them with data.

''Tabulify'' supports two mode:
  * [[#auto fill|auto]] - the data generated is automatically chosen
  * [[#generator|generator]] - the data generated is defined in a file called the ''generator''

The [[docs:op:fill|fill operation]] is supported by the [[docs:tabul:data:fill|data fill command]].

===== Auto Fill =====

Let's first delete all data with the [[docs:tabul:data:truncate|data truncate command]] to get a clean schema.
<unit >
<code bash>
tabul data truncate *@sqlite
</code>
</unit>

The below [[docs:tabul:data:fill|fill command]] will fill all [[docs:resource:sql_table|tables]] with auto-generated data

<unit >
<code bash>
tabul data fill *@sqlite
</code>

<console>
Transfer results
input                           target                          latency   record_count   error_code   error_message
-----------------------------   -----------------------------   -------   ------------   ----------   -------------
call_center@memgen              call_center@sqlite              0.33s              100                
catalog_page@memgen             catalog_page@sqlite             0.11s              100                
catalog_sales@memgen            catalog_sales@sqlite            0.34s              100                
customer@memgen                 customer@sqlite                 0.17s              100                
customer_address@memgen         customer_address@sqlite         0.15s              100                
customer_demographics@memgen    customer_demographics@sqlite    0.18s              100                
date_dim@memgen                 date_dim@sqlite                 0.54s              100                
household_demographics@memgen   household_demographics@sqlite   0.9s               100                
income_band@memgen              income_band@sqlite              0.9s               100                
item@memgen                     item@sqlite                     0.25s              100                
promotion@memgen                promotion@sqlite                0.18s              100                
ship_mode@memgen                ship_mode@sqlite                0.13s              100                
store@memgen                    store@sqlite                    0.27s              100                
store_sales@memgen              store_sales@sqlite              0.25s              100                
time_dim@memgen                 time_dim@sqlite                 0.13s              100                
warehouse@memgen                warehouse@sqlite                0.16s              100                
web_page@memgen                 web_page@sqlite                 0.12s              100                
web_sales@memgen                web_sales@sqlite                0.26s              100                
web_site@memgen                 web_site@sqlite                 0.23s              100
</console>
</unit>

<note>
The [[docs:tabul:data:fill|data fill command]] loads ''100'' records for each [[docs:resource:sql_table|table]] because this is the default value of the  [[docs:generator:max-record-count|max-record-count]] [[docs:tabul:option|option]] (This option defines the number of records generated).
</note>


===== Query 11 =====

By running the [[howto:getting_started:7_sql_query|query 11 (of the query lesson)]], we don't get any data back.
<unit >
<code bash>
tabul data print '(sqlite/query_11.sql@tpcds_query)@sqlite'
# The quotes are only mandatory in bash because parenthesis are a bash token (ie subshell)
</code>
<console>
(sqlite/query_11.sql@tpcds_query)@sqlite
customer_id   customer_first_name   customer_last_name   customer_email_address
-----------   -------------------   ------------------   ----------------------
</console>
</unit>

Why ? Because the [[howto:getting_started:7_sql_query|query 11]] is based on time data of the year ''2001'' and unfortunately the auto-generated data does not contain ''2001'' in the ''d_year'' column.

<unit >
<code bash>
tabul data head --limit 10 date_dim@sqlite
</code>
<console>
The first 10 rows of the data resource (date_dim@sqlite):
d_date_sk   d_date_id   d_date       d_month_seq   d_week_seq   d_quarter_seq   d_year   d_dow   d_moy   d_dom   d_qoy   d_fy_year   d_fy_quarter_seq   d_fy_week_seq   d_day_name   d_quarter_name   d_holiday   d_weekend   d_following_holiday   d_first_dom   d_last_dom   d_same_day_ly   d_same_day_lq   d_current_day   d_current_week   d_current_month   d_current_quarter   d_current_year
---------   ---------   ----------   -----------   ----------   -------------   ------   -----   -----   -----   -----   ---------   ----------------   -------------   ----------   --------------   ---------   ---------   -------------------   -----------   ----------   -------------   -------------   -------------   --------------   ---------------   -----------------   --------------
        1   a           2025-12-25             4            0              10        4       4       0       4       3           5                  1              10   k            u                j           b           v                               4            3               8               0   u               h                r                 t                   n
        2   b           2026-01-01             6            8               7        2       3       6       6       6           6                  4               0   u            h                f           s           o                               6            2               7               5   l               w                k                 h                   a
        3   c           2025-12-24             7            6               7        6       5      10       5       0           0                  5               9   i            c                l           e           g                              10            6               4               5   k               d                l                 j                   w
        4   d           2025-12-30             0            9               0        0       6       1       6       8           2                  3               4   m            e                o           b           z                               1            2               9               2   h               n                z                 c                   i
        5   e           2025-12-31             9            5               4        5       8       9       0      10           3                  1               8   j            j                m           j           g                               8            7               7               8   p               x                s                 d                   h
        6   f           2025-12-24             6            8               2        0       8       3       3       3           3                  9               3   p            o                x           l           h                               0            3               3               7   g               x                o                 j                   v
        7   g           2026-01-01             9            0               8        1       1       2       8       0           3                  5               0   z            f                l           m           i                               2            9               2               3   d               z                d                 m                   q
        8   h           2025-12-27             9            4               9        2       7       7       5       3           5                  2               5   l            p                p           r           x                               0            8               0               3   f               g                i                 l                   j
        9   i           2026-01-01             7            6               6        2       1       4       6       2           8                  8               1   c            z                k           j           g                               4            1               1               3   f               t                h                 j                   m
       10   j           2026-01-02             4            3               2        0       4       4       9       2           0                  9               7   x            n                z           s           u                               1            9               9               0   n               q                v                 l                   u
</console>
</unit>

To update the column ''dyear'' with data from the year ''2001'', we will use a [[#generator|generator]] in the next section.

===== Generator =====

A [[docs:resource:generator|generator]] is a [[docs:resource:file|file]] that contains the data generation definition.

For each column, a [[docs:generator:data-supplier|column data generator]] is defined that control the data generated.

The below [[docs:resource:generator|generator]] generates one year of data with two columns:
  * ''d_date'' that has a [[docs:generator:sequence|date sequence generator]] to generate ''date'' data from ''2001-01-01'' and upwards
  * ''dyear'' that has a [[docs:generator:expression|expression generator]] that extracts the ''year'' of the ''d_date'' column.

<unit>
<file yaml date_dim_2001--generator.yml>
kind: generator
spec:
  LogicalName: date_dim
  # 1000 records to be sure that we have one record by day, the default is 100
  max-record-count: 1000
  Columns:
    - name: d_date
      type: date
      comment: A column with a sequence generator that generates a date sequence from 2001-01-01 and after
      data-supplier:
        type: sequence
        arguments:
          start: 2001-01-01
          step: 1
    - name: d_year
      type: integer
      precision: 4
      comment: A column with a expression generator that extracts the year from the date column
      data-supplier:
        type: expression
        arguments:
          column-variable: d_date
          expression: "d_date.getFullYear()"

</file>
</unit>

This [[docs:resource:generator|generator]] is also a [[docs:resource:content|content resource]] and therefore you can use it as any [[docs:resource:tabular|tabular resource]] and take a look at the data generated

<unit>
<code bash>
tabul data head date_dim_2001--generator.yml@howto
</code>
<console>
The first 10 rows of the data resource (date_dim_2001@memgen):
d_date       d_year
----------   ------
2001-01-01     2001
2001-01-02     2001
2001-01-03     2001
2001-01-04     2001
2001-01-05     2001
2001-01-06     2001
2001-01-07     2001
2001-01-08     2001
2001-01-09     2001
2001-01-10     2001
</console>
</unit>

===== Fill with generators =====

After having created a generator for the ''date_dim'' [[docs:resource:sql_table|table]], we can pass it to the [[docs:tabul:data:fill|data fill command]] with the ''--generator-selector'' option to make the data generation more controled.
<unit >
<code bash>
tabul data fill --generator-selector date_dim_2001--generator.yml@howto   *@sqlite
</code>
<note>
As the option ''generator-selector'' is a [[docs:flow:selector|resource selector]], you can create a [[docs:resource:generator|generator]] for each table where you want to customize the generated data and select them with a [[docs:common:globbing|glob pattern]].
</note>
Output:
<console>
Transfer results
input                           target                          latency   record_count   error_code   error_message
-----------------------------   -----------------------------   -------   ------------   ----------   -------------
call_center@memgen              call_center@sqlite              0.33s              100                
catalog_page@memgen             catalog_page@sqlite             0.11s              100                
catalog_sales@memgen            catalog_sales@sqlite            0.34s              100                
customer@memgen                 customer@sqlite                 0.16s              100                
customer_address@memgen         customer_address@sqlite         0.20s              100                
customer_demographics@memgen    customer_demographics@sqlite    0.15s              100                
date_dim@memgen                 date_dim@sqlite                 0.968s            1000                
household_demographics@memgen   household_demographics@sqlite   0.9s               100                
income_band@memgen              income_band@sqlite              0.10s              100                
item@memgen                     item@sqlite                     0.24s              100                
promotion@memgen                promotion@sqlite                0.18s              100                
ship_mode@memgen                ship_mode@sqlite                0.10s              100                
store@memgen                    store@sqlite                    0.23s              100                
store_sales@memgen              store_sales@sqlite              0.23s              100                
time_dim@memgen                 time_dim@sqlite                 0.12s              100                
warehouse@memgen                warehouse@sqlite                0.13s              100                
web_page@memgen                 web_page@sqlite                 0.15s              100                
web_sales@memgen                web_sales@sqlite                0.36s              100                
web_site@memgen                 web_site@sqlite                 0.22s              100
</console>
</unit>

And the ''query 11'' is now giving back a result. The generated data is minimal and should be further defined.

<unit >
<code bash>
tabul data print '(sqlite/query_11.sql@tpcds_query)@sqlite'
# The quotes are only mandatory in bash because parenthesis are a bash token (ie subshell)
</code>

<console>
(sqlite/query_11.sql@tpcds_query)@sqlite
customer_id   customer_first_name   customer_last_name   customer_email_address
-----------   -------------------   ------------------   ----------------------
bg            o                     v                    o
cj            t                     a                    d
</console>
</unit>

===== Next =====
Learn how to compare data resource.

<btn>
[[10_resource_comparison|Data Resource Diff]]
</btn>