This page is part of the Tabulify learning guide and to be able to follow it, you should have followed the transfer step.
The transfer page should have made you familiar with the transfer concept where we have introduced the copy operation.
The copy and move operation does not modify any actual data. They just create a target data resource that is an exact copy of the source data resource.
This page introduces you to the data modification operations that modify the target data resource content. ie
They all works the same way, they accepts a source data resource that defines the data to insert, update, upsert or delete.
The tpc-ds warehouse table will be our target data resource for this page.
The warehouse table has one row that we can see with the data print command
tabul data print warehouse@sqlite
warehouse@sqlite
w_warehouse_sk w_warehouse_id w_warehouse_name w_warehouse_sq_ft w_street_number w_street_name w_street_type w_suite_number w_city w_county w_state w_zip w_country w_gmt_offset
-------------- ---------------- ------------------- ----------------- --------------- ------------- ------------- -------------- ------ ----------------- ------- ----- ------------- ------------
1 AAAAAAAABAAAAAAA Conventional childr 977787 651 6th Parkway Suite 470 Midway Williamson County TN 31904 United States -5
The upsert data operation is supported by the data upsert command.
This upsert operation:
We will:
The data to upsert is defined in the warehouse_upsert.csv Csv data resource in the howto connection.
We take a look at its content with the data print command:
tabul data print warehouse_upsert.csv@howto
warehouse_upsert.csv@howto
w_warehouse_sk w_warehouse_id w_warehouse_name w_warehouse_sq_ft w_street_number w_street_name w_street_type w_suite_number w_city w_county w_state w_zip w_country w_gmt_offset
-------------- ---------------- ----------------------- ----------------- --------------- ------------- ------------- -------------- --------- ----------------- ------- -------- ------------- ------------
1 AAAAAAAABAAAAAAA Modified Warehouse Name 977787 651 6th Parkway Suite 470 Midway Williamson County TN 31904 United States -5
2 AAAAAAAABAAAAAAB Van Gogh Warehouse 100000 6 Museumplein Plein Amsterdam Nord-Holland NH 10171 DJ Netherlands 1
tabul data upsert warehouse_upsert.csv@howto warehouse@sqlite
Transfer results
input target latency record_count error_code error_message
-------------------------- ---------------- ------- ------------ ---------- -------------
warehouse_upsert.csv@howto warehouse@sqlite 0.52s 2
tabul data print warehouse@sqlite
warehouse@sqlite
w_warehouse_sk w_warehouse_id w_warehouse_name w_warehouse_sq_ft w_street_number w_street_name w_street_type w_suite_number w_city w_county w_state w_zip w_country w_gmt_offset
-------------- ---------------- ----------------------- ----------------- --------------- ------------- ------------- -------------- --------- ----------------- ------- -------- ------------- ------------
1 AAAAAAAABAAAAAAA Modified Warehouse Name 977787 651 6th Parkway Suite 470 Midway Williamson County TN 31904 United States -5
2 AAAAAAAABAAAAAAB Van Gogh Warehouse 100000 6 Museumplein Plein Amsterdam Nord-Holland NH 10171 DJ Netherlands 1
Let's say for the sake of simplicity that we want to increase the square footage column w_warehouse_sq_ft of the warehouse by 10 square meter.
The source query needs to include:
Example with the warehouse_update.sql file in the howto connection
tabul data print --type text warehouse_update.sql@howto
select
w_warehouse_sk,
w_warehouse_sq_ft+10 as w_warehouse_sq_ft, -- the name of the column match the name of the target column
w_warehouse_sq_ft as w_warehouse_sq_ft_old_value -- just to show the diff
from
warehouse w2
;
tabul data print '(warehouse_update.sql@howto)@sqlite'
# The quotes are only mandatory in bash because parenthesis are a bash token (ie subshell)
(warehouse_update.sql@howto)@sqlite
w_warehouse_sk w_warehouse_sq_ft w_warehouse_sq_ft_old_value
-------------- ----------------- ---------------------------
1 977797 977787
2 100010 100000
Calling the data update command with the query as source will update the warehouse table w_warehouse_sq_ft columns by 10 square meters.
tabul data update \
--transfer-mapping-strict false \
'(warehouse_update.sql@howto)@sqlite' warehouse@sqlite
# The quotes are only mandatory in bash because parenthesis are a bash token (ie subshell)
Transfer results
input target latency record_count error_code error_message
----------------------------------- ---------------- ------- ------------ ---------- -------------
(warehouse_update.sql@howto)@sqlite warehouse@sqlite 0.40s 2
where:
We can verify it by printing the warehouse table
tabul data print warehouse@sqlite
warehouse@sqlite
w_warehouse_sk w_warehouse_id w_warehouse_name w_warehouse_sq_ft w_street_number w_street_name w_street_type w_suite_number w_city w_county w_state w_zip w_country w_gmt_offset
-------------- ---------------- ----------------------- ----------------- --------------- ------------- ------------- -------------- --------- ----------------- ------- -------- ------------- ------------
1 AAAAAAAABAAAAAAA Modified Warehouse Name 977797 651 6th Parkway Suite 470 Midway Williamson County TN 31904 United States -5
2 AAAAAAAABAAAAAAB Van Gogh Warehouse 100010 6 Museumplein Plein Amsterdam Nord-Holland NH 10171 DJ Netherlands 1
The delete command takes as input a source data resource that contains the primary or unique key of a target table.
We want to delete the second record, the CSV file contains then the primary key with the value 2
tabul data print warehouse_delete.csv@howto
warehouse_delete.csv@howto
w_warehouse_sk
--------------
2
tabul data delete warehouse_delete.csv@howto warehouse@sqlite
Transfer results
input target latency record_count error_code error_message
-------------------------- ---------------- ------- ------------ ---------- -------------
warehouse_delete.csv@howto warehouse@sqlite 0.45s 1
tabul data list -a name -a count warehouse@sqlite
path media_type name count
--------- ---------- --------- -----
warehouse table warehouse 1
tabul data print warehouse@sqlite
warehouse@sqlite
w_warehouse_sk w_warehouse_id w_warehouse_name w_warehouse_sq_ft w_street_number w_street_name w_street_type w_suite_number w_city w_county w_state w_zip w_country w_gmt_offset
-------------- ---------------- ----------------------- ----------------- --------------- ------------- ------------- -------------- ------ ----------------- ------- ----- ------------- ------------
1 AAAAAAAABAAAAAAA Modified Warehouse Name 977797 651 6th Parkway Suite 470 Midway Williamson County TN 31904 United States -5
Until now, all data manipulation were out-of-place, meaning that there was a source and a target.
You can also apply in-place Data Manipulation with SQL Data Manipulation Language request (UPDATE/INSERT/UPDATE/MERGE)
For a full example, applying DDL and DML request, see this howto
With the laws on data privacy, it's now mandatory to develop data application with generated data. Tabulify has built-in a data generator that we will explore in the next page.
How to fill a resource with generated data ?