---json
{
"page_id": "wt5zet0z99msmhqniljgi"
}
---
====== Learning Tabulify - Step 8 - How to modify the content of a table ======
This page is part of the [[start|Tabulify learning guide]] and to be able to follow it, you should have followed the [[6_transfer_data_resource|transfer step]].
===== Data Modification =====
The [[6_transfer_data_resource|transfer page]] should have made you familiar with the [[docs:op:transfer|transfer concept]] where we have introduced the [[docs:op:copy|copy operation]].
The [[docs:op:copy|copy]] and [[docs:op:move|move]] operation does not modify any actual data. They just create a [[docs:flow:target|target data resource]] that is an ''exact'' [[docs:op:copy|copy]] of the source data resource.
This page introduces you to the [[docs:op:modication|data modification operations]] that modify the [[docs:flow:target|target data resource]] [[docs:resource:content|content]]. ie
* [[docs:op:insert|insertion / append]]
* [[docs:op:update|update]]
* [[docs:op:upsert|upsert]]
* [[docs:op:delete|delete]]
They all works the same way, they accepts a [[docs:flow:source|source data resource]] that defines the data to [[docs:op:insert|insert]], [[docs:op:update|update]], [[docs:op:upsert|upsert]] or [[docs:op:delete|delete]].
===== Warehouse - The target resource =====
This is to be able to re-run the page.
We put the warehouse table to the same state than before:
* We delete the row added
* We update the first row to the original
tabul data delete warehouse_delete.csv@howto warehouse@sqlite
tabul data update warehouse_original.csv@howto warehouse@sqlite
The [[:docs:system:tpcds|tpc-ds]] ''warehouse'' table will be our [[docs:flow:target|target data resource]] for this page.
The ''warehouse'' [[docs:resource:sql_table|table]] has one row that we can see with the [[docs:tabul:data:print|data print command]]
tabul data print warehouse@sqlite
warehouse@sqlite
w_warehouse_sk w_warehouse_id w_warehouse_name w_warehouse_sq_ft w_street_number w_street_name w_street_type w_suite_number w_city w_county w_state w_zip w_country w_gmt_offset
-------------- ---------------- ------------------- ----------------- --------------- ------------- ------------- -------------- ------ ----------------- ------- ----- ------------- ------------
1 AAAAAAAABAAAAAAA Conventional childr 977787 651 6th Parkway Suite 470 Midway Williamson County TN 31904 United States -5
===== Warehouse Upsert with a CSV file as Source =====
The [[docs:op:upsert|upsert data operation]] is supported by the [[docs:tabul:data:upsert|data upsert command]].
This upsert operation:
* will try to insert data
* and if it can't (because there is already a record identified by a primary key), it will update it.
We will:
* modify the name of the first warehouse to ''Modified Warehouse Name''
* add a ''second'' new warehouse row.
==== The source CSV file for the upsert ====
The data to upsert is defined in the ''warehouse_upsert.csv'' [[docs:resource:csv|Csv data resource]] in the [[docs:connection:howto|howto connection]].
We take a look at its content with the [[docs:tabul:data:print|data print command]]:
tabul data print warehouse_upsert.csv@howto
warehouse_upsert.csv@howto
w_warehouse_sk w_warehouse_id w_warehouse_name w_warehouse_sq_ft w_street_number w_street_name w_street_type w_suite_number w_city w_county w_state w_zip w_country w_gmt_offset
-------------- ---------------- ----------------------- ----------------- --------------- ------------- ------------- -------------- --------- ----------------- ------- -------- ------------- ------------
1 AAAAAAAABAAAAAAA Modified Warehouse Name 977787 651 6th Parkway Suite 470 Midway Williamson County TN 31904 United States -5
2 AAAAAAAABAAAAAAB Van Gogh Warehouse 100000 6 Museumplein Plein Amsterdam Nord-Holland NH 10171 DJ Netherlands 1
==== The upsert command ====
* The [[docs:tabul:data:upsert|upsert command]] is a [[docs:tabul:data:transfer|transfer command]] and takes as argument:
* a [[docs:flow:source|source]] - the [[docs:resource:csv|csv file]] in the [[docs:connection:howto|howto connection]]
* a [[docs:flow:target|target]] - the [[docs:resource:sql_table|sql table]] in the [[docs:connection:howtos|sqlite connection]]
tabul data upsert warehouse_upsert.csv@howto warehouse@sqlite
* Run it and you get the below output:
Transfer results
input target latency record_count error_code error_message
-------------------------- ---------------- ------- ------------ ---------- -------------
warehouse_upsert.csv@howto warehouse@sqlite 0.52s 2
* You can verify that the [[docs:flow:target|target data resource]] has been changed with the [[docs:tabul:data:print|data print command]] and check that:
* there is a second new record
* the new name of the first warehouse is now ''Modified Warehouse Name''
tabul data print warehouse@sqlite
warehouse@sqlite
w_warehouse_sk w_warehouse_id w_warehouse_name w_warehouse_sq_ft w_street_number w_street_name w_street_type w_suite_number w_city w_county w_state w_zip w_country w_gmt_offset
-------------- ---------------- ----------------------- ----------------- --------------- ------------- ------------- -------------- --------- ----------------- ------- -------- ------------- ------------
1 AAAAAAAABAAAAAAA Modified Warehouse Name 977787 651 6th Parkway Suite 470 Midway Williamson County TN 31904 United States -5
2 AAAAAAAABAAAAAAB Van Gogh Warehouse 100000 6 Museumplein Plein Amsterdam Nord-Holland NH 10171 DJ Netherlands 1
===== Warehouse Update with a SQL query as Source =====
==== The source query for the update ====
Let's say for the sake of simplicity that we want to increase the square footage column ''w_warehouse_sq_ft'' of the warehouse by ''10'' square meter.
The [[docs:flow:source|source]] [[docs:resource:sql_select|query]] needs to include:
* the primary key ''w_warehouse_sk''
* the target column with the new value ''w_warehouse_sq_ft''
Example with the ''warehouse_update.sql'' [[docs:resource:file|file]] in the [[docs:connection:howto|howto]] connection
tabul data print --type text warehouse_update.sql@howto
select
w_warehouse_sk,
w_warehouse_sq_ft+10 as w_warehouse_sq_ft, -- the name of the column match the name of the target column
w_warehouse_sq_ft as w_warehouse_sq_ft_old_value -- just to show the diff
from
warehouse w2
;
* The result of the [[docs:resource:sql_select|select query]] can be seen with the [[docs:tabul:data:print|print command]]
tabul data print '(warehouse_update.sql@howto)@sqlite'
# The quotes are only mandatory in bash because parenthesis are a bash token (ie subshell)
(warehouse_update.sql@howto)@sqlite
w_warehouse_sk w_warehouse_sq_ft w_warehouse_sq_ft_old_value
-------------- ----------------- ---------------------------
1 977797 977787
2 100010 100000
==== The update command ====
Calling the [[docs:tabul:data:update|data update command]] with the [[docs:resource:sql_select|query]] as [[docs:flow:source|source]] will update the ''warehouse'' table ''w_warehouse_sq_ft'' columns by ''10'' square meters.
tabul data update \
--transfer-mapping-strict false \
'(warehouse_update.sql@howto)@sqlite' warehouse@sqlite
# The quotes are only mandatory in bash because parenthesis are a bash token (ie subshell)
Transfer results
input target latency record_count error_code error_message
----------------------------------- ---------------- ------- ------------ ---------- -------------
(warehouse_update.sql@howto)@sqlite warehouse@sqlite 0.40s 2
where:
* `--transfer-mapping-strict` is set to `false` because the source column `w_warehouse_sq_ft_old_value` is not a target column.
We can verify it by [[docs:tabul:data:print|printing]] the ''warehouse'' table
tabul data print warehouse@sqlite
warehouse@sqlite
w_warehouse_sk w_warehouse_id w_warehouse_name w_warehouse_sq_ft w_street_number w_street_name w_street_type w_suite_number w_city w_county w_state w_zip w_country w_gmt_offset
-------------- ---------------- ----------------------- ----------------- --------------- ------------- ------------- -------------- --------- ----------------- ------- -------- ------------- ------------
1 AAAAAAAABAAAAAAA Modified Warehouse Name 977797 651 6th Parkway Suite 470 Midway Williamson County TN 31904 United States -5
2 AAAAAAAABAAAAAAB Van Gogh Warehouse 100010 6 Museumplein Plein Amsterdam Nord-Holland NH 10171 DJ Netherlands 1
===== Warehouse Delete =====
==== The source CSV file for the DELETE ====
The [[docs:tabul:data:delete|delete command]] takes as input a [[docs:flow:source|source data resource]] that contains the primary or unique key of a [[docs:flow:target|target table]].
We want to delete the second record, the CSV file contains then the primary key with the value ''2''
tabul data print warehouse_delete.csv@howto
* Run it. You get the below output:
warehouse_delete.csv@howto
w_warehouse_sk
--------------
2
==== Warehouse Delete ====
* You give the ''CSV'' [[docs:flow:source|source]] and the [[docs:flow:target|target]] as argument to the [[docs:tabul:data:delete|data delete command]] to delete the second row
tabul data delete warehouse_delete.csv@howto warehouse@sqlite
Transfer results
input target latency record_count error_code error_message
-------------------------- ---------------- ------- ------------ ---------- -------------
warehouse_delete.csv@howto warehouse@sqlite 0.45s 1
* How [[docs:resource:count|many rows]] do we have ?
tabul data list -a name -a count warehouse@sqlite
path media_type name count
--------- ---------- --------- -----
warehouse table warehouse 1
* Do we still see the first row ?
tabul data print warehouse@sqlite
warehouse@sqlite
w_warehouse_sk w_warehouse_id w_warehouse_name w_warehouse_sq_ft w_street_number w_street_name w_street_type w_suite_number w_city w_county w_state w_zip w_country w_gmt_offset
-------------- ---------------- ----------------------- ----------------- --------------- ------------- ------------- -------------- ------ ----------------- ------- ----- ------------- ------------
1 AAAAAAAABAAAAAAA Modified Warehouse Name 977797 651 6th Parkway Suite 470 Midway Williamson County TN 31904 United States -5
===== Applying in-place SQL Data Manipulation =====
Until now, all data manipulation were out-of-place, meaning that there was a source and a target.
You can also apply in-place Data Manipulation with [[:docs:resource:sql_dml|SQL Data Manipulation Language request]] (''UPDATE/INSERT/UPDATE/MERGE'')
For a full example, applying DDL and DML request, see this [[howto:sql_query:dml_query|howto]]
===== Next =====
With the laws on data privacy, it's now mandatory to develop data application with generated data. ''Tabulify'' has built-in a data generator that we will explore in the next page.
[[9_data_generation|How to fill a resource with generated data ?]]