---json { "page_id": "wt5zet0z99msmhqniljgi" } --- ====== Learning Tabulify - Step 8 - How to modify the content of a table ====== This page is part of the [[start|Tabulify learning guide]] and to be able to follow it, you should have followed the [[6_transfer_data_resource|transfer step]]. ===== Data Modification ===== The [[6_transfer_data_resource|transfer page]] should have made you familiar with the [[docs:op:transfer|transfer concept]] where we have introduced the [[docs:op:copy|copy operation]]. The [[docs:op:copy|copy]] and [[docs:op:move|move]] operation does not modify any actual data. They just create a [[docs:flow:target|target data resource]] that is an ''exact'' [[docs:op:copy|copy]] of the source data resource. This page introduces you to the [[docs:op:modication|data modification operations]] that modify the [[docs:flow:target|target data resource]] [[docs:resource:content|content]]. ie * [[docs:op:insert|insertion / append]] * [[docs:op:update|update]] * [[docs:op:upsert|upsert]] * [[docs:op:delete|delete]] They all works the same way, they accepts a [[docs:flow:source|source data resource]] that defines the data to [[docs:op:insert|insert]], [[docs:op:update|update]], [[docs:op:upsert|upsert]] or [[docs:op:delete|delete]]. ===== Warehouse - The target resource ===== This is to be able to re-run the page. We put the warehouse table to the same state than before: * We delete the row added * We update the first row to the original tabul data delete warehouse_delete.csv@howto warehouse@sqlite tabul data update warehouse_original.csv@howto warehouse@sqlite The [[:docs:system:tpcds|tpc-ds]] ''warehouse'' table will be our [[docs:flow:target|target data resource]] for this page. The ''warehouse'' [[docs:resource:sql_table|table]] has one row that we can see with the [[docs:tabul:data:print|data print command]] tabul data print warehouse@sqlite warehouse@sqlite w_warehouse_sk w_warehouse_id w_warehouse_name w_warehouse_sq_ft w_street_number w_street_name w_street_type w_suite_number w_city w_county w_state w_zip w_country w_gmt_offset -------------- ---------------- ------------------- ----------------- --------------- ------------- ------------- -------------- ------ ----------------- ------- ----- ------------- ------------ 1 AAAAAAAABAAAAAAA Conventional childr 977787 651 6th Parkway Suite 470 Midway Williamson County TN 31904 United States -5 ===== Warehouse Upsert with a CSV file as Source ===== The [[docs:op:upsert|upsert data operation]] is supported by the [[docs:tabul:data:upsert|data upsert command]]. This upsert operation: * will try to insert data * and if it can't (because there is already a record identified by a primary key), it will update it. We will: * modify the name of the first warehouse to ''Modified Warehouse Name'' * add a ''second'' new warehouse row. ==== The source CSV file for the upsert ==== The data to upsert is defined in the ''warehouse_upsert.csv'' [[docs:resource:csv|Csv data resource]] in the [[docs:connection:howto|howto connection]]. We take a look at its content with the [[docs:tabul:data:print|data print command]]: tabul data print warehouse_upsert.csv@howto warehouse_upsert.csv@howto w_warehouse_sk w_warehouse_id w_warehouse_name w_warehouse_sq_ft w_street_number w_street_name w_street_type w_suite_number w_city w_county w_state w_zip w_country w_gmt_offset -------------- ---------------- ----------------------- ----------------- --------------- ------------- ------------- -------------- --------- ----------------- ------- -------- ------------- ------------ 1 AAAAAAAABAAAAAAA Modified Warehouse Name 977787 651 6th Parkway Suite 470 Midway Williamson County TN 31904 United States -5 2 AAAAAAAABAAAAAAB Van Gogh Warehouse 100000 6 Museumplein Plein Amsterdam Nord-Holland NH 10171 DJ Netherlands 1 ==== The upsert command ==== * The [[docs:tabul:data:upsert|upsert command]] is a [[docs:tabul:data:transfer|transfer command]] and takes as argument: * a [[docs:flow:source|source]] - the [[docs:resource:csv|csv file]] in the [[docs:connection:howto|howto connection]] * a [[docs:flow:target|target]] - the [[docs:resource:sql_table|sql table]] in the [[docs:connection:howtos|sqlite connection]] tabul data upsert warehouse_upsert.csv@howto warehouse@sqlite * Run it and you get the below output: Transfer results input target latency record_count error_code error_message -------------------------- ---------------- ------- ------------ ---------- ------------- warehouse_upsert.csv@howto warehouse@sqlite 0.52s 2 * You can verify that the [[docs:flow:target|target data resource]] has been changed with the [[docs:tabul:data:print|data print command]] and check that: * there is a second new record * the new name of the first warehouse is now ''Modified Warehouse Name'' tabul data print warehouse@sqlite warehouse@sqlite w_warehouse_sk w_warehouse_id w_warehouse_name w_warehouse_sq_ft w_street_number w_street_name w_street_type w_suite_number w_city w_county w_state w_zip w_country w_gmt_offset -------------- ---------------- ----------------------- ----------------- --------------- ------------- ------------- -------------- --------- ----------------- ------- -------- ------------- ------------ 1 AAAAAAAABAAAAAAA Modified Warehouse Name 977787 651 6th Parkway Suite 470 Midway Williamson County TN 31904 United States -5 2 AAAAAAAABAAAAAAB Van Gogh Warehouse 100000 6 Museumplein Plein Amsterdam Nord-Holland NH 10171 DJ Netherlands 1 ===== Warehouse Update with a SQL query as Source ===== ==== The source query for the update ==== Let's say for the sake of simplicity that we want to increase the square footage column ''w_warehouse_sq_ft'' of the warehouse by ''10'' square meter. The [[docs:flow:source|source]] [[docs:resource:sql_select|query]] needs to include: * the primary key ''w_warehouse_sk'' * the target column with the new value ''w_warehouse_sq_ft'' Example with the ''warehouse_update.sql'' [[docs:resource:file|file]] in the [[docs:connection:howto|howto]] connection tabul data print --type text warehouse_update.sql@howto select w_warehouse_sk, w_warehouse_sq_ft+10 as w_warehouse_sq_ft, -- the name of the column match the name of the target column w_warehouse_sq_ft as w_warehouse_sq_ft_old_value -- just to show the diff from warehouse w2 ; * The result of the [[docs:resource:sql_select|select query]] can be seen with the [[docs:tabul:data:print|print command]] tabul data print '(warehouse_update.sql@howto)@sqlite' # The quotes are only mandatory in bash because parenthesis are a bash token (ie subshell) (warehouse_update.sql@howto)@sqlite w_warehouse_sk w_warehouse_sq_ft w_warehouse_sq_ft_old_value -------------- ----------------- --------------------------- 1 977797 977787 2 100010 100000 ==== The update command ==== Calling the [[docs:tabul:data:update|data update command]] with the [[docs:resource:sql_select|query]] as [[docs:flow:source|source]] will update the ''warehouse'' table ''w_warehouse_sq_ft'' columns by ''10'' square meters. tabul data update \ --transfer-mapping-strict false \ '(warehouse_update.sql@howto)@sqlite' warehouse@sqlite # The quotes are only mandatory in bash because parenthesis are a bash token (ie subshell) Transfer results input target latency record_count error_code error_message ----------------------------------- ---------------- ------- ------------ ---------- ------------- (warehouse_update.sql@howto)@sqlite warehouse@sqlite 0.40s 2 where: * `--transfer-mapping-strict` is set to `false` because the source column `w_warehouse_sq_ft_old_value` is not a target column. We can verify it by [[docs:tabul:data:print|printing]] the ''warehouse'' table tabul data print warehouse@sqlite warehouse@sqlite w_warehouse_sk w_warehouse_id w_warehouse_name w_warehouse_sq_ft w_street_number w_street_name w_street_type w_suite_number w_city w_county w_state w_zip w_country w_gmt_offset -------------- ---------------- ----------------------- ----------------- --------------- ------------- ------------- -------------- --------- ----------------- ------- -------- ------------- ------------ 1 AAAAAAAABAAAAAAA Modified Warehouse Name 977797 651 6th Parkway Suite 470 Midway Williamson County TN 31904 United States -5 2 AAAAAAAABAAAAAAB Van Gogh Warehouse 100010 6 Museumplein Plein Amsterdam Nord-Holland NH 10171 DJ Netherlands 1 ===== Warehouse Delete ===== ==== The source CSV file for the DELETE ==== The [[docs:tabul:data:delete|delete command]] takes as input a [[docs:flow:source|source data resource]] that contains the primary or unique key of a [[docs:flow:target|target table]]. We want to delete the second record, the CSV file contains then the primary key with the value ''2'' tabul data print warehouse_delete.csv@howto * Run it. You get the below output: warehouse_delete.csv@howto w_warehouse_sk -------------- 2 ==== Warehouse Delete ==== * You give the ''CSV'' [[docs:flow:source|source]] and the [[docs:flow:target|target]] as argument to the [[docs:tabul:data:delete|data delete command]] to delete the second row tabul data delete warehouse_delete.csv@howto warehouse@sqlite Transfer results input target latency record_count error_code error_message -------------------------- ---------------- ------- ------------ ---------- ------------- warehouse_delete.csv@howto warehouse@sqlite 0.45s 1 * How [[docs:resource:count|many rows]] do we have ? tabul data list -a name -a count warehouse@sqlite path media_type name count --------- ---------- --------- ----- warehouse table warehouse 1 * Do we still see the first row ? tabul data print warehouse@sqlite warehouse@sqlite w_warehouse_sk w_warehouse_id w_warehouse_name w_warehouse_sq_ft w_street_number w_street_name w_street_type w_suite_number w_city w_county w_state w_zip w_country w_gmt_offset -------------- ---------------- ----------------------- ----------------- --------------- ------------- ------------- -------------- ------ ----------------- ------- ----- ------------- ------------ 1 AAAAAAAABAAAAAAA Modified Warehouse Name 977797 651 6th Parkway Suite 470 Midway Williamson County TN 31904 United States -5 ===== Applying in-place SQL Data Manipulation ===== Until now, all data manipulation were out-of-place, meaning that there was a source and a target. You can also apply in-place Data Manipulation with [[:docs:resource:sql_dml|SQL Data Manipulation Language request]] (''UPDATE/INSERT/UPDATE/MERGE'') For a full example, applying DDL and DML request, see this [[howto:sql_query:dml_query|howto]] ===== Next ===== With the laws on data privacy, it's now mandatory to develop data application with generated data. ''Tabulify'' has built-in a data generator that we will explore in the next page. [[9_data_generation|How to fill a resource with generated data ?]]