---json { "page_id": "flgk903gphn1cadb4v2tb" } --- ====== How to diff a SQL table ? ====== ===== About ===== This howto show you how to perform [[:docs:op:diff|diff operation]] between 2 [[docs:resource:sql_table|SQL Tables]] with the [[:docs:tabul:data:diff|data diff command]]. ===== Steps ===== ==== Load Data ==== In this step, we load the [[:docs:resource:csv|csv]] resources that we want to compare into [[:docs:system:sqlite:sqlite|sqlite]] * Load the ''original'' data resource (also known as the ''source'', ''old'', ''from'' or ''left'' resource) tabul data replace diff/characters_original.csv@howto characters_original@sqlite Transfer results input target latency record_count error_code error_message ---------------------------------- -------------------------- ------- ------------ ---------- ------------- diff/characters_original.csv@howto characters_original@sqlite 0.50s 5 * The ''new'' data resource (also known as the ''target'', ''right'', ''to'' resource) tabul data replace diff/characters_new.csv@howto characters_new@sqlite Transfer results input target latency record_count error_code error_message ----------------------------- --------------------- ------- ------------ ---------- ------------- diff/characters_new.csv@howto characters_new@sqlite 0.53s 3 ==== The data ==== The fact that [[:docs:resource:record|records]] are missing can be easily spotted but there are more subtle differences. tabul data print \ --strict-selection \ characters_new@sqlite \ characters_original@sqlite characters_new@sqlite Last Name First Name Birth Date Wikipedia Page -------------- ---------- ---------- --------------------------------------------------- Schwarzenegger Arnold 1947-07-30 https://en.wikipedia.org/wiki/Arnold_Schwarzenegger Norman Don 1935-12-23 https://en.wikipedia.org/wiki/Don_Norman Kahneman Daniel 1934-03-05 https://en.wikipedia.org/wiki/Daniel_Kahneman characters_original@sqlite last_name first_name birth_date wikipedia_page ----------------- ---------- ---------- -------------------------------------------------------------- Schwarzenegger Arnold 1947-07-30 https://en.wikipedia.org/wiki/Arnold_Schwarzenegger Norman Don 1935-12-25 https://en.wikipedia.org/wiki/Don_Norman Harbison Carnagey Dale 1888-11-24 https://en.wikipedia.org/wiki/Dale_Carnegie Kahneman Daniel 1934-03-05 https://en.wikipedia.org/wiki/Daniel_Kahneman Pittman Bob 1953-12-28 https://en.wikipedia.org/wiki/Robert_Pittman_(media_executive) ==== Failure Behavior ==== By default, a diff that detects differences will fail (ie return a bad exit code). * The `--no-fail` option changes this behavior. * You don't need to set it normally but as we monitor our documentation for failure, we don't want to fail ==== Perform the diff ==== All diff can be executed with the [[docs:tabul:data:diff|data diff command]] by giving as argument the [[docs:flow:source|source]] and [[docs:flow:target|target]] data resource. tabul data diff \ --no-fail \ characters_original@sqlite characters_new@sqlite Diff Summary Report The data resources are not equals. from to equals record_count change_count -------------------------- --------------------- ------ ------------ ------------ characters_original@sqlite characters_new@sqlite 5 6 ==== Perform a diff and generate a unified report ==== In this step, we will asked for more details by asking for a [[:docs:op:diff#unified|unified report type]] tabul data diff \ --no-fail \ --report-type unified \ characters_original@sqlite characters_new@sqlite Unified Diff Report between the resources (characters_original@sqlite) and (characters_new@sqlite). - red for deleted - green for added - blue for type change (example: number as varchar) The data resources are not equals. diff_status last_name first_name birth_date wikipedia_page ----------- ----------------- ---------- ---------- -------------------------------------------------------------- = Schwarzenegger Arnold 1947-07-30 https://en.wikipedia.org/wiki/Arnold_Schwarzenegger + Norman Don 1935-12-23 https://en.wikipedia.org/wiki/Don_Norman + Kahneman Daniel 1934-03-05 https://en.wikipedia.org/wiki/Daniel_Kahneman - Norman Don 1935-12-25 https://en.wikipedia.org/wiki/Don_Norman - Harbison Carnagey Dale 1888-11-24 https://en.wikipedia.org/wiki/Dale_Carnegie - Kahneman Daniel 1934-03-05 https://en.wikipedia.org/wiki/Daniel_Kahneman - Pittman Bob 1953-12-28 https://en.wikipedia.org/wiki/Robert_Pittman_(media_executive) We can see that the diff is not line by line as if it was a text file. In the next step, we will tell the ''diff'' that we want a diff on the ''last_name''. ==== Perform a diff with a column driver ==== To improve our comparison, we will define the unique column of our data resource with the ''%%--driver-columns%%'' option. tabul data diff \ --no-fail \ --report-type unified \ --driver-columns "Last Name" \ characters_original@sqlite characters_new@sqlite Unified Diff Report between the resources (characters_original@sqlite) and (characters_new@sqlite). - red for deleted - green for added - blue for type change (example: number as varchar) The data resources are not equals. diff_status last_name first_name birth_date wikipedia_page ----------- ----------------- ---------- ---------- -------------------------------------------------------------- = Schwarzenegger Arnold 1947-07-30 https://en.wikipedia.org/wiki/Arnold_Schwarzenegger +1 Norman Don 1935-12-23 https://en.wikipedia.org/wiki/Don_Norman -1 Norman Don 1935-12-25 https://en.wikipedia.org/wiki/Don_Norman - Harbison Carnagey Dale 1888-11-24 https://en.wikipedia.org/wiki/Dale_Carnegie = Kahneman Daniel 1934-03-05 https://en.wikipedia.org/wiki/Daniel_Kahneman - Pittman Bob 1953-12-28 https://en.wikipedia.org/wiki/Robert_Pittman_(media_executive) This time, the unified report tell us that we need to: * update the ''birth date'' of the ''Norman'' record (''+1,-1'', in the ''1'' change) * delete the records * ''Harbison Carnagey'' * and ''Pittman'' from the ''original'' resource to obtain the ''new'' resource ==== Advanced Diff with a pipeline ==== The [[:docs:op:diff|diff operation]] is an [[:docs:flow:intermediate|intermediate pipeline operation]] that has a lot of [[:docs:op:diff#arguments|arguments]] that are not available with the [[:docs:tabul:data:diff|diff command line]]. If you want to perform an advanced diff, create a [[:docs:flow:pipeline|pipeline]].