Table of Contents

How to diff a SQL table ?

About

This howto show you how to perform diff operation between 2 SQL Tables with the data diff command.

Steps

Load Data

In this step, we load the csv resources that we want to compare into sqlite

tabul data replace diff/characters_original.csv@howto characters_original@sqlite
Transfer results
input                                target                       latency   record_count   error_code   error_message
----------------------------------   --------------------------   -------   ------------   ----------   -------------
diff/characters_original.csv@howto   characters_original@sqlite   0.50s                5

tabul data replace diff/characters_new.csv@howto characters_new@sqlite
Transfer results
input                           target                  latency   record_count   error_code   error_message
-----------------------------   ---------------------   -------   ------------   ----------   -------------
diff/characters_new.csv@howto   characters_new@sqlite   0.53s                3

The data

The fact that records are missing can be easily spotted but there are more subtle differences.

tabul data print \
    --strict-selection \
    characters_new@sqlite \
    characters_original@sqlite
characters_new@sqlite
Last Name        First Name   Birth Date   Wikipedia Page
--------------   ----------   ----------   ---------------------------------------------------
Schwarzenegger   Arnold       1947-07-30   https://en.wikipedia.org/wiki/Arnold_Schwarzenegger
Norman           Don          1935-12-23   https://en.wikipedia.org/wiki/Don_Norman
Kahneman         Daniel       1934-03-05   https://en.wikipedia.org/wiki/Daniel_Kahneman



characters_original@sqlite
last_name           first_name   birth_date   wikipedia_page
-----------------   ----------   ----------   --------------------------------------------------------------
Schwarzenegger      Arnold       1947-07-30   https://en.wikipedia.org/wiki/Arnold_Schwarzenegger
Norman              Don          1935-12-25   https://en.wikipedia.org/wiki/Don_Norman
Harbison Carnagey   Dale         1888-11-24   https://en.wikipedia.org/wiki/Dale_Carnegie
Kahneman            Daniel       1934-03-05   https://en.wikipedia.org/wiki/Daniel_Kahneman
Pittman             Bob          1953-12-28   https://en.wikipedia.org/wiki/Robert_Pittman_(media_executive)

Failure Behavior

By default, a diff that detects differences will fail (ie return a bad exit code).

Perform the diff

All diff can be executed with the data diff command by giving as argument the source and target data resource.

tabul data diff \
  --no-fail \
  characters_original@sqlite characters_new@sqlite
Diff Summary Report
The data resources are not equals.
from                         to                      equals   record_count   change_count
--------------------------   ---------------------   ------   ------------   ------------
characters_original@sqlite   characters_new@sqlite                       5              6

Perform a diff and generate a unified report

In this step, we will asked for more details by asking for a unified report type

tabul data diff \
    --no-fail \
    --report-type unified \
    characters_original@sqlite characters_new@sqlite
Unified Diff Report between the resources (characters_original@sqlite) and (characters_new@sqlite).
 - �[31mred�[0m   for deleted
 - �[32mgreen�[0m for added
 - �[34mblue�[0m  for type change (example: number as varchar)
The data resources are not equals.
diff_status   last_name           first_name   birth_date   wikipedia_page
-----------   -----------------   ----------   ----------   --------------------------------------------------------------
=             Schwarzenegger      Arnold       1947-07-30   https://en.wikipedia.org/wiki/Arnold_Schwarzenegger
�[32m+             �[0m�[32mNorman              �[0m�[32mDon          �[0m�[32m1935-12-23   �[0m�[32mhttps://en.wikipedia.org/wiki/Don_Norman�[0m
�[32m+             �[0m�[32mKahneman            �[0m�[32mDaniel       �[0m�[32m1934-03-05   �[0m�[32mhttps://en.wikipedia.org/wiki/Daniel_Kahneman�[0m
�[31m-             �[0m�[31mNorman              �[0m�[31mDon          �[0m�[31m1935-12-25   �[0m�[31mhttps://en.wikipedia.org/wiki/Don_Norman�[0m
�[31m-             �[0m�[31mHarbison Carnagey   �[0m�[31mDale         �[0m�[31m1888-11-24   �[0m�[31mhttps://en.wikipedia.org/wiki/Dale_Carnegie�[0m
�[31m-             �[0m�[31mKahneman            �[0m�[31mDaniel       �[0m�[31m1934-03-05   �[0m�[31mhttps://en.wikipedia.org/wiki/Daniel_Kahneman�[0m
�[31m-             �[0m�[31mPittman             �[0m�[31mBob          �[0m�[31m1953-12-28   �[0m�[31mhttps://en.wikipedia.org/wiki/Robert_Pittman_(media_executive)�[0m

We can see that the diff is not line by line as if it was a text file. In the next step, we will tell the diff that we want a diff on the last_name.

Perform a diff with a column driver

To improve our comparison, we will define the unique column of our data resource with the --driver-columns option.

tabul data diff \
   --no-fail \
   --report-type unified \
   --driver-columns "Last Name" \
   characters_original@sqlite characters_new@sqlite
Unified Diff Report between the resources (characters_original@sqlite) and (characters_new@sqlite).
 - �[31mred�[0m   for deleted
 - �[32mgreen�[0m for added
 - �[34mblue�[0m  for type change (example: number as varchar)
The data resources are not equals.
diff_status   last_name           first_name   birth_date   wikipedia_page
-----------   -----------------   ----------   ----------   --------------------------------------------------------------
=             Schwarzenegger      Arnold       1947-07-30   https://en.wikipedia.org/wiki/Arnold_Schwarzenegger
�[32m+1            �[0mNorman              Don          �[32m1935-12-23   �[0mhttps://en.wikipedia.org/wiki/Don_Norman
�[31m-1            �[0mNorman              Don          �[31m1935-12-25   �[0mhttps://en.wikipedia.org/wiki/Don_Norman
�[31m-             �[0m�[31mHarbison Carnagey   �[0m�[31mDale         �[0m�[31m1888-11-24   �[0m�[31mhttps://en.wikipedia.org/wiki/Dale_Carnegie�[0m
=             Kahneman            Daniel       1934-03-05   https://en.wikipedia.org/wiki/Daniel_Kahneman
�[31m-             �[0m�[31mPittman             �[0m�[31mBob          �[0m�[31m1953-12-28   �[0m�[31mhttps://en.wikipedia.org/wiki/Robert_Pittman_(media_executive)�[0m

This time, the unified report tell us that we need to:

from the original resource to obtain the new resource

Advanced Diff with a pipeline

The diff operation is an intermediate pipeline operation that has a lot of arguments that are not available with the diff command line.

If you want to perform an advanced diff, create a pipeline.