Table of Contents

Query

For Tabulify, a SQL Query is a runtime resource that when executed returns a data set in tabular format (known as the result set).

The selector of a runtime is known as a runtime selector. This selector is composed of a data selector inside its expression.

The syntax is:

(sqlSelector)@connection

where:

In the next steps, we will show this concepts by running a query of the Tpcds benchmark.

Query 11

Tpcds offers 100 queries that are all located in the tpcds-queries connection

We will execute the query 11 that lists the customers that have generated growing sales from one year to another in the s and w sales type.

Below is the query 11 located at sqlite/query_11.sql@tpcds_query

tabul data cat sqlite/query_11.sql@tpcds_query
with year_total as (
 select c_customer_id customer_id
       ,c_first_name customer_first_name
       ,c_last_name customer_last_name
       ,c_preferred_cust_flag customer_preferred_cust_flag
       ,c_birth_country customer_birth_country
       ,c_login customer_login
       ,c_email_address customer_email_address
       ,d_year dyear
       ,sum(ss_ext_list_price-ss_ext_discount_amt) year_total
       ,'s' sale_type
 from customer
     ,store_sales
     ,date_dim
 where c_customer_sk = ss_customer_sk
   and ss_sold_date_sk = d_date_sk
 group by c_customer_id
         ,c_first_name
         ,c_last_name
         ,c_preferred_cust_flag 
         ,c_birth_country
         ,c_login
         ,c_email_address
         ,d_year 
 union all
 select c_customer_id customer_id
       ,c_first_name customer_first_name
       ,c_last_name customer_last_name
       ,c_preferred_cust_flag customer_preferred_cust_flag
       ,c_birth_country customer_birth_country
       ,c_login customer_login
       ,c_email_address customer_email_address
       ,d_year dyear
       ,sum(ws_ext_list_price-ws_ext_discount_amt) year_total
       ,'w' sale_type
 from customer
     ,web_sales
     ,date_dim
 where c_customer_sk = ws_bill_customer_sk
   and ws_sold_date_sk = d_date_sk
 group by c_customer_id
         ,c_first_name
         ,c_last_name
         ,c_preferred_cust_flag 
         ,c_birth_country
         ,c_login
         ,c_email_address
         ,d_year
         )
  select  
                  t_s_secyear.customer_id
                 ,t_s_secyear.customer_first_name
                 ,t_s_secyear.customer_last_name
                 ,t_s_secyear.customer_email_address
 from year_total t_s_firstyear
     ,year_total t_s_secyear
     ,year_total t_w_firstyear
     ,year_total t_w_secyear
 where t_s_secyear.customer_id = t_s_firstyear.customer_id
         and t_s_firstyear.customer_id = t_w_secyear.customer_id
         and t_s_firstyear.customer_id = t_w_firstyear.customer_id
         and t_s_firstyear.sale_type = 's'
         and t_w_firstyear.sale_type = 'w'
         and t_s_secyear.sale_type = 's'
         and t_w_secyear.sale_type = 'w'
         and t_s_firstyear.dyear = 2001
         and t_s_secyear.dyear = 2001+1
         and t_w_firstyear.dyear = 2001
         and t_w_secyear.dyear = 2001+1
         and t_s_firstyear.year_total > 0
         and t_w_firstyear.year_total > 0
         and case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else 0.0 end
             > case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else 0.0 end
 order by t_s_secyear.customer_id
         ,t_s_secyear.customer_first_name
         ,t_s_secyear.customer_last_name
         ,t_s_secyear.customer_email_address
limit 100;

To make the query 11 executable, we need to add the execution connection.

The executable data uri will be then:

(sqlite/query_11.sql@tpcds_query)@connection

Printing the query

Printing the query against the sqlite connection will show us the customers that have generated growing sales from one year to another in the s and w sales type.

tabul data print '(sqlite/query_11.sql@tpcds_query)@sqlite'
# The quotes are only mandatory in bash because parenthesis are a bash token (ie subshell)
(sqlite/query_11.sql@tpcds_query)@sqlite
customer_id        customer_first_name   customer_last_name   customer_email_address
----------------   -------------------   ------------------   ----------------------------------
AAAAAAAABMDAAAAA   Jose                  Guzman               [email protected]
AAAAAAAAFLAAAAAA   Joseph                Riley                [email protected]
AAAAAAAAJGBAAAAA   Bonnie                Harrison             [email protected]
AAAAAAAAKMDAAAAA   Alexander             White                [email protected]
AAAAAAAAOFCAAAAA   Timothy               Grogan               [email protected]

A query is just a data resource

Don't forget that:

A query can therefore be used on every command as any other data resource.

How to copy a query into another table

For instance, if you want to copy its output to a table called query_11, you would issue the following command:

tabul data copy '(sqlite/query_11.sql@tpcds_query)@sqlite'  query_11@sqlite
# The quotes are only mandatory in bash because parenthesis are a bash token (ie subshell)

And the content of the new table is just the same as the output of the query 11.

tabul data print query_11@sqlite
query_11@sqlite
customer_id        customer_first_name   customer_last_name   customer_email_address
----------------   -------------------   ------------------   ----------------------------------
AAAAAAAABMDAAAAA   Jose                  Guzman               [email protected]
AAAAAAAAFLAAAAAA   Joseph                Riley                [email protected]
AAAAAAAAJGBAAAAA   Bonnie                Harrison             [email protected]
AAAAAAAAKMDAAAAA   Alexander             White                [email protected]
AAAAAAAAOFCAAAAA   Timothy               Grogan               [email protected]

Let's drop it to keep a clean schema

tabul data drop query_11@sqlite
Dropped data resources
data_uri          media_type
---------------   ----------
query_11@sqlite   table

How to download a query as a CSV

For instance, if you want to copy its output:

you would issue the following command

tabul data copy '(sqlite/query_11.sql@tpcds_query)@sqlite'  query_11.csv@cd
# The quotes are only mandatory in bash because parenthesis are a bash token (ie subshell)

And the content of the new csv is also just the same than the output of the query 11.

tabul data print query_11.csv@cd
query_11.csv@cd
customer_id        customer_first_name   customer_last_name   customer_email_address
----------------   -------------------   ------------------   ----------------------------------
AAAAAAAABMDAAAAA   Jose                  Guzman               [email protected]
AAAAAAAAFLAAAAAA   Joseph                Riley                [email protected]
AAAAAAAAJGBAAAAA   Bonnie                Harrison             [email protected]
AAAAAAAAKMDAAAAA   Alexander             White                [email protected]
AAAAAAAAOFCAAAAA   Timothy               Grogan               [email protected]

Next

Now that we can query our data, we are going to learn how to modify the content of a table

How to modify the content of a table