For Tabulify, a SQL Query is a runtime resource that when executed returns a data set in tabular format (known as the result set).
The selector of a runtime is known as a runtime selector. This selector is composed of a data selector inside its expression.
The syntax is:
(sqlSelector)@connection
where:
In the next steps, we will show this concepts by running a query of the Tpcds benchmark.
Tpcds offers 100 queries that are all located in the tpcds-queries connection
We will execute the query 11 that lists the customers that have generated growing sales from one year to another in the s and w sales type.
Below is the query 11 located at sqlite/query_11.sql@tpcds_query
tabul data cat sqlite/query_11.sql@tpcds_query
with year_total as (
select c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(ss_ext_list_price-ss_ext_discount_amt) year_total
,'s' sale_type
from customer
,store_sales
,date_dim
where c_customer_sk = ss_customer_sk
and ss_sold_date_sk = d_date_sk
group by c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
union all
select c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(ws_ext_list_price-ws_ext_discount_amt) year_total
,'w' sale_type
from customer
,web_sales
,date_dim
where c_customer_sk = ws_bill_customer_sk
and ws_sold_date_sk = d_date_sk
group by c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
)
select
t_s_secyear.customer_id
,t_s_secyear.customer_first_name
,t_s_secyear.customer_last_name
,t_s_secyear.customer_email_address
from year_total t_s_firstyear
,year_total t_s_secyear
,year_total t_w_firstyear
,year_total t_w_secyear
where t_s_secyear.customer_id = t_s_firstyear.customer_id
and t_s_firstyear.customer_id = t_w_secyear.customer_id
and t_s_firstyear.customer_id = t_w_firstyear.customer_id
and t_s_firstyear.sale_type = 's'
and t_w_firstyear.sale_type = 'w'
and t_s_secyear.sale_type = 's'
and t_w_secyear.sale_type = 'w'
and t_s_firstyear.dyear = 2001
and t_s_secyear.dyear = 2001+1
and t_w_firstyear.dyear = 2001
and t_w_secyear.dyear = 2001+1
and t_s_firstyear.year_total > 0
and t_w_firstyear.year_total > 0
and case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else 0.0 end
> case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else 0.0 end
order by t_s_secyear.customer_id
,t_s_secyear.customer_first_name
,t_s_secyear.customer_last_name
,t_s_secyear.customer_email_address
limit 100;
To make the query 11 executable, we need to add the execution connection.
The executable data uri will be then:
(sqlite/query_11.sql@tpcds_query)@connection
Printing the query against the sqlite connection will show us the customers that have generated growing sales from one year to another in the s and w sales type.
tabul data print '(sqlite/query_11.sql@tpcds_query)@sqlite'
# The quotes are only mandatory in bash because parenthesis are a bash token (ie subshell)
(sqlite/query_11.sql@tpcds_query)@sqlite
customer_id customer_first_name customer_last_name customer_email_address
---------------- ------------------- ------------------ ----------------------------------
AAAAAAAABMDAAAAA Jose Guzman [email protected]
AAAAAAAAFLAAAAAA Joseph Riley [email protected]
AAAAAAAAJGBAAAAA Bonnie Harrison [email protected]
AAAAAAAAKMDAAAAA Alexander White [email protected]
AAAAAAAAOFCAAAAA Timothy Grogan [email protected]
Don't forget that:
A query can therefore be used on every command as any other data resource.
For instance, if you want to copy its output to a table called query_11, you would issue the following command:
tabul data copy '(sqlite/query_11.sql@tpcds_query)@sqlite' query_11@sqlite
# The quotes are only mandatory in bash because parenthesis are a bash token (ie subshell)
And the content of the new table is just the same as the output of the query 11.
tabul data print query_11@sqlite
query_11@sqlite
customer_id customer_first_name customer_last_name customer_email_address
---------------- ------------------- ------------------ ----------------------------------
AAAAAAAABMDAAAAA Jose Guzman [email protected]
AAAAAAAAFLAAAAAA Joseph Riley [email protected]
AAAAAAAAJGBAAAAA Bonnie Harrison [email protected]
AAAAAAAAKMDAAAAA Alexander White [email protected]
AAAAAAAAOFCAAAAA Timothy Grogan [email protected]
Let's drop it to keep a clean schema
tabul data drop query_11@sqlite
Dropped data resources
data_uri media_type
--------------- ----------
query_11@sqlite table
For instance, if you want to copy its output:
you would issue the following command
tabul data copy '(sqlite/query_11.sql@tpcds_query)@sqlite' query_11.csv@cd
# The quotes are only mandatory in bash because parenthesis are a bash token (ie subshell)
And the content of the new csv is also just the same than the output of the query 11.
tabul data print query_11.csv@cd
query_11.csv@cd
customer_id customer_first_name customer_last_name customer_email_address
---------------- ------------------- ------------------ ----------------------------------
AAAAAAAABMDAAAAA Jose Guzman [email protected]
AAAAAAAAFLAAAAAA Joseph Riley [email protected]
AAAAAAAAJGBAAAAA Bonnie Harrison [email protected]
AAAAAAAAKMDAAAAA Alexander White [email protected]
AAAAAAAAOFCAAAAA Timothy Grogan [email protected]
Now that we can query our data, we are going to learn how to modify the content of a table