Tabulify - TPC-DS (Benchmark)

About

Tabulify supports the TPC-DS database benchmark for the operations described below.

Operations

Schema Management

This section shows how to manage the sub-schemas of TPC-DS.

All tables

tpcds - all TPC-DS tables

tabul data list *@tpcds
tabul data create *@tpcds @targetConnection
tabul data fill *@tpcds @targetConnection

Dwh

The data-warehouse tables: all tables except those whose name starts with an s (i.e. without the staging tables).

tabul data list [!s]*@tpcds
tabul data create [!s]*@tpcds @targetConnection
tabul data fill [!s]*@tpcds @targetConnection
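The selectors above read like standard glob patterns, where `[!s]` matches any single character other than `s`. The sketch below reproduces that behaviour with Python's `fnmatch` module. It assumes Tabulify follows the usual glob semantics; the table list is a small sample, and `s_purchase` is an assumed staging-table name used only for illustration.

```python
from fnmatch import fnmatchcase

# Sample names: a few from this page, plus "s_purchase" as an assumed
# staging-table name (illustration only).
tables = ["customer", "date_dim", "item", "store_sales", "s_purchase"]


def select(pattern, names):
    """Return the names matching a glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


all_tables = select("*", tables)        # counterpart of *@tpcds
non_s_tables = select("[!s]*", tables)  # counterpart of [!s]*@tpcds

print(non_s_tables)  # ['customer', 'date_dim', 'item']
```

Under these semantics every name that begins with `s` is filtered out, so running `tabul data list` with the pattern first is a cheap way to confirm the selection before creating or filling tables.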

Store Sales

The store-sales schema contains the store_sales and store_returns star schemas (a data-warehouse schema).

tabul data list --with-dependencies store*@tpcds
path                     media_type
----------------------   ------------
customer                 sql/relation
customer_address         sql/relation
customer_demographics    sql/relation
date_dim                 sql/relation
household_demographics   sql/relation
income_band              sql/relation
item                     sql/relation
promotion                sql/relation
reason                   sql/relation
store                    sql/relation
store_returns            sql/relation
store_sales              sql/relation
time_dim                 sql/relation

tabul data create --with-dependencies store*@tpcds @targetConnection
tabul data copy --with-dependencies store*@tpcds @targetConnection

This article explains the technique: how to select a star schema
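One way to picture dependency-aware selection is as a transitive closure over foreign keys: start from the tables matching the pattern, then keep adding every table they reference. The sketch below is not Tabulify's implementation. The `FOREIGN_KEYS` map is an assumed, abbreviated version of the TPC-DS relationships, and `with_dependencies` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Assumed, abbreviated foreign-key map: table -> tables it references.
FOREIGN_KEYS = {
    "store_sales": ["date_dim", "time_dim", "item", "customer",
                    "customer_demographics", "household_demographics",
                    "customer_address", "store", "promotion"],
    "store_returns": ["date_dim", "time_dim", "item", "customer",
                      "customer_demographics", "household_demographics",
                      "customer_address", "store", "reason"],
    "store": ["date_dim"],
    "customer": ["customer_address", "customer_demographics",
                 "household_demographics", "date_dim"],
    "household_demographics": ["income_band"],
    "promotion": ["date_dim", "item"],
    "web_sales": ["date_dim", "item", "web_site"],  # not reachable from store*
}


def with_dependencies(pattern, fks):
    """Select tables matching the pattern plus everything they reference."""
    known = set(fks) | {parent for parents in fks.values() for parent in parents}
    selected = {table for table in known if fnmatchcase(table, pattern)}
    stack = list(selected)
    while stack:
        table = stack.pop()
        for parent in fks.get(table, []):
            if parent not in selected:
                selected.add(parent)
                stack.append(parent)
    return sorted(selected)


star = with_dependencies("store*", FOREIGN_KEYS)
print(len(star))  # 13
```

With this map, the `store*` pattern yields the same thirteen tables as the listing above, while `web_sales` and `web_site` stay out of the selection.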

Note on the schema

The TPC-DS benchmark does not define the B column (business key) as a unique key. Our implementation makes them all unique (except on the item table, where the column is unique only in combination with the start and end dates).

Why? Because when TPC-DS is used as a sample schema, the data generator will then create data that is consistent with the queries.

For TPC-DS, a business key is neither a primary key nor a foreign key in the context of the data warehouse schema. It is only used to differentiate new data from update data of the source tables during the data maintenance operations.
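The uniqueness rule described in this note can be sketched with an in-memory SQLite database. This is an illustration only, not the DDL that Tabulify generates: the tables are trimmed to a few columns, and the constraints are an assumed way of expressing "unique business key" and "unique together with the start and end date".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    c_customer_sk INTEGER PRIMARY KEY,      -- surrogate key
    c_customer_id TEXT NOT NULL UNIQUE      -- business key, made unique
);
CREATE TABLE item (
    i_item_sk        INTEGER PRIMARY KEY,   -- surrogate key
    i_item_id        TEXT NOT NULL,         -- business key
    i_rec_start_date TEXT,
    i_rec_end_date   TEXT,
    -- the business key is unique only together with the validity dates
    UNIQUE (i_item_id, i_rec_start_date, i_rec_end_date)
);
""")


def try_insert(sql, params):
    """Return True if the row is accepted, False on a constraint violation."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


item_sql = "INSERT INTO item VALUES (?, ?, ?, ?)"
results = {
    "first customer": try_insert("INSERT INTO customer VALUES (?, ?)", (1, "AAAA")),
    "duplicate business key": try_insert("INSERT INTO customer VALUES (?, ?)", (2, "AAAA")),
    "item version 1": try_insert(item_sql, (1, "ITEM1", "2000-01-01", "2000-12-31")),
    "item version 2": try_insert(item_sql, (2, "ITEM1", "2001-01-01", None)),
    "item version 1 again": try_insert(item_sql, (3, "ITEM1", "2000-01-01", "2000-12-31")),
}
print(results)
```

The second customer row and the repeated item version are rejected, while a new version of the same item with different dates is accepted.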