Learning Tabulify - Step 4 - How to select Data Resources
Concepts
To select data resources such as files or database tables, Tabulify uses the concepts of:
- data selector
- and dependency (do we also select the dependent data resources)
This page goes through these concepts with explanations and examples.
Data Selector
A data selector is composed of two parts separated by the @ (at) sign:
- a glob pattern
- and a connection
A data selector looks like this:
globPattern@connection
The glob pattern defines the name or the path of the data resources located in the system of the connection.
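To make the two parts concrete, here is a minimal Python sketch that splits a selector string into its glob pattern and its connection. It is not Tabulify's parser: the names DataSelector and parse_selector are hypothetical, and splitting on the last @ is an assumption made for illustration. A selector without an @ is returned with no connection (None).

```python
from typing import NamedTuple, Optional


class DataSelector(NamedTuple):
    """Hypothetical container for the two parts of a data selector."""
    glob_pattern: str
    connection: Optional[str]  # None when the selector has no '@connection' part


def parse_selector(selector: str) -> DataSelector:
    """Split 'globPattern@connection' on the last '@' (an assumption of this sketch)."""
    pattern, separator, connection = selector.rpartition("@")
    if not separator:
        # No '@' found: the whole string is the glob pattern.
        return DataSelector(selector, None)
    return DataSelector(pattern, connection)


print(parse_selector("*sales@tpcds"))
# DataSelector(glob_pattern='*sales', connection='tpcds')
```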
Normal Selection
For instance, with the internal TPC-DS data store, the list command below selects all tables whose name ends with the term sales, because the * character matches any sequence of characters.
tabul data list *sales@tpcds
where:
- tabul is the main command line utility
- data is a module (i.e. the data module)
- list is a command
- *sales@tpcds is a data selector that selects data resources.
- tpcds defines the connection
- *sales defines the tables to look for with a glob pattern. In our case, all tables whose name ends with the word sales, because * is the globbing star and matches any sequence of characters.
Output:
path           media_type
-------------  ------------
catalog_sales  sql/relation
store_sales    sql/relation
web_sales      sql/relation
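The matching rule behind this output can be approximated with Python's standard fnmatch module. This is only a sketch of glob semantics, not Tabulify's implementation: the select helper is hypothetical and the table list is a hand-written handful of TPC-DS table names.

```python
from fnmatch import fnmatchcase

# A handful of TPC-DS table names, written by hand for this sketch.
tables = [
    "call_center", "catalog_page", "catalog_sales", "customer",
    "date_dim", "item", "store", "store_sales",
    "web_page", "web_sales", "web_site",
]


def select(names, glob_pattern):
    """Keep the names that match the glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, glob_pattern)]


print(select(tables, "*sales"))
# ['catalog_sales', 'store_sales', 'web_sales']
```

Because * also matches the empty string, a table named exactly sales would be selected by *sales as well.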
To get more practice with glob patterns, see the page Tabulify - How to select data resources with a Glob Pattern.
Selection with dependencies
Because of foreign-key constraints, moving data means moving the data resources together with their dependencies.
That's why Tabulify offers the --with-dependencies flag, which also selects the dependent resources of the selected data resources.
Example: all tables whose name ends with sales in the tpcds system, together with their dependent tables
tabul data list --with-dependencies *sales@tpcds
path                    media_type
----------------------  ------------
call_center             sql/relation
catalog_page            sql/relation
catalog_sales           sql/relation
customer                sql/relation
customer_address        sql/relation
customer_demographics   sql/relation
date_dim                sql/relation
household_demographics  sql/relation
income_band             sql/relation
item                    sql/relation
promotion               sql/relation
ship_mode               sql/relation
store                   sql/relation
store_sales             sql/relation
time_dim                sql/relation
warehouse               sql/relation
web_page                sql/relation
web_sales               sql/relation
web_site                sql/relation
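One way to picture a selection with dependencies is as a walk over the foreign-key graph: start from the tables matched by the glob pattern, then keep adding every table they reference until nothing new appears. The Python sketch below assumes exactly that. It is not Tabulify's code: the function name is hypothetical, and FOREIGN_KEYS is a small illustrative subset of TPC-DS relationships, not the full schema.

```python
from fnmatch import fnmatchcase

# Illustrative subset only: each table maps to the tables it references
# through foreign keys. The real TPC-DS schema has many more relationships.
FOREIGN_KEYS = {
    "store_sales": ["customer", "date_dim", "item", "store"],
    "web_sales": ["date_dim", "item", "web_site"],
    "customer": ["customer_address"],
    "inventory": ["date_dim", "item", "warehouse"],
}


def select_with_dependencies(glob_pattern, foreign_keys):
    """Select the matching tables plus everything they transitively reference."""
    all_tables = set(foreign_keys)
    for parents in foreign_keys.values():
        all_tables.update(parents)

    selected = {t for t in all_tables if fnmatchcase(t, glob_pattern)}
    pending = list(selected)
    while pending:
        table = pending.pop()
        for parent in foreign_keys.get(table, ()):
            if parent not in selected:
                selected.add(parent)
                pending.append(parent)  # follow the parent's own foreign keys too
    return sorted(selected)


print(select_with_dependencies("*sales", FOREIGN_KEYS))
# ['customer', 'customer_address', 'date_dim', 'item', 'store',
#  'store_sales', 'web_sales', 'web_site']
```

In this subset, customer_address is pulled in only through customer, which shows why the walk has to be transitive. The tables inventory and warehouse stay out because no selected table references them.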
Local File System
The connection part of a data selector is optional: the default connection is the local file system.
Therefore, running the list command with a data selector that has no connection lists the files in your current directory.
tabul data list *
path
-----------------------
README.md
characters.csv
date_dim--generator.yml
sequence--generator.yml
This is the equivalent of the ls command.
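For comparison, the same kind of listing can be sketched in Python with the standard os and fnmatch modules. The list_local helper is hypothetical and only illustrates applying a glob pattern to the names in a directory. It says nothing about how Tabulify reads the file system.

```python
import os
from fnmatch import fnmatchcase


def list_local(glob_pattern, directory="."):
    """Return the sorted entry names in `directory` that match the glob pattern."""
    return sorted(
        name for name in os.listdir(directory) if fnmatchcase(name, glob_pattern)
    )


# Lists every entry of the current directory.
print(list_local("*"))
```

Unlike ls, this sketch also returns names starting with a dot, because fnmatch does not treat a leading period specially. A narrower pattern such as *.yml would keep only the YAML files.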
Next
Now that we know how to select data resources, the next page will show you how to print their content.