Learning Tabulify - Step 4 - How to select Data Resources


Concepts

To select data resources such as a file or a database table, Tabulify uses two concepts: the data selector and the glob pattern.

This page goes through these concepts with explanations and examples.

Data Selector

A data selector is composed of two parts: a glob pattern and a connection, separated by the @ character.

It looks like this:

globPattern@connection

The glob pattern defines the name or the path of the data resources inside the system designated by the connection.
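To make the two parts concrete, here is a minimal Python sketch that splits a selector string into its glob pattern and its connection. It is not Tabulify's parser. It assumes the connection is whatever follows the last @, and that a selector without @ falls back to the local file system connection (see Local File System below). The constant DEFAULT_CONNECTION is a hypothetical placeholder: this page does not state the actual name Tabulify gives that connection.

```python
from typing import NamedTuple

# Hypothetical placeholder: the real name of Tabulify's local
# file system connection is not given on this page.
DEFAULT_CONNECTION = "local-file-system"


class DataSelector(NamedTuple):
    glob_pattern: str
    connection: str


def parse_selector(selector: str) -> DataSelector:
    """Split `globPattern@connection` into its two parts.

    Assumption: the connection is the text after the last '@'.
    Without any '@', the default (local file system) connection is used.
    """
    pattern, sep, connection = selector.rpartition("@")
    if not sep:
        # No '@' at all: the whole string is the glob pattern.
        return DataSelector(selector, DEFAULT_CONNECTION)
    return DataSelector(pattern, connection)


print(parse_selector("*sales@tpcds"))
# DataSelector(glob_pattern='*sales', connection='tpcds')
print(parse_selector("*.csv"))
# DataSelector(glob_pattern='*.csv', connection='local-file-system')
```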

Normal Selection

For instance, with the internal TPC-DS data store, the list command below selects all tables whose name ends with sales, because the * character matches any sequence of characters.

tabul data list *sales@tpcds

where *sales is the glob pattern and tpcds is the connection.

Output:

path            media_type
-------------   ------------
catalog_sales   sql/relation
store_sales     sql/relation
web_sales       sql/relation
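The matching step can be reproduced with Python's standard fnmatch module, which implements shell-style glob patterns. This is a sketch of the idea, not Tabulify's implementation: the table list is a subset of the TPC-DS names shown on this page, and case-sensitive matching (fnmatchcase) is an assumption about Tabulify's behavior.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# A subset of the TPC-DS table names listed on this page.
TPCDS_TABLES = [
    "call_center", "catalog_page", "catalog_sales", "customer",
    "date_dim", "item", "store", "store_sales", "web_page",
    "web_sales", "web_site",
]


def match_names(pattern: str, names: list[str]) -> list[str]:
    """Return, sorted, the names matched by the glob pattern.

    fnmatchcase is case-sensitive; '*' matches any sequence of
    characters, including the empty one.
    """
    return sorted(name for name in names if fnmatchcase(name, pattern))


print(match_names("*sales", TPCDS_TABLES))
# ['catalog_sales', 'store_sales', 'web_sales']
```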

For more practice with glob patterns, see the page Tabulify - How to select data resources with a Glob Pattern.

Selection with dependencies

When moving data, foreign-key constraints require you to move the data resources together with their dependencies.

That's why Tabulify offers the --with-dependencies flag, which also selects the resources that the selected data resources depend on.

Example: all tables in the tpcds system whose name ends with sales, together with the tables they depend on

tabul data list --with-dependencies *sales@tpcds

Output:

path                     media_type
----------------------   ------------
call_center              sql/relation
catalog_page             sql/relation
catalog_sales            sql/relation
customer                 sql/relation
customer_address         sql/relation
customer_demographics    sql/relation
date_dim                 sql/relation
household_demographics   sql/relation
income_band              sql/relation
item                     sql/relation
promotion                sql/relation
ship_mode                sql/relation
store                    sql/relation
store_sales              sql/relation
time_dim                 sql/relation
warehouse                sql/relation
web_page                 sql/relation
web_sales                sql/relation
web_site                 sql/relation
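Conceptually, selecting with dependencies means computing the transitive closure of the foreign-key graph starting from the matched tables. The Python sketch below does this with a breadth-first traversal. The FOREIGN_KEYS map is a hypothetical, heavily simplified example, not the real TPC-DS schema, so its result is shorter than the listing produced by Tabulify; the traversal itself is a standard technique and an assumption about how such a flag can work, not Tabulify's code.

```python
from __future__ import annotations

from collections import deque
from fnmatch import fnmatchcase

# Hypothetical, simplified foreign-key map: table -> tables it references.
# NOT the full TPC-DS schema; it only illustrates the traversal.
FOREIGN_KEYS: dict[str, list[str]] = {
    "store_sales": ["date_dim", "item", "store", "customer"],
    "web_sales": ["date_dim", "item", "web_site", "customer"],
    "customer": ["customer_address"],
    "store": ["date_dim"],
}


def all_tables() -> set[str]:
    """Every table appearing in the map, as a source or as a target."""
    tables = set(FOREIGN_KEYS)
    for targets in FOREIGN_KEYS.values():
        tables.update(targets)
    return tables


def select_with_dependencies(pattern: str) -> list[str]:
    """Match tables with the glob pattern, then add every table they
    reference, directly or transitively (breadth-first search)."""
    selected = {t for t in all_tables() if fnmatchcase(t, pattern)}
    queue = deque(selected)
    while queue:
        table = queue.popleft()
        for referenced in FOREIGN_KEYS.get(table, []):
            if referenced not in selected:
                selected.add(referenced)
                queue.append(referenced)
    return sorted(selected)


print(select_with_dependencies("*sales"))
# ['customer', 'customer_address', 'date_dim', 'item', 'store',
#  'store_sales', 'web_sales', 'web_site']
```

The visited set doubles as the result, so each table is queued at most once even when several tables reference it (date_dim here).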

Local File System

The connection part of a data selector is optional: when it is omitted, the default connection, the local file system, is used.

Therefore, running the list command with a data selector that has no connection lists the files in your current directory.

tabul data list *

Output:

path
-----------------------
README.md
characters.csv
date_dim--generator.yml
sequence--generator.yml

This is the equivalent of the ls command.
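The same kind of listing can be sketched in Python with pathlib and fnmatch. This is an illustration only: it assumes the selection keeps regular files (not directories) and sorts by code point, which puts README.md before the lowercase names as in the output shown. Unlike ls, this sketch does not hide dotfiles. The function name list_files is hypothetical.

```python
from __future__ import annotations

import tempfile
from fnmatch import fnmatchcase
from pathlib import Path


def list_files(pattern: str, directory: str | Path = ".") -> list[str]:
    """Names of the regular files in `directory` matching `pattern`,
    sorted by code point (uppercase names come first)."""
    return sorted(
        entry.name
        for entry in Path(directory).iterdir()
        if entry.is_file() and fnmatchcase(entry.name, pattern)
    )


# Recreate the example directory in a throwaway location and list it.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["README.md", "characters.csv",
                 "date_dim--generator.yml", "sequence--generator.yml"]:
        (Path(tmp) / name).touch()
    print(list_files("*", tmp))
    # ['README.md', 'characters.csv', 'date_dim--generator.yml',
    #  'sequence--generator.yml']
```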

Next

Now that we know how to select data resources, the next page will show you how to print their content.

How to print Data Resources



