Tabulify - How to select data resources with a Glob Pattern

About

This page explains what a glob pattern is and how to use it to select data resources.

A glob pattern (or glob expression) is a string that is matched against another string. If the glob pattern matches a data resource name, the data resource is selected; otherwise, it is not.

To express the pattern, you use special characters called wildcards. The following sections go through each wildcard and show you how to use it.

For the hackers: this is like a regular expression, but simplified.
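
To make the comparison with regular expressions concrete, here is a minimal sketch that uses Python's standard fnmatch module as a stand-in glob engine. This is an assumption for illustration only: it is not Tabulify's implementation, and fnmatch covers only *, ? and [] (no braces, no backslash escape).

```python
import fnmatch
import re

# Python's fnmatch is only a stand-in here; Tabulify's matcher may differ.
glob_pattern = "*sales"

# translate() returns the regular expression equivalent to the glob pattern.
regex = re.compile(fnmatch.translate(glob_pattern))

matches_web_sales = regex.match("web_sales") is not None
matches_item = regex.match("item") is not None

print(matches_web_sales, matches_item)  # True False
```

The glob *sales is shorter than its regular expression equivalent, which is the sense in which a glob is a simplified regular expression.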

Steps

Prerequisites

You should have Tabulify installed on your computer.

Learning Tabulify - Step 1 - Installation

Star (Any character)

The star character *, also known as the asterisk, matches any number of characters, including none.

If we want to select all data resources that end with the term sales, we use the following glob pattern:

*sales

where:

  • * matches all characters before sales
  • sales matches itself.

Example:

tabul data list *sales@tpcds
path            media_type
-------------   ------------
catalog_sales   sql/relation
store_sales     sql/relation
web_sales       sql/relation
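
The same selection can be sketched outside Tabulify. The snippet below assumes Python's fnmatch module as a stand-in matcher (it is not what Tabulify runs) and uses table names copied from the tpcds listings on this page.

```python
from fnmatch import fnmatchcase

# Table names copied from the tpcds listings on this page.
names = ["catalog_sales", "item", "store", "store_sales", "web_page", "web_sales"]

# "*" matches any run of characters before "sales", including an empty run.
selected = [n for n in names if fnmatchcase(n, "*sales")]

print(selected)                        # ['catalog_sales', 'store_sales', 'web_sales']
print(fnmatchcase("sales", "*sales"))  # True: the star may match nothing
```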

Question mark (One Character)

A question mark, ?, matches exactly one character.

Example:

  • ???? - Matches all data resources whose name has exactly four characters
tabul data list ????@tpcds
path   media_type
----   ------------
item   sql/relation

  • w?*sales - Matches any string beginning with w, followed by at least one character, and ending with sales
tabul data list w?*sales@tpcds
path        media_type
---------   ------------
web_sales   sql/relation
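
As a rough check of both patterns, the sketch below again assumes Python's fnmatch as a stand-in for Tabulify's matcher. The name wsales is hypothetical and only shows that ? needs one character of its own.

```python
from fnmatch import fnmatchcase

# Table names copied from the tpcds listings on this page.
names = ["item", "store", "web_page", "web_sales", "web_site"]

# Four question marks: exactly four characters.
four_chars = [n for n in names if fnmatchcase(n, "????")]

# w, then one mandatory character, then anything, then sales.
w_sales = [n for n in names if fnmatchcase(n, "w?*sales")]

print(four_chars)  # ['item']
print(w_sales)     # ['web_sales']

# Hypothetical name: no character is left for "?" between w and sales.
print(fnmatchcase("wsales", "w?*sales"))  # False
```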

Braces (i.e. OR)

Braces {} specify a collection of subpatterns separated by commas. A name is selected if it matches any one of the subpatterns.

For example:

  • {item,store} matches item or store
tabul data list {item,store}@tpcds
path    media_type
-----   ------------
item    sql/relation
store   sql/relation

  • {web*,store*} matches all data resource names that begin with web or store.
tabul data list {web*,store*}@tpcds
path            media_type
-------------   ------------
store           sql/relation
store_returns   sql/relation
store_sales     sql/relation
web_page        sql/relation
web_returns     sql/relation
web_sales       sql/relation
web_site        sql/relation
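
One way to picture braces is as a shorthand for several plain patterns. The sketch below is an assumption about the semantics, not Tabulify's code: the helper names expand_braces and select are hypothetical, nested braces are not handled, and Python's fnmatch does the per-pattern matching.

```python
from fnmatch import fnmatchcase


def expand_braces(pattern: str) -> list[str]:
    """Expand {a,b} groups into plain patterns (nested braces are not handled)."""
    start = pattern.find("{")
    end = pattern.find("}", start + 1)
    if start == -1 or end == -1:
        return [pattern]
    head, body, tail = pattern[:start], pattern[start + 1:end], pattern[end + 1:]
    expanded = []
    for alternative in body.split(","):
        expanded.extend(expand_braces(head + alternative + tail))
    return expanded


def select(names: list[str], pattern: str) -> list[str]:
    """Keep the names that match at least one expanded alternative."""
    alternatives = expand_braces(pattern)
    return [n for n in names if any(fnmatchcase(n, alt) for alt in alternatives)]


# Table names copied from the tpcds listings on this page.
names = ["inventory", "item", "store", "store_sales", "web_page", "web_site"]

print(select(names, "{item,store}"))   # ['item', 'store']
print(select(names, "{web*,store*}"))  # ['store', 'store_sales', 'web_page', 'web_site']
```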

Square brackets (Set of characters)

Square brackets [] define:

  • a set of single characters
  • or a range of characters when the hyphen character (-) is used

Either form matches exactly one character. Within square brackets, the characters *, ?, and \ match themselves.

Example:

  • [aeiou] matches any lowercase vowel.
  • [0-9] matches any digit.
  • [A-Z] matches any uppercase letter.
  • [a-zA-Z] matches any uppercase or lowercase letter.
  • [!abc] (exclusion) matches any single character except a, b, or c.
  • *[0-9]* - Matches all strings containing at least one digit

Example:

  • all data resources that have a w or a y in their name
tabul data list *[wy]*@tpcds
path                   media_type
--------------------   ------------
inventory              sql/relation
s_inventory            sql/relation
s_warehouse            sql/relation
s_web_order            sql/relation
s_web_order_lineitem   sql/relation
s_web_page             sql/relation
s_web_returns          sql/relation
s_web_site             sql/relation
warehouse              sql/relation
web_page               sql/relation
web_returns            sql/relation
web_sales              sql/relation
web_site               sql/relation
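
The listing for *[wy]* returns the names that contain a w or a y. The sketch below reproduces that behavior on a few of the table names, with a range and an exclusion added for comparison. It assumes Python's fnmatch as a stand-in matcher, which shares the [], range, and [!...] syntax; Tabulify's own engine may differ in details.

```python
from fnmatch import fnmatchcase

# Table names copied from the tpcds listings on this page.
names = ["inventory", "item", "store", "warehouse", "web_site"]

# Set: the name contains a w or a y somewhere.
has_w_or_y = [n for n in names if fnmatchcase(n, "*[wy]*")]

# Range: the first character is between a and j.
starts_a_to_j = [n for n in names if fnmatchcase(n, "[a-j]*")]

# Exclusion: the first character is anything but w.
not_starting_w = [n for n in names if fnmatchcase(n, "[!w]*")]

print(has_w_or_y)      # ['inventory', 'warehouse', 'web_site']
print(starts_a_to_j)   # ['inventory', 'item']
print(not_starting_w)  # ['inventory', 'item', 'store']
```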

You can also search a container, such as a directory or a schema, recursively by adding the double star (double asterisk) ** to your glob pattern.

For instance, the pattern below searches for files whose name starts with RE in the howto connection directory and all of its subdirectories:

tabul data list **/RE*@howto
  • Five README.md files match the pattern:
path                                     media_type
--------------------------------------   ----------
dataset/world/README.md                  text/plain
email/README.md                          text/plain
mysql/coffee-break-sample-db/README.md   text/plain
README.md                                text/plain
recursive/README.md                      text/plain
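
The recursive behavior can be sketched with a small glob-to-regex translator. The semantics here are assumptions inferred from the listing above: **/ is taken to match zero or more directory levels (the top-level README.md is selected), and a single * is assumed to stay inside one path segment. The function glob_to_regex and the path email/notes.txt are hypothetical.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob using *, ? and **/ into a regex (assumed semantics)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more directory levels
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")     # any run of characters within one segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")      # exactly one character within a segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


# Paths copied from the listing above, plus one hypothetical non-matching file.
paths = [
    "dataset/world/README.md",
    "email/README.md",
    "email/notes.txt",
    "README.md",
]

recursive = [p for p in paths if glob_to_regex("**/RE*").fullmatch(p)]
top_level = [p for p in paths if glob_to_regex("RE*").fullmatch(p)]

print(recursive)  # ['dataset/world/README.md', 'email/README.md', 'README.md']
print(top_level)  # ['README.md']
```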

Escape

The escape character is the backslash \. It turns a wildcard into an ordinary character.

Syntax

\wildcard

For example:

  • \\ will match a single backslash \
  • \? matches the question mark ?
  • \* matches the star *

Escaping is rarely needed because most wildcard characters are not allowed in the names of resources such as files or tables.
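
To illustrate the effect of the backslash, here is a minimal sketch of a translator that treats the character after a backslash as a literal. The function glob_to_regex and the sample strings are hypothetical, and only *, ? and \ are handled; it shows the idea and does not reproduce Tabulify's matcher.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob with *, ? and backslash escapes into a regex."""
    parts = []
    i = 0
    while i < len(pattern):
        char = pattern[i]
        if char == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped: taken literally
            i += 2
        elif char == "*":
            parts.append(".*")
            i += 1
        elif char == "?":
            parts.append(".")
            i += 1
        else:
            parts.append(re.escape(char))
            i += 1
    return re.compile("".join(parts), re.DOTALL)


# Unescaped "?" is a wildcard; escaped "\?" is a plain question mark.
wildcard_hit = glob_to_regex("what?").fullmatch("whats") is not None
escaped_hit = glob_to_regex(r"what\?").fullmatch("whats") is not None
escaped_literal = glob_to_regex(r"what\?").fullmatch("what?") is not None

print(wildcard_hit, escaped_hit, escaped_literal)  # True False True
```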



