This howto will show you how to load and analyze yaml document (files) into a relational database.
This example uses the sqlite howto database but you may use any other relational database
At its core a YAML file is just a JSON file styled in a block style 1).
Tabulify will treat them by default as JSON. When you load a Yaml, you are loading a Json file (ie the yaml style attribute has a value of json)
This block style yaml file:
tabul data print --type text yaml/phonebook/Arnold.yaml@howto
phones:
- "704-555-0150"
- "202-555-0150"
city: Los Angeles
is equivalent to this json file
tabul data print \
--pipe-mode \
--attribute style=json \
yaml/phonebook/Arnold.yaml@howto | jq
{
"phones": [
"704-555-0150",
"202-555-0150"
],
"city": "Los Angeles"
}
Note that:
By default, a Yaml file is seen as a relation with one column called json
The structure of the YAML file can be seen with the data struct command.
For instance, for the Don.yaml file, there is one column:
tabul data describe yaml/phonebook/Don.yaml@howto
Structure of the resource yaml/phonebook/Don.yaml@howto
position name type precision scale primary_key nullable autoincrement comment
-------- ---- ---- --------- ----- ----------- -------- ------------- -------
1 yaml json 0 0 x
To be able to follow along the SQLite examples of the JSON documentation, we will load the yaml data (ie city and phones numbers in a user table.
The Yaml files are located in the subdirectory yaml/phonebook of the howto directory where there is one file by user.
With the data list command, we can list them.
tabul data list yaml/phonebook/*.yaml@howto
path media_type
-------------------------- ----------
yaml/phonebook/Arnold.yaml text/yaml
yaml/phonebook/Bob.yaml text/yaml
yaml/phonebook/Dale.yaml text/yaml
yaml/phonebook/Daniel.yaml text/yaml
yaml/phonebook/Don.yaml text/yaml
The below data insert command will load the YAML files into the user table of the sqlite howto database.
tabul data insert yaml/phonebook/*.yaml@howto user@sqlite
Transfer results
input target latency record_count error_code error_message
-------------------------------- ----------- ------- ------------ ---------- -------------
yaml/phonebook/Arnold.yaml@howto user@sqlite 0.53s 1
yaml/phonebook/Bob.yaml@howto user@sqlite 0.8s 1
yaml/phonebook/Dale.yaml@howto user@sqlite 0.7s 1
yaml/phonebook/Daniel.yaml@howto user@sqlite 0.9s 1
yaml/phonebook/Don.yaml@howto user@sqlite 0.8s 1
You can inspect the content with the head command
tabul data head user@sqlite
The first 10 rows of the data resource (user@sqlite):
yaml
--------------------------------------------------------------------------------------------------------------------
{\n "phones": [\n "704-555-0150",\n "202-555-0150"\n ]\n ,\n "city": "Los Angeles"\n}\n
{\n "phones": [\n "323-423-2238",\n "747-444-3222",\n "475-202-7142"\n ]\n ,\n "city": "New York"\n}\n
{\n "phones": [\n "731-414-7236",\n "256-233-3557",\n "409-645-3618"\n ]\n ,\n "city": "New York"\n}\n
{\n "phones": [\n "425-898-3481",\n "360-637-7972",\n "205-533-8271"\n ]\n ,\n "city": "Princeton"\n}\n
{\n "phones": [\n "470-293-7517"\n ]\n ,\n "city": "San Diego"\n}\n
The name of the YAML column is by default yaml but we can change it by setting the columnName json attribute
After running the below head command, we can see that the column name has changed to data
tabul data head --attribute columnName=data yaml/phonebook/Don.yaml@howto
The first 10 rows of the data resource (yaml/phonebook/Don.yaml@howto):
data
--------------------------------------------------------------------------
{\n "phones": [\n "470-293-7517"\n ]\n ,\n "city": "San Diego"\n}\n
Because the YAML content of files does not have any kind of identifier, we can see that the data is actually worthless because we can't relate the data to a user.
To be able to add context to file loaded, Tabulify supports the concept of virtual columns where it's possible to add any attribute information about the file loaded.
The virtual columns are specified with the –virtual-column option that has the following syntax:
--virtual-column virtualColumnName=resourceAttributeName
where:
For instance, if we want to add:
we would add –virtual-column name=logical_name
With the print command, the full command would look like below.
tabul data print \
--attribute columnName=data \
--virtual-column name=logical_name \
yaml/phonebook/Bob.yaml@howto
yaml/phonebook/Bob.yaml@howto
data name
------------------------------------------------------------------------------------------------------------------- ----
{\n "phones": [\n "323-423-2238",\n "747-444-3222",\n "475-202-7142"\n ]\n ,\n "city": "New York"\n}\n Bob
To reload the YAML files:
the syntax of the tabul data replace would be:
tabul data replace \
--source-attribute columnName=data \
--virtual-column name=logical_name \
yaml/phonebook/*@howto user@sqlite
where:
Output:
Transfer results
input target latency record_count error_code error_message
-------------------------------- ----------- ------- ------------ ---------- -------------
yaml/phonebook/Arnold.yaml@howto user@sqlite 0.64s 1
yaml/phonebook/Bob.yaml@howto user@sqlite 0.10s 1
yaml/phonebook/Dale.yaml@howto user@sqlite 0.9s 1
yaml/phonebook/Daniel.yaml@howto user@sqlite 0.7s 1
yaml/phonebook/Don.yaml@howto user@sqlite 0.8s 1
The YAML data loaded has now the name column as identifier.
tabul data print user@sqlite
user@sqlite
data name
-------------------------------------------------------------------------------------------------------------------- ------
{\n "phones": [\n "704-555-0150",\n "202-555-0150"\n ]\n ,\n "city": "Los Angeles"\n}\n Arnold
{\n "phones": [\n "323-423-2238",\n "747-444-3222",\n "475-202-7142"\n ]\n ,\n "city": "New York"\n}\n Bob
{\n "phones": [\n "731-414-7236",\n "256-233-3557",\n "409-645-3618"\n ]\n ,\n "city": "New York"\n}\n Dale
{\n "phones": [\n "425-898-3481",\n "360-637-7972",\n "205-533-8271"\n ]\n ,\n "city": "Princeton"\n}\n Daniel
{\n "phones": [\n "470-293-7517"\n ]\n ,\n "city": "San Diego"\n}\n Don
Inspired from the SQLite examples of the JSON documentation - section 4.3.1, we will now run the below SQL with JSON function.
-- https://www.sqlite.org/json1.html#examples_using_json_each_and_json_tree_
SELECT distinct name
FROM user, json_tree(user.data, '$.phones')
WHERE value LIKE '704-%';
To see the result of this SQL query, we will use the print command and see that Arnold is the user that have a phone number that starts with 704.
tabul data print '(sqlite/user_phone_json_from_yaml.sql@howto)@sqlite'
# The quotes are only mandatory in bash because parenthesis are a bash token (ie subshell)
(sqlite/user_phone_json_from_yaml.sql@howto)@sqlite
name
------
Arnold
If you want to explore the JSON SQL function of SQLite, you can read the Sqlite JSON documentation.