---json { "page_id": "7fznn5ince8c9ugjt3k10" } --- ====== How to load and analyze YAML documents in a database? ====== ===== About ===== This [[howto:howto|howto]] will show you how to load and analyze [[docs:resource:yaml|yaml document (files)]] into a [[docs:system:database:database|relational database]]. This example uses the [[howto:sqlite:howto_connection|sqlite howto database]] but you may use any other [[docs:system:database:database|relational database]] ===== Steps ===== tabul data drop --no-strict-selection users@sqlite ==== Understanding YAML file ==== At its core a YAML file is just a [[:docs:resource:json|JSON file]] styled in a [[:docs:resource:yaml#attributes|block style]] (([[https://yaml.org/spec/1.2.2/#chapter-6-structural-productions|Yaml Block Style]])). Tabulify will treat them by default as [[:docs:resource:json|JSON]]. When you load a Yaml, you are loading a Json file (ie the [[:docs:resource:yaml#attributes|yaml style attribute]] has a value of ''json'') This block style yaml file: tabul data print --type text yaml/phonebook/Arnold.yaml@howto phones: - "704-555-0150" - "202-555-0150" city: Los Angeles is equivalent to this json file tabul data print \ --pipe-mode \ --attribute style=json \ yaml/phonebook/Arnold.yaml@howto | jq { "phones": [ "704-555-0150", "202-555-0150" ], "city": "Los Angeles" } Note that: * `style=json` is optional as this is the default * `--pipe-mode` is used to pipe a raw json into [[https://jqlang.org|jq]] (`pipe-mode` deletes the headers and will not print control characters) ==== Yaml relational structure ==== By default, a [[docs:resource:yaml|Yaml file]] is seen as a [[docs:resource:tabular|relation]] with one column called ''json'' The [[docs:resource:structure|structure]] of the YAML file can be seen with the [[docs:tabul:data:describe|data struct command]]. For instance, for the ''Don.yaml'' file, there is one column: * at the ''position 1'' * called ''yaml'' (the value of the [[:docs:resource:yaml#attributes|column name attribute]]) * with the ''json'' [[docs:data_type:data_type|data type]] tabul data describe yaml/phonebook/Don.yaml@howto Structure of the resource yaml/phonebook/Don.yaml@howto position name type precision scale primary_key nullable autoincrement comment -------- ---- ---- --------- ----- ----------- -------- ------------- ------- 1 yaml json 0 0 x ==== The Yaml files ==== To be able to follow along the [[https://www.sqlite.org/json1.html|SQLite examples of the JSON documentation]], we will load the ''yaml data (ie city and phones numbers'' in a ''user'' table. The [[docs:resource:yaml|Yaml files]] are located in the subdirectory ''yaml/phonebook'' of the [[docs:connection:howto|howto directory]] where there is one file by ''user''. With the [[docs:tabul:data:list|data list command]], we can list them. tabul data list yaml/phonebook/*.yaml@howto path media_type -------------------------- ---------- yaml/phonebook/Arnold.yaml text/yaml yaml/phonebook/Bob.yaml text/yaml yaml/phonebook/Dale.yaml text/yaml yaml/phonebook/Daniel.yaml text/yaml yaml/phonebook/Don.yaml text/yaml ==== Loading the YAML files ==== The below [[docs:tabul:data:insert|data insert command]] will load the YAML files into the ''user'' [[docs:resource:sql_table|table]] of the [[howto:sqlite:howto_connection|sqlite howto]] database. tabul data drop --no-strict-selection user@sqlite tabul data insert yaml/phonebook/*.yaml@howto user@sqlite Transfer results input target latency record_count error_code error_message -------------------------------- ----------- ------- ------------ ---------- ------------- yaml/phonebook/Arnold.yaml@howto user@sqlite 0.53s 1 yaml/phonebook/Bob.yaml@howto user@sqlite 0.8s 1 yaml/phonebook/Dale.yaml@howto user@sqlite 0.7s 1 yaml/phonebook/Daniel.yaml@howto user@sqlite 0.9s 1 yaml/phonebook/Don.yaml@howto user@sqlite 0.8s 1 You can inspect the content with the [[docs:tabul:data:head|head command]] tabul data head user@sqlite The first 10 rows of the data resource (user@sqlite): yaml -------------------------------------------------------------------------------------------------------------------- {\n "phones": [\n "704-555-0150",\n "202-555-0150"\n ]\n ,\n "city": "Los Angeles"\n}\n {\n "phones": [\n "323-423-2238",\n "747-444-3222",\n "475-202-7142"\n ]\n ,\n "city": "New York"\n}\n {\n "phones": [\n "731-414-7236",\n "256-233-3557",\n "409-645-3618"\n ]\n ,\n "city": "New York"\n}\n {\n "phones": [\n "425-898-3481",\n "360-637-7972",\n "205-533-8271"\n ]\n ,\n "city": "Princeton"\n}\n {\n "phones": [\n "470-293-7517"\n ]\n ,\n "city": "San Diego"\n}\n ==== Changing the name of the YAML column ==== The name of the YAML column is by default ''yaml'' but we can change it by setting the [[docs:resource:yaml#attributes|columnName json attribute]] After running the below [[docs:tabul:data:head|head command]], we can see that the column name has changed to ''data'' tabul data head --attribute columnName=data yaml/phonebook/Don.yaml@howto The first 10 rows of the data resource (yaml/phonebook/Don.yaml@howto): data -------------------------------------------------------------------------- {\n "phones": [\n "470-293-7517"\n ]\n ,\n "city": "San Diego"\n}\n ==== Adding a column with the logical name of the JSON file ==== Because the ''YAML'' content of files does not have any kind of identifier, we can see that the data is actually worthless because we can't relate the ''data'' to a ''user''. To be able to add context to file loaded, ''Tabulify'' supports the concept of [[docs:resource:virtual_column|virtual columns]] where it's possible to add [[docs:resource:attribute|any attribute information about the file loaded]]. The [[docs:resource:virtual_column|virtual columns]] are specified with the ''--virtual-column'' [[docs:tabul:option|option]] that has the following syntax: --virtual-column virtualColumnName=resourceAttributeName where: * ''virtualColumnName'' is the name of the column added * ''resourceAttributeName'' is the name of [[docs:resource:attribute|resource attribute]]. For instance, if we want to add: * the [[docs:resource:logical_name|logical name attribute]] * into a column called ''name'', we would add ''--virtual-column name=logical_name'' With the [[docs:tabul:data:print|print command]], the full command would look like below. tabul data print \ --attribute columnName=data \ --virtual-column name=logical_name \ yaml/phonebook/Bob.yaml@howto yaml/phonebook/Bob.yaml@howto data name ------------------------------------------------------------------------------------------------------------------- ---- {\n "phones": [\n "323-423-2238",\n "747-444-3222",\n "475-202-7142"\n ]\n ,\n "city": "New York"\n}\n Bob ==== Reloading the YAML files with the virtual column name ==== To reload the YAML files: * with the extra column ''name'' and the [[docs:resource:logical_name|file logical name]] * with a column ''data'' containing the ''YAML document'' the syntax of the [[docs:tabul:data:replace|tabul data replace]] would be: tabul data replace \ --source-attribute columnName=data \ --virtual-column name=logical_name \ yaml/phonebook/*@howto user@sqlite where: * ''replace'' will ''drop'' the target ''user@sqlite'' * ''%%--source-attribute columnName=data%%'' will set the [[docs:resource:json#attribute|JSON attribute columnName]] to ''data'' * ''%%--virtual-column name=logical_name%%'' will add the [[docs:resource:virtual_column|virtual column]] names ''name'' with the [[docs:resource:logical_name|logical name attribute]] of the source * ''yaml/phonebook/*@howto'' is the [[docs:flow:data_selector|source data selector]] * ''user@sqlite'' is the [[docs:flow:template_data_uri|target uri]] Output: Transfer results input target latency record_count error_code error_message -------------------------------- ----------- ------- ------------ ---------- ------------- yaml/phonebook/Arnold.yaml@howto user@sqlite 0.64s 1 yaml/phonebook/Bob.yaml@howto user@sqlite 0.10s 1 yaml/phonebook/Dale.yaml@howto user@sqlite 0.9s 1 yaml/phonebook/Daniel.yaml@howto user@sqlite 0.7s 1 yaml/phonebook/Don.yaml@howto user@sqlite 0.8s 1 The YAML data loaded has now the ''name'' column as identifier. tabul data print user@sqlite user@sqlite data name -------------------------------------------------------------------------------------------------------------------- ------ {\n "phones": [\n "704-555-0150",\n "202-555-0150"\n ]\n ,\n "city": "Los Angeles"\n}\n Arnold {\n "phones": [\n "323-423-2238",\n "747-444-3222",\n "475-202-7142"\n ]\n ,\n "city": "New York"\n}\n Bob {\n "phones": [\n "731-414-7236",\n "256-233-3557",\n "409-645-3618"\n ]\n ,\n "city": "New York"\n}\n Dale {\n "phones": [\n "425-898-3481",\n "360-637-7972",\n "205-533-8271"\n ]\n ,\n "city": "Princeton"\n}\n Daniel {\n "phones": [\n "470-293-7517"\n ]\n ,\n "city": "San Diego"\n}\n Don ==== Performing JSON SQL statement ==== Inspired from the [[https://www.sqlite.org/json1.html#examples_using_json_each_and_json_tree_|SQLite examples of the JSON documentation - section 4.3.1]], we will now run the below SQL with JSON function. -- https://www.sqlite.org/json1.html#examples_using_json_each_and_json_tree_ SELECT distinct name FROM user, json_tree(user.data, '$.phones') WHERE value LIKE '704-%'; To see the result of this [[docs:resource:sql_select|SQL query]], we will use the [[docs:tabul:data:print|print command]] and see that ''Arnold'' is the user that have a phone number that starts with ''704''. tabul data print '(sqlite/user_phone_json_from_yaml.sql@howto)@sqlite' # The quotes are only mandatory in bash because parenthesis are a bash token (ie subshell) (sqlite/user_phone_json_from_yaml.sql@howto)@sqlite name ------ Arnold ==== Next ==== If you want to explore the JSON SQL function of SQLite, you can read the [[https://www.sqlite.org/json1.html|Sqlite JSON documentation]].