howto:json:load

About

This howto will show you how to load json document (files) into a relational database.

This example uses the sqlite howto database but you may use any other relational database

Steps

tabul data drop --no-strict-selection user@sqlite

The Json files

To be able to follow along the SQLite examples of the JSON documentation, we will load phones numbers in a user table.

The JSON files are located in the subdirectory json/phone of the howto directory where there is one file by user containing a JSON array of phones.

With the data list command, we can list them.

tabul data list json/phone/*.json@howto

path                     media_type
----------------------   ----------
json/phone/Arnold.json   text/json
json/phone/Bob.json      text/json
json/phone/Dale.json     text/json
json/phone/Daniel.json   text/json
json/phone/Don.json      text/json

With the data print command, we can see for instance the JSON array of phones in the Bob.json file

tabul data print json/phone/Bob.json@howto

json/phone/Bob.json@howto
json
----------------------------------------------
["323-423-2238","747-444-3222","475-202-7142"]

Json relational structure

By default, a json file is seen as a relation with one column called json

The structure of the JSON file can be seen with the data describe command.

For instance, for the Don.json file, there is one column:

at the position 1
called json
with the json data type

tabul data describe json/phone/Don.json@howto

Structure of the resource json/phone/Don.json@howto
position   name   type   precision   scale   primary_key   nullable   autoincrement   comment
--------   ----   ----   ---------   -----   -----------   --------   -------------   -------
1          json   json   0           0                     x

Loading the JSON files

The below data insert command will load the JSON files into the phones table of the sqlite howto database.

tabul data drop --no-strict-selection user@sqlite

tabul data insert json/phone/*.json@howto  user@sqlite

Transfer results
input                          target        latency   record_count   error_code   error_message
----------------------------   -----------   -------   ------------   ----------   -------------
json/phone/Arnold.json@howto   user@sqlite   0.60s                1                
json/phone/Bob.json@howto      user@sqlite   0.8s                 1                
json/phone/Dale.json@howto     user@sqlite   0.9s                 1                
json/phone/Daniel.json@howto   user@sqlite   0.9s                 1                
json/phone/Don.json@howto      user@sqlite   0.8s                 1

You can inspect the content with the head command

tabul data head user@sqlite

The first 10 rows of the data resource (user@sqlite):
json
-----------------------------------------------
["704-555-0150","202-555-0150"]
["323-423-2238","747-444-3222","475-202-7142"]
["731-414-7236","256-233-3557","409-645-3618"]
["425-898-3481","360-637-7972", "205-533-8271"]
["470-293-7517"]

Changing the name of the JSON column

The name of the JSON column is by default json but we can change it by setting the columnName json attribute

After running the below head command, we can see that the column name has changed to phones

tabul data head --attribute columnName=phone json/phone/Don.json@howto

The first 10 rows of the data resource (json/phone/Don.json@howto):
phone
----------------
["470-293-7517"]

Adding a column with the logical name of the JSON file

Because the JSON content of files does not have any kind of identifier, we can see that the data is actually worthless because we can't relate the phone to a user.

To be able to add context to file loaded, Tabulify supports the concept of virtual columns where it's possible to add any attribute information about the file loaded.

The virtual columns are specified with the –virtual-column option that has the following syntax:

--virtual-column virtualColumnName=resourceAttributeName

where:

virtualColumnName is the name of the column added
resourceAttributeName is the name of resource attribute.

For instance, if we want to add:

the logical name attribute
into a column called name,

we would add –virtual-column name=logical_name

With the print command, the full command would look like below.

tabul data print --attribute columnName=phone --virtual-column name=logical_name json/phone/Bob.json@howto

json/phone/Bob.json@howto
phone                                            name
----------------------------------------------   ----
["323-423-2238","747-444-3222","475-202-7142"]   Bob

Reloading the JSON files with the virtual column name

First, we will drop the old structure with the json column

tabul data drop --no-strict-selection user@sqlite

To reload the JSON files:

with the extra column name and the file logical name
with a column phones containing the JSON document

the syntax of the tabul data replace would be:

tabul data replace --source-attribute columnName=phone --virtual-column name=logical_name json/phone/*@howto  user@sqlite

where:

--source-attribute columnName=phones will set the JSON attribute columnName to phone
--virtual-column name=logical_name will add the virtual column names name with the logical name attribute of the source
json/phone/*@howto is the source data selector
phones@sqlite is the target uri

Output:

Transfer results
input                          target        latency   record_count   error_code   error_message
----------------------------   -----------   -------   ------------   ----------   -------------
json/phone/Arnold.json@howto   user@sqlite   0.59s                1                
json/phone/Bob.json@howto      user@sqlite   0.9s                 1                
json/phone/Dale.json@howto     user@sqlite   0.9s                 1                
json/phone/Daniel.json@howto   user@sqlite   0.9s                 1                
json/phone/Don.json@howto      user@sqlite   0.11s                1

The JSON data loaded has now the name column as identifier.

tabul data print user@sqlite

user@sqlite
phone                                             name
-----------------------------------------------   ------
["704-555-0150","202-555-0150"]                   Arnold
["323-423-2238","747-444-3222","475-202-7142"]    Bob
["731-414-7236","256-233-3557","409-645-3618"]    Dale
["425-898-3481","360-637-7972", "205-533-8271"]   Daniel
["470-293-7517"]                                  Don

Performing JSON SQL statement

Inspired from the SQLite examples of the JSON documentation - section 4.3.1, we will now run the below SQL with JSON function.

-- https://www.sqlite.org/json1.html#examples_using_json_each_and_json_tree_
SELECT DISTINCT user.name
FROM user, json_each(user.phone)
WHERE json_each.value LIKE '704-%';

cat

To see the result of this SQL query, we will use the print command and see that Arnold is the user that have a phone number that starts with 704.

tabul data print '(sqlite/user_phone_json.sql@howto)@sqlite'
# The quotes are mandatory because parenthesis have a meaning in Bash (ie they start a subshell)

(sqlite/user_phone_json.sql@howto)@sqlite
name
------
Arnold

If you want to explore the JSON SQL function of SQLite, you can read the Sqlite JSON documentation.
A JSON file containing an JSON object can also be loaded with its first properties representing columns. To do that, you need to set the structure attribute to properties.

Table of Contents

Database HowTo - How to load JSON documents in a database

About

Steps

The Json files

Json relational structure

Loading the JSON files

Changing the name of the JSON column

Adding a column with the logical name of the JSON file

Reloading the JSON files with the virtual column name

Performing JSON SQL statement

Next