This Howto will show you how to load Xml file and document into a database.
Normally our database howto are created with Sqlite but because Sqlite does not support any SQL XML function, nor native XML data type, we have created this howto with the Postgres database.
The next steps will be using the postgres howto connection.
tabul service start postgres
1 service was started
Name Type
-------- ------
postgres docker
tabul connection ping postgres
The connection (postgres) has been pinged successfully
We will load the XML file relation.xml
<?xml version="1.0"?>
<api batchcomplete="">
<query>
<normalized>
<n from="Relation_(database)" to="Relation (database)"/>
</normalized>
<pages>
<page _idx="23720058" pageid="23720058" ns="0" title="Relation (database)" description="The base data structure of relational databases">
<categories>
<cl ns="14" title="Category:Articles with example SQL code"/>
<cl ns="14" title="Category:Database management systems"/>
<cl ns="14" title="Category:Relational model"/>
</categories>
</page>
</pages>
</query>
</api>
An XML resource is seen by Tabulify by default as a tabular resource that have:
tabul data describe xml/relation.xml@howto
Structure of the resource xml/relation.xml@howto
position name type precision scale primary_key nullable autoincrement comment
-------- ---- ---- --------- ----- ----------- -------- ------------- -------
1 xml xml 0 0 x
You can change the name of the xml column with the columnName attribute
For instance, to name the column data, you would set the --attribute option with the value column_name=data
tabul data describe --attribute column_name=data xml/relation.xml@howto
Structure of the resource xml/relation.xml@howto
position name type precision scale primary_key nullable autoincrement comment
-------- ---- ---- --------- ----- ----------- -------- ------------- -------
1 data xml 0 0 x
You can add extra information about the XML resource loaded via the virtual column.
For instance, if you want to add the logical name.
tabul data describe --virtual-column filename=logical_name xml/relation.xml@howto
Structure of the resource xml/relation.xml@howto
position name type precision scale primary_key nullable autoincrement comment
-------- -------- ----------------- --------- ----- ----------- -------- ------------- -------
1 xml xml 0 0 x
2 filename character varying 0 0 x
To load an XML file, you can use any transfer command.
We will be using the insert command that will:
tabul data insert --virtual-column name=logicalName xml/relation.xml@howto wikipedia@postgres
Transfer results
input target latency record_count error_code error_message
---------------------- ------------------ ------- ------------ ---------- -------------
xml/relation.xml@howto wikipedia@postgres 0.72s 1
We can see that there is one row loaded in the wikipedia table with the list command
tabul data list -a count wikipedia@postgres
path media_type count
--------- ---------- -----
wikipedia table 1
Tabulify supports also to load XML data (even JSON) from a URL.
Let's add the wikipedia XML data for the page SQL wikipedia page
This command line will work only with a bash based terminal. More … see Why Bash
tabul data insert "https://en.wikipedia.org/w/api.php?action=query&titles=SQL&format=xml&prop=description|categories" wikipedia@postgres
Transfer results
input target latency record_count error_code error_message
-------------------------------- ------------------ ------- ------------ ---------- -------------
[email protected]\w wikipedia@postgres 0.342s 1
And we have now 2 rows
tabul data list -a count wikipedia@postgres
path media_type count
--------- ---------- -----
wikipedia table 2
Now that our XML have been loaded, we can analyze them with SQL XML function. Postgres documents all this XML function on this page.
For instance, to extract the title and description, we can use the xpath function
SELECT
CAST((xpath('/api/query/pages/page/@title', xml))[1] AS text) AS title,
CAST((xpath('/api/query/pages/page/@description', xml))[1] AS text) AS description
FROM wikipedia;
Let's run this Query SQL select
tabul data print '(postgres/xml_wikipedia.sql@howto)@postgres'
# The quotes are mandatory because parenthesis have a meaning in Bash (ie they start a subshell)
(postgres/xml_wikipedia.sql@howto)@postgres
title description
------------------- -----------------------------------------------
Relation (database) The base data structure of relational databases
SQL Relational database programming language
You can explore further, the XML Postgres functions
select
xml, -- Because XML is an XML object, you will normally not see the content in a database IDE
cast(xml as text) -- By casting it as text, you will see the content
xml is document, -- You can also check that this is a XML document
xml_is_well_formed_document(cast(xml as text)), -- And that it's well formed
from
wikipedia;
or you can also try to load JSON document.