howto:xml:load

Howto

This Howto will show you how to load Xml file and document into a database.

Steps

tabul data drop --no-strict-selection wikipedia@postgres

Start Postgres

Normally our database howto are created with Sqlite but because Sqlite does not support any SQL XML function, nor native XML data type, we have created this howto with the Postgres database.

The next steps will be using the postgres howto connection.

You need to start the postgres docker image.

tabul service start postgres

1 service was started
Name       Type
--------   ------
postgres   docker

Ping your connection to check that the database is up.

tabul connection ping postgres

The connection (postgres) has been pinged successfully

The XML file to load

We will load the XML file relation.xml

located in the XML subdirectory of the howto directory connection
that contains information over the relation wikipedia page
that have been created by the mediawiki API

<?xml version="1.0"?>
<api batchcomplete="">
  <query>
    <normalized>
      <n from="Relation_(database)" to="Relation (database)"/>
    </normalized>
    <pages>
      <page _idx="23720058" pageid="23720058" ns="0" title="Relation (database)" description="The base data structure of relational databases">
        <categories>
          <cl ns="14" title="Category:Articles with example SQL code"/>
          <cl ns="14" title="Category:Database management systems"/>
          <cl ns="14" title="Category:Relational model"/>
        </categories>
      </page>
    </pages>
  </query>
</api>

The structure of an XML file

Default

An XML resource is seen by Tabulify by default as a tabular resource that have:

one column with the name xml and the xml data type
one row.

tabul data describe xml/relation.xml@howto

Structure of the resource xml/relation.xml@howto
position   name   type   precision   scale   primary_key   nullable   autoincrement   comment
--------   ----   ----   ---------   -----   -----------   --------   -------------   -------
1          xml    xml    0           0                     x

Changing the XML column name

You can change the name of the xml column with the columnName attribute

For instance, to name the column data, you would set the --attribute option with the value column_name=data

tabul data describe --attribute column_name=data xml/relation.xml@howto

Structure of the resource xml/relation.xml@howto
position   name   type   precision   scale   primary_key   nullable   autoincrement   comment
--------   ----   ----   ---------   -----   -----------   --------   -------------   -------
1          data   xml    0           0                     x

Adding context with virtual columns

You can add extra information about the XML resource loaded via the virtual column.

For instance, if you want to add the logical name.

tabul data describe --virtual-column filename=logical_name xml/relation.xml@howto

Structure of the resource xml/relation.xml@howto
position   name       type                precision   scale   primary_key   nullable   autoincrement   comment
--------   --------   -----------------   ---------   -----   -----------   --------   -------------   -------
1          xml        xml                 0           0                     x
2          filename   character varying   0           0                     x

Loading the XML file into the Postgres database

To load an XML file, you can use any transfer command.

We will be using the insert command that will:

insert the XML file
in a wikipedia target table

tabul data insert --virtual-column name=logicalName xml/relation.xml@howto wikipedia@postgres

Transfer results
input                    target               latency   record_count   error_code   error_message
----------------------   ------------------   -------   ------------   ----------   -------------
xml/relation.xml@howto   wikipedia@postgres   0.72s                1

We can see that there is one row loaded in the wikipedia table with the list command

tabul data list -a count wikipedia@postgres

path        media_type   count
---------   ----------   -----
wikipedia   table            1

Loading from a Web API

Tabulify supports also to load XML data (even JSON) from a URL.

Let's add the wikipedia XML data for the page SQL wikipedia page

This command line will work only with a bash based terminal. More … see Why Bash

tabul data insert "https://en.wikipedia.org/w/api.php?action=query&titles=SQL&format=xml&prop=description|categories" wikipedia@postgres

Transfer results
input                              target               latency   record_count   error_code   error_message
--------------------------------   ------------------   -------   ------------   ----------   -------------
[email protected]\w   wikipedia@postgres   0.342s               1

And we have now 2 rows

tabul data list -a count wikipedia@postgres

path        media_type   count
---------   ----------   -----
wikipedia   table            2

Analyzing the XML with the SQL XML functions

Now that our XML have been loaded, we can analyze them with SQL XML function. Postgres documents all this XML function on this page.

For instance, to extract the title and description, we can use the xpath function

SELECT
  CAST((xpath('/api/query/pages/page/@title', xml))[1] AS text) AS title,
  CAST((xpath('/api/query/pages/page/@description', xml))[1] AS text) AS description
FROM wikipedia;

Let's run this Query SQL select

tabul data print '(postgres/xml_wikipedia.sql@howto)@postgres'
# The quotes are mandatory because parenthesis have a meaning in Bash (ie they start a subshell)

(postgres/xml_wikipedia.sql@howto)@postgres
title                 description
-------------------   -----------------------------------------------
Relation (database)   The base data structure of relational databases
SQL                   Relational database programming language

You can explore further, the XML Postgres functions

select
    xml, -- Because XML is an XML object, you will normally not see the content in a database IDE
    cast(xml as text) -- By casting it as text, you will see the content
    xml is document,  -- You can also check that this is a XML document
    xml_is_well_formed_document(cast(xml as text)), -- And that it's well formed
from
    wikipedia;

or you can also try to load JSON document.

Table of Contents

Howto

Steps

Start Postgres

The XML file to load

The structure of an XML file

Default

Changing the XML column name

Adding context with virtual columns

Loading the XML file into the Postgres database

Loading from a Web API

Analyzing the XML with the SQL XML functions

Next