---json { "description": "This database howto will show you how to load Xml file and document into a database.", "low_quality_page": "false", "page_id": "k9z63dibggq5kin4ubajf" } --- ====== HowTo - How to load XML files into a Postgres with Tabulify ====== ===== Howto ===== This [[postgres|postgres howto]] will show you how to load [[docs:resource:xml|Xml file and document]] into [[postgres|postgres]]. ===== Steps ===== tabul data drop --no-strict-selection wikipedia@postgres ==== Start Postgres ==== The next steps will be using the [[howto:postgres:howto_connection_service|postgres howto connection]]. * You need to [[howto:postgres:howto_connection_service#docker|start the postgres docker image]]. tabul service start postgres 1 service was started Name Type -------- ------ postgres docker * [[docs:tabul:connection:ping|Ping]] your connection to check that the database is up. tabul connection ping postgres The connection (postgres) has been pinged successfully ==== The XML file to load ==== We will load the [[docs:resource:xml|XML file]] ''relation.xml'' * located in the ''XML'' subdirectory of the [[docs:connection:howto|howto directory connection]] * that contains information over the [[wp>Relation_(database)|relation wikipedia page]] * that have been created by the [[https://www.mediawiki.org/w/api.php|mediawiki API]] ==== The structure of an XML file ==== === Default === An [[docs:resource:xml|XML resource]] is seen by ''Tabulify'' by default as a [[docs:resource:tabular|tabular resource]] that have: * one column with the name ''xml'' and the ''xml'' [[docs:data_type:data_type|data type]] * one row. tabul data describe xml/relation.xml@howto Structure of the resource xml/relation.xml@howto position name type precision scale primary_key nullable autoincrement comment -------- ---- ---- --------- ----- ----------- -------- ------------- ------- 1 xml xml 0 0 x === Changing the XML column name === You can change the name of the ''xml'' column with the [[docs:resource:xml#attributes|columnName attribute]] For instance, to name the column ''data'', you would set the ''%%--attribute%%'' [[docs:tabul:option|option]] with the value ''column_name=data'' tabul data describe --attribute column_name=data xml/relation.xml@howto Structure of the resource xml/relation.xml@howto position name type precision scale primary_key nullable autoincrement comment -------- ---- ---- --------- ----- ----------- -------- ------------- ------- 1 data xml 0 0 x === Adding context with virtual columns === You can add extra information about the [[docs:resource:xml|XML resource]] loaded via the [[docs:resource:virtual_column|virtual column]]. For instance, if you want to add the [[docs:resource:logical_name|logical name]]. tabul data describe --virtual-column filename=logical_name xml/relation.xml@howto Structure of the resource xml/relation.xml@howto position name type precision scale primary_key nullable autoincrement comment -------- -------- ----------------- --------- ----- ----------- -------- ------------- ------- 1 xml xml 0 0 x 2 filename character varying 0 0 x ==== Loading the XML file into the Postgres database ==== To load an XML file, you can use any [[docs:tabul:data:transfer|transfer command]]. We will be using the [[docs:tabul:data:insert|insert command]] that will: * insert the [[docs:resource:xml|XML file]] * in a ''wikipedia'' target [[docs:resource:sql_table|table]] tabul data insert \ --virtual-column name=logicalName \ xml/relation.xml@howto wikipedia@postgres Transfer results input target latency record_count error_code error_message ---------------------- ------------------ ------- ------------ ---------- ------------- xml/relation.xml@howto wikipedia@postgres 0.70s 1 We can see that there is one row loaded in the ''wikipedia'' table with the [[docs:tabul:data:list|list command]] tabul data list -a count wikipedia@postgres path media_type count --------- ---------- ----- wikipedia table 1 ==== Loading from a Web API ==== ''Tabulify'' supports also to load XML data (even [[docs:resource:json|JSON]]) from a [[docs:resource:uri|URL]]. Let's add the wikipedia XML data for the page [[wp>SQL|SQL wikipedia page]] This command line will work only with a [[docs:common:bash|bash based terminal]]. More ... see [[docs:common:bash#why_do_we_recommend_bash|Why Bash]] tabul data insert "https://en.wikipedia.org/w/api.php?action=query&titles=SQL&format=xml&prop=description|categories" wikipedia@postgres Transfer results input target latency record_count error_code error_message -------------------------------- ------------------ ------- ------------ ---------- ------------- api.php@https-en.wikipedia.org\w wikipedia@postgres 0.603s 1 And we have now 2 rows tabul data list -a count wikipedia@postgres path media_type count --------- ---------- ----- wikipedia table 2 ==== Analyzing the XML with the SQL XML functions ==== Now that our XML have been loaded, we can analyze them with SQL XML function. [[docs:system:postgres:postgres|Postgres]] documents all this XML function on this [[https://www.postgresql.org/docs/9.1/functions-xml.html|page]]. For instance, to extract the title and description, we can use the [[https://www.postgresql.org/docs/9.1/functions-xml.html#FUNCTIONS-XML-PROCESSING|xpath function]] SELECT CAST((xpath('/api/query/pages/page/@title', xml))[1] AS text) AS title, CAST((xpath('/api/query/pages/page/@description', xml))[1] AS text) AS description FROM wikipedia; Let's run this [[docs:resource:sql_select|Query SQL select]] tabul data print '(postgres/xml_wikipedia.sql@howto)@postgres' # The quotes are mandatory because parenthesis have a meaning in Bash (ie they start a subshell) (postgres/xml_wikipedia.sql@howto)@postgres title description ------------------- ----------------------------------------------- Relation (database) The base data structure of relational databases SQL Relational database programming language ==== Next ==== You can explore further, the [[https://www.postgresql.org/docs/9.1/functions-xml.html|XML Postgres functions]] select xml, -- Because XML is an XML object, you will normally not see the content in a database IDE cast(xml as text) -- By casting it as text, you will see the content xml is document, -- You can also check that this is a XML document xml_is_well_formed_document(cast(xml as text)), -- And that it's well formed from wikipedia; or you can also try to [[howto:json:load_json|load JSON document]].