• About Dangerous DBA
  • Table of Contents
Dangerous DBA A blog for those DBA's who live on the edge

Category Archives: Admin_est_inline_length

Shredding XML in DB2 – XML Document shredding

August 9, 2012 11:00 am / Leave a Comment / dangerousDBA

XML is most useful, especially when it is well formatted and you know what it contains and all the documents stored in your XML column are the same. When they are not and you can’t exploit fully DB2’s very good XML features, and you don’t know how to query your XML it can be very hard.

The problem

Millions of records of business information with data in them, no one really knows what’s in there, and no one knows what we might be missing out on. There is a solution in place at the moment that uses a Java application to read the record from DB2 parse it as a string, this is because the element names that you could use XQuery against are all in different cases and appear in different locations in the XML therefore writing an XQuery against it is near impossible without missing something, which the Java App does anyway.

The solution – XML Shredder

I can’t and don’t take credit fully for my solution as it is largely based on this article How to list the paths of all elements in an XML document? – Dr.Matthias Nicola. I have adapted it to list both the element name, path and value. First of all we will look at a simple version, then a more advanced probably more useful version with a built in function that I found very useful so that you don’t parse a document bigger than you buffer pool attached to the temp tablespaces where the query is run.

XML Shredder – Basic Code

WITH pathstable (name, node, xpath, value) AS (
  SELECT x.name AS name, 
         x.node AS xmlnode,
         '/' || x.name AS xpath,
         x.value as value
  FROM {SCHEMA}.{TABLE} {ALIAS} ,
       XMLTABLE('${XML COLUMN YOU WISH TO SHRED}/*'
        COLUMNS
          name varchar(300) PATH './local-name()',
          value varchar(12000) PATH 'xs:string(.)',
          node    XML PATH '.') AS x
  WHERE {ALIAS}.{FIELD} etc etc
  UNION ALL
  SELECT y.name AS name, 
         y.node AS xmlnode, 
         xpath|| '/' || y.name AS xpath,
         y.value as value
  FROM pathstable,
       XMLTABLE('$XMLNODE/(*,@*)' PASSING pathstable.node AS "XMLNODE"
        COLUMNS
         name varchar(300) PATH 'local-name()',
         value varchar(12000) PATH 'xs:string(.)',
         node    XML PATH '.') AS y
) SELECT name, xpath, value
  FROM pathstable

So that's the basic version of the SQL and you will probably supply the WHERE clause something to identify the row that you wish to shred. The section "'${XML COLUMN YOU WISH TO SHRED}/*'" you need to keep the '$ and /*' to wrap your XML column. It will get you a result set something like:

name          xpath                        value
-------------------------------------------------------
email         /personal/email              a@b.com
first_name    /personal/name/first_name    Philip

GOTCHA's

  1. The VARCHAR (in this case 12000) size attached to the value column needs to be long enough to all the values strung together, but if you are going to insert it anywhere this long return value can be taken out in the WHERE clause as it is useless any case and therefore the column that you insert into can be far shorter.
  2. Make sure that you can inline you XML otherwise this code will not work. This can be done using ADMIN_EST_INLINE_LENGTH({XML FIELD}) result is -1 as explained below.

XML Shredder - "Advanced" Code

So the original "Basic" code gave you no identifier to the XML, whatever yours may be so by using something like the below you can create an extended "triple-store" a quad store. I do this as the same element name can exist down different paths and can mean slightly different things depending on where it exists, plus the identifier for the row then it can make it unique and useful.


WITH pathstable ({ADDITIONAL FIELD(S)}, name, node, xpath, value) AS (
SELECT {ADDITIONAL FIELD(S)},
     x.name AS name, 
     x.node AS xmlnode,
     '/' || x.name AS xpath,
     x.value as value
FROM {SCHEMA}.{TABLE} {ALIAS} {JOIN TYPE} JOIN {SCHEMA}.{TABLE} {ALIAS} ON {ALIAS ... etc, etc},
       XMLTABLE('${XML COLUMN YOU WISH TO SHRED}/*'
    COLUMNS
      name varchar(300) PATH './local-name()',
      value varchar(12000) PATH 'xs:string(.)',
      node    XML PATH '.') AS x
WHERE ADMIN_EST_INLINE_LENGTH(XML FIELD) <> -1
    AND {ANY OTHER WHERE CLAUSES THAT AFFECT THE RESULT SET}
UNION ALL
SELECT {ADDITIONAL FIELD(S)},
     y.name AS name, 
     y.node AS xmlnode, 
     xpath|| '/' || y.name AS xpath,
     y.value as value
FROM pathstable,
   XMLTABLE('$XMLNODE/(*,@*)' PASSING pathstable.node AS "XMLNODE"
    COLUMNS
     name varchar(300) PATH 'local-name()',
     value varchar(12000) PATH 'xs:string(.)',
     node    XML PATH '.') AS y
) SELECT {ADDITIONAL FIELD(S)}, name, xpath, value
FROM pathstable
WHERE NAME {SOME PREDICATE} {SOME VALUE}
          OR
      VALUE {SOME PREDICATE} {SOME VALUE}
          OR
      {ADDITIONAL FIELD(S)} {SOME PREDICATE} {SOME VALUE}

In addition to the all the new code there to identify the record so you could store it in a table to use later you will notice the line ADMIN_EST_INLINE_LENGTH({XML FIELD}) <> -1. This inline function, full information from IBM DB2 Info Center here is used to make sure that no record that is too long too process. This is needed as your XML is stored separately therefore is not restricted by the page size of table space where your XML holding table exists and if you do not have a temporary table space large enough then this code will not work.

Future posts

Looking at how you could make this into a useful system, with the ability to be queried easily



Posted in: ADMIN_EST_INLINE_LENGTH, DB2 built in functions, DB2 Data Types, DB2 Development, Decompose XML, IBM DB2 LUW, Recursive Query, Recursive SQL, Shred XML, Varchar, XML, XMLTABLE / Tagged: ADMIN_EST_INLINE_LENGTH, DB2 Development, Decompose XML, IBM DB2 LUW, Recursive Query, Recursive SQL, Shred XML, Varchar, XML, XML PATH, XMLTABLE

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Join 757 other subscribers

Recent Posts

  • Self generating Simple SQL procedures – MySQL
  • Google Cloud Management – My Idea – My White Whale?
  • Position Tracker – The Stub – Pandas:
  • Position Tracker – The Stub
  • Position Tracker – In the beginning
  • Whats been going on in the world of the Dangerous DBA:
  • QCon London Day 1
  • Testing Amazon Redshift: Distribution keys and styles
  • Back to dangerous blogging
  • DB2 10.1 LUW Certification 611 notes 1 : Physical Design

Dangerous Topics

added functionality ADMIN_EST_INLINE_LENGTH Bootcamp colum convert data types DB2 db2 DB2 Administration DB2 Development db2advis db2licm Decompose XML EXPORT GCP Google IBM IBM DB2 LUW idug information centre infosphere IOT LOAD merry christmas and a happy new year Position Tracking python Recursive Query Recursive SQL Reorganisation Reorganise Reorganise Indexes Reorganise Tables Runstats sql statement Stored Procedures SYSPROC.ADMIN_CMD Time UDF User Defined Functions V9.7 V10.1 Varchar XML XML PATH XMLTABLE

DangerousDBA Links

  • DB2 for WebSphere Commerce
  • My Personal Blog

Disclaimer:

The posts here represent my personal views and not those of my employer. Any technical advice or instructions are based on my own personal knowledge and experience, and should only be followed by an expert after a careful analysis. Please test any actions before performing them in a critical or nonrecoverable environment. Any actions taken based on my experiences should be done with extreme caution. I am not responsible for any adverse results. DB2 is a trademark of IBM. I am not an employee or representative of IBM.

Advertising

© Copyright 2023 - Dangerous DBA
Infinity Theme by DesignCoral / WordPress