• About Dangerous DBA
  • Table of Contents
Dangerous DBA A blog for those DBA's who live on the edge

Category Archives: Vargraphic

XML Shredding – Data Model and code

October 19, 2012 10:01 pm / Leave a Comment / dangerousDBA
ShreddingERD

In a previous post I wrote about shredding xml especially when you don’t know how your XML is formatted or it is always different, which where I work the XML is not XML it is a pseudo XML that conforms to an XML column.

What am I giving you

So below you will find some “light” documentation on the system that is like the system that I have created at my workplace to shred XML and store it in a way that will be easy to query. I will also go through some of the GOTCHA’s that I have seen so far in doing this. At the bottom of the page you will find a file with all the code you need.

GOTCHA’s

  1. You may find that no size of VARCHAR column is big enough to hold your XML data in the query especially when the initial stages of the recursive query therefore you may need to to use a CLOB column. I have found this is a little quicker than a very large VARCHAR column, and you will not need to create a temporary tablespace bufferpool so large to hold the query.
  2. You may have elements that are named the same thing (e.g. email, Email, EMAIL, eMAIL) and exist down the same XPATH then I have created the SHREDDED_REQUESTS_ELEMENTS_GROUP table and this will create a store for all the same named and located elements normalised around the XPATH and element name being lowered.
  3. The query code will produce entries that contain multiple values when the element is not the lowest element. So if you say had /booking/personal with /email and /name under it then you would get three rows output with /booking/personal with a value of the the values of /email and /name concatenated together, and the two additional rows of /booking/personal/email and /booking/personal/name, therefore you will need to build into the WHERE clause to exclude any paths like /booking/personal out of the result sets.

XML Shredding – ERD

XML Shredding – The Tables

There are four tables in the solution a staging table and then three others that make up a mini-flake like schema.

SHREDDED_REQUESTS_STAGE

The staging tables for the solution where the initial shredding of the XML is inserted into. This table is where the rest of the processing starts. The thing to watch out for here is the size of the XML_ELEMENT_VALUE column; here you need to make sure that you do not make this column too small. You could get rid of the date column but I find it useful if you are processing multiple days.

SHREDDED_REQUESTS

This table is where the data resides at the end of the process and resides for the rest of the data life cycle. It is essentially the same staging table but I have made a surrogate key of the the column XML_ELEMENT_NAME and XML_XPATH. This is to reduce the storage needs of the table and makes it easier to search.

SHREDDED_REQUEST_ELEMENTS

This is the store for the XML_ELEMENT_NAME and XML_XPATH once they have been made a surrogate. It also holds the key out too the grouping table.

SHREDDED_REQUESTS_ELEMENTS_GROUPS

This is the “final” table in the solution, and you may be able to leave it off because I had to create it too enable the normalisation of all the same element names in different case (e.g. EMAIL, Email, email and eMail) issues in the XML_ELEMENT_NAME and XML_XPATH.

XML Shredding – The Process

The process happens in four stages:

1.Shed the XML into the staging table

So this stage has to happen otherwise there is very little point in this process. It uses a bit of recursive SQL as out lined in a previous post and all the element values, XPaths and element names with the appropriate identifier.

To speed this up and in the code supplied in the attachment you will see that I use an EXPORT statement to unload and an LOAD to make sure that the process happens as fast as possible, with the minimum of log contention and use. It is basic maths that you are going to be inserting a lot of rows, say your site does 100,000 requests and there are 50 elements on each request that is 5,000,000 rows!

So something like:


CREATE PROCEDURE {SCHEMA}.ADD_SHREDDED_REQUESTS_STAGE(IN IN_START DATE, IN IN_END DATE)
LANGUAGE SQL
BEGIN
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Takes in two dates
    --Does this using a EXPORT and LOAD so that it works quicker than insert
    --   as the actual query takes no time at all and it is only the inserting 
    --   that takes the time.
    --Uses the ADMIN_EST_INLINE_LENGTH Function from http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.sql.rtn.doc/doc/r0054083.html 
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------

    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Declare Vars
        DECLARE EXPORTSTRING VARCHAR(3000);
        DECLARE LOADSTRING VARCHAR(3000);

    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Process
            --Create the EXPORT string
            SET EXPORTSTRING = 'EXPORT TO "/{some file system you can output too}/SHREDDED_REQUESTS_STAGE" OF DEL MODIFIED BY COLDEL0x09 
                                WITH pathstable (REQUEST_ID,REQUEST_DATE, name, node, xpath, value) AS (
                                  SELECT HM.REQUEST_ID,
                                         HM.REQUEST_DATE,
                                         x.name AS name, 
                                         x.node AS xmlnode,
                                         ''/'' || x.name AS xpath,
                                         x.value as value
                                  FROM {SCHEMA}.{TABLE WITH XML COLUMN} HM,
                                       XMLTABLE(''$REQUEST_XML/*''
                                        COLUMNS
                                          name varchar(300) PATH ''./local-name()'',
                                          value varchar(12000) PATH ''xs:string(.)'',
                                          node    XML PATH ''.'') AS x
                                   WHERE HR.REQUEST_DATE BETWEEN ''' || IN_START || ''' AND ''' || IN_END || '''                                        
                                        AND ADMIN_EST_INLINE_LENGTH(HM.REQUEST_XML) {} -1
                                  UNION ALL
                                  SELECT REQUEST_ID,
                                         REQUEST_DATE,
                                         y.name AS name, 
                                         y.node AS xmlnode, 
                                         xpath|| ''/'' || y.name AS xpath,
                                         y.value as value
                                  FROM pathstable,
                                       XMLTABLE(''$XMLNODE/(*,@*)'' PASSING pathstable.node AS "XMLNODE"
                                        COLUMNS
                                         name varchar(300) PATH ''local-name()'',
                                         value varchar(12000) PATH ''xs:string(.)'',
                                         node    XML PATH ''.'') AS y
                                ) SELECT REQUEST_ID, REQUEST_DATE, name, xpath, value
                                  FROM pathstable
                                  WHERE NAME {} ''serialized''
                                    AND NAME {} ''_viewstate''
                                    AND NAME {} ''__shorturl_postdata'' 
                                    AND NAME {} ''API_Reply''
                                    AND NAME {} ''apiReply'' ';

            --Execute the EXPORT string
            CALL SYSPROC.ADMIN_CMD(EXPORTSTRING);

            --Create the LOAD string
            SET LOADSTRING = 'LOAD FROM "/{some file system you can output too}/SHREDDED_REQUESTS_STAGE" OF DEL MODIFIED BY COLDEL0x09
                              METHOD P (1,2,3,4,5)
                              INSERT INTO {SCHEMA}.SHREDDED_REQUESTS_STAGE(
                                REQUEST_ID,
                                REQUEST_DATE,
                                XML_ELEMENT_NAME,
                                XML_XPATH,
                                XML_ELEMENT_VALUE
                              ) COPY YES TO "/{some file system you can output too}-copy/" INDEXING MODE AUTOSELECT';

            --Execue the LOAD string
            CALL SYSPROC.ADMIN_CMD(LOADSTRING);

END

2.Gather the group elements

If you have nicely formed XML without upper / lower case and everything in-between you might not need this stage, but here I gather the element names and paths and normalise them to lower, and insert the new ones to be used in the next stage. All the normalised values are given an ID

So something like:


CREATE PROCEDURE {SCHEMA}.ADD_SHREDDED_REQUEST_ELEMENTS_GROUPS()
LANGUAGE SQL
BEGIN
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Add based on what is not in the main elements table the new combinations
    --   from the shredded requests stage table that do not exist. 
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------

    INSERT INTO {SCHEMA}.SHREDDED_REQUEST_ELEMENTS_GROUPS(
        LOWER_XML_ELEMENT_NAME,
        LOWER_XML_XPATH
    )
    SELECT DISTINCT LOWER(HSRS.XML_ELEMENT_NAME) AS LOWER_XML_ELEMENT_NAME,
                    LOWER(HSRS.XML_XPATH) AS LOWER_XML_XPATH
     FROM {SCHEMA}.SHREDDED_REQUESTS_STAGE HSRS
     WHERE NOT EXISTS (SELECT *
                      FROM {SCHEMA}.SHREDDED_REQUEST_ELEMENTS_GROUPS HSRE
                      WHERE LOWER(HSRS.XML_ELEMENT_NAME) = HSRE.LOWER_XML_ELEMENT_NAME
                        AND LOWER(HSRS.XML_XPATH) = HSRE.LOWER_XML_XPATH);

    COMMIT;
END

3.Gather the elements

At this stage I gather all the new elements and paths (un-normailised) and put them into the table matched against the grouped version. This gives you a table that can be used in the next stage to create the final table with a far more index friendly integer for the elements and paths.

Again an example of the SP:


CREATE PROCEDURE {SCHEMA}.ADD_SHREDDED_REQUEST_ELEMENTS()
LANGUAGE SQL
BEGIN
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Add based on what is not in the main elements table the new combinations
    --   from the shredded requests stage table that do not exist. 
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------

    INSERT INTO {SCHEMA}.SHREDDED_REQUEST_ELEMENTS(
        XML_ELEMENT_NAME,
        XML_XPATH,
        SHREDDED_REQUEST_ELEMENT_GROUP_ID
    )
    SELECT DISTINCT HSRS.XML_ELEMENT_NAME,
                    HSRS.XML_XPATH,
                    HSREG.SHREDDED_REQUEST_ELEMENT_GROUP_ID
    FROM {SCHEMA}.SHREDDED_REQUESTS_STAGE HSRS INNER JOIN {SCHEMA}.SHREDDED_REQUEST_ELEMENTS_GROUPS HSREG ON LOWER(HSRS.XML_ELEMENT_NAME) = HSREG.LOWER_XML_ELEMENT_NAME
                                                                                                    AND LOWER(HSRS.XML_XPATH) = HSREG.LOWER_XML_XPATH
    WHERE NOT EXISTS (SELECT *
                      FROM {SCHEMA}.SHREDDED_REQUEST_ELEMENTS HSRE
                      WHERE HSRS.XML_ELEMENT_NAME = HSRE.XML_ELEMENT_NAME
                        AND HSRS.XML_XPATH = HSRE.XML_XPATH);

    COMMIT;
END
GO

4.Add to the permanent store

Use the table from the previous stage and load the data into the final table SHREDDED_REQUESTS. Loading only the surrogate key for the element and path of the XML value.

Something like:


CREATE PROCEDURE {SCHEMA}.ADD_SHREDDED_REQUESTS()
LANGUAGE SQL
BEGIN
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Add to the main table based on what is in the staging table. 
    --Everything to go over so if there is 1 day or 100 all goes!!
    --Using Export and load to get it done quicker and save on log space. 
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------

    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Declare Vars
        DECLARE EXPORTSTRING VARCHAR(3000);
        DECLARE LOADSTRING VARCHAR(3000);

    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Process

    --Create the EXPORT string
    SET EXPORTSTRING = 'EXPORT TO "/{some file system you can output too}/SHREDDED_REQUESTS" OF DEL MODIFIED BY COLDEL0x09
                        SELECT HSRS.REQUEST_ID,
                               HSRS.REQUEST_DATE,
                               HSRE.SHREDDED_REQUEST_ELEMENT_ID,
                               HSRS.XML_ELEMENT_VALUE
                        FROM {SCHEMA}.SHREDDED_REQUESTS_STAGE HSRS INNER JOIN {SCHEMA}.SHREDDED_REQUEST_ELEMENTS HSRE ON HSRS.XML_ELEMENT_NAME = HSRE.XML_ELEMENT_NAME
                                                                                                                AND HSRS.XML_XPATH = HSRE.XML_XPATH';

    --Execute the EXPORT string
    CALL SYSPROC.ADMIN_CMD(EXPORTSTRING);

    --Create the LOAD string
    SET LOADSTRING = 'LOAD FROM "/{some file system you can output too}/SHREDDED_REQUESTS" OF DEL MODIFIED BY COLDEL0x09
                      METHOD P (1,2,3,4)
                      INSERT INTO {SCHEMA}.SHREDDED_REQUESTS(
                        REQUEST_ID, 
                        REQUEST_DATE, 
                        SHREDDED_REQUEST_ELEMENT_ID, 
                        XML_ELEMENT_VALUE
                      ) COPY YES TO "/{some file system you can output too}-copy/" INDEXING MODE AUTOSELECT';
                      
    --Execue the LOAD string
    CALL SYSPROC.ADMIN_CMD(LOADSTRING);
END

XML Shredding – The file

As a reward for getting through all of the above, or for skipping it all and going for the juicy bit at the end here is the file – enjoy

GOTCHA’s
  1. I have compression on the servers at work, I have kept COMPRESS YES statement in, if you don’t have it then you will need to remove it other wise you will be in violation of your licences.
  2. You will have to change all {values} to get it to work, to your schemas etc that are relevant to your servers
  3. Please make sure you test before you use, you have been warned



Posted in: Date, DB2 Administration, DB2 Data Types, DB2 Development, Decompose XML, IBM DB2 LUW, Recursive Query, Recursive SQL, Shred XML, V10, V9.7, Vargraphic, XML, XMLTABLE / Tagged: ADMIN_EST_INLINE_LENGTH, DB2, DB2 Administration, DB2 Development, Decompose XML, EXPORT, IBM, IBM DB2 LUW, LOAD, Recursive Query, Recursive SQL, V10.1, V9.7, XML, XML PATH, XMLTABLE

Dr. DBA: CLI0109E String data right truncation: SQLCODE=-1822, SQLSTATE=560BD – Cured

August 7, 2012 12:21 pm / Leave a Comment / dangerousDBA

So have you ever seen this error message, and it might be an intermittent fault which makes it an even better problem to try and solve:

Unexpected error code “22001” received from data source “<DATABASE>”. Associated text and tokens are “func=”open” msg=” CLI0109E String data right truncatio”.. SQLCODE=-1822, SQLSTATE=560BD, DRIVER=4.0.100

We had this issue on some production code that did not raise its head till a few months after it had been working fine. The issue seemed to be that a piece of SQL that was generating the error was joining two federated tables on columns of different data types and sizes. The join in question was a VARCHAR(5) to a VARGRAPHIC(50) over federation, these two data types can be joined together and the size does not matter in general until the VARGRAPHIC column went over 5 characters long, causing the intermittent error!

The solution apart from the obvious one we were facing with our data was to CAST the VARGRAPHIC into a VARCHAR(5) the same as the field that it was joining too. This is working now and has fixed the issue and is one to think about for the future.



Posted in: DB2 Administration, DB2 Data Types, DB2 Development, IBM DB2 LUW, Vargraphic / Tagged: Data types, DB2, DB2 Administration, DB2 Development, IBM DB2 LUW, VARGRAHPIC

DB2 Errors Solved: CLI0109E String data right truncation: SQLCODE=-1822, SQLSTATE=560BD

February 28, 2012 1:56 pm / Leave a Comment / dangerousDBA

So have you ever seen this error message, and it might be an intermittent fault which makes it an even better problem to try and solve:

Unexpected error code “22001” received from data source “<DATABASE>”. Associated text and tokens are “func=”open” msg=” CLI0109E  String data right truncatio”.. SQLCODE=-1822, SQLSTATE=560BD, DRIVER=4.0.100

We had this issue on some production code that did not raise its head till a few months after it had been working fine. The issue seemed to be that a piece of SQL that was generating the error was joining two federated tables on columns of different data types and sizes.  The join in question was a VARCHAR(5) to a VARGRAPHIC(50) over federation, these two data types can be joined together and the size does not matter in general until the VARGRAPHIC column went over 5 characters long,  causing the intermittent  error!

The solution apart from the obvious one we were facing with our data was to CAST the VARGRAPHIC into a VARCHAR(5) the same as the field that it was joining too. This is working now and has fixed the issue and is one to think about for the future.

Another short interesting one for a lunchtime post.

Posted in: Data types, DB2, DB2 Administration, DB2 built in functions, DB2 Maintenance, IBM, Triton, Varchar, Vargraphic / Tagged: data types, intermittent fault, sqlcode, string data, unexpected error, varchar

The problem with VARGRAPHIC data type

January 19, 2011 2:10 pm / 1 Comment / dangerousDBA

So many of you may have come across this issue when you have to deal with foreign chars in the data that you hold. When I say foreign I mean the chars that are above decimal position 126 in any ASCII table.

The problem with these is that they are not a single byte, they are a double byte, when you want to store these in a DB2 database it can cause some issues; unfortunately I don’t know about the other major providers offerings but hopefully I can offer some insights into to what I have found so far. IBM offer two pieces of advice on coping with this

  1. Make the VARCHAR field twice the length that you expect it too be
  2. Use a VARGRAPHIC field

We were doing a migration, which is a whole different story and decided to change the fields that would be accepting user input values from batch files and other jobs data types from VARCHAR to VARGRAPHIC, as they potentially contain the most varied characters.

Below is some test code and results to show you the interesting differences here:

CREATE TABLE DEV.DATA_TYPE_TEST(
VARGRAPHIC_FIELD VARGRAPHIC(20),
VARCHAR_FIELD VARCHAR(20)
) IN DEV_TS
INSERT INTO DEV.DATA_TYPE_TEST
VALUES ('Mr Bloggs','Mr Bloggs'),('Mr Blôggs','Mr Blôggs')

Possibly not the best example but it will do for this purpose , when you issue the following query to get the sizes of the field, you can see that it will cause some issues down the line with left, right and substr functions

SELECT VARGRAPHIC_FIELD,
LENGTH(VARGRAPHIC_FIELD) LEN_VARCHAR_FIELD,
VARCHAR_FIELD,
LENGTH(VARCHAR_FIELD) LEN_VARCHAR_FIELD
FROM DEV.DATA_TYPE_TEST

RESULTS:

VARGRAPHIC_FIELD     LEN_VARCHAR_FIELD     VARCHAR_FIELD     LEN_VARCHAR_FIELD
-------------------  --------------------  ----------------  --------------------
Mr Bloggs            9                     Mr Bloggs         9
Mr Blôggs            9                     Mr Blôggs         10

So this leads to some issues if you want too only get part of the field:

SELECT VARGRAPHIC_FIELD,
LEFT(VARGRAPHIC_FIELD, 6) LEFT_6_VARGRAPHIC,
VARCHAR_FIELD,
LEFT(VARCHAR_FIELD, 6) LEFT_6_VARCHAR
FROM DEV.DATA_TYPE_TEST

RESULTS:

VARGRAPHIC_FIELD     LEFT_6_VARGRAPHIC     VARCHAR_FIELD     LEFT_6_VARCHAR
-------------------  --------------------  ----------------  -----------------
Mr Bloggs            Mr Blo                Mr Bloggs         Mr Blo
Mr Blôggs            Mr Blô                Mr Blôggs         Mr Bl

As you can see the function works fine on the VARGRAPHIC field, but not on the VARCHAR field. The function seems to be getting the first six characters by position, but the foreign characters in the VARCHAR field are stored as a double byte spread across two positions in the VARCHAR field, the whole char is not returned and so is not displayed . The VARGRAPHIC field handles this better and for each position the two byte characters are held in one position and the whole char is returned.

Other things to note that you can’t EXPORT from a VARCHAR field and LOAD into a VARGRAPIC field using an IXF file. DB2 will load all the fields that are of the same or compatible types, and null all the VARCHAR to VARGRAPIC conversion fields.

When you are loading a batch file using the LOAD command in DB2 if you try to load a data point into a varchar field, and the data point is twenty characters long and the field is twenty characters then you will also fit issues. As we have already seen the in a VARCHAR field the foreign characters will count for 2 characters, therefore if you load a string into that field that looks twenty characters long and it has a foreign character in it is actually twenty-one characters. This will obviously pop the field and not necessarily issue any warnings, the LOAD statement then seems to truncate it too twenty so you will be then missing a character from the data point.

Posted in: Data types, DB2, DB2 Administration, DB2 built in functions, DB2 Maintenance, LEFT, LOWER, SUBSTR, UPPER, Varchar, Vargraphic

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Join 757 other subscribers

Recent Posts

  • Self generating Simple SQL procedures – MySQL
  • Google Cloud Management – My Idea – My White Whale?
  • Position Tracker – The Stub – Pandas:
  • Position Tracker – The Stub
  • Position Tracker – In the beginning
  • Whats been going on in the world of the Dangerous DBA:
  • QCon London Day 1
  • Testing Amazon Redshift: Distribution keys and styles
  • Back to dangerous blogging
  • DB2 10.1 LUW Certification 611 notes 1 : Physical Design

Dangerous Topics

added functionality ADMIN_EST_INLINE_LENGTH Amazon Bootcamp colum convert data types db2 DB2 DB2 Administration DB2 Development db2advis db2licm Decompose XML Exam EXPORT Google IBM IBM DB2 LUW idug information centre infosphere LOAD merry christmas and a happy new year Recursive Query Recursive SQL Redshift Reorganisation Reorganise Reorganise Indexes Reorganise Tables Runstats sqlcode sql statement Stored Procedures SYSPROC.ADMIN_CMD Time UDF User Defined Functions V9.7 V10.1 Varchar XML XML PATH XMLTABLE

DangerousDBA Links

  • DB2 for WebSphere Commerce
  • My Personal Blog

Disclaimer:

The posts here represent my personal views and not those of my employer. Any technical advice or instructions are based on my own personal knowledge and experience, and should only be followed by an expert after a careful analysis. Please test any actions before performing them in a critical or nonrecoverable environment. Any actions taken based on my experiences should be done with extreme caution. I am not responsible for any adverse results. DB2 is a trademark of IBM. I am not an employee or representative of IBM.

Advertising

© Copyright 2021 - Dangerous DBA
Infinity Theme by DesignCoral / WordPress