Dangerous DBA – A blog for those DBAs who live on the edge

Tag Archives: Ibm

XML Shredding – Data Model and code

October 19, 2012 10:01 pm / Leave a Comment / dangerousDBA
[Image: XML shredding ERD]

In a previous post I wrote about shredding XML, especially when you don't know how your XML is formatted or when it is always different. Where I work the XML is not really XML at all; it is a pseudo-XML that just happens to conform to an XML column.

What am I giving you

So below you will find some "light" documentation on a system like the one I have created at my workplace to shred XML and store it in a way that is easy to query. I will also go through some of the GOTCHAs that I have seen so far in doing this. At the bottom of the page you will find a file with all the code you need.

GOTCHAs

  1. You may find that no size of VARCHAR column is big enough to hold your XML data, especially in the initial stages of the recursive query, so you may need to use a CLOB column. I have found this is a little quicker than a very large VARCHAR column, and you will not need to create such a large bufferpool for the temporary tablespace that holds the query.
  2. You may have elements that are named the same thing (e.g. email, Email, EMAIL, eMAIL) and exist down the same XPath. For these I have created the SHREDDED_REQUEST_ELEMENTS_GROUPS table, which stores all the identically named and located elements, normalised by lower-casing the XPath and element name.
  3. The query code will produce entries that contain multiple values when the element is not the lowest element. So if you had /booking/personal with /email and /name under it, you would get three rows output: /booking/personal, with a value made of the values of /email and /name concatenated together, plus the two rows /booking/personal/email and /booking/personal/name. You will therefore need to build the WHERE clause to exclude paths like /booking/personal from the result sets, as in the sketch below.
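As a rough illustration of that last gotcha, when you query the staged data you want a predicate along these lines – the paths listed are just the example ones from above, so substitute the non-leaf paths from your own XML:

SELECT REQUEST_ID,
       XML_XPATH,
       XML_ELEMENT_VALUE
FROM   {SCHEMA}.SHREDDED_REQUESTS_STAGE
WHERE  XML_XPATH NOT IN ('/booking', '/booking/personal');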

XML Shredding – ERD

XML Shredding – The Tables

There are four tables in the solution: a staging table and then three others that make up a mini snowflake-like schema.

SHREDDED_REQUESTS_STAGE

The staging table for the solution, where the initial shredding of the XML is inserted. This table is where the rest of the processing starts. The thing to watch out for here is the size of the XML_ELEMENT_VALUE column; make sure that you do not make this column too small. You could get rid of the date column, but I find it useful if you are processing multiple days.
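The attached file has the real DDL; as a rough sketch (the data types here are my own assumptions, sized to match the VARCHAR(300) and VARCHAR(12000) columns used in the shredding query later on) the staging table looks something like:

CREATE TABLE {SCHEMA}.SHREDDED_REQUESTS_STAGE (
    REQUEST_ID        BIGINT NOT NULL,
    REQUEST_DATE      DATE   NOT NULL,
    XML_ELEMENT_NAME  VARCHAR(300),
    XML_XPATH         VARCHAR(1000),
    XML_ELEMENT_VALUE VARCHAR(12000)   -- see gotcha 1: you may need a CLOB here instead
) COMPRESS YES;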

SHREDDED_REQUESTS

This table is where the data resides at the end of the process, and where it stays for the rest of the data life cycle. It is essentially the same as the staging table, but I have replaced the XML_ELEMENT_NAME and XML_XPATH columns with a surrogate key. This reduces the storage needs of the table and makes it easier to search.

SHREDDED_REQUEST_ELEMENTS

This is the store for XML_ELEMENT_NAME and XML_XPATH once they have been given a surrogate key. It also holds the key out to the grouping table.

SHREDDED_REQUEST_ELEMENTS_GROUPS

This is the "final" table in the solution, and you may be able to leave it out; I had to create it to enable the normalisation of the same element names appearing in different cases (e.g. EMAIL, Email, email and eMail) in XML_ELEMENT_NAME and XML_XPATH.
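Again the attached file is the authoritative source, but roughly the three permanent tables hang together like this – the column types and the identity definitions are my assumptions, the column names come from the procedures further down:

CREATE TABLE {SCHEMA}.SHREDDED_REQUEST_ELEMENTS_GROUPS (
    SHREDDED_REQUEST_ELEMENT_GROUP_ID INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
    LOWER_XML_ELEMENT_NAME            VARCHAR(300),
    LOWER_XML_XPATH                   VARCHAR(1000),
    PRIMARY KEY (SHREDDED_REQUEST_ELEMENT_GROUP_ID)
) COMPRESS YES;

CREATE TABLE {SCHEMA}.SHREDDED_REQUEST_ELEMENTS (
    SHREDDED_REQUEST_ELEMENT_ID       INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
    XML_ELEMENT_NAME                  VARCHAR(300),
    XML_XPATH                         VARCHAR(1000),
    SHREDDED_REQUEST_ELEMENT_GROUP_ID INTEGER,
    PRIMARY KEY (SHREDDED_REQUEST_ELEMENT_ID)
) COMPRESS YES;

CREATE TABLE {SCHEMA}.SHREDDED_REQUESTS (
    REQUEST_ID                  BIGINT,
    REQUEST_DATE                DATE,
    SHREDDED_REQUEST_ELEMENT_ID INTEGER,
    XML_ELEMENT_VALUE           VARCHAR(12000)
) COMPRESS YES;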

XML Shredding – The Process

The process happens in four stages:

1. Shred the XML into the staging table

This stage has to happen, otherwise there is very little point to the whole process. It uses a bit of recursive SQL, as outlined in a previous post, to pull out all the element values, XPaths and element names along with the appropriate identifier.

To speed this up, in the code supplied in the attachment you will see that I use an EXPORT statement to unload the results and a LOAD to put them into the staging table, so that the process happens as fast as possible with the minimum of log contention and use. It is basic maths that you are going to be inserting a lot of rows: say your site does 100,000 requests and there are 50 elements on each request, that is 5,000,000 rows!

So something like:


CREATE PROCEDURE {SCHEMA}.ADD_SHREDDED_REQUESTS_STAGE(IN IN_START DATE, IN IN_END DATE)
LANGUAGE SQL
BEGIN
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Takes in two dates
    --Does this using a EXPORT and LOAD so that it works quicker than insert
    --   as the actual query takes no time at all and it is only the inserting 
    --   that takes the time.
    --Uses the ADMIN_EST_INLINE_LENGTH Function from http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.sql.rtn.doc/doc/r0054083.html 
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------

    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Declare Vars
        DECLARE EXPORTSTRING VARCHAR(3000);
        DECLARE LOADSTRING VARCHAR(3000);

    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Process
            --Create the EXPORT string
            SET EXPORTSTRING = 'EXPORT TO "/{some file system you can output too}/SHREDDED_REQUESTS_STAGE" OF DEL MODIFIED BY COLDEL0x09 
                                WITH pathstable (REQUEST_ID,REQUEST_DATE, name, node, xpath, value) AS (
                                  SELECT HM.REQUEST_ID,
                                         HM.REQUEST_DATE,
                                         x.name AS name, 
                                         x.node AS xmlnode,
                                         ''/'' || x.name AS xpath,
                                         x.value as value
                                  FROM {SCHEMA}.{TABLE WITH XML COLUMN} HM,
                                       XMLTABLE(''$REQUEST_XML/*''
                                        COLUMNS
                                          name varchar(300) PATH ''./local-name()'',
                                          value varchar(12000) PATH ''xs:string(.)'',
                                          node    XML PATH ''.'') AS x
                                   WHERE HM.REQUEST_DATE BETWEEN ''' || IN_START || ''' AND ''' || IN_END || '''
                                        AND ADMIN_EST_INLINE_LENGTH(HM.REQUEST_XML) <> -1
                                  UNION ALL
                                  SELECT REQUEST_ID,
                                         REQUEST_DATE,
                                         y.name AS name, 
                                         y.node AS xmlnode, 
                                         xpath|| ''/'' || y.name AS xpath,
                                         y.value as value
                                  FROM pathstable,
                                       XMLTABLE(''$XMLNODE/(*,@*)'' PASSING pathstable.node AS "XMLNODE"
                                        COLUMNS
                                         name varchar(300) PATH ''local-name()'',
                                         value varchar(12000) PATH ''xs:string(.)'',
                                         node    XML PATH ''.'') AS y
                                ) SELECT REQUEST_ID, REQUEST_DATE, name, xpath, value
                                  FROM pathstable
                                  WHERE NAME <> ''serialized''
                                    AND NAME <> ''_viewstate''
                                    AND NAME <> ''__shorturl_postdata''
                                    AND NAME <> ''API_Reply''
                                    AND NAME <> ''apiReply'' ';

            --Execute the EXPORT string
            CALL SYSPROC.ADMIN_CMD(EXPORTSTRING);

            --Create the LOAD string
            SET LOADSTRING = 'LOAD FROM "/{some file system you can output too}/SHREDDED_REQUESTS_STAGE" OF DEL MODIFIED BY COLDEL0x09
                              METHOD P (1,2,3,4,5)
                              INSERT INTO {SCHEMA}.SHREDDED_REQUESTS_STAGE(
                                REQUEST_ID,
                                REQUEST_DATE,
                                XML_ELEMENT_NAME,
                                XML_XPATH,
                                XML_ELEMENT_VALUE
                              ) COPY YES TO "/{some file system you can output too}-copy/" INDEXING MODE AUTOSELECT';

            --Execute the LOAD string
            CALL SYSPROC.ADMIN_CMD(LOADSTRING);

END
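Once created you just call it with the date range you want shredding, for example a single day (the dates here are purely illustrative):

CALL {SCHEMA}.ADD_SHREDDED_REQUESTS_STAGE('2012-10-01', '2012-10-01');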

2. Gather the group elements

If you have nicely formed XML, without upper case, lower case and everything in between, you might not need this stage. Here I gather the element names and paths, normalise them to lower case, and insert the new ones to be used in the next stage. All the normalised values are given an ID.

So something like:


CREATE PROCEDURE {SCHEMA}.ADD_SHREDDED_REQUEST_ELEMENTS_GROUPS()
LANGUAGE SQL
BEGIN
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Add based on what is not in the main elements table the new combinations
    --   from the shredded requests stage table that do not exist. 
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------

    INSERT INTO {SCHEMA}.SHREDDED_REQUEST_ELEMENTS_GROUPS(
        LOWER_XML_ELEMENT_NAME,
        LOWER_XML_XPATH
    )
    SELECT DISTINCT LOWER(HSRS.XML_ELEMENT_NAME) AS LOWER_XML_ELEMENT_NAME,
                    LOWER(HSRS.XML_XPATH) AS LOWER_XML_XPATH
     FROM {SCHEMA}.SHREDDED_REQUESTS_STAGE HSRS
     WHERE NOT EXISTS (SELECT *
                      FROM {SCHEMA}.SHREDDED_REQUEST_ELEMENTS_GROUPS HSRE
                      WHERE LOWER(HSRS.XML_ELEMENT_NAME) = HSRE.LOWER_XML_ELEMENT_NAME
                        AND LOWER(HSRS.XML_XPATH) = HSRE.LOWER_XML_XPATH);

    COMMIT;
END
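This procedure (and the two that follow) takes no parameters, so once the staging table is loaded you simply call them in order, starting with:

CALL {SCHEMA}.ADD_SHREDDED_REQUEST_ELEMENTS_GROUPS();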

3. Gather the elements

At this stage I gather all the new elements and paths (un-normalised) and put them into the table matched against the grouped version. This gives you a table that can be used in the next stage to create the final table, with a far more index-friendly integer standing in for the elements and paths.

Again an example of the SP:


CREATE PROCEDURE {SCHEMA}.ADD_SHREDDED_REQUEST_ELEMENTS()
LANGUAGE SQL
BEGIN
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Add based on what is not in the main elements table the new combinations
    --   from the shredded requests stage table that do not exist. 
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------

    INSERT INTO {SCHEMA}.SHREDDED_REQUEST_ELEMENTS(
        XML_ELEMENT_NAME,
        XML_XPATH,
        SHREDDED_REQUEST_ELEMENT_GROUP_ID
    )
    SELECT DISTINCT HSRS.XML_ELEMENT_NAME,
                    HSRS.XML_XPATH,
                    HSREG.SHREDDED_REQUEST_ELEMENT_GROUP_ID
    FROM {SCHEMA}.SHREDDED_REQUESTS_STAGE HSRS INNER JOIN {SCHEMA}.SHREDDED_REQUEST_ELEMENTS_GROUPS HSREG ON LOWER(HSRS.XML_ELEMENT_NAME) = HSREG.LOWER_XML_ELEMENT_NAME
                                                                                                    AND LOWER(HSRS.XML_XPATH) = HSREG.LOWER_XML_XPATH
    WHERE NOT EXISTS (SELECT *
                      FROM {SCHEMA}.SHREDDED_REQUEST_ELEMENTS HSRE
                      WHERE HSRS.XML_ELEMENT_NAME = HSRE.XML_ELEMENT_NAME
                        AND HSRS.XML_XPATH = HSRE.XML_XPATH);

    COMMIT;
END

4. Add to the permanent store

Use the table from the previous stage to load the data into the final table, SHREDDED_REQUESTS, storing only the surrogate key for the element and path of each XML value.

Something like:


CREATE PROCEDURE {SCHEMA}.ADD_SHREDDED_REQUESTS()
LANGUAGE SQL
BEGIN
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Add to the main table based on what is in the staging table. 
    --Everything to go over so if there is 1 day or 100 all goes!!
    --Using Export and load to get it done quicker and save on log space. 
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------

    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Declare Vars
        DECLARE EXPORTSTRING VARCHAR(3000);
        DECLARE LOADSTRING VARCHAR(3000);

    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Process

    --Create the EXPORT string
    SET EXPORTSTRING = 'EXPORT TO "/{some file system you can output too}/SHREDDED_REQUESTS" OF DEL MODIFIED BY COLDEL0x09
                        SELECT HSRS.REQUEST_ID,
                               HSRS.REQUEST_DATE,
                               HSRE.SHREDDED_REQUEST_ELEMENT_ID,
                               HSRS.XML_ELEMENT_VALUE
                        FROM {SCHEMA}.SHREDDED_REQUESTS_STAGE HSRS INNER JOIN {SCHEMA}.SHREDDED_REQUEST_ELEMENTS HSRE ON HSRS.XML_ELEMENT_NAME = HSRE.XML_ELEMENT_NAME
                                                                                                                AND HSRS.XML_XPATH = HSRE.XML_XPATH';

    --Execute the EXPORT string
    CALL SYSPROC.ADMIN_CMD(EXPORTSTRING);

    --Create the LOAD string
    SET LOADSTRING = 'LOAD FROM "/{some file system you can output too}/SHREDDED_REQUESTS" OF DEL MODIFIED BY COLDEL0x09
                      METHOD P (1,2,3,4)
                      INSERT INTO {SCHEMA}.SHREDDED_REQUESTS(
                        REQUEST_ID, 
                        REQUEST_DATE, 
                        SHREDDED_REQUEST_ELEMENT_ID, 
                        XML_ELEMENT_VALUE
                      ) COPY YES TO "/{some file system you can output too}-copy/" INDEXING MODE AUTOSELECT';
                      
    --Execute the LOAD string
    CALL SYSPROC.ADMIN_CMD(LOADSTRING);
END
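And to show that the end result really is easy to query, here is a rough sketch (not from the attached file) of pulling back every value for an element path regardless of what case the original XML used – the '/booking/personal/email' path is just the example from the gotchas above:

SELECT SR.REQUEST_ID,
       SR.REQUEST_DATE,
       SR.XML_ELEMENT_VALUE
FROM {SCHEMA}.SHREDDED_REQUESTS SR
     INNER JOIN {SCHEMA}.SHREDDED_REQUEST_ELEMENTS SRE
             ON SR.SHREDDED_REQUEST_ELEMENT_ID = SRE.SHREDDED_REQUEST_ELEMENT_ID
     INNER JOIN {SCHEMA}.SHREDDED_REQUEST_ELEMENTS_GROUPS SREG
             ON SRE.SHREDDED_REQUEST_ELEMENT_GROUP_ID = SREG.SHREDDED_REQUEST_ELEMENT_GROUP_ID
WHERE SREG.LOWER_XML_XPATH = '/booking/personal/email';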

XML Shredding – The file

As a reward for getting through all of the above, or for skipping it all and going straight for the juicy bit at the end, here is the file – enjoy.

GOTCHAs
  1. I have compression on the servers at work, so I have kept the COMPRESS YES statement in; if you do not have it then you will need to remove it, otherwise you will be in violation of your licences.
  2. You will have to change all the {values} to the schemas etc. that are relevant to your servers to get it to work.
  3. Please make sure you test before you use – you have been warned.



Posted in: Date, DB2 Administration, DB2 Data Types, DB2 Development, Decompose XML, IBM DB2 LUW, Recursive Query, Recursive SQL, Shred XML, V10, V9.7, Vargraphic, XML, XMLTABLE / Tagged: ADMIN_EST_INLINE_LENGTH, DB2, DB2 Administration, DB2 Development, Decompose XML, EXPORT, IBM, IBM DB2 LUW, LOAD, Recursive Query, Recursive SQL, V10.1, V9.7, XML, XML PATH, XMLTABLE

DB2 Licence Audit – When the big blue comes a knocking

August 22, 2012 4:31 pm / Leave a Comment / dangerousDBA

So we all know that each year we pay out a not-insignificant amount of money for ongoing licencing and maintenance costs, and you may or may not know that every (apparently) three to five years IBM come round and want to check that what you pay for is what you use. In this article I want to cover what I / we have experienced so far, and I hope to do a future post on the outcomes.

What Happened?

The main IBM contact at our business was contacted by Deloitte and told that we were going to be audited. This was not a particularly good time for us, as we were moving our data centre and moving to the cloud, so genuinely we asked them if it would be OK to postpone for a couple of weeks, and comfortingly it was. The first meeting they had with us was to introduce the process and the people that would be involved. The process is that Deloitte will come in and gather the facts of your DB2 estate, but equally they will want to make sure you are fully up to date on all other IBM and "IBM Company" products, e.g. Lotus Notes etc. They will send you a copy of the factual report for "agreement on the factual accuracy" before they "step out" of the process and hand the report to IBM, and then you either get a pat on the head for being a good little customer, or you pay up at the going rate for what you are over-using, with no special bid process.

What I did to survive!

For me this was a particularly interesting process, in that the person who used to deal with the actual licensing had left the company and it had not really been picked up by anyone, so once the audit was looming it fell to me to look into it and sort it out. We licence with the CPU option, so I can only comment on that, but this is the pre-audit preparation that I carried out:

1. Work out how many PVUs a single core of the processor in your server(s) is worth, using the Processor Value Unit calculator provided by IBM.

2. Go to Passport Advantage, log in, and download (print?) your latest licence documents, which will have the number of PVUs that you have been paying for and the products they cover, either actual products (DB2 ESE) or features (DB2 Storage Optimisation).

3. Work out how many cores DB2 is using, either physical or virtual. So if one core is 50 PVUs and you have two quad-core processors in your server, then it would be 50 * 4 * 2 = 400 PVUs for that server to licence a product that can't be "core limited" (if there are any). The thing to note here is that if you are running Virtual Machines (VMs) and you over-allocate processors in the VMs relative to the physical system (i.e. not a 1-to-1 relationship), then you need extra PVUs for that. If you have an 8-core physical machine but you allocate 3 VMs with 3 cores each (3 * 3 = 9), then you will need 50 * 9 = 450 PVUs. This "sub-capacity licencing" apparently should be covered by the reports from the Tivoli Licence Compliance Manager, which we had not installed anywhere. I was never asked to install it, but I found some exceptions, which I think are the reason the previous incumbent of the licensing never told me to; they can be found here.

4. Compare what you think you are using to what your licencing documents say, and see if there is a mismatch in the number of PVUs. Make sure you account for the right product on the right servers by running db2licm -l. Make sure that you have enough for the product and any features: if your server is 400 PVUs and runs DB2 ESE with Storage Optimisation, you will need 400 PVUs of DB2 ESE and 400 PVUs of Storage Optimisation. If you are in the red then it is probably time to contact your vendor and get PVUs bought (I will come to why later); if you break even or are in the black for some reason then there is probably no need to worry. Make sure you also work out the PVUs used and licensed for all the features that you have (e.g. Storage Optimisation), as they will check those too.

5. Apply your licence keys to all your instances for the product and features, and make sure that when you run db2licm -l or db2licm -g <output file name> everything comes back as in compliance. Otherwise get on the Passport Advantage site again and download and apply those keys.

6. Find out from all parties in your business that may have an instance or two of DB2, or a legacy system that might run DB2 (basically any DB2 that you do not manage), whether they know anything about it, whether it is free (Express-C) or licensed, and who looks after that aspect. If the answer is "don't know", it is probably best you take more work onto your broad shoulders, say you will take a look at it, and carry out the above steps for those systems, because as sod's law goes they will choose that server to report on as part of the subset, along with any unlicensed product.

What do Deloitte do

So the first thing to note here is that Deloitte are a middle man: they just "gather facts", and these are the facts on the day that they (or you) come round to check the system. So if you are in the red, as in point four above, it might be possible to get PVUs bought before they come round. If you do it before, you may be able to get them at less than list price; if it is after the audit (as a consequence of it) then it will be at the IBM list price! The other point to note is that they will want you to do most of the checking yourself, so not only are they getting paid by IBM to do the audit, they expect you to collect their evidence for them. The spreadsheet they sent us was for the wrong licensing format (user as opposed to CPU), so we got them to come in and "do what they needed to do", but as it turns out we still had to do most of it ourselves, as they were not allowed to, and provide them with the evidence for them to collect or put into a spreadsheet!

The evidence they wanted from me as the DBA was the output from db2licm -g and information on the number of cores DB2 was running on. From our system admins they wanted scripts run on a subset of our whole server estate and screenshots of various things.

The stage we are at currently is that we are waiting for the report on factual accuracy, so that we can sign it off before IBM see it to give us either a pat on the head or a caning by the big blue headmaster.

Pass or Fail

Well, in the end we passed, which was good. There was some wrangling over the fact that we have fewer than 1,000 employees and over the type of licence monitoring we needed to do, which was less than they thought.

Future Posts

I will look at some more of the db2licm command's features to make auditing easier.



Posted in: DB2 Administration, DB2 Built in commands, db2licm, IBM DB2 LUW / Tagged: DB2, DB2 Administration, db2licm, Deloitte, IBM, IBM DB2 LUW


