Dangerous DBA: A blog for those DBAs who live on the edge

Tag Archives: V9.7

Getting an estimate – DB2 LUW V10.1 Compression

May 20, 2013 8:00 am / Leave a Comment / dangerousDBA

So you want to add compression: just as with an extension to your house, you get a tradesman in to give you an estimate and then carry out the work, and DB2 can do all of this. Just like building an extension, you also need to make sure all the appropriate permissions from the “council” (IBM) are in place: you either need to buy Storage Optimisation as a “feature”, or get it as part of the Advanced Enterprise Edition of DB2. Please be careful when trying out compression, because as soon as you include “COMPRESS YES” DB2 will set the compression feature to used, and if you get audited you could face a hefty bill.

Benefits of extending to compression

At a high level there are three ways of looking at this.

No compression

  • Benefits: Not having to pay the licensing fee to IBM for compression.
  • Costs: Large amounts of disk space used for the data, and minimal amounts of data in your bufferpools, as the pages are not made any smaller.

Classic compression

  • Benefits: Data is compressed on disk, which saves you there; data is also compressed in the bufferpools, so more pages fit in them: less I/O, quicker queries. Data is also compressed in the backup images.
  • Costs: Licensing fee to IBM. Slight increase in CPU usage for the compression dictionary. You need to reset the dictionary with a REORG from time to time to make sure that you get the most out of the compression.

Adaptive compression

  • Benefits: Data is compressed on disk and in the bufferpools, so more pages fit in them: less I/O, quicker queries. Data is also compressed in the backup images. Data is continually compressed, with no need for the RESETDICTIONARY REORG in the same way as classic compression.
  • Costs: Licensing fee to IBM. Increase in CPU usage for the compression dictionary. Only available in the latest DB2 V10.1.

Here’s what you could be saving – SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO

Handily IBM have included a very useful table function, SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO. The full information for this can be found in the information centre here. This table function will estimate the savings that you will get with no compression, “standard” (classic) compression and adaptive compression. GOTCHA’s for this are below:

SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO – GOTCHA’s

  1. Tables that are partitioned will come through the table function as multiple rows. You do get a partition ID which you can either join out to or look up in the table SYSCAT.DATAPARTITIONS; a sketch of that join follows this list.
  2. If the table has one (or more) XML columns then you will get additional rows in the results returned: a “DATA” and an “XML” compression estimation row. Together with the other gotcha you could end up with a lot of rows returned for a partitioned table with XML columns.
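A quick sketch of that first join (the schema and table names are placeholders), pulling the human-readable partition name alongside each estimate row:


SELECT C.TABNAME,
       C.DATAPARTITIONID,
       P.DATAPARTITIONNAME,
       C.PCTPAGESSAVED_ADAPTIVE
FROM TABLE(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO('{SCHEMA NAME}', '{TABLE NAME}')) AS C
     LEFT JOIN SYSCAT.DATAPARTITIONS P
         ON P.TABSCHEMA = C.TABSCHEMA
        AND P.TABNAME = C.TABNAME
        AND P.DATAPARTITIONID = C.DATAPARTITIONID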

Getting an estimate – SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO

This table function can be used to get information on either a table or an entire schema; obviously the latter can take some time to run from what I have found, especially when the tables are large. The simplest form of the call is:


SELECT * 
FROM TABLE(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO('{SCHEMA NAME}', '{TABLE NAME}')) AS T

This will get you a result a little like this (shown vertically here to keep the formatting readable):


TABSCHEMA:              SCHEMA
TABNAME:                TABLE
DBPARTITIONNUM:         0
DATAPARTITIONID:        0
OBJECT_TYPE:            DATA
ROWCOMPMODE:            S
PCTPAGESSAVED_CURRENT:  0
AVGROWSIZE_CURRENT:     495
PCTPAGESSAVED_STATIC:   65
AVGROWSIZE_STATIC:      173
PCTPAGESSAVED_ADAPTIVE: 65
AVGROWSIZE_ADAPTIVE:    170

The example above shows that this table is currently using “classic” compression, represented by the S in ROWCOMPMODE; a blank would mean no row compression, and an A would be the new adaptive compression in DB2 V10.1. As you can see it gives you an estimate of the average row size under the different compression modes; this is in bytes, and you will then need to work out what the full GB / MB size might be based on the cardinality of the table.
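To turn the row-size estimate into a rough table-level figure you can multiply by the row count from the catalogue; a back-of-envelope sketch (it assumes RUNSTATS has been run, as SYSCAT.TABLES.CARD is -1 otherwise, and the schema and table names are placeholders):


SELECT C.TABSCHEMA,
       C.TABNAME,
       DECIMAL((C.AVGROWSIZE_CURRENT * T.CARD) / (1024.0 * 1024.0), 12, 2) AS CURRENT_MB,
       DECIMAL((C.AVGROWSIZE_ADAPTIVE * T.CARD) / (1024.0 * 1024.0), 12, 2) AS ADAPTIVE_MB
FROM TABLE(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO('{SCHEMA NAME}', '{TABLE NAME}')) AS C
     INNER JOIN SYSCAT.TABLES T
         ON T.TABSCHEMA = C.TABSCHEMA
        AND T.TABNAME = C.TABNAME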

The table function is telling us that there are potentially 65% page savings to be made with both adaptive and classic compression, but there is a 3 byte difference in average row size, and adaptive compression in my opinion is far better, so I would ALTER the table to COMPRESS YES ADAPTIVE.
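A minimal sketch of that change (table name is a placeholder, and it assumes you have the licence in place): the offline REORG with RESETDICTIONARY rebuilds the existing rows with the new dictionaries, and a RUNSTATS afterwards keeps the statistics current:


ALTER TABLE {SCHEMA NAME}.{TABLE NAME} COMPRESS YES ADAPTIVE;

CALL SYSPROC.ADMIN_CMD('REORG TABLE {SCHEMA NAME}.{TABLE NAME} RESETDICTIONARY');

CALL SYSPROC.ADMIN_CMD('RUNSTATS ON TABLE {SCHEMA NAME}.{TABLE NAME} ON ALL COLUMNS AND INDEXES ALL');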

If you want to run the table function against a whole schema, leave the table argument as an empty string:


SELECT * 
FROM TABLE(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO('{SCHEMA NAME}', '')) AS T

This will get you a row per table in the schema (plus any extras for XML / partitioned tables).
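To pick compression candidates out of that list, something like this sorts the schema by the adaptive estimate (filtering on OBJECT_TYPE = 'DATA' keeps the XML estimation rows out of the way; the schema name is a placeholder):


SELECT TABNAME,
       PCTPAGESSAVED_STATIC,
       PCTPAGESSAVED_ADAPTIVE
FROM TABLE(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO('{SCHEMA NAME}', '')) AS T
WHERE OBJECT_TYPE = 'DATA'
ORDER BY PCTPAGESSAVED_ADAPTIVE DESC
FETCH FIRST 20 ROWS ONLY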

The future

In a future post I will look at using this table function to record the values for all tables; you can then look at a before and after, and therefore prove that the change in compression and the associated REORGs have worked.



Posted in: DB2, DB2 Administration, DB2 Built in commands, DB2 Built-in Stored Procedures, DB2 Maintenance, db2licm, IBM, IBM DB2 LUW, SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO / Tagged: ADMIN_GET_TAB_COMPRESS_INFO, DB2, DB2 Administration, db2licm, IBM DB2 LUW, Stored Procedures, SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO, V10.1, V9.7, XML

Record the size of your DB2 tables – SYSIBMADM.ADMINTABINFO

February 21, 2013 8:00 am / 2 Comments / dangerousDBA

Don’t know how your tables are growing or shrinking over time? Then this article should help you, and it uses a built-in DB2 administrative view called SYSIBMADM.ADMINTABINFO, so there is nothing too complicated to do here; full details about SYSIBMADM.ADMINTABINFO can be found in the IBM Help Centre.

Below I will go through the DB2 objects that I have created to record this info and how you can implement this yourself.

The view using SYSIBMADM.ADMINTABINFO

So that I have something I can query during the day after I have added quantities of data, or can use in a stored procedure to record the daily table sizes:


CREATE VIEW DB_MAIN.TABLE_SIZES AS (
    SELECT CURRENT_DATE AS STATS_DATE,
           TABNAME,
           TABSCHEMA,
           TABTYPE,
           TOTAL_SIZE AS TOTAL_OBJECT_P_SIZE,
           DATA_SIZE AS DATA_OBJECT_P_SIZE,
           DICT_SIZE AS DICTIONARY_SIZE,
           INDEX_SIZE AS INDEX_OBJECT_P_SIZE,
           LOB_SIZE AS LOB_OBJECT_P_SIZE,
           LONG_SIZE AS LONG_OBJECT_P_SIZE,
           XML_SIZE AS XML_OBJECT_P_SIZE
    FROM (SELECT TABNAME,
                 TABSCHEMA,
                 TABTYPE,
                 DECIMAL(((DATA_OBJECT_P_SIZE + INDEX_OBJECT_P_SIZE + LONG_OBJECT_P_SIZE + LOB_OBJECT_P_SIZE + XML_OBJECT_P_SIZE) / 1024.0), 10, 3) AS TOTAL_SIZE,
                 DECIMAL((DATA_OBJECT_P_SIZE / 1024.0), 10, 3) AS DATA_SIZE,
                 DECIMAL((DICTIONARY_SIZE / 1024.0), 10, 2) AS DICT_SIZE,
                 DECIMAL((INDEX_OBJECT_P_SIZE / 1024.0), 10, 3) AS INDEX_SIZE,
                 DECIMAL((LOB_OBJECT_P_SIZE / 1024.0), 10, 3) AS LOB_SIZE,
                 DECIMAL((LONG_OBJECT_P_SIZE / 1024.0), 10, 3) AS LONG_SIZE,
                 DECIMAL((XML_OBJECT_P_SIZE / 1024.0), 10, 3) AS XML_SIZE
          FROM SYSIBMADM.ADMINTABINFO
          WHERE TABSCHEMA NOT LIKE 'SYS%'
            AND TABSCHEMA NOT LIKE 'SNAP%') AS TABLESIZE
)

The view does not include all the columns that are available in SYSIBMADM.ADMINTABINFO, just the ones that are the most useful for general day-to-day usage; there are many more that you could use. The values are reported in KB, so the view divides by 1024 to get MB. The other GOTCHA is that partitioned tables will appear as one row per partition.
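Because of that last gotcha, when I want one row per table the partitions need rolling up; a quick sketch against the view above:


SELECT STATS_DATE,
       TABSCHEMA,
       TABNAME,
       SUM(TOTAL_OBJECT_P_SIZE) AS TOTAL_MB
FROM DB_MAIN.TABLE_SIZES
GROUP BY STATS_DATE, TABSCHEMA, TABNAME
ORDER BY TOTAL_MB DESC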

Table sizes record table

Rubbish section title I know, but I have tried several different names. This is the meta table that will record the information from the cut-down view above, via the stored procedure below.


CREATE TABLE DB_MAIN.TABLE_SIZES_STATS  ( 
	STATS_DATE         	DATE NOT NULL,
	TABNAME            	VARCHAR(128),
	TABSCHEMA          	VARCHAR(128),
	TABTYPE            	CHARACTER(1),
	TOTAL_OBJECT_P_SIZE	DECIMAL(10,3),
	DATA_OBJECT_P_SIZE 	DECIMAL(10,3),
	DICTIONARY_SIZE    	DECIMAL(10,2),
	INDEX_OBJECT_P_SIZE	DECIMAL(10,3),
	LOB_OBJECT_P_SIZE  	DECIMAL(10,3),
	LONG_OBJECT_P_SIZE 	DECIMAL(10,3),
	XML_OBJECT_P_SIZE  	DECIMAL(10,3) 
	)
IN DB_MAIN_TS
COMPRESS YES

Please note that if you do not have the “Storage Optimisation Feature” from IBM then do not include the line “COMPRESS YES”; otherwise, if Big Blue comes to do an audit you could be in trouble. The best way to avoid this is to set the licensing to hard, as sketched below with db2licm.
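A sketch of that (the product identifier db2ese is just an example; run db2licm -l first and use the identifier it reports for your own edition):


$ db2licm -l

$ db2licm -e db2ese hard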

Stored procedure for recording table sizes using SYSIBMADM.ADMINTABINFO

This is the stored procedure that I use to record the size of the tables at the time of running the SP.

CREATE PROCEDURE DB_MAIN.ADD_TABLE_SIZES_STATS   ()
LANGUAGE SQL
BEGIN
    INSERT INTO DB_MAIN.TABLE_SIZES_STATS
    SELECT *
    FROM DB_MAIN.TABLE_SIZES
    WITH UR;
END
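If you do not want to remember to run it, DB2’s administrative task scheduler can call the procedure daily; a sketch, assuming the scheduler is enabled (the DB2_ATS_ENABLE registry variable) and SYSTOOLSPACE exists:


CALL SYSPROC.ADMIN_TASK_ADD(
    'DAILY_TABLE_SIZES',        -- task name
    NULL, NULL, NULL,           -- no begin/end window, unlimited invocations
    '0 6 * * *',                -- cron-style schedule: every day at 06:00
    'DB_MAIN',                  -- procedure schema
    'ADD_TABLE_SIZES_STATS',    -- procedure name
    NULL, NULL, NULL)           -- no input parameters, options or remarks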

What to do next

As stated earlier, you can use this to record the day-to-day table sizes, or if you are in the process of compressing your tables you can use it to record the sizes before and after. In a future article I will use the objects created here to show how much table sizes decreased when implementing adaptive compression.
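The before-and-after itself is then a self-join on the stats table; a sketch with made-up dates either side of the compression work:


SELECT B.TABSCHEMA,
       B.TABNAME,
       B.TOTAL_OBJECT_P_SIZE AS MB_BEFORE,
       A.TOTAL_OBJECT_P_SIZE AS MB_AFTER,
       B.TOTAL_OBJECT_P_SIZE - A.TOTAL_OBJECT_P_SIZE AS MB_SAVED
FROM DB_MAIN.TABLE_SIZES_STATS B
     INNER JOIN DB_MAIN.TABLE_SIZES_STATS A
         ON A.TABSCHEMA = B.TABSCHEMA
        AND A.TABNAME = B.TABNAME
WHERE B.STATS_DATE = '2013-02-01'
  AND A.STATS_DATE = '2013-02-20'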



Posted in: Blogging, DB2, DB2 Administration, DB2 Built in commands, DB2 built in Views, DB2 Data Types, DB2 Maintenance, DB2 Storage Optimisation, db2licm, Decimal, IBM, SYSIBMADM.ADMINTABINFO / Tagged: DB2, DB2 Administration, DB2 Development, db2licm, IBM DB2 LUW, Meta Data, SYSIBMADM.ADMINTABINFO, V10.1, V9.7

Bash: Screen most useful command for DB2

January 21, 2013 8:10 am / Leave a Comment / dangerousDBA

Before one of my work colleagues, with many more years’ experience of working with Linux and UNIX, enlightened me about this most useful command, I believe most of you out there, like me, would fall into four camps:

  1. A remote desktop machine, probably Windows, that you log into and that stays up with your command running, so you can come back to it later
  2. Sitting around with your command window open on your own machine, waiting for it to finish as the rest of your colleagues go home
  3. nohup a script, but once you have closed your session you don’t know when your script has finished
  4. CRON tab a script

There is a fifth way: screen.

Screen your new best friend

So once you have ssh’ed into your server and got to the command line, you can use screen to create a server-based terminal session that can be attached to and detached from at your leisure. Screen works on “sessions” for the user you are logged in as, and these sessions can be connected to and disconnected from as that user for the lifetime of the session. A full list of options to the screen command can be found here.

Create a session with screen

At your command line, type in screen and press enter:


$ screen

You will be greeted with a message and prompted to press enter or space, and you will be taken to a command line again; you are now in your screen session.
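By default sessions get the auto-generated names you will see below; if several jobs or people share a box I find it worth naming them with -S so the session list stays readable. A quick sketch (the session name and the DB2 commands are just examples):


$ screen -S nightly_reorg

$ db2 connect to MYDB
$ db2 "REORG TABLE MYSCHEMA.MYTABLE"

$ screen -r nightly_reorg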

Seeing your screen sessions

You can see the screen sessions that you have available to you by typing in:


$ screen -ls

As you can see it’s a lot like ls in a normal Linux shell, and you will get something back like this:


bash-3.2$ screen -ls
There are screens on:
        1747.ttys000.machineid   (Detached)
        1872.ttys000.machineid   (Attached)
2 Sockets in /var/folders/XZ/XZGRJK7vHZuH1b726e+9yE++-GI/-Tmp-/.screen.

As you can see there are two screen sessions on this machine, and you can only attach once to a session. Someone else must be “in” the screen session “1872.ttys000.machineid”.

Attaching to a screen session

So once you have found your screen sessions you can attach to a screen by using the command:


bash-3.2$ screen -r 

To attach to the detached session above, the command would be:


bash-3.2$ screen -r 1747.ttys000.machineid

This will now allow you to run commands and leave commands on screen as you log on and off. If you try to connect to a session that no longer exists you will get this:


philipcarrington$ screen -r 1747.ttys000.machineid
There is no screen to be resumed matching 1747.ttys000.machineid.

Disconnecting from a screen session

Here you have to be slightly careful: if you press the wrong keys you will exit your session and lose what it is doing (although, depending on what it is doing, it may not stop). The command exit will work as normal, as if you were in a terminal, ending the screen session and returning you to the “main” terminal session on the server:


bash-3.2$ exit

You will be returned to the main session, with screen confirming that the session has ended:


$ screen -r 1747.ttys000.machineid
[screen is terminating]

If you wish your session to continue, so that you can reconnect later and your command will carry on running until it terminates, you need to press ctrl-a followed by ctrl-d. When you do this you will be returned to the terminal that spawned the session, with something like:


$ screen
[detached]

You can then list the available sessions and see this one and then attach again later.

Another way to disconnect from a session and end it is to just use ctrl-d, and you will see something similar to using exit:


$ screen -r 9679.ttys000.machineid
[screen is terminating]

Screen Conclusion

This is a light look at the command, and as stated before you can find the full command options here. As you can see this command is very useful, and can potentially reduce your costs as you don’t need a remote desktop machine. I use this command quite extensively, especially when the connection to the server is not guaranteed for a long time, like over a company VPN.



Posted in: bash, DB2, DB2 Administration, DB2 Maintenance, IBM, IBM DB2 LUW, screen / Tagged: bash, DB2, DB2 Administration, IBM DB2 LUW, long running db2 commands, long running queries, long running scripts, screen, V10.1, V9.7

Lazy RUNSTATS using SYSPROC.ADMIN_CMD

November 3, 2012 12:00 pm / Leave a Comment / dangerousDBA

So if you follow my Twitter @dangerousDBA you will know that I will do anything for an easy life, and where I work the range of DB2 skills is very varied, so making things as simple as possible is always needed. To this end, using SYSPROC.ADMIN_CMD it is possible to make RUNSTATS as simple as possible without knowing all the ins and outs of the actual command.

This first one is just a simple stored procedure that will run RUNSTATS on all columns and indexes.


CREATE PROCEDURE DB_MAIN.RUNSTATS  (IN IN_TABLESCHEMA VARCHAR(100), IN IN_TABLENAME VARCHAR(100))
LANGUAGE SQL
BEGIN
DECLARE RUNSTATSTMT VARCHAR(255);

SET RUNSTATSTMT = 'RUNSTATS ON TABLE ' || IN_TABLESCHEMA || '.' || IN_TABLENAME || ' ON ALL COLUMNS AND INDEXES ALL ALLOW WRITE ACCESS';

CALL SYSPROC.ADMIN_CMD(RUNSTATSTMT);

END
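Anyone on the team can then refresh the statistics on a table with a one-liner (the schema and table names here are just examples):


CALL DB_MAIN.RUNSTATS('MYSCHEMA', 'MYTABLE')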

As you can probably guess, DB_MAIN is the schema that I keep all the stored procedures and tables in for maintaining the DB2 databases. So it is now easy for anyone to do a total RUNSTATS on any table in the database. The second one that I created is a little more fine-grained: it gathers statistics on all columns, but only on one specified index, so it will run a little quicker.

CREATE PROCEDURE DB_MAIN.RUNSTATS_INDEX   (IN IN_TABLESCHEMA VARCHAR(100), IN IN_TABLENAME VARCHAR(100), IN IN_INDEX_NAME VARCHAR(255))
LANGUAGE SQL
BEGIN
DECLARE RUNSTATSTMT VARCHAR(1000);
SET RUNSTATSTMT = 'RUNSTATS ON TABLE ' || IN_TABLESCHEMA || '.' || IN_TABLENAME || ' ON ALL COLUMNS AND INDEXES ' || IN_TABLESCHEMA || '.' || IN_INDEX_NAME || ' ALLOW WRITE ACCESS';
CALL SYSPROC.ADMIN_CMD(RUNSTATSTMT);
END

There is not a great need to run the statistics on the columns when you are only after the index, but when in Rome. Obviously you can change these to suit your needs and take the column stats out of the index SP; a variant that gathers richer statistics is sketched below.
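If you want deeper optimizer statistics at the cost of a longer run, only the statement string needs to change; a sketch of an alternative for the first procedure (same parameters, just a different RUNSTATS clause):


SET RUNSTATSTMT = 'RUNSTATS ON TABLE ' || IN_TABLESCHEMA || '.' || IN_TABLENAME ||
                  ' WITH DISTRIBUTION AND DETAILED INDEXES ALL ALLOW WRITE ACCESS';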



Posted in: DB2 Administration, DB2 Built in commands, DB2 built in functions, DB2 Built-in Stored Procedures, IBM DB2 LUW, Runstats, Stored Procedures, SYSPROC.ADMIN_CMD, V10 / Tagged: DB2, DB2 Administration, IBM DB2 LUW, Runstats, Stored Procedures, SYSPROC.ADMIN_CMD, Table, update stats, V10.1, V9.7

XML Shredding – Data Model and code

October 19, 2012 10:01 pm / Leave a Comment / dangerousDBA

In a previous post I wrote about shredding XML, especially when you don’t know how your XML is formatted or it is always different; where I work the XML is not really XML, it is a pseudo-XML that happens to conform to an XML column.

What am I giving you

So below you will find some “light” documentation on a system like the one that I have created at my workplace to shred XML and store it in a way that is easy to query. I will also go through some of the GOTCHA’s that I have seen so far in doing this. At the bottom of the page you will find a file with all the code you need.

GOTCHA’s

  1. You may find that no size of VARCHAR column is big enough to hold your XML data in the query, especially in the initial stages of the recursive query, so you may need to use a CLOB column. I have found this is a little quicker than a very large VARCHAR column, and you will not need to create a temporary tablespace bufferpool large enough to hold the query.
  2. You may have elements that are named the same thing (e.g. email, Email, EMAIL, eMAIL) that exist down the same XPath; for this I have created the SHREDDED_REQUEST_ELEMENTS_GROUPS table, which stores all the same-named and same-located elements, normalised around the lower-cased XPath and element name.
  3. The query code will produce entries that contain multiple values when the element is not the lowest element. So if you had /booking/personal with /email and /name under it, you would get three rows output: /booking/personal with the values of /email and /name concatenated together, plus the two additional rows /booking/personal/email and /booking/personal/name. You will therefore need to build something into the WHERE clause to exclude paths like /booking/personal from the result sets; a sketch follows this list.
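For that third gotcha, one hedged option is to keep only leaf rows, dropping any row whose path is a prefix of another row’s path for the same request (pathstable here is the recursive common table expression used in the export query further down):


SELECT REQUEST_ID, REQUEST_DATE, name, xpath, value
FROM pathstable p
WHERE NOT EXISTS (SELECT 1
                  FROM pathstable c
                  WHERE c.REQUEST_ID = p.REQUEST_ID
                    AND c.xpath LIKE p.xpath || '/%')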

XML Shredding – ERD

[ERD image: the four shredding tables and the keys that relate them]

XML Shredding – The Tables

There are four tables in the solution: a staging table, and then three others that make up a mini snowflake-like schema.

SHREDDED_REQUESTS_STAGE

The staging table for the solution, where the initial shredding of the XML is inserted; this table is where the rest of the processing starts. The thing to watch out for here is the size of the XML_ELEMENT_VALUE column; you need to make sure that you do not make this column too small. You could get rid of the date column, but I find it useful if you are processing multiple days.

SHREDDED_REQUESTS

This table is where the data resides at the end of the process, and for the rest of the data life cycle. It is essentially the same as the staging table, but I have made a surrogate key out of the columns XML_ELEMENT_NAME and XML_XPATH. This reduces the storage needs of the table and makes it easier to search.

SHREDDED_REQUEST_ELEMENTS

This is the store for XML_ELEMENT_NAME and XML_XPATH once they have been given a surrogate key. It also holds the key out to the grouping table.

SHREDDED_REQUEST_ELEMENTS_GROUPS

This is the “final” table in the solution, and you may be able to leave it off; I had to create it to enable the normalisation of the same element names appearing in different cases (e.g. EMAIL, Email, email and eMail) in XML_ELEMENT_NAME and XML_XPATH.

XML Shredding – The Process

The process happens in four stages:

1. Shred the XML into the staging table

This stage has to happen, otherwise there is very little point in this process. It uses a bit of recursive SQL, as outlined in a previous post, to pull out all the element values, XPaths and element names with the appropriate identifier.

To speed this up, in the code supplied in the attachment you will see that I use an EXPORT statement to unload and a LOAD to make sure that the process happens as fast as possible, with the minimum of log contention and use. It is basic maths that you are going to be inserting a lot of rows: say your site does 100,000 requests and there are 50 elements on each request, that is 5,000,000 rows!

So something like:


CREATE PROCEDURE {SCHEMA}.ADD_SHREDDED_REQUESTS_STAGE(IN IN_START DATE, IN IN_END DATE)
LANGUAGE SQL
BEGIN
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Takes in two dates
    --Does this using a EXPORT and LOAD so that it works quicker than insert
    --   as the actual query takes no time at all and it is only the inserting 
    --   that takes the time.
    --Uses the ADMIN_EST_INLINE_LENGTH Function from http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.sql.rtn.doc/doc/r0054083.html 
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------

    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Declare Vars
        DECLARE EXPORTSTRING VARCHAR(3000);
        DECLARE LOADSTRING VARCHAR(3000);

    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Process
            --Create the EXPORT string
            SET EXPORTSTRING = 'EXPORT TO "/{some file system you can output too}/SHREDDED_REQUESTS_STAGE" OF DEL MODIFIED BY COLDEL0x09 
                                WITH pathstable (REQUEST_ID,REQUEST_DATE, name, node, xpath, value) AS (
                                  SELECT HM.REQUEST_ID,
                                         HM.REQUEST_DATE,
                                         x.name AS name, 
                                         x.node AS xmlnode,
                                         ''/'' || x.name AS xpath,
                                         x.value as value
                                  FROM {SCHEMA}.{TABLE WITH XML COLUMN} HM,
                                       XMLTABLE(''$REQUEST_XML/*''
                                        COLUMNS
                                          name varchar(300) PATH ''./local-name()'',
                                          value varchar(12000) PATH ''xs:string(.)'',
                                          node    XML PATH ''.'') AS x
                                   WHERE HM.REQUEST_DATE BETWEEN ''' || IN_START || ''' AND ''' || IN_END || '''
                                        AND ADMIN_EST_INLINE_LENGTH(HM.REQUEST_XML) <> -1
                                  UNION ALL
                                  SELECT REQUEST_ID,
                                         REQUEST_DATE,
                                         y.name AS name, 
                                         y.node AS xmlnode, 
                                         xpath|| ''/'' || y.name AS xpath,
                                         y.value as value
                                  FROM pathstable,
                                       XMLTABLE(''$XMLNODE/(*,@*)'' PASSING pathstable.node AS "XMLNODE"
                                        COLUMNS
                                         name varchar(300) PATH ''local-name()'',
                                         value varchar(12000) PATH ''xs:string(.)'',
                                         node    XML PATH ''.'') AS y
                                ) SELECT REQUEST_ID, REQUEST_DATE, name, xpath, value
                                  FROM pathstable
                                  WHERE NAME <> ''serialized''
                                    AND NAME <> ''_viewstate''
                                    AND NAME <> ''__shorturl_postdata'' 
                                    AND NAME <> ''API_Reply''
                                    AND NAME <> ''apiReply'' ';

            --Execute the EXPORT string
            CALL SYSPROC.ADMIN_CMD(EXPORTSTRING);

            --Create the LOAD string
            SET LOADSTRING = 'LOAD FROM "/{some file system you can output too}/SHREDDED_REQUESTS_STAGE" OF DEL MODIFIED BY COLDEL0x09
                              METHOD P (1,2,3,4,5)
                              INSERT INTO {SCHEMA}.SHREDDED_REQUESTS_STAGE(
                                REQUEST_ID,
                                REQUEST_DATE,
                                XML_ELEMENT_NAME,
                                XML_XPATH,
                                XML_ELEMENT_VALUE
                              ) COPY YES TO "/{some file system you can output too}-copy/" INDEXING MODE AUTOSELECT';

            --Execute the LOAD string
            CALL SYSPROC.ADMIN_CMD(LOADSTRING);

END

2. Gather the group elements

If you have nicely formed XML, without upper / lower case and everything in-between, you might not need this stage; here I gather the element names and paths, normalise them to lower case, and insert the new ones to be used in the next stage. All the normalised values are given an ID.

So something like:


CREATE PROCEDURE {SCHEMA}.ADD_SHREDDED_REQUEST_ELEMENTS_GROUPS()
LANGUAGE SQL
BEGIN
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Add based on what is not in the main elements table the new combinations
    --   from the shredded requests stage table that do not exist. 
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------

    INSERT INTO {SCHEMA}.SHREDDED_REQUEST_ELEMENTS_GROUPS(
        LOWER_XML_ELEMENT_NAME,
        LOWER_XML_XPATH
    )
    SELECT DISTINCT LOWER(HSRS.XML_ELEMENT_NAME) AS LOWER_XML_ELEMENT_NAME,
                    LOWER(HSRS.XML_XPATH) AS LOWER_XML_XPATH
     FROM {SCHEMA}.SHREDDED_REQUESTS_STAGE HSRS
     WHERE NOT EXISTS (SELECT *
                      FROM {SCHEMA}.SHREDDED_REQUEST_ELEMENTS_GROUPS HSRE
                      WHERE LOWER(HSRS.XML_ELEMENT_NAME) = HSRE.LOWER_XML_ELEMENT_NAME
                        AND LOWER(HSRS.XML_XPATH) = HSRE.LOWER_XML_XPATH);

    COMMIT;
END

3. Gather the elements

At this stage I gather all the new elements and paths (un-normalised) and put them into the table matched against the grouped version. This gives you a table that can be used in the next stage to populate the final table, with a far more index-friendly integer for the elements and paths.

Again an example of the SP:


CREATE PROCEDURE {SCHEMA}.ADD_SHREDDED_REQUEST_ELEMENTS()
LANGUAGE SQL
BEGIN
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Add based on what is not in the main elements table the new combinations
    --   from the shredded requests stage table that do not exist. 
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------

    INSERT INTO {SCHEMA}.SHREDDED_REQUEST_ELEMENTS(
        XML_ELEMENT_NAME,
        XML_XPATH,
        SHREDDED_REQUEST_ELEMENT_GROUP_ID
    )
    SELECT DISTINCT HSRS.XML_ELEMENT_NAME,
                    HSRS.XML_XPATH,
                    HSREG.SHREDDED_REQUEST_ELEMENT_GROUP_ID
    FROM {SCHEMA}.SHREDDED_REQUESTS_STAGE HSRS INNER JOIN {SCHEMA}.SHREDDED_REQUEST_ELEMENTS_GROUPS HSREG ON LOWER(HSRS.XML_ELEMENT_NAME) = HSREG.LOWER_XML_ELEMENT_NAME
                                                                                                    AND LOWER(HSRS.XML_XPATH) = HSREG.LOWER_XML_XPATH
    WHERE NOT EXISTS (SELECT *
                      FROM {SCHEMA}.SHREDDED_REQUEST_ELEMENTS HSRE
                      WHERE HSRS.XML_ELEMENT_NAME = HSRE.XML_ELEMENT_NAME
                        AND HSRS.XML_XPATH = HSRE.XML_XPATH);

    COMMIT;
END

4. Add to the permanent store

Use the table from the previous stage and load the data into the final table SHREDDED_REQUESTS, loading only the surrogate key for the element and path alongside the XML value.

Something like:


CREATE PROCEDURE {SCHEMA}.ADD_SHREDDED_REQUESTS()
LANGUAGE SQL
BEGIN
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Add to the main table based on what is in the staging table. 
    --Everything to go over so if there is 1 day or 100 all goes!!
    --Using Export and load to get it done quicker and save on log space. 
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------

    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Declare Vars
        DECLARE EXPORTSTRING VARCHAR(3000);
        DECLARE LOADSTRING VARCHAR(3000);

    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Process

    --Create the EXPORT string
    SET EXPORTSTRING = 'EXPORT TO "/{some file system you can output too}/SHREDDED_REQUESTS" OF DEL MODIFIED BY COLDEL0x09
                        SELECT HSRS.REQUEST_ID,
                               HSRS.REQUEST_DATE,
                               HSRE.SHREDDED_REQUEST_ELEMENT_ID,
                               HSRS.XML_ELEMENT_VALUE
                        FROM {SCHEMA}.SHREDDED_REQUESTS_STAGE HSRS INNER JOIN {SCHEMA}.SHREDDED_REQUEST_ELEMENTS HSRE ON HSRS.XML_ELEMENT_NAME = HSRE.XML_ELEMENT_NAME
                                                                                                                AND HSRS.XML_XPATH = HSRE.XML_XPATH';

    --Execute the EXPORT string
    CALL SYSPROC.ADMIN_CMD(EXPORTSTRING);

    --Create the LOAD string
    SET LOADSTRING = 'LOAD FROM "/{some file system you can output too}/SHREDDED_REQUESTS" OF DEL MODIFIED BY COLDEL0x09
                      METHOD P (1,2,3,4)
                      INSERT INTO {SCHEMA}.SHREDDED_REQUESTS(
                        REQUEST_ID, 
                        REQUEST_DATE, 
                        SHREDDED_REQUEST_ELEMENT_ID, 
                        XML_ELEMENT_VALUE
                      ) COPY YES TO "/{some file system you can output too}-copy/" INDEXING MODE AUTOSELECT';
                      
    --Execute the LOAD string
    CALL SYSPROC.ADMIN_CMD(LOADSTRING);
END
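Once everything is loaded, querying for an element regardless of how its case appeared in the raw XML just walks the snowflake back from the groups table; something like this (the XPath is the example from the gotchas above):


SELECT SR.REQUEST_ID,
       SR.XML_ELEMENT_VALUE
FROM {SCHEMA}.SHREDDED_REQUESTS SR
     INNER JOIN {SCHEMA}.SHREDDED_REQUEST_ELEMENTS SRE
         ON SRE.SHREDDED_REQUEST_ELEMENT_ID = SR.SHREDDED_REQUEST_ELEMENT_ID
     INNER JOIN {SCHEMA}.SHREDDED_REQUEST_ELEMENTS_GROUPS SREG
         ON SREG.SHREDDED_REQUEST_ELEMENT_GROUP_ID = SRE.SHREDDED_REQUEST_ELEMENT_GROUP_ID
WHERE SREG.LOWER_XML_XPATH = '/booking/personal/email'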

XML Shredding – The file

As a reward for getting through all of the above, or for skipping it all and going for the juicy bit at the end, here is the file – enjoy!

GOTCHA’s
  1. I have compression on the servers at work, so I have kept the COMPRESS YES statement in; if you don’t have the feature then you will need to remove it, otherwise you will be in violation of your licences.
  2. You will have to change all the {values} to the schemas etc. that are relevant to your servers to get it to work.
  3. Please make sure you test before you use; you have been warned.



Posted in: Date, DB2 Administration, DB2 Data Types, DB2 Development, Decompose XML, IBM DB2 LUW, Recursive Query, Recursive SQL, Shred XML, V10, V9.7, Vargraphic, XML, XMLTABLE / Tagged: ADMIN_EST_INLINE_LENGTH, DB2, DB2 Administration, DB2 Development, Decompose XML, EXPORT, IBM, IBM DB2 LUW, LOAD, Recursive Query, Recursive SQL, V10.1, V9.7, XML, XML PATH, XMLTABLE
