• About Dangerous DBA
  • Table of Contents
Dangerous DBA A blog for those DBA's who live on the edge

Tag Archives: Db2 Development

DB2 10.1 LUW Certification 611 notes 1 : Physical Design

June 3, 2013 1:00 pm / Leave a Comment / dangerousDBA

As part of my career development it has been on the cards for me to do some certification at my current company for about four and a half years, and now they are giving me some time for it again. As you know from a previous post then I have being trying to revise for this using something that IBM published just after DB2 10.1 LUW came out and it is very verbose and boring (sorry). It seems more like a technical manual of all the ins and outs of the application than revision guide. There are now some better resources produced and can be found here. This blog post is going to cover some of the things I learn’t from Part 2: Physical Design which can be found here, please take the time to read the doc for yourself it is full of good stuff some of which I think I will be trying out in future.

Physical Design – Exam notes

WITH CHECK OPTION

This seems to be a crafty little three word clause that can add a little security to your views, but it seems at the cost of having more of a maintenance overhead for an application. The example in the document says “WHERE dept = 10“, and goes on to say how if you tried the insert for any other dept no it would not work. This is good as it would stop insertion for dept 11, but would mean that you would need one of these views for each “dept” that you needed.

Informational Constraints

I have come across these before still not really sure what they are useful for, and there is not a good explanation of them in this doc. If they are ENFORCED then how is this different from a check constraint? and if it is ENABLED QUERY OPTIMIZATION then why would you want to make it difficult to find the wrong data rows?

Storage Groups / Multi Temperature data

This is a new feature to DB2 LUW 10.1 and I have found it hard so far to get a clear definition of how the original paths you supply a create database for its data paths and automated storage relate too multi temperature data storage groups. This doc does not make it as easy as I think I have now got it but I will give it a go in a series of bullet points (how I understand it):

  • The data paths that you define when you create a database go to create the default storage group : IBMSTOGROUP.
  • You can then create additional storage groups using the CREATE STOGROUP command for hot etc data
  • The three basic tables spaces that are created at database creation (USERSPACE1, TEMPSPACE1 and SYSCATSPACE1) are created in the default IBMSTOGROUP, that is on the paths that you specified to begin with.
  • Tablespaces are then created USING the newly created storage groups and rather than the data being stored in the default data path it is stored where specified.
  • If you are lucky enough to have workload manager then Storage groups can be given tags to optimize the performance of this – bonus.
  • As I understand it though if you only specify one path in you storage group then you data will not be striped as only one container will be created
  • You then create your partitioned tables (or tables) in these tablespaces appropriately. So the hot part of your partitioned table will go in the hot STOGROUP attached tablespace.

So hopefully the above makes this a little clearer, it does for me. We are currently looking at completely re architecturing our data-pile at the moment and a lot of the design decisions that this offers will be taken into consideration.

Range Clustered Tables

This seems like a most useful feature for star schema data warehouses where there are surrogate or business keys that are generally some form of whole number (SMALLINT, INTEGER or BIGINT). There does seem to be two massive GOTCHA’s in the text though (taken from the text):

  1. DB2 pre-allocates the disk space required for the RCT at creation time. This is done by calculating the number of distinct key values and multiplying it by the table row size. This mandates that the space required for holding the entire table should be available at creation time
  2. you cannot issue an ALTER TABLE statement to alter the physical characteristics of an RCT after its creation

Now I can see this being an issue if you have the full range of a BIGINT * the size of the table fields for each row allocated and used at time of creation, but you can limit this by using the “starting from” and “ending” clauses to the key sequences.

Range clustered tables also have another useful clause of “DISALLOW OVERFLOWS”, this apparently means that reorganization operations are not required, but you will not be allowed to insert any rows that exist outside the limits that you have set, so could be another way to stop GIGO.

Admin_move_table

This seems a most useful stored procedure for moving tables from say DEV to PRODUCTION or if you find that your creation script has accidentally left the wrong table space in the statement and your table needs moving out. Tables can be moved online but this takes more resource in terms of processor and disk space. Most interesting in terms of keeping your data online and generating a new compression dictionary (and therefore also a reorg?). The full command can be found here

Temporal Tables

These are new and will defiantly feature when we re-architecture the data dump that we have currently. I have had to code logic to capture the versions of the subject and then other objects to find the latest version efficiently across multi million row data sets. The new temporal tables will take care of this for you, and still allow you to access the old data in a nearly standard query. The other bonus is that these tables can also be partitioned, but you must make sure the history table can cope with the rows that will be inserted into it.

Conclusion

Well these are my high level take away’s on the extra stuff that seems to be on the paper compared too the V9.7 exam (that I never got round to sitting). There seems to be a lot less detail in this paper compared to the one that has taken me so long to read as it contain so much detail, which leaves me to wonder is the exam a middle ground of the two? Would highly suggest reading the serise and I will be doing more posts in the mean time on the other papers in the set, found at the link at the top of the article.



Posted in: ADMIN_MOVE_TABLE, DB2, DB2 Administration, DB2 Built-in Stored Procedures, DB2 DBA Certification, DB2 DBA Exam 611, IBM, IBM DB2 LUW, Physical Design, Physical Design / Tagged: ADMIN_MOVE_TABLE, Certification, DB2, DB2 Administration, DB2 DBA Exam 611, DB2 Development, Exam, IBM DB2 LUW, Informational Constraints, Multi temperature data, Storage groups, Stored Procedures, Temporal tables

Usage Lists – Big Brother is watching, but only where he is looking

April 22, 2013 12:00 pm / 1 Comment / dangerousDBA

From my holiday reading and from watching the most excellent DB2 Night Show and specifically an episode that was done a while back by Iqbal Goralwalla of Triton Consulting (@iqbalgoralwalla) on “DB2 LUW 10.1 Cool Features No One is Talking About” I have come across Usage Lists for tables and indexes in DB2.

Why use – Usage Lists

Have you got a table or an index and you never know when or how it is used; Stored Procedures, screens, systems or dynamic SQL, or do you want to monitor the SQL that runs against a table or index then and what work is done then this could save you ploughing through a lot of code, but means it won’t be an instant fix as the code has to run.

Usage Lists

Usage Lists in DB2 essentially allow you to monitor the SQL that runs against a tables or indexes that you have identified that you want monitoring. This does not come without costs and a list of the GOTCHA’s can be found on the “Notes” section of the page here and in “Chapter 26. Usage lists” of the “Preparation Guide for Exam 611” (not sure on how much these will come up in the exam?)

GOTCHA

  • Please note in the above paragraph the words “you have identified that you want monitoring” as you will see you will only get the stats if the table is monitored and you have set up the individual monitor!

Usage Lists – Creation

Not going to lie there is a page of the IBM Info Centre that has a version of this information but it is a little hard to find unless you type in the exact words but it can be found here, as you can see from the title then it is not really close to usage lists!

First you need to set a database configuration parameter MON_OBJ_METRICS:

db2 UPDATE DATABASE CONFIGURATION USING MON_OBJ_METRICS EXTENDED

On the page mentioned above then it says you need to set this so that "statistics are collected for each entry in the usage list" but on the small scale of the testing that I did I have not found any difference in captured data.

Then for each table that you want to monitor then you need to run at a minimum:

db2 CREATE USAGE LIST {Some Memorable Name} FOR TABLE {Schema}.{Table}

There are other parts to this command that can be found here and it has some useful parts like the ability too "turn its self off" when a certain number of different statements have been run by doing something like:

db2 CREATE USAGE LIST {Some Memorable Name} FOR TABLE {Schema}.{Table} LIST SIZE {Some Number} WHEN FULL DEACTIVATE

Or a rolling list, but this might create difficulties if you want repeatability:

db2 CREATE USAGE LIST {Some Memorable Name} FOR TABLE {Schema}.{Table} LIST SIZE {Some Number} WHEN FULL WRAP

From testing if you unless you specify a LIST SIZE then the collection will continue for as long as the list is active, which is the next statement to run to get it too work.

db2 SET USAGE LIST {Some Memorable Name} STATE = ACTIVE

And to disable it again:

db2 SET USAGE LIST {Some Memorable Name} STATE = INACTIVE

So above is a quick look at how to get this to work and the links to get a better lets move on to look at what it collects.

Usage Lists - The output

The output is quite useful and the full output of the MON_GET_TABLE_USAGE_LIST table function can be found here. It is also a little disappointing because this does not return the statement only an identifier (EXECUTABLE_ID) that you can supply to the MON_GET_PKG_CACHE_STMT table function which info for this can be found at here.

You can do something like this and potentially get a lot of data on what your MON_GET_TABLE_USAGE_LIST captured and the statements from MON_GET_PKG_CACHE_STMT when joined together:


SELECT *
FROM TABLE(MON_GET_TABLE_USAGE_LIST(NULL,{Some Memorable Name},0)) A 
   LEFT JOIN
     TABLE(MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, -2)) B
    ON A.EXECUTABLE_ID = B.EXECUTABLE_ID

Looking at a version that yields some more focused information:


SELECT B.STMT_TEXT AS SQL_STATEMENT, 
       A.LAST_UPDATED AS LAST_RUN,
       A.NUM_REF_WITH_METRICS AS NO_TIMES_RUN,
       A.ROWS_READ,
       A.ROWS_INSERTED,
       A.ROWS_UPDATED,
       A.ROWS_DELETED,
       A.LOCK_WAIT_TIME,
       A.OBJECT_DATA_L_READS AS BUFFERPOOL_READS,
       A.OBJECT_DATA_P_READS AS NON_BUFFERPOOL_READS
FROM TABLE(MON_GET_TABLE_USAGE_LIST(NULL,{Some Memorable Name},0)) A 
   INNER JOIN
     TABLE(MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, -2)) B
    ON A.EXECUTABLE_ID = B.EXECUTABLE_ID

This enables you to see how efficient the query is in terms of how often it is run and the number of times, with the work that they did and how much of the data resides in the bufferpools (BUFFERPOOL_READS) and how much has to come from disk (NON_BUFFERPOOL_READS). As you can see from my not very good test system query tracking below:


 SQL_STATEMENT                                                                                                 LAST_RUN             NO_TIMES_RUN     ROWS_READ     ROWS_INSERTED     ROWS_UPDATED     ROWS_DELETED     LOCK_WAIT_TIME     BUFFERPOOL_READS     NON_BUFFERPOOL_READS    
 ------------------------------------------------------------------------------------------------------------  -------------------  ---------------  ------------  ----------------  ---------------  ---------------  -----------------  -------------------  ----------------------- 
 insert into {schema}.{table} ({field},{field},{field}) VALUES ({value},{value},{value})  21/04/2013 10:10:54  1                0             1                 0                0                0                  1                    0                       

As you can see here this update all happened inside the bufferpool as it was very small on a table with no data. If you can find a statement that you are interested in because it has a large amount of non logical data reads you can use the captured code and pass it through db2advis to get suggestions on how to make the query better with indexes etc. Please see my blog post on db2advis if you are un-familar with it.

The future

I am currently looking at automating db2advis and monitoring its suggestions. Which once you are capturing the SQL becomes a lot easier.



Posted in: DB2, DB2 Administration, DB2 Built in commands, DB2 Development, DB2 Maintenance, DB2 Table Functions, IBM, IBM DB2 LUW, MON_GET_PKG_CACHE_STMT, MON_GET_TABLE_USAGE_LIST / Tagged: Create Usage List, DB2, DB2 Administration, DB2 Development, db2advis, IBM DB2 LUW, MON_GET_PKG_CACHE_STMT, MON_GET_TABLE_USAGE_LIST, Stored Procedures, Usage List, Usage List Status, V10.1

Record the size of your DB2 tables – SYSIBMADM.ADMINTABINFO

February 21, 2013 8:00 am / 2 Comments / dangerousDBA

Don’t know how your tables are growing or shrinking over time then this article should help you, and it uses built in DB2 administrative view called SYSIBMADM.ADMINTABINFO so nothing too complicated to do here; full details about SYSIBMADM.ADMINTABINFO can be found in the IBM Help Centre.

Below I will go through the DB2 objects that I have created to record this info and how you can implement this yourself.

The view using SYSIBMADM.ADMINTABINFO

So that I have something I can query during the day after I have added quantities of data or I can use it in an stored procedure to record the daily table sizes:


CREATE VIEW DB_MAIN.TABLE_SIZES AS (
    SELECT CURRENT_DATE AS STATS_DATE,
            TABNAME AS TABNAME,TABSCHEMA AS TABSCHEMA,TABTYPE AS TABTYPE,TOTAL_SIZE AS TOTAL_OBJECT_P_SIZE,DATA_SIZE AS DATA_OBJECT_P_SIZE,DICT_SIZE AS DICTIONARY_SIZE,INDEX_SIZE AS INDEX_OBJECT_P_SIZE,LOB_SIZE AS LOB_OBJECT_P_SIZE,LONG_SIZE AS LONG_OBJECT_P_SIZE,XML_SIZE AS XML_OBJECT_P_SIZE FROM table(SELECT 							
            TABNAME, 							
            TABSCHEMA, 							
            TABTYPE, 							
            DECIMAL(((data_object_p_size + index_object_p_size + long_object_p_size + lob_object_p_size + xml_object_p_size)/ 1024.0),10,3) as total_size, 							
      DECIMAL((DATA_OBJECT_P_SIZE / 1024.0),10,3) AS DATA_SIZE, 
      DECIMAL((DICTIONARY_SIZE / 1024.0),10,2) AS DICT_SIZE, 							
      DECIMAL((INDEX_OBJECT_P_SIZE / 1024.0),10,3) AS INDEX_SIZE, 
      DECIMAL((LOB_OBJECT_P_SIZE / 1024.0),10,3) AS LOB_SIZE, 							
      DECIMAL((LONG_OBJECT_P_SIZE / 1024.0),10,3) AS LONG_SIZE, DECIMAL((XML_OBJECT_P_SIZE / 1024.0),10,3) AS XML_SIZE 
    FROM SYSIBMADM.ADMINTABINFO WHERE TABSCHEMA NOT LIKE 'SYS%'							
    AND TABSCHEMA NOT LIKE 'SNAP%') as TABLESIZE
)

The view is not all the columns that are available in the view but are the ones that are the most useful for general day to day usage, there are many more here that you could use. The values are stored in Kb’s so need dividing by 1024 to get it too Mb’s. The other GOTCHA is that partitioned tables will appear as one row per partition.

Table sizes record table

Rubbish section title I know but have tried several different names. This is the meta table that will record the information from the cut down version of the view from the stored procedure below.


CREATE TABLE DB_MAIN.TABLE_SIZES_STATS  ( 
	STATS_DATE         	DATE NOT NULL,
	TABNAME            	VARCHAR(128),
	TABSCHEMA          	VARCHAR(128),
	TABTYPE            	CHARACTER(1),
	TOTAL_OBJECT_P_SIZE	DECIMAL(10,3),
	DATA_OBJECT_P_SIZE 	DECIMAL(10,3),
	DICTIONARY_SIZE    	DECIMAL(10,2),
	INDEX_OBJECT_P_SIZE	DECIMAL(10,3),
	LOB_OBJECT_P_SIZE  	DECIMAL(10,3),
	LONG_OBJECT_P_SIZE 	DECIMAL(10,3),
	XML_OBJECT_P_SIZE  	DECIMAL(10,3) 
	)
IN DB_MAIN_TS
COMPRESS YES

Please note that if you do not have the “Storage Optimisation Feature” from IBM then please do not include the line “COMPRESS YES”, otherwise if the big blue comes to do an audit you could be in trouble. The best thing to avoid this is set the licensing to hard

Stored procedure for recording table sizes using SYSIBMADM.ADMINTABINFO

This is the stored procedure that I use to stored the size of the at the time of running the SP.

CREATE PROCEDURE DB_MAIN.ADD_TABLE_SIZES_STATS   ()
LANGUAGE SQL
BEGIN
    INSERT INTO DB_MAIN.TABLE_SIZES_STATS
    SELECT *
    FROM DB_MAIN.TABLE_SIZES
    WITH UR;
END

What to do next

As stated earlier then you can use this to record the day to day table sizes, or if you are in the process of compressing your tables you can use this to record the sizes before and after. In a future article then I will be using this object created here to show how much table size has decreased in implementing adaptive compression.



Posted in: Blogging, DB2, DB2 Administration, DB2 Built in commands, DB2 built in Views, DB2 Data Types, DB2 Maintenance, DB2 Storage Optimisation, db2licm, Decimal, IBM, SYSIBMADM.ADMINTABINFO / Tagged: DB2, DB2 Administration, DB2 Development, db2licm, IBM DB2 LUW, Meta Data, SYSIBMADM.ADMINTABINFO, V10.1, V9.7

XML Shredding – Data Model and code

October 19, 2012 10:01 pm / Leave a Comment / dangerousDBA
ShreddingERD

In a previous post I wrote about shredding xml especially when you don’t know how your XML is formatted or it is always different, which where I work the XML is not XML it is a pseudo XML that conforms to an XML column.

What am I giving you

So below you will find some “light” documentation on the system that is like the system that I have created at my workplace to shred XML and store it in a way that will be easy to query. I will also go through some of the GOTCHA’s that I have seen so far in doing this. At the bottom of the page you will find a file with all the code you need.

GOTCHA’s

  1. You may find that no size of VARCHAR column is big enough to hold your XML data in the query especially when the initial stages of the recursive query therefore you may need to to use a CLOB column. I have found this is a little quicker than a very large VARCHAR column, and you will not need to create a temporary tablespace bufferpool so large to hold the query.
  2. You may have elements that are named the same thing (e.g. email, Email, EMAIL, eMAIL) and exist down the same XPATH then I have created the SHREDDED_REQUESTS_ELEMENTS_GROUP table and this will create a store for all the same named and located elements normalised around the XPATH and element name being lowered.
  3. The query code will produce entries that contain multiple values when the element is not the lowest element. So if you say had /booking/personal with /email and /name under it then you would get three rows output with /booking/personal with a value of the the values of /email and /name concatenated together, and the two additional rows of /booking/personal/email and /booking/personal/name, therefore you will need to build into the WHERE clause to exclude any paths like /booking/personal out of the result sets.

XML Shredding – ERD

XML Shredding – The Tables

There are four tables in the solution a staging table and then three others that make up a mini-flake like schema.

SHREDDED_REQUESTS_STAGE

The staging tables for the solution where the initial shredding of the XML is inserted into. This table is where the rest of the processing starts. The thing to watch out for here is the size of the XML_ELEMENT_VALUE column; here you need to make sure that you do not make this column too small. You could get rid of the date column but I find it useful if you are processing multiple days.

SHREDDED_REQUESTS

This table is where the data resides at the end of the process and resides for the rest of the data life cycle. It is essentially the same staging table but I have made a surrogate key of the the column XML_ELEMENT_NAME and XML_XPATH. This is to reduce the storage needs of the table and makes it easier to search.

SHREDDED_REQUEST_ELEMENTS

This is the store for the XML_ELEMENT_NAME and XML_XPATH once they have been made a surrogate. It also holds the key out too the grouping table.

SHREDDED_REQUESTS_ELEMENTS_GROUPS

This is the “final” table in the solution, and you may be able to leave it off because I had to create it too enable the normalisation of all the same element names in different case (e.g. EMAIL, Email, email and eMail) issues in the XML_ELEMENT_NAME and XML_XPATH.

XML Shredding – The Process

The process happens in four stages:

1.Shed the XML into the staging table

So this stage has to happen otherwise there is very little point in this process. It uses a bit of recursive SQL as out lined in a previous post and all the element values, XPaths and element names with the appropriate identifier.

To speed this up and in the code supplied in the attachment you will see that I use an EXPORT statement to unload and an LOAD to make sure that the process happens as fast as possible, with the minimum of log contention and use. It is basic maths that you are going to be inserting a lot of rows, say your site does 100,000 requests and there are 50 elements on each request that is 5,000,000 rows!

So something like:


CREATE PROCEDURE {SCHEMA}.ADD_SHREDDED_REQUESTS_STAGE(IN IN_START DATE, IN IN_END DATE)
LANGUAGE SQL
BEGIN
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Takes in two dates
    --Does this using a EXPORT and LOAD so that it works quicker than insert
    --   as the actual query takes no time at all and it is only the inserting 
    --   that takes the time.
    --Uses the ADMIN_EST_INLINE_LENGTH Function from http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.sql.rtn.doc/doc/r0054083.html 
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------

    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Declare Vars
        DECLARE EXPORTSTRING VARCHAR(3000);
        DECLARE LOADSTRING VARCHAR(3000);

    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Process
            --Create the EXPORT string
            SET EXPORTSTRING = 'EXPORT TO "/{some file system you can output too}/SHREDDED_REQUESTS_STAGE" OF DEL MODIFIED BY COLDEL0x09 
                                WITH pathstable (REQUEST_ID,REQUEST_DATE, name, node, xpath, value) AS (
                                  SELECT HM.REQUEST_ID,
                                         HM.REQUEST_DATE,
                                         x.name AS name, 
                                         x.node AS xmlnode,
                                         ''/'' || x.name AS xpath,
                                         x.value as value
                                  FROM {SCHEMA}.{TABLE WITH XML COLUMN} HM,
                                       XMLTABLE(''$REQUEST_XML/*''
                                        COLUMNS
                                          name varchar(300) PATH ''./local-name()'',
                                          value varchar(12000) PATH ''xs:string(.)'',
                                          node    XML PATH ''.'') AS x
                                   WHERE HR.REQUEST_DATE BETWEEN ''' || IN_START || ''' AND ''' || IN_END || '''                                        
                                        AND ADMIN_EST_INLINE_LENGTH(HM.REQUEST_XML) {} -1
                                  UNION ALL
                                  SELECT REQUEST_ID,
                                         REQUEST_DATE,
                                         y.name AS name, 
                                         y.node AS xmlnode, 
                                         xpath|| ''/'' || y.name AS xpath,
                                         y.value as value
                                  FROM pathstable,
                                       XMLTABLE(''$XMLNODE/(*,@*)'' PASSING pathstable.node AS "XMLNODE"
                                        COLUMNS
                                         name varchar(300) PATH ''local-name()'',
                                         value varchar(12000) PATH ''xs:string(.)'',
                                         node    XML PATH ''.'') AS y
                                ) SELECT REQUEST_ID, REQUEST_DATE, name, xpath, value
                                  FROM pathstable
                                  WHERE NAME {} ''serialized''
                                    AND NAME {} ''_viewstate''
                                    AND NAME {} ''__shorturl_postdata'' 
                                    AND NAME {} ''API_Reply''
                                    AND NAME {} ''apiReply'' ';

            --Execute the EXPORT string
            CALL SYSPROC.ADMIN_CMD(EXPORTSTRING);

            --Create the LOAD string
            SET LOADSTRING = 'LOAD FROM "/{some file system you can output too}/SHREDDED_REQUESTS_STAGE" OF DEL MODIFIED BY COLDEL0x09
                              METHOD P (1,2,3,4,5)
                              INSERT INTO {SCHEMA}.SHREDDED_REQUESTS_STAGE(
                                REQUEST_ID,
                                REQUEST_DATE,
                                XML_ELEMENT_NAME,
                                XML_XPATH,
                                XML_ELEMENT_VALUE
                              ) COPY YES TO "/{some file system you can output too}-copy/" INDEXING MODE AUTOSELECT';

            --Execue the LOAD string
            CALL SYSPROC.ADMIN_CMD(LOADSTRING);

END

2.Gather the group elements

If you have nicely formed XML without upper / lower case and everything in-between you might not need this stage, but here I gather the element names and paths and normalise them to lower, and insert the new ones to be used in the next stage. All the normalised values are given an ID

So something like:


CREATE PROCEDURE {SCHEMA}.ADD_SHREDDED_REQUEST_ELEMENTS_GROUPS()
LANGUAGE SQL
BEGIN
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Add based on what is not in the main elements table the new combinations
    --   from the shredded requests stage table that do not exist. 
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------

    INSERT INTO {SCHEMA}.SHREDDED_REQUEST_ELEMENTS_GROUPS(
        LOWER_XML_ELEMENT_NAME,
        LOWER_XML_XPATH
    )
    SELECT DISTINCT LOWER(HSRS.XML_ELEMENT_NAME) AS LOWER_XML_ELEMENT_NAME,
                    LOWER(HSRS.XML_XPATH) AS LOWER_XML_XPATH
     FROM {SCHEMA}.SHREDDED_REQUESTS_STAGE HSRS
     WHERE NOT EXISTS (SELECT *
                      FROM {SCHEMA}.SHREDDED_REQUEST_ELEMENTS_GROUPS HSRE
                      WHERE LOWER(HSRS.XML_ELEMENT_NAME) = HSRE.LOWER_XML_ELEMENT_NAME
                        AND LOWER(HSRS.XML_XPATH) = HSRE.LOWER_XML_XPATH);

    COMMIT;
END

3.Gather the elements

At this stage I gather all the new elements and paths (un-normailised) and put them into the table matched against the grouped version. This gives you a table that can be used in the next stage to create the final table with a far more index friendly integer for the elements and paths.

Again an example of the SP:


CREATE PROCEDURE {SCHEMA}.ADD_SHREDDED_REQUEST_ELEMENTS()
LANGUAGE SQL
BEGIN
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Add based on what is not in the main elements table the new combinations
    --   from the shredded requests stage table that do not exist. 
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------

    INSERT INTO {SCHEMA}.SHREDDED_REQUEST_ELEMENTS(
        XML_ELEMENT_NAME,
        XML_XPATH,
        SHREDDED_REQUEST_ELEMENT_GROUP_ID
    )
    SELECT DISTINCT HSRS.XML_ELEMENT_NAME,
                    HSRS.XML_XPATH,
                    HSREG.SHREDDED_REQUEST_ELEMENT_GROUP_ID
    FROM {SCHEMA}.SHREDDED_REQUESTS_STAGE HSRS INNER JOIN {SCHEMA}.SHREDDED_REQUEST_ELEMENTS_GROUPS HSREG ON LOWER(HSRS.XML_ELEMENT_NAME) = HSREG.LOWER_XML_ELEMENT_NAME
                                                                                                    AND LOWER(HSRS.XML_XPATH) = HSREG.LOWER_XML_XPATH
    WHERE NOT EXISTS (SELECT *
                      FROM {SCHEMA}.SHREDDED_REQUEST_ELEMENTS HSRE
                      WHERE HSRS.XML_ELEMENT_NAME = HSRE.XML_ELEMENT_NAME
                        AND HSRS.XML_XPATH = HSRE.XML_XPATH);

    COMMIT;
END
GO

4.Add to the permanent store

Use the table from the previous stage and load the data into the final table SHREDDED_REQUESTS. Loading only the surrogate key for the element and path of the XML value.

Something like:


CREATE PROCEDURE {SCHEMA}.ADD_SHREDDED_REQUESTS()
LANGUAGE SQL
BEGIN
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Add to the main table based on what is in the staging table. 
    --Everything to go over so if there is 1 day or 100 all goes!!
    --Using Export and load to get it done quicker and save on log space. 
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------

    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Declare Vars
        DECLARE EXPORTSTRING VARCHAR(3000);
        DECLARE LOADSTRING VARCHAR(3000);

    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Process

    --Create the EXPORT string
    SET EXPORTSTRING = 'EXPORT TO "/{some file system you can output too}/SHREDDED_REQUESTS" OF DEL MODIFIED BY COLDEL0x09
                        SELECT HSRS.REQUEST_ID,
                               HSRS.REQUEST_DATE,
                               HSRE.SHREDDED_REQUEST_ELEMENT_ID,
                               HSRS.XML_ELEMENT_VALUE
                        FROM {SCHEMA}.SHREDDED_REQUESTS_STAGE HSRS INNER JOIN {SCHEMA}.SHREDDED_REQUEST_ELEMENTS HSRE ON HSRS.XML_ELEMENT_NAME = HSRE.XML_ELEMENT_NAME
                                                                                                                AND HSRS.XML_XPATH = HSRE.XML_XPATH';

    --Execute the EXPORT string
    CALL SYSPROC.ADMIN_CMD(EXPORTSTRING);

    --Create the LOAD string
    SET LOADSTRING = 'LOAD FROM "/{some file system you can output too}/SHREDDED_REQUESTS" OF DEL MODIFIED BY COLDEL0x09
                      METHOD P (1,2,3,4)
                      INSERT INTO {SCHEMA}.SHREDDED_REQUESTS(
                        REQUEST_ID, 
                        REQUEST_DATE, 
                        SHREDDED_REQUEST_ELEMENT_ID, 
                        XML_ELEMENT_VALUE
                      ) COPY YES TO "/{some file system you can output too}-copy/" INDEXING MODE AUTOSELECT';
                      
    --Execue the LOAD string
    CALL SYSPROC.ADMIN_CMD(LOADSTRING);
END

XML Shredding – The file

As a reward for getting through all of the above, or for skipping it all and going for the juicy bit at the end here is the file – enjoy

GOTCHA’s
  1. I have compression on the servers at work, I have kept COMPRESS YES statement in, if you don’t have it then you will need to remove it other wise you will be in violation of your licences.
  2. You will have to change all {values} to get it to work, to your schemas etc that are relevant to your servers
  3. Please make sure you test before you use, you have been warned



Posted in: Date, DB2 Administration, DB2 Data Types, DB2 Development, Decompose XML, IBM DB2 LUW, Recursive Query, Recursive SQL, Shred XML, V10, V9.7, Vargraphic, XML, XMLTABLE / Tagged: ADMIN_EST_INLINE_LENGTH, DB2, DB2 Administration, DB2 Development, Decompose XML, EXPORT, IBM, IBM DB2 LUW, LOAD, Recursive Query, Recursive SQL, V10.1, V9.7, XML, XML PATH, XMLTABLE

DB2 10 for Linux, UNIX and Windows Bootcamp – Day 4

September 24, 2012 9:56 pm / Leave a Comment / dangerousDBA

Last day on the course, realisation of leaving the most excellent Hursley campus that reminded me a lot where I went to University at Keele, and that the migration from DB2 V9.7 to V10 on Tuesday. As promised on Thursday here is the late post for the final day of the course. A lot was covered on the last day so a quick round up

DB2 10 for Linux, UNIX and Windows Bootcamp – Day 4 – What have we done – my prospective

Enhancing Analytic’s with DB2

So the new tools that fall under this section were my main interest in this section, but that’s not to belittle the other work that IBM have done in this area with the improved index scanning and query performance.

You can now create JAVA based table functions that could be called to return the content of a flat file into the FROM part of your query. So this means that you could at the most basic level have something like :


SELECT *
FROM TABFUNC.GET_DATA_CSV('Some File Name') AS (Col1 Integer, Col2, Varchar(20))

It would then return you a table that can be queried and used in a sub-query etc

Continuous Data Ingest for Large Environments

The new INGEST utility, now there was a little bit of confusion over the usage according to the instructor, needing AESE edition of DB2, but the material supplied on the course says it is no additional cost! The hands on lab showed it off brilliantly, with the light ETL ability of the tool, I cant wait to start to use it.

DB2 pureScale

So this is touted as being better than HADR as it is nearly impossible to take offline unless major site failure and now to get round that IBM have developed the product offering to be geographically dispersed with examples that they can give on request of getting it too work over 30 to 40 km. This means that your OLTP processing system can have more 9’s added to the end of your 99%.

Workload Management in DB2

IBM have introduced a new way too regulate your users, along with all the old features there is now a way to use your storage groups to help this out. With a storage group is the new concept of a DATA TAG. This DATA TAG you can give your storage groups a relative higher or lower priority. Therefore the SSD storage groups (this quarters data) could have a DATA TAG of 1 and a SATA drive (this years data) could have a DATA TAG of 5 and a “really” old PATA drive (past five years data) to a TAG of 10. This then means you could have two queries, one that touches the most recent data with the DATA TAG of 1 being set up in Workload manager to run a lot quicker, than the other query which compared this month to a month five years ago, would get limited and “retarded” as it has hit the DATA TAG of 10, which is relatively lower than 1.

Data Concurrency & DB2 pureXML Overview

There has not been a lot change here so I have grouped these two topics together as I cant think of anything to mention here, sorry IBM.

Bootcamp Overview and Exam results

This was very worth going thank you to Mark for presenting and proctoring the exams. It gave a great overview of the new features, and hands on experience of features that I do not get to use at work. I am looking forward to moving tomorrow on to V10 and testing out new features like adaptive compression. I also passed the “DB2 V10 fundamentals” exam, so maybe at some point I will get round to doing the DBA exam.



Posted in: Bootcamp, Continuous Data Ingest for Large Environments, Data Concurrency, DB2 Administration, DB2 Built in commands, DB2 built in functions, DB2 Built-in Stored Procedures, DB2 Data Types, DB2 Development, DB2 pureScale, DB2 pureXML Overview, Enhancing Analytic's with DB2, IBM DB2 LUW, V10, V9.7, Workload Management in DB2 / Tagged: Bootcamp, DB2, DB2 Administration, DB2 Development, IBM DB2 LUW, V10.1, V9.7

Post Navigation

← Older Posts
 

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Join 757 other subscribers

Recent Posts

  • Self generating Simple SQL procedures – MySQL
  • Google Cloud Management – My Idea – My White Whale?
  • Position Tracker – The Stub – Pandas:
  • Position Tracker – The Stub
  • Position Tracker – In the beginning
  • Whats been going on in the world of the Dangerous DBA:
  • QCon London Day 1
  • Testing Amazon Redshift: Distribution keys and styles
  • Back to dangerous blogging
  • DB2 10.1 LUW Certification 611 notes 1 : Physical Design

Dangerous Topics

added functionality ADMIN_EST_INLINE_LENGTH Bootcamp colum convert data types DB2 db2 DB2 Administration DB2 Development db2advis db2licm Decompose XML EXPORT GCP Google IBM IBM DB2 LUW idug information centre infosphere IOT LOAD merry christmas and a happy new year Position Tracking python Recursive Query Recursive SQL Reorganisation Reorganise Reorganise Indexes Reorganise Tables Runstats sql statement Stored Procedures SYSPROC.ADMIN_CMD Time UDF User Defined Functions V9.7 V10.1 Varchar XML XML PATH XMLTABLE

DangerousDBA Links

  • DB2 for WebSphere Commerce
  • My Personal Blog

Disclaimer:

The posts here represent my personal views and not those of my employer. Any technical advice or instructions are based on my own personal knowledge and experience, and should only be followed by an expert after a careful analysis. Please test any actions before performing them in a critical or nonrecoverable environment. Any actions taken based on my experiences should be done with extreme caution. I am not responsible for any adverse results. DB2 is a trademark of IBM. I am not an employee or representative of IBM.

Advertising

© Copyright 2023 - Dangerous DBA
Infinity Theme by DesignCoral / WordPress