As part of my career development it has been on the cards for me to do some certification at my current company for about four and a half years, and now they are giving me some time for it again. As you know from a previous post then I have being trying to revise for this using something that IBM published just after DB2 10.1 LUW came out and it is very verbose and boring (sorry). It seems more like a technical manual of all the ins and outs of the application than revision guide. There are now some better resources produced and can be found here. This blog post is going to cover some of the things I learn’t from Part 2: Physical Design which can be found here, please take the time to read the doc for yourself it is full of good stuff some of which I think I will be trying out in future.
Physical Design – Exam notes
WITH CHECK OPTION
This seems to be a crafty little three word clause that can add a little security to your views, but it seems at the cost of having more of a maintenance overhead for an application. The example in the document says “WHERE dept = 10“, and goes on to say how if you tried the insert for any other dept no it would not work. This is good as it would stop insertion for dept 11, but would mean that you would need one of these views for each “dept” that you needed.
I have come across these before still not really sure what they are useful for, and there is not a good explanation of them in this doc. If they are ENFORCED then how is this different from a check constraint? and if it is ENABLED QUERY OPTIMIZATION then why would you want to make it difficult to find the wrong data rows?
Storage Groups / Multi Temperature data
This is a new feature to DB2 LUW 10.1 and I have found it hard so far to get a clear definition of how the original paths you supply a create database for its data paths and automated storage relate too multi temperature data storage groups. This doc does not make it as easy as I think I have now got it but I will give it a go in a series of bullet points (how I understand it):
- The data paths that you define when you create a database go to create the default storage group : IBMSTOGROUP.
- You can then create additional storage groups using the CREATE STOGROUP command for hot etc data
- The three basic tables spaces that are created at database creation (USERSPACE1, TEMPSPACE1 and SYSCATSPACE1) are created in the default IBMSTOGROUP, that is on the paths that you specified to begin with.
- Tablespaces are then created USING the newly created storage groups and rather than the data being stored in the default data path it is stored where specified.
- If you are lucky enough to have workload manager then Storage groups can be given tags to optimize the performance of this – bonus.
- As I understand it though if you only specify one path in you storage group then you data will not be striped as only one container will be created
- You then create your partitioned tables (or tables) in these tablespaces appropriately. So the hot part of your partitioned table will go in the hot STOGROUP attached tablespace.
So hopefully the above makes this a little clearer, it does for me. We are currently looking at completely re architecturing our data-pile at the moment and a lot of the design decisions that this offers will be taken into consideration.
Range Clustered Tables
This seems like a most useful feature for star schema data warehouses where there are surrogate or business keys that are generally some form of whole number (SMALLINT, INTEGER or BIGINT). There does seem to be two massive GOTCHA’s in the text though (taken from the text):
- DB2 pre-allocates the disk space required for the RCT at creation time. This is done by calculating the number of distinct key values and multiplying it by the table row size. This mandates that the space required for holding the entire table should be available at creation time
- you cannot issue an ALTER TABLE statement to alter the physical characteristics of an RCT after its creation
Now I can see this being an issue if you have the full range of a BIGINT * the size of the table fields for each row allocated and used at time of creation, but you can limit this by using the “starting from” and “ending” clauses to the key sequences.
Range clustered tables also have another useful clause of “DISALLOW OVERFLOWS”, this apparently means that reorganization operations are not required, but you will not be allowed to insert any rows that exist outside the limits that you have set, so could be another way to stop GIGO.
This seems a most useful stored procedure for moving tables from say DEV to PRODUCTION or if you find that your creation script has accidentally left the wrong table space in the statement and your table needs moving out. Tables can be moved online but this takes more resource in terms of processor and disk space. Most interesting in terms of keeping your data online and generating a new compression dictionary (and therefore also a reorg?). The full command can be found here
These are new and will defiantly feature when we re-architecture the data dump that we have currently. I have had to code logic to capture the versions of the subject and then other objects to find the latest version efficiently across multi million row data sets. The new temporal tables will take care of this for you, and still allow you to access the old data in a nearly standard query. The other bonus is that these tables can also be partitioned, but you must make sure the history table can cope with the rows that will be inserted into it.
Well these are my high level take away’s on the extra stuff that seems to be on the paper compared too the V9.7 exam (that I never got round to sitting). There seems to be a lot less detail in this paper compared to the one that has taken me so long to read as it contain so much detail, which leaves me to wonder is the exam a middle ground of the two? Would highly suggest reading the serise and I will be doing more posts in the mean time on the other papers in the set, found at the link at the top of the article.