Paul's OE DBA Blog

I have seen a lot of strange and interesting things over the past 20 years as a Progress OpenEdge DBA. Some of it is funny and some is just...bizarre. I'll try and share some of what I learn right here on my blog.

Dump and Load Strategies Part I: Type II Storage Areas

Dump and Load Strategies Part I: Type II Storage Areas

Congratulations! You finally decided to join the modern world and take advantage of Type II Storage Areas (SAII). Hey - better late than never. There's a lot of old, out-of-date or just-plain-wrong information out there, so ignore all that stuff. I have conveniently digested some useful, pertinent and correct information right here.

Basic stuff

  • You need OpenEdge 10
  • DB Block Size: Pretty self-explanatory. Typically 8Kb on UNIX and 4Kb on Windows and Linux
  • BPC = Blocks Per Cluster: The number of database blocks formatted each time an object requires more space
    • Valid values are 8, 64, 512
  • RPB = Records per Block: How many ROWIDs should be reserved for each database block
    • Valid values are 1, 2, 4, 8, 16, 32, 64, 128 or 256
  • Keep data, indexes and LOBs segregated 
  • Storage area are typically defined in a file called dbname.st

 

Controversial Assumptions 

A few seasoned DBAs will disagree with the following assumptions. These DBAs are wrong except in very specific cases (0.1% of you) and I can and have proven it mathematically.

  1. Use large files. On modern systems, there is no performance impact related to 1Gb or 10Gb or 100Gb database extents
  2. Use variable length extents for everything (AI, BI, D). If you disagree, show me the math in the comments section. Maybe for AI extents you can come up with something intelligent but even that won't affect the business or the users 99% of the time. I.e. the 2% performance "hit" is 2ms to the user and 2s to a batch process. Insignificant!
  3. Put everything (AI, BI, D) on the same filesystem and disks unless you can prove why they should be segregated
    1. Most of the time it's all going to the same SAN [cache] anyways
    2. Most of you wouldn't be able to recover that last 3 minutes of AI notes. This is the reason most people give to justify segregating AI files on separate disks.
  4. If you have a small physical server with a handful of disk (say less than a dozen or so), stop trying to be a RAID hero by carving out two disks for the O.S., two for AI, two for BI and 6 for the DB. What a waste of IOPS. OK - as you get to 8-10 I'll accept a mirrored pair for the O.S. But that's it. Or just get SSD drives.
  5. I don't talk about the Alternate Buffer Pool (-B2) in this blog post. That will change some recommendations and is another blog post for another day.
  6. I don't talk about fragmentation and CREATE/TOSS limits here. Again - another blog post for another day.

In reality, I am giving up a microscopic amount of supposed performance for a huge decrease in headaches and complexity. I think that's a spectacular ROI (return on investment).

 

The Structure File

The QA department at Progress was on vacation when they came up with the structure file format for data extents:

d "GL_History Data":22,32;64 .

Yes, that's the area type (d = data), then the "AREA NAME" in quotes, followed by a colon, then the area number, then a comma, then the RPB, then a semi-colon, then the BPC, then a space and the directory (where "." means "current working directory"). After the directory you can optionally add an "f" for "fixed" and then a file size, but you won't be doing that will you?

The area number must be 7 or higher and unique for each storage area. Easy. But what about RPB and BPC? To answer that, I need a DB Analysis. Let's use the sports2000 DB as an example:

  • Table                                    Records    Size   Min   Max  Mean                Count Factor  Factor
  • PUB.POLine                               5337  217.5K    40    44    41                 5337    1.0     1.0

The mean record size is 41 bytes so in a 4Kb block, I can fit about 100 of these records. Notice I said "about": there is a block header that consumes space but it's existence probably won't change our decisions 99.9% of the time. I want to pick an RPB that's close to 100 so my options are 64 and 128. If I pick 64, then most of the time I'll only use 64 * 41 = 2624 bytes, leaving about 1400 bytes not used. If I pick 128, then I'll fill the block with my +/- 100 records but I'll "waste" 25-30 ROWIDs. The 128 ROWIDs are reserved for each block so if you don't use them, well, they just don't get used. With 64 bit ROWIDs, the maximum number of records per storage areas is in the trillions so most of you probably don't have to worry about running out.  

But Paul, what about an index area? The answer is 64. RPB actually has no meaning for index objects BUT if you follow the often-quoted suggestion of RPB=1 and you accidentally create a table in that area then it's going to get really hilarious really fast. 100K POLine records @ 1 RPB with 4Kb block size = 400 Mg instead of 4Mg! People stop laughing when it becomes 10M records and 40 Gb.

What about BPC? Also easy: use 512 or 8.

Any tiny object (say less than 5Mg total size) goes in a storage area with BPC=8. Everything else goes in a storage area with BPC=512. Will you have a lot of empty blocks in your DB? Yes. Will your DB be much larger than it was before? Probably. Do I care (or should you)? No. Just make sure to use "-com" on your probkup command.

What Goes Where?

I already mentioned that you should segregate data, indexes and LOBs. Let's take this a step further:

  1. Every data area should have it's corresponding index area
  2. Really big and/or busy tables (relative to your DB) should probably have their own SAII
  3. Really small tables should go in a "Misc Data" SAII
  4. Everything in between can go in one or two (or more) data areas. There really isn't a hard and fast rule.

Take a typical ERP database that's 100 Gb in size and has 750 tables. GL History and a couple of other tables will be in the multiple Gb each. Each of those gets it's own SAII with appropriate RPB and BPC=512. 700/750 tables will be empty or only have a handful of records: put all of these in "Misc Data" with RPB=128 and BPC=8. The other 45 tables can all go in one or two "Data Areas" with RPB=128 and BPC=512.

 

Wait! What? This is IN-SANE!

Relax. If you're doing it a different way it's probably not wrong. The most important things are to segregate data/index/lob and to actually use SAII's. Unless you screwed up majorly, you'll get 90% of the bang if you just did those two things.

 

 

Paul

Download ProTop: The #1 Free OpenEdge DB Monitoring Tool: http://dbappraise.com/protop.html

Dump and Load Strategies Part II: Dumping Data
Is Progress Going to Audit Your Licences? YES!
 

Comments 4

Guest - Rick on Wednesday, 06 May 2015 16:49

Hi Paul,

Thanks for this interesting piece of information on SAII. I've been doing quite a few migrations from I to II for our customers. Usually databases are rather *small* (10 Gb max). I've been separating data, index en LOB, but I'm not sure about separation large/active tables.

In your post you state that *really* large or active tables should have their own area. I useally make a point of using analys to group tables together based on ideal RPB - nothing else. Why is is probably better to get large/active tables into their own SAII? Any pointers?

Thanks,
Rick

Hi Paul, Thanks for this interesting piece of information on SAII. I've been doing quite a few migrations from I to II for our customers. Usually databases are rather *small* (10 Gb max). I've been separating data, index en LOB, but I'm not sure about separation large/active tables. In your post you state that *really* large or active tables should have their own area. I useally make a point of using analys to group tables together based on ideal RPB - nothing else. Why is is probably better to get large/active tables into their own SAII? Any pointers? Thanks, Rick
Paul Koufalis on Wednesday, 06 May 2015 20:02

For such small databases (10 Gb), I wouldn't even bother with any of that. I would just create two SAII for data and two SAII for indexes:

Data Area - 128 RPB - 512 BPC and Index Area - 64 RPB - 64 BPC

These two would contain all the data and indexes of tables that have about a Mg or more of data. Yes you will waste some space with empty blocks within clusters but who cares? It's 10 Gb and if you backup with -com you won't backup those blocks.

Misc Data - 128 RPB - 8 BPC and Misc Index - 64 RPB and 8 BPC

Any table that has less than a Mg of data I would stick in here.

Regarding your method of grouping around ideal RPB: it's not any better or worse than any other idea. I used to do it that way but I stopped because it ended up being an exercise in futility. Really unless you have a HUGE table that needs 256 RPB, you can probably get away with 128 RPB for most areas.

Regarding your question "Why is it probably better to get large/active tables into their own SAII?" Some database maintenance activities are better/faster/easier. Truncate area is one that comes to mind right away: when purging historical data, it's often faster to dump what you want to keep, truncate the area then reload the data. Index rebuild is another: you can truncate the index area so that the scan doesn't pass it.

For such small databases (10 Gb), I wouldn't even bother with any of that. I would just create two SAII for data and two SAII for indexes: [u]Data Area - 128 RPB - 512 BPC and Index Area - 64 RPB - 64 BPC[/u] These two would contain all the data and indexes of tables that have about a Mg or more of data. Yes you will waste some space with empty blocks within clusters but who cares? It's 10 Gb and if you backup with -com you won't backup those blocks. [u]Misc Data - 128 RPB - 8 BPC and Misc Index - 64 RPB and 8 BPC[/u] Any table that has less than a Mg of data I would stick in here. Regarding your method of grouping around ideal RPB: it's not any better or worse than any other idea. I used to do it that way but I stopped because it ended up being an exercise in futility. Really unless you have a HUGE table that needs 256 RPB, you can probably get away with 128 RPB for most areas. Regarding your question "Why is it probably better to get large/active tables into their own SAII?" Some database maintenance activities are better/faster/easier. Truncate area is one that comes to mind right away: when purging historical data, it's often faster to dump what you want to keep, truncate the area then reload the data. Index rebuild is another: you can truncate the index area so that the scan doesn't pass it.
Guest - RichardS on Friday, 08 May 2015 11:07

In this statement there is a typo:
But Paul, what about an index area? The answer is 64. RPB actually has no meaning for index objects BUT if you follow the often-quoted suggestion of BPC=1 and you accidentally create a table in that area then it's going to get really hilarious really fast. 100K POLine records @ 1 RPB with 4Kb block size = 400 Mg instead of 4Mg! People stop laughing when it becomes 10M records and 40 Gb.

BPC=1 in the first line should be RPB=1.

In this statement there is a typo: But Paul, what about an index area? The answer is 64. RPB actually has no meaning for index objects BUT if you follow the often-quoted suggestion of BPC=1 and you accidentally create a table in that area then it's going to get really hilarious really fast. 100K POLine records @ 1 RPB with 4Kb block size = 400 Mg instead of 4Mg! People stop laughing when it becomes 10M records and 40 Gb. BPC=1 in the first line should be RPB=1.
Paul Koufalis on Friday, 08 May 2015 13:38

You are correct. I will fix.

You are correct. I will fix.
Already Registered? Login Here
Guest
Saturday, 21 April 2018

Captcha Image