Dump and Load Strategies Part I: Type II Storage Areas

Congratulations! You finally decided to join the modern world and take advantage of Type II Storage Areas (SAII). Hey – better late than never. There’s a lot of old, out-of-date or just-plain-wrong information out there, so ignore all that stuff. I have conveniently digested some useful, pertinent and correct information right here.

Basic stuff

    • You need OpenEdge 10
    • DB Block Size: Pretty self-explanatory. Typically 8Kb on UNIX and 4Kb on Windows and Linux
    • BPC = Blocks Per Cluster: The number of database blocks formatted each time an object requires more space
      • Valid values are 8, 64, 512
    • RPB = Records per Block: How many ROWIDs should be reserved for each database block
      • Valid values are 1, 2, 4, 8, 16, 32, 64, 128 or 256
    • Keep data, indexes and LOBs segregated
    • Storage area are typically defined in a file called dbname.st

Controversial Assumptions

A few seasoned DBAs will disagree with the following assumptions. These DBAs are wrong except in very specific cases (0.1% of you) and I can and have proven it mathematically.

  1. Use large files. On modern systems, there is no performance impact related to 1Gb or 10Gb or 100Gb database extents
  2. Use variable length extents for everything (AI, BI, D). If you disagree, show me the math in the comments section. Maybe for AI extents you can come up with something intelligent but even that won’t affect the business or the users 99% of the time. I.e. the 2% performance “hit” is 2ms to the user and 2s to a batch process. Insignificant!
  3. Put everything (AI, BI, D) on the same filesystem and disks unless you can prove why they should be segregated
    1. Most of the time it’s all going to the same SAN [cache] anyways
    2. Most of you wouldn’t be able to recover that last 3 minutes of AI notes. This is the reason most people give to justify segregating AI files on separate disks.
  4. If you have a small physical server with a handful of disk (say less than a dozen or so), stop trying to be a RAID hero by carving out two disks for the O.S., two for AI, two for BI and 6 for the DB. What a waste of IOPS. OK – as you get to 8-10 I’ll accept a mirrored pair for the O.S. But that’s it. Or just get SSD drives.
  5. I don’t talk about the Alternate Buffer Pool (-B2) in this blog post. That will change some recommendations and is another blog post for another day.
  6. I don’t talk about fragmentation and CREATE/TOSS limits here. Again – another blog post for another day.

In reality, I am giving up a microscopic amount of supposed performance for a huge decrease in headaches and complexity. I think that’s a spectacular ROI (return on investment).

The Structure File

The QA department at Progress was on vacation when they came up with the structure file format for data extents:

d “GL_History Data”:22,32;64 .

Yes, that’s the area type (d = data), then the “AREA NAME” in quotes, followed by a colon, then the area number, then a comma, then the RPB, then a semi-colon, then the BPC, then a space and the directory (where “.” means “current working directory”). After the directory you can optionally add an “f” for “fixed” and then a file size, but you won’t be doing that will you?

The area number must be 7 or higher and unique for each storage area. Easy. But what about RPB and BPC? To answer that, I need a DB Analysis. Let’s use the sports2000 DB as an example:

  • Table                                    Records    Size   Min   Max  Mean                Count Factor  Factor
  • PUB.POLine                               5337  217.5K    40    44    41                 5337    1.0     1.0

The mean record size is 41 bytes so in a 4Kb block, I can fit about 100 of these records. Notice I said “about”: there is a block header that consumes space but it’s existence probably won’t change our decisions 99.9% of the time. I want to pick an RPB that’s close to 100 so my options are 64 and 128. If I pick 64, then most of the time I’ll only use 64 * 41 = 2624 bytes, leaving about 1400 bytes not used. If I pick 128, then I’ll fill the block with my +/- 100 records but I’ll “waste” 25-30 ROWIDs. The 128 ROWIDs are reserved for each block so if you don’t use them, well, they just don’t get used. With 64 bit ROWIDs, the maximum number of records per storage areas is in the trillions so most of you probably don’t have to worry about running out.

But Paul, what about an index area? The answer is 64. RPB actually has no meaning for index objects BUT if you follow the often-quoted suggestion of RPB=1 and you accidentally create a table in that area then it’s going to get really hilarious really fast. 100K POLine records @ 1 RPB with 4Kb block size = 400 Mg instead of 4Mg! People stop laughing when it becomes 10M records and 40 Gb.

What about BPC? Also easy: use 512 or 8.

Any tiny object (say less than 5Mg total size) goes in a storage area with BPC=8. Everything else goes in a storage area with BPC=512. Will you have a lot of empty blocks in your DB? Yes. Will your DB be much larger than it was before? Probably. Do I care (or should you)? No. Just make sure to use “-com” on your probkup command.

What Goes Where?

I already mentioned that you should segregate data, indexes and LOBs. Let’s take this a step further:

  1. Every data area should have it’s corresponding index area
  2. Really big and/or busy tables (relative to your DB) should probably have their own SAII
  3. Really small tables should go in a “Misc Data” SAII
  4. Everything in between can go in one or two (or more) data areas. There really isn’t a hard and fast rule.

Take a typical ERP database that’s 100 Gb in size and has 750 tables. GL History and a couple of other tables will be in the multiple Gb each. Each of those gets it’s own SAII with appropriate RPB and BPC=512. 700/750 tables will be empty or only have a handful of records: put all of these in “Misc Data” with RPB=128 and BPC=8. The other 45 tables can all go in one or two “Data Areas” with RPB=128 and BPC=512.

Wait! What? This is IN-SANE!

Relax. If you’re doing it a different way it’s probably not wrong. The most important things are to segregate data/index/lob and to actually use SAII’s. Unless you screwed up majorly, you’ll get 90% of the bang if you just did those two things.

Paul
White Star Software

No Comments

Sorry, the comment form is closed at this time.