PostgreSQL Large Objects and space usage (part 2)

In my previous post I showed how large objects use space inside the table pg_largeobject when inserted.

Let’s see something more:

The table had 2 large objects (for a total of 1024 records):

Let’s try to add another random-padded file:

As expected, because a random sequence of characters cannot be compressed, the size increased again by 171 blocks (see my previous post for the explanation).

If you have read this nice series of blog posts by Frits Hoogland, you should know about the pageinspect extension and the t_infomask 16-bit mask.

Let’s install it and check the content of the pg_largeobject pages:
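A minimal sketch of such a check, using block 0 as an example (page_header() exposes the lower and upper pointers of a page):

    CREATE EXTENSION IF NOT EXISTS pageinspect;

    -- lower/upper delimit the free space between the line pointers and the tuple data
    SELECT lower, upper, pagesize
    FROM   page_header(get_raw_page('pg_largeobject', 0));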

We already know the mathematics, but we love having all the pieces come together 🙂

We know that the page header is 24 bytes and that each line pointer uses 4 bytes.

The first 4 pages have the lower offset at 452 bytes, which means that we have (452-24)/4 = 107 tuples each.

The 5th page (page number 4) has the lower offset at 360: (360-24)/4 = 84 tuples.

The remaining pages have the lower offset at 36: (36-24)/4 = 3 tuples.

Let’s check if we are right:
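One way to verify it, still with pageinspect (the block numbers are the ones discussed above):

    -- number of line pointers present in block 0 (expected: 107)
    SELECT count(*) FROM heap_page_items(get_raw_page('pg_largeobject', 0));

    -- and in block 4 (expected: 84)
    SELECT count(*) FROM heap_page_items(get_raw_page('pg_largeobject', 4));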

🙂
Now, let’s delete the 1 MB file and check the space again:
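Something along these lines, where the OID is a placeholder for the one returned by lo_import earlier:

    SELECT lo_unlink(16411);                    -- placeholder OID of the 1 MB large object
    SELECT pg_relation_size('pg_largeobject');  -- same size as before the delete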

The space is still used and the tuples are still there.

However, we can check that the tuples are no longer used by checking the validity of their t_xmax. In fact, according to the documentation, if the XMAX is invalid the row is the latest version:

[…] a tuple is the latest version of its row iff XMAX is invalid or t_ctid points to itself (in which case, if XMAX is valid, the tuple is either locked or deleted). […]
 (from htup_details.h lines 87-89).
We have to check the infomask against the 12th bit (2048, or 0x0800):
#define HEAP_XMAX_INVALID       0x0800  /* t_xmax invalid/aborted */
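So a possible check, reusing heap_page_items() on block 0 as an example, is:

    SELECT lp, t_xmax,
           (t_infomask & 2048) > 0 AS xmax_invalid   -- true: latest version, false: deleted or locked
    FROM   heap_page_items(get_raw_page('pg_largeobject', 0));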

Here we go. The large objects are split into compressed chunks that internally behave the same way as regular rows!

If we import another large object, we will see that the space is not reused:

Flagging the tuples as reusable is the vacuum’s job:

The normal vacuum does not release the empty space, but the space can now be reused:

If we unlink the lob again and do a vacuum full, the empty space is released:
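For reference, the whole sequence is roughly the following (the OID is again a placeholder):

    SELECT lo_unlink(16412);              -- placeholder OID of the re-imported lob
    VACUUM pg_largeobject;                -- marks the dead tuples as reusable, does not shrink the file
    VACUUM FULL pg_largeobject;           -- rewrites the table and returns the space to the filesystem
    SELECT pg_relation_size('pg_largeobject');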

PostgreSQL Large Objects and space usage (part 1)

PostgreSQL uses a nice, non-standard mechanism for big columns called TOAST (hopefully I will blog about it in the future) that can be compared to extended data types in Oracle (TOAST rows, by the way, can be much bigger). But traditional large objects exist and are still used by many customers.

If you are new to large objects in PostgreSQL, read here. For TOAST, read here.

Inside the application tables, the columns for large objects are defined as OIDs that point to data chunks inside the pg_largeobject table.


Because large objects are created independently from the table columns that reference them, when you delete a row from the table that points to a large object, the large object itself is not deleted.

Moreover, pg_largeobject stores by design all the large objects that exist in the database.

This makes housekeeping and maintenance of this table crucial for database administration (we will see this in a future post).

How is space organized for large objects?

We will see it by examples. Let’s start with an empty database with empty pg_largeobject:
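The checks used below are along these lines (standard catalog functions, nothing exotic):

    SELECT relpages, reltuples FROM pg_class WHERE relname = 'pg_largeobject';
    SELECT pg_relation_filepath('pg_largeobject');  -- path of the data file, relative to PGDATA
    SELECT pg_relation_size('pg_largeobject');      -- physical size in bytes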

Just one block. Let’s see its file on disk:

First evidence: the file is empty, meaning that the first block is not created physically until there’s some data in the table (like deferred segment creation in Oracle, except that the file exists).

Now, let’s create two 1 MB files for our tests, one zero-padded and another random-padded:

Let’s import the zero-padded one:
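For example, with the server-side lo_import() (the path here is just an example; psql’s \lo_import does the same from the client side):

    SELECT lo_import('/tmp/zeropadded.dat');    -- returns the OID of the new large object
    SELECT count(*) FROM pg_largeobject;        -- 512 rows: one chunk every 2048 bytes
    SELECT pg_relation_size('pg_largeobject');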

The large objects are split into chunks of 2048 bytes each, hence we have 512 pieces. What about the physical size?

Just 40k! This means that the chunks are compressed (like the TOAST pages). PostgreSQL uses the pglz_compress function; its algorithm is well explained in the source code, src/common/pg_lzcompress.c.

What happens when we insert the random-padded file?

The segment increased by much more than 1 MB! Precisely, 1441792-40960 = 1400832 bytes. Why?

The large object is split again into 512 data chunks of 2048 bytes each, and again PostgreSQL tries to compress them. But because a random string cannot be compressed, the pieces still average 2048 bytes.

Now, a database block size is 8192 bytes. If we subtract the size of the block header, there is not enough space for 4 chunks of 2048 bytes, so every block will contain just 3 non-compressed chunks.

So, 512 chunks will be distributed over CEIL(512/3.0) = 171 blocks which, at 8192 bytes per block, gives:

171 × 8192 = 1400832 bytes!

Depending on the compression rate that we can apply to our large objects, we might expect much more or much less space used inside the pg_largeobject table.

Which Oracle Databases use most CPU on my server?

Assumptions

  • You have many (hundreds) of instances and more than a couple of servers
  • One of your servers has a high CPU load
  • You have Enterprise Manager 12c but the Database Load does not filter by server
  • You want a historical representation of the user CPU utilization, per instance

Getting the data from the EM Repository

With the following query, connected to the SYSMAN schema of your EM repository, you can get the hourly max() and/or avg() of user CPU by instance and time.
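I am not reproducing the original query here, but a sketch against the standard MGMT$METRIC_HOURLY view gives the idea; the metric_name and metric_column values are assumptions that you should verify against your repository version:

    -- hourly user CPU per database instance, from the EM repository (connected as SYSMAN)
    SELECT target_name,
           rollup_timestamp,
           max(maximum) AS max_cpu,
           max(average) AS avg_cpu
    FROM   mgmt$metric_hourly
    WHERE  target_type   = 'oracle_database'
    AND    metric_name   = 'instance_efficiency'   -- assumption: adjust for your EM version
    AND    metric_column = 'cpuusage_ps'           -- assumption: adjust for your EM version
    GROUP  BY target_name, rollup_timestamp
    ORDER  BY rollup_timestamp, target_name;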

Suppose you select just the max value: the result will be similar to this:

 

Putting it into Excel

There are a million ways to do something more reusable than Excel (rrdtool scripts, gnuplot, R, you name it), but Excel is just right for most people out there (including me when I feel lazy).

  • Configure an Oracle Client and add the ODBC data source to the EM repository:


  • Open Excel, go to “Data” – “Connections” and add a new connection:
    • Search…
    • New Source
    • DSN ODBC
  • Select your new ODBC data source, user, password
  • Uncheck “Connection to a specific table”
  • Give a name and click Finish
  • On the DSN -> Properties -> Definition, enter the SQL text I have provided previously


The result should be something similar to this (but much longer :-)):

Pivoting the results

Create a new sheet and name it “pivot”, click on “Create Pivot Table”, and select your data and your dimensions:

The result:

Creating the Graph

Now that the data is correctly formatted, it’s easy to add a graph:

Just select the entire pivot table and create a new stacked area graph.

The result will be similar to this:


With such a graph, it is easy to spot which databases consumed the most CPU on the system in a defined period, and to track the progress if you start a “performance campaign”.

For example, you can see that the “green” and “red” databases were constantly consuming some CPU up to 17.05.2017, and then some magic solved the CPU problem for those instances.

It is also quite convenient for checking the results of new instance caging settings…

The resulting CPU will not necessarily add up to 100%: the SYS CPU time is not included, nor is the user CPU of all the other processes that are either not databases or not monitored with Enterprise Manager.

HTH

Ludovico

RMAN Catalog Housekeeping: how to purge the old incarnations

First, let me apologize because every post in my blog starts with a disclaimer… but sometimes it is really necessary. 😉

Disclaimer: this blog post contains PL/SQL code that deletes incarnations from your RMAN recovery catalog. Please DON’T use it unless you deeply understand what you are doing, as it can compromise your backup and recovery strategy.

Small introduction

You may have a central RMAN catalog that stores all the backup metadata for your databases. If that is the case, you will have a database entry for each of your databases and a new incarnation entry for each duplicate, incomplete recovery or flashback (or whatever).

You should also have a delete strategy that deletes the obsolete backups from either your DISK or SBT_TAPE media. If you have old incarnations, however, after some time you will notice that their information never goes away from your catalog, and sooner or later you may end up having to do some housekeeping. But there is nothing more tedious than checking and deleting the incarnations one by one, especially if you have fairly big numbers like this catalog:

Where db, dbinc, bdf and brl contain respectively the registered databases, incarnations, datafile backups and archivelog backups.

Different incarnations?

Consider the following query:

You can run it safely: it returns the list of incarnations hierarchically connected to their parent, by database name, key and level.
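I am not reproducing the original query, but a sketch on the documented RC_DATABASE_INCARNATION view returns the same kind of hierarchy:

    -- incarnations connected to their parent incarnation, per database
    SELECT name, db_key, dbid, dbinc_key, parent_dbinc_key, status, level
    FROM   rc_database_incarnation
    START  WITH parent_dbinc_key IS NULL
    CONNECT BY PRIOR dbinc_key = parent_dbinc_key
    ORDER  SIBLINGS BY name;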

Then you have several types of behaviors:

  • Normal databases (created once, never restored or flashed back) will have just one or two incarnations (it depends on how they are created):

They are usually the ones that you want to keep in your catalog, unless the database no longer exists: in that case, perhaps you omitted the deletion from the catalog when you dropped your database?

  • Flashed back databases (flashed back multiple times) will have as many incarnations as the number of flashbacks, but all connected with the incarnation prior to the flashback:

Here, despite having several incarnations, they all belong to the same database (same DB_KEY and DBID), so you must keep it inside the recovery catalog as well.

  • Non-production databases that are frequently refreshed from the production database (via duplicate) will have several incarnations with different DBIDs and DB_KEYs:

This is usually the most frequent case: here you want to delete the old incarnations, but only as long as there are no backups attached to them that are still inside the recovery window.

  • You may also have orphaned incarnations:

In this case, again, it depends whether the DBID and DB_KEY are the same as the current incarnation or not.

What do you need to delete?

Basically:

  • Incarnations of databases that no longer exist
  • Incarnations of existing databases where the database has a more recent current incarnation, only if there are no backups still in the retention window

How to do it?

In order to be 100% sure that you can delete an incarnation, you have to verify that there are no recent backups (for instance, no backups more recent than the current recovery window for that database). If the database does not have a specified recovery window but rather the default “CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default”, it is a bit more problematic… in this case let’s assume that we consider “old” an incarnation that has not had any backup for 1 year (365 days), ok?

Getting the last backup of each database

Sadly, there is not a single table where you can verify that. You have to collect the information from several tables. I think bdf, al, cdf, bs would suffice in most cases.

When you delete an incarnation you specify a db_key: you have to get the last backup for each db_key, with queries like this:
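As a simplified sketch, using the documented RC_BACKUP_SET view instead of the base tables:

    -- last backup completion time for each registered database
    SELECT db_key, max(completion_time) AS last_backup
    FROM   rc_backup_set
    GROUP  BY db_key;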

Putting together all the tables:

Getting the recovery window

The configuration information for each database is stored inside the conf table, but the retention information is stored in a VARCHAR2, either ‘TO RECOVERY WINDOW OF % DAYS’ or ‘TO REDUNDANCY %’.

You need to convert it to a number when the retention policy is a recovery window, otherwise you default it to 365 days when redundancy is used. You can add a column and a join to the query:
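A sketch of just the conversion part, using the documented RC_RMAN_CONFIGURATION view rather than the conf base table (365 is the arbitrary default chosen above):

    SELECT db_key,
           CASE
             WHEN value LIKE 'TO RECOVERY WINDOW OF%'
               THEN to_number(regexp_substr(value, '\d+'))
             ELSE 365   -- REDUNDANCY (or anything else): fall back to 365 days
           END AS recovery_window_days
    FROM   rc_rman_configuration
    WHERE  name = 'RETENTION POLICY';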

and eventually, either display whether the incarnation is still used or filter by usage:

Delete the incarnations!

You can delete the incarnations with this procedure:

This procedure will raise an exception (-20001, ‘Database not found’) when a database does not exist anymore (either already deleted by this procedure or by another session), so you need to handle it.

Putting all together:

I have used this procedure today for the first time and it worked like a charm.

However, if you have any adjustment or suggestion, don’t hesitate to comment it 🙂

HTH

Trivadis sessions at UKOUG Tech16

UKOUG Tech16 will start in less than a week. Trivadis will be there with many speakers, 10 sessions in total 🙂
If you are a delegate, come along and have a chat with us!

Super Sunday

Monday 05/12

Tuesday 06/12

Wednesday 07/12

See you there 🙂

DBMS_QOPATCH, datapatch, rollback, apply force

I am working for a customer on a quite big implementation of Cold Failover Cluster with Oracle Grid Infrastructure on Linux. I hope to have some material to publish soon about it! However, in this post I will be talking about patching the database in a cold-failover environment.

DISCLAIMER: I make massive use of the scripts provided in this great blog post by Simon Pane:

https://www.pythian.com/blog/oracle-database-12c-patching-dbms_qopatch-opatch_xml_inv-and-datapatch/

Thank you Simon for sharing this 🙂

Intro

We are not yet in the process of doing out-of-place patching; at the moment the customer prefers to do in-place patching:

  • evacuate a node by relocating all the databases on other nodes
  • patch the node binaries
  • move back the databases and patch them with datapatch
  • do the same for the remaining nodes

I beg to disagree with this method, being a fan of having many patched golden copies distributed on all servers and patching the databases by just changing the ORACLE_HOME and running datapatch (like Rapid Home Provisioning does). But, this is the situation today, and we have to live with it.

Initial situation

  • Server 1, 2 and 3: one-off 20139391 applied
  • New database created

When the DBCA creates a new database in 12.1.0.2, it does not run datapatch by default; thus, the database does not have any patches installed.

However, this specific one-off patch does not modify anything in the database (sql_patch=false)

and datapatch runs without touching the database:

Next step: I evacuate server 2 and patch it, then I relocate my database onto it.


Now the database is not at the same level as the binaries and needs to be patched:

The column CONSTITUENT is important here because it tells us what the parent patch_id is. This is the column that we have to check when we want to know whether the patch has been applied on the database.

Now the patch is visible inside the dba_registry_sqlpatch:
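A plain query on the view shows it, something like (column list from 12.1):

    SELECT patch_id, action, status, action_time, description
    FROM   dba_registry_sqlpatch
    ORDER  BY action_time;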

Notice that the child patches are not listed in this view.

Rolling back

Now, one node is patched, but the others are not. What happens if I relocate the patched database to a non-patched node?


The patch is applied inside the database but not in the binaries!

If I run datapatch again, the patch is rolled back:

The patch has been rolled back according to datapatch, and the action is shown in dba_registry_sqlpatch:

But if I look at the logfile, the patch had some errors:

Indeed, the patch looks still there:

If I try to run it again, it either does nothing or fails saying the patch is not there:

What does it say on the patched node?

Whaaat? Datapatch on that node says that the patch IS in the registry and there is nothing to do. Let’s try to force the apply again:

Conclusion

I’m not sure whether it is safe to run the patched database in a non-patched Oracle Home. I guess it is time for a new SR 🙂

Meanwhile, we will try hard not to relocate the databases once they have been patched.

Cheers

Ludo

Getting the Oracle Homes in a server from the oraInventory

The information contained in the oratab should always be up to date, but it is not always reliable. If you want to know which Oracle installations you have on a server, it is better to get the list from the Oracle Universal Installer or, if you want some shortcuts, do some grep magic inside the inventory with the shell.

The following diagram is a simplified structure of the inventory that shows what entries are present in the central inventory (one per server) and the local inventories (one per Oracle Home).

You can use this simple function to get some content out of it, including the edition (that information is one step deeper, in the local inventory).

HTH

Loading resolved Adaptive Plans in the SQL Plan Management

In my previous post, I have shown that loading Adaptive Plans in the SQL Plan Baseline leads to using the original plan. Well, actually, this is true when you capture them via the OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES parameter.

Thanks to a tweet by Neil Chandler, I’ve realized that it was a good idea to show also the case when the plan is loaded manually.

When the adaptive plan switches to the alternative plan, the plan_hash_value also changes, and the plan can be loaded manually into the baseline with DBMS_SPM.
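The manual load boils down to a call like this; sql_id and plan_hash_value are placeholders for the values of the resolved plan as seen in the cursor cache:

    DECLARE
      l_loaded PLS_INTEGER;
    BEGIN
      -- load the resolved (alternative) plan from the cursor cache into the baseline
      l_loaded := dbms_spm.load_plans_from_cursor_cache(
                    sql_id          => '1km5kczcgr0fr',  -- placeholder
                    plan_hash_value => 123456789);       -- placeholder: PHV of the resolved plan
      dbms_output.put_line('Plans loaded: ' || l_loaded);
    END;
    /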

Let’s reset everything and retry quickly to:

  • Capture the plan automatically (this will lead to the original plan)
  • Load the plan manually (I will specify to load the alternative plan, if resolved)
  • Drop the plan captured automatically
  • Use the newly accepted baseline

To recap:

  • The capture process will always load the original plan
  • It is possible to decide to load manually the original one or the alternative one (if resolved)
  • Using automatic capture is a bad idea

HTH

Ludo

How do Adaptive Plans work with SQL Plan Baselines?

Disclaimer: after writing this post (but before publishing it) I saw that other people had already blogged about it, so I am ashamed of publishing it anyway… but that’s blogger’s life 🙂

On Wednesday I got a nice question after my presentation about Adaptive Features at the DOAG16 conference:

What happens when you load an adaptive plan in a SQL Plan Baseline?
Does it load only the final plan or does it load the whole plan including the inactive operations? Will the plan be evaluated again using the inflection point?

I decided to do some tests in order to give the best possible answer. I did not spend time rethinking how to produce an adaptive plan: Tim Hall already did an excellent test case to create and alter an adaptive plan on his blog, so I have massively reused most of his code. Thanks Tim :-).

I will not post all the code (please find it in Tim’s post); I will go straight to the plans.

First: I have an adaptive plan that resolves to NESTED LOOPS:

Second: I load the plan (the lazy way: using baseline capture at session level)
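The lazy way is just the session-level capture parameter, something like:

    ALTER SESSION SET optimizer_capture_sql_plan_baselines = TRUE;
    -- run the statement at least twice so it is considered repeatable and gets captured
    ALTER SESSION SET optimizer_capture_sql_plan_baselines = FALSE;

    SELECT sql_handle, plan_name, enabled, accepted FROM dba_sql_plan_baselines;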

Third: re-run the statement and check the plan

It does not look adaptive, but I can also check it with the function DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE:
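That is a call along these lines (the sql_handle is a placeholder):

    SELECT *
    FROM   table(dbms_xplan.display_sql_plan_baseline(
             sql_handle => 'SQL_abcdef1234567890',   -- placeholder
             format     => 'TYPICAL'));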

Again, despite the Note section saying it is adaptive, it does not look like an adaptive plan.

Can I trust this information? Of course I did not, and I checked the plan with and without the baseline after changing the rows to force a plan switch to HJ (again using Tim’s example):

After changing the rows:

  • when I do not use the baseline, the plan resolves to HASH JOIN
  • when I use it, the baseline forces to NESTED LOOPS.

So the plan in the baseline is not adaptive and it forces what has been loaded. Is it the final plan or the original one? I have to capture it again to see if a new baseline appears:

A new baseline does not appear, so it looks like the original plan is considered by the capture process and not the resolved one! To be 100% sure, let’s try to drop the existing one and redo the test:
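Dropping it is a one-liner with DBMS_SPM (sql_handle and plan_name are placeholders):

    DECLARE
      l_dropped PLS_INTEGER;
    BEGIN
      l_dropped := dbms_spm.drop_sql_plan_baseline(
                     sql_handle => 'SQL_abcdef1234567890',      -- placeholder
                     plan_name  => 'SQL_PLAN_0123456789abcde'); -- placeholder
      dbms_output.put_line('Plans dropped: ' || l_dropped);
    END;
    /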

So, despite the fact that I have an adaptive plan that switches from NL to HJ, only the NESTED LOOPS operations are captured in the baseline: I can infer that only the original plan is loaded as a SQL Plan Baseline.


Autumn: a season of conferences and travels

It is no news that autumn is the busiest season for people involved in the Oracle Community. Thanks to the OTN Nordic Tour, this year I am setting a new personal record 🙂

In the next 2 months I will give 15 presentations in 8 distinct countries and in 3 distinct languages (Italian, French, English).

If you are based in one of those countries, you can join and say hello 🙂

Date/Time Event
11/10/2016
11:00 am - 12:00 pm
Adaptive Features or: How I Learned to Stop Worrying and Troubleshoot the Bomb [Nordic Tour 2016 - Denmark]
Oracle Denmark, Ballerup
11/10/2016
2:10 pm - 3:10 pm
Migrating to 12c: 300 DBs in 300 days. What we learned [Nordic Tour 2016 - Denmark]
Oracle Denmark, Ballerup
12/10/2016
11:15 am - 12:00 pm
Migrating to 12c: 300 DBs in 300 days. What we learned. [Nordic Tour 2016 - Norway]
Felix Conference Center, Oslo
12/10/2016
1:00 pm - 1:45 pm
Self-Service Database Operations made easy with APEX [Nordic Tour 2016 - Norway]
Felix Conference Center, Oslo
12/10/2016
3:00 pm - 3:45 pm
Database Migration Assistant for Unicode (DMU): a Real Customer Case [Nordic Tour 2016 - Norway]
Felix Conference Center, Oslo
13/10/2016
3:10 pm - 4:00 pm
Migrating to 12c: 300 DBs in 300 days. What we learned. [Nordic Tour 2016 - Finland]
Accenture Finland, Helsinki
14/10/2016
9:00 am - 9:45 am
Migrating to 12c: 300 DBs in 300 days. What we learned. [Nordic Tour 2016 - Sweden]
Stockholm, Stockholm
14/10/2016
10:00 am - 10:45 am
Adaptive Features or: How I Learned to Stop Worrying and Troubleshoot the Bomb. [Nordic Tour 2016 - Sweden]
Stockholm, Stockholm
11/11/2016
9:30 am - 10:15 am
Migrating to 12c: 300 DBs in 300 days. What we learned. [ITOUG Tech Day 2016]
UNA Hotel Century, Milano
11/11/2016
12:00 pm - 12:45 pm
Adaptive Features or: How I Learned to Stop Worrying and Troubleshoot the Bomb.
UNA Hotel Century, Milano
16/11/2016
11:00 am - 11:45 am
Adaptive Features or: How I Learned to Stop Worrying and Troubleshoot the Bomb [DOAG 2016]
DOAG Konferenz 2016, Nürnberg
22/11/2016
10:50 am - 11:30 am
Montée en version de 300 bases de données vers Oracle 12c en 300 jours. Quels problèmes peut-on rencontrer ? [Swiss Data Forum 16]
Aquatis Hotel, Lausanne
23/11/2016
9:00 am - 12:00 pm
Migrating to Oracle Database 12c: 300 Databases in 300 Days [Oracle Tech Breakfast]
Oracle Business Breakfast, Oracle Suisse SA, Geneva
07/12/2016
12:30 pm - 1:15 pm
Upgrading 300 Databases to 12c in 300 Days. What Can Go Wrong? [UKOUG_Tech16]
International Convention Centre, Birmingham, Birmingham
07/12/2016
3:10 pm - 4:00 pm
Adaptive Features or: How I Learned to Stop Worrying & Troubleshoot the Bomb [UKOUG Tech16]
International Convention Centre, Birmingham, Birmingham

The updated list of upcoming events can be found here.