PostgreSQL Large Objects and space usage (part 3)

A blog post series would not be complete without a final post about vacuumlo.

In the previous post we have seen that the large objects are split in tuples containing 2048 bytes each one, and each chunk behaves in the very same way as regular tuples.

What distinguish large objects?
NOTE: in PostgreSQL, IT IS possible to store a large amount of data along with the table, thanks to the TOAST technology. Read about TOAST here.

Large objects are not inserted in application tables, but are threated in a different way. The application using large objects usually has a table with columns of type OID. When the application creates a new large objects, a new OID number is assigned to it, and this number is inserted into the application table.
Now, a common mistake for people who come from other RDBMS (e.g. Oracle), think that a large object is unlinked automatically when the row that references
it is deleted. It is not, and we need to unlink it explicitly from the application.

Let’s see it with a simple example, starting with an empty pg_largeobject table:

Let’s insert a new LOB and reference it in the table t:

Another one:

If we delete the first one, the chunks of its LOB are still there, valid:

If we want to get the rid of the LOB, we have to unlink it, either explicitly or by using triggers that unlink the LOB when a record in the application table is deleted.
Another way is to use the binary vacuumlo included in PostgreSQL.
It scans the pg_largeobject_metadata and search through the tables that have OID columns to find if there are any references to the LOBs. The LOB that are not referenced, are unlinked.
ATTENTION: this means that if you use ways to reference LOBs other than OID columns, vacuumlo might unlink LOBs that are still needed!

vacuumlo has indeed unlinked the first LOB, but the deleted tuples are not freed until a vacuum is executed:

So vacuumlo does not do any vacuuming on pg_largeobject table.

PostgreSQL Large Objects and space usage (part 2)

In my previous post I showed how large objects use space inside the table pg_largeobject when inserted.

Let’s see something more:

The table had 2 large objects (for a total of 1024 records):

Let’s try to add another random-padded file:

As expected, because a random sequence of characters cannot be compressed, the size increased again by 171 blocks (see my previous post for the explanation)

If you read this nice series of blog posts by Frits Hoogland, you should know about the pageinspect extension and the t_infomask 16-bit mask.

Let’s install it and check the content of the pg_largeobjects pages:

We already know the mathematics, but we love having all the pieces come together 🙂

We know that: The page header is 24 bytes, and that the line pointers use 4 bytes for each tuple.

The first 4 pages have the lower offset to 452 bytes means that we have (452-24)/4 = 107 tuples.

The 5th page (page number 4) has the lower to 360: (360-24)/4=84 tuples.

The remaining pages have the lower to 36: (36-24)/4 = 3 tuples.

Let’s check if we are right:

🙂
Now, let’s delete the 1Mb file and check the space again:

The space is still used and the tuples are still there.

However, we can check that the tuples are no longer used by checking the validity of their t_xmax. In fact, according to the documentation, if the XMAX is invalid the row is at the latest version:

[…] a tuple is the latest version of its row iff XMAX is invalid or t_ctid points to itself (in which case, if XMAX is valid, the tuple is either locked or deleted). […]
 (from htup_details.h lines 87-89).
We have to check the infomask against the 12th bit (2048, or 0x0800)
#define HEAP_XMAX_INVALID       0x0800  /* t_xmax invalid/aborted */

Here we go. The large objects are split in compressed chunks that internally behave the same way as regular rows!

If we import another lob we will see that the space is not reused:

Flagging the tuples as reusable is the vacuum’s job:

The normal vacuum does not release the empty space, but it can be reused now:

If we unlink the lob again and we do a vacuum full, the empty space is released: