PostgreSQL Large Objects and space usage (part 3)

A blog post series would not be complete without a final post about vacuumlo.

In the previous post we have seen that the large objects are split in tuples containing 2048 bytes each one, and each chunk behaves in the very same way as regular tuples.

What distinguish large objects?
NOTE: in PostgreSQL, IT IS possible to store a large amount of data along with the table, thanks to the TOAST technology. Read about TOAST here.

Large objects are not inserted in application tables, but are threated in a different way. The application using large objects usually has a table with columns of type OID. When the application creates a new large objects, a new OID number is assigned to it, and this number is inserted into the application table.
Now, a common mistake for people who come from other RDBMS (e.g. Oracle), think that a large object is unlinked automatically when the row that references
it is deleted. It is not, and we need to unlink it explicitly from the application.

Let’s see it with a simple example, starting with an empty pg_largeobject table:

Let’s insert a new LOB and reference it in the table t:

Another one:

If we delete the first one, the chunks of its LOB are still there, valid:

If we want to get the rid of the LOB, we have to unlink it, either explicitly or by using triggers that unlink the LOB when a record in the application table is deleted.
Another way is to use the binary vacuumlo included in PostgreSQL.
It scans the pg_largeobject_metadata and search through the tables that have OID columns to find if there are any references to the LOBs. The LOB that are not referenced, are unlinked.
ATTENTION: this means that if you use ways to reference LOBs other than OID columns, vacuumlo might unlink LOBs that are still needed!

vacuumlo has indeed unlinked the first LOB, but the deleted tuples are not freed until a vacuum is executed:

So vacuumlo does not do any vacuuming on pg_largeobject table.

The following two tabs change content below.

Ludovico

Principal Product Manager at Oracle
Ludovico is a member of the Oracle Database High Availability (HA), Scalability & Maximum Availability Architecture (MAA) Product Management team in Oracle. He focuses on Oracle Data Guard, Flashback technologies, and Cloud MAA.

2 thoughts on “PostgreSQL Large Objects and space usage (part 3)

  1. Pingback: Blog Post: PostgreSQL Large Objects and space usage (part 3)

  2. Pingback: PostgreSQL Large Objects and space usage (part 3) - Ludovico Caldara - Blogs - triBLOG

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.