compression - DBA survival BLOG

PostgreSQL uses a nice, non-standard mechanism for big columns called TOAST (I hope to blog about it in the future) that can be compared to extended data types in Oracle (TOAST rows, by the way, can be much bigger). But traditional large objects exist and are still used by many customers.

If you are new to large objects in PostgreSQL, read here. For TOAST, read here.

Inside the application tables, the columns for large objects are defined as OIDs that point to data chunks inside the pg_largeobject table.

Because large objects are created independently of the table columns that reference them, deleting a row that points to a large object does not delete the large object itself.

Moreover, pg_largeobject stores by design all the large objects that exist in the database.

This makes housekeeping and maintenance of this table crucial for database administration (we will see that in a future post).

How is space organized for large objects?

Let’s look at some examples, starting with an empty database and an empty pg_largeobject:

lob_test=# select count(*) from pg_largeobject;
 count
-------
     0
(1 row)

lob_test=# vacuum full pg_largeobject;
VACUUM

lob_test=# select pg_total_relation_size('pg_largeobject');
 pg_total_relation_size
------------------------
                   8192
(1 row)

Just one block. Let’s see its file on disk:

lob_test=# SELECT pg_relation_filepath('pg_largeobject');
 pg_relation_filepath
----------------------
 base/16471/16487
(1 row)

# ls -l base/16471/16487
-rw------- 1 postgres postgres 0 Jul 26 16:58 base/16471/16487

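First evidence: the heap file is empty, meaning that the first block is not created physically until there’s some data in the table (like deferred segment creation in Oracle, except that the file exists). The 8192 bytes reported above most likely belong to the index on pg_largeobject, because pg_total_relation_size also counts indexes.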

Now, let’s create two 1 MB files for our tests, one zero-padded and one random-padded:

$ dd if=/dev/zero    of=/tmp/zeroes  bs=1024 count=1024
$ dd if=/dev/urandom of=/tmp/randoms bs=1024 count=1024
$ ls -l /tmp/zeroes /tmp/randoms
-rw-r--r-- 1 postgres postgres 1048576 Jul 26 16:56 /tmp/randoms
-rw-r--r-- 1 postgres postgres 1048576 Jul 26 16:23 /tmp/zeroes

Let’s import the zero-padded one:

lob_test=# \lo_import '/tmp/zeroes';
lo_import 16491
lob_test=# select count(*) from pg_largeobject_metadata;
 count
-------
     1
(1 row)

lob_test=# select count(*) from pg_largeobject;
 count
-------
   512
(1 row)

Large objects are split into chunks of 2048 bytes each, hence the 512 pieces. What about the physical size?

lob_test=# select pg_relation_size('pg_largeobject');
 pg_relation_size
------------------
            40960
(1 row)


bash-4.1$ ls -l 16487*
-rw------- 1 postgres postgres 40960 Jul 26 17:18 16487

Just 40k! This means that the chunks are compressed (like TOAST pages). PostgreSQL uses the pglz_compress function; its algorithm is well explained in the source file src/common/pg_lzcompress.c.

What happens when we insert the random-padded file?

lob_test=# \lo_import '/tmp/randoms';
lo_import 16492

lob_test=# select count(*) from pg_largeobject where loid=16492;
 count
-------
   512
(1 row)

lob_test=# select pg_relation_size('pg_largeobject');
 pg_relation_size
------------------
          1441792
(1 row)

$ ls -l 16487
-rw------- 1 postgres postgres 1441792 Jul 26 17:24 16487

The segment grew by much more than 1 MB! Precisely, 1441792-40960 = 1400832 bytes. Why?

The large object is again split into 512 data chunks of 2048 bytes each, and again PostgreSQL tries to compress them. But because random data cannot be compressed, the pieces are still 2048 bytes each (on average).
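To get a feel for how differently the two kinds of content compress, here is a small sketch that can be run outside the database. It uses gzip only as a stand-in: PostgreSQL compresses each chunk with pglz, not gzip, so the absolute numbers will differ, but the contrast between zeroes and random bytes is the same idea.

```shell
# gzip is used here only as a stand-in for pglz (an assumption for illustration);
# the sizes will not match what PostgreSQL stores, only the trend does.
zero_bytes=$(dd if=/dev/zero    bs=1024 count=1024 2>/dev/null | gzip -c | wc -c | tr -d ' ')
rand_bytes=$(dd if=/dev/urandom bs=1024 count=1024 2>/dev/null | gzip -c | wc -c | tr -d ' ')

echo "1 MB of zeroes after gzip:       $zero_bytes bytes"
echo "1 MB of random bytes after gzip: $rand_bytes bytes"
```

The zeroes shrink to a tiny fraction of the original size, while the random data stays at about 1 MB (it can even grow slightly because of the compression format overhead).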

Now, a database block is 8192 bytes. Four chunks of 2048 bytes would fill it completely, leaving no room for the block header and the per-row overhead, so every block holds just 3 uncompressed chunks.

So, 512 chunks will be distributed over 171 blocks (CEIL(512/3.0)), which gives:

lob_test=# select ceil(1024*1024/2048/3.0)*8192;
 ?column?
----------
  1400832
(1 row)

1400832 bytes!
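The same arithmetic can be sketched as a small shell helper that estimates how many blocks a large object needs for a given stored chunk size. The overhead constants below (24-byte page header, 4-byte line pointer, 24-byte row header, 8-byte row alignment) are my assumptions about the default heap page layout, and the function names are made up for this example, so take the result as an estimate rather than an exact prediction.

```shell
# Rough model of how pg_largeobject rows pack into heap blocks.
# All overhead constants are assumptions about the default page layout.
block_size=8192
page_header=24     # fixed header at the start of every block
line_pointer=4     # per-row pointer stored in the block
tuple_fixed=36     # 24-byte row header + loid (4) + pageno (4) + 4-byte length word

# rows_per_block STORED_BYTES: chunks of the given stored size that fit in one block
rows_per_block() {
    tuple=$(( tuple_fixed + $1 ))
    tuple=$(( (tuple + 7) / 8 * 8 ))          # round the row up to a multiple of 8 bytes
    echo $(( (block_size - page_header) / (tuple + line_pointer) ))
}

# lo_blocks CHUNKS STORED_BYTES: blocks needed for CHUNKS rows (ceiling division)
lo_blocks() {
    per_block=$(rows_per_block "$2")
    echo $(( ($1 + per_block - 1) / per_block ))
}

rows_per_block 2048                                # incompressible chunks: 3 per block
echo $(( $(lo_blocks 512 2048) * block_size ))     # the random file: 1400832 bytes
rows_per_block 1000                                # hypothetical chunks compressed to 1000 bytes
```

With chunks that compress to about half their size (the 1000-byte case is purely hypothetical), more than twice as many rows fit in each block, which is why the space used depends so heavily on the content.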

Depending on the compression ratio that our large objects achieve, we might expect much more or much less space used inside the pg_largeobject table.

Oracle Database 12.1.0.2 is finally out, and as we all knew in advance, it contains the new in-memory option.

I think that, despite its cost ($23k per processor), this is another great improvement! 🙂

Substantial savings!

This new feature is not to be confused with TimesTen. In-memory is a feature that enables a new memory area inside the SGA, used to hold a columnar-organized copy of segments entirely in memory. Columnar stores organize the data as columns instead of rows, and they are ideal for queries that involve a few columns on many rows, e.g. analytic reports, but they also work great for all ad hoc queries that cannot make use of existing indexes.

Columnar stores don’t replace traditional indexes for data integrity or fast single-row look-ups, but they can replace many additional indexes created for the sole purpose of reporting. Hence, while on one hand it may seem a waste of memory, on the other hand using in-memory can lead to substantial memory savings thanks to all the indexes that no longer have a reason to exist.

Let’s take an example of a table (in RED) with nine indexes (other colors).

If you try to imagine all the blocks in the buffer cache, you may think about something like this:

Now, with the in-memory columnar store, you can get rid of many indexes because they were created just for reporting and are now superseded by the performance of the new feature:

In this case, you’re not only saving blocks on disk, but also in the buffer cache, making room for the in-memory area. With the columnar store, the compression factor may allow you to easily fit your entire table in the same space that was previously required for a few query-specific indexes. So you’ll have the buffer cache with traditional row-organized blocks (red, yellow, light and dark blue) and the separate in-memory area with a columnar copy of the segment (gray).

The in-memory store doesn’t make use of undo segments or the redo buffer, so you’re also saving undo block buffers and physical I/O!

The added value

In my opinion, this option will get much more attention from customers than Multitenant, for a very simple reason.

How many customers (in percentage) would pay to achieve better consolidation of hundreds of databases? A few.

How many would pay or are already paying for having better performance for critical applications? Almost all the customers I know!

Internal mechanisms

In-memory is enabled on a per-segment basis: you can specify a table or a partition to be stored in-memory.

Each column is organized in separate chunks of memory called In-Memory Compression Units (IMCUs). The number of IMCUs required for each column may vary.

Each IMCU contains the data of the column and a journal used to guarantee read consistency with the blocks in the buffer cache. The data is not modified on the fly in the IMCU; instead, the row it refers to is marked as stale in a journal that is stored inside the IMCU itself. When the stale data grows above a certain threshold, the space efficiency of the columnar store decreases and the in-memory coordinator process ([imco]) may force a re-population of the store.
Re-population may also occur after manual intervention or at instance startup: because the store is memory-only, the data needs to be populated into it from disk.

Whether the data is populated immediately after startup depends on the priority specified for each segment. The higher the priority, the sooner the segment will be populated in-memory. The priority attribute also drives which segments survive in-memory in case of “in-memory pressure”. Sadly, the parameter inmemory_size that specifies the size of the in-memory area is static and an instance restart is required to change it, which is why you need to plan the size carefully before activating it. There is a compression advisor for in-memory that can help with this.

Conclusion

In this post you’ve seen a short introduction to in-memory. I hope I can publish another post very soon with a few practical examples.


PostgreSQL Large Objects and space usage (part 1)

Oracle Database 12c in-memory option, a quick overview