Most of us know the database tables needed vacuum to clean the bloat. PostgreSQL has a utility to clean this bloat called Vacuum.
Vacuuming can be done in below 3 ways;
3. Vacuum Full.
This blog discribes which vacuum to be used and when briefly.
This feature is optional but highly recommended. Autovacuum is a postgresql's own daemon which checks for tables that have had a large number of inserted, updated or deleted tuples,
to update the statistics in the system catalogs. When the set thresholds reached on any of tables in the database the autovacuum kicks on and start vacuuming those tables.
As the autovacuum is one of the best utilities to improve the performance, in the same way if it is not properly tuned it will lower the performance too.
Challenging part is how we should set these settings, which we will discuss in detail in coming posts, stay tune !
Basically postgres uses below calculation to know when a table can be triggered to cleanup;
Number of dead tuples of a table exceeds the value of (vacuum base threshold + vacuum scale factor * number of tuples)
Here vacuum base threshold is autovacuum_vacuum_threshold ,
vacuum scale factor is autovacuum_vacuum_scale_factor of postgresql.conf and
number of tuples is of pg_class.reltuples.
The default values set in postgresql.conf are
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.2
Based on the above formula if the table has 10000 rows.
50+0.2*10000 = 2050
The autovacuum will be triggered when the number of dead tuples value exceeds 2050;
Vacuum is also used to claim the free space by cleaning bloat in the database like autovacuum,
the main difference here is vacuum needs to be run manually.
It only happens when you decide run, it can be through scripts,cron and manual run.
When we have autovacuum daemon why do we need Vacuum again ?
Autovacuum also depends on other factors to start triggering (we will discuss these in next posts),
hence it is difficult to say autovacuum is enabled and the performance is upto the mark.
Most systems have heavy and idle times, which can be determined by load average on the system.
Here you can't stop autovacuum to run on heavy times and can't force autovacuum to run in idle times.
In this case you can use vacuum to run at system idle times or you can run daily after midnight or weekly basis.
By this way you can manage the system load. Or tune your autovacuum settings aggressively to manage the load.
Make a note that you should tune autovacuum to maintain busy tables properly, rather than manually vacuuming them.
Both vacuum and autovacuum will clean the bloats and reclaim the space to allow new entries on the tables,
but you can't see the free space on operating system level. To claim space at OS level you have to use manual vacuum with Full option.
This vacuum full will have lot of drawbacks, is very expensive compared to a regular VACUUM. It rebuilds the entire table and all indexes from scratch,
and it holds a write lock on the table while running which will block all DML queries of your application. In short you need down time to run vacuum full.
Usually no one will suggests to run this because of these drawback unless you have space issues.
Stay tuned for aggressive autovacuum ...
Thanks a lot for explanation on autovacuum triggering.
But in the formula for determining auto-vacuuming on a certain table,
(vacuum base threshold + vacuum scale factor * number of tuples)
What does exactly "number of tuples" refer to?
Is this the number of total live tuples (i.e. SELECT count(*) FROM some_table),
or the number of (live tuples + dead tuples)?
Thanks a lot for your answer.
As far as I know, pg_class.reltuples is an esitmated value updated by ANALYZE.
What might happen if this value is outdated for some tables?
Can autovacuum still be triggered on these tables with outdated pg_class.reltuples value?
Number of obsolete tuples is obtained from a saperate process the 'Stats Collector'. So, even if the pg_calss.reltuples are outdated because of one of the criterias being number of tuples inserted, updated, or deleted the table which are captured by the stats collector will still be eligable for an analyse.