CouchDB Optimization

CouchDB was designed under the assumption that disk space is cheap. That assumption does not always hold: some databases grow very quickly and end up occupying a large amount of disk space.

At the time of writing, two techniques are used to manage the size of CouchDB database files:

  1. Revision limit
  2. Compaction

Revision Limit

By default, CouchDB keeps track of up to 1000 revisions per document.

To reduce this number, CouchDB offers the _revs_limit (revisions limit) parameter, which limits the number of revisions tracked for each document in a database.

To get the current _revs_limit setting, run the following HTTP GET command:

curl -X GET http://localhost:5984/database/_revs_limit

To update _revs_limit, run the HTTP PUT command below (this example sets the limit to 10 for a database named a-opensrp):

curl -u username:password -X PUT -d "10" http://localhost:5984/a-opensrp/_revs_limit

NB: Reducing the revisions limit increases the risk of conflicts during replication.
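The two requests above can also be scripted. The following is a minimal Python sketch using only the standard library. The helper names (revs_limit_request, send) are made up for illustration, and the server URL, database name, and credentials are placeholders, not values CouchDB prescribes.

```python
import base64
import json
import urllib.request


def revs_limit_request(base_url, db, limit=None, auth=None):
    """Build a GET (limit is None) or PUT request for /{db}/_revs_limit."""
    url = f"{base_url.rstrip('/')}/{db}/_revs_limit"
    # The PUT body is the bare number, matching curl's -d "10".
    data = None if limit is None else str(int(limit)).encode("ascii")
    method = "GET" if limit is None else "PUT"
    req = urllib.request.Request(url, data=data, method=method)
    if auth is not None:
        # HTTP Basic auth, the equivalent of curl's -u username:password.
        user, password = auth
        token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
        req.add_header("Authorization", f"Basic {token}")
    return req


def send(req):
    """Send the request and decode the JSON body (needs a reachable CouchDB)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage against a running server (placeholder values):
#   send(revs_limit_request("http://localhost:5984", "database"))
#   send(revs_limit_request("http://localhost:5984", "a-opensrp", 10,
#                           ("username", "password")))
```

Keeping request construction separate from sending makes it possible to inspect the method, URL, and body before anything reaches the server.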

Compaction

Compaction reduces disk space usage by removing unused and old data from database or view index files, similar to the SQLite VACUUM command.

It can be triggered manually or automatically.

Manual Trigger

To trigger compaction manually, run the following command:

curl -H "Content-Type: application/json" -u username:password -X POST http://localhost:5984/database/_compact

On success, HTTP status 202 Accepted is returned immediately:

HTTP/1.1 202 Accepted
Cache-Control: must-revalidate
Content-Length: 12
Content-Type: text/plain; charset=utf-8
Date: Wed, 19 Jun 2013 09:43:52 GMT
Server: CouchDB (Erlang/OTP)

{"ok":true}
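For scripted maintenance, the same POST can be built and its reply checked in Python. This is a minimal sketch with the standard library only. The function names compact_request and compaction_accepted are hypothetical, and the URL and database name are placeholders.

```python
import json
import urllib.request


def compact_request(base_url, db, auth_header=None):
    """Build the POST /{db}/_compact request with a JSON content type."""
    url = f"{base_url.rstrip('/')}/{db}/_compact"
    headers = {"Content-Type": "application/json"}
    if auth_header:
        # Pre-built Authorization header value, e.g. "Basic ...".
        headers["Authorization"] = auth_header
    # Empty body: the request carries no payload, only the headers.
    return urllib.request.Request(url, data=b"", headers=headers, method="POST")


def compaction_accepted(status, body):
    """True when the reply matches the 202 Accepted / {"ok": true} response."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return status == 202 and isinstance(payload, dict) and payload.get("ok") is True


# Usage against a running server (placeholder values):
#   with urllib.request.urlopen(compact_request("http://localhost:5984", "database")) as resp:
#       print(compaction_accepted(resp.status, resp.read()))
```

Since 202 is returned immediately, a True result here only means the request was accepted, not that compaction has finished.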

Automatic Trigger

Automatic compaction is configured in CouchDB’s configuration files.

The compaction daemon is responsible for triggering the compaction.

[daemons]
#...
compaction_daemon={couch_compaction_daemon, start_link, []}
[compaction_daemon]
; The delay, in seconds, between each check for which database and view indexes
; need to be compacted.
check_interval = 300
; If a database or view index file is smaller than this value (in bytes),
; compaction will not happen. Very small files always have a very high
; fragmentation, so it is not worth compacting them.
min_file_size = 131072

NB: min_file_size = 131072 means that database and view index files smaller than 128 KB are ignored.

The criteria for triggering compaction are configured in the "compactions" section.

[compactions]
; List of compaction rules for the compaction daemon.
; The daemon compacts databases and their respective view groups when all the
; condition parameters are satisfied. Configuration can be per database or
; global, and it has the following format:
;
; database_name = [ {ParamName, ParamValue}, {ParamName, ParamValue}, ... ]
; _default = [ {ParamName, ParamValue}, {ParamName, ParamValue}, ... ]

Possible Parameters

  • db_fragmentation: If the ratio (as an integer percentage) of the amount of old data (and its supporting metadata) to the database file size is equal to or greater than this value, this database compaction condition is satisfied. This value is computed as:
    (file_size - data_size) / file_size * 100
    The data_size and file_size values can be obtained when querying a database's information URI (GET /dbname/).

  • view_fragmentation: If the ratio (as an integer percentage) of the amount of old data (and its supporting metadata) to the view index (view group) file size is equal to or greater than this value, this view index compaction condition is satisfied. This value is computed as:
    (file_size - data_size) / file_size * 100
    The data_size and file_size values can be obtained when querying a view group's information URI (GET /dbname/_design/groupname/_info).

  • from and to: The period during which compaction of a database (and its view groups) is allowed. Each value must follow the format HH:MM (HH in [0..23], MM in [0..59]), so that together they define a period HH:MM - HH:MM.

  • strict_window: If a compaction is still running after the end of the allowed period, it will be canceled if this parameter is set to 'true'. It defaults to 'false' and is meaningful only if the allowed period (from and to) is also specified.

  • parallel_view_compaction: If set to 'true', the database and its views are compacted in parallel. This is only useful on certain setups, for example when the database and view index directories point to different disks. It defaults to 'false'.
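The fragmentation formula used by db_fragmentation and view_fragmentation can be checked by hand. The sketch below is illustrative Python, not CouchDB's own code. It takes file_size and data_size as plain numbers because the field names in the information URI responses may differ between CouchDB versions, and the function names are made up for this example.

```python
def fragmentation_percent(file_size, data_size):
    """(file_size - data_size) / file_size * 100, as in the formula above."""
    if file_size <= 0:
        return 0.0
    # Multiply before dividing to avoid needless floating-point error.
    return (file_size - data_size) * 100 / file_size


def needs_compaction(file_size, data_size, threshold="70%", min_file_size=131072):
    """Check one fragmentation condition, skipping files below min_file_size."""
    if file_size < min_file_size:
        return False
    limit = float(threshold.rstrip("%"))
    return fragmentation_percent(file_size, data_size) >= limit


# A 1,000,000-byte file holding 300,000 bytes of live data is 70% fragmented.
print(fragmentation_percent(1_000_000, 300_000))
print(needs_compaction(1_000_000, 300_000, "70%"))
```

The threshold is passed in the same "70%" string form used in the configuration so values can be copied over unchanged.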

Before a compaction is triggered, an estimation of how much free disk space is needed is computed. This estimation corresponds to 2 times the data size of the database or view index. When there's not enough free disk space to compact a particular database or view index, a warning message is logged.
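The free-space estimate can be reproduced before triggering a compaction manually. This is a rough Python sketch of the rule described above (2 times the data size), not the daemon's actual implementation. The function names and the directory argument are assumptions for illustration; point the directory at wherever the database files live.

```python
import shutil


def required_free_space(data_size):
    """Estimate from the text: 2 times the data size, in bytes."""
    return 2 * data_size


def has_room_to_compact(data_size, directory="."):
    """Compare the estimate with the free space on the disk holding directory."""
    free = shutil.disk_usage(directory).free
    return free >= required_free_space(data_size)


# Example: a database with 500 MB of live data needs roughly 1 GB free.
print(required_free_space(500 * 1024 * 1024))
```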

Examples

  1. [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}]
    The database this rule applies to (for example, foo) is compacted if its fragmentation is 70% or more. Any view index of this database is compacted only if its fragmentation is 60% or more.

  2. [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, {from, "00:00"}, {to, "04:00"}]
    Similar to the preceding example but a compaction (database or view index) is only triggered if the current time is between midnight and 4 AM.

  3. [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, {from, "00:00"}, {to, "04:00"}, {strict_window, true}]
    Similar to the preceding example - a compaction (database or view index) is only triggered if the current time is between midnight and 4 AM. If at 4 AM the database or one of its views is still compacting, the compaction process will be canceled.

  4. [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, {from, "00:00"}, {to, "04:00"}, {strict_window, true}, {parallel_view_compaction, true}]
    Similar to the preceding example, but a database and its views can be compacted in parallel.

Default Configuration

The default configuration, if enabled, applies to all databases. For example:

_default = [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, {from, "23:00"}, {to, "04:00"}]

It may be a good idea to give error trace databases their own rule, with different thresholds and a different time window. For example:

database_error_trace = [{db_fragmentation, "50%"}, {view_fragmentation, "50%"}, {from, "19:00"}, {to, "23:00"}]
_default = [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, {from, "23:00"}, {to, "04:00"}]
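Note that the 23:00 to 04:00 period crosses midnight. The Python sketch below shows one way to test whether a given time falls inside such a from/to period. It assumes that a period whose start is later than its end wraps past midnight; it is an illustration, not CouchDB's own logic, and the helper names are hypothetical.

```python
from datetime import time


def parse_hhmm(value):
    """Parse an HH:MM string such as "23:00" into a datetime.time."""
    hours, minutes = value.split(":")
    return time(int(hours), int(minutes))


def in_window(now, start, end):
    """True if now lies in [start, end), allowing the period to cross midnight."""
    begin, finish = parse_hhmm(start), parse_hhmm(end)
    if begin <= finish:
        return begin <= now < finish
    # Assumed behaviour: a period such as 23:00 to 04:00 wraps past midnight.
    return now >= begin or now < finish


print(in_window(time(23, 30), "23:00", "04:00"))  # inside, before midnight
print(in_window(time(12, 0), "23:00", "04:00"))   # outside the period
```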

Also note that compaction does not affect the changes feed used by IBM Cloudant to sync data to the OpenSRP client. Check out this Stack Overflow answer for more information.

References

http://eclipsesource.com/blogs/2012/07/11/reducing-couchdb-disk-space-consumption/

http://docs.couchdb.org/en/1.6.1/maintenance/compaction.html

https://wiki.apache.org/couchdb/Compaction/

http://www.sqlite.org/lang_vacuum.html

http://docs.couchdb.org/en/1.6.1/api/database/misc.html#put--db-_revs_limit

http://docs.couchdb.org/en/1.6.1/api/database/compact.html#post--db-_compact-ddoc