DBM Module Vacuuming #134004

Closed
@Andrea-Oliveri

Description

Feature or enhancement

Proposal:

Good afternoon,
The dbm module, and by extension shelve, does not provide any way to reclaim free space after many deletions from the database. This applies to all of the dbm submodules (dbm.dumb, dbm.sqlite3, dbm.ndbm, dbm.gnu).

This can lead to hundreds of GB of wasted space when using them to store complex objects, such as when using them as a persistent cache.
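A small way to observe the problem, using dbm.dumb because it is always available. This is only a sketch: the helper name `dat_size_after_deletes` and the key/value sizes are made up for illustration, and it relies on dbm.dumb storing its values in a file named `<path>.dat`.

```python
import dbm.dumb
import os
import tempfile


def dat_size_after_deletes(n_items=100, value_size=1024):
    """Fill a dbm.dumb database, delete every key, and report the data file sizes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cache")

        # Populate the database; dbm.dumb keeps the values in "<path>.dat".
        with dbm.dumb.open(path, "c") as db:
            for i in range(n_items):
                db[f"key{i}"] = b"x" * value_size
        size_full = os.path.getsize(path + ".dat")

        # Delete everything; only the index is updated.
        with dbm.dumb.open(path, "w") as db:
            for i in range(n_items):
                del db[f"key{i}"]
            remaining = len(db)
        size_after = os.path.getsize(path + ".dat")

    return size_full, size_after, remaining


if __name__ == "__main__":
    full, after, remaining = dat_size_after_deletes()
    print(f"keys left: {remaining}, .dat before: {full} B, after: {after} B")
```

The database ends up empty, yet the data file keeps the size it had when it was full.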

Most of the underlying libraries, however, support reclaiming space on demand:

  • VACUUM in sqlite3
  • gdbm_reorganize for gnu
  • None for ndbm
  • None for dumb, but this is simple to implement and I would be happy to contribute: copy the used parts of the binary file in place and update the index. The advantage is that this won’t use more disk space while vacuuming; the drawback is that if the program is interrupted during the vacuum, the DB will be corrupted (note: this is already the case for many dbm.dumb operations).
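To illustrate the first item of the list above, here is a minimal sketch of what `VACUUM` does to a file, using the sqlite3 module directly rather than any dbm wrapper. The table name `kv`, the helper name `vacuum_demo` and the row sizes are hypothetical, and it assumes an SQLite build with the default `auto_vacuum` setting (off).

```python
import os
import sqlite3
import tempfile


def vacuum_demo(n_rows=200, blob_size=4096):
    """Return the database file size before and after VACUUM, following a mass delete."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cache.sqlite3")
        # isolation_level=None puts the connection in autocommit mode;
        # VACUUM cannot run inside an open transaction.
        conn = sqlite3.connect(path, isolation_level=None)
        try:
            conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value BLOB)")
            conn.execute("BEGIN")
            conn.executemany(
                "INSERT INTO kv VALUES (?, ?)",
                ((f"key{i}", b"x" * blob_size) for i in range(n_rows)),
            )
            conn.execute("COMMIT")

            # Deleted pages go to SQLite's free list; the file is not shrunk.
            conn.execute("DELETE FROM kv")
            size_before = os.path.getsize(path)

            # VACUUM rebuilds the database file without the free pages.
            conn.execute("VACUUM")
            size_after = os.path.getsize(path)
        finally:
            conn.close()
    return size_before, size_after


if __name__ == "__main__":
    before, after = vacuum_demo()
    print(f"before VACUUM: {before} B, after VACUUM: {after} B")
```

Note that `VACUUM` builds a fresh copy of the database, so unlike the in-place approach proposed for dumb it temporarily needs extra disk space.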

Additionally, I would like to update the documentation to highlight the disadvantages of dbm.dumb. For now they are only comments in the source code and are hidden from developers reading the doc:

  • Lack of support for any concurrency
  • Slowness linearly proportional to index size
  • It never reclaims the space of deleted items (this will hopefully be fixed by the PR, in which case it won't need to be documented).
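For the space-reclaiming point, the in-place compaction idea proposed for dumb can be sketched on a toy file format. This is not dbm.dumb's actual code: the function `compact_in_place`, the `key -> (offset, size)` index layout and the demo data are assumptions for illustration, and the sketch ignores the block alignment that dbm.dumb applies to its values.

```python
import os
import tempfile


def compact_in_place(data_path, index):
    """Slide live records toward the start of the file, then truncate the tail.

    `index` maps key -> (offset, size). Returns a new index with updated offsets.
    """
    new_index = {}
    write_pos = 0
    with open(data_path, "r+b") as f:
        # Visit records in file order so a move never overwrites unread live data.
        for key, (offset, size) in sorted(index.items(), key=lambda kv: kv[1][0]):
            if offset != write_pos:
                f.seek(offset)
                chunk = f.read(size)
                f.seek(write_pos)
                f.write(chunk)
            new_index[key] = (write_pos, size)
            write_pos += size
        # Everything past the last live record is garbage.
        f.truncate(write_pos)
    return new_index


def demo():
    """Write three records, drop the middle one from the index, then compact."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.dat")
        records = {"a": b"A" * 10, "b": b"B" * 20, "c": b"C" * 30}
        index = {}
        with open(path, "wb") as f:
            for key, value in records.items():
                index[key] = (f.tell(), len(value))
                f.write(value)

        del index["b"]  # "delete" a record: its bytes stay in the file
        index = compact_in_place(path, index)

        size = os.path.getsize(path)
        with open(path, "rb") as f:
            data = f.read()
    return index, size, data


if __name__ == "__main__":
    print(demo())
```

No second copy of the data is ever written, which is the advantage described above; the cost is that an interruption between a move and the index update leaves the file and the index out of sync.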

Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

Links to previous discussion of this feature:

https://discuss.python.org/t/dbm-module-add-vacuuming/91507

    Labels

    stdlib (Python modules in the Lib dir), type-feature (A feature request or enhancement)
