Barecat

class barecat.Barecat(path, shard_size_limit=None, readonly=True, overwrite=False, auto_codec=False, exist_ok=True, append_only=False, threadsafe=False, allow_writing_symlinked_shard=False)

Bases: collections.abc.MutableMapping[str, Any]

Object for reading or writing a Barecat archive.

A Barecat archive consists of several (large) shard files, each containing the data of multiple small files, and an SQLite index database that maps file paths to the corresponding shard, offset and size within the shard, as well as metadata such as modification time and checksum.
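To make the layout concrete, here is a toy model of the idea, not Barecat's actual schema or code. Shards are in-memory byte buffers, and an SQLite table maps each path to (shard, offset, size). The table name `files`, its columns and the helpers `add_file`/`read_file` are hypothetical, and metadata such as modification time and checksum is omitted.

```python
import sqlite3

# Toy model (not the barecat implementation): shards are in-memory byte
# buffers, and an SQLite table maps each path to (shard, offset, size).
shards = [bytearray()]
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE files (path TEXT PRIMARY KEY, shard INTEGER,"
    " byte_offset INTEGER, size INTEGER)"
)

def add_file(path: str, data: bytes) -> None:
    # Append the bytes to the last shard and record where they landed.
    shard = len(shards) - 1
    offset = len(shards[shard])
    shards[shard] += data
    db.execute("INSERT INTO files VALUES (?, ?, ?, ?)",
               (path, shard, offset, len(data)))

def read_file(path: str) -> bytes:
    # Look up the file's location in the index, then slice its shard.
    row = db.execute(
        "SELECT shard, byte_offset, size FROM files WHERE path = ?", (path,)
    ).fetchone()
    if row is None:
        raise KeyError(path)
    shard, offset, size = row
    return bytes(shards[shard][offset:offset + size])

add_file("a/x.txt", b"hello")
add_file("a/y.txt", b"world!")
print(read_file("a/y.txt"))  # b'world!'
```

Reading a file costs one index lookup plus one contiguous slice of a shard, which is why many small files can be packed into a few large shard files.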

The Barecat object provides two main interfaces:

  1. A dict-like interface, where keys are file paths and values are the file contents. The contents can be raw bytes, or automatically decoded based on the file extension, if auto_codec is set to True or codecs have been registered via register_codec().

  2. A filesystem-like interface consisting of methods such as open(), exists(), listdir(), walk() and glob(), modeled after Python’s os and glob modules.
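Because Barecat derives from collections.abc.MutableMapping, it only has to supply `__getitem__`, `__setitem__`, `__delitem__`, `__iter__` and `__len__`; the ABC then provides get(), setdefault(), keys(), items(), values(), `in` and more. The hypothetical `ToyArchive` below shows this mechanism with a plain dict as storage. It is a stand-in for illustration, not Barecat itself.

```python
from collections.abc import MutableMapping
from typing import Any, Iterator

class ToyArchive(MutableMapping[str, Any]):
    """Hypothetical stand-in: stores file contents in a plain dict."""

    def __init__(self) -> None:
        self._files: dict[str, Any] = {}

    # The five abstract methods that MutableMapping requires.
    def __getitem__(self, path: str) -> Any:
        return self._files[path]

    def __setitem__(self, path: str, content: Any) -> None:
        self._files[path] = content

    def __delitem__(self, path: str) -> None:
        del self._files[path]

    def __iter__(self) -> Iterator[str]:
        return iter(self._files)

    def __len__(self) -> int:
        return len(self._files)

arc = ToyArchive()
arc["dir/a.txt"] = b"alpha"
# get(), setdefault(), keys() and `in` are inherited mixin methods.
print(arc.get("missing.txt", b""))           # b''
print(arc.setdefault("dir/b.txt", b"beta"))  # b'beta'
print(sorted(arc.keys()))                    # ['dir/a.txt', 'dir/b.txt']
print("dir/a.txt" in arc)                    # True
```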

Parameters:
  • path (str) – Path to the Barecat archive, without the -sqlite-index or -shard-XXXXX suffixes.

  • shard_size_limit (Optional[int]) – Maximum size of each shard file. If None, the shard size is unlimited.

  • readonly (bool) – If True, the Barecat archive is opened in read-only mode.

  • overwrite (bool) – If True, the Barecat archive is first deleted if it already exists.

  • auto_codec (bool) – If True, automatically encode/decode files based on their extension.

  • exist_ok (bool) – If True, do not raise an error if the Barecat archive already exists.

  • append_only (bool) – If True, only allow appending to the Barecat archive.

  • threadsafe (bool) – If True, the Barecat archive is opened in thread-safe mode, where each thread or process holds its own database connection and its own file handles for the shards.

  • allow_writing_symlinked_shard (bool) – If True, allow writing to a shard file that is a symlink. Keeping it False (the default) is recommended: a symlinked shard belongs to another (original) archive, and changing its contents would bring that archive's index database out of sync with the actual shard contents.
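The effect of shard_size_limit can be pictured with the sketch below. The placement rule is an assumption, not taken from Barecat's source: files are appended to the current shard, and a new shard is started whenever the next file would push the current one past the limit. The function `place_files` is hypothetical.

```python
from typing import List, Optional, Tuple

def place_files(sizes: List[int],
                shard_size_limit: Optional[int]) -> List[Tuple[int, int]]:
    """Return a (shard, offset) pair for each file size, filling shards in order.

    Assumed policy (hypothetical): start a new shard when appending the next
    file would exceed the limit. A file larger than the limit still goes into
    an empty shard of its own. None means unlimited, so everything stays in
    shard 0.
    """
    placements = []
    shard, end = 0, 0
    for size in sizes:
        if shard_size_limit is not None and end > 0 and end + size > shard_size_limit:
            shard, end = shard + 1, 0  # roll over to a fresh shard
        placements.append((shard, end))
        end += size
    return placements

print(place_files([40, 50, 30, 100], shard_size_limit=100))
# [(0, 0), (0, 40), (1, 0), (2, 0)]
print(place_files([40, 50, 30], shard_size_limit=None))
# [(0, 0), (0, 40), (0, 90)]
```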

Properties

num_files

The number of files in the archive.

num_dirs

The number of directories in the archive.

total_size

The total size of all files in the archive, in bytes.

total_physical_size_seek

Total size of all shard files, as determined by seeking to the end of each shard file.

total_physical_size_stat

Total size of all shard files, as reported by the file system via stat().

total_logical_size

Total size of all files in the archive, as determined by the index database.
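The three size measures can be reproduced with the standard library. This sketch uses a hypothetical single shard file and a toy list of indexed sizes. The two physical measures normally agree for a regular, fully written file, while the logical total is smaller whenever a shard holds bytes that no index entry refers to.

```python
import os
import tempfile

def physical_size_seek(path: str) -> int:
    # Seek to the end of the file; the returned position is its size.
    with open(path, "rb") as f:
        return f.seek(0, os.SEEK_END)

def physical_size_stat(path: str) -> int:
    # Ask the file system for the size recorded in the file's metadata.
    return os.stat(path).st_size

with tempfile.TemporaryDirectory() as tmp:
    shard_path = os.path.join(tmp, "toy_shard.bin")  # hypothetical file name
    with open(shard_path, "wb") as f:
        # Two stored files plus 4 trailing bytes no index entry refers to.
        f.write(b"hello" + b"world!" + b"\x00" * 4)
    index_sizes = [5, 6]  # sizes a toy index records for the two files
    print(physical_size_seek(shard_path))  # 15
    print(physical_size_stat(shard_path))  # 15
    print(sum(index_sizes))                # 11
```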

shard_size_limit

Maximum size of each shard file.

index

Index object to manipulate the metadata database of the Barecat archive.

Methods

__getitem__(path)

Get the contents of a file in the Barecat archive.

get(path[, default])

Get the contents of a file in the Barecat archive, with a default value if the file does not exist.

items()

Iterate over all files in the archive, yielding (path, content) pairs.

keys()

Iterate over all file paths in the archive.

values()

Iterate over all file contents in the archive.

__contains__(path)

Check if a file with the given path exists in the archive.

__len__()

Get the number of files in the archive.

__iter__()

Iterate over all file paths in the archive.

__setitem__(path, content)

Add a file to the Barecat archive.

setdefault(key[, default])

Return the contents stored at key if it exists; otherwise store default at key and return it (inherited from MutableMapping).

__delitem__(path)

Remove a file from the Barecat archive.

open(item[, mode])

Open a file in the archive, as a file-like object.

exists(path)

Check if a file or directory exists in the archive.

isfile(path)

Check if a file exists in the archive.

isdir(path)

Check if a directory exists in the archive.

listdir(path)

List all files and directories in a directory.

walk(path)

Recursively list all files and directories in the tree starting from a directory.
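Since the archive tracks directories itself (see num_dirs and rmdir()), the library presumably answers listdir() and walk() from its index. The toy helpers below only illustrate the expected results by deriving them from a flat list of file paths. The names `toy_listdir` and `toy_walk`, and the use of "" for the root, are assumptions of this sketch.

```python
from typing import Iterator, List, Set, Tuple

def toy_listdir(paths: List[str], directory: str) -> Tuple[List[str], List[str]]:
    """Return (subdirs, files) directly inside `directory`, given flat file paths."""
    prefix = directory.rstrip("/") + "/" if directory else ""
    subdirs: Set[str] = set()
    files: List[str] = []
    for p in paths:
        if not p.startswith(prefix):
            continue
        head, sep, _ = p[len(prefix):].partition("/")
        if sep:
            subdirs.add(head)   # p lies deeper; keep its first component
        else:
            files.append(head)  # p sits directly inside `directory`
    return sorted(subdirs), sorted(files)

def toy_walk(paths: List[str], top: str = "") -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (dirpath, subdirs, files) top-down, like os.walk()."""
    subdirs, files = toy_listdir(paths, top)
    yield top, subdirs, files
    for d in subdirs:
        yield from toy_walk(paths, f"{top}/{d}" if top else d)

paths = ["a/x.txt", "a/b/y.txt", "a/b/c/z.txt", "top.txt"]
print(toy_listdir(paths, "a"))  # (['b'], ['x.txt'])
for entry in toy_walk(paths):
    print(entry)
```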

scandir(path)

Iterate over the files and subdirectories in a directory, yielding BarecatFileInfo or BarecatDirInfo objects.

glob(pattern[, recursive, include_hidden])

Find all files and directories matching a Unix-like glob pattern.

globfiles(pattern[, recursive, include_hidden])

Find all files matching a Unix-like glob pattern.

iglob(pattern[, recursive, include_hidden])

Iterate over all files and directories matching a Unix-like glob pattern.

iglobfiles(pattern[, recursive, include_hidden])

Iterate over all files matching a Unix-like glob pattern.
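The recursive and include_hidden parameters share their names with those of Python's glob.glob(). The sketch below assumes the same semantics: `*` and `?` never cross `/`, a whole `**` component matches zero or more directories when recursive=True, and wildcards skip names starting with `.` unless include_hidden=True. The helper `toy_glob_match` is hypothetical and matches a single path rather than searching the archive.

```python
from fnmatch import fnmatchcase
from typing import List

def toy_glob_match(pattern: str, path: str, recursive: bool = False,
                   include_hidden: bool = False) -> bool:
    """Match a '/'-separated path against a Unix-like glob (toy helper)."""
    pat: List[str] = pattern.split("/")
    parts: List[str] = path.split("/")

    def match(i: int, j: int) -> bool:
        if i == len(pat):
            return j == len(parts)  # pattern used up: path must be too
        if recursive and pat[i] == "**":
            # '**' consumes zero or more whole components; it only passes
            # through hidden components if include_hidden is set.
            for k in range(j, len(parts) + 1):
                if k > j and not include_hidden and parts[k - 1].startswith("."):
                    break
                if match(i + 1, k):
                    return True
            return False
        if j == len(parts):
            return False
        comp = parts[j]
        # Wildcards do not match a leading '.', unless include_hidden is set.
        if comp.startswith(".") and not pat[i].startswith(".") and not include_hidden:
            return False
        # Components are matched one at a time, so '*' and '?' cannot cross '/'.
        return fnmatchcase(comp, pat[i]) and match(i + 1, j + 1)

    return match(0, 0)

files = ["a/x.txt", "a/b/y.txt", "a/.hidden.txt", "top.txt"]
print([f for f in files if toy_glob_match("a/*.txt", f)])
# ['a/x.txt']
print([f for f in files if toy_glob_match("**/*.txt", f, recursive=True)])
# ['a/x.txt', 'a/b/y.txt', 'top.txt']
print([f for f in files if toy_glob_match("a/*.txt", f, include_hidden=True)])
# ['a/x.txt', 'a/.hidden.txt']
```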

files()

Iterate over all file paths in the archive.

dirs()

Iterate over all directory paths in the archive.

readinto(item, buffer[, offset])

Read a file into a buffer, starting from an offset within the file.

read(item[, offset, size])

Read a file from the archive, starting from an offset and reading a specific number of bytes.
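Partial reads come down to one seek into the shard plus a bounded read. In the sketch below, the file's location (file_offset, file_size) would come from the index, the offset argument is relative to the start of the stored file, and reads are clamped so they never spill into the next file in the shard. The helpers `toy_read`/`toy_readinto` and the convention that size=-1 means "to the end of the file" are assumptions, not Barecat's documented defaults.

```python
import os
import tempfile

def toy_read(shard_path: str, file_offset: int, file_size: int,
             offset: int = 0, size: int = -1) -> bytes:
    """Read up to `size` bytes of a stored file, starting `offset` bytes into it.
    size=-1 means "to the end of the file" (an assumption of this sketch)."""
    remaining = max(file_size - offset, 0)
    size = remaining if size < 0 else min(size, remaining)
    with open(shard_path, "rb") as f:
        f.seek(file_offset + offset)
        return f.read(size)

def toy_readinto(shard_path: str, file_offset: int, file_size: int,
                 buffer: bytearray, offset: int = 0) -> int:
    """Fill `buffer` in place from the stored file, never reading past its end.
    Returns the number of bytes written."""
    n = min(len(buffer), max(file_size - offset, 0))
    with open(shard_path, "rb") as f:
        f.seek(file_offset + offset)
        return f.readinto(memoryview(buffer)[:n])

with tempfile.TemporaryDirectory() as tmp:
    shard_path = os.path.join(tmp, "toy_shard.bin")  # hypothetical file name
    with open(shard_path, "wb") as f:
        f.write(b"hello" + b"world!")  # "world!" is stored at offset 5, size 6
    print(toy_read(shard_path, 5, 6, offset=2, size=3))  # b'rld'
    buf = bytearray(4)
    print(toy_readinto(shard_path, 5, 6, buf, offset=1), buf)  # 4 bytearray(b'orld')
```

readinto() avoids allocating a new bytes object for each read, which helps when the same buffer is reused across many files.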

add_by_path(filesys_path[, store_path, dir_exist_ok])

Add a file or directory from the filesystem to the archive.

add(info, *[, data, fileobj, bufsize, dir_exist_ok])

Add a file or directory to the archive.

remove(item)

Remove (delete) a file from the archive.

rmdir(item)

Remove (delete) an empty directory from the archive.

remove_recursively(item)

Remove (delete) a directory and all its contents recursively from the archive.

rename(old_path, new_path)

Rename a file or directory in the archive.

merge_from_other_barecat(source_path[, ignore_duplicates])

Merge the contents of another Barecat archive into this one.

logical_shard_end(shard_number)

Logical end of a shard, in bytes, that is, the position right after the last byte of the last file stored in the shard.

physical_shard_end(shard_number)

Physical end of a shard, in bytes, that is, the position reached by seeking to the end of the shard file.
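In terms of index entries, the logical end of a shard is the largest offset + size among the files stored in it; comparing it with the physical end reveals trailing bytes that no file accounts for. The helper below is hypothetical, works on toy (shard, offset, size) rows, and assumes an empty shard has a logical end of 0.

```python
from typing import List, Tuple

def toy_logical_shard_end(entries: List[Tuple[int, int, int]], shard_number: int) -> int:
    """entries: (shard, offset, size) rows from a toy index."""
    ends = [offset + size for shard, offset, size in entries if shard == shard_number]
    return max(ends, default=0)  # assumed: an empty shard ends at 0

entries = [(0, 0, 5), (0, 5, 6), (1, 0, 7)]
print(toy_logical_shard_end(entries, 0))  # 11
print(toy_logical_shard_end(entries, 1))  # 7
print(toy_logical_shard_end(entries, 2))  # 0
```

Using the maximum rather than the sum of sizes matters once gaps exist: a shard with files at (0, 5) and (9, 6) ends logically at 15, even though only 11 bytes are referenced.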

check_crc32c(item)

Check the CRC32C checksum of a file in the archive.
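CRC32C is the CRC-32 variant with the Castagnoli polynomial (reflected form 0x82F63B78). Presumably, a check like check_crc32c() recomputes it over the stored bytes and compares it with the checksum kept in the index. The bit-at-a-time version below is slow but shows the algorithm; it is an illustration, not Barecat's implementation.

```python
def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli). Pass a previous result as `crc` to continue
    a checksum over data that arrives in chunks."""
    crc ^= 0xFFFFFFFF  # undo the final inversion / apply the initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; XOR in the reflected polynomial when a 1 falls off.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

print(hex(crc32c(b"123456789")))  # 0xe3069283, the standard check value
stored = crc32c(b"hello world")   # what an index would record on write
print(crc32c(b"hello world") == stored)  # True: the contents are intact
```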

verify_integrity([quick])

Verify the integrity of the Barecat archive.

register_codec(exts, encoder, decoder[, nonfinal])

Register an encoder and decoder for one or more file extensions.
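Conceptually, a codec registry maps a file extension to an (encoder, decoder) pair: values are encoded to bytes when stored and decoded back to objects when read, while other files pass through as raw bytes. The toy registry below is a sketch under assumptions: extensions are given with their leading dot and matched case-insensitively on the final extension, and the nonfinal parameter is not modeled. All names (`toy_register_codec`, `toy_encode`, `toy_decode`) are hypothetical.

```python
import json
import os
from typing import Any, Callable, Dict, Iterable, Optional, Tuple

Encoder = Callable[[Any], bytes]
Decoder = Callable[[bytes], Any]
_codecs: Dict[str, Tuple[Encoder, Decoder]] = {}

def toy_register_codec(exts: Iterable[str], encoder: Encoder, decoder: Decoder) -> None:
    # Map each extension (leading dot included, lower-cased) to the codec pair.
    for ext in exts:
        _codecs[ext.lower()] = (encoder, decoder)

def _codec_for(path: str) -> Optional[Tuple[Encoder, Decoder]]:
    # Look up the codec by the path's final extension, e.g. ".json".
    return _codecs.get(os.path.splitext(path)[1].lower())

def toy_encode(path: str, value: Any) -> bytes:
    codec = _codec_for(path)
    return codec[0](value) if codec else value  # no codec: store raw bytes

def toy_decode(path: str, raw: bytes) -> Any:
    codec = _codec_for(path)
    return codec[1](raw) if codec else raw  # no codec: return raw bytes

toy_register_codec(
    [".json"],
    lambda obj: json.dumps(obj).encode("utf-8"),
    lambda data: json.loads(data.decode("utf-8")),
)
raw = toy_encode("meta/info.json", {"n": 3})
print(raw)                                   # b'{"n": 3}'
print(toy_decode("meta/info.json", raw))     # {'n': 3}
print(toy_decode("img/a.bin", b"\x00\x01"))  # b'\x00\x01'
```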

defrag([quick])

Defragment the Barecat archive, compacting the shards so that unused gaps between stored files are removed.
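One way to picture defragmentation is as compaction. Live files are copied back-to-back in offset order, which squeezes out gaps (for instance, ones left by removed files), and their index entries are rewritten with the new offsets. This toy version is hypothetical and works on a single in-memory shard. Barecat's defrag() operates on shard files on disk, may use a different strategy, and its quick mode is not modeled here.

```python
from typing import Dict, Tuple

def toy_defrag(shard: bytes,
               index: Dict[str, Tuple[int, int]]) -> Tuple[bytes, Dict[str, Tuple[int, int]]]:
    """Compact one shard: copy live files back-to-back in offset order and
    return the new shard bytes plus updated path -> (offset, size) entries."""
    new_shard = bytearray()
    new_index: Dict[str, Tuple[int, int]] = {}
    for path, (offset, size) in sorted(index.items(), key=lambda kv: kv[1][0]):
        new_index[path] = (len(new_shard), size)  # new offset = current end
        new_shard += shard[offset:offset + size]
    return bytes(new_shard), new_index

# "xxxx" stands for a gap that no index entry references any more.
shard = b"hello" + b"xxxx" + b"world!"
index = {"a.txt": (0, 5), "b.txt": (9, 6)}
new_shard, new_index = toy_defrag(shard, index)
print(new_shard)  # b'helloworld!'
print(new_index)  # {'a.txt': (0, 5), 'b.txt': (5, 6)}
```

Processing files in offset order means each copy moves data toward the start of the shard. An in-place variant could therefore overwrite the shard front to back without clobbering bytes it has yet to copy.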

close()

Close the Barecat archive.

__enter__()

Enter a context manager.

__exit__(exc_type, exc_val, exc_tb)

Exit a context manager.