Storage
The storage module tries to stay as generic as possible. It makes no assumptions about the transformation or the configuration in use (e.g., the fingerprint). It is also where the functionality exposed by the library lives.
Compressor
Abstraction that lets the store work independently of the underlying configuration and transformation. This module is stateless because it only exposes pure functions. Since calls share no state, a store can easily scale the compressor module.
GD.Storage.Compressor
— Type
Compressor(chunksize, transformer, fingerprint)
Compresses/extracts data according to the loaded configuration. A Compressor is stateless. It handles data compression and extraction but does not store any value for deduplication.
fingerprint is a hashing function with the following signature:
fingerprint(data::Vector{Vector{UInt8}})::Vector{Vector{UInt8}}
Classic examples of fingerprint functions are CRC32 (via the CRC32c module) and SHA from the standard library.
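The two fingerprints below wrap the SHA and CRC32c standard libraries so that they match the signature above. This is a minimal sketch. The names sha_fingerprint and crc_fingerprint are hypothetical, and it assumes the signature means "one digest per input byte vector". The library's documented behaviour may differ.

```julia
using SHA      # stdlib: sha256
using CRC32c   # stdlib: crc32c

# Hypothetical fingerprint matching the documented signature:
# one SHA-256 digest (32 bytes) per input byte vector.
sha_fingerprint(data::Vector{Vector{UInt8}})::Vector{Vector{UInt8}} =
    [sha256(d) for d in data]

# Hypothetical CRC32c-based fingerprint: each 32-bit checksum is
# converted to its 4 raw bytes (host byte order).
crc_fingerprint(data::Vector{Vector{UInt8}})::Vector{Vector{UInt8}} =
    [collect(reinterpret(UInt8, [crc32c(d)])) for d in data]

chunks = [UInt8[0x01, 0x02, 0x03], UInt8[0x01, 0x02, 0x03], UInt8[0xff]]
sha_fingerprint(chunks)  # 3 digests; the first two are identical
crc_fingerprint(chunks)  # 3 four-byte checksums
```

SHA-256 yields 32-byte digests. The CRC32c variant yields 4 bytes per chunk, which is cheaper to compute and store but far more prone to collisions. That trade-off matters when hashes serve as deduplication keys.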
GD.Storage.compress
— Function
compress(compressor, data)
Returns a compressed version of data, along with the bases that compressor needs to reconstruct data.
GD.Storage.extract
— Method
extract(compressor, gdfile, bases)
Decompresses gdfile into its original representation.
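The following self-contained round trip shows how compress and extract fit together. ToyCompressor, split_chunk, merge_chunk, toy_compress and toy_extract are hypothetical names. The transformer (high nibbles as basis, low nibbles as deviation) and SHA-256 as the fingerprint are illustrative choices, not the library's defaults.

```julia
using SHA

# Hypothetical stand-in for a Compressor: only the chunk size is configurable here.
struct ToyCompressor
    chunksize::Int
end

# Hypothetical transformer: high nibbles form the basis, low nibbles the deviation.
split_chunk(chunk::Vector{UInt8}) = (chunk .& 0xf0, chunk .& 0x0f)
merge_chunk(basis::Vector{UInt8}, dev::Vector{UInt8}) = basis .| dev

function toy_compress(c::ToyCompressor, data::Vector{UInt8})
    padsize = mod(-length(data), c.chunksize)       # zeros needed to fill the last chunk
    padded = vcat(data, zeros(UInt8, padsize))
    hashes = Vector{Vector{UInt8}}()
    deviations = Vector{Vector{UInt8}}()
    bases = Dict{Vector{UInt8},Vector{UInt8}}()     # hash => basis, one entry per distinct basis
    for i in 1:c.chunksize:length(padded)
        basis, dev = split_chunk(padded[i:i+c.chunksize-1])
        h = sha256(basis)
        push!(hashes, h)
        push!(deviations, dev)
        bases[h] = basis
    end
    return (hashes = hashes, deviations = deviations, padsize = padsize), bases
end

# `c` is unused; it is kept only to mirror extract(compressor, gdfile, bases).
function toy_extract(c::ToyCompressor, gdfile, bases)
    out = UInt8[]
    for (h, dev) in zip(gdfile.hashes, gdfile.deviations)
        append!(out, merge_chunk(bases[h], dev))
    end
    return out[1:end-gdfile.padsize]                # drop the zero padding
end

c = ToyCompressor(4)
gd, bases = toy_compress(c, UInt8[0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18])
# Both chunks share the basis [0x10, 0x10, 0x10, 0x10], so `bases` holds a single entry.
```

Two chunks that differ only in their low nibbles map to the same basis and hash. This is what allows deduplication of similar, not just identical, data.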
GD.Storage.hashes
— Function
hashes(compressor, data)
Hashes each element in data with compressor.fingerprint and returns an array of hashes.
GDFile
Data structure output by the compression process. It contains the hashes generated by the fingerprint and the deviations. padsize indicates whether the last chunk used to generate the file has been zero-padded. Padding occurs when the number of chunks given to the compressor is not a multiple of the configured chunksize.
GD.Storage.GDFile
— Type
GDFile(hashes, deviations, padsize)
Data structure holding the compressed representation of data generated by a compressor. Suitable for storing (through serialization) or exchanging over the network.
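The docstring mentions storing a GDFile through serialization. The sketch below round-trips a stand-in structure through Julia's Serialization standard library. ToyGDFile, to_bytes and from_bytes are hypothetical names. The field types (vectors of byte vectors and an integer padsize) are assumptions.

```julia
using Serialization

# Hypothetical mirror of GDFile(hashes, deviations, padsize); field types are assumed.
struct ToyGDFile
    hashes::Vector{Vector{UInt8}}
    deviations::Vector{Vector{UInt8}}
    padsize::Int
end

# Encode to raw bytes, e.g. to write to disk or send over a socket.
function to_bytes(f::ToyGDFile)
    io = IOBuffer()
    serialize(io, f)
    return take!(io)
end

# Rebuild the structure on the receiving side.
from_bytes(bytes::Vector{UInt8}) = deserialize(IOBuffer(bytes))::ToyGDFile

f = ToyGDFile([UInt8[1, 2], UInt8[3, 4]], [UInt8[5], UInt8[6]], 1)
g = from_bytes(to_bytes(f))
```

Julia's serialization format is not guaranteed to be stable across Julia versions. Files kept long term, or stores running different Julia versions, may therefore call for a more stable encoding.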
Patching
This data structure can be patched with a simple delta compression algorithm through the functions patch and unpatch. Distributed stores working together can use this functionality, provided every communicating store possesses the original file (gdfile2) needed to patch or unpatch the modified version (gdfile1).
GD.Storage.patch
— Function
patch(gdfile1, gdfile2)
Patches gdfile1 by replacing the hashes/deviations that are identical to those in gdfile2 with [0x00].
GD.Storage.unpatch
— Function
unpatch(gdfile1, gdfile2)
Unpatches gdfile1 by replacing each [0x00] in gdfile1 with the value contained in gdfile2.
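Here is a minimal sketch of the delta scheme. It assumes entries are compared position by position and models a GDFile as a NamedTuple with hashes, deviations and padsize fields. patch_field, unpatch_field, toy_patch and toy_unpatch are hypothetical names, not the library's API.

```julia
# Hypothetical helpers: entries are compared index by index (an assumption),
# and [0x00] marks "same as the original".

# Replace each entry of `new` that equals the entry at the same index of `ref` by [0x00].
patch_field(new::Vector{Vector{UInt8}}, ref::Vector{Vector{UInt8}}) =
    [i <= length(ref) && new[i] == ref[i] ? UInt8[0x00] : new[i] for i in eachindex(new)]

# Replace each [0x00] marker in `patched` by the entry at the same index of `ref`.
unpatch_field(patched::Vector{Vector{UInt8}}, ref::Vector{Vector{UInt8}}) =
    [p == UInt8[0x00] ? ref[i] : p for (i, p) in enumerate(patched)]

# GDFile-like values are modelled as NamedTuples (hashes, deviations, padsize).
toy_patch(gdfile1, gdfile2) = (hashes = patch_field(gdfile1.hashes, gdfile2.hashes),
                               deviations = patch_field(gdfile1.deviations, gdfile2.deviations),
                               padsize = gdfile1.padsize)

toy_unpatch(gdfile1, gdfile2) = (hashes = unpatch_field(gdfile1.hashes, gdfile2.hashes),
                                 deviations = unpatch_field(gdfile1.deviations, gdfile2.deviations),
                                 padsize = gdfile1.padsize)

original = (hashes = [UInt8[1, 1], UInt8[2, 2]], deviations = [UInt8[7], UInt8[8]], padsize = 0)
edited = (hashes = [UInt8[1, 1], UInt8[3, 3]], deviations = [UInt8[7], UInt8[9]], padsize = 0)
patched = toy_patch(edited, original)      # only the second entries are sent in full
restored = toy_unpatch(patched, original)  # equal to `edited`
```

In this sketch, [0x00] doubles as the marker. A genuine one-byte zero entry therefore cannot be told apart from a patched one. SHA-256 hashes are always 32 bytes, so only deviations can hit this case.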
Store
The store glues the other modules together and offers an easy-to-use API for the (de)compression of chunks.
GD.Storage.Store
— Type
Store(compressor, database)
Unifies the Compressor module and the database. The Store handles the deduplication process by storing the bases generated by the Compressor in the database.
GD.Storage.compress!
— Function
compress!(store, data)
Stores the bases generated from data in the database and returns a compressed version of data as a GDFile.
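The sketch below shows how compress! can glue a compressor to a database. The bases are merged into the database, and only the compressed file is returned. ToyStore, nibble_compress and toy_compress! are hypothetical. A Dict stands in for the database, and the single-chunk nibble transformer is an illustrative assumption.

```julia
using SHA

# Hypothetical single-chunk compressor: high nibbles form the basis,
# low nibbles the deviation; returns (gdfile, Dict(hash => basis)).
function nibble_compress(data::Vector{UInt8})
    basis, dev = data .& 0xf0, data .& 0x0f
    h = sha256(basis)
    return (hashes = [h], deviations = [dev], padsize = 0), Dict(h => basis)
end

# Hypothetical store: a compressor function plus a Dict standing in for the database.
struct ToyStore{F}
    compress::F
    database::Dict{Vector{UInt8},Vector{UInt8}}   # hash => basis
end
ToyStore(compress) = ToyStore(compress, Dict{Vector{UInt8},Vector{UInt8}}())

# Sketch of compress!: persist the bases, return only the compressed file.
function toy_compress!(store::ToyStore, data::Vector{UInt8})
    gdfile, bases = store.compress(data)
    merge!(store.database, bases)   # a known hash keeps mapping to the same basis
    return gdfile
end

store = ToyStore(nibble_compress)
g1 = toy_compress!(store, UInt8[0x11, 0x22])
g2 = toy_compress!(store, UInt8[0x13, 0x24])
```

Both inputs share the basis [0x10, 0x20], so the database keeps a single entry. Only the deviations differ between the two compressed files.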
GD.Storage.extract
— Method
extract(store, gdfile)
Decompresses gdfile into its original representation. This method assumes that the input is a valid GDFile (validate() must return []).
GD.Storage.get
— Function
get(store, hashes)
Returns the values mapped to hashes in store.
GD.Storage.update!
— Function
update!(store, hashes, bases)
Updates store.database by mapping hashes to bases.
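The semantics of get and update! can be sketched with a Dict as the database. KVStore, toy_update! and toy_get are hypothetical names. Three behaviours are assumptions: hashes are paired with bases by position, a length mismatch raises an error, and unknown hashes come back as `nothing`.

```julia
# Hypothetical key-value store; a Dict stands in for the database.
struct KVStore
    database::Dict{Vector{UInt8},Vector{UInt8}}   # hash => basis
end
KVStore() = KVStore(Dict{Vector{UInt8},Vector{UInt8}}())

# Sketch of update!: map hashes[i] to bases[i].
function toy_update!(store::KVStore, hashes::Vector{Vector{UInt8}}, bases::Vector{Vector{UInt8}})
    length(hashes) == length(bases) ||
        throw(DimensionMismatch("got $(length(hashes)) hashes but $(length(bases)) bases"))
    for (h, b) in zip(hashes, bases)
        store.database[h] = b
    end
    return store
end

# Sketch of get: look up each hash; `nothing` marks a hash unknown to the store.
toy_get(store::KVStore, hashes::Vector{Vector{UInt8}}) =
    [get(store.database, h, nothing) for h in hashes]

s = KVStore()
toy_update!(s, [UInt8[1], UInt8[2]], [UInt8[10], UInt8[20]])
toy_get(s, [UInt8[2], UInt8[1]])   # bases in the order of the requested hashes
```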
GD.Storage.validate
— Function
validate(store, gdfile)
Checks whether store can extract gdfile by returning the list of unknown hashes used by gdfile. The GDFile is said to be valid if validate() returns [].
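Validation can be sketched as a filter over the file's hashes, followed by a guard that enforces the precondition of extract(store, gdfile). toy_validate and require_valid are hypothetical. The sketch assumes the database is a Dict (hash => basis) and that a GDFile is any value with a hashes field.

```julia
# Sketch of validate: the hashes used by `gdfile` that `database` does not know.
toy_validate(database::AbstractDict, gdfile) =
    filter(h -> !haskey(database, h), gdfile.hashes)

# Guard mirroring the documented precondition of extract(store, gdfile):
# proceed only when validation returns an empty list.
function require_valid(database::AbstractDict, gdfile)
    unknown = toy_validate(database, gdfile)
    isempty(unknown) || throw(ArgumentError("GDFile uses $(length(unknown)) unknown hash(es)"))
    return gdfile
end

db = Dict(UInt8[1] => UInt8[10])
gd = (hashes = [UInt8[1], UInt8[2]], deviations = [UInt8[0], UInt8[0]], padsize = 0)
toy_validate(db, gd)   # [UInt8[2]]: this file cannot be extracted yet
```

A store that receives such a file could fetch the missing bases from a peer and record them with update!, after which validation would return an empty list.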