Storage
The storage module tries to stay as generic as possible, making no assumptions about the transformation used or the configuration (e.g., the fingerprint). It is also where the functionality exposed by the library lives.
Compressor
Abstraction used by the store to work independently of the underlying configuration and transformation. This module is stateless: it exposes only pure functions. A single store can therefore scale the compressor module easily.
GD.Storage.Compressor — Type
Compressor(chunksize, transformer, fingerprint)
Compresses/extracts data according to the loaded configuration. A Compressor is stateless. It is focused on data compression/extraction but does not store any value for deduplication.
fingerprint is a hashing function with the following signature:
fingerprint(data::Vector{Vector{UInt8}})::Vector{Vector{UInt8}}
Classic examples of fingerprint functions are CRC32 and SHA from the standard library.
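To make the expected signature concrete, here is a minimal sketch of a fingerprint function built on the SHA standard library. The name sha1_fingerprint is hypothetical and not part of GD.Storage; any function with the same signature should work in its place.

```julia
using SHA  # Julia standard library

# Hypothetical fingerprint (not part of GD.Storage): hashes every chunk
# with SHA-1 and follows the documented signature
# fingerprint(data::Vector{Vector{UInt8}})::Vector{Vector{UInt8}}.
function sha1_fingerprint(data::Vector{Vector{UInt8}})::Vector{Vector{UInt8}}
    return [sha1(chunk) for chunk in data]
end

# Three example chunks; the first and the last are identical.
chunks = [UInt8[0x01, 0x02], UInt8[0x03, 0x04], UInt8[0x01, 0x02]]
digests = sha1_fingerprint(chunks)
```

Identical chunks produce identical digests, which is the property deduplication relies on.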
GD.Storage.compress — Function
compress(compressor, data)
Returns a compressed version of data, as well as the bases that compressor needs in order to reconstruct data.
GD.Storage.extract — Method
extract(compressor, gdfile, bases)
Decompresses gdfile into its original representation.
GD.Storage.hashes — Function
hashes(compressor, data)
Hashes each element in data with compressor.fingerprint and returns an array of hashes.
GDFile
Data structure produced by the compression process. It contains the hashes generated by the fingerprint and the deviations. padsize indicates whether the last chunk used for generating the file has been zero-padded. This occurs when the number of chunks given to the compressor is not a multiple of the configured chunksize.
GD.Storage.GDFile — Type
GDFile(hashes, deviations, padsize)
Data structure holding the compressed representation of data generated by a compressor. Suitable for storing (through serialization) or exchanging over the network.
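The role of padsize can be illustrated with a small sketch of zero-padding. It makes two assumptions that the text above does not confirm: the input is a flat byte vector split into chunks of chunksize bytes, and padsize counts the padding bytes that were appended. The helpers chunk_with_padding and unpad are hypothetical and not part of GD.Storage.

```julia
# Hypothetical helper (not part of GD.Storage): splits `data` into chunks of
# `chunksize` bytes and zero-pads the last chunk when the length of `data`
# is not a multiple of `chunksize`.
function chunk_with_padding(data::Vector{UInt8}, chunksize::Int)
    remainder = length(data) % chunksize
    padsize = remainder == 0 ? 0 : chunksize - remainder
    padded = vcat(data, zeros(UInt8, padsize))
    chunks = [padded[i:i + chunksize - 1] for i in 1:chunksize:length(padded)]
    return chunks, padsize
end

# Hypothetical inverse: concatenates the chunks and drops the padding bytes.
function unpad(chunks::Vector{Vector{UInt8}}, padsize::Int)
    joined = reduce(vcat, chunks)
    return joined[1:end - padsize]
end

data = collect(UInt8, 1:10)                    # 10 bytes
chunks, padsize = chunk_with_padding(data, 4)  # last chunk needs padding
restored = unpad(chunks, padsize)
```

Keeping padsize next to the hashes and deviations is what lets extraction return exactly the original bytes rather than the padded ones.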
Patching
This data structure can be patched by applying a simple delta compression algorithm through the functions patch and unpatch. This is useful when distributed stores work together, as long as all communicating stores possess the original file (gdfile2) needed to either patch or unpatch the modified version (gdfile1).
GD.Storage.patch — Function
patch(gdfile1, gdfile2)
Patches gdfile1 by replacing the hashes/deviations that are identical to those in gdfile2 with [0x00].
GD.Storage.unpatch — Function
unpatch(gdfile1, gdfile2)
Unpatches gdfile1 by replacing each [0x00] in gdfile1 with the corresponding value contained in gdfile2.
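The delta scheme described above can be sketched on a plain list of byte vectors. This is an illustration under assumptions, not the library's implementation: toy_patch and toy_unpatch are hypothetical names, they operate on a single field (for example the hashes) rather than on a GDFile, and entries are compared position by position.

```julia
# Marker that stands for "same as the original at this position".
const MARKER = UInt8[0x00]

# Hypothetical stand-in for patch: every entry of `modified` that equals the
# entry at the same position in `original` is replaced by the marker.
function toy_patch(modified::Vector{Vector{UInt8}}, original::Vector{Vector{UInt8}})
    return [i <= length(original) && modified[i] == original[i] ? copy(MARKER) : modified[i]
            for i in eachindex(modified)]
end

# Hypothetical stand-in for unpatch: every marker is replaced by the entry
# at the same position in `original`.
function toy_unpatch(patched::Vector{Vector{UInt8}}, original::Vector{Vector{UInt8}})
    return [i <= length(original) && patched[i] == MARKER ? original[i] : patched[i]
            for i in eachindex(patched)]
end

original = [UInt8[0xaa, 0xbb], UInt8[0xcc, 0xdd], UInt8[0xee, 0xff]]
modified = [UInt8[0xaa, 0xbb], UInt8[0x11, 0x22], UInt8[0xee, 0xff]]

patched = toy_patch(modified, original)   # only the second entry survives
restored = toy_unpatch(patched, original)
```

Note that this sketch cannot tell a genuine [0x00] entry apart from the marker. It is safe only when real entries are never exactly one zero byte, which holds for multi-byte hashes.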
Store
The store glues the other modules together and offers an easy-to-use API for the (de)compression of chunks.
GD.Storage.Store — Type
Store(compressor, database)
Unifies the Compressor module and the database. The Store handles the deduplication process by storing the bases generated by the Compressor into the database.
GD.Storage.compress! — Function
compress!(store, data)
Stores the bases generated from data into the database and returns a compressed version of data as a GDFile.
GD.Storage.extract — Method
extract(store, gdfile)
Decompresses gdfile into its original representation. This method assumes that a valid GDFile is given as input (validate() must return []).
GD.Storage.get — Function
get(store, hashes)
Returns the values mapped to hashes in store.
GD.Storage.update! — Function
update!(store, hashes, bases)
Updates store.database by mapping hashes to bases.
GD.Storage.validate — Function
validate(store, gdfile)
Checks whether gdfile can be extracted by store by returning the list of unknown hashes used by gdfile. The GDFile is considered valid if validate() returns [].
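How update!, get and validate relate to each other can be sketched with a Dict standing in for the database. Everything here is an assumption made for illustration: the toy_ functions are hypothetical, they take the database and a list of hashes directly instead of a Store and a GDFile, and the real database type may differ.

```julia
# Hypothetical database: a Dict that maps each hash to its base.
database = Dict{Vector{UInt8}, Vector{UInt8}}()

# Counterpart of update!: maps every hash to the base at the same position.
function toy_update!(db, hashes, bases)
    for (h, b) in zip(hashes, bases)
        db[h] = b
    end
    return db
end

# Counterpart of get: looks up the base stored for each hash.
toy_get(db, hashes) = [db[h] for h in hashes]

# Counterpart of validate: returns the hashes that are missing from the
# database, each listed once. An empty result means extraction can proceed.
toy_validate(db, hashes) = unique(filter(h -> !haskey(db, h), hashes))

known = [UInt8[0x01], UInt8[0x02]]
toy_update!(database, known, [UInt8[0xaa], UInt8[0xbb]])

missing_hashes = toy_validate(database, [UInt8[0x01], UInt8[0x03], UInt8[0x03]])
```

Returning the missing hashes instead of a Boolean tells the caller which bases to request, for example from another store, before calling extract.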