Storage

The storage module tries to stay as generic as possible, making no assumptions about the transformation used or the configuration (e.g., the fingerprint). This is also where the functionality exposed by the library lives.

Compressor

Abstraction used by the store to work independently of the underlying configuration and transformation. This module is stateless: it exposes only pure functions. A store can therefore scale the compressor module easily.

GD.Storage.Compressor (Type)
Compressor(chunksize, transformer, fingerprint)

Compresses/Extracts data according to the loaded configuration. A Compressor is stateless. It is focused on data compression/extraction but does not store any value for deduplication.

fingerprint is a hashing function with the following signature:

fingerprint(data::Vector{Vector{UInt8}})::Vector{Vector{UInt8}}

Classic examples of fingerprint functions are CRC32 and SHA from the standard library.
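
As an illustration, a function matching this signature could look like the sketch below. It assumes SHA-1 from Julia's SHA standard library; the name sha1_fingerprint is hypothetical and not part of the library.

```julia
using SHA  # Julia standard library, provides sha1

# Hypothetical fingerprint: hashes every chunk independently with SHA-1.
# Each resulting hash is a 20-byte Vector{UInt8}.
function sha1_fingerprint(data::Vector{Vector{UInt8}})::Vector{Vector{UInt8}}
    return [sha1(chunk) for chunk in data]
end
```

Identical chunks produce identical hashes, which is the property deduplication relies on.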

GD.Storage.compress (Function)
compress(compressor, data)

Returns a compressed version of data, as well as the bases which need to be used by compressor for reconstructing data.

GD.Storage.extract (Method)
extract(compressor, gdfile, bases)

Decompresses gdfile into its original representation.

GD.Storage.hashes (Function)
hashes(compressor, data)

Hashes each element in data with compressor.fingerprint and returns an array of hashes.


GDFile

Data structure produced by the compression process. It contains the hashes generated by the fingerprint and the deviations. padsize indicates whether the last chunk used for generating the file has been zero-padded. This occurs when the number of chunks given to the compressor is not a multiple of the configured chunksize.

GD.Storage.GDFile (Type)
GDFile(hashes, deviations, padsize)

Data structure holding the compressed representation of data generated by a compressor. Suitable for storing (through serialization) or exchanging over the network.
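
The zero-padding behind padsize can be sketched as follows. This is an assumption-laden illustration, not the library's code: it treats the input as a flat byte vector, assumes padsize counts the zero bytes appended, and uses the hypothetical helper name pad_to_chunks. The library may define these details differently.

```julia
# Hypothetical helper: split `data` into chunks of `chunksize` bytes,
# zero-padding the last chunk when the length is not a multiple of `chunksize`.
function pad_to_chunks(data::Vector{UInt8}, chunksize::Int)
    # Number of zero bytes needed to reach the next multiple of chunksize.
    padsize = mod(-length(data), chunksize)
    padded = vcat(data, zeros(UInt8, padsize))
    chunks = [padded[i:i+chunksize-1] for i in 1:chunksize:length(padded)]
    return chunks, padsize
end
```

Under these assumptions, extraction would drop the last padsize bytes to recover the original input.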


Patching

This data structure can be patched by applying a simple delta compression algorithm through the functions patch and unpatch. This is useful when distributed stores work together, as long as all communicating stores possess the original file (gdfile2) needed to patch or unpatch the modified version (gdfile1).

GD.Storage.patch (Function)
patch(gdfile1, gdfile2)

Patches gdfile1 by replacing the hashes/deviations that are identical to those of gdfile2 with [0x00].

GD.Storage.unpatch (Function)
unpatch(gdfile1, gdfile2)

Unpatches gdfile1 by replacing each [0x00] in gdfile1 with the corresponding value contained in gdfile2.
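
The patch/unpatch round trip can be illustrated with a minimal sketch that works on plain vectors of byte vectors rather than on GDFile. The names patch_sketch and unpatch_sketch are hypothetical, and the sketch assumes entries are compared position by position, which the description above does not state explicitly. It also ignores the case where an entry is genuinely equal to [0x00].

```julia
# Marker that replaces entries shared with the reference.
const MARKER = UInt8[0x00]

# Hypothetical stand-in for patch: entries of `new` equal to the entry at the
# same position in `old` are replaced by the marker.
function patch_sketch(new::Vector{Vector{UInt8}}, old::Vector{Vector{UInt8}})
    return [(i <= length(old) && new[i] == old[i]) ? MARKER : new[i]
            for i in eachindex(new)]
end

# Hypothetical stand-in for unpatch: markers are replaced by the entry at the
# same position in `old`.
function unpatch_sketch(patched::Vector{Vector{UInt8}}, old::Vector{Vector{UInt8}})
    return [(i <= length(old) && patched[i] == MARKER) ? old[i] : patched[i]
            for i in eachindex(patched)]
end
```

Only the entries that differ from the reference need to travel over the network, which is why both sides must hold the same reference file.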


Store

The store glues the other modules together and offers an easy-to-use API for the (de)compression of chunks.

GD.Storage.Store (Type)
Store(compressor, database)

Unifies the Compressor module and the database. The Store handles the deduplication process by storing the bases generated by the Compressor into the database.

GD.Storage.compress! (Function)
compress!(store, data)

Stores the bases generated from data into the database and returns a compressed version of data as a GDFile.

GD.Storage.extract (Method)
extract(store, gdfile)

Decompresses gdfile into its original representation. This method assumes that a valid GDFile is given as input (validate() must return []).

GD.Storage.validate (Function)
validate(store, gdfile)

Checks whether gdfile can be extracted by store by returning the list of unknown hashes used by gdfile. The GDFile is considered valid if validate() returns [].
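
The validity check can be modeled with a short sketch. It assumes the database behaves like a Dict mapping each hash to its base, which may not match the library's actual database type, and the name unknown_hashes is hypothetical.

```julia
# Hypothetical model of validate: the database is a Dict from hash to base.
# Returns the hashes that have no base in the database; an empty result
# means every chunk could be reconstructed.
function unknown_hashes(db::Dict{Vector{UInt8},Vector{UInt8}},
                        hashes::Vector{Vector{UInt8}})
    return [h for h in hashes if !haskey(db, h)]
end
```

A caller would run such a check before extraction and request the missing bases from another store when the result is not empty.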
