2. Features

2.1. Compression in ARX

Although the urgent need to compress data is fading as technology evolves, compression should not be considered unnecessary. Depending on the type of data stored, it may yield space savings of up to 90%; black-and-white mask images are an example of data with such large gains. Compression is also recommended for text: average plain text can be compressed to 20% to 40% of its original size. In general, most data can be compressed to some degree.

Since ARX does not care about the type of data stored in the data block of an item, it has no idea of the meaning of that data and cannot use any compression algorithm that loses data. A lossless algorithm is needed. Consequently, ARX will not pack your audio or video collections very well unless you are willing to preprocess the data with the appropriate encoders.

The ARX implementation libarxx currently uses the zlib library to compress and decompress all data. However, an identifier for the encoder used to compress the data is saved along with the data, and there is plenty of room in the identifier definition space, so other encoders can easily be integrated into any ARX implementation.
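
The idea of saving an encoder identifier along with the data can be sketched as follows. This is a minimal illustration and not libarxx code: the names EncoderID, Codec, StoredBlock and CodecRegistry, as well as the numeric identifier values, are hypothetical. To keep the sketch free of dependencies, a simple run-length encoder stands in for zlib as the lossless algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical encoder identifiers; the real values are defined by ARX, not here.
using EncoderID = std::uint32_t;
constexpr EncoderID kNone = 0;
constexpr EncoderID kRunLength = 1;

using Bytes = std::string;

struct Codec {
    std::function<Bytes(const Bytes&)> encode;
    std::function<Bytes(const Bytes&)> decode;
};

// Lossless run-length coding: every run becomes a (count, byte) pair, count in 1..255.
Bytes rle_encode(const Bytes& in) {
    Bytes out;
    for (std::size_t i = 0; i < in.size();) {
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == in[i] && run < 255) ++run;
        out.push_back(static_cast<char>(run));
        out.push_back(in[i]);
        i += run;
    }
    return out;
}

Bytes rle_decode(const Bytes& in) {
    assert(in.size() % 2 == 0);  // encoded data consists of (count, byte) pairs only
    Bytes out;
    for (std::size_t i = 0; i + 1 < in.size(); i += 2) {
        const std::size_t count = static_cast<unsigned char>(in[i]);
        out.append(count, in[i + 1]);
    }
    return out;
}

// What ends up in the archive: the encoder identifier plus the encoded bytes.
struct StoredBlock {
    EncoderID encoder;
    Bytes data;
};

class CodecRegistry {
public:
    // Integrating another encoder only means claiming an unused identifier.
    void add(EncoderID id, Codec codec) { codecs_[id] = std::move(codec); }

    StoredBlock store(EncoderID id, const Bytes& raw) const {
        return StoredBlock{id, codecs_.at(id).encode(raw)};
    }

    // The identifier saved with the data selects the matching decoder.
    Bytes load(const StoredBlock& block) const {
        return codecs_.at(block.encoder).decode(block.data);
    }

private:
    std::map<EncoderID, Codec> codecs_;
};

CodecRegistry make_default_registry() {
    CodecRegistry registry;
    registry.add(kNone, Codec{[](const Bytes& b) { return b; },
                              [](const Bytes& b) { return b; }});
    registry.add(kRunLength, Codec{rle_encode, rle_decode});
    return registry;
}
```

Because load() consults only the identifier stored in the block, data written with one encoder stays readable after further encoders have been registered.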

2.2. Structure in ARX

The first and easiest way to bring structure to a bunch of individuals is to assign numbers. ARX defines that every item in an ARX archive must be assigned a unique ID, allowing users of the archive to identify an item within it by a single unsigned 4-byte value. Arxx assigns the responsibility for ensuring the uniqueness of these IDs to an Archive class. ARX does not require the unique IDs to follow any particular order. In fact, ARX allows the process of finding an unused ID to employ randomizing facilities. Arxx additionally requires any implementation to let the user set the unique ID of any item to a specific value while still guaranteeing uniqueness.
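
How an archive might keep IDs unique can be sketched like this. The sketch rests on assumptions: the class layout and the member names register_item, set_unique_id and size are hypothetical and not taken from libarxx. It only illustrates the three rules above: IDs are unsigned 4-byte values, an unused ID may be found at random, and an explicitly requested ID is accepted only if it is still free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <random>

using UniqueID = std::uint32_t;  // a single unsigned 4-byte value

struct Item {
    UniqueID id = 0;
    bool registered = false;
};

class Archive {
public:
    // Registers the item under a randomly chosen unused ID.
    // (Loops forever only if all 2^32 IDs are taken; ignored in this sketch.)
    void register_item(Item& item) {
        assert(!item.registered);
        UniqueID candidate = distribution_(generator_);
        while (items_.count(candidate) != 0) candidate = distribution_(generator_);
        attach(item, candidate);
    }

    // Registers the item under a requested ID; refuses if that ID is taken.
    bool register_item(Item& item, UniqueID id) {
        assert(!item.registered);
        if (items_.count(id) != 0) return false;
        attach(item, id);
        return true;
    }

    // Moves a registered item to another ID while keeping uniqueness assured.
    bool set_unique_id(Item& item, UniqueID new_id) {
        assert(item.registered);
        if (new_id == item.id) return true;
        if (items_.count(new_id) != 0) return false;
        items_.erase(item.id);
        attach(item, new_id);
        return true;
    }

    std::size_t size() const { return items_.size(); }

private:
    void attach(Item& item, UniqueID id) {
        item.id = id;
        item.registered = true;
        items_[id] = &item;
    }

    std::map<UniqueID, Item*> items_;  // ordered only by the container, not by ARX
    std::mt19937 generator_{std::random_device{}()};
    std::uniform_int_distribution<UniqueID> distribution_;
};
```

Keeping the check and the insertion inside the Archive means no caller can create a duplicate ID by accident, which is the point of handing this responsibility to a single class.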

After you have given everybody a number you can give them names as well. ARX does not define a unique name concept, thus allowing you to have multiple items with the same name in one archive.

What brings the most structure, however, is an item's capability to contain references to other items. The concept is simple and familiar from every filesystem, where it appears as directories. In ARX, however, item referencing is completely independent of the data stored in an item, so a directory can carry data of any form as well. There is yet another dimension beyond the usual directory tree: an item can have as many relations as you want, each identified by a unique name. This way you can have multiple overlapping tree structures, ring-like structures, commentary items for each item, or whatever else you need.
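
Named relations can be pictured with a small data structure. This is a sketch under assumptions: the Item layout and the helpers add_reference and referenced are hypothetical names chosen for illustration, not the Arxx interface. It shows that an item's data and its references are independent, and that each relation is looked up by its name.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

using UniqueID = std::uint32_t;

struct Item {
    UniqueID id;
    std::string name;  // names need not be unique within an archive
    std::string data;  // payload, independent of any references
    // Relation name -> unique IDs of the referenced items.
    std::map<std::string, std::set<UniqueID>> relations;
};

// Adds a reference to `target` under the relation called `relation`.
void add_reference(Item& from, const std::string& relation, UniqueID target) {
    from.relations[relation].insert(target);
}

// Returns the IDs referenced through one relation, or an empty set if the
// item has no relation of that name.
std::set<UniqueID> referenced(const Item& from, const std::string& relation) {
    const auto it = from.relations.find(relation);
    if (it == from.relations.end()) return {};
    return it->second;
}
```

With this layout, one item can act as a directory through a relation named "children", point to a commentary item through "comment", and take part in a ring through "next", all while carrying its own data.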

2.3. References in Arxx

The general way of referencing an Item is through its unique ID. The Archive object allows you to translate this unique ID into a pointer, which is either 0, in case an Item with that particular ID doesn't exist, or points to the Item in question. This translation from unique ID to Item pointer will be referred to as resolving a unique ID and is only valid in the context of an Archive.

One problem with this approach is that after resolving the unique ID and doing your work with the Item, you go out of scope and the resolved Item pointer is lost. The next time you want to perform an operation on the very same Item, you have to resolve it again. Resolving a unique ID into an Item is not very expensive: its complexity is O(log(n)), where n is the number of items in the archive. But when dealing with structures you will perform many more resolutions, and the time spent searching for Item pointers will rise drastically.

The next idea, of course, is to cache the resolved Item in some structure and to ask the cache before asking the library to resolve a unique ID. With a good hash map you could certainly get good results, but what if an Item is unregistered between two cache hits? The first time everything is OK, but the second time will likely result in a segmentation fault, because the cached Item pointer points to nowhere but is not 0. It would be nice if the cached Item pointer set itself to 0 when the Item is unregistered from the Archive.

Although this could be done using signals, that approach would have a high impact on the overall performance of any Arxx implementation. Instead another pattern is used: one instance, multiple representations. Every potential Item has one reference instance that holds the actual cached resolution, and may have multiple reference representations that use this single reference instance to get the data. A potential Item is every unique ID you can think of. Of course, reference instances are not allocated blindly for all of them, but only for those you request from the Archive, which hands you one reference representation for each request. Since the Archive holds a representation of every instance as well, it can issue updates through its representation, which forwards them to the single underlying instance. Et voilà: the representation outside the library has changed as well, since it uses the same reference instance.
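
The pattern can be sketched with a shared core object. This is an illustration under assumptions and not the Arxx implementation: ReferenceCore plays the role of the single reference instance, Reference is a reference representation, and all member names are hypothetical. The use of std::shared_ptr is a choice made for the sketch; the source does not say how the instance is shared.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

using UniqueID = std::uint32_t;

struct Item {
    UniqueID id;
};

// The single reference instance for one unique ID; holds the cached resolution.
struct ReferenceCore {
    UniqueID id;
    Item* item;  // nullptr while no Item with this ID is registered
};

// A reference representation: a cheap handle onto the shared instance.
class Reference {
public:
    explicit Reference(std::shared_ptr<ReferenceCore> core) : core_(std::move(core)) {}
    UniqueID id() const { return core_->id; }
    Item* item() const { return core_->item; }  // no lookup, just the cached pointer

private:
    std::shared_ptr<ReferenceCore> core_;
};

class Archive {
public:
    void register_item(Item& item) {
        assert(items_.count(item.id) == 0);  // the ID must still be unused
        items_[item.id] = &item;
        update_core(item.id, &item);
    }

    void unregister_item(Item& item) {
        items_.erase(item.id);
        update_core(item.id, nullptr);  // every representation now sees nullptr
    }

    // Plain resolving: a logarithmic search on every call.
    Item* resolve(UniqueID id) const {
        const auto it = items_.find(id);
        return it == items_.end() ? nullptr : it->second;
    }

    // Hands out a representation; the instance is allocated on first request only.
    Reference reference(UniqueID id) {
        auto it = cores_.find(id);
        if (it == cores_.end()) {
            auto core = std::make_shared<ReferenceCore>(ReferenceCore{id, resolve(id)});
            it = cores_.emplace(id, std::move(core)).first;
        }
        return Reference(it->second);
    }

private:
    void update_core(UniqueID id, Item* item) {
        const auto it = cores_.find(id);
        if (it != cores_.end()) it->second->item = item;
    }

    std::map<UniqueID, Item*> items_;
    // The Archive's own handle on every allocated instance (never freed in this sketch).
    std::map<UniqueID, std::shared_ptr<ReferenceCore>> cores_;
};
```

A Reference obtained before an Item is registered starts out returning nullptr, begins returning the Item once it is registered, and falls back to nullptr when the Item is unregistered, so a cached representation never dangles.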