4. Uses of ARX

In this section the general usage of ARX and ARX archives is defined. Currently I do this as a matter of requirements engineering, in order to define the extent of the capabilities to be captured by ARX and where to draw the line between the ARX concept and additional concepts provided by an API like Arxx.

4.1. Scenarios

Before we dive into the syntactic exactness of UML use cases, which we do not know yet, let us try to cover the functionality of ARX in broader terms. Use scenarios consist of a number of use cases which are not yet really defined and are given in a verbal and practical way, so you can better understand what every aspect is meant to accomplish.

The didactic approach in this section is to set up a very basic usage example and expand it incrementally to become ever more capable and hence more complicated.

This section of usage scenarios touches almost everything in ARX. You will stumble across each and every functionality that ARX offers and will see where it comes from. Only in the later sections of this documentation, however, will you find a detailed and verbose explanation of these concepts. So this section may be regarded as a jump start for quick starters and a good place to start learning about ARX.

4.1.1. General layout of scenarios

A scenario may contain various parts regarding different aspects of the functionality in discussion. What these parts are about is described briefly here.

4.1.1.1. Subtitle

A very short description of the setting with some catch words to roughly sketch the extent of the scenario.

4.1.1.2. Setting

This part tries to explain the catch words from the subtitle. For each word a description is given of how it influences the user's work with an ARX API.

4.1.1.3. Motivation

Sometimes it may be hard to understand why a certain feature is necessary at all. If you have doubts about the relevance of a feature, please take a close look at the Motivation, because sometimes the feature may not be self-evident, or you may not even see the point of the Motivation. In that case we have a place to start a discussion: the Motivation.

4.1.1.4. Considerations

Often it is necessary to think about a later stage of ARX usage. This part is not about the development of ARX and its functionality but about scalability, trying to spot problems where there are none yet and to broaden the view of the reader. Often the Considerations start with the assumption that some part of ARX grows very large due to heavy usage and ask what happens to performance then.

Quite possibly considerations should be dealt with in separate scenarios so they do not confuse the current line of thought, and eventually they will be.

4.1.2. Scenario 1

A single local ARX archive as a flat id'ed resource storage.

4.1.2.1. Setting

This use scenario exhibits some basic and standard functionality of ARX interfaces and implementations. First of all, this scenario is concerned with a single local file, which we must be able to create, load, modify and save.

Secondly, this scenario presents the resources in a flat way, meaning that no structure like directories is imposed on the data. The resource items lie one next to the other in memory.

The resource items can be identified by a unique ID: an unsigned 32 bit integer value.

There is one more hint in this scenario which, to the knowing reader, is easy to overlook. The item structures are merely the common overhead around general data buffers which can be filled with whatever data you like: text, images or textures, music or samples, 3D models or heightmaps. You could even take an ARX archive and stuff it into an item. At this point it is interesting to note that the data may even contain the unique IDs of other items, thus allowing some kind of referencing or symlinking of data. Using this approach you could even structure the data yourself by creating items which only contain a list of unique IDs and calling them directories. This method is partly discouraged, however, since the data has no immediate semantic meaning to any ARX API; the APIs cannot help you browse the ARX archive in an object oriented way because they do not realize that the data is indeed a list of item references.
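As an illustration of this hand-made structuring, here is a minimal sketch of how an application could pack such a list of unique IDs into an item's opaque data buffer. The helper function is made up for this example and is not part of any ARX API.

    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Packs a list of item IDs into a plain byte buffer that can be stored as
    // an item's data, turning that item into a hand-made "directory".
    std::vector<char> PackItemReferences(const std::vector<std::uint32_t> & ReferencedIDs)
    {
        std::vector<char> Data(ReferencedIDs.size() * sizeof(std::uint32_t));

        if(ReferencedIDs.empty() == false)
        {
            std::memcpy(Data.data(), ReferencedIDs.data(), Data.size());
        }
        // To any ARX API this buffer is opaque data; only the application
        // knows that it really is a list of unique IDs.

        return Data;
    }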

4.1.2.2. Motivation

Hm, if you don't see it I can't help you. Maybe ARX is not for you? :)

4.1.2.3. Implications

ARX APIs have to be able to save and load a local ARX archive at a given path.

A library must have means to resolve the relation between items and their unique IDs. Since this relation goes both ways, two mappings have to be resolved:

  • Item -> Unique ID: Every item knows its own unique ID.

  • Unique ID -> Item: The library can return an Item with a certain ID.

Any ARX API implementation has to ensure that only one item with a certain unique ID exists in an ARX archive.

Strictly speaking it is not a requirement, but in order to prevent orphaned items the library has to assign a unique ID to every item.

There is no restriction on the types of data to be stored in an ARX archive.

4.1.2.4. ARX archive file format details

ARX archives have a certain format that, in order to support later thoughts and arguments, I will roughly sketch here. At this point of thought an ARX archive consists of two parts: the archive header and the items.

  • The archive header contains at least a version number identifying the general file format of the archive file and the count of items in the archive.

  • Items contain the unique ID and the data. Additionally, since the data is completely arbitrary, the item also needs a field specifying the length of the data.

There is no padding between the archive header and the item data, which follows immediately. Hence there is no need to add an offset to the archive header specifying where the item data starts.
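To make this layout easier to picture, here is an illustrative sketch of the two parts as C++ structures. The field names and any details beyond the 32 bit unique ID are assumptions of this sketch, not a normative definition of the file format.

    #include <cstdint>

    struct ArchiveHeader
    {
        std::uint32_t FormatVersion; // identifies the general file format of the archive file
        std::uint32_t ItemCount;     // number of items that follow immediately
    };

    struct ItemHeader
    {
        std::uint32_t UniqueID;      // the item's unique ID
        std::uint32_t DataLength;    // length of the completely arbitrary data
        // DataLength bytes of raw data follow each item header in the file.
    };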

4.1.2.5. Implementation Implications

To assemble a Library with a number of Items it is imperative to be able to add Items to a Library. In order not to use a word as general as add, this process is instead called registering an Item at a Library. Analogously, unregistering is the process of removing an Item from a Library.

Every Item is either in exactly one Library or in no Library.

The unique ID is a property of the Item and therefore the Item must supply functions to get and set this unique ID.

Regarding the implementation it is convenient to have a specified "invalid" unique ID (0xFFFFFFFF).

There is one more reason why register is a better word for adding an Item to a Library: add does not necessarily leave room for failure, while register does somewhat more. Registering can actually fail if you try to register an Item with a unique ID that already exists in the Library.

The uniqueness of a certain ID can only be evaluated by the Library that contains all the Items in question. So if you want to set an Item's unique ID you not only need the Item to modify but also the Library to assert the uniqueness of the new unique ID. To keep the impact on the surrounding application or library software design as small as possible, every Item therefore has to know which Library it belongs to, if any. Since this relation is optional and Items do not necessarily have to be in a Library, the library reference is a pointer.

To resolve the mapping Unique ID -> Item and keep the runtime at a reasonable level, the Items inside the library are stored in a container ordered by the Items' unique IDs.
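The following is a minimal sketch of these implications. The class and method names (Item, Library, Register, GetItem) merely mirror the text and are not the actual Arxx interface; the sketch shows the optional back pointer to the Library, the invalid unique ID, the ordered container, registration that can fail on a duplicate unique ID and the lookup that resolves Unique ID -> Item.

    #include <cstdint>
    #include <map>

    const std::uint32_t InvalidUniqueID = 0xFFFFFFFF;

    class Library;

    class Item
    {
    public:
        std::uint32_t GetUniqueID() const
        {
            return m_UniqueID;
        }

        std::uint32_t m_UniqueID = InvalidUniqueID;
        Library * m_Library = nullptr; // optional back reference, hence a pointer
    };

    class Library
    {
    public:
        // Registering fails if an Item with the same unique ID already exists.
        bool Register(Item * NewItem)
        {
            bool Inserted = m_Items.insert({NewItem->GetUniqueID(), NewItem}).second;

            if(Inserted == true)
            {
                NewItem->m_Library = this;
            }

            return Inserted;
        }

        // Resolves the mapping Unique ID -> Item; the ordered container keeps
        // lookups at a reasonable runtime.
        Item * GetItem(std::uint32_t UniqueID)
        {
            auto ItemIterator = m_Items.find(UniqueID);

            return (ItemIterator != m_Items.end()) ? ItemIterator->second : nullptr;
        }
    private:
        std::map<std::uint32_t, Item *> m_Items; // ordered by the items' unique IDs
    };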

Arxx implementations have to supply functions to safely handle the data of an Item. This includes a function to get the length of the associated data. All these functions concern items only and are therefore methods of an Item.

4.1.2.6. Implementation Details

4.1.2.6.1. Setting an item's unique ID

This is the very first issue that is automatically handled by Arxx implementations to provide a very basic idea of "structure". Since users of Arxx implementations should not be dependent on arbitrary IDs to identify the items in an ARX archive, a facility has to be provided to set the unique ID to a certain value. This functionality should be supplied by the Item, since the unique ID is a property of the Item.

It is important to note that every Item has to have a unique ID. These IDs are in fact the only way to properly distinguish two Items in a library even if everything else is identical, because there can be no two Items with the same unique ID, and no Item belonging to an Archive may have an invalid unique ID.

Of course an Item that does not belong to an Archive can have any unique ID, as it is the only item in its context. Items outside of an Archive may also have the invalid unique ID (0xFFFFFFFF).

These conditions imply the following: an Archive has to guarantee that every Item it contains is assigned a unique ID. To conceive a consistent behavior pattern for this, let's look at changing unique IDs.

There are two situations in which the unique ID can change:

  • The Item does not belong to an Archive.

    In this case the process is quite easy, as every passed ID is unique in the Item's (empty) context. We can just set the unique ID and be done with it.

  • The Item is already member of an Archive.

    In this case only the Archive that the Item in question belongs to is able to decide on the uniqueness of the desired new ID. This implies, at the very least, that the Archive has to be asked to change the ID and to update its sorted container of Items. Currently this is done by unregistering the Item at the Archive, changing the unique ID to the desired value and re-registering the Item at the Archive, as sketched after this list.
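The following sketch shows both cases, continuing the hypothetical Item and Library classes from the sketch above; the Unregister method and the SetUniqueID declaration are assumed additions to those classes.

    void Library::Unregister(Item * OldItem)
    {
        m_Items.erase(OldItem->GetUniqueID());
        OldItem->m_Library = nullptr;
    }

    void Item::SetUniqueID(std::uint32_t NewUniqueID)
    {
        if(m_Library == nullptr)
        {
            // Case 1: the Item belongs to no Archive, so every ID is unique in
            // the Item's (empty) context and we can just set it.
            m_UniqueID = NewUniqueID;
        }
        else
        {
            // Case 2: only the Archive can decide on uniqueness, so unregister,
            // change the unique ID and re-register, which also updates the
            // sorted container. A real implementation would have to handle the
            // case where Register fails because the new ID already exists.
            Library * Owner = m_Library;

            Owner->Unregister(this);
            m_UniqueID = NewUniqueID;
            Owner->Register(this);
        }
    }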

4.1.3. Scenario 2

A single large local ARX archive as a flat id'ed resource storage.

4.1.3.1. Setting

In this scenario basically everything stays the same, but we try to cope with a library of a size that makes it impossible to load all the data into memory at once.

To cope with such libraries we decouple the points in time at which we load the library and the items' data. By doing so, ARX API implementations are extended by the ability to load data on demand.

4.1.3.2. Motivation

Imagine an ARX archive storing chunks of audio data. It is easy to see that such a library can become very large if you store thousands of pieces of music or speech, even if each piece is very small. What I am talking about are archives with a size of several gigabytes. It is easy to see that there is no reliable way to load such a vast archive in one go, because memory would be consumed at a horrible rate. However, you rarely need gigabytes of data at one time, so it might be reasonable to make the loading of libraries slightly more intelligent by skipping the data parts and only reading the item headers as meta information. I will refer to this idea as delayed data fetching.

4.1.3.3. Implications

This new functionality is basically meant to concern the programmer only and should not change the way ARX archives are built or how applications using ARX API implementations behave. However, there is one point that actually concerns the end user of programs or libraries employing ARX. Since an arbitrary amount of time passes between the loading of the library and the loading of an item's data from the same file, we have to think about what could happen to the file in the meantime.

Basically there is one problem that might occur when trying to access the ARX archive a second time: the data may not be at the location we require it to be. A changed file is even worse than a deleted or unreadable one, but the result is the same in that we don't get what we want.

As I understand it, UNIX systems provide a way to work around this by keeping the file open for reading. I don't know about other operating systems, so here is some room for discussion. From an operating system point of view we simply keep the file open, and the rest is implementation.

4.1.3.4. Implementation Implications

Fetching the data on demand is, adhering to the design principles, not transparent in usage, meaning that you have to do it by hand by calling an item's FetchData function.

Since this function may be called from anywhere in the surrounding application, FetchData does not take any parameters, as requiring them would enforce a specific way of software design around the ARX API.

If the application needs to touch every item's data in a library, you end up calling FetchData on every item and the memory savings would be lost completely. Therefore data can be unfetched with the Item's UnfetchData function, which in fact is only a fancy name for simply throwing the data away.

Using UnfetchData must not make the data inaccessible thereafter, as there may be a need to access the data again. Therefore refetching of data is possible. In conclusion this means that all the information required for fetching the data must remain intact over the lifetime of the Item.

In order to let the surrounding application decide whether to call FetchData or UnfetchData in a certain situation, another function has to be added that tells whether the data has already been fetched or not.
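A minimal sketch of this interface on the Item follows. The name IsFetched for the query function and the use of a plain byte vector for storage are assumptions of this sketch, not the actual Arxx interface.

    #include <cstdint>
    #include <vector>

    class Item
    {
    public:
        bool IsFetched() const // lets the application decide whether to fetch or unfetch
        {
            return m_DataFetched;
        }

        void FetchData(); // reads the data from the archive file; a possible
                          // implementation is sketched in the Implementation
                          // Details below

        void UnfetchData() // only a fancy name for throwing the data away
        {
            m_Data.clear();
            m_DataFetched = false;
            // The information needed for fetching the data again is kept, so
            // refetching remains possible over the Item's lifetime.
        }

        std::uint32_t GetDataLength() const
        {
            return m_DataLength;
        }
    private:
        bool m_DataFetched = false;
        std::uint32_t m_DataLength = 0;
        std::vector<char> m_Data;
    };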

Now there is another problem with keeping the file open: in order to correctly close the file again we need a structure which holds the necessary data to do so.

The first idea could be to let the Library handle this structure or keep the file open itself. Every Item asked to fetch its data could then propagate the task to its corresponding Library. But as noted in Scenario 1, the association between a Library and an Item should be regarded as logical only, not functional as it would be in this case. This becomes obvious if we extend our example by unregistering the Item from the Library: then the Item has no corresponding Library to pass the task of fetching the data to, and FetchData would have to fail.

The second idea is to copy the structure with the information necessary to access the data in the open file and pass it to the Item. However, the question then is: when is the file closed, and by whom? Obviously the last Item to access its data should close the file after it is done. However, that is not possible since this last Item does not know that it is indeed the last one. So what we need is a global reference counter for the file structure, and along the way we can put the file structure itself into the global structure so we don't have to copy it.

Since there is nothing left to be done for the library once it has finished the loading process, the Library does not have to reference the file structure.

So one set of questions remains: who creates, manages and deletes the global file structure? Obviously the structure has to be filled with meaningful information in the loading function of the Library and then passed to the global scope to enable global management.
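One way to sketch this in C++ is to use std::shared_ptr as the reference counter; this is an implementation choice of the sketch, not necessarily what an Arxx implementation does.

    #include <fstream>
    #include <memory>
    #include <string>

    struct FileStructure
    {
        std::ifstream Stream; // kept open as long as delayed data fetching may still happen
    };

    // The Library's loading function fills the structure and hands it out; every
    // Item that may still need to fetch its data keeps a shared reference, and
    // the file is closed automatically when the last reference disappears.
    std::shared_ptr<FileStructure> OpenArchiveFile(const std::string & Path)
    {
        auto File = std::make_shared<FileStructure>();

        File->Stream.open(Path, std::ios_base::in | std::ios_base::binary);

        return File;
    }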

4.1.3.5. Implementation Details

When reading the item data together with the item header we don't have to worry about the current file position, since we handle one item after the other and the file position is always right for the current item. However, with delayed data fetching we lose this self-enforcing context and have to preserve the file position from which to read the data.

Since an Item can be unregistered from the Library, only the LocalLibraryDataFetcher or the Item itself can store this additional information. However, since the LocalLibraryDataFetcher is meant to serve as a data fetcher for more than one particular Item, this piece of information has to be stored in the Item structure.
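Put together, a possible implementation of FetchData could look like the following sketch, reusing the FileStructure from above; the member names are assumptions made here, not the actual Arxx layout.

    #include <cstdint>
    #include <memory>
    #include <vector>

    void Item::FetchData()
    {
        // The sequential, self-enforcing file position is gone with delayed
        // data fetching, so we seek to the position remembered when the item
        // header was read.
        m_Data.resize(m_DataLength);
        m_File->Stream.seekg(m_DataOffset);
        m_File->Stream.read(m_Data.data(), m_DataLength);
        m_DataFetched = true;
    }

    // Members assumed on the Item for this sketch:
    //   std::shared_ptr<FileStructure> m_File; // shared, reference counted file structure
    //   std::streamoff m_DataOffset;           // where this Item's data starts in the file
    //   std::uint32_t m_DataLength;            // length of the data, taken from the item header
    //   std::vector<char> m_Data;
    //   bool m_DataFetched;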

4.1.4. Scenario 3

A single local ARX archive as a flat id'ed external local resource reference.

4.1.4.1. Setting

This example adds only one new aspect to ARX, but quite a big one: the actual data is not contained in the archive, hence the use of the word reference instead of storage. The ARX archive merely refers to the data by pointing to the locations where it can be found and in this way acts as a meta data accumulator, giving additional information about a set of data items.

4.1.4.2. Motivation

This usage of ARX is motivated by three ideas:

  • Using this method of external data you have the possibility to keep data like icons or help files out of the ARX archive, so that non-ARX applications can access them easily. An application's icon, which has to be in local file space to be usable by desktop managers, can also be used inside the application as a window icon. Another example are configuration files under /etc, which can be edited by hand but can also be imported into the item tree that is spanned by an ARX archive.

  • Secondly this helps packagers of ARX packed application data. If one chunk of data changed in the ARX archive you'd have to redistribute the whole archive instead of only the part that changed. Using this mechanism you only have to redistribute the one file that actually changed. Note, however, that at a later point of ARX development ARX archives can be merged and thus this point is not completely valid.

  • The third, and possibly most important, point is that this is mainly done in preparation for a later, much extended version of external data positioning.

4.1.4.3. Considerations

If you imagine a rather large application having all of its thousand icons in subfolders somewhere below "/usr/local/share/applicationname/icons/", the ARX archive would contain a thousand copies of this locator string, accumulating to at least 999 times 39 unnecessary bytes. It would be good to have a simplifying algorithm that packs all these duplicates into one occurrence and only saves the parts that actually differ. This could be done using an approach of nested locators. Saving the location "/usr/local/share/applicationname/icons/" once and giving it a certain ID (0x01), you could create additional locations which lie relatively below this location, like "stock/ok.png" and "cursors/arrow.png". However, taking for granted that "ok" is not the only stock icon, you would rather have one more phony location "stock/" (0x02), which is below location 0x01, and two locations "ok.png" and "cancel.png", which are below 0x02. This way you could build the whole set of external resource locators as a tree structure whose inner nodes are the greatest common nested locator strings and whose leaves are actual resource locators relative to the concatenation of all phony locations above, as sketched below.
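A minimal sketch of such a nested locator, assuming a structure with a parent reference and a hypothetical Resolve helper that concatenates the locator strings up to the root:

    #include <cstdint>
    #include <map>
    #include <string>

    const std::uint32_t NoParentLocator = 0xFFFFFFFF;

    struct Locator
    {
        std::uint32_t ID;
        std::uint32_t ParentID;    // NoParentLocator if this locator is a root node
        std::string LocatorString; // e.g. "/usr/local/share/applicationname/icons/" or "ok.png"
    };

    std::string Resolve(const std::map<std::uint32_t, Locator> & Locators, std::uint32_t ID)
    {
        std::string Path;

        // Walk towards the root, prepending each locator string on the way.
        while(ID != NoParentLocator)
        {
            const Locator & Current = Locators.at(ID);

            Path = Current.LocatorString + Path;
            ID = Current.ParentID;
        }

        return Path;
    }

With the example above, resolving the locator for "ok.png" would walk up via the phony location "stock/" (0x02) and the icons location (0x01) and yield "/usr/local/share/applicationname/icons/stock/ok.png".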

4.1.4.4. Implications

In a general way this means that an item's data source is to be held variable. Instead of having the data simply appended to the item it should also be possible to specify the location where the data can be retrieved from.

4.1.4.5. ARX archive file format details

The requirement of external resource locators enforces a more general format of the ARX archive. Archives are now divided into three general parts: the archive header, the item headers and the data.

  • The archive header remains unchanged. Its structure is defined in use scenario 1.

  • The next part contains all the items of the file, but only their meta information, that is, according to use scenario 1, the unique ID and the length of the data. Additionally it contains an offset into the data part of the ARX archive where the data can be read. The unsigned offset is relative to the start of the data section.

  • The data part is divided into three sections: the data section header, the external data locators and the internal data.

    • The data section header contains only the length of the external data locators section.

    • The section on external data locators consists of a list of locator structures which contain at least three pieces of information: a unique locator identifier, the locator string and the unique identifier of the parent locator, or 0xFFFFFFFF if the locator is the root node.

    • The section of internal data is just a concatenation of all the data that is saved inside the ARX archive.

    Since the length of the external data locators section is known explicitly, an item's data offset into the data section decides whether the data is to be read directly from the archive or from an external source which the resolved locator at this offset points to, as sketched below.
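One possible reading of this rule, as a sketch with assumed names: an offset that falls inside the external data locators section selects a locator (and hence an external source), while anything beyond that section points at internal data.

    #include <cstdint>

    // The external data locators come first in the data part; their total
    // length is stored in the data section header.
    bool IsExternalData(std::uint32_t DataOffset, std::uint32_t ExternalLocatorSectionLength)
    {
        return DataOffset < ExternalLocatorSectionLength;
    }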

4.1.4.6. Implementation Implications

First of all there is the necessity to set up an ARX archive in such a way. So what ARX API implementations need is a method to set the data location for an item, and this is obviously a function of the Item. It requires you to pass the locator string for the data.

It is only sensible then to also add a function for getting the data location string.

Additionally there is a need to be able to retrieve the data from the external source, and according to the Design Principles this has to be an explicit call on the Item.

So, in order to let the application choose whether to call the function that fetches the external data from the location, a function has to be added that tells whether the data is already fetched or not.

With respect to the considerations from scenario 1 there is a possible way to help the surrounding application and the user keep memory usage at a reasonable level. The idea is to unfetch data, which is another function of the Item.

This implies that the data locator string has to be kept at hand even if the data has already been fetched. For that matter, any structure or data concerning the external data fetching should be available throughout the entire lifetime of an Item.
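As a closing sketch, the external data location handling on the Item could look like the following; the method names SetDataLocation and GetDataLocation are placeholders mirroring the text, not the actual Arxx interface.

    #include <string>

    class Item
    {
    public:
        void SetDataLocation(const std::string & LocatorString)
        {
            m_LocatorString = LocatorString;
        }

        const std::string & GetDataLocation() const
        {
            return m_LocatorString;
        }
    private:
        // Kept for the Item's entire lifetime, even while the data is fetched,
        // so the external source can always be read (or re-read) again.
        std::string m_LocatorString;
    };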