Workflow for mod installation

Rough workflow

  1. simod init (run this only once): creates the database and fills it using data from the Infinity Engine files (key/bif and existing override files).
  2. simod add *target* (run this one for each mod component): installs a mod component by running the mod script to modify the database.
  3. simod save (run this once the database has been modified): compiles the changes from the database back to the game directory.

See simod --help for (slightly) more details.

Current status

This tool is at the “proof-of-concept” stage.

Since a lot of the infrastructure is still being built, its capabilities are very restricted: it can only access and edit game items.

API for modders

TODO.

Resource identifiers and namespacing

In-game resources are identified by “resrefs”, which are 8-byte ASCII strings. These resrefs are also used as file names in the override directory, which puts a number of additional constraints on allowed characters: namely, the characters "+<>/\|?*: are forbidden.

Inside the simod database however, resources are identified by arbitrary strings (hereafter designated as “longrefs”).

Namespacing

To protect against resource conflicts, mod components generally do not access resources globally.

Each mod component has an attached namespace, in the form of a string. Whenever a mod accesses a resource identifier R, it is translated to a longref in the following way, assuming that N is the current namespace:

  1. if the name R does not contain a slash character, then the namespace N is prepended: the longref is thus "N/R";
  2. if R contains a slash, then it is assumed to be a fully-qualified longref.
  3. as a special case, resources from the base game are accessible using the empty namespace, e.g. as "/sw1h01".

These rules allow referring to either a base game resource (case 3) or a resource from another mod (case 2), while still providing namespace separation by default (case 1).

Namespacing and storage

When stored inside the database, longrefs are stored as fully-qualified strings (in the form "namespace/identifier"). Original game resrefs are stored in their original form as 8-byte strings.

Implicit resref access

TODO: the Lua interface has the following feature: whenever a structure field is of the “resref” type, it is possible to assign a full structure to this field; this will result in the assignment being made with the reference to this structure instead.

For instance,

small_sword = simod.item("/sw1h01");
store.inventory:push(small_sword); -- only pushes the resref "/sw1h01"

Conversion to game format

The resource identifier strings are converted to the resref format when the database is saved to the filesystem.

More precisely, the resref_dict table contains a dictionary between string identifiers and resrefs. This table is filled by triggers: whenever a game value pointing to a resource is modified, the values ("long_resource_name", null) are inserted. As part of the save operation, the compiler takes care of replacing all null fields by game-unique resrefs as needed.

(These resrefs are deduced from the long resource name by truncating and enumerating).
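As a rough illustration of "truncating and enumerating", here is a minimal sketch. The exact algorithm used by the compiler is not described here, so everything below is an assumption: the function name make_resref, the `used` set, the choice of keeping only the part after the last slash, and the character filter (which is stricter than the forbidden list given earlier).

```lua
-- Hypothetical sketch, not the actual compiler code: derive a game-unique
-- resref (at most 8 bytes) from a longref. `used` is a set (table) of the
-- resrefs that are already taken.
function make_resref(longref, used)
  -- Keep the part after the last slash, lowercase it, and drop every
  -- character other than alphanumerics, '_' and '-' (assumed safe subset).
  local name = longref:match("[^/]*$"):lower():gsub("[^%w_%-]", "")
  -- Truncate to 8 bytes.
  local candidate = name:sub(1, 8)
  -- Enumerate: on collision, replace the tail by a counter until free.
  local n = 0
  while candidate == "" or used[candidate] do
    n = n + 1
    local suffix = tostring(n)
    candidate = name:sub(1, 8 - #suffix) .. suffix
  end
  used[candidate] = true
  return candidate
end

local used = {}
print(make_resref("mymod/longsword_of_doom", used))  -- longswor
print(make_resref("other/longsword_of_fire", used))  -- longswo1
```

In a real install the `used` set would presumably be pre-filled with the resrefs of the base game, so that generated names never shadow original resources.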

Internal representation of resrefs from the base game

Internally, resrefs from the base game are imported untouched as longrefs (no slash character is prepended; rule 3. above actually removes the slash character). This means that longrefs without a slash always designate base resources, while longrefs with a slash always designate mod-owned resources.
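The three translation rules, together with the storage convention for base game resources, can be sketched as a small function. This is only an illustration: the name to_longref is hypothetical and is not part of the simod API.

```lua
-- Hypothetical helper (not part of the simod API): translate a resource
-- identifier `r`, as written by a mod, into a longref, given the current
-- namespace `ns`.
function to_longref(ns, r)
  if r:sub(1, 1) == "/" then
    -- Rule 3: empty namespace, i.e. a base game resource.
    -- The leading slash is dropped, so the longref contains no slash.
    return r:sub(2)
  elseif r:find("/", 1, true) then
    -- Rule 2: already fully qualified ("othermod/resource").
    return r
  else
    -- Rule 1: prepend the current namespace.
    return ns .. "/" .. r
  end
end

print(to_longref("mymod", "sword"))           -- mymod/sword
print(to_longref("mymod", "othermod/sword"))  -- othermod/sword
print(to_longref("mymod", "/sw1h01"))         -- sw1h01
```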

String references

In-game strings are collected in one or two files, dialog.tlk and (depending on user language) dialogF.tlk, and referred to by 4-byte integers (“strrefs”) in game structures.

In the database, strings are identified by “native strings”, which are arbitrary strings. (The native string is usually the game string itself in the mod's native language).

Namespacing

Since game strings are not used as references, namespacing rules do not apply for native strings. The only rule is that strings from different namespaces will never be merged.

Conversion to strref

The strref_dict table is the dictionary between native strings and strrefs.

The new_strings view is a list of all the native strings introduced by currently installed mods, together with their game flags (TODO explain why the flags).

As a part of the save operation, the compiler rebuilds strref_dict from the list of strings present in new_strings. This procedure happens in two steps:

  1. entries absent from new_strings and entries with a too-high strref value are purged from the dictionary,
  2. then entries from new_strings are inserted in strref_dict, each one successively using the lowest available strref.

For step 1: let \(C\) be the number of constant strrefs (e.g. \(C=34000\) for a BG1EE install) and \(S\) be the number of entries in new_strings. Then the new strrefs need to be allocated in the interval \([0,C+S-1]\). This means that entries where \(\mathtt{strref} \geq C+S\) need to be purged from the strref_dict table.
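The two steps can be sketched on plain Lua tables. This is a simplified model, not the compiler's code: the function name, the in-memory representation of strref_dict and new_strings, and the assumption that new strrefs start at C and that new_strings has no duplicates are all illustrative.

```lua
-- Sketch of the two-step rebuild (illustrative names and data layout).
-- `dict` maps native strings to strrefs, `new_strings` is an array of
-- distinct native strings, `C` is the number of constant strrefs.
function rebuild_strref_dict(dict, new_strings, C)
  local S = #new_strings
  local wanted = {}
  for _, s in ipairs(new_strings) do wanted[s] = true end

  -- Step 1: purge entries absent from new_strings, or with strref >= C + S.
  local taken = {}
  for s, strref in pairs(dict) do
    if not wanted[s] or strref >= C + S then
      dict[s] = nil          -- clearing a field during traversal is allowed
    else
      taken[strref] = true
    end
  end

  -- Step 2: give each missing string the lowest available strref
  -- (assumed to start at C, just after the constant strrefs).
  local next_free = C
  for _, s in ipairs(new_strings) do
    if dict[s] == nil then
      while taken[next_free] do next_free = next_free + 1 end
      dict[s] = next_free
      taken[next_free] = true
    end
  end
  return dict
end

local d = rebuild_strref_dict({ a = 5, b = 9, c = 6 }, { "a", "b", "d" }, 5)
print(d.a, d.b, d.d, d.c)
```

With C = 5 and S = 3, the entry for "c" is purged because it is absent from new_strings and the entry for "b" because 9 >= C + S; "a" keeps its strref, and "b" and "d" then receive the two lowest free values.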

The string_keys view is used for sequential generation of strrefs.

Design goals

Obviously, there already exists a perfectly fine tool for IE modding. However, starting from scratch allows us to design with a number of useful properties in mind.

Robustness

Replacing the mod stack by a proper database

The database offers, at any given time, a coherent view of all currently installed mods. Uninstalling a single mod can be done by running a fairly simple set of SQL DELETE statements. In particular, this is very fast (quasi-constant time w.r.t. the number of mods installed) compared to WeiDU's stack model, where modifying a mod deep down the stack requires recomputing the whole stack, which has O(n) cost.

Easier conflict detection between mods

With full access to the database it becomes trivial to detect when two mods are trying to access the same resource.

(This is still TODO however; mostly, we need to settle on an interface for what to do in the case of a conflict).

Namespacing

Identifiers for game resources and strings are abstracted as strings and namespaced per mod component. This completely removes the need for using mod prefixes and fitting names in 8 bytes (minus the prefix). (On the other hand, the namespace model still allows access to original game resources, and even resources from other mods, when this is really needed).

Moreover, this also circumvents a number of “bad behaviours” by mod authors, such as fully overwriting a game file or using inconsistent case for file names.

Translations

This tool uses .po files for string translations. This format is easy to edit; a number of free and open source tools exist, and even a plain text editor will work in most cases. This format has also proven to be quite robust, e.g. when strings evolve between versions of software. (This is in contrast with WeiDU's .tra files, which are very brittle: a single missing translation when a mod is updated will crash the component).

TODO: the translation manager also contains a number of features making it easy to annotate syntactically ambiguous sentences (e.g. “Guard” may be either a verb or a noun in English; both cases have different translations in most languages).

Portability

The tool is mostly written in Rust, which takes great pains to be as portable as possible; and mod scripts written in Lua should be portable by construction.

In particular, it is a design goal to prevent mod authors from needing to run shell scripts or batch files (which is a nightmare from a maintenance POV).

Ease-of-use

Mod writing in SQL or Lua

This tool offers two levels of API for accessing the database. The first level is plain SQL given by the description of the database; for instance, UPDATE "items" SET "enchantment"=5 WHERE "itemref"='sw1h34' is a perfectly valid mod and will update Albruin's enchantment.

However, it is expected that most mods will use the higher level, represented as Lua scripts. Indeed, this tool offers the option to run a Lua script in an environment where a simplified API to the database is exposed. For example, the following code has the same effect as the SQL statement above:

albruin = simod.item("sw1h34")
albruin.enchantment = 5

Lua is an easy-to-use programming language (and definitely easier to handle for a beginner than WeiDU); moreover, it is already the language used in some parts of the games themselves.

The SQL interface also allows authors to write mods in any language containing a library for SQL access.

Mod manager

TODO: define a common API on the Lua side for describing a mod and its metadata (author, description, compatibility list), and write an interactive mod selection tool which uses this mod database.

Performance

Any single mod installation only accesses the SQLite database; thus all work is deferred to SQLite, which is quite fast. Access to the game files is done only once, when compiling the full mod database to the override directory. Moreover, this tool supports differential compilation: only those files which changed since the previous compilation will be regenerated.

Internals: Database structure

ALL STRINGS ARE UTF-8. No exceptions; we live in the 21st century.

The exposed interface for accessing game objects is through a number of views:

  • items, item_abilities, item_effects for game items,
  • (other views are TODO: we are building high rather than wide for now).

These views implement all the infrastructure necessary for inserting and updating game objects; in case more detail is needed, see the “Internals” section below.

This implies that game modding can be performed directly as SQL queries on a small number of tables with structure mirroring that of game files. An ad-hoc library is also being built to make this comfortable for mod authors (TODO).

Resource view

For each resource X:

  • X is the user-facing main view of all resources (original and modded);
  • load_X is the table of all original resources;
  • add_X is the table of all mod-inserted resources;
  • edit_X is the table of all mod changes on this resource;
  • save_X is the view used for saving game resources.

In general, mods should only interact with the main view X. The structure of all other views listed here is unstable.

The columns of the table X are the following:

  • the primary key is always called "id"; for the top-level resources, it is the resource identifier, while for sub-resources this is a numeric key;
  • for sub-resources only, a column called "parent", which refers to the primary key of the parent resource, followed by a column called "position", which is used as a sort key for collecting sub-resources;
  • then all “payload” fields as described in e.g. IESDP. All the fields describing sub-resources (offset, count etc.) are removed from this list, since sub-resources are described in their own tables.

For top-level resources, a few additional tables are used to mark their status in the database with respect to the override directory:

  • dirty_X is the table listing all resources which have been modified and which need to be recompiled to override;
  • orphan_X is the list of all resources which have been removed from the database, but not yet from the override directory.

A (large) number of triggers are attached to the main view X:

  • attempts at modifying X are propagated back to the appropriate table (either add_X or edit_X);
  • at the same time, modifying X records the resource as dirty in dirty_X;
  • deleting entries from X marks resources as orphan in orphan_X.

The load_X table always contains exactly the resources found in key/bif and pre-existing override files. This table is never touched again after it is built by simod init.

Game lists (IDS files)

TODO.

Game tables (2DA files)

TODO.

Scripts

TODO.

Binary resources

TODO:

Binary resources generally do not need concurrent access between various mods and are not handled by the database. Where a resref pointing to a binary resource is expected, the database uses instead a string referring to a file in the filesystem.

When the database is saved to the filesystem, the filename for these resources is translated to a resref using the general algorithm; this resref in turn gives the name of the override file under which the resource will be saved.

TODO

Special cases:

  • dialog,
  • IDS,
  • 2da,
  • script (de)compiling,

Translations

Language identifiers

Languages are represented by their 5-letter name (as in "en_US") or, for female variants, by their 6-letter name (as in "fr_FRF").

Inside the database, female variants are handled as distinct languages from male variants; however a number of rules exist

(The 5- or 6-letter language identifier is lossless from the game's "dialog.tlk" path; this simplifies backups etc.)
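A minimal sketch of how such an identifier could be split into its base language and its variant. The function name parse_language is hypothetical, and the sketch assumes that the female variant is always marked by a trailing "F", as in the "fr_FRF" example.

```lua
-- Hypothetical helper: split a language identifier into the 5-letter base
-- name and a flag telling whether it is the female variant.
-- Returns nil if the identifier has neither of the two expected shapes.
function parse_language(id)
  local base, suffix = id:match("^(%a%a_%a%a)(F?)$")
  if not base then return nil end
  return base, suffix == "F"
end

print(parse_language("en_US"))   -- en_US   false
print(parse_language("fr_FRF"))  -- fr_FR   true
```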

Translations

String translation is handled through the use of the translations_X table, where X is a 5- or 6-letter language identifier. This table contains data similar to the game's .tlk file, except that it is indexed by native strings instead of strrefs.

Any native string absent from this dictionary is left untranslated; only its markers (see below) are discarded. This is intended as a sane default value where the player, when a string has not (yet) been translated, will see the string in the original language, which is most often English.

Markers

Native strings can contain markers of the following form: {?text}. Such markers are discarded when a native string is used as the default value for a game string in the absence of an appropriate translation.

These markers, however, are seen by translators when translating the native string to another game language; most tools will even highlight them (these look like Python formatting parameters). It is thus strongly recommended to include markers in native strings to signal any possible grammatical ambiguity including at least the following cases:

  • distinguishing verbs from nouns etc.: "{?verb}guard" vs. "{?noun}guard";
  • marking grammatical gender where it is not obvious: e.g. "come here, my dear {?male}friend".

Keep in mind that translators, when translating a string, will generally not have access to context beyond the string itself. Make liberal use of markers to help them produce quality work.
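Discarding markers when a native string is used untranslated amounts to a single pattern substitution. The sketch below is an illustration only; strip_markers is a hypothetical name, not a function of the simod API.

```lua
-- Hypothetical helper: remove every {?text} marker from a native string.
-- In Lua patterns '?' is magic and must be escaped as '%?'; braces are not.
function strip_markers(native)
  -- The parentheses keep only the first result of gsub (the new string).
  return (native:gsub("{%?[^}]*}", ""))
end

print(strip_markers("{?verb}guard"))                      -- guard
print(strip_markers("come here, my dear {?male}friend"))  -- come here, my dear friend
```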

Gendered languages

When producing translations for gendered languages, the general case is that a single translation will be produced for both gender variants.

Gender markers

The empty marker {?} is a special case. Including this marker in a native string marks this string as needing two translations for gendered languages. This means that, while a single native string is present in the source file, the translator will be prompted to translate two strings, where the empty marker will be replaced by either {?M} or {?F}.

This marker has no special meaning for non-gendered languages; in this case the translator will see only a single string, still bearing its empty marker.
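The expansion of the empty marker can be sketched as follows. The function name and the `gendered` flag are assumptions made for the example; how the translation manager actually decides that a language is gendered is not specified here.

```lua
-- Hypothetical helper: list the strings a translator is prompted with.
-- `gendered` tells whether the target language is a gendered language.
function gender_variants(native, gendered)
  -- Plain (non-pattern) search for the empty marker.
  if gendered and native:find("{?}", 1, true) then
    return {
      (native:gsub("{%?}", "{?M}")),
      (native:gsub("{%?}", "{?F}")),
    }
  end
  -- Otherwise a single string, still bearing its markers.
  return { native }
end

for _, s in ipairs(gender_variants("Welcome, {?}traveller", true)) do
  print(s)
end
-- Welcome, {?M}traveller
-- Welcome, {?F}traveller
```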

Conversion to strref

Conversion between native strings and strrefs is performed by the strref_dict table in a way similar to the resref_dict table. The algorithm for generating new strrefs is of course different (the lowest available value is used).

Core functions in the Lua API

The simod library contains the lowest-level API exported to Lua mod scripts. The remainder of the API is built on top of these functions.

Most of the functions in this library take, as their first parameter, a string, table, containing the name of one of the game's resource tables ("items", "item_abilities" etc.). String matching is case-sensitive.

simod.list(table, [parent]): list primary keys in a table

If table describes a top-level resource (e.g. "items") then the [parent] value is not allowed. This will return an array containing the list of all primary keys appearing in this table.

If table describes a sub-resource (e.g. "item_abilities") then [parent] is a primary key for the table's parent; this will return a list of all sub-resources whose parent attribute matches this primary key.

If no rows match the query, then an empty table is returned.

simod.insert(table, row): insert a new game object

row is a table containing the row to be inserted, as key-value pairs; the keys are strings matching the column headers for this table (with the same case).

This returns the number of lines inserted. If the row does not match the format for this table then an error is thrown.

simod.get(table, fieldname, primary): read a single field entry

This returns the value of column fieldname on the line where the primary key is primary.

If no such line (or column) exists then an error is thrown.

simod.select(table, primary): read one row in a table

This returns the content of the row with primary key primary in the table, as a Lua table whose keys are strings corresponding to the table's column names.

If no row with the given primary key exists, then an error is thrown.

simod.set(table, fieldname, primary, value): modify an entry in a table.

This updates the row with primary key primary in the given table, setting the field with column name fieldname to the given value. The fieldname must be a string corresponding to one of the table's column names (otherwise an error is thrown).

This returns a boolean value, which is true if a row was updated and false if no row with the given primary key was found.

simod.delete(table, primary): delete an entry in a table.

This deletes the row with the given primary key from the table.

This returns a boolean value, which is true if a row was deleted, and false otherwise.

Note that rows from the base game (those present in the load_* tables) currently cannot be deleted, since the data they represent is stored in the BIF files and not in the override directory.
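The return conventions of get, set and delete can be tried out without the tool by using a small in-memory stand-in. The `simod` table defined below is an assumption made for the example: it only mimics the documented return values and errors on plain Lua tables, whereas the real library works on the database (and refuses to delete base game rows). The "mymod/sword" item is also made up.

```lua
-- In-memory stand-in for the simod library (illustration only).
data = { items = { ["mymod/sword"] = { enchantment = 3 } } }
simod = {}

function simod.get(tbl, field, key)
  local row = data[tbl] and data[tbl][key]
  if row == nil or row[field] == nil then
    error("no such line or column")
  end
  return row[field]
end

function simod.set(tbl, field, key, value)
  local row = data[tbl] and data[tbl][key]
  if row == nil then return false end           -- no row: false, no error
  if row[field] == nil then error("unknown column " .. field) end
  row[field] = value
  return true
end

function simod.delete(tbl, key)
  if data[tbl] == nil or data[tbl][key] == nil then return false end
  data[tbl][key] = nil
  return true
end

-- set and delete report success as a boolean...
print(simod.set("items", "enchantment", "mymod/sword", 5))   -- true
print(simod.set("items", "enchantment", "mymod/nosuch", 5))  -- false
-- ...whereas get throws, so a missing row is caught with pcall.
local ok = pcall(simod.get, "items", "enchantment", "mymod/nosuch")
print(ok)                                                    -- false
print(simod.delete("items", "mymod/sword"))                  -- true
print(simod.delete("items", "mymod/sword"))                  -- false
```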

simod.schema

This contains the description for the format of the game resource tables. For example, the entry simod.schema.items contains the description of items, as the following fields:

  • simod.schema.items.fields: list of fields and types, as key-value pairs;
  • simod.schema.items.is_subresource: nil since this is not a subresource (otherwise true).

Representation of objects for the Lua API

Resources

Resources are stored as a table {_table, _key}, where _table is a string containing the name of the SQL table to which the resource is attached, and _key contains the primary key for the row mapped to the resource.

Access to the contents of the resource via resource.field is overloaded to a function which returns the value currently found in the database. Thus, the value of resource.field will remain up-to-date even if a SQL operation modified the contents of the database.
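This kind of overloading is what Lua metatables provide through the __index metamethod. The sketch below shows the general technique only, under stated assumptions: `db` and `get` are in-memory stand-ins for the database and for simod.get, and the names resource and resource_mt are made up for the example.

```lua
-- In-memory stand-in for the database (assumption, for illustration only).
db = { items = { sw1h34 = { enchantment = 3 } } }

-- Stand-in for simod.get(table, fieldname, primary).
function get(tbl, field, key)
  local row = db[tbl] and db[tbl][key]
  if row == nil or row[field] == nil then
    error("no such row or column: " .. tbl .. "." .. field)
  end
  return row[field]
end

resource_mt = {
  -- __index is called for every field that is not stored in the proxy
  -- itself, so each read of resource.field queries the database again.
  __index = function(self, field)
    return get(rawget(self, "_table"), field, rawget(self, "_key"))
  end,
}

-- The proxy only stores the table name and the primary key.
function resource(tbl, key)
  return setmetatable({ _table = tbl, _key = key }, resource_mt)
end

albruin = resource("items", "sw1h34")
print(albruin.enchantment)        -- 3
db.items.sw1h34.enchantment = 5   -- change made behind the proxy's back
print(albruin.enchantment)        -- 5
```

Because the proxy never caches field values, it cannot go stale; the price is one lookup per field access.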

Resource vectors

Resource vectors (e.g. the list of abilities of an item) are stored as a table {_table, _parent}, where _table is a string containing the name of the SQL table containing this kind of resource (e.g. "item_abilities") and _parent is the primary key of the parent resource.

Accessing the i-th entry of this resource vector triggers a SQL request which retrieves the corresponding primary key.
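Indexed access can be sketched with the same metatable technique. Here `abilities` and `list` are in-memory stand-ins for the database and for simod.list, and the numeric keys 101 and 102 are made up; the real implementation issues a SQL request instead.

```lua
-- Stand-in data: primary keys of the sub-resources of each parent, sorted
-- by position (assumption, for illustration only).
abilities = { sw1h34 = { 101, 102 } }

-- Stand-in for simod.list(table, parent).
function list(tbl, parent)
  return abilities[parent] or {}
end

vector_mt = {
  -- vector[i] looks up the primary key of the i-th sub-resource each time.
  __index = function(self, i)
    local keys = list(rawget(self, "_table"), rawget(self, "_parent"))
    return keys[i]
  end,
}

function resource_vector(tbl, parent)
  return setmetatable({ _table = tbl, _parent = parent }, vector_mt)
end

v = resource_vector("item_abilities", "sw1h34")
print(v[1])  -- 101
print(v[2])  -- 102
print(v[3])  -- nil (no third ability)
```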