Workflow for mod installation
Rough workflow
- simod init (run this only once): creates the database and fills it using data from the Infinity Engine files (key/bif and existing override files).
- simod add *target* (run this once for each mod component): installs a mod component by running the mod script to modify the database.
- simod save (run this once the database has been modified): compiles the changes from the database back to the game directory.
See simod --help for (slightly) more details.
Current status
This tool is at the “proof-of-concept” stage.
Since a lot of the infrastructure is still being built, its capabilities are very restricted: it can only access and edit game items.
API for modders
TODO.
Resource identifiers and namespacing
In-game resources are identified by “resrefs”, which are 8-byte ASCII
strings.
These resrefs are also used as file names in the override directory,
which puts a number of additional constraints on allowed characters:
namely, the characters "+<>/\|?*:
are forbidden.
Inside the simod
database however, resources are identified by
arbitrary strings (hereafter designated as “longrefs”).
Namespacing
To protect from resource conflicts, mod components generally do not access them globally.
To each mod component is attached a namespace, in the form of a string.
Whenever a mod accesses a resource identifier R, it is translated to a longref in the following way, assuming that N is the current namespace:
- if the name R does not contain a slash character, then the namespace N is prepended: the longref is thus "N/R";
- if R contains a slash, then it is assumed to be a fully-qualified longref;
- as a special case, resources from the base game are accessible using the empty namespace, e.g. as "/sw1h01".
These rules allow referring to either a base game resource (case 3) or a resource from another mod (case 2), while still providing namespace separation by default (case 1).
Namespacing and storage
When stored inside the database, longrefs are stored as fully-qualified strings (in the form "namespace/identifier").
Original game resrefs are stored in their original form
as 8-byte strings.
Implicit resref access
TODO: the Lua interface has the following feature: whenever a structure field is of the “resref” type, it is possible to assign a full structure to this field; this will result in the assignment being made with the reference to this structure instead.
For instance,
small_sword = simod.item("/sw1h01");
store.inventory:push(small_sword); -- only pushes the resref "/sw1h01"
Conversion to game format
The resource identifier strings are converted to the resref format when the database is saved to the filesystem.
More precisely, the resref_dict
table contains a dictionary
between string identifiers and resrefs.
This table is filled by triggers: whenever a game value
pointing to a resource is modified,
the values ("long_resource_name", null)
are inserted.
As part of the save operation, the compiler replaces all null fields with game-unique resrefs as needed.
(These resrefs are deduced from the long resource name by truncating and enumerating.)
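The fill-in step could look roughly like the following Python sketch. It is an illustration rather than the tool's actual code: the column names "key" and "resref", the helper names candidates and assign_resrefs, and the exact truncation and numbering scheme are all assumptions, since the text above only says that resrefs are obtained by truncating and enumerating.

```python
import sqlite3

FORBIDDEN = set('"+<>/\\|?*:')  # characters not allowed in override file names

def candidates(longref):
    """Yield resref candidates of at most 8 bytes (hypothetical scheme)."""
    # Keep only the identifier part and drop characters that cannot be used.
    name = longref.rsplit("/", 1)[-1].lower()
    name = "".join(c for c in name if c.isascii() and c not in FORBIDDEN)
    yield name[:8]                      # plain truncation first
    n = 0
    while True:                         # then truncation plus a counter
        suffix = str(n)
        yield name[: 8 - len(suffix)] + suffix
        n += 1

def assign_resrefs(conn, base_resrefs=()):
    """Replace every NULL resref in resref_dict with a unique value."""
    used = {r for (r,) in conn.execute(
        'SELECT "resref" FROM resref_dict WHERE "resref" IS NOT NULL')}
    used |= set(base_resrefs)
    pending = [k for (k,) in conn.execute(
        'SELECT "key" FROM resref_dict WHERE "resref" IS NULL ORDER BY "key"')]
    for key in pending:
        resref = next(c for c in candidates(key) if c and c not in used)
        used.add(resref)
        conn.execute('UPDATE resref_dict SET "resref"=? WHERE "key"=?',
                     (resref, key))

# Assumed table layout: one row per longref, resref NULL until save time.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE resref_dict ("key" TEXT PRIMARY KEY, "resref" TEXT)')
conn.executemany('INSERT INTO resref_dict VALUES (?, NULL)',
                 [("mod_a/flaming_sword",), ("mod_b/flaming_sword",)])
assign_resrefs(conn)
mapping = dict(conn.execute('SELECT "key", "resref" FROM resref_dict'))
```

Two mods using the same long name thus end up with distinct 8-byte resrefs, which is the property the save operation needs.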
Internal representation of resrefs from the base game
Internally, resrefs from the base game are imported untouched as longrefs (no slash character is prepended; rule 3. above actually removes the slash character). This means that longrefs without a slash always designate base resources, while longrefs with a slash always designate mod-owned resources.
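As an illustration, the three namespacing rules together with this storage convention can be written as one small function. The sketch is in Python for brevity; the name to_longref is hypothetical and not part of the tool.

```python
def to_longref(name, namespace):
    """Translate a resource identifier used by a mod into a stored longref."""
    if "/" not in name:
        # Rule 1: unqualified names live in the current namespace.
        return f"{namespace}/{name}"
    if name.startswith("/"):
        # Rule 3: empty namespace means base game; stored without the slash.
        return name[1:]
    # Rule 2: the name is already a fully-qualified longref.
    return name
```

For a component whose namespace is "mymod", "sword" becomes "mymod/sword", "othermod/sword" is kept unchanged, and "/sw1h01" becomes the base-game longref "sw1h01".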
String references
In-game strings are collected in one or two files, dialog.tlk and (depending on user language) dialogF.tlk, and referred to by 4-byte integers (“strrefs”) in game structures.
In the database, strings are identified by “native strings”, which are arbitrary strings. (The native string is usually the game string itself in the mod's native language).
Namespacing
Since game strings are not used as references, namespacing rules do not apply for native strings. The only rule is that strings from different namespaces will never be merged.
Conversion to strref
The strref_dict
table is the dictionary
between native strings and strrefs.
The new_strings
view is a list of all the native strings
introduced by currently installed mods,
together with their game flags (TODO explain why the flags).
As a part of the save
operation,
the compiler rebuilds strref_dict
from the list of strings present in new_strings
.
This procedure happens in two steps:
- entries absent from new_strings and entries with a too-high strref value are purged from the dictionary,
- then entries from new_strings are inserted in strref_dict, each one successively using the lowest available strref.
For step 1:
Let \(C\) be the number of constant strrefs (e.g. \(C=34000\) for a BG1EE install) and \(S\) be the number of entries in new_strings.
Then the new strrefs need to be allocated in the interval \([0,C+S-1]\).
This means that entries where \(\mathtt{strref} \geq C+S\) need to be purged from the strref_dict table.
The string_keys
view is used for sequential generation of strrefs.
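The two-step rebuild can be sketched in plain Python as follows. This is only an illustration: the function name rebuild_strref_dict is hypothetical, dictionaries stand in for the SQL tables, and the sketch assumes that the base game occupies the strrefs below C, so that new strings are numbered from C upwards.

```python
def rebuild_strref_dict(strref_dict, new_strings, constant_count):
    """Return an updated {native string: strref} mapping.

    strref_dict    -- existing mapping, possibly stale
    new_strings    -- native strings required by the installed mods, in order
    constant_count -- C, the number of constant strrefs
    """
    limit = constant_count + len(new_strings)  # C + S
    wanted = set(new_strings)
    # Step 1: purge entries that are no longer needed or that lie too high.
    kept = {s: r for s, r in strref_dict.items()
            if s in wanted and r < limit}
    # Step 2: give each missing string the lowest strref still available.
    used = set(kept.values())
    candidate = constant_count  # assumption: [0, C) belongs to the base game
    for s in new_strings:
        if s in kept:
            continue
        while candidate in used:
            candidate += 1
        kept[s] = candidate
        used.add(candidate)
    return kept

# Tiny example with C = 5: "gone" is no longer needed and "b" lies too high.
result = rebuild_strref_dict({"a": 5, "b": 9, "gone": 6}, ["a", "b", "c"], 5)
```

In this example "a" keeps its strref, while "b" and "c" receive the two lowest free values, so every strref ends up below C+S.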
Design goals
Obviously, there already exists a perfectly fine tool for IE modding. However, starting from scratch allows us to design with a number of useful properties in mind.
Robustness
Replacing the mod stack by a proper database
The database offers at any given time a coherent view of all
currently-installed mods. Uninstalling a single mod can be done by
running a somewhat simple (set of) SQL DELETE
statement(s).
In particular, this is very fast (quasi-constant time w.r.t. the number of
mods installed) compared to WeiDU's stack model (where modifying a mod
deep down the stack requires recomputing the whole stack, which has O(n)
cost).
Easier conflict detection between mods
With full access to the database it becomes trivial to detect when two mods are trying to access the same resource.
(This is still TODO however; mostly, we need to settle on an interface for what to do in the case of conflict).
Namespacing
Identifiers for game resources and strings are abstracted as strings and namespaced per mod component. This completely removes the need for using mod prefixes and fitting names in 8 bytes (minus the prefix). (On the other hand, the namespace model still allows access to original game resources, and even resources from other mods, when this is really needed).
Moreover, this also circumvents a number of “bad behaviours” by mod authors, such as fully overwriting a game file or using inconsistent case for file names.
Translations
This tool uses .po files for string translations.
This format is easy to edit; a number of free and open source tools
exist, and even a plain text editor will work in most cases.
This format has also proven to be quite robust e.g. when strings evolve
between versions of software.
(This is in contrast with WeiDU's .tra files, which are very brittle:
a single missing translation when a mod is updated will crash the
component).
TODO: the translation manager also contains a number of features making it easy to annotate syntactically ambiguous sentences (e.g. “Guard” may be either a verb or a noun in English; both cases have different translations in most languages).
Portability
The tool is mostly written in Rust, which takes great pains to be as portable as possible; and mod scripts written in Lua should be portable by construction.
In particular, it is a design goal to prevent mod authors from needing to run shell scripts or batch files (which is a nightmare from a maintenance POV).
Ease-of-use
Mod writing in SQL or Lua
This tool offers two levels of API for accessing the database.
The first level is plain SQL given by the description of the database; for instance, UPDATE "items" SET "enchantment"=5 WHERE "itemref"='sw1h34' is a perfectly valid mod and will update Albruin's enchantment.
However, it is expected that most mods will use the higher level, represented as Lua scripts. Indeed, this tool offers the option to run a Lua script in an environment where a simplified API to the database is exposed. For example, the following code has the same effect as the SQL statement above:
albruin = simod.item("sw1h34")
albruin.enchantment = 5
Lua is an easy-to-use programming language (and definitely easier to handle for a beginner than WeiDU); moreover, it is already the language used in some parts of the games themselves.
The SQL interface also allows authors to write mods in any language containing a library for SQL access.
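As an illustration of access from another language, here is how the same change might be made from Python with the standard sqlite3 module. The two-column in-memory table is only a stand-in so that the sketch runs on its own; a real mod would instead open the database created by simod init, whose location is not specified here.

```python
import sqlite3

# Stand-in for the simod database: an in-memory table with just the two
# columns used by the statement quoted above (assumed layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "items" ("itemref" TEXT PRIMARY KEY, "enchantment" INTEGER)')
# Arbitrary starting value, only here so that the update is visible.
conn.execute('INSERT INTO "items" VALUES (?, ?)', ("sw1h34", 3))

# The "mod" itself: the same UPDATE as above, with bound parameters.
with conn:
    conn.execute('UPDATE "items" SET "enchantment"=? WHERE "itemref"=?',
                 (5, "sw1h34"))

enchantment = conn.execute(
    'SELECT "enchantment" FROM "items" WHERE "itemref"=?', ("sw1h34",)
).fetchone()[0]
```

Using bound parameters rather than string concatenation keeps the statement safe even when identifiers come from user input.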
Mod manager
TODO: define a common API from the Lua side for describing a mod + metadata (author, description, compatibility list) and write an interactive mod selection tool which uses this mod database.
Performance
Any single mod installation only accesses the SQLite database; thus all work is deferred to SQLite, which is quite fast. Access to the game files is done only once when compiling the full mod database to the override directory. Moreover, this tool supports differential compilation: only the files that changed since the previous compilation are regenerated.
Internals: Database structure
ALL STRINGS ARE UTF-8. No exceptions; we live in the 21st century.
The exposed interface for accessing game objects is through a number of views:
- items, item_abilities, item_effects for game items,
- (other views are TODO: we are building high rather than wide for now).
These views implement all the infrastructure necessary for inserting and updating game objects; in case more detail is needed, see the “Internals” section below.
This implies that game modding can be performed directly as SQL queries on a small number of tables with structure mirroring that of game files. An ad-hoc library is also being built to make this comfortable for mod authors (TODO).
Resource view
For each resource X:
- X is the user-facing view of all resources (original and modded);
- load_X is the table of all original resources;
- add_X is the table of all mod-inserted resources;
- edit_X is the table of all mod changes on this resource;
- save_X is the view used for saving game resources.
In general, mods should only interact with the main view X. The structure of all other views listed here is unstable.
The columns of the table X are the following:
- the primary key is always called "id"; for the top-level resources, it is the resource identifier, while for sub-resources this is a numeric key;
- for sub-resources only, a column called "parent", which refers to the primary key of the parent resource, followed by a column called "position", which is used as a sort key for collecting sub-resources;
- then all “payload” fields as described in e.g. IESDP. All the fields describing sub-resources (offset, count etc.) are removed from this list, since sub-resources are described in their own tables.
For top-level resources, a few additional tables are used to mark their status in the database with respect to the override directory:
- dirty_X is the table listing all resources which have been modified and which need to be recompiled to override;
- orphan_X is the list of all resources which have been removed from the database, but not yet from the override directory.
A (large) number of triggers are attached to the main view X:
- attempts at modifying X are propagated back to the appropriate table (either add_X or edit_X);
- at the same time, modifying X records the resource as dirty in dirty_X;
- deleting entries from X marks resources as orphan in orphan_X.
The load_X table always contains exactly the resources found in key/bif and pre-existing override files.
This table is never touched again after it is built by simod init.
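The interplay between the main view and its backing tables can be illustrated with SQLite's INSTEAD OF triggers. The following is a deliberately reduced sketch, not the actual schema: it uses a single payload column, assumes that edit_items stores one overriding row per original item, and assumes that edits to mod-inserted rows are applied directly in add_items.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE load_items   (id TEXT PRIMARY KEY, enchantment INTEGER);
CREATE TABLE add_items    (id TEXT PRIMARY KEY, enchantment INTEGER);
CREATE TABLE edit_items   (id TEXT PRIMARY KEY, enchantment INTEGER);
CREATE TABLE dirty_items  (id TEXT PRIMARY KEY);
CREATE TABLE orphan_items (id TEXT PRIMARY KEY);

-- Main view: original rows with edits applied, plus mod-inserted rows.
CREATE VIEW items AS
    SELECT l.id AS id, COALESCE(e.enchantment, l.enchantment) AS enchantment
    FROM load_items AS l LEFT JOIN edit_items AS e ON e.id = l.id
    UNION ALL
    SELECT id, enchantment FROM add_items;

CREATE TRIGGER items_insert INSTEAD OF INSERT ON items
BEGIN
    INSERT INTO add_items (id, enchantment) VALUES (NEW.id, NEW.enchantment);
    INSERT OR IGNORE INTO dirty_items (id) VALUES (NEW.id);
END;

CREATE TRIGGER items_update INSTEAD OF UPDATE ON items
BEGIN
    -- Mod-inserted rows are changed in place ...
    UPDATE add_items SET enchantment = NEW.enchantment WHERE id = OLD.id;
    -- ... while changes to original rows are recorded separately.
    INSERT OR REPLACE INTO edit_items (id, enchantment)
        SELECT OLD.id, NEW.enchantment
        WHERE EXISTS (SELECT 1 FROM load_items WHERE id = OLD.id);
    INSERT OR IGNORE INTO dirty_items (id) VALUES (OLD.id);
END;

CREATE TRIGGER items_delete INSTEAD OF DELETE ON items
BEGIN
    DELETE FROM add_items WHERE id = OLD.id;
    INSERT OR IGNORE INTO orphan_items (id) VALUES (OLD.id);
END;
""")

# One original item (as loaded at init time), then three mod operations
# that only ever touch the main view.
conn.execute("INSERT INTO load_items VALUES ('sw1h34', 3)")
conn.execute("UPDATE items SET enchantment = 5 WHERE id = 'sw1h34'")
conn.execute("INSERT INTO items (id, enchantment) VALUES ('mymod/blade', 2)")
conn.execute("DELETE FROM items WHERE id = 'mymod/blade'")

def rows(table):
    """Return the full contents of a table or view, sorted by id."""
    return conn.execute(f"SELECT * FROM {table} ORDER BY id").fetchall()
```

After these statements load_items is unchanged, the new enchantment lives in edit_items, and both dirty_items and orphan_items record which override files the next save has to regenerate or remove.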
Game lists (IDS
files)
TODO.
Game tables (2DA
files)
TODO.
Scripts
TODO.
Binary resources
TODO:
Binary resources generally do not need concurrent access between various mods and are not handled by the database. Where a resref pointing to a binary resource is expected, the database uses instead a string referring to a file in the filesystem.
When the database is saved to the filesystem, the filename for these resources is translated to a resref using the general algorithm; this resref in turn gives the name of the override file under which the resource will be saved.
TODO
Special cases:
- dialog,
- IDs,
- 2da,
- script (de)compiling,
Translations
Language identifiers
Languages are represented by their 5-letter name (as in "en_US") or, for female variants, by their 6-letter name (as in "fr_FRF").
Inside the database, female variants are handled as distinct languages from male variants; however a number of rules exist
(The 5- or 6-letter language identifier is lossless from the game's
"dialog.tlk"
path; this simplifies backups etc.)
Translations
String translation is handled through the use of the translations_X table, where X is a 5- or 6-letter language identifier.
This table contains data similar to the game's .tlk file, except that it is indexed by native strings instead of strrefs.
Any native string absent from this dictionary is left untranslated; only its markers (see below) are discarded. This is intended as a sane default value where the player, when a string has not (yet) been translated, will see the string in the original language, which is most often English.
Markers
Native strings can contain markers of the following form: {?text}.
Such markers are discarded when a native string is used as the default value for a game string in the absence of an appropriate translation.
These markers, however, are seen by translators when translating the native string to another game language; most tools will even highlight them (these look like Python formatting parameters). It is thus strongly recommended to include markers in native strings to signal any possible grammatical ambiguity, including at least the following cases:
- distinguishing verbs from nouns etc.: "{?verb}guard" vs. "{?noun}guard";
- marking grammatical gender where it is not obvious: e.g. "come here, my dear {?male}friend".
Keep in mind that translators, when translating a string, will generally not have access to context beyond the string itself. Make liberal use of markers to help them produce quality work.
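Discarding markers for an untranslated string amounts to a simple textual substitution. A minimal Python sketch (the function name strip_markers and the exact pattern are assumptions; the tool's own rules may differ, e.g. regarding nested braces):

```python
import re

# A marker is "{?" followed by any text without braces, then "}".
MARKER = re.compile(r"\{\?[^{}]*\}")

def strip_markers(native):
    """Remove every {?text} marker, as done for untranslated strings."""
    return MARKER.sub("", native)
```

With this definition, "{?verb}guard" and "{?noun}guard" both fall back to "guard" when no translation is available.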
Gendered languages
When producing translations for gendered languages, the general case is that a single translation will be produced for both gender variants.
Gender markers
The empty marker {?} is a special case.
Including this marker in a native string marks this string as needing two translations for gendered languages.
This means that, while a single native string is present in the source file, the translator will be prompted to translate two strings, where the empty marker will be replaced by either {?M} or {?F}.
This marker has no special meaning for non-gendered languages; in this case the translator will see only a single string, still bearing its empty marker.
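The behaviour of the empty marker can be summarised by the following Python sketch. The function name translation_prompts and the boolean flag are hypothetical; how the tool decides that a language is gendered is not covered here.

```python
def translation_prompts(native, gendered_language):
    """Return the string(s) a translator is asked to translate."""
    if gendered_language and "{?}" in native:
        # One native string, two prompts: male and female variants.
        return [native.replace("{?}", "{?M}"), native.replace("{?}", "{?F}")]
    # Non-gendered language, or no empty marker: a single prompt, unchanged.
    return [native]
```

For a gendered language, "{?}Welcome back" thus yields the two prompts "{?M}Welcome back" and "{?F}Welcome back", while for a non-gendered language the translator sees the single string with its empty marker.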
Conversion to strref
Conversion between native strings and strrefs is performed by the strref_dict table in a way similar to the resref_dict table.
The algorithm for generating new strrefs is of course different (the lowest available value is used).
Core functions in the Lua API
The simod
library contains the lowest-level API exported to Lua mod
scripts. The remainder of the API is built on top of these functions.
Most of the functions in this library take, as their first parameter,
a string table
containing the name of one of the game's resource tables
("items"
, "item_abilities"
etc.).
String matching is case-sensitive.
simod.list(table, [parent]): list primary keys in a table
If table
describes a top-level resource (e.g. "items"
) then
the [parent]
value is not allowed.
This will return an array containing the list of all primary keys
appearing in this table.
If table
describes a sub-resource (e.g. "item_abilities"
) then
[parent]
is a primary key for the table's parent;
this will return a list of all sub-resources whose parent attribute
matches this primary key.
If no rows match the query, then an empty table is returned.
simod.insert(table, row): insert a new game object
row
is a table containing the row to be inserted, as key-value
pairs; the keys are strings matching the column headers for this table
(with the same case).
This returns the number of lines inserted. If the row does not match the format for this table then an error is thrown.
simod.get(table, fieldname, primary): read a single field entry
This returns the value of column fieldname
on the line where the primary key is primary
.
If no such line (or column) exists then an error is thrown.
simod.select(table, primary): read one row in a table
This returns the content of the row with primary key primary
in the table, as a Lua table whose keys are strings
corresponding to the table's column names.
If no row with the given primary key exists, then an error is thrown.
simod.set(table, fieldname, primary, value): modify an entry in a table
This updates the row with primary key primary
in the given table,
setting the field with column name fieldname
to the given value
.
The fieldname
must be a string corresponding to one of the table's
column names (otherwise an error is thrown).
This returns a boolean value, which is true
if a row was updated
and false
if no row with the given primary key was found.
simod.delete(table, primary): delete an entry in a table
This deletes the row with the given primary key from the table.
This returns a boolean value, which is true
if a row was deleted,
and false
otherwise.
Note that rows from the base game (those present in the load_*
tables)
currently cannot be deleted,
since the data they represent is stored in the BIF files
and not in the override directory.
simod.schema
This contains the description for the format of the game resource tables.
For example, the entry simod.schema.items
contains the description of
items, as the following fields:
simod.schema.items.fields
: list of fields and types, as key-pair values;simod.schema.items.is_subresource
:nil
since this is not a subresource (otherwisetrue
);
Representation of objects for the Lua API
Resources
Resources are stored as a table {_table, _key}
,
where _table
is a string containing the name of the SQL table
to which the resource is attached,
and _key
contains the primary key for the row mapped to the resource.
Access to the contents of the resource via resource.field
is overloaded to a function which returns
the value currently found in the database.
Thus, the value of resource.field
will remain up-to-date
even if a SQL operation modified the contents of the database.
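The idea of a handle that stores only a table name and a key, and looks values up on every access, can be illustrated outside Lua as well. The following Python sketch is an analogue, not the tool's implementation (in Lua this kind of overloading is typically done with metatables); the class name Resource and the in-memory table are assumptions made for the example.

```python
import sqlite3

class Resource:
    """Handle holding only a table name and a primary key."""

    def __init__(self, conn, table, key):
        # Bypass __setattr__ so these three names stay plain attributes.
        object.__setattr__(self, "_conn", conn)
        object.__setattr__(self, "_table", table)
        object.__setattr__(self, "_key", key)

    def __getattr__(self, field):
        # Called only for names not found normally, i.e. column names.
        # Names are interpolated as-is; real code would validate them
        # against the schema first.
        row = self._conn.execute(
            f'SELECT "{field}" FROM "{self._table}" WHERE "id"=?',
            (self._key,),
        ).fetchone()
        if row is None:
            raise KeyError(self._key)
        return row[0]

    def __setattr__(self, field, value):
        self._conn.execute(
            f'UPDATE "{self._table}" SET "{field}"=? WHERE "id"=?',
            (value, self._key),
        )

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "items" ("id" TEXT PRIMARY KEY, "enchantment" INTEGER)')
conn.execute('INSERT INTO "items" VALUES (?, ?)', ("sw1h34", 3))

albruin = Resource(conn, "items", "sw1h34")
before = albruin.enchantment
# A change made through plain SQL is visible through the existing handle.
conn.execute('UPDATE "items" SET "enchantment"=5 WHERE "id"=?', ("sw1h34",))
after = albruin.enchantment
albruin.enchantment = 7  # writes go straight to the database as well
```

Because nothing is cached in the handle, there is no stale state to invalidate, at the cost of one query per field access.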
Resource vectors
Resource vectors (e.g. the list of abilities of an item)
are stored as a table {_table, _parent}
,
where _table is a string containing the name of the SQL table
containing this kind of resource (e.g. "item_abilities")
and _parent is the primary key of the parent resource.
Accessing the i
-th entry of this resource vector
triggers a SQL request which retrieves the corresponding
primary key.