LINGO LANGUAGE FILES

This document describes how the program Lingo uses language files,
and how these files can be extended or new ones created.
It also describes how Ergane language files can be converted
for use with Lingo.


Structure of Lingo Language Files
---------------------------------
Lingo language files are ordinary database files, created
with the DATA application. They are made read-only to protect
against accidental changes. The files are named after the
two-letter ISO 639 language codes (EO, EN, FR, DE, ...). Each file
has two columns: the first column, k, is a 5-digit number; the
second column, n, is a 28-character text.
Here is an excerpt from some of these files:

EO            | EN           | FR             | DE
k     n       | k     n      | k     n        | k     n
1             | 1            | 1              | 1
...           | ...          | ...            | ...
11800 facila  | 11800 easy   | 11800 facile   | 11800 leicht
              | 11800 facile |                |
              |              |                |
16125 hela    | 16125 bright | 16125 clair    | 16125 hell
              | 16125 clear  |                | 16125 licht
              | 16125 light  |                | 16125 lichtvoll
              |              |                |
24047 leghera | 24047 light  | 24047 léger    | 24047 leicht
              |              |                |
24799 luma    | 24799 bright | 24799 clair    | 24799 hell
              | 24799 light  | 24799 lumineux |
              |              |                |
24837 lumo    | 24837 light  | 24837 lumière  | 24837 Licht
              |              |                |
25991 malpeza | 25991 light  | 25991 léger    |
              |              |                |
...           | ...          | ...            | ...
59999         | 59999        | 59999          | 59999

This example shows that the English word 'light' has several
different meanings. It also shows that each Esperanto word has
only one meaning, even though several Esperanto words can mean
the same thing (like 'hela' = 'luma').

Because Esperanto words are (almost) never ambiguous, Lingo
(like its parent Ergane) is based on this language.
An example shows the benefit:
suppose that I want to translate 'leicht' from German to French.
I see that 'leicht' has two meanings, 'facila' and 'leghera' in
Esperanto. The corresponding words in French are 'facile' and
'léger'.
If I had used English as the auxiliary language, I would
have got some erroneous translations: 'leicht' could be 'light'
in English, and for 'light' there are several more translations
in French, like 'lumière' (which is a noun and not an adjective).

Lingo does not actually look in the Esperanto file when
translating. It just uses the number column k in both the input
and output languages to join the words (for every number k,
there is a unique word in the Esperanto file).

In more detail: when an input word (or a pattern with wildcards) is
given, the first step of the search algorithm makes a list of the
numbers k matching the input word. In a second step, the program
looks up in the output language all the words corresponding to
one of the numbers in the list; it then builds pairs of translations
and eliminates duplicate pairs. Lingo expects the k column of the
language files to be sorted in ascending order; thus, when reading
through a file in search of word number x, the search stops as soon
as a word with a number greater than x is read.
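
The two search steps can be sketched in Python as follows. This is
only an illustration, not Lingo's actual code: the list-of-rows
representation, the function names, and the shell-style wildcard
matching (via fnmatchcase) are assumptions; the rows are taken from
the excerpt above.

```python
from fnmatch import fnmatchcase

# Assumption: a language file is modelled as a list of (k, n) rows,
# sorted ascending on k. Rows are taken from the excerpt above.
DE = [(11800, "leicht"), (16125, "hell"), (16125, "licht"),
      (16125, "lichtvoll"), (24047, "leicht"), (24799, "hell"),
      (24837, "Licht")]
FR = [(11800, "facile"), (16125, "clair"), (24047, "léger"),
      (24799, "clair"), (24799, "lumineux"), (24837, "lumière"),
      (25991, "léger")]


def matching_keys(source, pattern):
    """Step 1: list the numbers k whose word matches the pattern."""
    return sorted({k for k, word in source if fnmatchcase(word, pattern)})


def words_for_key(target, key):
    """Scan a file sorted on k; stop at the first k greater than key."""
    found = []
    for k, word in target:
        if k > key:
            break  # sorted file: nothing further can match
        if k == key:
            found.append(word)
    return found


def translate(source, target, pattern):
    """Step 2: join on k, build (input, output) pairs, drop duplicates."""
    pairs = []
    for key in matching_keys(source, pattern):
        inputs = [w for k, w in source
                  if k == key and fnmatchcase(w, pattern)]
        for src in inputs:
            for dst in words_for_key(target, key):
                if (src, dst) not in pairs:
                    pairs.append((src, dst))
    return pairs


print(translate(DE, FR, "leicht"))  # [('leicht', 'facile'), ('leicht', 'léger')]
print(translate(DE, FR, "hell"))    # [('hell', 'clair'), ('hell', 'lumineux')]
```

In the second call, 'hell' matches k=16125 and k=24799, both of which
give 'clair' in French; the duplicate pair is kept only once.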


Extending or Creating Lingo Language Files
------------------------------------------
With the information given above, it is possible to extend existing
language files and to construct new ones.

Rule 1: if a new language is created, its file must have a
two-letter name, as Lingo (for simplicity) only displays languages
with such names.

Rule 2: again, if a new language-file is created, the structure
of the DATA file upon which it is based must be exactly as described
earlier. The simplest way is to open an existing language-file
with the DATA application, and select CreateNewFile; this makes
an empty language-file with the correct structure.

Rule 3: if a synonym is added, it must be put at the place with
the correct meaning. Example: if we know that the word 'light'
can also translate to 'leuchtend' in German, then we can
add this word at position k=24799 in the German language file,
but certainly not at k=24047. Knowing Esperanto helps in
finding the right place.

Rule 4: if a missing word is added to one language, again the
correct place must be located. Example: the excerpt above
suggests that a new entry (k=25991,n=leicht) could be added to
the German language file.

Rule 5: if a new pair of translations is added, then a new place
must be defined in both (or all) of the languages concerned.
Example: suppose you know that the English word 'foobar'
translates to 'Dingbat' in German. Neither of these words is yet
in the language files; thus you can add (k=99999,n=foobar) in
EN, and add (k=99999,n=Dingbat) in DE. The number k can be freely
chosen, but should be greater than 60000 (as the 'official' Lingo
language files will use k in the range 1 .. 60000).
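
One possible way to choose k for Rule 5 is sketched below in Python.
The helper next_private_key, the constants, and the sample rows
(including the entry 'widget') are hypothetical; the sketch only
assumes that k must fit the 5-digit column and should lie above the
official range.

```python
OFFICIAL_MAX = 60000  # official files use k in the range 1 .. 60000
KEY_LIMIT = 99999     # largest number that fits a 5-digit column


def next_private_key(*language_files):
    """Return the smallest k above the official range that is unused
    in every given language file (hypothetical helper)."""
    used = {k for rows in language_files for k, _ in rows}
    for k in range(OFFICIAL_MAX + 1, KEY_LIMIT + 1):
        if k not in used:
            return k
    raise ValueError("no free key left above the official range")


# Hypothetical file contents, as lists of (k, n) rows.
en = [(24837, "light"), (60001, "widget")]
de = [(24837, "Licht")]

k = next_private_key(en, de)
en.append((k, "foobar"))
de.append((k, "Dingbat"))
en.sort()  # keep both files in ascending order of k
de.sort()

print(k)  # 60002
```

The same k is used in both files, which is what lets Lingo join the
two words.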

Rule 6: whenever words are added to a language file
(Rule 3 .. Rule 5), the DATA file must remain in ascending order.
If an entry is added in the DATA application, it is stored
at the end of the file, even if the display suggests that the
line was inserted at the correct place. As a result, Lingo will
not find words added to the output language this way. To fix
this problem, a further step is needed: after the words are added,
ask the DATA application to re-sort on column k, then export the
whole database to a text file, then make an empty language file,
and finally re-import the data from the text file. The modified
language file is then OK for Lingo.
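
The effect described in Rule 6 can be reproduced with a small Python
sketch. It assumes a lookup that stops at the first number greater
than the one searched for, as described earlier; the list
representation and the function name are illustrative only.

```python
def words_for_key(rows, key):
    """Scan rows in file order; stop at the first k greater than key."""
    found = []
    for k, word in rows:
        if k > key:
            break
        if k == key:
            found.append(word)
    return found


# German file in ascending order of k (rows from the excerpt).
de = [(24047, "leicht"), (24799, "hell"), (24837, "Licht")]

# A new synonym is stored physically at the end of the file.
de.append((24799, "leuchtend"))
before = words_for_key(de, 24799)

# Re-sorting on k (a stable sort) puts the new row in its place.
de.sort(key=lambda row: row[0])
after = words_for_key(de, 24799)

print(before)  # ['hell']
print(after)   # ['hell', 'leuchtend']
```

Before the re-sort, the scan stops at k=24837 and never reaches the
appended row.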


Importing files from Ergane
---------------------------
The language files at the website of Ergane (www.travlang.com/Ergane)
are downloaded as ZIP files.
Each ZIP file contains one Access file (with extension mdb).
In this file, there is a table with the same name as the language.
This table has several columns; the only columns of interest to
us are EspKey and XEntry: they correspond exactly to the columns k and
n of Lingo databases. First, the table must be sorted ascending on
column EspKey. Then, the two columns must be cut and pasted into a
text file. It is suggested to format the text file like this:
"1","word or expression1"
"2","word2"
"4","another word"
...
This file is transferred without conversion (we have done enough
conversions now!) to the EPOC machine.
There, the DATA application is started with an empty Lingo language
file (see Rule 2 above), and the text file is imported with
ImportFromText.
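
If the two columns are available as (EspKey, XEntry) pairs, the
suggested text format can be produced with a short Python sketch
like the one below. The function name to_import_text is
hypothetical, and the line ending used here is an assumption; how
the pairs are extracted from the Access table is left out.

```python
import csv
import io


def to_import_text(rows):
    """Sort (EspKey, XEntry) pairs ascending on EspKey and return them
    as text with every field in double quotes, one pair per line."""
    ordered = sorted(rows, key=lambda row: int(row[0]))
    buffer = io.StringIO()
    # QUOTE_ALL puts quotes around the number as well as the word.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for key, entry in ordered:
        writer.writerow([int(key), entry])
    return buffer.getvalue()


sample = [(4, "another word"), (1, "word or expression1"), (2, "word2")]
print(to_import_text(sample), end="")
# "1","word or expression1"
# "2","word2"
# "4","another word"
```

Sorting on the numeric value of EspKey, rather than on its text
form, keeps the file in the ascending order that Lingo expects.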

For some Ergane files, the process described here is not possible:
- the Access table may be empty (the words are stored in another of
the ZIP files, in a different format)
- the Access table may contain many special characters (local
characters or accents that cannot be represented in the
standard PC and EPOC fonts).

- - - -
Patrick Hahn
phahn@vo.lu