LINGO LANGUAGE FILES

This document describes how the program Lingo uses language files, and
how these files can be extended or new ones created. Finally, it
describes how Ergane language files can be converted for use with Lingo.


Structure of Lingo Language Files
---------------------------------

The Lingo language files are ordinary database files, created with the
DATA application. They are made read-only to protect against accidental
changes. The file names follow the ISO 639 convention (EO, EN, FR, DE
and so on).

Each file has two columns: the first column, k, is a 5-digit number; the
second column, n, is a 28-character text. Here is an excerpt of some of
these files:

EO              ¦ EN              ¦ FR              ¦ DE
    k n         ¦     k n         ¦     k n         ¦     k n
    1           ¦     1           ¦     1           ¦     1
  ...           ¦   ...           ¦   ...           ¦   ...
11800 facila    ¦ 11800 easy      ¦ 11800 facile    ¦ 11800 leicht
                ¦ 11800 facile    ¦                 ¦
                ¦                 ¦                 ¦
16125 hela      ¦ 16125 bright    ¦ 16125 clair     ¦ 16125 hell
                ¦ 16125 clear     ¦                 ¦ 16125 licht
                ¦ 16125 light     ¦                 ¦ 16125 lichtvoll
                ¦                 ¦                 ¦
24047 leghera   ¦ 24047 light     ¦ 24047 léger     ¦ 24047 leicht
                ¦                 ¦                 ¦
24799 luma      ¦ 24799 bright    ¦ 24799 clair     ¦ 24799 hell
                ¦ 24799 light     ¦ 24799 lumineux  ¦
                ¦                 ¦                 ¦
24837 lumo      ¦ 24837 light     ¦ 24837 lumière   ¦ 24837 Licht
                ¦                 ¦                 ¦
25991 malpeza   ¦ 25991 light     ¦ 25991 léger     ¦
                ¦                 ¦                 ¦
  ...           ¦   ...           ¦   ...           ¦   ...
59999           ¦ 59999           ¦ 59999           ¦ 59999

This example shows, first, that the English word 'light' has several
different meanings. It also shows that an Esperanto word has only one
meaning, even though some Esperanto words can mean the same thing (like
'hela' = 'luma'). Because Esperanto words are (almost) never ambiguous,
Lingo (like its parent Ergane) is based on this language.

An example shows the benefit: suppose that I want to translate 'leicht'
from German to French. I see that 'leicht' has two meanings, 'facila'
and 'leghera' in Esperanto. The corresponding words in French are
'facile' and 'léger'.
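
The 'leicht' example can be sketched in a few lines of Python. This is
only an illustration, not Lingo's actual code: the lists DE and FR are
hypothetical in-memory copies of the excerpt above (the real files are
DATA databases), and the function name translate is made up for this
sketch.

```python
# Hypothetical in-memory copies of the DE and FR excerpts shown above;
# the real language files are DATA databases, not Python lists.
DE = [(11800, "leicht"), (16125, "hell"), (16125, "licht"),
      (16125, "lichtvoll"), (24047, "leicht"), (24799, "hell"),
      (24837, "Licht")]
FR = [(11800, "facile"), (16125, "clair"), (24047, "léger"),
      (24799, "clair"), (24799, "lumineux"), (24837, "lumière"),
      (25991, "léger")]


def translate(word, source, target):
    """Return the target words that share a number k with `word`."""
    # Collect every number k under which the word appears in the source.
    keys = {k for k, n in source if n == word}
    # Keep the target words with one of those numbers, without repeats.
    result = []
    for k, n in target:
        if k in keys and n not in result:
            result.append(n)
    return result


print(translate("leicht", DE, FR))  # ['facile', 'léger']
```

Only the numbers 11800 and 24047 are shared with 'leicht', so the French
noun 'lumière' (24837) never appears among the results.
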
If I had used English as the auxiliary language, I would have got some
erroneous translations: 'leicht' could be 'light' in English, and for
'light' there are several more translations in French, like 'lumière'
(which is a noun and not an adjective).

Lingo does not actually look in the Esperanto file when translating. It
simply uses the number column k in both the input and output languages
to join the words (for every number k, there is a unique word in the
Esperanto file). In more detail: when an input word (or a pattern with
wildcards) is given, the first step of the search algorithm builds a
list of the numbers k whose words match the input. In the second step,
the program looks up in the output language all the words corresponding
to any of the numbers in the list; it then forms pairs of translations
and eliminates duplicate pairs.

Lingo expects the k column in the language files to be sorted in
ascending order; thus, when reading through a file in search of word
number x, the search stops as soon as a word with a number greater than
x is read.


Extending or Creating Lingo Language Files
------------------------------------------

With the information given above, it is possible to extend existing
language files and to construct new ones.

Rule 1: if a new language is created, its file must have a name of two
letters, as Lingo (for simplicity) only displays languages with such
names.

Rule 2: again, if a new language file is created, the structure of the
DATA file upon which it is based must be exactly as described earlier.
The simplest way is to open an existing language file with the DATA
application and select CreateNewFile; this makes an empty language file
with the correct structure.

Rule 3: if a synonym is added, it must be put at the place with the
correct meaning. Example: if we know that the word 'light' could also
translate to 'leuchtend' in German, then we could add this word at
position k=24799 in the German language file, but certainly not at
k=24047.
Knowing Esperanto is an advantage when looking for the right place.

Rule 4: if a word missing from one language is added, again the correct
place must be located. Example: in the excerpt above, it seems that a
new entry (k=25991, n=leicht) could be added to the German language
file.

Rule 5: if a new pair of translations is added, then a new place must be
defined in both (or more) languages. Example: suppose you know that the
English word 'foobar' translates to 'Dingbat' in German. Neither of
these words is already in the language files; thus you can add
(k=99999, n=foobar) to EN and (k=99999, n=Dingbat) to DE. The number k
can be freely chosen, but should be greater than 60000 (as the
'official' Lingo language files will use k in the range 1 .. 60000).

Rule 6: whenever words are added to a language file (Rule 3 .. Rule 5),
the DATA file must remain in ascending order of k. If an entry is added
in the DATA application, it is appended at the end of the file, even if
the display suggests that the line was inserted at the correct place.
As a result, Lingo will ignore words added to the output language,
because its search stops at the first number greater than the one it is
looking for. To fix this problem, one more manipulation is needed:
after the words are added, ask the DATA application to re-sort on column
k, then export the whole database to a text file, then make an empty
language file, and finally re-import the data from the text file. The
modified language file is then OK for Lingo.


Importing files from Ergane
---------------------------

The language files at the Ergane website (www.travlang.com/Ergane) are
downloaded as ZIP files. Each ZIP file contains one Access file (with
extension mdb). In this file, there is a table with the same name as the
language. This table has several columns; the only ones of interest to
us are EspKey and XEntry: they correspond exactly to the columns k and n
of Lingo databases.

First, the table must be sorted in ascending order on column EspKey.
Then, the two columns must be cut and pasted into a text file. It is
suggested to format the text file like this:

    "1","word or expression1"
    "2","word2"
    "4","another word"
    ...

This file is transferred without conversion (we have done enough
conversions now!) to the EPOC machine. There, the DATA application is
started with an empty Lingo language file (see Rule 2 above), and the
text file is imported with ImportFromText.

For some Ergane files, the process described here is not possible:
- the Access table may be empty (the words are stored in another of the
  ZIP files, in a different format)
- the Access table may contain many special characters (corresponding to
  local characters or accents that are not representable in the standard
  PC and EPOC fonts)

- - - -
Patrick Hahn
phahn@vo.lu
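
Note on the suggested text format: for those who prefer a script to
cutting and pasting, the quoted lines shown in the Ergane section could
be produced as in the following Python sketch. It assumes the (EspKey,
XEntry) pairs have already been extracted from the Access table by some
other means; the function name to_lingo_text and the sample rows are
hypothetical. How the DATA application treats quote characters inside an
entry is not known here; the csv module doubles them.

```python
import csv
import io


def to_lingo_text(rows):
    """Format (EspKey, XEntry) pairs as quoted lines, sorted by key."""
    out = io.StringIO()
    # QUOTE_ALL puts double quotes around every field, numbers included.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    # sorted() is stable, so entries sharing a key keep their order.
    for key, entry in sorted(rows, key=lambda row: row[0]):
        writer.writerow([key, entry])
    return out.getvalue()


# Hypothetical rows, as they might be copied out of the Access table.
rows = [(4, "another word"), (1, "word or expression1"), (2, "word2")]
print(to_lingo_text(rows), end="")
```

The sketch does not check the 28-character limit of column n, nor
whether the characters are representable on the EPOC machine.
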