LangueDOC

FieldWorks (FLEx) to ELAN conversion

At present, a Java program written by Tom Myers converts interlinear texts exported from FieldWorks Language Explorer (FLEx) into the ELAN annotation format (EAF) for subsequent time alignment. The ELAN developers are working on including this conversion in ELAN as an import facility.

Conversion steps

  • Export from FLEx into a specially prepared XML format
  • Convert XML to EAF
  • Open the file in ELAN and save before working with it further

Programs needed

  1. FieldWorks Language Explorer (FLEx)
  2. XSL transformation which rearranges items in the XML exported from FLEx [download, 2 Kb]
  3. The Flex2EAF converter proper (a Java program) [download, 16 Kb]
  4. Java Runtime Environment (to run Java programs)

Installation (instructions for Windows XP)

Export from FLEx

  • Copy xml4eaf.xml and itemsFirst.xsl to the folder
    C:\Program Files\SIL\FieldWorks\Language Explorer\Export Templates\Interlinear
  • After this, you can export interlinear texts from FLEx by choosing the export option "Generic XML for conversion to EAF".

Flex2EAF converter

  • Create a folder and put the converter files there, e.g. D:\Java\flex2eaf.
    (The folder should contain four files: Flex2EAF.java, Flex2EAF.class, Flex2EAF$tierObject.class and Flex2EAF$itemObject.class).
  • Add the path to this folder to the CLASSPATH system variable (see below).

Java

  • If Java is not installed (check whether the Control Panel contains a "Java Control Panel" item), download and install it (http://www.java.com/ru/download/index.jsp).
  • Set the system variables that hold the paths Java needs:
    • JRE_HOME = the path to the folder containing Java (e.g. « C:\Program Files\Java\jre6 »)
    • CLASSPATH = a list of paths to class folders (if the variable already exists, append the new paths to the end, separated by semicolons):
      • the current folder, written as a single dot (« . »)
      • the path to the converter (see above), e.g. « D:\Java\flex2eaf »
      Both paths together: « .;D:\Java\flex2eaf »

The system variables are set in Control Panel/System/Advanced/Environment variables/System variables. A reboot is needed for the new values to take effect. Setting these variables is optional: you can instead specify the paths each time you run the converter, for example with the -cp option of the java command.
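To check that the settings above took effect, a small test program can report what the Java runtime actually sees. The following is only a sketch: the class name ClasspathCheck is made up for this example and is not part of the Flex2EAF download, and compiling it assumes a JDK (Java 8 or later), not just the JRE.

```java
// Hypothetical helper, not shipped with Flex2EAF: reports whether the
// converter class is visible on the class path seen by the Java runtime.
final class ClasspathCheck {

    /** Returns true if the named class can be located without initializing it. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not on the class path, or found but not loadable under this name.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.home       = " + System.getProperty("java.home"));
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        System.out.println("Flex2EAF found  = " + isLoadable("Flex2EAF"));
    }
}
```

If the last line prints false, the folder holding Flex2EAF.class is probably missing from CLASSPATH (or from the -cp value given on the command line).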

 

Running Flex2EAF

Files needed

  • Interlinear text in XML (*.xml) exported from FLEx
  • A sound (*.wav) or video (*.mpg) file. You can in fact specify any filename and link a different media file later in ELAN.

The result is an ELAN annotation file (*.eaf).

Running the converter

The converter is launched from the command line.

You need to know the duration of your sound or video recording, at least approximately, and supply it in milliseconds. Flex2EAF divides this duration into segments of equal length, one for each time-aligned annotation (e.g. a sentence).
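The equal-length division can be pictured with the sketch below. It is an illustration only, not code taken from Flex2EAF: the class and method names are invented, and the rounding of segment boundaries in the real converter may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of splitting a recording into equal time slots.
final class EqualSegments {

    /** Returns {start, end} pairs in milliseconds for count annotations covering 0..totalMs. */
    static List<long[]> split(long totalMs, int count) {
        List<long[]> segments = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // Integer arithmetic; the last segment always ends exactly at totalMs.
            long start = totalMs * i / count;
            long end = totalMs * (i + 1) / count;
            segments.add(new long[] {start, end});
        }
        return segments;
    }

    public static void main(String[] args) {
        // A 37000 ms recording with four sentences.
        for (long[] s : split(37000, 4)) {
            System.out.println(s[0] + " - " + s[1]);
        }
    }
}
```

With a duration of 37000 ms and four sentences, each sentence gets a 9250 ms slot. These slots are placeholders; the real boundaries are then adjusted by hand in ELAN.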

Flex2EAF takes the following parameters (type java Flex2EAF to get this help message):

usage: java Flex2EAF flexFile.xml [timeInMSec [mediaFile [noSynch [renamings [showDebug]]]]]
e.g.: java Flex2EAF khinalug.xml 37000 khinalug.mpg word word-txt-en:word,word-txt-ru:word true
which uses renaming to merge two tiers into the single tier 'word' and shows debugging output
or : java Flex2EAF khinalug.xml
which defaults to 300000 msec, i.e. five minutes and elan-example1.mpg
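The nested brackets in the usage line mean that the optional parameters are positional: each one can be given only if all the preceding ones are given too. The sketch below assembles such a command line and shows one way to capture the converter's standard output in an .eaf file. The class Flex2EafCommand and its methods are hypothetical helpers written for this page (assuming Java 8 or later); only the argument order comes from the usage message above.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper around the "java Flex2EAF ..." command line.
final class Flex2EafCommand {

    /**
     * Builds the command. Optional values are, in order: timeInMSec, mediaFile,
     * noSynch, renamings, showDebug. The list stops at the first null value.
     */
    static List<String> build(String flexFile, String... optional) {
        if (optional.length > 5) {
            throw new IllegalArgumentException("at most five optional parameters");
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("Flex2EAF");
        cmd.add(flexFile);
        for (String arg : optional) {
            if (arg == null) {
                break; // later positions cannot be filled once one is omitted
            }
            cmd.add(arg);
        }
        return cmd;
    }

    /** Runs the command and writes the converter's standard output to eafOut. */
    static int run(List<String> cmd, File eafOut) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.redirectOutput(eafOut);                         // stdout goes to the .eaf file
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // error messages stay on the console
        return pb.start().waitFor();
    }

    public static void main(String[] args) {
        // Only prints the command; call run(...) to execute it.
        System.out.println(String.join(" ",
                build("khinalug.xml", "37000", "khinalug.mpg", "word")));
    }
}
```

The same redirection can be done directly on the command line by appending "> khinalug.eaf" to the command. Since the debug output is mixed in with the real output, the showDebug parameter is probably best left out when the result is meant to be opened in ELAN.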

In slightly greater detail:

java Flex2EAF khinalug.xml 37000 khinalug.mpg word word-txt-en:word,word-txt-ru:word true

can be read as follows:

khinalug.xml:
convert the Flex file khinalug.xml to EAF as output (stdout), with one tier for each tagType-itemType-itemLang combination, e.g. <phrase> containing <item type="txt" lang="en">...</item> provisionally becomes tier phrase-txt-en;

37000:
interpolate time-values from 0 to 37000 milliseconds for the file;

khinalug.mpg:
use "khinalug.mpg" as the media file;

word:
don't generate time-values for Flex tags below "word" (treat those as ref tiers);

word-txt-en:word,word-txt-ru:word:
map the specified provisional tier names onto actual tier names; in this case the provisional tiers "word-txt-en" and "word-txt-ru" are both relabelled as the tier "word";

true:
generate debug output (showing what goes on and off various stacks) mixed in with real output.
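The tier naming and renaming described above can be summarised in a short sketch. It does not reproduce the internals of Flex2EAF: the class TierNames and its methods are invented for illustration (assuming Java 8 or later), and only the name pattern tagType-itemType-itemLang and the "old:new,old:new" renaming syntax are taken from the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of provisional tier names and the renamings parameter.
final class TierNames {

    /** Builds the provisional name, e.g. ("phrase", "txt", "en") gives "phrase-txt-en". */
    static String provisionalName(String tagType, String itemType, String itemLang) {
        return tagType + "-" + itemType + "-" + itemLang;
    }

    /** Parses a renamings string such as "word-txt-en:word,word-txt-ru:word". */
    static Map<String, String> parseRenamings(String spec) {
        Map<String, String> renamings = new LinkedHashMap<>();
        if (spec == null || spec.isEmpty()) {
            return renamings;
        }
        for (String pair : spec.split(",")) {
            String[] parts = pair.split(":", 2);
            if (parts.length == 2) {
                renamings.put(parts[0].trim(), parts[1].trim());
            }
        }
        return renamings;
    }

    /** Returns the renamed tier, or the provisional name if no renaming applies. */
    static String finalName(String provisional, Map<String, String> renamings) {
        return renamings.getOrDefault(provisional, provisional);
    }

    public static void main(String[] args) {
        Map<String, String> renamings = parseRenamings("word-txt-en:word,word-txt-ru:word");
        System.out.println(finalName(provisionalName("word", "txt", "en"), renamings));
        System.out.println(finalName(provisionalName("word", "txt", "ru"), renamings));
        System.out.println(finalName(provisionalName("phrase", "txt", "en"), renamings));
    }
}
```

Here both word-level tiers end up under the name "word", while "phrase-txt-en" keeps its provisional name because no renaming mentions it.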