Multilingual Translation Using Okapi + OmegaT
1. This process, which combines Okapi and OmegaT, grew out of the need to localize a client’s XML content generated by their Content Management System. The work required Unicode support for some rare languages such as Khmer and Lao, and it also called for a Computer Assisted Translation (CAT) tool to maximize quality and consistency. The process has been tested and found feasible on the translators’ side, but whether it is supported on the client’s DTP machine still needs to be tested.
2. This document is intended for project managers who run multilingual translation projects and need to pre- and post-process files that are to be translated into multiple languages.
3. This document provides a step-by-step guide to using the Text Extraction and Text Merging utilities of the open source Okapi tool to pre- and post-process files for translation in OmegaT, one of the free translation tools available in the open source community.
4. The open source Okapi Framework is a set of interface specifications, format definitions, components and applications that provides an environment to build interoperable tools for the different steps of the translation and localization process. Details can be found at http://okapi.sourceforge.net.
5. OmegaT® is a free and open source multiplatform CAT tool with fuzzy matching, translation memory, keyword search, glossaries, and translation leveraging into updated projects. For details, please refer to http://sourceforge.net/projects/omegat
- You must have the Okapi tool installed on your machine. The download link is: http://prdownloads.sourceforge.net/okapi/Rainbow-R00021.zip?download
- You must have OmegaT installed on your machine. The download link is: http://downloads.sourceforge.net/omegat/OmegaT_1.7.3_04_Windows.exe
- You must have a Unicode font installed for each language you are going to translate. The following Unicode fonts are required for three of the rare languages: Khmer, Lao and Vietnamese.
|Language||Unicode Font||Download Link|
In this document we have one sample XML file (100001000014322_421_6.0_frc.xml) to translate into 6 target languages: Thai, Indonesian, Malay, Vietnamese, Lao, and Khmer. The client has provided us with the full set of XML-related configuration files, such as the DTD and stylesheet files. The recommended folder structure is listed below:
We want to translate the file using OmegaT. We can use the Okapi Text Extraction and Text Merging utilities to pre- and post-process the file, and work, between the two processes, in an OmegaT project.
The pre-processing is done as follows:
Start Okapi Rainbow.
Select the command New from the File menu (or press Ctrl+N). Press the Save button and save the project in the project folder Demo under the name ipas.rbp, e.g. “D:\Demo\ipas.rbp”.
To add files to the first input file list: Go to the Input List 1 tab if you are not yet there, then select the command Add from the Input menu (or press Insert). A dialog box titled Add Files to Input List 1 should open.
Select the files to insert into the input file list. In this example we are using the file located in the folder: “D:Demosourcefiles100001000014322_421_6.0_frc.xml”. Highlight the file in that folder and click Open.
Now you should have the file listed in Rainbow:
Now that we have our input file listed, we need to make sure it is associated with the correct filter. In this case Rainbow has recognized the XML file format and has associated it with the proper filter: okf_xml
If needed, you can change the filter association using the Properties command in the Input menu. In this example, a few additional steps are needed to avoid problems with white space in inline tags, as described below.
Open the ‘Properties’ menu by clicking ‘Input’ then ‘Properties’
Choose the Xliff file format and choose ResX in the Parameters File, then click on Edit
Make sure ‘Write empty elements with an ending space’ has not been checked
Languages and Encodings
Go to the Options tab of Rainbow. This is where input and output languages and encodings are defined.
In this example, the input language is English, and you can set the target language to whatever language you want. So that the same file can be distributed to translators of every language without creating a separate file for each one, we still choose English as the target language. It is important, however, to choose Unicode (UTF-8) as the encoding, as shown in the following screenshot.
Now it is time to do the extraction. Select Text Extraction from the Utilities menu. This opens the Text Extraction Utility dialog box.
Move to the Format tab. This is where we specify the way the output files are to be generated.
Select the “XLIFF for OmegaT” option.
Move to the Options tab.
Make sure the option Create a TMX output file with any pre-translated entries found is set. This ensures that if one of the input files is bilingual (like an XLIFF file) and contains entries that are already translated, those entries are put in a TMX translation memory that you can re-use with OmegaT.
The TMX file generated will go in the tm folder of the OmegaT project.
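To see whether the extraction actually found any pre-translated entries, the generated TMX file can be inspected with a few lines of Python. This is only a minimal sketch: it assumes the TMX file uses plain, un-namespaced `<tu>` elements (as in standard TMX 1.4), and the function name `count_translation_units` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def count_translation_units(tmx_path):
    """Return the number of <tu> (translation unit) elements in a TMX file.

    Assumption: the TMX file declares no XML namespace, so the element
    name is simply "tu".
    """
    tree = ET.parse(tmx_path)
    # Each translated entry is stored as one <tu> element under <body>.
    return sum(1 for _ in tree.getroot().iter("tu"))
```

A count of zero means that no pre-translated entries were found in the input files.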
Move to the Package tab.
Set the option Create an OmegaT project file. This automatically selects the other options in this tab that the project needs.
Enter the path where you want the OmegaT project to be created. The Package name is the name of the project. In this example, we enter the path and the package name as illustrated below:
Now that we have all parameters defined, we are finally ready to run the Text Extraction utility.
If the folders you have specified for the output already exist, you will be prompted to choose what to do with any files that are already there. The following dialog box will pop up:
Click Yes to delete all files and sub-folders in the specified output folder. Note that the utility will not be able to remove files that are still open with other applications or sub-folders that are activated. If this happens, you will get a warning message in the Log.
Click No to not delete any existing file or sub-folders. However, any output file generated during this process will overwrite any existing file with the same path.
Click Cancel to not delete anything and stop the process.
For this example, if prompted, select Yes.
All warning and error messages are stored in the Log window. At the end of the process, if there was no error and no warning, the Log window closes automatically. If any error or warning occurs, the Log window remains open after the process is done.
When the process is completed, you can open the folder where the output files were generated by selecting the command Open Last Output Folder from the Utilities menu (or press Ctrl+L). This will open the project folder.
The project has the following folders:
- The Original folder contains the original source files (a single file in this example). They will be used after translation for merging.
- The source folder contains the prepared files in XLIFF format (.xlf). These are the files to translate with OmegaT.
- The target folder contains two files: _Merge.bat and _Merge.rbp. These files will be used to merge the translated files. This folder is also where OmegaT will generate the translated files.
- The tm folder contains the TMX output generated from the input files, where any potential text already translated is stored. In many cases, like for this example, this TM will be empty because there was no pre-translated text.
- The other folders are empty. They are needed for the OmegaT project to open properly. You can put additional TM and glossaries in the appropriate folders.
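Before distributing a package, it can be worth confirming that the folders described above are really present. The following Python sketch checks only the four folders named in this list; the function name and the idea of running such a check are illustrative assumptions, not part of Okapi or OmegaT.

```python
from pathlib import Path

# Folder names taken from the list above. The other (empty) OmegaT
# folders are not named in this document, so they are not checked.
EXPECTED_FOLDERS = ("Original", "source", "target", "tm")


def missing_package_folders(package_dir):
    """Return the expected sub-folders that are absent from package_dir."""
    root = Path(package_dir)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]
```

An empty list means that all four folders were found.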
Since we are going to translate into 6 languages, we need to manage each language separately. To do that, we copy the entire structure under the package folder “ipas” into a folder for each language. The following screenshot shows the Lao folder as an example.
Our project manager then zips up the whole Lao folder and sends Lao.zip to the Lao translator. The translation stage described next also uses Lao as the example language.
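The copy-and-zip step can also be scripted rather than done by hand. The sketch below is one possible way to do it in Python, under the assumption that each language folder should be named after the language and contain the contents of the package folder directly. The function name `make_language_packages` and its parameters are hypothetical.

```python
import shutil
from pathlib import Path

# The six target languages of this example.
LANGUAGES = ["Thai", "Indonesian", "Malay", "Vietnamese", "Lao", "Khmer"]


def make_language_packages(package_dir, output_root, languages=LANGUAGES):
    """Copy package_dir into one folder per language and zip each copy.

    Returns the list of created archive paths.
    """
    package_dir = Path(package_dir)
    output_root = Path(output_root).resolve()
    archives = []
    for language in languages:
        language_dir = output_root / language
        # copytree creates language_dir and fails if it already exists,
        # which protects work that has already been distributed.
        shutil.copytree(package_dir, language_dir)
        # make_archive appends ".zip" to the base name; the archive
        # contains the language folder itself as its top-level entry.
        archive = shutil.make_archive(
            str(language_dir), "zip", root_dir=str(output_root), base_dir=language
        )
        archives.append(Path(archive))
    return archives
```

For example, calling `make_language_packages(r"D:\Demo\ipas", r"D:\Demo")` would be expected to produce Lao.zip, Khmer.zip and so on next to the language folders, if the package lives at that location.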
To translate the file:
Unzip the file Lao.zip.
Select the Open command from the Project menu. Select the OmegaT project we have created above and open it. At this point OmegaT will get the list of the files in the source folder and parse them. The files are loaded in the program and the Project Files window is opened.
Go to the Options menu and choose the Font command. Select the installed Saysettha OT Unicode font for the Lao language, as shown in the following screenshot. For the other target languages, choose the corresponding Unicode font so that the target characters display correctly in the program.
Select the file to work on and start translating.
- XLIFF supports inline codes. In OmegaT they are represented by <xN/> and <gN>…</gN> codes inside your text. The translator can move them around but not delete any or add any. The translator can use OmegaT’s Validate Tags command to check if all codes are properly set.
- The translator may have to adjust the segmentation rules depending on the type of file he/she is working with. In general the default segmentation works just fine.
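The rule that inline codes may be moved but not added or deleted can be illustrated with a small Python check. This is a rough stand-in written for this guide, not OmegaT’s Validate Tags command: it assumes the codes look exactly like `<xN/>`, `<gN>` and `</gN>`, and the function name `tag_differences` is hypothetical.

```python
import re
from collections import Counter

# Matches the inline codes described above: <xN/>, <gN> and </gN>.
TAG_PATTERN = re.compile(r"<x\d+/>|</?g\d+>")


def tag_differences(source, target):
    """Compare inline codes in a source segment and its translation.

    Returns (missing, added): codes present in the source but not in the
    target, and codes present in the target but not in the source.
    The order of codes is ignored, because translators may move them.
    """
    source_tags = Counter(TAG_PATTERN.findall(source))
    target_tags = Counter(TAG_PATTERN.findall(target))
    missing = sorted((source_tags - target_tags).elements())
    added = sorted((target_tags - source_tags).elements())
    return missing, added
```

A segment pair passes this check when both returned lists are empty. It does not verify that paired codes are still correctly nested.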
Once the file is translated, select Save and then Create Target Documents from the Project menu. This creates the translated XLF file in the target folder. Close OmegaT. The translator then zips up the whole Lao folder and sends it back to the project manager, who can then carry out the post-processing step.
Open Okapi Rainbow.
To add the translated XLF files to the first input file list: Go to the Input List 1 tab, then select the command Add from the Input menu, or press Insert, or press the button on the toolbar. A dialog box titled Add Files to Input List 1 should open.
Navigate to the target folder (in this example: D:\Demo\Lao\target) and add the file to process: 100001000014322_421_6.0_frc.xml.xlf.
Now it is time to do the merging. Select Text Merging from the Utilities menu. This opens the Text Merging dialog box. In this example, select D:\Demo\Lao\Original as the Root for the skeleton files, and D:\Demo\Lao\target as the Root for the output merged files.
…and the translated merged files are saved in the target folder D:\Demo\Lao\target (with the same file name as the original file: 100001000014322_421_6.0_frc.xml).
Open the target XML file in Internet Explorer. It should be displayed correctly, as shown below.
If not, please check the following points:
- The target XML file has been placed in the correct folder relative to the project DTD and Stylesheet file.
- Make sure you have installed the correct Unicode font for the language you are currently displaying.
- Open the translation project in OmegaT and run OmegaT’s Validate Tags command to check if all codes are properly set.
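One more check that can help when the display looks wrong is to confirm that the merged file really is UTF-8 and contains characters in the expected script. The Python sketch below does this for Lao, whose Unicode block is U+0E80 to U+0EFF. The function name is hypothetical, and the check is offered as a quick diagnostic rather than as part of the Okapi or OmegaT workflow.

```python
# Unicode block for Lao. Other scripts need their own range,
# for example Khmer is U+1780 to U+17FF.
LAO_BLOCK = (0x0E80, 0x0EFF)


def count_chars_in_block(path, block=LAO_BLOCK):
    """Decode a file as UTF-8 and count characters inside a Unicode block.

    Raises UnicodeDecodeError if the file is not valid UTF-8.
    """
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    low, high = block
    return sum(1 for ch in text if low <= ord(ch) <= high)
```

A decoding error points to an encoding problem in the merged file, while a count of zero suggests that the file does not contain any translated Lao text.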
Now you are done.