XTM - Technology

XML-INTL uses the following advanced technology to reduce translation costs and shorten turnaround times:

Advanced XML techniques

We make extensive use of advanced XML techniques to overlay a linguistic interpretation on your text data. The overlay is unobtrusive.


Author memory - control over the source language material

We concentrate our technology on the source language text and are able to monitor changes between revisions of a document at the sentence level. Control over the source language is a cornerstone of reducing translation costs.


DOM Differencing

DOM stands for Document Object Model. Using author memory together with the hierarchical DOM structure, we can work out accurately, at the sentence level, which text has changed between revisions of a document. DOM differencing contributes significantly to reducing translation costs.
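
The idea of comparing two revisions through their document tree can be sketched as follows. This is a minimal illustration in Python, not the XTM implementation: the function names, the path notation and the naive sentence splitter are assumptions made for the example, and text that follows a child element (tail text) is ignored for brevity.

```python
import difflib
import re
import xml.etree.ElementTree as ET


def sentences_by_path(xml_text):
    """Map each element's position in the tree to the sentences in its direct text."""
    root = ET.fromstring(xml_text)
    result = {}

    def walk(elem, path):
        text = (elem.text or "").strip()
        if text:
            # Naive split after ., ! or ? (a stand-in for real segmentation rules).
            result[path] = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
        counts = {}
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")

    walk(root, f"/{root.tag}")
    return result


def diff_revisions(old_xml, new_xml):
    """Return (path, operation, old_sentences, new_sentences) for every change."""
    old, new = sentences_by_path(old_xml), sentences_by_path(new_xml)
    changes = []
    for path in sorted(set(old) | set(new)):
        a, b = old.get(path, []), new.get(path, [])
        matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            if op != "equal":
                changes.append((path, op, a[i1:i2], b[j1:j2]))
    return changes


if __name__ == "__main__":
    old = "<doc><p>Open the lid. Press start.</p><p>Wait.</p></doc>"
    new = "<doc><p>Open the lid. Press the green button.</p><p>Wait.</p></doc>"
    for change in diff_revisions(old, new):
        print(change)
```

Only the second sentence of the first paragraph is reported as changed; every other sentence is left alone, which is what makes sentence-level reuse possible.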


Perfect Matching® - reducing translation costs

Using all of the previous techniques we can guarantee that, for text that has not changed between updates, exactly the same target language translation will be used. This means that the translation does not need to be reviewed by a translator. This is an important distinction compared with the type of leveraged translation memory process offered by translation agencies. With leveraged matching a translator still has to review the matches to check for correct grammatical context. This adds to the costs.
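
As a rough illustration of carrying unchanged text forward, the sketch below assumes that every text unit has a stable identifier and that the previous revision stores the source and target text for each identifier. The function name and data layout are hypothetical and chosen only for the example.

```python
def carry_forward(previous, new_units):
    """Reuse translations for text units that are unchanged since the last revision.

    previous:  {unit_id: (source_text, target_text)} from the last translated revision
    new_units: [(unit_id, source_text)] from the new revision
    Returns (reused, pending): translations that can be reused as they are,
    and the ids of units that still need a translator.
    """
    reused, pending = {}, []
    for unit_id, source in new_units:
        prior = previous.get(unit_id)
        if prior is not None and prior[0] == source:
            reused[unit_id] = prior[1]   # same unit, same text: reuse the translation
        else:
            pending.append(unit_id)      # new or modified unit
    return reused, pending


if __name__ == "__main__":
    previous = {
        "u1": ("Open the lid.", "Ouvrez le couvercle."),
        "u2": ("Press start.", "Appuyez sur Démarrer."),
    }
    new_units = [("u1", "Open the lid."), ("u2", "Press the green button."), ("u3", "Wait.")]
    print(carry_forward(previous, new_units))
```

Because a unit is matched on both its identifier and its text, a reused translation comes from the same place in the document, not just from a sentence that happens to read the same elsewhere.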


Advanced Linguistic Matching and Web Search Technology

We use advanced text searching techniques based on Bayesian belief networks and encompassing linguistic techniques such as morphological reduction. This is the same technology that is used by advanced web search engines. Most existing translation memory systems use very primitive technology in comparison, and produce poorer results.
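
To show why morphological reduction helps matching, here is a deliberately crude sketch. It does not model Bayesian belief networks and is not the matching engine described above. It strips a few English suffixes (an assumption made purely for illustration) and then compares the reduced word sequences using Python's standard difflib module.

```python
import difflib

# Crude, English-only suffix list used purely for illustration.
SUFFIXES = ("ing", "ed", "es", "s")


def reduce_word(word):
    """Lower-case a word, drop surrounding punctuation and strip one common suffix."""
    word = word.lower().strip(".,;:!?")
    if word.endswith("ss"):          # keep words such as "press" intact
        return word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def similarity(a, b):
    """Similarity in [0, 1] between two segments after morphological reduction."""
    ta = [reduce_word(w) for w in a.split()]
    tb = [reduce_word(w) for w in b.split()]
    return difflib.SequenceMatcher(a=ta, b=tb, autojunk=False).ratio()


if __name__ == "__main__":
    print(similarity("Press the buttons.", "Pressing the button"))
    print(similarity("Press the buttons.", "Remove the cover"))
```

After reduction, the first pair is treated as identical even though the surface forms differ, while the second pair scores low.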


Advanced Linguistic Terminology Management

We have a built-in terminology management system. When new text is prepared for translation, a linguistic analysis of the source material is made. Noun phrases are identified and looked up in the terminology database to find their target language terms. These are then offered to the translators as the required terminology.
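
The lookup step can be pictured with the sketch below. It replaces real noun-phrase analysis with a longest-match search over word sequences, and the term base is a plain dictionary. Both are simplifying assumptions, and the names are hypothetical.

```python
def find_terms(segment, termbase, max_len=4):
    """Return (source_term, target_term) pairs found in a segment.

    termbase maps lower-case source terms to target terms. At each position
    the longest matching word sequence is preferred, so "translation memory"
    wins over "memory".
    """
    words = [w.strip(".,;:!?()").lower() for w in segment.split()]
    hits, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in termbase:
                hits.append((candidate, termbase[candidate]))
                i += n
                break
        else:
            i += 1  # no term starts at this word
    return hits


if __name__ == "__main__":
    termbase = {"translation memory": "mémoire de traduction", "memory": "mémoire"}
    print(find_terms("Export the translation memory.", termbase))
```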


Linguistic analysis of text units to help reduce word counts

Where a text element contains only non-translatable data, such as numeric, alphanumeric, part number, punctuation or measurement data, the text from that element is not included in the word count. Translation agencies will include such data in word counts, increasing the costs to the customer. Such elements can be protected from translation within our systems.
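
A simplified version of such a filter is sketched below. The regular expressions, the list of measurement units and the function names are assumptions chosen for the example; a production system would need far richer rules.

```python
import re

# Illustrative patterns only; each must match the whole element text.
NUMERIC = re.compile(r"[\d.,]+")
PUNCTUATION = re.compile(r"[^\w\s]+")
MEASUREMENT = re.compile(r"[\d.,]+\s?(mm|cm|m|kg|g|V|A|W|Hz|°C)")
PART_NUMBER = re.compile(r"(?=.*\d)[A-Za-z0-9]+(-[A-Za-z0-9]+)*")  # must contain a digit


def is_translatable(text):
    """Return False when an element holds only non-translatable data."""
    text = text.strip()
    if not text:
        return False
    patterns = (NUMERIC, PUNCTUATION, MEASUREMENT, PART_NUMBER)
    return not any(p.fullmatch(text) for p in patterns)


def billable_word_count(elements):
    """Count words only in elements that actually need translating."""
    return sum(len(text.split()) for text in elements if is_translatable(text))


if __name__ == "__main__":
    elements = ["Tighten the bolt.", "AB-1234", "12.5 mm", "42", "...", "Replace filter"]
    print(billable_word_count(elements))  # 5: only the two sentences are counted
```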


All the necessary data is prepared for the translators

All the necessary data, including leveraged, fuzzy-matched and terminology data, is extracted and put into a special 'package' ready for translation. We use the OpenTag format as the basis for text extraction. This package can be used to provide on-line web access to the translator directly from your web site. Translation, review and publishing can all be undertaken from this interface, substantially reducing turnaround times. There are no middlemen to slow down the process and add to costs. All word counts are produced automatically as part of the process. The customer is in control of the word counts and of all of the costs.


Provide translators with an on-line web browser interface directly from your web site

The translator is presented with an on-line interface via the internet. The translator can key in the translations directly, with full access to terminology and to leveraged and fuzzy-matched data. The same interface gives the translator a formatted preview of the text if required, and is also used for on-line review and QA. The data can also be merged automatically through the same interface to create the new target language version, which can significantly reduce turnaround times.


Leveraging existing standards

We utilize existing standards whenever possible. Thus xml:tm uses the LISA OSCAR SRX standard for segmentation and OASIS XLIFF 1.0.
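
To give a feel for rule-based segmentation of the kind SRX describes, the sketch below applies an ordered list of break and no-break rules, each with a pattern for the text before a candidate break and one for the text after it. It is loosely modelled on that idea only: the rules, names and Python representation are assumptions for the example, and no SRX file is read.

```python
import re

# (is_break, pattern before the position, pattern after the position).
# Rules are tried in order; the first one that matches decides.
RULES = [
    (False, r"\b(?:e\.g|i\.e|etc|Mr|Dr)\.", r"\s"),  # no break after common abbreviations
    (True, r"[.!?]", r"\s+"),                        # break after sentence-ending punctuation
]


def segment(text, rules=RULES):
    """Split text into segments using ordered break / no-break rules."""
    breaks = []
    for pos in range(1, len(text)):
        for is_break, before, after in rules:
            if re.search(f"(?:{before})$", text[:pos]) and re.match(after, text[pos:]):
                if is_break:
                    breaks.append(pos)
                break  # first matching rule wins, whether it breaks or not
    segments, start = [], 0
    for pos in breaks:
        segments.append(text[start:pos].strip())
        start = pos
    segments.append(text[start:].strip())
    return [s for s in segments if s]


if __name__ == "__main__":
    print(segment("See Dr. Smith today. He is in."))
```

Listing the exception rule first keeps "Dr." from ending a segment, so the example yields two segments, not three.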


Supported Platforms:

We have designed our software to run on Solaris, HP-UX, Windows NT/2000/XP and Linux. XTM is written entirely in Java and will run on any platform that is supported by Java. XTM requires a database to be installed, such as Oracle, PostgreSQL or MySQL.

 
Translation memory

A translation memory, or TM, is a type of database that is used in software programs designed to aid human translators. Some software programs that use translation memories are known as translation memory managers (TMM). Translation memories are typically used in conjunction with a dedicated computer-assisted translation (CAT) tool, word processing program, terminology management system, multilingual dictionary, or even raw machine translation output.

A translation memory consists of text segments in a source language and their translations into one or more target languages. These segments can be blocks, paragraphs, sentences, or phrases. Individual words are handled by terminology bases and are not within the domain of TM. Research indicates that many companies producing multilingual documentation are using translation memory systems.

A translator first supplies a source text (that is, a text to be translated) to the translation memory. Some translation memory systems search for 100% matches only; that is, they can only retrieve segments of text that match entries in the database exactly. Others employ fuzzy matching algorithms to retrieve similar segments, which are presented to the translator with differences flagged. It is important to note that typical translation memory systems only search for text in the source segment. The flexibility and robustness of the matching algorithm largely determine the performance of the translation memory, although for some applications the recall rate of exact matches can be high enough to justify the 100%-match approach.

Translation memory: The unique identifiers are remembered during translation so that the target language document is 'exactly' aligned at the text unit level. If the source document is subsequently modified, then those text units that have not changed can be directly transferred to the new target version of the document without the need for any translator interaction. This is the concept of 'exact' or 'perfect' matching to the translation memory. xml:tm can also provide mechanisms for in-document leveraged and fuzzy matching.


TMX

Translation Memory Exchange format. This standard enables the interchange of translation memories between translation suppliers. TMX has been adopted by the translation community as the best way of importing and exporting translation memories. The current version is 1.4b; it allows for the recreation of the original source and target documents from the TMX data.


TBX

Termbase Exchange format. This standard allows for the interchange of terminology data including detailed lexical information. The framework for TBX is provided by ISO 12620, ISO 12200 and ISO Committee Draft 16642, known as TMF or Terminological Markup Framework. ISO 12620 provides an inventory of well-defined "data categories" with standardized names that function as data element types or as predefined values. ISO 12200 (also known as MARTIF) provides the basis for the core structure of TBX. TMF includes a structural metamodel for Terminology Markup Languages in general, regardless of which XML style of representation is used.


SRX

Segmentation Rules Exchange format. SRX is intended to enhance the TMX standard so that translation memory data that is exchanged between applications can be used more effectively. The ability to specify the segmentation rules that were used in the previous translation increases the leveraging that can be achieved.


GMX

GILT Metrics. GILT stands for Globalization, Internationalization, Localization, and Translation. The GILT Metrics standard comprises three parts: GMX-V for volume metrics, GMX-C for complexity metrics and GMX-Q for quality metrics. The proposed GILT Metrics standard is tasked with quantifying the workload and quality requirements for any given GILT task.


OLIF

Open Lexicon Interchange Format. OLIF is an open, XML-compliant standard for the exchange of terminological and lexical data. Although originally intended as a means for the exchange of lexical data between proprietary machine translation lexicons, it has evolved into a more general standard for terminology exchange.


XLIFF

XML Localisation Interchange File Format. It is intended to provide a single interchange file format that can be understood by any localization provider. XLIFF is the preferred way of exchanging data in XML format in the translation industry.


TransWS

Translation Web Services. TransWS specifies the calls needed to use Web services for the submission and retrieval of files and messages relating to localization projects. It is intended as a detailed framework for the automation of much of the current localization process by the use of Web Services.


xml:tm

This approach to translation memory is based on the concept of text memory, which comprises author memory and translation memory. xml:tm has been donated to LISA OSCAR by XML-INTL.
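
To make the TMX entry above more concrete, the following sketch builds a very small TMX document with Python's standard library. The element and attribute names follow the TMX 1.4 layout (header, body, tu, tuv, seg), but the header values such as the tool name are placeholders invented for the example, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(units, srclang="en"):
    """Serialize translation units as a minimal TMX document.

    units: list of {language_code: segment_text} dictionaries, one per unit.
    """
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-tool",   # placeholder value
        "creationtoolversion": "0.1",     # placeholder value
        "segtype": "sentence",
        "o-tmf": "example",               # placeholder value
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for unit in units:
        tu = ET.SubElement(body, "tu")
        for lang, text in unit.items():
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    print(build_tmx([{"en": "Open the lid.", "fr": "Ouvrez le couvercle."}]))
```

Each translation unit (tu) holds one variant (tuv) per language, which is what allows a memory to be exchanged between tools from different suppliers.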