XTM - Engine

XTM is a new approach to storing and using translation memory. It is fully integrated with XML and combines XML with advanced database technology to outperform traditional translation memory systems. By introducing new matching concepts, it enables substantial reductions in translation cost over the typical document life cycle.

XTM also incorporates advanced web search technology that uses morphological reduction (reducing words to their base forms) to improve leveraged memory searching.

XTM Engine converts the original file into XML if necessary, extracts the text from the document, and segments the text into sentences or phrases. It then applies the existing translation memory to provide the translator with in-context exact matching, leveraged matching and fuzzy matching. At any stage of the process, XTM Engine can provide a preview of the translated document as either .pdf or .html. Following translation and review, it creates the translated document and stores the new translation memory.
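The first two steps above (extracting text from XML and segmenting it) can be sketched roughly as follows. This is only an illustration, not XTM Engine's actual implementation: the function names, the sample document and the very simple sentence-boundary rule are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# Assumed rule: a segment ends at '.', '!' or '?' followed by whitespace.
SEGMENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def extract_text(xml_source: str) -> list[str]:
    """Return the non-empty text chunks found in an XML document."""
    root = ET.fromstring(xml_source)
    return [chunk.strip() for chunk in root.itertext() if chunk.strip()]


def segment(chunk: str) -> list[str]:
    """Split one chunk of text into sentence-like segments."""
    return [part for part in SEGMENT_BOUNDARY.split(chunk) if part]


# Hypothetical input document, used only for this example.
document = "<doc><p>Open the valve. Wait ten seconds!</p><p>Is the light on?</p></doc>"
segments = [s for chunk in extract_text(document) for s in segment(chunk)]
print(segments)
```

A production segmenter would need language-specific rules (abbreviations, decimal numbers and so on); the regular expression here is deliberately naive.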

XTM stands for XML-based “text memory”. XTM implements the proposed xml:tm standard as a means of embedding text memory data in the XML document itself. It relies on XML namespaces, which allow a text memory overlay to be added to any XML document without disturbing the document's own markup.
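As a rough picture of what a namespace overlay looks like, the sketch below wraps the text of each element in a namespaced element carrying an identifier. The namespace URI, the element name `tu` and the `id` attribute format are placeholders chosen for this example; they are not taken from the xml:tm specification.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI for illustration only (an assumption).
TM_NS = "urn:example:text-memory"
ET.register_namespace("tm", TM_NS)


def add_overlay(root: ET.Element) -> ET.Element:
    """Wrap the leading text of every element in a namespaced text unit."""
    counter = 0
    # Take a snapshot of the elements first so the new units are not revisited.
    for elem in list(root.iter()):
        text = (elem.text or "").strip()
        if not text:
            continue
        counter += 1
        unit = ET.Element(f"{{{TM_NS}}}tu", {"id": f"u{counter}"})
        unit.text = text
        elem.text = None
        elem.insert(0, unit)
    return root


# Hypothetical input document, used only for this example.
document = "<doc><title>Pump manual</title><p>Open the valve.</p></doc>"
marked = add_overlay(ET.fromstring(document))
output = ET.tostring(marked, encoding="unicode")
print(output)
```

The original elements (`doc`, `title`, `p`) are left in place; only the namespaced layer is added, so a tool that ignores the `tm` namespace still sees the original structure. For brevity the sketch ignores text that follows a child element (tail text).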

Text memory comprises the following distinct concepts:

  1. Author Memory
  2. Translation Memory
  3. Leveraged Memory

Author Memory

Author memory starts by dividing the text into segments, which are usually single sentences or phrases. Each text segment is given a unique identifier, and these identifiers are maintained as the document goes through its life cycle. The segments can also be loaded into an Author Memory database and fed back into the authoring process, so that authors are encouraged to reuse exactly the same text where possible. If a sentence has already been authored once, a translation will most probably already exist for it in the target languages. This is a simple and effective way of reducing translation costs while ensuring consistency.
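The feedback loop described above can be pictured with a small lookup structure. This is a minimal sketch under stated assumptions: the class name `AuthorMemory`, its methods and the normalisation rule (ignoring case and extra whitespace) are hypothetical and are not part of XTM.

```python
import re


def normalise(text: str) -> str:
    """Collapse whitespace and ignore case when comparing segments."""
    return re.sub(r"\s+", " ", text).strip().lower()


class AuthorMemory:
    """Stores previously authored segments keyed by their normalised text."""

    def __init__(self) -> None:
        self._segments: dict[str, tuple[str, str]] = {}

    def add(self, segment_id: str, text: str) -> None:
        """Remember a segment under its unique identifier."""
        self._segments[normalise(text)] = (segment_id, text)

    def suggest(self, draft: str):
        """Return (id, stored text) if the draft matches a stored segment."""
        return self._segments.get(normalise(draft))


memory = AuthorMemory()
memory.add("u1", "Open the valve.")
print(memory.suggest("open  the valve."))   # stored wording is offered for reuse
print(memory.suggest("Close the valve."))   # nothing stored for this sentence
```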

Translation Memory

Once a document is sent for translation, the unique identifiers allow the target language text to be aligned exactly with the source language text. When the source language document is modified, the changes are identified precisely. For text that has not changed, the previous translation is reused as-is. This concept is known as Perfect Matching®.
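The idea of carrying unchanged segments over by identifier can be sketched as below. It assumes each document version is available as a mapping from segment identifier to text; the function name and data layout are illustrative, not XTM's own.

```python
def carry_over(old_source: dict, old_target: dict, new_source: dict):
    """Reuse translations for segments whose identifier and text are unchanged.

    Returns (reused, to_translate): a mapping of identifier to the reused
    translation, and a list of identifiers that still need a translator.
    """
    reused = {}
    to_translate = []
    for seg_id, text in new_source.items():
        if old_source.get(seg_id) == text and seg_id in old_target:
            reused[seg_id] = old_target[seg_id]
        else:
            to_translate.append(seg_id)
    return reused, to_translate


# Hypothetical document versions, used only for this example.
old_source = {"u1": "Open the valve.", "u2": "Wait ten seconds."}
old_target = {"u1": "Ouvrez la vanne.", "u2": "Attendez dix secondes."}
new_source = {"u1": "Open the valve.", "u2": "Wait twenty seconds.", "u3": "Close the valve."}

reused, to_translate = carry_over(old_source, old_target, new_source)
print(reused)
print(to_translate)
```

Only the modified segment (`u2`) and the new one (`u3`) are sent on for translation; `u1` is transferred without translator interaction.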

Leveraged Memory

The source and target language segments can also be stored in traditional leveraged lookup translation memory databases and can be used to find both exact (leveraged) and fuzzy matches. XTM uses advanced linguistic web search technology to improve the quality of the matching.
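A leveraged lookup with exact and fuzzy matching might look roughly like the sketch below. It is not XTM's matching algorithm: the suffix-stripping rule is a crude stand-in for real morphological reduction, similarity is measured with Python's standard `difflib.SequenceMatcher`, and the 0.7 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher

# Crude stand-in for morphological reduction (an assumption for illustration).
SUFFIXES = ("ing", "ed", "s")


def reduce_word(word: str) -> str:
    """Lower-case a word, drop edge punctuation and strip one common suffix."""
    word = word.lower().strip(".,;:!?")
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def reduce_segment(text: str) -> list[str]:
    """Reduce every word of a segment."""
    return [reduce_word(word) for word in text.split()]


def lookup(query: str, memory: dict, threshold: float = 0.7):
    """Return (kind, score, translation) hits, best score first."""
    query_tokens = reduce_segment(query)
    hits = []
    for source, target in memory.items():
        if source == query:
            hits.append(("exact", 1.0, target))
            continue
        score = SequenceMatcher(None, query_tokens, reduce_segment(source)).ratio()
        if score >= threshold:
            hits.append(("fuzzy", score, target))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical memory contents, used only for this example.
memory = {
    "Open the valves.": "Ouvrez les vannes.",
    "Close the door.": "Fermez la porte.",
}
print(lookup("Opening the valve.", memory))
print(lookup("Close the door.", memory))
```

Because both "Opening the valve." and "Open the valves." reduce to the same word forms, the stored segment is found even though the surface text differs, which is the effect morphological reduction is meant to have on leveraged searching.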

Architecture

XTM is an enterprise level tool for managing text and translation memory. It offers significant benefits to the customer in terms of cost and time control.

For more details, please download our brochure.
 
Translation memory (TM)

A translation memory, or TM, is a type of database used in software programs designed to aid human translators. Some software programs that use translation memories are known as translation memory managers (TMM). Translation memories are typically used in conjunction with a dedicated computer-assisted translation (CAT) tool, word processing program, terminology management system, multilingual dictionary, or even raw machine translation output.

A translation memory consists of text segments in a source language and their translations into one or more target languages. These segments can be blocks, paragraphs, sentences, or phrases. Individual words are handled by terminology bases and are not within the domain of TM. Research indicates that many companies producing multilingual documentation use translation memory systems.

A translator first supplies a source text (that is, a text to be translated) to the translation memory. Some translation memory systems search for 100% matches only; that is, they can only retrieve segments that match entries in the database exactly. Others employ fuzzy matching algorithms to retrieve similar segments, which are presented to the translator with the differences flagged. Typical translation memory systems search only for text in the source segment. The flexibility and robustness of the matching algorithm largely determine the performance of the translation memory, although for some applications the recall rate of exact matches can be high enough to justify the 100%-match approach.

With xml:tm, the unique identifiers are remembered during translation so that the target language document is exactly aligned at the text unit level. If the source document is subsequently modified, the text units that have not changed can be transferred directly to the new target version of the document without any translator interaction. This is the concept of 'exact' or 'perfect' matching to the translation memory. xml:tm can also provide mechanisms for in-document leveraged and fuzzy matching.

TMX

Translation Memory Exchange format. This standard enables the interchange of translation memories between translation suppliers. TMX has been adopted by the translation community as the best way of importing and exporting translation memories. The current version is 1.4b; it allows the original source and target documents to be recreated from the TMX data.

TBX

Termbase Exchange format. This standard allows for the interchange of terminology data, including detailed lexical information. The framework for TBX is provided by three ISO documents: ISO 12620, ISO 12200 and ISO Committee Draft 16642, known as TMF or Terminological Markup Framework. ISO 12620 provides an inventory of well-defined “data categories” with standardized names that function as data element types or as predefined values. ISO 12200 (also known as MARTIF) provides the basis for the core structure of TBX. TMF includes a structural metamodel for Terminology Markup Languages in general, regardless of which XML style of representation is used.

SRX

Segmentation Rules Exchange format. SRX is intended to enhance the TMX standard so that translation memory data exchanged between applications can be used more effectively. Being able to specify the segmentation rules that were used in the previous translation increases the leveraging that can be achieved.

GMX

GILT Metrics. GILT stands for Globalization, Internationalization, Localization, and Translation. The GILT Metrics standard comprises three parts: GMX-V for volume metrics, GMX-C for complexity metrics and GMX-Q for quality metrics. The proposed GILT Metrics standard is tasked with quantifying the workload and quality requirements for any given GILT task.

OLIF

Open Lexicon Interchange Format. OLIF is an open, XML-compliant standard for the exchange of terminological and lexical data. Although originally intended as a means of exchanging lexical data between proprietary machine translation lexicons, it has evolved into a more general standard for terminology exchange.

XLIFF

XML Localisation Interchange File Format. It is intended to provide a single interchange file format that can be understood by any localization provider. XLIFF is the preferred way of exchanging data in XML format in the translation industry.

TransWS

Translation Web Services. TransWS specifies the calls needed to use Web services for the submission and retrieval of files and messages relating to localization projects. It is intended as a detailed framework for automating much of the current localization process through the use of Web Services.

xml:tm

This approach to translation memory is based on the concept of text memory, which comprises author memory and translation memory. xml:tm has been donated to LISA OSCAR by XML-INTL.
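To make the TMX entry above more concrete, the sketch below writes a minimal TMX document with Python's standard library. The element and attribute names (`tmx`, `header`, `body`, `tu`, `tuv`, `seg`, `xml:lang`) follow the TMX 1.4 format; the header values and the helper name `build_tmx` are placeholders invented for this example.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the standard xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, source_lang: str, target_lang: str) -> str:
    """Serialise (source, target) segment pairs as a minimal TMX document."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    # Header values below are placeholders, not real tool metadata.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-tool",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "example",
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        unit = ET.SubElement(body, "tu")
        for lang, text in ((source_lang, source), (target_lang, target)):
            variant = ET.SubElement(unit, "tuv", {XML_LANG: lang})
            ET.SubElement(variant, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


tmx_text = build_tmx([("Open the valve.", "Ouvrez la vanne.")], "en", "fr")
print(tmx_text)
```

Each `tu` (translation unit) holds one `tuv` (variant) per language, which is what lets another tool import the same segment pairs.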