TAUS Translation Technology Landscape Report


Translation Technology Landscape Report

April 2013

Copyright TAUS 2013

Funded by LT-Innovate


Target Audience

Any individual interested in the business of translation will gain from this report. The report will help beginners to understand the main uses for different types of translation technology, differentiate offerings and make informed decisions. For more experienced users and business decision-makers, the report shares insights on key trends, future prospects and areas of uncertainty. Investors and policymakers will benefit from analyses of underlying value propositions.

Report Structure

This report has been structured in discrete chapters and sections so that the reader can consume the information needed. The report does not need to be read from start to finish. The convenience afforded by such a structure also means there is some repetition.

COPYRIGHT TAUS 2013 All rights reserved. No part of this publication may be reproduced, stored in a retrieval system of any nature, or transmitted or made available in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of TAUS. TAUS will pursue copyright infringements.

Authors: Rahzeb Choudhury and Brian McConnell

In spite of careful preparation and editing, this publication may contain errors and imperfections. Authors, editors, and TAUS do not accept responsibility for the consequences that may result thereof. Published by TAUS BV, De Rijp, The Netherlands. For further information, please email [email protected]

Reviewers: Jaap van der Meer and Rose Lockwood

Our thanks to LT-Innovate for funding this report.

CONTENTS

1. Translation Technology Landscape in a Snapshot
2. A Brief History of Translation Technology
3. Tools for the Professional Translation Industry
3.1 Types of Tools
3.1.1 Translation Tools
3.1.1.1 Client/Server Based CAT Tools
3.1.1.2 Web Based CAT Tools
3.1.1.3 Mobile Translation Tools
3.1.1.4 Stand Alone Utilities
3.1.2 Translation Management Systems
3.1.2.1 Document TMS Systems
3.1.2.2 Localization TMS
3.1.2.3 Translation Memory & Terminology Management
3.1.2.4 QA Tools & Processes
3.1.3 Translation Processes & Features
3.1.3.1 Translation Memory
3.1.3.2 Advanced Leveraging
3.1.3.3 Translation Process Management
3.1.3.4 Terminology Management
3.1.3.5 Controlled Authoring
3.1.3.6 Quality Assurance
3.2 Translation Technology Trends
3.2.1 From Desktop to Server and now Cloud
3.2.2 From Licensing to Professional Services and SaaS
3.2.3 Integration with Content Management Systems
3.3 Translation Technology Value Chain
3.3.1 Supply
3.3.1.1 Types of Providers
3.3.1.1.1 Translation Management Systems
3.3.1.1.2 Localization Management Systems
3.3.1.1.3 Translation Memory (Stand Alone)
3.3.1.1.4 Audio/Video Captioning Systems
3.3.1.1.5 Interpretation Systems
3.3.1.2 Business Models
3.3.1.2.1 Licensed
3.3.1.2.2 Cloud/SaaS
3.3.1.2.3 Translation Services
3.3.1.3 Channels and Platforms
3.3.2 Demand
3.3.2.1 Individual Translators
3.3.2.2 Language Service Providers (Translation Agencies)
3.3.2.3 Publisher Organizations
3.4 Opportunities and Challenges in Translation Technology
3.4.1 Interoperability and Standards
3.4.2 Measuring and Benchmarking Quality
4. Machine Translation
4.1 Approaches to Machine Translation
4.1.1 Rules Based
4.1.1.1 Products and Practitioners
4.1.2 Example Based
4.1.2.1 Products and Practitioners
4.1.3 Statistical
4.1.3.1 Products and Practitioners
4.1.4 Hybrid
4.1.4.1 Products and Practitioners
4.2 Machine Translation Trends
4.2.1 Customized Engines
4.2.2 Real-time Customization
4.2.3 Open Source Technology
4.2.4 Data Sharing
4.2.5 Human/Machine Translation
4.2.6 From Licensing to Professional Services
4.3 Machine Translation Value Chain
4.3.1 Supply
4.3.1.1 Types of Providers
4.3.1.2 Business Models
4.3.1.3 Channels and Platforms
4.3.2 Demand
4.3.2.1 Language Service Providers
4.3.2.2 Consumer/Individuals Direct
4.3.2.3 SME and Enterprise Direct
4.3.2.4 Government/Institutions
4.4 Machine Translation Breakthroughs
4.4.1 Intractable Problems
4.4.2 Solvable Problems
4.5 Opportunities and Challenges in the Machine Translation Industry
4.5.1 Interoperability and Standards
4.5.2 Measuring and Benchmarking Quality
4.5.3 Cost of Customization
4.5.4 The Search for Talent
5. Trends That Influence the Translation Technology Industry
5.1 Cloud
5.2 Crowd
5.3 Big Data
5.4 Mobile
5.5 Social
5.6 Convergence
5.6.1 Technology convergence
5.6.2 Functional convergence
6. Paradigm Shift and Counter Forces
6.1 Translation as a Utility
6.2 Counter Forces
7. Translation Data and Technology
7.1 Opportunities
7.1.1 Terminology
7.1.2 Customized Machine Translation
7.1.3 Global Market and Customer Analytics
7.1.4 Quality management
7.1.5 Interoperability
7.2 Access to Translation Data
7.3 Sharing Translation Data
8. Drivers and Inhibitors
9. Methodology
Addendum: Clarifying Copyright on Translation Data

Figure: Trend Convergence Drives Translation as a Utility (Source: TAUS)


1. Translation Technology Landscape in a Snapshot

The translation technology segment is at a deeply transformative point in its evolution. For a generation the segment consisted largely of solutions that served the professional translation industry. These tools improve the productivity and consistency of human translation and they remain relevant for their steadily growing target market. However, in the last five years the grandfather of all translation automation technologies, machine translation, began to be adopted by the professional industry en masse and is increasingly ubiquitous on the World Wide Web.

The ubiquity of continuously improving data-driven or statistical machine translation is one of three factors creating a paradigm shift in the demand for translation. In addition, global economic growth is shifting to non-English speaking markets and globalization is leading to previously unseen levels of cultural exchange. Together these major technology, economic and social trends are converging to take translation from a cost of doing business for a few thousand organizations and a luxury professional service for almost everyone else, to a utility sector. That is to say, a necessary service for all players in the global information society, with differentiated quality expectations dependent on purpose.

Demand for translation automation technology, and in particular machine translation, will grow:
- From language service providers and individual translators
- From internationally operating enterprises and organizations
- Through being embedded in enterprise systems, social software, consumer devices and over time all digital touch points

The coming years herald a Convergence era where technologies such as speech, search and others will continue to be combined with machine translation to create new solutions. There will be functional convergence within enterprises and across supply chains as machine translation is increasingly embedded, with unexpected innovations as a result.

Cloud based services and open-source technology, such as the Moses toolkit, level the playing field for innovative new providers. That said, translation data, the fuel for data driven machine translation, is required on a massive scale to satisfy demand. Demand will come from verticals requiring customized domain specific machine translation. Demand will come from nations wanting to be better served by translation technology. Whether markets for translation data are open or closed is a key factor affecting the nature of evolution in the segment, the cost of solutions and the motivation to innovate. Last year we saw the number of Android apps exceed that of Apple's iOS. Android's open source approach shifted power dynamics and created economic opportunities for many in an extremely high growth area. Opening up access to translation data has the potential to be just as powerful an enabler.

Content Explosion

Content and translation volume keep growing explosively, from translation of paper documents to localization of software, through globalization and this current phase of integration of translation technology into devices and applications. With the onset of unified, seamless personalized user experiences across digital touch points there will be a manifold increase in the content explosion. Translation technology solutions will continue to evolve for use scenarios arising for each phase of evolution.

1980 - Translation: No Translation Technology
1990 - Localization: Translation Memory and Glossary Technology
2000 - Globalization: Workflow Systems
2010 - Integration: Machine Translation and Advanced Leveraging
2020 - Convergence: Machine Translation and other Language Technologies Embedded in all Digital Touchpoints

Source: TAUS

2. A Brief History of Translation Technology

It is debatable whether the translation industry's response to rapid globalization and growth in content has been the right one. Has the industry made best use of technology to raise its capacity and stay profitable? Or has the content explosion marginalized an industry of artisans? To understand the current landscape we need to take a ride back into translation technology history.

The onset of the Cold War highlighted the need for machines that could translate, with both the US and Russia wanting to keep tabs on each other. However, despite significant investment, by the late 1960s it became clear that machines could not translate anywhere near well enough. Machines and translation were rarely sighted together for the next fifteen years. A notable exception was the European Commission's use of the Systran system for gisting purposes beginning in 1976.

By the mid-1980s a few service providers in a burgeoning localization industry, focusing on translation of IT manuals, began to provide translation memory tools, which enabled recycling of translations, alongside glossary tools for translators. These tools helped translators to increase their productivity and consistency. While these translation technology tools became more scalable and the feature set matured, the core technology for improving productivity was left largely untouched for the next twenty years. The localization industry continued to grow rapidly, if not innovate, and many in the traditional translation industry were largely left in its wake.


Many translators were reluctant to adopt translation technology for a range of reasons, some well founded, others based on ill-informed fears of replacement. The lack of innovation also meant low barriers to entering the translation technology segment. Many translation services companies developed their own computer-aided translation tools, creating a litany of mediocre systems with no differentiation. While the translation services industry flourished, the technology segment remained small.

In that same period the Cold War ended, economic growth began shifting from English speaking to non-English speaking nations, digital communications largely replaced print, consumers and citizens became publishers en masse, and people on two dollars a day got connected via handy mobile phones.

It is only in the last five years that we have begun to see wholesale changes in the technology and business models being used for translation. For the most part, change leaders have entered from outside the industry. The most prominent examples are Baidu, Google, Microsoft and Yandex.


3. Tools for the Professional Translation Industry

3.1 Types of Tools

Translation technology can be broken down into several major components. Translation tools, also known as CAT (computer-aided translation) tools, enhance the productivity and consistency of human translators. They comprise several component technologies: translation memory (which enables translators to re-use and learn from previous work), translation management systems (which automate project management and publication), terminology management systems (which encourage the use of standard terms, names, and translations) and quality assurance tools. The workflow diagram below provides an overview of the many stages for translation and localization of documents. The translation technologies outlined in this section seek to optimize and where possible automate the process.

Figure: A Translation/Localization Workflow (Source: TAUS)

Translation Industry Evolution: Translation to Globalization

Between the mid-1980s and around 2000, demand for translation expanded from documents to software and then websites. Products went from being launched market by market to being launched in multiple markets simultaneously (Simship). Companies increasingly served international customer bases. To service the market evolution, the professional translation industry adopted translation memory (TM), terminology and translation/globalization process management technology.

Source: TAUS


3.1.1 Translation Tools

Computer aided translation (CAT) tools have been in use since the introduction of personal computers. Their primary purpose is to improve translator productivity and accuracy by providing tools such as document editors, glossaries and translation memory in a single integrated environment or workbench. These tools have evolved along with the computing and networking industries: first as stand-alone tools used on a single computer, then as client/server tools used on a company network, and now as web based tools where the service and tools are delivered via the Internet.

3.1.1.1 Client/Server Based CAT Tools

Client/server computer aided translation (CAT) tools have been in use for approaching twenty years, and are well entrenched within enterprise translation and localization departments. As a general rule, these tools were developed before the web and cloud based services caught on. These products represent a mature technology and include nearly every feature users expect from a translation platform. The typical implementation consists of a central server, usually but not always hosted on the customer premises, which interacts with client software, typically Windows based, that is installed on employee and translator machines. This was the standard configuration for most types of enterprise software until about ten years ago, when vendors began migrating to web based environments. Newer systems that have been built from the ground up since then largely bypass this model.

These systems, because of their age, offer a complete set of component technologies including translation memory, glossaries, document editing and spellchecking tools, and in some cases, project management tools. While they offer a complete feature set, many of these tools have significant disadvantages relative to new tools, among them:
- Lack of cross-platform support. Most are heavily focused on Windows, a problem for organizations that use other operating systems.
- Lack of native mobile support. Translators and reviewers increasingly demand mobile applications so they can work where and when they want to.
- Steep learning curve. These systems were designed primarily for translation and localization professionals, and can be intimidating to newcomers. They also have user interfaces that are dated by today's standards.
- High IT costs, due to the need to maintain the servers, update software and install client software on a potentially large number of machines.

Some client/server CAT tools, such as those provided by Across Systems, have web accessible interfaces. However, in these systems web access tends to be a second-class citizen compared to the native client app, which is typically written for Microsoft Windows. This is often explained by the product's history and the timing of its development. Most of the client/server tools trace their origins to the late 1990s and early 2000s, before SaaS had caught on, and before non-Windows operating systems such as Mac OS X, Android and iOS became commonplace.

The Google Translator Toolkit (GTT), which is offered as a standalone CAT tool for translators, is an interesting example of how powerful web based tools can be. It is fully integrated with Google Translate, so translators can pre-translate texts using machine translation and then post-edit them through human review. GTT was also recently integrated with YouTube to support user-generated captions. Other web based CAT tools with integration to machine translation include XTM Cloud, MemSource, and Lingotek, among others.

In general, companies that are not already invested in client/server tools should consider the web oriented systems described in section 3.1.1.2. Established vendors are re-tooling their client/server solutions as cloud based products as well. On the other hand, companies and government agencies that have stringent security requirements will probably want to continue hosting these services in their own data centers, or at least on computers they directly control.

3.1.1.2 Web Based CAT Tools

Web based CAT tools are an important new class of translation tool, and are likely to replace traditional client/server tools (e.g. Windows apps) in many use scenarios for several reasons:
- Cross-platform access via Windows, Mac, Linux and other operating systems.
- Ability to support users on mobile devices, either via native mobile apps or HTML5/JavaScript. Java is also an attractive language for building highly capable, cross platform apps.
- Support for agile development processes, where server side software is continually improved without forcing labor intensive client software upgrades.
- Cloud based asset management, translation memory, and other features to centralize project and asset management. Users can work on a job from different devices at different times without losing work.
- SaaS business models, with per user or volume based licenses that enable customers to scale up and down as needed.


Web based translation agencies typically develop their own translation workbench and CAT tools, with varying levels of sophistication depending on the company and its customer base. Examples of companies in this category include: Gengo, Straker Translations, Fox Translate, Elanex (aka ExpressIT), and One Hour Translations. These companies provide clients with web based and API based ordering and project management interfaces (front end), and provide translators with an entirely web based editing and CAT environment (back end).

3.1.1.3 Mobile Translation Tools

Mobile translation tools are currently a novelty in the computer aided translation space, but will become a material requirement in most translation management systems because translators, especially in developing economies, use mobile devices as their primary means of accessing the Internet. Tools that do not support mobile access, either via HTML5/JavaScript or via native iOS/Android applications, will find it increasingly difficult to bring these translators into their workforce. Hong Kong based OneSky (www.oneskyapp.com) offers an example of how translation can be done via mobile devices. Their service is specifically geared to mobile app localization, so it fits well with a mobile translation-editing tool.

The primary challenge in making CAT environments accessible via mobile devices is to deal with the restrictive display and user input interfaces common to these devices. Typically this means constraining the type of work mobile translators can do, for example, by having them work on shorter texts and documents. This limitation is less of an issue for tablet type devices since their displays are similar in size and resolution to a PC or laptop display.

3.1.1.4 Stand Alone Utilities

Stand-alone CAT tools are designed for use by independent translators who do a lot of their work offline. Their primary advantage is the ability to work independently of Internet connectivity, so a translator can download a project, copy it into their CAT tool, complete the task, and then check it back into the translation management system when they are back online. These tools are well entrenched and heavily used by professional translators, and will enjoy continued success in the marketplace, especially if they are upgraded to inter-operate with cloud based translation management systems. These tools combine a number of functions including document editing, translation memory, glossary, and spellchecking. Examples of these tools include MemoQ and SDL/Trados.

3.1.2 Translation Management Systems

Translation management systems provide an additional layer of process management and automation. They are used to manage how translation work is assigned to different participants, and to import/export documents and their translations to and from other systems, such as content management systems.

3.1.2.1 Document TMS Systems

Translation management systems (TMS) perform a function similar to that of content management systems. These systems enable operators to:
- Centrally manage resources to be translated (documents, video captions, localization files, etc).
- Control which languages each project or resource is to be translated to.
- Invite and assign translators, editors and reviewers to each project/language.
- Define workflows for translation, for example whether translations are auto-approved on receipt (from trusted sources) or must be independently reviewed.
- Receive quality feedback and defect reports, and automatically route these to the appropriate translators and project managers.
- Use machine translation (for pre-translation), translation memory, term glossary and advanced leveraging to re-use previously completed translations, to boost efficiency, and to increase quality and consistency.
- Export completed translations to external systems (e.g. content management system, e-commerce platform, etc).

These systems are fairly mature products in terms of functionality. The major shift underway today is a migration from customer premise based client/server systems (where the customer deploys and manages their TMS) to cloud based SaaS (software as a service) offerings. Virtually all translation vendors starting up in the last few years offer cloud based solutions. These include companies like MemSource and XTM International. Major translation technology vendors such as SDL have been retooling their product offerings across the board as SaaS services.
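
The workflow rule mentioned for translation management systems in section 3.1.2.1 (auto-approve translations from trusted sources, send everything else to independent review) can be sketched in a few lines. This is only an illustration under stated assumptions: the `Translation` class, the `route_translation` function, the status names and the translator names are all hypothetical and do not describe any particular TMS product.

```python
from dataclasses import dataclass


@dataclass
class Translation:
    """A returned translation and who produced it (hypothetical structure)."""
    text: str
    translator: str
    status: str = "received"


def route_translation(translation, trusted_translators):
    """Auto-approve work from trusted translators; queue all other work for review."""
    if translation.translator in trusted_translators:
        translation.status = "approved"
    else:
        translation.status = "needs_review"
    return translation


# Hypothetical set of translators whose work is approved on receipt.
trusted = {"anna"}

first = route_translation(Translation("Bonjour", "anna"), trusted)
second = route_translation(Translation("Hola", "new_freelancer"), trusted)

print(first.status)   # approved
print(second.status)  # needs_review
```

A real system would typically make this rule configurable per project or language rather than hard-coding a single trusted list.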


3.1.2.2 Localization TMS

Localization management systems are a specialized form of translation management system used primarily or exclusively for software localization. Localization has a unique set of requirements compared to document translation, so this differentiation makes sense. For example, when localizing software, it is very important to pay attention to word length, as this affects the layout of a user interface. German, with its longer average word length and abundance of compound words, can easily break the layout of a webpage or application. Localization management tools are designed with these issues in mind, whereas document oriented tools are less concerned with them.

Key localization translation management system providers for websites include TransPerfect and MotionPoint, who use a long established and popular proxy server approach. Over the past couple of years there has been a proliferation of vendors that offer localization as a turnkey SaaS offering. Examples include companies like GetLocalization, OneSky, Smartling, Tethras, and Transifex. These companies provide turnkey, cloud based services that enable customers to upload their prompt files and other assets, control how they will be translated and to what languages, and if needed, order bulk translations from a professional language service provider that has integrated with the service. The customer can generally combine machine, crowd (bring your own translators) and professional (outsourced) translation.
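
The word length concern described for software localization can be checked mechanically before translated strings reach the user interface. The sketch below is a minimal illustration, not any vendor's implementation: the function name `find_overlong_strings`, the 30% expansion allowance and the sample strings are assumptions chosen for the example, and character count is used only as a rough proxy for rendered width.

```python
def find_overlong_strings(source, target, max_expansion=1.3):
    """Return keys whose translation exceeds the allowed length expansion.

    source and target map UI string keys to text. max_expansion is a
    hypothetical budget: 1.3 lets a translation be up to 30% longer
    than its source string.
    """
    overlong = []
    for key, source_text in source.items():
        translated = target.get(key)
        if translated is None:
            continue  # missing translations are a separate check
        if len(translated) > len(source_text) * max_expansion:
            overlong.append(key)
    return overlong


english = {"save": "Save", "settings": "Settings", "sign_in": "Sign in"}
german = {"save": "Speichern", "settings": "Einstellungen", "sign_in": "Anmelden"}

flagged = find_overlong_strings(english, german)
print(flagged)  # ['save', 'settings']
```

Strings flagged this way would normally go back to the translator for a shorter alternative, or to the developer for a layout change.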

3.1.2.3 Translation Memory & Terminology Management

Translation memory and terminology glossaries are generally implemented as a feature within larger systems. These services are used to improve translator efficiency and accuracy.

Term glossaries are translation dictionaries that are built from frequently occurring words or phrases, for example technical terms, brand names, etc. These dictionaries are used to pre-translate recurring words and phrases, to assist translators in using consistent translations, and to avoid translating items that should be left as is, such as brand names.

Translation memory is a record of previously created human translations. Typically, this is used to display similar source texts and their translations, both as a memory aid for translators, and as a style guide. Translation memory can also be used to boost translator efficiency, for example by enabling them to make small changes to previously created texts (see section 3.1.3.1 for information on translation memory).

Both services are typically integrated into a translation management system, but there are examples of stand-alone translation memory services, such as MemSource.

3.1.2.4 QA Tools & Processes

Translation management systems typically employ a number of tools and processes to maximize quality at different stages of the project. This is covered in section 3.1.3.6.

3.1.3 Translation Processes & Features

The features highlighted in this section can be found throughout the translation technology tool chain. These are best thought of, not as products, but as technologies or features that are embedded within larger systems. These features are generally utilized to increase translator productivity, improve efficiency and cost, reduce errors, and encourage the use of consistent style and terminology, all of which are important to delivering high quality output.

3.1.3.1 Translation Memory

Translation memory is a common feature in most computer aided translation (CAT) tools. It comes in two distinct forms, one of which is used to eliminate redundant work, and one which is used to assist translators in recalling similar translations from previous jobs.

Exact match translation memory simply looks for translations whose source text exactly matches a previously completed translation. This is typically used to auto-fill translations for source texts that have already been translated. For applications where there are a lot of repeating texts, this can yield significant cost savings. Most translation memory technologies only retrieve exact matches from small project specific TMs. Many users only apply exact matches automatically if the previous sentence and the following sentence are also exact matches. This is sometimes called an ICE (in-context exact) match.

Fuzzy match translation memory looks up not just exact matches, but also similar source texts, and scores them by their degree of similarity to the text being translated. While this is conceptually simple, calculating the distance between texts is not trivial. At best these systems can generate an approximate distance between candidate texts. Because of this, fuzzy match translation memory is best used as a style and memory aid for human translators, who can pull up a list of similar texts and their translations, and then use the relevant pieces as needed. This form of translation memory generally does not yield significant cost savings (the translator has to manually decide which texts to use and how), but is best used to reduce errors, and to encourage the use of a consistent writing style.

Translation memory was first found to be useful when user documentation was updated for new versions of software. There is typically a high reuse rate in these documents.
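
The difference between the exact match and fuzzy match forms of translation memory described in section 3.1.3.1 can be illustrated with a short sketch. It assumes the memory is a simple in-memory dictionary of source and target sentences and uses the Python standard library's `difflib.SequenceMatcher` as the similarity measure. Both choices, along with the function names and the 0.75 threshold, are assumptions made for illustration; commercial tools use their own storage and scoring methods.

```python
from difflib import SequenceMatcher


def exact_match(memory, source):
    """Return the stored translation for an identical source text, else None."""
    return memory.get(source)


def fuzzy_matches(memory, source, threshold=0.75):
    """Return (score, stored_source, translation) tuples, best match first.

    SequenceMatcher.ratio() is just one possible similarity measure,
    and the threshold is a hypothetical cut-off.
    """
    scored = []
    for stored_source, translation in memory.items():
        score = SequenceMatcher(None, source.lower(), stored_source.lower()).ratio()
        if score >= threshold:
            scored.append((score, stored_source, translation))
    return sorted(scored, reverse=True)


# Hypothetical English-to-German memory with two previously translated segments.
memory = {
    "Click Save to store your changes.": "Klicken Sie auf Speichern, um Ihre Änderungen zu sichern.",
    "Restart the application.": "Starten Sie die Anwendung neu.",
}

# An identical segment can be filled in automatically.
hit = exact_match(memory, "Restart the application.")

# A similar segment is only offered to the translator as a suggestion.
suggestions = fuzzy_matches(memory, "Click Save to keep your changes.")
for score, stored_source, translation in suggestions:
    print(f"{score:.2f}  {stored_source}  ->  {translation}")
```

The exact match result can be applied without human involvement, while the fuzzy suggestion still needs a translator to decide which parts to reuse, which is why the two forms differ so much in their cost savings.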


3.1.3.2 Advanced Leveraging Advanced leveraging, otherwise known as sub-segment analysis, is used to construct a translation using sentence fragments. This type of CAT tool tries to stitch together translations from different segments, for example by identifying translations for phrases which, in turn, appear in longer texts elsewhere. This technology is useful as a translator productivity enhancer, but like fuzzy match translation memory, cannot be fully automated. It is an intermediate approach between translation memory and machine translation. Advanced leveraging has been offered as a feature in a small group of tools, including MultiCorporas MultiTrans and Atrils De Ja Vu, for a number of years now. The main adopters have been public bodies and the finance sector. These sectors do not benefit greatly from translation memory as their focus is not product updates. Instead advanced leveraging tools help them to efficiently reuse translations from large translation datasets, which are called corpora. By using such technology they aim to enhance productivity and terminological consistency. There is a general trend away from large sets of documentation to smaller volume, sometimes continuous streams of publishing. Given this trend there is scope for growing adoption of advanced leveraging technology for years to come.

3.1.3.3 Translation Process Management

Translation process management (TPM) refers to administrative and workflow control tools that enable a system owner or administrator to control when and how translations are done, the quality assurance/review process, etc. The details of how this is implemented vary widely from system to system. Some systems provide the administrator with the ability to set broad controls on how things are done, while others allow the administrator to control the translation process at a fine-grained (per job or task) level. Generally speaking, TPM tools allow the system administrator to:

- Decide which assets or asset classes are routed for translation.
- Decide which translation resource or service to use for an asset class and target language (e.g. machine/human translation, select LSP, SLA, etc).
- Decide whether returned translations should be auto-approved, or queued for review and post-editing.
- Decide who is allowed to review/post-edit returned translations or mark translations as complete/published.
- Decide who is allowed to edit or translate assets, on a per site, project or asset level (e.g. invite professional or crowd translators to a project).

3.1.3.4 Terminology Management

Terminology management systems enable users to create, translate and manage dictionaries or term glossaries on a customer, project or asset level. Term glossaries are useful for defining how a set of words, phrases or proper names should be translated (or not translated). This is used not so much to reduce translation cost as to encourage consistent vocabulary and style, and to prevent dissonant translations of phrases that recur frequently.

While glossary terms can be auto-translated, the best practice is to recommend their use to translators, who then click on an accept button or link to paste the recommended phrase into their translation editing environment. Translators should be able to override the recommendation because there are often situations where the translated text needs to be edited for grammatical correctness.

This is effectively a required feature for any serious CAT tool, and is available in some form on almost every translation and localization platform or tool in use today. There are also a few vendors that specialize in terminology management tools, such as Interverbum Tech.

3.1.3.5 Controlled Authoring

Controlled authoring tools, such as Acrolinx, are used before translation even begins. These tools are designed to maximize the source content's quality and consistency, and provide the following key features:

- Spellcheck and grammar correction: to catch basic mistakes during authoring.
- Terminology management: to ensure that technical terms are used consistently.
- Brand protection: to ensure that brand names and proper names are used correctly.
- Edit for MT: to enable users to prepare source text for machine translation.

The basic goal is to prevent authors from using inconsistent terminology, catch common errors, and generally produce standardized output that is search engine friendly, and translation ready.
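As a rough illustration of the terminology consistency check described above, the following Python sketch flags discouraged term variants in a draft. The rule table (`PREFERRED_TERMS`) and the function name are hypothetical. Commercial controlled authoring tools such as Acrolinx are considerably more sophisticated, and this is not a description of how they work.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical style rules: discouraged variant -> approved term.
PREFERRED_TERMS: Dict[str, str] = {
    "log-in": "login",
    "e-mail": "email",
    "web site": "website",
}


def check_terminology(text: str, preferred: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (variant_found, approved_term) pairs for each rule the text violates."""
    findings: List[Tuple[str, str]] = []
    for variant, approved in preferred.items():
        # Whole-phrase, case-insensitive match of the discouraged variant.
        pattern = r"\b" + re.escape(variant) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            findings.append((variant, approved))
    return findings


draft = "Enter your e-mail address on the web site, then confirm your login."
issues = check_terminology(draft, PREFERRED_TERMS)
for variant, approved in issues:
    print(f"Use '{approved}' instead of '{variant}'")
```

Running such a check at authoring time means the inconsistency is fixed once in the source, rather than once per target language.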

3.1.3.6 Quality Assurance

Translation quality assurance relies on a combination of technology and processes to prevent errors from creeping into translation projects. The QA process starts before a project is sent out for translation, for example by sanitizing text to protect non-translatable elements, disambiguating the source text, and providing comments and context. Once translation is in progress, QA is implemented in several ways at different stages of the process:

- [prior to translator assignment]: decide which translator(s) are the best match for the task, factoring in skill level, prior QA scores, availability and domain of expertise.
- [during translation]: computer aided translation (CAT) tools draw upon resources such as spellcheckers, term glossaries, and translation memory to increase productivity, catch common errors, and encourage the use of consistent style and terminology.
- [post translation]: completed translations are generally sent to an editor or trusted reviewer to be spot checked, and post-edited as needed (or sent back to the translator(s) for additional work).
- [post translation]: highly automated systems may send translations to one or more randomly chosen translators for a blind peer review and score. Editors only intervene to evaluate translations with an ambiguous score.
- [post delivery]: highly automated translation services may also provide a widget or API through which the customer can comment on, score or request re-translation of texts. This allows high volume translation applications to continually improve translations without subjecting every text to embargo, and also enables the client's customers to contribute feedback about translations (e.g. in moderated crowd translation scenarios).

Examples of specialized QA tools are QA Distiller, Error Spy and Xbench. The industry reference for quality evaluation is the TAUS Dynamic Quality Framework. This documents best practices for quality evaluation, and provides a method to select fit-for-purpose evaluation approaches and a set of tools to establish quality evaluation metrics and benchmarks.

3.2 Translation Technology Trends

There are several trends affecting this sector: the transition from desktop to client/server and then to cloud based (SaaS) services, the transition from licensed software to subscription (SaaS) business models, and the trend toward integrated solutions (for example, translation management systems that are integrated with web publishing or CMS platforms).

3.2.1 From Desktop to Server and now Cloud

Computer aided translation tools have gone through three distinct phases of development since their inception. First generation tools were largely designed to be used as stand-alone applications, largely due to the fact that network connectivity was limited at the time of their development, and due to the low tech history of the translation industry, as it predates the computing industry by far. Some of these tools, such as SDL/Trados, continue to thrive, and can be upgraded to interoperate with new cloud based translation platforms.

As companies deployed local and wide area networks, and then later connected to the Internet, client/server CAT tools and systems became commonplace. These tools were developed during the 1990s, and mirror the technologies available at the time. These systems enabled their operators to centralize the storage and management of translation assets and project management, which enabled significant productivity gains in translation workflow. These tools were largely Windows based, and most utilized proprietary communication protocols for client/server communication.

The latest generation of CAT tools are largely cloud based SaaS (software as a service) offerings that eliminate the need for the operator to own and manage on premise equipment. These services, many of them developed by emerging companies such as XTM International and Smartling, are also built from the ground up around web based technologies, and are accessible from virtually any operating system or device. This is a significant advantage over second generation tools, as they are largely tied to the Windows operating system, and are not easily accessed via the web or mobile devices.

3.2.2 From Licensing to Professional Services and SaaS

Translation platforms have gone through a similar evolution from per-seat perpetual licenses to professional services and, most recently, to a SaaS (software as a service) pricing model. Early products were sold as shrink-wrapped software, essentially a perpetual license based on the number of installed seats. Most software was packaged this way at that time, so this approach made sense in its day. Since then, small and large translation technology companies, from startups to industry leaders like SDL, have been shifting to SaaS pricing models.

These contracts are typically priced as monthly or annual subscriptions, with rates linked to one of several variables, including: the number of words or segments stored in the system, translation volume, number of target languages, or number of active users. Transifex, for example, prices its localization management service based on the number of words stored in its central repository. The pricing formula varies from company to company, but is generally linked to usage, storage or translation volume, so each customer pays based on their usage of the service.

3.2.3 Integration with Content Management Systems

Leading translation management systems, such as SDL WorldServer, are integrated with popular content management systems, including Drupal, SharePoint, and others. This has become a requirement as companies use CMS platforms to manage their source content, editorial and publishing workflows. Integrating with these systems saves translation companies from re-inventing tools that have already been done well by other companies.

Middleware companies, such as Clay Tablet Systems, offer services that connect a variety of translation management systems with the leading content management systems. They have integrated with most of the popular corporate CMS platforms. This is an attractive option for translation technology companies, who can integrate once with Clay Tablet's system, and then automatically inherit support for every CMS platform it has integrated with.

3.3 Translation Technology Value Chain

3.3.1 Supply

3.3.1.1 Types of Providers

A number of platforms are commonly integrated into the translation supply chain, including translation management systems, localization management systems, captioning and subtitling platforms, standalone translation memory services and live interpretation systems. Each of these directly manages or interacts with the translation workflow, and each has its own use cases and special requirements.

3.3.1.1.1 Translation Management Systems

Translation management systems enable users to centralize and control their translation workflow, as well as to manage the assets being translated (documents, videos, and other content). These systems range from simple, purpose-built TMS solutions, such as WordPress translation tools, to complete supply chain management systems that, in addition to managing translation workflow, also automate the process of interacting with external translation service providers.

Purpose-built TMS solutions are often embedded within other products, such as content management systems. While these do not provide full supply chain management, they provide most of the functions needed to manage translation workflow for typical users. Examples include the Translation Management Tool, by MD Systems, for the Drupal content management system, and WPML (wpml.org), a multilingual translation management tool for the popular WordPress platform.

Stand-alone TMS solutions, on the other hand, are used to manage translation workflow for many different types of assets, from documents to websites. Since different types of content require different workflows, and often different service providers, an enterprise TMS enables operators to manage not just the translation process, but also the vendor supply chain. Examples of vendors in this category include SDL, Across Systems, and Lingotek.

3.3.1.1.2 Localization Management Systems

Localization management systems are a special type of translation management system, and focus on the tasks and challenges that are unique to software localization. There has been a proliferation of SaaS based services in this area, especially for mobile app localization for iOS and Android platforms. These services enable users to upload their application prompt catalogs using a variety of localization file formats, and manage them and their translations via a centralized repository. Newer services borrow from collaborative development platforms like GitHub to support agile localization, where localizations are continually refined and deployed with incremental upgrades.

Examples of these services include GetLocalization, OneSky, Tethras, Transfluent, and Transifex, all of which are accessible to both small and large companies. On the high end of the market, companies like Moravia Worldwide and Welocalize provide software localization and testing to large software companies such as Microsoft and Oracle.

3.3.1.1.3 Translation Memory (Stand Alone)

Translation memory, while it is generally integrated into translation management systems, is also available as a stand-alone, cloud based service. Most translation management systems and CAT tools provide some form of translation memory. Nearly all provide exact match translation memory, to avoid re-translation of repeating texts, as well as terminology glossaries or dictionaries, to promote consistent translation. Most also provide fuzzy match translation memory, as a memory aid and style guide for translators. If a TMS or CAT tool does not provide at least basic translation memory, this is a serious deficiency that should rule out the use of that product.

3.3.1.1.4 Audio/Video Captioning Systems

Audio/video captioning and subtitling systems have a unique set of requirements that differ from text based content. Because of this, vendors tend to specialize in this area. Captioning systems must deal with a number of technical issues, including:

- Support for a wide variety of video file/stream formats.
- Tools to transcribe audio tracks to create source language captions.
- Tools to translate captions into one or more languages.
- Ability to time code captions so they appear at the right time during playback.
- Tools to review and post-edit translations.

Several vendors, including dotSub, Amara, and Viki, specialize in video captioning and subtitling. dotSub and Amara provide tools that enable video content producers to generate captions using a combination of crowdsourced translations and optional professional translators. Viki, meanwhile, is a purely crowd based system, and has created a vibrant translation community (several million active users) around captioning video programs from around the world.

3.3.1.1.5 Interpretation Systems

Interpretation systems and services enable users to have telephone calls translated in real time, using either sequential or simultaneous interpretation (simultaneous interpretation enables both parties to converse naturally without pausing for an interpreter to repeat what they said). Like video captioning, this is also a specialist market, with a different set of dominant vendors compared to other sectors of the translation industry. These services are typically accessed via a telephone or voice over IP (VoIP) call to the interpretation service which, in turn, bridges one or more interpreters onto the call. Providers in this category include Language Services Associates and Language Line.

3.3.1.2 Business Models

3.3.1.2.1 Licensed

Until recently, most translation tools were sold as licensed software, typically priced per user/seat. This is how most enterprise software was sold prior to the transition to cloud/SaaS based services. Even now, many enterprise software vendors sell their software this way. Cloud based offerings will, however, force many vendors to rethink their pricing model because these offerings enable customers to scale their tools budget up and down in response to usage, and to avoid committing to expensive upfront purchases.

3.3.1.2.2 Cloud/SaaS

Cloud based (SaaS) offerings are becoming more common in the translation industry, and are the overwhelming favorite among new companies. These services are typically priced as monthly subscriptions, with the fee based on any number of factors including:

- Number of registered or active users.
- Number of target languages supported.
- Number of projects or assets stored on the system.
- Number of words hosted on the system.
- Translation volume per month.

In all cases, customers are largely able to avoid up front commitments, and can also evaluate a product at relatively low cost and risk prior to scaling up to production use. Vendors that fail to offer a viable SaaS option will risk losing market share to emerging companies and services that do.

3.3.1.2.3 Translation Services

Another business model employed by some tools and platforms is a loss-leader strategy, where the software or platform is offered for free, while the vendor charges for professional translations brokered through their system. Cloudwords, for example, operates a hosted marketplace and translation/project management service that enables customers to select from many translation agencies, and to centrally manage their projects. While the Cloudwords service is not free, it is quite inexpensive. Cloudwords, in turn, charges a commission for the projects brokered through its platform.

Many localization service providers follow a similar model, and offer their hosted platform for an inexpensive monthly fee, then make the bulk of their money by selling professional translations on a per word basis. Translation agencies, in turn, make most or all of their money by selling translation.

3.3.1.3 Channels and Platforms

These translation services are offered through both direct and channel partner systems, depending on the amount of automation and system integration required. For example, a translation agency that offers a self-service web translation tool for ordering translations for Word documents will typically sell direct to end users. On the other hand, systems that require a lot of automation will often have translation built in. Examples of integrated solutions include:

- Content management systems that have translation management built in (e.g. Drupal).
- Captioning and subtitling services that support translations as part of the captioning process (e.g. 3Play Media, Amara).
- Multilingual e-commerce systems that automatically translate source language content as new products are added.
- Custom applications built around a language service provider's system or API.

The advantage of integrated solutions is that they greatly reduce the amount of work the customer needs to do to utilize translation (in some cases, they automatically request translations from LSPs behind the scenes so no administrative work is required of the users). Their main disadvantage is that these integrations are difficult and expensive to do, so LSPs and third party solution providers are slow to create integrated tools for new markets and applications.

3.3.2 Demand

The demand for translation technology comes from several sources: individual translators, language service providers (translation agencies), and publishers/content producers. Individual translators and agencies are typically looking for computer aided translation tools, so they can work more efficiently. Larger agencies will also invest in translation management tools (or use their customers' translation management tools, depending on the situation). Publishers and content producers, on the other hand, are generally looking for process automation, and are less concerned about the details of how translators do their work (this is often done by an outsourced agency).

3.3.2.1 Individual Translators

Individual translators participate in the supply chain in three main ways: via translation agencies, translation marketplaces and direct-to-customer relationships.

Traditionally, individual translators would interact with customers via language service providers (translation agencies). LSPs provide several services to translators: customer acquisition (sales), project management, and administrative support (billing, collections, etc). They are still the dominant channel translators go through, but new technologies are enabling customers to automate more of the process, and in some cases build direct relationships with translators. Google's G-Community is an example of such disintermediation.

Translation marketplaces, such as Cloudwords, enable translation buyers to request competitive quotes, place orders, and manage their projects via a SaaS offering. This is attractive for companies that have complex translation needs that require the use of multiple agencies. Simpler marketplaces, such as ProZ and Translators Cafe, enable users to request competitive quotes, but do not provide integrated project management tools.

Translators will often decide to work directly with their favorite clients. New tools make it easier for people who are not translation industry professionals to assume many project management tasks, and therefore to work directly with a hand picked crew of translators.

3.3.2.2 Language Service Providers (Translation Agencies)

Language service providers typically interact with the supply chain via more direct means, since they serve in an intermediary role for individual translators. They typically focus on direct sales, and to a lesser extent reach customers through translation marketplaces. One of the services they provide is outbound sales and account management to larger corporate clients, something individual translators are not necessarily skilled at or financially equipped to do.

There are a significant number of translation agencies that work primarily as subcontractors for larger service providers. These typically focus on a small number of languages or specialize in specific sectors. They undertake the production, while the larger firms maintain the relationship with customers.

To a lesser extent, translation agencies will access customers through translation marketplaces like ProZ and Cloudwords. The larger agencies generally do not feel they need to participate in these communities, except to recruit translators, since they have well developed outbound sales and account management capabilities. Small and mid-sized agencies, which tend to specialize by language, services or domain of expertise, do actively participate in these.

3.3.2.3 Publisher Organizations

Publisher organizations, or content producers, are generally concerned with delivering translations once completed, and generate requests for translation for new content, whether it is a website article, travel listing or other item. They will typically use a content management system or e-commerce platform that has been integrated with the translation supply chain to implement an automated or semi-automated workflow.

There are many different types of publisher organizations, since anyone who produces content in print or digital form can be considered a content producer. Different types of content producers will typically use different types of content management and e-commerce systems. For example, the online version of a newsmagazine might use a content management system like Drupal to host its website, while a flight booking service would use a completely different system. These customers are typically looking to integrate their existing publishing and e-commerce platforms with translation resources that automate the process of detecting new content, queuing it for translation, and then storing/displaying the translated content when needed.

Each type of content producer has different requirements where translation is concerned. For example, an e-commerce site may have tens of thousands of product listings that need to be translated and kept in sync, and may be concerned more with search engine visibility than the highest possible translation quality. A magazine publisher, on the other hand, will have fewer but longer texts needing translation, and will be much more concerned with output quality. In most of these cases, the customer's primary concern is system integration between their content delivery platform and their translation management system.

3.4 Opportunities and Challenges in Translation Technology

3.4.1 Interoperability and Standards

The translation industry is notorious for its lack of standards. While there are standard file formats, like TMX and XLIFF, and a long list of secondary file formats, these standards tend to be over-engineered, and because of this are difficult and expensive to integrate into products. XLIFF and TMX in particular are file formats that are specific to translations and aligned texts. This is the opposite of the situation with web services, where simple REST APIs and data interchange formats like JSON are dominant.

The basic issue facing the translation industry today is not the lack of a standard file format in which to store texts and their translations (both TMX and XLIFF work well for this), but the lack of standard procedures to initiate common translation tasks. For example, when a content management system needs to call a third party translation service to request a translation for a new document, there is no standard protocol.

To this end, TAUS has introduced a web services reference API that standardizes the way common tasks are done in the context of a publicly accessible web service. This will enable translation technology and service providers to provide a common implementation that is the same for participating vendors. This, in turn, will enable developers to leverage re-usable code, libraries, and extensions rather than build custom integrations for each translation service or technology they choose to work with. At the time of writing this report the authoring team is aware of over sixty translation services APIs that have been developed by vendors, representing wasteful duplication and growing interoperability issues. The open API specification was made public in September 2012.
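The integration cost described above can be made concrete with a small Python sketch. It defines one vendor-neutral request and shows the kind of per-vendor adapter code a developer has to write when every service expects its own payload. All field names, and both vendor formats, are hypothetical and chosen purely for illustration. They are not taken from the TAUS API specification or from any real vendor API.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TranslationRequest:
    """A vendor-neutral request for one text (hypothetical field names)."""
    source_lang: str
    target_lang: str
    text: str
    callback_url: Optional[str] = None  # where a finished translation would be posted


def to_common_payload(request: TranslationRequest) -> str:
    """Serialize the request as JSON, the format a shared API could accept."""
    return json.dumps(asdict(request), sort_keys=True)


def adapt_for_vendor_a(request: TranslationRequest) -> dict:
    """Custom mapping for an imaginary 'vendor A' payload."""
    return {
        "src": request.source_lang,
        "tgt": request.target_lang,
        "content": request.text,
    }


def adapt_for_vendor_b(request: TranslationRequest) -> dict:
    """Custom mapping for an imaginary 'vendor B' payload."""
    return {
        "languagePair": f"{request.source_lang}-{request.target_lang}",
        "segments": [request.text],
    }


request = TranslationRequest(source_lang="en", target_lang="fr", text="Hello world")
common = to_common_payload(request)
print(common)
print(adapt_for_vendor_a(request))
print(adapt_for_vendor_b(request))
```

With a shared specification only the first serializer would be needed. Without one, an adapter of the second kind has to be written and maintained for each service.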

3.4.2 Measuring and Benchmarking Quality

Measuring and benchmarking translation quality in a consistent way is another challenge for the industry. For many years now the industry has applied a one size fits all approach to quality. It is an area with an inherent level of subjectivity. There has been no industry-level consensus on the different expectations for quality for different types of content and purpose.

An industry initiative, the TAUS Dynamic Quality Framework, has begun to address this common issue. In 2012 a knowledge base on quality assurance best practices, a content profiling methodology and a set of tools for quality evaluation were launched. These tools introduce industry benchmarking. Such business metrics have been missing from the translation industry.

A number of automated metrics have been developed to measure quality. These can provide an understanding of relative levels of quality, for example to indicate if one system has produced a better translation than another. However, they do not provide absolute measures of quality. They don't provide a measure of how quality is measured, nor do they measure factors such as style, grammar, etc.

Unfortunately, due to the variance of human language, as well as the fact that there are often many ways to translate the same thing, there may not be an algorithmic way to measure final linguistic quality. For the next few years, a combination of automated and human mediated quality assurance processes will be required to provide a precise measure of quality. For example, statistical methods can be used to measure the likelihood that a given text is a decent translation (using a corpus of high quality translations as a baseline for comparison). Humans, on the other hand, are much better at catching more subtle problems with word order, grammar and word choice.
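To show why automated metrics give relative rather than absolute measures, here is a minimal Python sketch of an n-gram overlap score. It is a deliberately simplified illustration, assuming whitespace tokenization and a single reference translation. It is not an implementation of BLEU or of any metric named in this report. The score only says how closely a candidate resembles the reference, so it can rank two systems against each other but cannot certify that either output is good.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(candidate: str, reference: str, n: int = 2) -> float:
    """Share of candidate n-grams that also occur in the reference (clipped counts)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


reference = "the cat sat on the mat"
score_a = ngram_precision("the cat sat on a mat", reference)    # close to the reference
score_b = ngram_precision("cat the on mat sat the", reference)  # same words, scrambled order
print(score_a, score_b)
```

A fluent translation that happens to use different wording from the reference would also score poorly here, which is one reason human review remains necessary.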

Further reading:
- How to increase your leveraging, Rahzeb Choudhury and Richard Sikes, TAUS, April 2010
- A Common Translation Services API, Brian McConnell and Prof. Dr. Klemens Waldhör, TAUS, September 2012
- Dynamic Quality Framework, Dr. Sharon O'Brien, Rahzeb Choudhury, Dr. Nora Aranberri, Jaap van der Meer, TAUS, November 2011
- Advancing Best Practices in Machine Translation Evaluation, Rahzeb Choudhury and Dr. Nora Aranberri, TAUS, July 2012

[Figure: Translation Industry Evolution, stage "Translation to Integration". Source: TAUS]

Between 2000 and 2010 demand grew for application scenarios involving machine translation (MT). In that same period statistical machine translation became a viable mainstream technology. Citizens became publishers and trusted sources of consumer advice en masse. User generated content surfaced as a translation driver. Business cases began emerging for language strategies integrating translation into enterprise systems to aid all business functions, as well as into devices and applications.

4. Machine Translation

Machine translation, available for several decades, has advanced dramatically in terms of speed and quality, especially in statistical MT engines. However, it has not replaced human translators and is not expected to do so in the foreseeable future. The main use cases for machine translation are applications that require real-time or near real-time interaction, for assimilating texts and chat, and as a productivity tool. Content producers are also generating exponentially increasing volumes of material, and in many cases, human translation is simply not economically or technically feasible. In the context of professional translation, machine translation is often used in a number of situations, including:

- As a short-term, placeholder translation for time sensitive content while awaiting human translation.
- As an immediate "good enough" translation (for example the long tail of infrequently viewed products in an online store).
- As a draft translation for post-editing by human translators (there is significant debate about how well this works with present tools).
- As a way to detect problem texts that need further attention by reviewers and editors.

4.1 Approaches to Machine Translation

4.1.1 Rules Based

Rules based machine translation, developed several decades ago, was the first practical approach to automatic translation. This type of translation engine works by parsing a source sentence, analyzing its structure (for example, determining which words are used as verbs or nouns), and then converting this into an intermediate, machine-readable code. This is, in turn, transformed into the target language.

The advantage of rules based translation is that a sufficiently sophisticated translation engine can translate a wide range of texts without having been trained with a large number of examples, as in statistical machine translation. The disadvantage is that it is necessary to build custom parsing software and dictionaries for each language pair, and that it is quite brittle. Rules based translation engines don't deal very well with slang or metaphorical texts, for example. For this reason, rules based translation has largely been replaced by statistical machine translation or hybrid systems, though it is useful for less common language pairs (where there are often not enough parallel texts to train a statistical machine translation engine).
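The rules based pipeline described in section 4.1.1 (analyze the source words, apply structural rules, generate the target text) can be sketched as a toy English to Spanish translator in Python. The four-word lexicon and the single reordering rule are illustrative assumptions. Real engines such as Systran or Apertium rely on far larger dictionaries and rule sets, and this sketch does not reflect their internals.

```python
from typing import Dict, List, Tuple

# Hypothetical miniature lexicon: English word -> (part of speech, Spanish word).
LEXICON: Dict[str, Tuple[str, str]] = {
    "the": ("DET", "el"),
    "red": ("ADJ", "rojo"),
    "car": ("NOUN", "coche"),
    "runs": ("VERB", "corre"),
}


def analyze(sentence: str) -> List[Tuple[str, str]]:
    """Tag each source word and look up its target word.

    Unknown words are tagged UNK and passed through untranslated.
    """
    tagged = []
    for word in sentence.lower().split():
        pos, target = LEXICON.get(word, ("UNK", word))
        tagged.append((pos, target))
    return tagged


def transfer(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Apply one structural rule: English ADJ NOUN becomes Spanish NOUN ADJ."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][0] == "ADJ" and out[i + 1][0] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def translate(sentence: str) -> str:
    """Run analysis and transfer, then join the target words."""
    return " ".join(word for _, word in transfer(analyze(sentence)))


print(translate("The red car runs"))
print(translate("The car is lit"))
```

The second call illustrates the brittleness mentioned above: words outside the dictionary, such as slang, pass through untranslated because no rule covers them.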

4.1.1.1 Products and Practitioners

The primary providers in this category are Systran, PROMT and Lucy Software (commercial software), and Apertium (open source). Language specific providers include CCID (Chinese) and Toshiba (Japanese).

Systran has been in operation for decades, and was a pioneer in web translation (their translation engine powered the Babelfish web translation service back in the 1990s). They cover most major language pairs, and have most recently released a hybrid rules/statistical translation engine to upgrade their product line.

Apertium is an open source project sponsored by Universitat d'Alacant in Spain. They have developed an open source rules based translation engine that enables users to create custom translation engines for any language pair. This solves an important problem for rules based translation engines, as commercial vendors do not invest in development for less common language pairs, such as Spanish-Catalan. Developing a custom engine is a large task, as it requires the development of dictionaries, parsing rules, etc, which requires the involvement of linguists who are experts in the source and target languages.

4.1.3 Statistical Statistical machine translation is currently the most popular form of machine translation in use today. The process works by training the translation engine with a very large volume of parallel texts (source texts and their translations), as well as monolingual corpora. The system looks for statistical correlations between source texts and translations, both for an entire segment, but also smaller phrases, or N-grams, within each segment. It then generates confidence scores for how likely it is that a given source text will map to a translation. The translation engine itself has no notion of rules or grammar. The key advantage of statistical machine translation is that it eliminates the need to handcraft a translation engine for each language pair, as is the case with rules based translation. Provided you have a large enough collection of texts, you can train a generic translation engine for any language pair. The main disadvantage of statistical machine translation is that it fails when it is presented texts that are not similar to material in the training corpora. For example, a translation engine that was trained using technical texts will have a difficult time translating texts written in casual style. Therefore, it is important to train the engine with texts that are similar to the material you will be translating on an ongoing basis. Even with large and suitable training corpora, statistical machine translation does not generally produce publication quality text. It frequently translates items out of context or uses the wrong word order. However, it generally translates well enough that it is suitable for comprehension. If you need publication quality translation, you will want to have some sort of human review and post-edit process, which many commercial MT engines provide as an option.

4.1.2 Example Based

Example based machine translation is similar to statistical machine translation, as it uses a large volume of parallel texts (source segments and their translations) to train the system. The logic behind example based translation is that it treats sentences as a collection of often repeated phrases that can be translated independently and then combined to form a translated sentence. The problem with this approach is that you need a very large corpus of phrases and their translations. This requires a lot of data, and also requires the phrases and translations to be perfectly aligned, which typically requires manual effort, whereas statistical machine translation systems can be trained in a fully automated process. Example based machine translation has not been widely deployed as a commercial service. However, there is an open source platform, Cunei, which enables developers to build their own example based MT engine (similar to the Apertium platform for rules based translation). Most translation engines in development and commercial use today are statistical or hybrid systems.

4.1.3.1 Products and Practitioners Many companies offer statistical machine translation, most of them with engines derived from the Moses open source translation engine. Moses has been an important development for the machine translation sector because companies can build custom MT engines without rewriting the translation engine itself; they simply provide the parallel texts used to train the engine. This has enabled many companies to launch custom MT offerings with modest effort.
- BeGlobal (SDL): BeGlobal is SDL's machine translation offering. Derived from SDL's acquisition of Language Weaver several years ago, BeGlobal enables users to combine machine translation with professional translation and post-editing. A common workflow is to machine translate a text on the first pass, and then have human translators and editors review the output and correct it. These corrections can, in turn, be fed back into the translation memory to further train the engine.
- Google Translate (free): Google offers a free web translation service that is based on its own translation engine and research. The service can translate to and from over 50 languages, and is regarded as a benchmark for translation quality for non-specialized translation engines.

4.1.2.1 Products and Practitioners Example based machine translation is not available as a stand-alone commercial product or service, but there are two open source projects: Cunei and Marclator. Both are suitable only for expert software developers and system administrators, as they are not turnkey solutions designed for end users. They are well suited to experimental use, but if you are looking for a user-ready platform, look at statistical machine translation platforms.

- Microsoft Bing Translator (free): Microsoft also offers a free web translation service that is similar to Google Translate, but also includes many options for people to score and post-edit translations using an interactive (WYSIWYG) editing tool. This is an especially interesting option for companies that have a large community of readers who can be tapped to edit and improve translations to benefit other users. Both Google Translate and Bing Translator are free as part of the companies' respective consumer offerings; use of the API, however, is paid. Microsoft has also recently launched Microsoft Translator Hub, which offers free customization. The consumption of translation via the API, customized or not, is paid.
- Moses (Open Source): Moses is an open source statistical machine translation engine. It is widely used within the industry to build customized MT engines. Because it is open source, people wishing to develop a custom engine can focus on obtaining the training corpora rather than writing their own statistical machine translation engine (a difficult task that is beyond the abilities of most developers).
- A growing number of vendors market MT solutions using Moses as the core engine. These are often SaaS offerings providing customized MT for specific segments. Examples include Capita Translation and Interpreting, DoMY CE, Firma8, Lets MT, PangeaMT, Safaba Translation Solutions, Simple Shift, and Tauyou. One of the earliest providers was Asia Online, with a client server offering. The sophistication of the offerings depends on the client segments being targeted. At a minimum, the vendors seek to solve engineering gaps in Moses, ensuring ease of use. At the other end of the spectrum, they combine other natural language processing techniques with Moses to improve the quality of translations.

4.1.4.1 Products and Practitioners Several companies offer hybrid translation platforms, mostly focused on the enterprise market, among them:
- LinguaSys: Developed Carabao, a hybrid translation engine that targets the enterprise market.
- PROMT: Started out with rules based translation, and has since upgraded its product line to support hybrid translation.
- Systran: Has been developing machine translation software for 40 years, and has upgraded its tools to combine statistical and rules based translation.

4.2 Machine Translation Trends

4.2.1 Customized Engines

4.1.4 Hybrid Hybrid translation engines combine elements from rules based and statistical machine translation to leverage the strengths of each approach. This is an area of ongoing development, so we expect many systems to evolve into hybrid platforms. There are two main categories of hybrid systems: rules based engines that use statistical translation for post processing and cleanup, and statistical systems that are guided by rules based engines. In the first case, the text is translated first by a rules based translation engine. This translation is then processed by a statistical machine translation engine, which corrects errors made by the rules based engine, or replaces the text entirely if needed. In the second case, the rules based translation engine does not translate the text but assists the statistical translation engine by inserting metadata (e.g. noun/verb/adjective, present/past tense, etc.).
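The first category of hybrid system (a rules based pass followed by statistical cleanup) can be sketched as a two-stage pipeline. This is a toy illustration under stated assumptions: the "rules" are a word-for-word dictionary, and the "statistical" corrections are written by hand here, whereas a real engine would learn them from parallel data. All names and data are hypothetical.

```python
def rule_based_translate(text, dictionary):
    """Toy rules based step: word-for-word dictionary lookup."""
    return " ".join(dictionary.get(word, word) for word in text.split())


def statistical_post_edit(draft, corrections):
    """Toy statistical step: replace phrases with corrections.

    In a real system these corrections would be learned from data;
    here they are hand-written for illustration.
    """
    for wrong, right in corrections.items():
        draft = draft.replace(wrong, right)
    return draft


def hybrid_translate(text, dictionary, corrections):
    """Run the rules based pass first, then the statistical cleanup."""
    draft = rule_based_translate(text, dictionary)
    return statistical_post_edit(draft, corrections)


dictionary = {"the": "la", "red": "roja", "house": "casa"}
corrections = {"roja casa": "casa roja"}  # fixes adjective/noun word order

print(rule_based_translate("the red house", dictionary))           # la roja casa
print(hybrid_translate("the red house", dictionary, corrections))  # la casa roja
```

The second category would reverse the roles: the rules based component would annotate the source text with metadata, and the statistical engine would do the translating.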

Statistical machine translation software is generic, meaning the same translation engine can be used for any number of language pairs or specialized applications. There is no need to handcraft the translation software as there often is with rules based translation. There is a market need for customized, or adapted, translation engines that are trained with corpora for a specific language pair, or for a particular industry or domain of expertise. Doing so results in higher quality and more consistent translation for that application. Customized engines typically produce translations that are at least twice as good as the translations of free online systems. A number of companies are focusing on language or domain specific translation engines as their speciality, among them: Asia Online, Capita Translation and Interpreting, Lets MT, Lucy Software, PangeaMT, PROMT, SDL/BeGlobal, Safaba Translation Solutions, Simple Shift, Systran and Tauyou. We expect this to be an area of ongoing development, along with hybrid machine/human translation platforms.

4.2.2 Real-time Customization

The time it takes to create customized machine translation engines has come down significantly in recent years. Given sufficient and appropriate translation data, it is possible to create a machine translation engine for a specific industry, organization or even product within a day. We are fast approaching the day when it will be technically feasible to create a purpose built machine translation engine on demand. The only hurdle will be the availability of translation data. Many companies already offer SaaS based solutions where users are able to combine their data with that of the vendor to create a custom engine.

4.2.3 Open Source Technology

Open source has already had a major impact on machine translation. There are mature and actively used open source platforms for all types of machine translation. Apertium enables people to build their own rules based translation engines. Moses is widely used by people building custom statistical machine translation systems. Open source is important not so much to end users as to language service providers who want to develop customized or adapted translation engines. Because they don't have to build or support the underlying translation engine, they are free to focus primarily on compiling the training data needed to build a client's system. Thus the primary role of open source, in the context of MT, is to reduce the R&D costs that these companies would otherwise incur in bringing their products to market.

SDL BeGlobal is a good example of this type of integration. Its machine translation engine, which can also be trained with custom corpora, is fully integrated into SDL's Translation Management System, and can be used to generate first draft texts that are, in turn, processed and post-edited by staff translators or outsourced workers as needed.

4.2.6 From Licensing to Professional Services

Across the board, vendors are migrating to a software as a service business model. Few companies want to pay up front for site licenses anymore; most would prefer to pay based on usage. Machine translation lends itself nicely to this business model, as translations can be billed on a per word or per character basis. Google, for example, charges $20 per million characters for the use of its Translate API (version 2.0). Pricing models vary. Some vendors charge on a per word basis, others per language, but in general expect some combination of the two for most offerings.
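Per character pricing makes costs easy to estimate in advance. The following sketch works through the arithmetic using the $20 per million characters figure quoted above; the function name and the catalog sizes are illustrative assumptions.

```python
def api_cost(num_characters, price_per_million=20.0):
    """Cost in dollars of translating num_characters at a per-million-character rate."""
    return num_characters / 1_000_000 * price_per_million


# A single 2.5 million character project.
print(api_cost(2_500_000))  # 50.0

# Hypothetical catalog: 10,000 product descriptions of 600 characters each,
# translated into 5 target languages.
catalog_characters = 10_000 * 600 * 5
print(api_cost(catalog_characters))  # 600.0
```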

4.2.4 Data Sharing

Public translation memories will play an important role in the improvement of machine translation because they can be used to generate high quality training corpora. This will also reduce development costs for companies because they can re-use an ever-growing baseline corpus that many parties feed into, and can focus on collecting the high value information that is specific to their client's project. The ability to share translation memories is also important for secondary languages, where there are often smaller batches of translations stored in many different locations, typically at language service providers. If these can be pooled and shared, it will be much easier for companies to create high quality translation engines for secondary language pairs. The TAUS Data repository is the largest open platform for shared translation memories. See Section 7, Translation Data and Translation Technology, for more information.
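Pooling translation memories from several holders amounts to merging their segment pairs and removing duplicates. The sketch below shows one simple way to do this, assuming each memory has already been loaded as a list of (source, target) pairs; it is not how any particular repository works, and the data and names are hypothetical.

```python
def pool_translation_memories(*memories):
    """Merge several translation memories, dropping exact duplicates.

    Each memory is a list of (source, target) pairs. Pairs that differ
    only in letter case or surrounding whitespace count as duplicates.
    """
    seen = set()
    pooled = []
    for memory in memories:
        for source, target in memory:
            key = (source.strip().lower(), target.strip().lower())
            if key not in seen:
                seen.add(key)
                pooled.append((source.strip(), target.strip()))
    return pooled


# Hypothetical memories held by two different language service providers.
lsp_a = [("Save file", "Guardar archivo"), ("Open", "Abrir")]
lsp_b = [("save file", "guardar archivo "), ("Close", "Cerrar")]

pooled = pool_translation_memories(lsp_a, lsp_b)
print(len(pooled))  # 3
```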

4.3 Machine Translation Value Chain

4.3.1 Supply

4.3.1.1 Types of Providers There is a wide variety of service providers in the machine translation space. These include:
- Consumer/web translation services, such as Google Translate and Microsoft Translator. These services are trained with general purpose corpora and offer decent translation for comprehension, but not publication quality output.
- Custom/adapted machine translation for specific language pairs or subject matter (domain of expertise). These providers typically help clients build domain specific translation engines using their own training corpora (enterprise customers often have very large translation memories which can be used for this purpose). Lets MT, Asia Online and Tauyou are examples of this.
- Hybrid translation engines. These providers combine rules based and statistical translation technology to develop adapted systems. Systran and PROMT are good examples of this type of provider.
- Human/machine translation engines. These typically use machine translation for a first pass, and then have human translators review the proposed translations and edit or replace them as needed. SDL BeGlobal is a good example of this type of engine. Microsoft Bing Translator also enables this type of workflow, although it uses a more ad hoc approach that's designed for crowd translation.

4.2.5 Human/Machine Translation

Another important trend is the growing use of machine translation for a first pass, with human translators (users, professional translators or both) providing feedback, suggested translations, or direct post-edits. This approach is becoming popular because it is often unknown whether a particular document will be read by enough people to justify the cost of professional translation. For example, a document might be translated by machine, and then, when traffic reaches a defined threshold, be sent to human translators and editors for further review and post-editing. We expect this type of cost optimization workflow to become popular, especially with web and mobile content producers who have to generate fast and low-cost translations. Also, if customers know that content is being machine translated, and only later cleaned up by people, their expectation of quality is markedly different.
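The traffic threshold workflow described above reduces to a small routing rule. The sketch below is one possible formulation; the threshold of 1,000 page views and the status labels are assumptions chosen for illustration.

```python
def next_step(page_views, already_post_edited, threshold=1000):
    """Decide what to do with a machine translated document.

    Documents stay as raw machine translation until enough people read
    them to justify the cost of human review and post-editing.
    """
    if already_post_edited:
        return "published"
    if page_views >= threshold:
        return "send_to_human_post_edit"
    return "keep_machine_translation"


print(next_step(page_views=120, already_post_edited=False))   # keep_machine_translation
print(next_step(page_views=4500, already_post_edited=False))  # send_to_human_post_edit
print(next_step(page_views=4500, already_post_edited=True))   # published
```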


4.3.1.2 Business Models Machine translation is typically offered as Software as a Service, although a few vendors like Systran also offer licensed products designed for standalone use. Machine translation is a memory and CPU intensive application, and generally requires fairly high end hardware to perform well. Some systems, like Moses, are also difficult to administer. SaaS offerings enable customers to offload upfront capital costs and system administration to their service provider. These services are generally priced based on the volume of material being translated. Pricing models may be per word or character, or split into tiers that correlate fairly directly to word count. For custom or hybrid systems, there is typically a baseline monthly or annual fee that accounts for the cost of hardware dedicated to each customer.
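A tiered, volume based price with a baseline fee can be computed as follows. The tier boundaries, per word prices and monthly fee in this sketch are invented for illustration and do not reflect any vendor's actual price list.

```python
def monthly_charge(words, tiers, base_fee=0.0):
    """Compute a monthly charge from tiered per-word prices plus a base fee.

    tiers is a list of (upper_word_limit, price_per_word) in ascending
    order; the last tier must use None as its limit (open-ended).
    Each tier's price applies only to the words that fall inside it.
    """
    charge = base_fee
    previous_limit = 0
    for limit, price_per_word in tiers:
        if limit is None or words <= limit:
            charge += (words - previous_limit) * price_per_word
            return charge
        charge += (limit - previous_limit) * price_per_word
        previous_limit = limit
    raise ValueError("tiers must end with an open-ended (None) tier")


# Hypothetical price list for a custom engine with dedicated hardware.
tiers = [(100_000, 0.010), (1_000_000, 0.005), (None, 0.002)]

# 250,000 words: 100,000 at 0.010 plus 150,000 at 0.005, plus the base fee.
print(round(monthly_charge(250_000, tiers, base_fee=500.0), 2))
```

Charging each tier's price only on the words inside that tier avoids a sudden drop in the total bill when a customer crosses a tier boundary.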

Tokyo based Gengo, for example, offers a highly automated translation service for software developers. The service is accessed via a web services API, so the process of requesting and retrieving translations can be completely automated, even if the translations are done behind the scenes by humans. Gengo offers several different levels of translation, including a free machine translation option (powered by Microsoft Bing Translator). We expect to see offerings like this become fairly standard, as they allow the client to decide on a case by case basis how much they are willing to spend on a given task. The type of machine translation an LSP needs varies depending on the type of clients it has. Gengo typically focuses on fairly generic texts, so consumer/web translation engines are suitable for it. LSPs that work with less common languages or domains of expertise will probably want to use a custom MT engine such as Tauyou, LetsMT or AsiaOnline. Another model involves LSPs and MT specialists working very closely to serve enterprise clients. Safaba Translation Solutions and Welocalize are both good examples of this.

4.3.1.3 Channels and Platforms Machine translation services for consumers (e.g. web translation) are generally marketed direct to end users via sites such as Google Translate. While Google Translate has a dominant position in this market, there is plenty of room for companies that specialize in other language pairs to enter this space. This is particularly true of countries that have a different ecosystem of service providers, for example Russia or China, where Yandex and Baidu respectively are the dominant internet search providers. Corporate translation services are typically offered to customers via a combination of direct, stand-alone offerings, and integrated solutions that are part of a larger translation toolset. Direct offerings include services like Systran or PangeaMT, which can be used as an independent service, with or without a translation management system. SDL, on the other hand, promotes its BeGlobal product as part of a translation supply chain management system, and has done a lot of work to integrate it into other tools such as its translation management system. This allows SDL to sell machine translation as an option in a larger suite of services. Both approaches require the vendor to have an outbound sales capability, since these clients take a long time to cultivate, and also often need time to transition off legacy systems.

4.3.2.2 Consumer/Individuals Direct

Google Translate and Microsoft Translator (Bing Translator) are the two dominant machine translation services for consumers. In addition to machine translation, there is also a well developed market for web based professional translation, where the customer uploads a Word document, PDF file, etc. to have it translated by professional translators. Several companies, including Gengo, One Hour Translation, expressIT (Elanex) and Straker Translations, offer some version of this type of service.
We expect hybrid translation services to become a standard offering, as they enable customers to use free machine translation when they need to comprehend the contents of a document, but do not need to publish it or share it with customers. By offering the ability to switch easily to paid, professional translation, these providers can offer a convenient all-in-one solution for day to day translation requests. For example, the customer can quickly obtain a draft machine translation, and then upgrade to professional translation if the task warrants it.

4.3.2 Demand

4.3.2.1 Language Service Providers

Language service providers are increasingly using machine translation as part of their process. Though initially resistant to using it at all, they need it to serve customers who have to translate a lot of material very cheaply and very quickly. Machine translation can be offered as part of a tiered service where the customer can decide, on a per request basis, what level of quality they need for a particular item.

4.3.2.3 SME and Enterprise Direct

Some companies, such as SDL BeGlobal, AsiaOnline, and others, offer products that can be sold direct to customers, enterprise clients in particular. Enterprise clients frequently have a very large volume of material to be translated, and cannot afford, or cannot wait for, professional translation. The companies best suited to sell direct have a good outbound sales capability, as the direct to enterprise sales channel has a slow sales cycle (from several months to one or two years).

In this scenario, the machine translation platform can be used either as a standalone service, or can be integrated into other systems where automation is a requirement. If the customer has a fairly standard workflow (for example, to upload documents for translation, and then post-edit as needed), they can often use the built in toolset provided with the translation engine. If the customer needs to integrate machine translation into a highly automated system, for example an e-commerce server with tens of thousands of product SKUs, they will probably need to do some system integration work. Nearly all of the translation engines observed provide some sort of web services API, so they can be integrated into external systems in a relatively straightforward way.
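The e-commerce case above typically comes down to sending product texts to the engine in batches. The sketch below shows the shape of such integration code. It deliberately does not call any real vendor API: the `translate` argument is a placeholder for whatever client function wraps the chosen engine's web service, and `fake_translate` is a stand-in so that the example runs on its own.

```python
def translate_catalog(products, translate, target_language, batch_size=100):
    """Translate product descriptions in batches.

    products: dict mapping SKU -> source description.
    translate: callable(list_of_texts, target_language) -> list of
        translations in the same order (assumed interface).
    Returns a dict mapping SKU -> translated description.
    """
    skus = list(products)
    translated = {}
    for start in range(0, len(skus), batch_size):
        batch = skus[start:start + batch_size]
        results = translate([products[sku] for sku in batch], target_language)
        translated.update(zip(batch, results))
    return translated


def fake_translate(texts, target_language):
    """Stand-in for a vendor's web services API call (hypothetical)."""
    return [f"[{target_language}] {text}" for text in texts]


products = {f"SKU-{i}": f"Product {i}" for i in range(250)}
result = translate_catalog(products, fake_translate, "de", batch_size=100)
print(len(result), result["SKU-7"])
```

Keeping the engine behind a single callable also makes it simpler to switch vendors later, since only that one function has to change.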

4.3.2.4 Government/Institutions

Government is an important market for machine translation technology, especially investigative and intelligence agencies, which use it to sift through vast amounts of source material before it is reviewed by analysts. Machine translation is a vital tool in making information visible to analysts. This is an example of a market that will be well served by hybrid translation platforms, where machine translation is used for a first pass. The machine translated documents are then fed into automated and human assisted search tools to flag potentially interesting documents, which are then queued for professional translation, and then for review by analysts and specialists. The downside of this market is that the procurement process is quite arduous. Some governments also impose considerable security requirements that commercial clients are less concerned with. The decision to pursue this market is a strategic one, as it requires considerable investment in ancillary services by the vendor, as well as a sales force that is experienced at securing public sector accounts, and military/intelligence agencies in particular.

4.4 Machine Translation Breakthroughs

4.4.1 Intractable Problems

There is a small number of extremely hard problems to be solved for fully automatic translation, and a larger number of less intractable problems in MT that will be solved within the coming decade. The problems that require a theoretical breakthrough - or which turn out to be inherently unsolvable by artificial means - involve conceptual issues in computational linguistics rather than technology issues in real world engineering environments. A system capable of systematically aping (or even surpassing) a human translator will need to draw on world models - real-world knowledge - to overcome the critical quality bottleneck. It has so far proved impossible to program a machine to understand the semantic intentionality of a text. This means that computers can be programmed to deploy knowledge of language or of statistical patterns of fluency or of linguistic rules, lexical data, or translated content. They cannot access a knowledge base that helps them decide correctly how to disambiguate a given expression in a plausible way in a given context.

4.4.2 Solvable Problems

The solvable problems are already on the R&D agenda. One is to optimize the handling of languages with complex morphologies or with non-Indo-European word orders, both of which typically make it hard to deliver smooth machine outputs for a number of language pairs. This type of system optimization will almost certainly involve adding annotations to the existing translation data to help the machine learn more effectively. Building on what has been called "the unreasonable effectiveness of data", most MT scientists believe there is a need for much more abstract language models that can handle the immense complexity and context-sensitivity of linguistic objects, and then use the available data to improve the translation process. We can expect breakthroughs based on such Hierarchical Alignment Tree approaches to data driven MT in the next few years.

4.5 Opportunities and Challenges in the Machine Translation Industry

4.5.1 Interoperability and Standards

Interoperability between machine translation systems is desirable, but the systems are so similar in the way they present themselves to outside users that transitioning from one system to another, from a system integration standpoint, isn't that difficult. The APIs they expose to external systems all perform similar functions (request a translation, post-edit a translation, etc.), and are not terribly complicated. Moreover, many vendors have committed to implementing the TAUS web services reference API, as a way of providing a standard API that developers can use to interact with a variety of services. The reference API defines how to make a variety of requests that are common to all human and machine translation systems. Translation memories and corpora, if they are stored separately from the machine translation engine in a standard localization file format (e.g. TMX), can easily be ported from system to system. Changing from one translation engine to another shouldn't prevent customers from using the corpora they have built up over the years.

4.5.2 Measuring and Benchmarking Quality

Measuring and benchmarking quality using an automated process remains a difficult challenge. While there are automated quality metrics, such as the BLEU score, they only provide a comparative measure of quality.

This is important because what's really needed is an automated way to identify problem texts so they can be routed for human review and post-edit. At present, the standard practice is to have human reviewers look at a certain percentage of texts, or spend an assigned amount of time reviewing a subset of a project. As the volume of material being translated grows, it becomes easier for reviewers to miss defective translations. An automated process that could identify problem texts without generating a large number of false positives would be highly useful. This is largely an area of research at present, although a couple of companies, Multilizer and SDL, offer solutions employing such techniques. Human review is typically used to ensure desired quality levels are achieved. The industry lacks benchmark data on the benefits of MT. An industry-driven initiative, the TAUS Dynamic Quality Framework, has begun to address this common issue. In 2012 a knowledgebase on quality assurance best practices and a set of tools for quality evaluation were launched. These tools introduce industry benchmarking.
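To see why a metric such as BLEU is only comparative, it helps to look at how it is computed. The sketch below is a simplified, sentence-level version with a single reference and no smoothing; production evaluations use corpus-level BLEU with 4-grams and often several references. The function names are illustrative. Note that the score can only be computed when a reference translation already exists, which is why it cannot by itself flag problem texts in new material.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU against a single reference.

    Combines clipped n-gram precisions (n = 1..max_n) by geometric mean
    and applies the brevity penalty. Returns 0.0 if any precision is zero.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Penalize candidates that are shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat"
print(simple_bleu("the cat sat on the mat", reference))
print(simple_bleu("the cat sat on a mat", reference))
print(simple_bleu("dog", reference))
```

The numbers are meaningful only relative to one another: they rank candidate translations against the same reference, but say nothing absolute about whether a given translation is fit for publication.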

4.5.3 Cost of Customization

The cost of building custom or adapted translation engines remains quite high, mostly due to the cost of obtaining and pre-processing high quality parallel texts with which to train the translation engine. Fortunately, work already underway to create globally shared translation memories, including the TAUS Data repository, will encourage translation vendors to pool translations so they can be combined to create large, high quality training corpora. While the technical challenge of building a shared translation memory has been solved, the primary challenge going forward is to encourage translation vendors to share their translations by default. LSPs often resist doing this, so it will take time to make this a standard practice.

4.5.4 The Search for Talent

Recently the authoring team observed that there are more PhD positions for MT related research in Europe than there are candidates. As with other high skilled areas, the global talent pool will be hard pressed to meet demand in the medium term. There are opportunities for people currently in the translation industry to acquire the skills needed to fill some of these new positions.

Further reading:
- A Common Translation Services API, Brian McConnell and Prof. Dr. Klemens Waldhör, TAUS, September 2012
- Dynamic Quality Framework, Dr. Sharon O'Brien, Rahzeb Choudhury, Dr. Nora Aranberri, Jaap van der Meer, TAUS, November 2011
- Advancing Best Practices in Machine Translation Machine Translation Evaluation, Rahzeb Choudhury and Dr. Nora Aranberri, TAUS, July 2012


[Exhibit: Translation Industry Evolution, Translation to Convergence]

5. Trends That Influence the Translation Technology Industry

5.1 Cloud
As with other technology segments, the client/server model is quickly being replaced with a cloud (SaaS) based service model. The advantages of hosted services compared to customer premise software are extensive, and include:
- Continual upgrades, with a continuous software release cycle (agile development).
- Greatly reduced system administration and IT costs.
- High availability with highly redundant storage.
- Subscription based cost model, versus perpetual licenses, which reduces up front capital cost.
- Accessibility across a broad range of devices and operating systems, including mobile.
- Generally improved user interface and usability compared to enterprise client/server software.
While there are a few situations where customer premise software makes sense, these use cases are generally limited to corporations and government agencies that have stringent security and data protection requirements (e.g. intelligence agencies), although in many situations the case can be made that a cloud based solution can be every bit as secure as on site software. Therefore, we expect this migration away from client/server software (thick client) solutions to continue, and expect most new products to be designed around cloud computing.

Source: TAUS

With the onset of unified personalized user experiences across digital touch points, demand for translation will come from many more languages than it does today. Translation technology will increasingly need to be fueled by translation data.

5.2 Crowd
Crowd-sourcing has become a popular by-product of web 2.0. Crowd translation is becoming an increasingly important mode of translation, and has been widely used by popular web services to localize all or part of their offering. While pure crowd translation is relatively rare, many web companies that are expanding internationally are using crowd translation to some extent, typically by recruiting their users to contribute to their translation and localization efforts. The ideal solution combines crowd and professional translation in a way that enables large scale user participation while using professional translators in the background to score and vet new users (to prevent bad actors and incompetent translators from getting into the labor pool). This approach enables large scale, low cost translation while maintaining quality levels.


Translation management systems are beginning to incorporate support for crowd translation, so that crowd translators can work alongside professional translators, albeit with different access rights on the system. Transifex, a localization management system, is a good example of this, as it allows any combination of machine, crowd/user and professional translation. We expect that support for crowd translation, and tools to manage crowd translators, will become a standard feature in translation and localization management systems. Lingotek is one of the most established providers in this space.

5.3 Big Data
See section 7 on Translation Data and Translation Technology.

5.4 Mobile
With most people shifting to laptop, tablet and mobile devices, most translation technology vendors have to develop a mobile strategy to remain competitive. Translation management systems, for example, should enable translators to work on projects from tablets and smartphones. Small form factor devices, of course, have limited screen real estate, so this is not a trivial problem. The companies that develop an intuitive mobile interface will have a distinct advantage in the marketplace compared to platforms that only work with conventional computers. This is particularly true for systems that enable crowd translation, since crowd translation often involves large numbers of people each doing small amounts of work, a perfect use case for casual translation via a mobile device. Mobile also represents an opportunity for language service providers. Localization to multiple languages is becoming a material requirement for mobile app and service developers. Companies that provide simple, high quality localization solutions for popular mobile environments and frameworks will put themselves in a position to capture other translation business as these companies expand internationally.

5.5 Social
The explosion of user generated content, such as customer support forums, customer reviews and user-contributed questions and answers, is a major growth driver for machine translation. This type of content is growing in importance as consumers increasingly rely on such information for pre and post purchase decisions. Social media has become an important marketing and distribution channel for web services and publications, and will continue to grow in importance. Social media translation services will enable content producers and service providers to expand their reach in multiple languages, and to drive usage from these regions. This is new territory, with only a few providers, such as Helsinki based Transfluent, specializing in this area. Social translation is another interesting opportunity. Amara and Viki, for example, both operate crowd translation and captioning services for web video. Viki in particular has attracted several million users via word of mouth, and has built up an impressive catalog of translated videos from major content providers. We expect to see more services like this for translating other types of content.

5.6 Convergence

5.6.1 Technology convergence

Successful computer-aided translation tools will be connected to a variety of machine translation engines, usually via web services. Translation memory and advanced leveraging features will be combined to help translate continuous streams of content with consistency. Machine translation and speech recognition technology will be employed in more business use scenarios in the near future, including healthcare and other sectors with high equity interactions. Machine translation, search tools and web content management systems will continue to converge, further embedding multilingualism on the internet and therefore across devices. As the quality of machine translation improves, it will converge with consumer listening and a myriad of information processing technologies.

5.6.2 Functional convergence
The following exhibit is an adaptation of McKinsey's Consumer Decision Journey. It illustrates how consumers have many more touch points than in the past, when the sales funnel was seen as the standard conversion model. Consumers begin their journeys at trigger moments and are guided by much more than a seller's or publisher's messaging. Their trusted sources tend to be other consumers. The customer care they receive after conversion informs their own messaging as trusted sources for others.

Consumer Journey

Source: TAUS

At every touch point on the journey, translation technology offers the potential to increase customer satisfaction and lower costs. In many situations, content that otherwise would not be translated can be translated cost-effectively with machine translation. We expect functional convergence between translation and localization departments and their colleagues across other areas, with all working to optimize the consumer journey. Translation technology, and in particular machine translation, will be required as a tool embedded across functions within enterprises.

This in turn will feed technological convergence as new applications for translation technology emerge. The globes at the centre of the exhibit illustrate the need for targeted translations that communicate to specific consumer groups. These needs will be better met as (language) data management, customer care and listening functions converge with translation. In addition to the convergence of enterprise functions described above, we can expect convergence of functions in the supply chain, with the main drivers being the need for speed and efficiency and the result being disintermediation.

Further resources: Enterprise Language Strategy - Eight Things To Change, TAUS Video on YouTube: http://www.youtube.com/watch?v=b5nnIwrXv54, January 2012

6. Paradigm Shift and Counter Forces

6.1 Translation as a Utility
The drivers of global economic growth are shifting to non-English speaking countries. Globalization is enabling an exchange of culture at an unprecedented scale. The ready availability of data-driven online translation means users have become accustomed to gist translations into and out of the long tail of economically minor languages. There is a viral effect of machine translation, with governments and citizens seeing other languages being served by translation automation and wanting their language or dialect to be included. These economic, social and technology trends are converging to create a paradigm shift: from translation as a relatively expensive professional service to translation as a utility. Translation becomes embedded in every digital touch point: application, device, signboard and screen. With it comes the growing expectation that translation is a basic right for anyone involved in the global information society. These three trends are themselves enabled by other trends, such as the shift to cloud services, growing use of big data, shared platforms such as Moses, and the adoption of platforms enabling massive online collaboration, described in the previous section.

Trend Convergence Drives Translation as a Utility

Source: TAUS

6.2 Counter Forces
The underlying direction of trends such as the shift to cloud services, use of open source technology and collaborative translation is towards open business models. Open models are characterized by collaboration, transparency and sharing to promote market growth. With each directional force there is a counter force. Closed models, the counter to open models, are characterized by proprietary, highly bounded and protective approaches that seek to lock in competitive advantage. Locking in customers to proprietary software has historically been a highly successful market strategy. The consequence has often been a lack of competition, resulting in high usage costs and low innovation. Cloud, crowd and open source lower barriers to entry, and seem to offer counterbalances. However, the battleground is shifting to the area of big data. In the upcoming phase of Convergence, data-driven translation technologies will be combined with other technologies, such as search, knowledge management, speech and many others, to create entirely new offerings. Whether or not companies are able to access the right big dataset will be a pivotal factor in the evolution of the translation technology landscape going forward. Will data be monopolized or freed? Will the future translation technology segment be an oligarchy or a place of opportunity for many firms?

Further reading: Scenario Based Planning: Planning For An Uncertain Future, Rahzeb Choudhury, TAUS, Greg Oxton, Consortium For Service Innovation, October 2010

7. Translation Data and Technology

Language data includes speech corpora, text corpora, lexicons and grammars. Language data is the fuel for translation and language technologies. These data are used to train and enhance the quality of output generated by such technologies. The importance of language data, and specifically textual translation data, to translation technology is difficult to overstate. The best sources of translation data are good quality human translations from trusted sources, such as government bodies and institutions, companies large and small, professional translators and consumers themselves. When such data is shared and curated, it can be accessed for specific purposes in a cost-effective manner. Curation involves standardization of file formats, to ensure machine readability for instance, and categorization into high-level ontologies, such as industry categories. Curation can also involve preprocessing the translation data, through tokenization or part-of-speech tagging for example. The ultimate benefit of shared data, which is openly accessible, is that through aggregation the whole is greater than the sum of its parts. The more well-curated data is openly available, the better data-driven translation technology will perform. One example of a data sharing service is TAUS Data, which in December 2012 hosted 54 billion words in 2200 language combinations across a number of industry sectors. TAUS Data operates with a clear legal framework with regard to intellectual property and copyright that all users must agree to. Whilst this is a sizeable repository, TAUS Data and others like it are a fraction of the size they could be. If translation is to be fully exploited as a utility, we might aim for a repository of, say, a trillion words covering 80,000 language combinations, openly available for all to exploit. This would be fuel enough for translation technology to enable more than 99% of the world's population to communicate with each other, covering 400 of the world's 6000 or so languages.
If we consider one example of technology convergence, machine translation with speech recognition, it is easy to imagine the massive benefit to commerce and society. Another route to accessing large amounts of data is crawling or harvesting from the World Wide Web. However, data that has been harvested from the internet is often of far lower quality than desired. The standard adage "garbage in, garbage out" applies: low-quality training data reduces the quality of machine translation output. Crawled data usually requires extensive human review and correction before it is usable by translation technology. Only a handful of companies have the resources needed to undertake such corrective activity at scale.
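As a quick check on the scale quoted above, the following sketch works through the arithmetic in Python. It assumes that a "language combination" means an unordered pair of languages, which the report does not state explicitly; the per-combination averages simply divide the quoted word counts by the quoted numbers of combinations.

```python
from math import comb

LANGUAGES = 400  # languages covered in the aspirational repository

# Assumption: one "combination" is an unordered pair of languages.
unordered_pairs = comb(LANGUAGES, 2)
# For comparison: counting each translation direction separately.
directed_pairs = LANGUAGES * (LANGUAGES - 1)

# Average words per combination, using the figures quoted in the text.
avg_words_target = 1_000_000_000_000 // 80_000   # a trillion words, 80,000 combinations
avg_words_2012 = 54_000_000_000 // 2_200         # TAUS Data, December 2012

print(unordered_pairs)    # 79800
print(directed_pairs)     # 159600
print(avg_words_target)   # 12500000
print(avg_words_2012)     # 24545454
```

Under that assumption, 400 languages give 79,800 unordered pairs, which matches the rounded figure of 80,000 combinations.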

There is no international legal framework to cover the practice of harvesting translations from the Internet. Even so, the demand for data is so strong that there are numerous examples of small and large companies, such as search engine providers, undertaking such activity. They are not able to publicly share the harvested and cleaned data, even if they wished to, and so there is no aggregate benefit beyond the benefit to them. There are also inconsistencies between national and regional rules, for example between the US, China and Europe, which have meant that US and Chinese companies have been able to benefit ahead of European firms. Whilst TAUS Data operates with a clear contract, its goals and sharing model are not easy to reconcile for companies and organizations that are often competing with each other in specific verticals. As you might expect, more often than not, content owners do not have the time or interest to investigate the benefits of sharing translation data. There is a material risk of a handful of companies creating data monopolies, locking out competition, locking in customers and stifling innovation. A coordinated change in the law to allow use of translation data by translation technology providers, while continuing to protect the rights of content owners, would dramatically change the possibilities in this segment. The way would be opened for greater competition and innovation in the future. Clearly, such a change is extremely difficult to bring to fruition.

7.1 Opportunities
There are numerous ways to exploit large translation data sets. These range from applications in translation-related activity to broader use by other language technologies. A few obvious applications are outlined in this section.

7.1.1 Terminology
Today, glossaries are built by terminologists: the best in class language specialists. It is laborious, frustrating manual work, because language keeps changing, the terminologist is always behind and the glossary is often ignored. Accessing up-to-date subject or niche specific terminology is the most common challenge facing people working with languages. If high volumes of language data could be easily accessed, statistical routines could be used to determine subject specific terminology in real time. Synonyms and related terms could be identified automatically, then part-of-speech tagged and listed in context, with sources quoted and meanings described. The technology to do this work in a largely automated fashion, with linguists and users involved as validators, already exists. Access to terminology on demand would raise the capability of the translation industry across the board and help fuel innovation among translation technology providers.
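One simple form such a statistical routine could take is sketched below in Python: words that are much more frequent in a subject corpus than in a general corpus are proposed as term candidates for a human validator. The function names, the tiny sample texts, the add-one smoothing and the single-word restriction are all illustrative assumptions; production systems would also handle multi-word terms and far larger corpora.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_terms(domain_text: str, general_text: str,
                    min_count: int = 2, top_n: int = 5) -> List[str]:
    """Rank words by how over-represented they are in the domain text."""
    domain = Counter(tokenize(domain_text))
    general = Counter(tokenize(general_text))
    domain_total = sum(domain.values())
    general_total = sum(general.values())

    scored = []
    for word, count in domain.items():
        if count < min_count:
            continue  # too rare in the domain text to be a reliable candidate
        domain_rate = count / domain_total
        # Add-one smoothing so words unseen in the general text do not divide by zero.
        general_rate = (general[word] + 1) / (general_total + 1)
        scored.append((domain_rate / general_rate, word))

    # Highest ratio first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [word for _, word in scored[:top_n]]


domain_sample = ("The stent was placed in the artery. "
                 "The stent reduced artery blockage.")
general_sample = ("The cat sat on the mat. The dog was in the garden. "
                  "The weather was fine.")

print(candidate_terms(domain_sample, general_sample, top_n=2))
# ['artery', 'stent']
```

Common words such as "the" score low because they are just as frequent in the general text, which is the property the validator relies on.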

7.1.4 Quality management
Today the translation industry often struggles to deliver adequately targeted quality: translations miss the local flavor, the right term or subject knowledge. Source texts are often in bad shape, causing all kinds of trouble for translators and machine translation engines. If high volumes of language data could be easily accessed, it would be possible to automatically clean and improve source texts for translation. It would be simpler to run automatic scoring and benchmark quality. Consistency and comprehensibility of source (author) and target (translation) texts would be improved.

7.1.2 Customized Machine Translation
Today, most users accept the inadequacies of machine translation on the internet, as it often serves their goals, typically for comprehension. Relatively few companies or organizations go through the process of customizing an engine for their content. If high volumes of language data could be easily accessed, myriad opportunities would open up. Fully automatic semantic clustering could be developed to find the translation data that matches specific domains. This would make it much easier and cheaper to make industry and domain specific machine translation engines. Automatic genre identification techniques could be employed, making it easier and cheaper to ensure machine translation engines apply the right writing style. It would be easier to go deeper in advancing machine translation technology with syntax and concept descriptions. In short, the overall quality of machine translation would be raised, whilst also making development faster, cheaper and more accessible. These types of enhancements, improving quality and lowering cost, would make customized machine translation solutions far more practical for small and medium sized firms trading across borders. Customized engines typically produce translation quality at least twice as good as that resulting from free online tools. With the availability of open source tools like Moses, the only material barrier to a growth in competition in the machine translation space is the availability of data to make customized, subject specific machine translation engines.

7.1.5 Interoperability
Today the lack of interoperability and compliance with standards costs a fortune. Translation industry providers have typically implemented their own interpretations of three open standards: TMX, XLIFF and SRX. Buyers and providers of translation often lose 10% to 40% of their budgets or revenues because language resources are not stored in compatible formats. They are often locked in to a technology. This is a major problem for the translation industry as a whole. If it were common practice for the global translation industry to share most of its public translation data in common industry repositories, all vendors and translation tools would be driven towards full compatibility. This would enable the industry to scale up significantly.
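To illustrate what a compatible format buys in practice, here is a minimal Python sketch that reads sentence pairs from a small TMX document using only the standard library. The element names (`tu`, `tuv`, `seg`) and the `xml:lang` attribute follow TMX 1.4 as commonly documented; the sample content and the `read_tmx_pairs` helper are invented for illustration, and real files often carry inline markup and vendor-specific properties that this sketch ignores.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# ElementTree exposes the predeclared xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample data in the shape of a TMX 1.4 file.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="fr"><seg>Enregistrez le fichier.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Close the window.</seg></tuv>
      <tuv xml:lang="de"><seg>Schließen Sie das Fenster.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx_pairs(tmx_text: str, src: str, tgt: str) -> List[Tuple[str, str]]:
    """Return (source, target) segments for units that contain both languages."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for unit in root.iter("tu"):
        segments = {}
        for variant in unit.findall("tuv"):
            lang = variant.get(XML_LANG, "").lower()
            seg = variant.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline elements.
                segments[lang] = "".join(seg.itertext())
        if src in segments and tgt in segments:
            pairs.append((segments[src], segments[tgt]))
    return pairs


print(read_tmx_pairs(SAMPLE_TMX, "en", "fr"))
# [('Save the file.', 'Enregistrez le fichier.')]
```

Any tool that writes the same structure can exchange data with any tool that reads it; the losses described above arise when vendors diverge from the shared interpretation.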

7.1.3 Global Market and Customer Analytics
If high volumes of language data could be easily accessed, translation technologies and processes could be more easily integrated with consumer listening, analytics and social media management. This would enable multilingual sentiment analysis, search engine optimization, opinion mining, customer engagement, competitor analysis and more. Lexalytics and Vocus are early examples of companies in this space.

7.2 Access to Translation Data
There are a few examples of openly shared translation datasets, such as those provided by the United Nations and the European Commission. These cover very narrow and specific subject matter and so are of limited value outside their topic areas. With growing understanding and some evidence of a trend towards open data, we can expect more data under, for example, a Creative Commons license to become available over time. However, the most natural source of quality translation data is the translation services industry itself. The primary challenge facing the translation technology industry is the fragmentation of such data (parallel texts, glossaries, etc.), which if shared would enable vendors to avoid duplicating each other's data collection efforts and would fuel sustainable innovation. We have already seen how the open source Moses translation engine has facilitated this by enabling many companies and research institutions to re-use the platform and, instead of trying to build their own machine translation system from scratch, focus on collecting parallel texts to train this existing platform. In just the same way, shared translation corpora, term glossaries and dictionaries can be used as a shortcut in this training process, just as search engine providers initially used EU and United Nations corpora as baseline training data for their machine translation systems. The issue with respect to accessing this data is not technical (the companies and institutions that possess it can easily publish it in machine-readable form if they choose to) but rather a policy choice, requiring industry leaders to decide whether and how to share this data with peers.

7.3 Sharing Translation Data
Now that virtually all systems are connected to the internet, it is technically straightforward to create shared translation memories that are continually updated as new translations are created. This is particularly important for machine translation and hybrid translation systems that require a large corpus of aligned texts to train the translation engines. Language service providers are an ideal source of texts, since they continually produce high quality, aligned texts that, with client permission, can be fed into these systems. This process can be completely automated so that the language service provider incurs little or no cost to participate in the shared translation memory. TAUS has played a leading role by creating an open global translation memory that, as of December 2012, contained 54 billion words in 2200 language combinations. There are also a number of platforms where data is shared as a function of using a tool. These include Translation Workspace, MyMemory and the Google Translation Toolkit, among others. All but the very largest organizations, which perhaps produce sufficient data to service their own requirements, will need to access others' translation data to exploit the full benefits of translation technology going forward. Whether such data is available to many possible solution providers or only a few will be a factor in the cost and choice of offerings.

Further reading: Clarifying Copyright on Translation Data, Jaap van der Meer and Andrew Joscelyne, TAUS, January 2013 (reproduced in the Special Addendum to this report)

8. Drivers and Inhibitors
The drivers and inhibitors exhibit below highlights the main factors affecting the growth of the translation technology segment.

Drivers and Inhibitors

Source: TAUS

In the short term, global e-commerce and cloud computing are the strongest growth drivers for the segment. Industry resistance to adopting translation technology is currently a major inhibitor. Much of this resistance is likely to abate as the quality of machine translation is seen to improve and industry benchmarking metrics become available. Such metrics are needed to inform and ensure robust technology adoption decisions within the wider translation industry. While speech and machine translation technology have been integrated to create market offerings, the limited quality of output means this driver currently has narrow application. Social media, such as user-generated content, is an increasingly important form of content, often guiding pre- and post-purchase decisions. The continued growth of social media will be a significant driver going forward. Looking further ahead, big translation data and the use of mobile will be significant growth drivers. Until five years ago there were around a dozen commercially viable machine translation providers. Since then a few dozen have entered the market, often exploiting the open source Moses toolkit. The impact of open source is now being counterbalanced by severe pricing pressure and product innovation from proprietary technology providers, such as Microsoft. The authoring team expects open source to have only medium impact as a driver in the next two years. The custodians of open source technology, in particular Moses, will need to be responsive to industry requirements if this approach is to be a stronger driver in the long term. Some cooperation across the translation industry in areas such as establishing common reporting metrics, interoperability and sharing translation data would drive material growth in the translation technology segment. Machine translation breakthroughs on some of the solvable problems will be coming out of laboratories in the next few years. However, there is usually a lag before these can be applied in commercial settings. For the next five years at least, the biggest boosts to machine translation quality will come from selecting and properly preparing the most appropriate translation data. The cost of low standardization is likely to be borne by customers. Continued over-regulation, in particular outdated copyright law, could stifle growth. A talent shortage seems likely as demand grows for specialist knowledge. Whether markets for translation data are open or closed is a key factor affecting the nature of evolution in the segment, the cost of solutions and the motivation to innovate.
By 2020, the semantic web and consolidation with other language technologies will begin to have material impact on the translation technology segment.

9. Methodology
The facts and analyses contained in this report are the result of ongoing research undertaken by TAUS on behalf of its membership since 2005 and the extensive collective experience of the authoring team. The factual information in the report has been drawn from the TAUS Tracker databases and interviews with senior representatives of translation technology companies. The TAUS Tracker is a series of granular translation and language technology directories found at www.taustracker.com. The analyses of market evolution and trends are informed by TAUS reports such as Translation in Turmoil (2006), Localization Business Innovation (2008), The Innovation and Interoperability Roadmap (2009), Enterprise Language Strategy (2010), Scenario Based Planning: Planning For An Uncertain Future (2010), Lack of Interoperability Costs the Translation Industry a Fortune (2011) and Dynamic Quality Framework (2011). TAUS reports are informed by primary research such as survey data, interviews and ideation sessions at events. TAUS events are non-sponsored. They bring together members and other key industry stakeholders to share and define new strategies. This rich foundation can be found at:
Reports Catalogue - http://www.translationautomation.com/reports/search/1
Multimedia Library - http://www.translationautomation.com/videos/videos
We thank our members for their openness to sharing knowledge and ideas to serve the common goals of the translation industry.

SPECIAL ADDENDUM
Clarifying Copyright on Translation Data

Andrew Joscelyne and Jaap van der Meer
First published on www.translationautomation.com, 16 January 2013

We believe it is time for legislators in Europe, the United States, Canada and all other leading nations to clarify the current copyright law on a new technology phenomenon, namely translation data. During the last decade, innovation and creativity in technology, business processes and collective intelligence have made a remarkable impact on the global translation industry. These forces have been generating resources and processes that have revolutionized the way companies and governments can communicate with users and citizens. This has resulted in the creation of a certain type of data that we feel should have an independent existence under the law, or at least be given special status as a technology-induced phenomenon. There are parallels. A similar cry has gone up from communities involved in leveraging knowledge from large text corpora in the life sciences, law and similar big data research fields. In the UK, for example, there are efforts underway to release such data from the constraints of legal protection and allow non-commercial innovative text mining initiatives to thrive by changing the copyright conditions on academic journal content. We would argue that translation memory usage could be conceived as a special case of mining for data, even though its ultimate usage will probably be commercial. Making exceptions under copyright law for specific types of data (translation words), agreed by a broader cohort of legislators, might offer a better path through the legal maze. We now realize that further progress will not simply depend on better technical fixes, but also on solving the daily conflicts between the principles of intellectual property law, corporate policies, business practices, and what we can call pragmatic use of data.
These copyright issues are obviously not the only stumbling block to progress in general, but they do raise important issues over the long term, and relate to fields of technology-driven innovation that are similarly focused on leveraging intelligence from forms of data. Here is a view of how we can collectively improve the legislative framework for all stakeholders in tomorrow's translation industry.

A world without language barriers


This could be a reality within ten to fifteen years. The technology is there. What it will take is a very large-scale coordinated effort between governments, businesses and academia worldwide. We call it the Human Language Project. The goal is to reach sufficient adequacy and fluency in fully automatic translation so that most of the world's citizens can speak and write their own language and be understood by everyone else. In two articles TAUS wishes to focus on the pre-conditions for making this happen. This is the first article: a call on legislators and policy makers to consider how to address copyright law for the new legal entity of translation data. The second article will focus on defining and framing this mega-project.

Let's make translation easier

The French president François Hollande demanded that the Dutch return all the words borrowed from the French language as of the year 2015. The Dutch people will then no longer be allowed to use many words such as "dossier", "portefeuille" and "ordinair" that have become commonly used in the Dutch language. This news item was published on November 15 by the Dutch newspaper de Volkskrant, under the parody section of course. Who would seriously think that national governments can claim ownership over the words that people use? After all, we all copy each other's words in order to be better understood and communicate well. We have learned to do so ever since we were babies and heard our mothers say our first words to us. And yet when reviewing the intellectual property rights on terminology and translations, it seems that we are entering a minefield. Publishers, authors and translators have the right to own words and users need to be aware of that. And the relevant law is different in different parts of the world.

The issues
Multiple layers
Copyright law has had a difficult time in the digital age, now that file copying has become instantaneous and ubiquitous. We all agree that intellectual property rules are vital for protecting business value. Yet the practice of translation and localization (which creates products that we own but which somehow escape our total control as content) raises new and unexpected concerns for many digital stakeholders.

Let's take a closer look at the steps in the translation process and how these relate to copyright. First there is the original work, the document written in the source language. IP rights to this document belong to the author, or to the company that employs the author or that has purchased the services of the author as a subcontractor. Then there is the translation of the full source document. The IP rights can belong to the translator or to the translation company, but are generally transferred to the publisher who hires the translator or the translation company to provide the translation service. It should be noted here that this copyright protection on translations is absolutely automatic under the law: it is not necessary to register the work or go through any red tape to transform a translation file into a copyrighted translation file. But translated works can be registered in an official IP repository to ensure stronger protection where necessary. During the translation process a translation memory (TM) will be created, i.e. a list of all the sentences in source and target languages in a particular database format. Generally speaking, neither the original source document nor the translated document can be recreated automatically from this digital translation memory file. IP rights to the individual sentences, source and target, still belong to the author, translator or the company that employed them or paid for their services. The critical issue here is that TM technology creates a new database with its own format and attributes, and this forms a completely new work with its own IP rights, made out of bits of other works. Once again, the IP rights to the translation memory belong either to the translator or translation company, or they may be transferred to the publisher who paid for the translation service if this is clearly stated in the service agreement.
That said, in Europe alone there is a so-called sui generis right on databases that may need to be reviewed. The 1996 Database Directive points to copyright protection for a database that contains non-original data but which nevertheless required a substantial (intellectual) investment to set it up. In other words, the structure and not the content is protected, though the content may obviously be copyright protected under some other head in the law. The lack of full harmony within all EU countries about copyright issues on databases, and what counts as an original work under the law, is another knotty issue facing the quintessentially cross-border practice of translation and its automated processes.
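To make the object under discussion concrete, the translation memory described above can be pictured as a list of source and target sentence pairs that a tool searches for close matches. The Python sketch below is only an illustration: the pair layout, the `lookup` helper and the 0.75 similarity threshold are assumptions rather than the format of any particular tool, and `difflib.SequenceMatcher` from the standard library stands in for a real fuzzy-matching algorithm.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Hypothetical in-memory translation memory: (source, target) sentence pairs.
TranslationMemory = List[Tuple[str, str]]


def lookup(segment: str, tm: TranslationMemory,
           threshold: float = 0.75) -> Optional[Tuple[float, str]]:
    """Return (similarity, stored target) for the closest stored source
    sentence, or None when nothing reaches the threshold."""
    best: Optional[Tuple[float, str]] = None
    for source, target in tm:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, target)
    return best


tm: TranslationMemory = [
    ("Click the Save button.", "Cliquez sur le bouton Enregistrer."),
    ("Close the window.", "Fermez la fenêtre."),
]

exact = lookup("Click the Save button.", tm)   # identical sentence: full match
unseen = lookup("Unplug the device before cleaning it thoroughly.", tm)  # no usable match

print(exact)
print(unseen)
```

The sketch stores isolated sentence pairs and nothing about their order or the documents they came from, which mirrors the point made above that neither the source document nor its translation can normally be recreated from the memory alone.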

It will mean that the terminology will be more consistent and the price will be lower. If this educated customer provides the tools platform and access to the previously created translation memories, this process could become a welcome industry practice. However, the vast majority of translation buyers rely on their translation vendors to manage the translation memories and tool platforms, even though they may insist in their service agreements that the IP rights to these translation memories belong to them. But what if translation buyers use the services of multiple vendors? What if they change vendors and the new vendor subcontracts the translation jobs to the same translators who did the original translations? In all these cases it will be hard to follow the best practice suggested above. Even if the informed translation buyer legally owns the translation memories, the translation memory as a file (to which the IP rights belong) may not exist in the original form. It has probably been mixed and matched with new translation memories, creating a new database with its own new IP owner. The upshot of all this is that the educated translation buyers are completely lost; they probably wish they knew as little about the issue as the vast majority of organizations that buy translation services. In recent years pragmatists and innovators have entered the market to make the most of this confusion around IP rights for translation memories. Innovators often help clarify issues, prompt welcome changes in inadequate legislation and pave the way for growth. In the translation industry, innovators offer online services and tools platform that make translation faster and easier for end users. These users may not realize that they are allowing these new innovative providers to use their trans lations - not to recreate the original work, but to carry out research on translation technology, and generate derivative work. 
For these innovators, translation memories have become data that help improve automatic translation engines.

A 21st century view on translation


Today's practices, policies and principles of IP legislation all stem from a last-century definition of translation whereby translation memories were merely intended to help the translator do a better job a little faster, somewhat cheaper and more consistently than previously. Today the focus is shifting from translation memories on hard disks to massive amounts of translation data in the cloud, in the form of parallel text corpora. These translation data may be accumulated from translation memories or from online translation service platforms, or harvested (crawled and aligned) from localized versions of web sites and other sources. In addition, these data are often usefully annotated with attributes for domain, content type and source. In this way translation data are becoming the key to quantum leaps forward in translation efficiency.
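To make the idea of annotated translation data concrete, here is a minimal sketch of how one aligned segment pair and its attributes might be represented. It is an illustration only: the class name TranslationUnit, the function select_units, the field names and the sample values are all assumptions made for this example and do not come from any particular tool or platform.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TranslationUnit:
    """One aligned segment pair plus annotation attributes (hypothetical schema)."""
    source_lang: str
    target_lang: str
    source_text: str
    target_text: str
    domain: str        # subject area, e.g. "software"
    content_type: str  # kind of text, e.g. "ui_string"
    origin: str        # where the pair came from, e.g. "web_crawl"


def select_units(units: List[TranslationUnit], domain: str,
                 source_lang: str, target_lang: str) -> List[TranslationUnit]:
    """Return the units that match one domain and one language direction."""
    return [
        u for u in units
        if u.domain == domain
        and u.source_lang == source_lang
        and u.target_lang == target_lang
    ]


# A tiny illustrative corpus; real collections hold millions of such units.
corpus = [
    TranslationUnit("en", "fr", "Please enter your password",
                    "Veuillez saisir votre mot de passe",
                    "software", "ui_string", "translation_memory"),
    TranslationUnit("en", "de", "Please enter your password",
                    "Bitte geben Sie Ihr Passwort ein",
                    "software", "ui_string", "web_crawl"),
    TranslationUnit("en", "fr", "Take one tablet daily",
                    "Prendre un comprimé par jour",
                    "medical", "leaflet", "translation_memory"),
]

software_en_fr = select_units(corpus, "software", "en", "fr")
```

Filtering on the annotations in this way is one plausible route to assembling domain-specific data, for example before training or tuning a machine translation engine.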

Policies, practices and pragmatists


The confusion about who owns what at which moment in the workflow can lead to a conflict between policies and practices. This opens up a source of profit for pragmatists and innovators. An educated customer (i.e. one who is fully aware of the benefits of TM) will deem it good business practice if the translator re-uses the translation memory created during a previous job for a new job.

Google has demonstrated this already in the past five years by training new machine translation engines for 4032 different language pairs by using data, nothing but translation data. In their article "The Unreasonable Effectiveness of Data", Google scientists Alon Halevy, Peter Norvig and Fernando Pereira (published in March-April 2009 in IEEE Intelligent Systems) make the case for anyone who wants to train machines to translate to go out and gather some data, and see what it can do. Since 2009 many language service providers, large and small organizations and new-generation MT developers have followed this advice and started training MT engines with whatever data they could put their hands on.

Her conclusions are that in some cases the use of translation memories may be permitted under the fair use/fair dealing exception to copyright legislation. But this too is a minefield. Translators and translation companies who do not want to upset their customers naturally stay on the safe side of the law and accept the consequences. The translation industry today is under tremendous pressure to keep up with market demand. Volumes keep growing, turnaround times are shrinking and the cost per word has to come down. There are not enough professional translators in the world to meet this demand. By introducing more openness and greater shareability of data under specific conditions into the current copyright law, the translation industry would be able to innovate and automate processes more efficiently, and in everyone's interest. If translation as data can be used freely to develop derivative work, improve quality and drive research into new technologies, then the industry - and that means all stakeholders: buyers, vendors and technology suppliers - would be able to flourish as never before. We could expect a broader range of services, much larger capacity, and new opportunities for technology innovation. There will naturally be a concern to protect the underlying intent of copyright law - i.e. to protect originality and creative work. But in fairness, the IP criteria of originality and creativity should only be applied to a fraction of the words that are actually published these days. How many different translations would a user in any given country in the world like to see of the sentence "Please enter your password"? Companies, governments and NGOs along with citizens, end-users, taxpayers and patients everywhere would benefit enormously if ninety percent of the words written and translated can be leveraged as data to drive better, quicker communication.

Summary of key issues


What we are suggesting here is that complex multi-layer IP rights, conflicts between policies and practices, and somewhat old-fashioned definitions of translation memory and data tend to make it harder to build a prosperous, innovative and fast-growing global translation industry that helps the world to communicate better. Obviously organizations need to legally protect their ownership rights over their content and its translations. But we feel that a clearer distinction between the words of their content as data, and the structure of their content as published documents could be instrumental in opening up language data as a collective good.

The ideal situation


What the industry needs
In a November 2007 article entitled "You Must Remember This: The Copyright Conundrum of Translation Memory Databases", Francie Gow studied the precarious position, with respect to copyright law, of translators who use translation memory tools. Copyright law does not allow translators and translation companies to use the translation memories they have built for one customer to help them on projects for another customer. Legally speaking they are required to destroy their translation memories after projects are completed. That seems to be an extremely counterproductive ruling. Customers choose to work with professional translators because they are experienced and skilled. "Like lawyers and management consultants, they are expected to keep libraries of past work to build on in the future," says Francie Gow, who investigates whether a case can be made for fair dealing under Canadian and US law when translators keep copies of translation memories for the leveraging of new translation jobs.

What the world needs now


If language were no longer a barrier for a Japanese student reading a French newspaper and a German consumer placing an order on a Greek web shop, the world would look very different. We would have a much better understanding across cultures, which might in turn diminish some of the risks of international conflict and political disintegration. Global business would grow exponentially. Breaking the language barrier would help streamline the globalization of business and politics. Erik Ketzan, in his article "Rebuilding Babel: Copyright and the Future of Machine Translation Online" (2006), says: "Technology may have put us on the moon, but machine translation has the potential to take us farther, across the gulf of comprehension that lies between people from different places." The ideals of a prosperous global translation industry and a world of better communications will benefit enormously from greater clarity about the translation-specific nature of copyright law.


New principles for a revision of copyright law


We estimate that 75% of texts written and translated in the world these days are published online. This makes it very hard to protect the usual intellectual property rights of all these texts. Innovators are active every day using automatic spiders to crawl the web and align translations. In North America this use of translation data may be allowed under the exception of fair use and fair dealing. Europe tends to interpret copyright law on a much stricter basis. Yet the innovators are ubiquitous, and they range from very large global IT companies in the USA to small start-ups in any part of the world. It is hard to call them thieves and impossible to prosecute them. This creates unfair competition both inside and outside the translation industry. One way to clarify the law would be to create a more open, sharing translation environment, in which every stakeholder is free and able to use data to optimize their translation process. Obviously the act of modernizing copyright law to reflect this reality among the leading nations of the world will not alone pave the way to a wonderful new world of global translation. There are many other issues at stake. But we would like to propose a few simple principles that focus on the specifics of the translation issue:

1.  A clear distinction must be made between the way Intellectual Property (IP) rights are treated for the text to be translated (the Source), the translation (the Target) and Translation Data as a new legal entity.
2.  Translation Data are defined as a database containing terms, phrases and segments of text, aligned between two or more languages. Translation Data in most cases contain phrases and segments from many Sources and Targets. If the database allows users to reconstruct the Source or Target, as referred to in the first principle above, this will be considered an infringement on the IP rights assigned to the Source and Target.
3.  IP rights to the Source and Target may be held exclusively by the author, the translator or the company that is publishing the Source and Target.
4.  Translators and translation companies should be allowed to store, share and aggregate Translation Data for the purposes of developing derivative work, leveraging and reusing translations, research, and improving their services.
5.  The translator or the company that aggregates the Translation Data holds IP rights to the Translation Data in the form in which the data are stored and used in the database.
6.  Owners of Source and Target should know that they can legally protect their documents from copying when they publish on the web.

We hope that these six simple principles help to start the discussion towards a more open, data-sharing vision in the translation industry. We welcome your comments.

TAUS Members
ABBYY Able Translations Acclaro Acrolinx Adobe Amesto Appen Butler Hill Attached BV Autodesk AVB Vertalingen BBN Raytheon Beo Bothof CA Technologies Capita Celer Soluciones Charles University Cisco Citrix Clay Tablet Cloudwords CLS Communication CNGL Concorde Crestec Cross Language DELL Digital Linguistics Dublin City University eBay/Paypal EC Innovations EMC European Patent Office FBI NVTC Global Textware Google Harley Davidson Hewlett-Packard Pactera Honyaku Center Hunnect iDisc Informatica Inpokulis Intel Iolar Jensen Localization John Deere Johns Hopkins University Kilgray Kingdom Site Ministries L&L Language Intelligence Language Service Associates LDS Church Lexcelera Lingo 24 LingoSail Technology Lingosoft Oy Linguistic Systems Lionbridge Logrus Lucy Software Manpower Medilingua Medtronic Memsource Merrill Brink Microsoft Molina Healthcare Moravia Worldwide Morphologic MultiCorpora R&D Inc NICT Oracle Pangeanic Philips PROMT PTC PTSGI Quagnito R.R. Donnelley Safaba SDL Semantix Siemens SimpleShift Skrivanek Smartling SpeakLike Spoken Translation STP Nordic Straker Software Symantec Systran Tauyou Tekom Tembua Tilde Translated.net TransN TripAdvisor Trusted Translations University of Helsinki Urban Translation Service Welocalize Win and Winnow XTM International Yahoo! Yamagata Europe


TAUS is an innovation think tank and platform for industry-shared services, resources and research for the translation sector globally. We envision translation as a standard feature, a ubiquitous service. Like the Internet, electricity, and water, translation is one of the basic needs of human civilization. Our mission is to increase the size and significance of the translation industry to help the world communicate better. We support entrepreneurs and principals in the translation industry to share and define new strategies through a comprehensive range of events, publications and knowledge tools.
