Machine Translation Pipeline Meets Business Trends

This week we will carry out final integration and deployment tests on our distributed pipeline for large-scale, continuous translation scenarios that lean heavily on machine translation.

We built this platform in response to the demands and trends reported by industry experts such as Common Sense Advisory and Kantan.

The platform features several configurable services that can be switched on as required. These include:

  • automated source pre-editing prior to passing to a choice of custom machine translation engines;
  • integrated pre-MT translation memory leverage;
  • automated post-edit of raw machine translation prior to human post-edit;
  • in-process, low-friction capture of actionable feedback on MT output from humans;
  • automated post-processing of human post-edit;
  • automated capture of edit distance data for BI and reporting.
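To illustrate the last item in the list above, edit-distance capture typically amounts to comparing the raw MT output with the final human post-edit using a standard Levenshtein calculation. Here is a minimal sketch in Python; the function names and the word-level granularity are illustrative assumptions, not the platform's actual API:

```python
def levenshtein(a: str, b: str) -> int:
    """Word-level Levenshtein distance between two segments."""
    s, t = a.split(), b.split()
    # prev[j] holds the edit distance between s[:i-1] and t[:j]
    prev = list(range(len(t) + 1))
    for i, sw in enumerate(s, 1):
        curr = [i]
        for j, tw in enumerate(t, 1):
            cost = 0 if sw == tw else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def post_edit_density(mt: str, post_edit: str) -> float:
    """Normalised edit distance: 0 = untouched, 1 = fully rewritten."""
    n = max(len(mt.split()), len(post_edit.split()), 1)
    return levenshtein(mt, post_edit) / n
```

Normalising by segment length, as in `post_edit_density`, is what makes the numbers comparable across segments for BI and reporting purposes.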

The only components still missing are the text analysis and text classification algorithms, to be integrated during May, which will give us the ability to run automated quality assurance on every single segment. Yes, every one – no spot-checking or limited-scope audits.

The platform is distributed and uses industry-standard formats, including XLIFF and ITS, which makes it both scalable and extensible. Notably, it delivers on all six of the trends recently publicised by Kantan. Thanks to Olga O’Laoghaire, who made significant contributions to the post-editing components, and Ferenc Dobi, lead architect and developer.
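To give a sense of what the XLIFF interchange looks like in practice, here is a minimal sketch of extracting source/target segment pairs from an XLIFF 1.2 document using Python's standard library. The function name is illustrative, and real files in the pipeline would carry additional metadata (such as ITS annotations) that this sketch ignores:

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 places all elements in this namespace
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

def read_segments(xliff_text: str):
    """Yield (id, source, target) tuples from an XLIFF 1.2 document."""
    root = ET.fromstring(xliff_text)
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.find("x:source", NS)
        target = unit.find("x:target", NS)
        yield (unit.get("id"),
               "".join(source.itertext()) if source is not None else "",
               "".join(target.itertext()) if target is not None else "")
```

Because every service in the pipeline reads and writes the same interchange format, components like pre-editing and post-edit capture can be switched on or off without the others noticing.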

I’m very excited to see the fruition of this project. It doesn’t just give us the ability to generate millions of words of translated content; it delivers a controlled environment in which we can apply state-of-the-art techniques that are highly optimised at every stage, measurable, and designed to target the goal of fully automated usable translation (FAUT).

GALA Innovations in Language Technology

I presented on the Internationalization Tag Set 2.0 and gave a demonstration of Reviewer’s Workbench at yesterday’s GALA “Innovations in Language Technology” pre-Think Latin America event. It seemed to go well: I couldn’t spot anyone sleeping.

Highlights of the various presentations

Vincent Wade, CNGL – Research at CNGL

Prof. Vincent Wade, Director of CNGL, set the stage for the afternoon by talking about the challenges of volume, variety and velocity and the arrival of Intelligent Content, followed by an overview of the research activities at the Centre.

Steve Gotz talked knowledgeably (as he always does) about the differences between invention and innovation. Seemingly our industry has been guilty of only doing incremental innovation rather than disruptive invention. Luckily CNGL can help with the latter.

Tony O’Dowd, Kantan – Machine Translation and Quality

Tony talked about the dichotomy of machine translation quality metrics used by system developers versus the measurements that are more of interest to those downstream from the raw MT output: Post-Editors, Project Managers, etc. He proposed an interesting way of bridging this divide.

Reinhard Schäler, Rosetta Foundation – Collaborative Translation and Non-market Localization Models

Reinhard talked about the great work being done by volunteer translators and how this highly collaborative model could influence the future of the industry in the medium to long term. He also covered the open-source Solas localization platform, which is the backbone of the Rosetta production environment and includes a component called “Solas Match”: a dating application for “connecting translators to content”.

Summary

Between presentations there were some stimulating and interesting discussions around the impact that disruptive technologies could have on the industry, the challenges of carrying out innovation in the industry, the future of Language Service Providers, and non-market localization.

This type of conversation probably doesn’t happen enough in the industry, particularly between service providers, possibly because we are all concerned about differentiating our offerings. However, as Arle Lommel pointed out to me, if those differentiating factors can be assimilated by someone else within the space of an afternoon, it probably wasn’t much of a differentiator!