Monthly Archives: April 2013

GALA Innovations in Language Technology

I presented on the Internationalization Tag Set 2.0 and gave a demonstration of Reviewer’s Workbench at yesterday’s GALA “Innovations in Language Technology” pre-Think Latin America event. It seemed to go well: I couldn’t spot anyone sleeping.

Highlights of the various presentations

Vincent Wade, CNGL – Research at CNGL

Prof. Vincent Wade, Director of CNGL, set the stage for the afternoon by talking about the challenges of volume, variety and velocity, and the arrival of Intelligent Content, before giving an overview of the research activities at the Centre.

Steve Gotz talked knowledgeably (as he always does) about the difference between invention and innovation. Apparently our industry has been guilty of pursuing only incremental innovation rather than disruptive invention. Luckily, CNGL can help with the latter.

Tony O’Dowd, Kantan – Machine Translation and Quality

Tony talked about the gap between the machine translation quality metrics used by system developers and the measurements that matter more to those downstream of the raw MT output: post-editors, project managers, etc. He proposed an interesting way of bridging this divide.

Reinhard Schäler, Rosetta Foundation – Collaborative Translation and Non-market Localization Models

Reinhard talked about the great work being done by volunteer translators and how this highly collaborative model could influence the future of the industry in the medium to long term. He also covered the open source Solas localization platform, which is the backbone of the Rosetta production environment and includes a component called “Solas Match”: a dating application for “connecting translators to content”.

Summary

Between presentations there were stimulating discussions about the impact that disruptive technologies could have on the industry, the challenges of innovating within it, the future of Language Service Providers, and non-market localization.

There probably isn’t enough of this type of conversation in the industry, particularly between service providers, possibly because we are all concerned about differentiating our offerings. However, as Arle Lommel pointed out to me, if a differentiating factor can be assimilated by someone else within the space of an afternoon, it probably wasn’t much of a differentiator!

A Personal Contribution to Global Intelligent Content

Global Intelligent Content

As Chief Technology Officer of VistaTEC, I was fortunate that we were one of the founding Industrial Partners of the Science Foundation Ireland-funded Centre for Next Generation Localisation (CNGL). CNGL has just received support for a further term with the overall research theme of “Global Intelligent Content”. I therefore thought it appropriate that my first post should actively demonstrate and support this vision.

So, what’s so “intelligent” about this post?

If you have any basic understanding of HTML, you’ll know that the page you’re reading is composed of mark-up tags (elements) such as <p>, <span> and <h1>. The mark-up allows your browser to display the page so that it is easy to comprehend (e.g. headings, paragraphs, bold, italic) and to interact with (e.g. hyperlinks to other related web documents). You may also know that it can contain “keywords” or “tags”: individual words or phrases which indicate to search engines what the subject matter of the post is. This post certainly contains all of these.

The page also includes a lot of “metadata”. This metadata conforms to two standards, each of which is set to transform the way in which multilingual intelligent content is produced, published, discovered and consumed.

Resource Description Framework in Attributes

In layman’s terms, RDFa is a way of embedding sense and definition into a document so that non-human agents (machines and computer programs) can read and “understand” the content. RDFa is one mechanism for building the Multilingual Semantic Web.

If you right-click this page in your browser and choose “View Source”, you’ll see that it contains attributes (which give generic HTML tags more specific meaning) such as property and typeof. These allow web robots to understand the parts of the content that I have decorated at a much more fundamental level: for example, that I created the page, which vocabulary I have used to describe the people, organisations and concepts within the document, and details about them. This data can form the basis of wider inferences about the relationships between people and content.
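To give a feel for what a robot does with those attributes, here is a minimal Python sketch that uses only the standard library’s html.parser to pull property/typeof pairs out of a fragment. The fragment, the schema.org vocabulary and the values in it are hypothetical, not copied from this page’s source, and the scanner is deliberately simplified: a real RDFa processor also resolves subjects, prefixes, resource and href attributes and produces proper RDF triples.

```python
from html.parser import HTMLParser

# Hypothetical fragment: the schema.org vocabulary and the values below are
# illustrative only, not copied from this page's actual source.
SAMPLE = """
<div vocab="http://schema.org/" typeof="BlogPosting">
  <h1 property="headline">A Personal Contribution to Global Intelligent Content</h1>
  <span property="author" typeof="Person">
    <span property="name">Example Author</span>
  </span>
</div>
"""

# Elements that never have an end tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class RdfaAttributeScanner(HTMLParser):
    """Collect (enclosing type, property, value) tuples from RDFa-style
    property/typeof attributes. Simplified sketch: assumes properly nested
    markup and ignores prefix, resource and href handling."""

    def __init__(self):
        super().__init__()
        self.types = []    # stack of typeof values currently in scope
        self.stack = []    # one frame per open non-void element
        self.results = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        frame = {
            "typeof": a.get("typeof"),
            "prop": a.get("property"),
            "content": a.get("content"),
            # A property belongs to the item enclosing this element, even
            # when the element also starts a new item with typeof.
            "parent": self.types[-1] if self.types else None,
            "text": [],
        }
        if frame["typeof"]:
            self.types.append(frame["typeof"])
        if tag in VOID:
            self._close(frame)   # no end tag will arrive
        else:
            self.stack.append(frame)

    def handle_endtag(self, tag):
        if tag in VOID or not self.stack:
            return
        self._close(self.stack.pop())

    def handle_data(self, data):
        # Text counts towards every open element that contains it.
        for frame in self.stack:
            frame["text"].append(data)

    def _close(self, frame):
        if frame["typeof"]:
            self.types.pop()
        if not frame["prop"]:
            return
        if frame["typeof"]:
            value = "[" + frame["typeof"] + "]"   # value is a nested item
        elif frame["content"] is not None:
            value = frame["content"]              # e.g. <meta content="...">
        else:
            value = " ".join("".join(frame["text"]).split())
        self.results.append((frame["parent"], frame["prop"], value))


def extract_rdfa_properties(html):
    scanner = RdfaAttributeScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.results


if __name__ == "__main__":
    for row in extract_rdfa_properties(SAMPLE):
        print(row)
```

Run against the sample, it prints three tuples: the headline and author of the BlogPosting, and the name of the nested Person. Recording the nested Person as the value of author is the kind of relationship a crawler can build wider inferences on.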

Internationalization Tag Set 2.0

ITS 2.0 is a brand new W3C standard being developed, with funding from the European Commission, by the MultilingualWeb-LT (Language Technologies) Working Group, part of the W3C Internationalization Activity. Its goal is to define categories of metadata relating to the production and publishing of multilingual web content.

To illustrate this, the overview of ITS 2.0 below was translated from German into English using the Microsoft Bing machine translation engine. If you view the source of this page and search for “its-”, you will find the ITS Localization Quality metadata with which I annotated the translation to record my review of the English output.
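Finding that metadata can also be automated. The Python sketch below, again using only html.parser, collects every element carrying attributes whose names start with its- together with the text it annotates. It assumes ITS 2.0’s HTML5 local markup (for example its-loc-quality-issue-type, its-loc-quality-issue-comment and its-loc-quality-issue-severity); the sample annotation on “Lokalisierungsworkflows” is hypothetical rather than taken from this page, and the scanner ignores global rules, inheritance and standoff issue lists, so it is not a conformant ITS processor.

```python
from html.parser import HTMLParser

# Hypothetical ITS 2.0 local markup in HTML5 (its-* attributes); the values
# are illustrative, not copied from this page's actual source.
SAMPLE = """
<p>... used in various processes such as
<span its-loc-quality-issue-type="untranslated"
      its-loc-quality-issue-comment="German compound left in the English output"
      its-loc-quality-issue-severity="50">Lokalisierungsworkflows</span>
can be ...</p>
"""

# Elements that never have an end tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ItsAnnotationScanner(HTMLParser):
    """Collect (annotated text, its-* attributes) pairs. A sketch, not a
    conformant ITS processor: it assumes properly nested markup and ignores
    global rules, inheritance and standoff issue lists."""

    def __init__(self):
        super().__init__()
        self.stack = []        # per open element: (its_attrs, text parts) or None
        self.annotations = []

    def handle_starttag(self, tag, attrs):
        its = {name: value for name, value in attrs if name.startswith("its-")}
        if tag in VOID:
            if its:
                self.annotations.append(("", its))   # void: no text inside
            return
        self.stack.append((its, []) if its else None)

    def handle_endtag(self, tag):
        if tag in VOID or not self.stack:
            return
        entry = self.stack.pop()
        if entry is not None:
            its, parts = entry
            text = " ".join("".join(parts).split())   # collapse whitespace
            self.annotations.append((text, its))

    def handle_data(self, data):
        # Text counts towards every open annotated element that contains it.
        for entry in self.stack:
            if entry is not None:
                entry[1].append(data)


def find_its_annotations(html):
    scanner = ItsAnnotationScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.annotations


if __name__ == "__main__":
    for text, its in find_its_annotations(SAMPLE):
        issue = its.get("its-loc-quality-issue-type", "?")
        severity = its.get("its-loc-quality-issue-severity", "?")
        print(f"{text!r}: {issue} (severity {severity})")
```

For the sample this prints a single line reporting “Lokalisierungsworkflows” as untranslated with severity 50. Matching on the its- prefix rather than a fixed list of names keeps the sketch useful for other ITS data categories too.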

“The goal of MultilingualWeb LT (multilingual Web – language technologies) it is to demonstrate how such metadata encoded, safely passed on and used in various processes such as Lokalisierungsworkflows can be and frameworks such as Okapi, machine translation, or CMS systems like Drupal.
Instead of a theoretical and institutional approach to standardization, LT-Web aims to develop implementations, which concretely demonstrates the value of metadata with real systems and users. The resulting conventions and results are documented and published as a W3C standard, including the necessary documentation, data and test suite, as the W3C standardization process requires it.”

Summary

I’m very excited about Global Intelligent Content. This post is a very small and personal contribution to the vision but hopefully it illustrates in a simple way what it is about and some of its possibilities.