Tag Archives: Localization

mojito

I took a look at Box’s mojito recently. I really like it.

Mature, full-featured translation management systems are large, monolithic applications that can sometimes be difficult to navigate and learn. It is refreshing to see something simple and lightweight. Yes, mojito’s feature set is limited, but if you’re a start-up looking for a way to turn around translations online and you have a limited technology budget, mojito is worth a look.

It was easy to install, comprising just two jar files: one for the web application (which includes an in-memory database and an embedded web server for trial purposes) and one for the client command-line interface.


Machine Translation of Software

Software distinguishes itself as a content type by being serialized in many different formats, along with optional metadata describing the type of user interface control the text will appear on, the user interface real estate it should occupy, and possibly other related data. When working on cross-platform or multi-platform products, one invariably bumps up against several of these serialization (resource) formats.

It is standard practice on localization projects to reuse as much translation as possible from previous releases, saving time and effort in quality assurance and, of course, cost. This process of leveraging work from one product release into the current one should be done with attention to context. It’s a real necessity, then, to have tool support when working on scenarios such as these.

We are embarking on a large enterprise software project in which we want to utilise machine translation in addition to reusing previous human translations. I’d like to give a shout-out to Cristiano Maggi and Enda McDonnell at our long-time business partner, Alchemy Software, for giving us the facilities and assistance to build a Microsoft Translator Hub connector for Catalyst 11.

Now we can safely and reliably employ a combined Translation Memory/Machine Translation process across a project that involves many software resource formats.

A Personal Contribution to Global Intelligent Content

Global Intelligent Content

As Chief Technology Officer of VistaTEC, I was fortunate to be one of the founding Industrial Partners of the Science Foundation Ireland-funded Centre for Next Generation Localisation (CNGL). CNGL has just received support for a further term with the overall research theme of “Global Intelligent Content”. I therefore thought it appropriate that my first post should actively demonstrate and support this vision.

So, what’s so “intelligent” about this post?

If you have a basic understanding of HTML you’ll know that the page you’re reading is composed of mark-up tags (elements) such as <p>, <span>, and <h1>. The mark-up allows your browser to display the page so that it is easy to comprehend (e.g. headings, paragraphs, bold and italic text) and also to interact with (e.g. hyperlinks to other related web documents). You may also know that it can contain “keywords” or “tags”: individual words or phrases which indicate to search engines what the subject matter of the post is. This post certainly does contain all of these.
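As a rough sketch only (the real markup of this page is produced by the blogging platform and is considerably richer), a pared-down HTML document using these building blocks might look something like this:

    <!DOCTYPE html>
    <html lang="en">
      <head>
        <title>A Personal Contribution to Global Intelligent Content</title>
        <!-- "keywords" hint to search engines what the page is about -->
        <meta name="keywords" content="Localization, RDFa, ITS 2.0, intelligent content">
      </head>
      <body>
        <h1>Global Intelligent Content</h1>  <!-- a heading -->
        <p>A paragraph with <span>inline text</span> and a
           <a href="http://www.w3.org/">hyperlink</a> to a related document.</p>
      </body>
    </html>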

The page also includes a lot of “metadata”. This metadata conforms to two standards, each of which is set to transform the way in which multilingual intelligent content is produced, published, discovered and consumed.

Resource Description Framework in Attributes

In layman’s terms, RDFa is a way of embedding sense and definition into a document in such a way that non-human agents (machines and computer programs) can read and “understand” the content. RDFa is one mechanism for building the Multilingual Semantic Web.

If you right-click this page in your browser and choose “View Source”, you’ll see that it contains attributes (qualifiers which give generic HTML tags more specific characteristics) such as property and typeof. These allow web robots to understand, at a much more fundamental level, those parts of the content that I have decorated: for example, that I created the page, the vocabulary I have used to describe people, organisations and concepts within the document, and details about them. This data can form the basis of wider inferences regarding personal and content relationships.
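To make that concrete, here is an illustrative sketch only: the vocabulary and values below are assumptions (schema.org is one widely used vocabulary, not necessarily the one on this page), but they show how property and typeof give plain tags machine-readable meaning:

    <!-- Illustrative RDFa; the vocabulary, types and property names are assumptions -->
    <div vocab="http://schema.org/" typeof="BlogPosting">
      <h1 property="headline">A Personal Contribution to Global Intelligent Content</h1>
      <p>
        Written by
        <span property="author" typeof="Person">
          <span property="name">the author</span>,  <!-- placeholder name -->
          <span property="jobTitle">Chief Technology Officer</span> of
          <span property="worksFor" typeof="Organization">
            <span property="name">VistaTEC</span>
          </span>
        </span>
      </p>
    </div>

A crawler that understands RDFa can read from markup like this, without guesswork, who created the page, their role and the organisation they work for.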

Internationalization Tag Set 2.0

ITS 2.0 is a brand-new W3C standard being developed by the MultilingualWeb-LT (Multilingual Web Language Technologies) Working Group, which is funded through the European Commission and is part of the W3C Internationalization Activity. Its goal is to define categories of metadata relating to the production and publishing of multilingual web content.

To exemplify this, the overview of ITS 2.0 below was translated from German to English using the Microsoft Bing machine translation engine. If you view the source of this page and search for “its-”, you will find the ITS Localization Quality metadata with which I annotated the translation to capture my review of the target English; a sketch of what that markup looks like follows the quotation.

“The goal of MultilingualWeb LT (multilingual Web – language technologies) it is to demonstrate how such metadata encoded, safely passed on and used in various processes such as Lokalisierungsworkflows can be and frameworks such as Okapi, machine translation, or CMS systems like Drupal.
Instead of a theoretical and institutional approach to standardization, LT-Web aims to develop implementations, which concretely demonstrates the value of metadata with real systems and users. The resulting conventions and results are documented and published as a W3C standard, including the necessary documentation, data and test suite, as the W3C standardization process requires it.”
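As a sketch only (the exact issue types, comments and severities used on this page may differ), ITS 2.0 Localization Quality Issue markup attaches a reviewer’s findings directly to the translated text through its-* attributes, for example around the untranslated German term in the quotation above:

    <!-- Sketch of ITS 2.0 Localization Quality Issue markup; the values are illustrative -->
    <p>
      ... used in various processes such as
      <span its-loc-quality-issue-type="untranslated"
            its-loc-quality-issue-comment="German compound left untranslated; should read 'localization workflows'"
            its-loc-quality-issue-severity="60">Lokalisierungsworkflows</span>
      can be ...
    </p>

Because the annotations travel inside the HTML itself, any ITS-aware tool further down the line can pick up the review verdicts without needing a separate report.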

Summary

I’m very excited about Global Intelligent Content. This post is a very small and personal contribution to the vision, but hopefully it illustrates in a simple way what it is about and some of its possibilities.