Monthly Archives: March 2015

Machine Translation of Software

Software distinguishes itself as a content type by being serialized in many different formats along with optional metadata describing the type of user interface control it will appear on, what user interface real-estate it should occupy and possibly other related data. When working on cross-platform or multi-platform products, one invariably bumps up against several of these serialization (resource) formats.

It is standard practice on localization projects to reuse as much translation as possible from previous releases, saving time and effort in quality assurance and, of course, cost. This process of leveraging work from one product release to the next should be done with attention to context. Good tool support is therefore a real necessity in scenarios such as these.

We are embarking on a large enterprise software project in which we want to utilise machine translation in addition to reusing previous human translations. I’d like to give a shout out to Cristiano Maggi and Enda McDonnell at our long-time business partner, Alchemy Software, for giving us the facilities and assistance to build a Microsoft Translator Hub connector for Catalyst 11.

Now we can safely and reliably employ a combined Translation Memory and Machine Translation process across a project that involves many software resource formats.

FREME

The web site for our new European Commission-funded Horizon 2020 project went live on 2015-03-27. I’m very excited about this project. It encompasses many important current topics: Big Linguistic Linked Data; the Semantic Web; NLP Technologies; Linguistic Linked Data Interoperability; and Intelligent and Enriched Content.

My goals for the project include new features for our open-source editor, Ocelot. The planned features will further integrate it with other linguistic technologies and standards, not least the Semantic Web and Linked Linguistic Data Clouds themselves.

Having missed the project kick-off in Berlin in February, I’m looking forward to meeting all of the world-class academic and industry partners.

 

My crazy idea for NIF

I was recently invited to join a LIDER call to talk about my use case ideas for NIF. Here’s what the idea is:

Context

We often have to provide translations for content which is not literal but metaphorical or idiomatic. For example, “My destination is only a ‘hop-and-a-skip’ from my home.” might be translated as “Mein Ziel ist nur ein ‘Katzen-sprung’ von meinem Zuhause.”

Describing in NIF with a relationship

I suggest that you could model this translation as:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix vt: <http://www.vistatec.com/rdf> .

<http://example.com/exampledoc.html#char=0,55>
a nif:Context ;
a nif:RFC5147String ;
nif:beginIndex "0" ;
nif:endIndex "55" ;
nif:isString "My destination is only a 'hop-and-a-skip' from my home." .

<http://example.com/exampledoc.html#char=26,40>
nif:beginIndex "26" ;
nif:endIndex "40" ;
a nif:RFC5147String ;
itsrdf:hasLocQualityIssue [
a itsrdf:LocQualityIssue ;
itsrdf:locQualityIssueType "uncategorized" ;
];
nif:referenceContext <http://example.com/exampledoc.html#char=0,55> .

<http://example.com/exampledoc-de.html#char=0,57>
a nif:Context ;
a nif:RFC5147String ;
nif:beginIndex "0" ;
nif:endIndex "57" ;
nif:isString "Mein Ziel ist nur ein 'Katzen-sprung' von meinem Zuhause." .

<http://example.com/exampledoc-de.html#char=23,36>
nif:beginIndex "23" ;
nif:endIndex "36" ;
a nif:RFC5147String ;
itsrdf:hasLocQualityIssue [
a itsrdf:LocQualityIssue ;
itsrdf:locQualityIssueType "uncategorized" ;
];
nif:referenceContext <http://example.com/exampledoc-de.html#char=0,57>.


<http://example.com/exampledoc.html#char=26,40>
vt:translatedAs <http://example.com/exampledoc-de.html#char=23,36>.

Surely then this model could be extended to give a comprehensive representation of such non-literal translations in a way that NLP tools could consume.
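
As a rough illustration of how a tool might produce the offsets used in the model above, here is a minimal Python sketch. It assumes offsets are counted in characters the way Python indexes strings, and that the idiom appears once in each sentence. The helper names (nif_uri, phrase_span, translated_as) are mine and purely hypothetical, and vt:translatedAs is the illustrative property from the example rather than part of NIF.

```python
from typing import Tuple


def nif_uri(doc_uri: str, begin: int, end: int) -> str:
    """Build a fragment URI of the form <doc>#char=begin,end."""
    return f"{doc_uri}#char={begin},{end}"


def phrase_span(text: str, phrase: str) -> Tuple[int, int]:
    """Return (beginIndex, endIndex) of the first occurrence of phrase."""
    begin = text.find(phrase)
    if begin < 0:
        raise ValueError(f"{phrase!r} not found in text")
    return begin, begin + len(phrase)


def translated_as(src_doc: str, src_text: str, src_phrase: str,
                  tgt_doc: str, tgt_text: str, tgt_phrase: str) -> str:
    """Emit a Turtle statement linking the source span to the target span."""
    src = nif_uri(src_doc, *phrase_span(src_text, src_phrase))
    tgt = nif_uri(tgt_doc, *phrase_span(tgt_text, tgt_phrase))
    return f"<{src}>\nvt:translatedAs <{tgt}> ."


EN = "My destination is only a 'hop-and-a-skip' from my home."
DE = "Mein Ziel ist nur ein 'Katzen-sprung' von meinem Zuhause."

if __name__ == "__main__":
    # Context URIs cover the whole sentence: 0..len(text)
    print(nif_uri("http://example.com/exampledoc.html", 0, len(EN)))
    print(nif_uri("http://example.com/exampledoc-de.html", 0, len(DE)))
    # Link the English idiom to its German counterpart
    print(translated_as(
        "http://example.com/exampledoc.html", EN, "hop-and-a-skip",
        "http://example.com/exampledoc-de.html", DE, "Katzen-sprung",
    ))
```

Run against the two example sentences, this gives spans 26,40 and 23,36 for the idioms and context lengths of 55 and 57, which is what the Turtle above should carry in its index values and fragment identifiers.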