Monthly Archives: July 2014

Crunching Post-editing Numbers

I have spent the last few days crunching numbers relating to the post-editing of a 17,000-source-word document that has been machine translated into three languages.

The reason it takes a few days is that for each document, in each language, I have the following data available to me:

  • The time in seconds spent editing each segment;
  • A Review Sentinel Conformance Score for each segment;
  • The raw machine translation output and the post-edited target string for each segment, allowing me to generate TER and GTM scores.
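TER, roughly speaking, is the number of edits needed to turn the machine translation into the post-edited reference, divided by the reference length. As a sketch only, here is a shift-free variant (plain word-level edit distance over reference length); the real TER algorithm also counts phrase shifts, so treat this as a rough proxy rather than the official metric:

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over word tokens (no TER shift operation)."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))
    for i, hw in enumerate(h, 1):
        cur = [i]
        for j, rw in enumerate(r, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (hw != rw)))    # substitution
        prev = cur
    return prev[-1]

def simple_ter(mt_output, post_edited):
    """Edit distance divided by reference word count, as a rough TER proxy."""
    ref_len = len(post_edited.split()) or 1
    return word_edit_distance(mt_output, post_edited) / ref_len
```

A segment that needed one word substituted out of four scores 0.25; an untouched segment scores 0.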

With plenty of data comes plenty of work. Many of the automated metrics utilities work with plain text and have no concept of inline tags. This means lots of work converting from one serialisation format to another and re-formatting or removing tags. Once again I have found PowerGREP very helpful during this process.
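The tag-removal step can be sketched with a regular expression. This assumes XLIFF-style inline elements (bpt, ept, ph, it, g, x, mrk) and is illustrative rather than a robust XML parse; for production I would still reach for a proper parser or a tool like PowerGREP:

```python
import re

# Matches XLIFF-style inline tags such as <bpt id="1">, </ept>, <ph id="2"/> or <g>.
INLINE_TAG = re.compile(r'</?(?:bpt|ept|ph|it|g|x|mrk)\b[^>]*>')

def strip_inline_tags(segment):
    """Remove inline markup and collapse the whitespace left behind."""
    text = INLINE_TAG.sub('', segment)
    return re.sub(r'\s+', ' ', text).strip()
```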

A final challenge is that most automatic metrics tools just report line-based measures using the input line number rather than any original identification number (which probably had to be stripped out anyway). This means that a slight error in line totals or ordering can really throw the results out.
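A cheap safeguard is to refuse to pair line-numbered scores back to segment IDs when the counts have drifted apart. A minimal sketch, assuming one segment per line in both inputs:

```python
def pair_scores(segment_ids, scores):
    """Pair original segment IDs with line-ordered metric scores,
    refusing to continue if the counts don't match."""
    if len(segment_ids) != len(scores):
        raise ValueError(
            f"{len(segment_ids)} segments but {len(scores)} scores - "
            "a dropped or duplicated line would shift every result after it")
    return dict(zip(segment_ids, scores))
```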

I’m hoping to identify some interesting correlations and insights. Stay tuned.
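An obvious first pass is to correlate per-segment editing time against each metric. A sketch using a hand-rolled Pearson coefficient (the sample data below is invented for illustration, not from my actual dataset):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# e.g. pearson(edit_seconds_per_segment, ter_per_segment)
```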

Accessing Content

During the last six weeks we’ve seen a proliferation of requests for connections to (integrations with) Content Management Systems (including content silos and repositories): GitHub, WordPress, Drupal, Marketo, WebCenter Sites and Zendesk.

Wanting to connect content to a translation pipeline is a logical and reasonable ask. You want to get content out to the largest audience with the least process friction, and publishing simultaneously to several geographies is a standard want.

It’s actually not that easy to achieve, though, even in these days of well-known internationalization and localization principles and Application Programming Interfaces.

Firstly, few CMSs conform to a standard interface like CMIS or a standard data format like XLIFF. Then, even if the CMS has an API, it might not actually let you get at the content assets, but rather just let you remotely initiate actions. Finally, as a connector developer you need to know a suitable programming language (at least REST APIs are mostly programming-language agnostic), the content model of the CMS, and the content model of the translation pipeline (or of the transport/routing middleware if you’re using some level of indirection like Clay Tablet).
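Even when a CMS API does return content, the connector still has to map the CMS’s content model onto the pipeline’s. A sketch of that mapping step for a WordPress-style JSON item (the field names here are illustrative assumptions, not any particular API’s actual contract):

```python
def extract_translatables(api_item):
    """Pull the translatable fields out of one CMS content item,
    keeping the source system's ID so translations can be routed back."""
    return {
        "source_id": api_item["id"],
        "segments": [
            {"field": field, "text": api_item[field]}
            for field in ("title", "body", "excerpt")  # hypothetical field names
            if api_item.get(field)  # skip fields the item doesn't use
        ],
    }
```

The inverse mapping (writing translated segments back to the right fields of the right item) is where most of the real connector work lives.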

I spoke with Marketo and they confirmed that their API doesn’t allow access to campaign assets like emails, landing pages and brochures. They recommended giving translators reduced-permission accounts on the Marketo platform and having them translate within the application. This comes with risks (as Bryan Schnabel will attest) and limits the use of translation productivity tools. Zendesk themselves seem to use a system of Python scripts.

If anyone out there has experience with writing connectors to any of the content platforms mentioned above, I’d like to hear the details.