Tag Archives: Post-editing

Review Sentinel Workflow Integration

I have been working on improving the physical integration of Review Sentinel with SDL WorldServer using Microsoft Azure Blob Storage and SendGrid notifications.

Programming to the Enterprise Service Bus architectural model always makes me smile: loosely coupled applications collaborate to provide a distributed, scalable, fault-tolerant workflow.

The diagram below shows the overall architecture. The one small bit of impedance is the lack of support in SDL WorldServer for the ITS 2.0 Localisation Quality metadata category, which Review Sentinel uses to serialise its conformance scores within XLIFF.

[Diagram: WS_ReviewSentinel_Integration]
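
To make the ITS 2.0 point a little more concrete, here is a rough sketch of how per-segment scores could be pulled out of an XLIFF 1.2 file when the target system does not understand the metadata itself. It rests on assumptions: I am guessing that the score sits in an `its:locQualityRatingScore` attribute on each `<target>` element, which is one of the attributes ITS 2.0 defines for localisation quality rating. The function name `read_scores` is made up for illustration, and the real Review Sentinel output may well be laid out differently.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for XLIFF 1.2 and ITS 2.0.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ITS_NS = "http://www.w3.org/2005/11/its"


def read_scores(xliff_text):
    """Return {trans-unit id: score} for targets carrying a rating score.

    Assumption: scores are stored as its:locQualityRatingScore on <target>.
    """
    root = ET.fromstring(xliff_text)
    scores = {}
    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        target = unit.find(f"{{{XLIFF_NS}}}target")
        if target is None:
            continue
        raw = target.get(f"{{{ITS_NS}}}locQualityRatingScore")
        if raw is not None:
            scores[unit.get("id")] = float(raw)
    return scores
```

Units without the attribute are simply skipped, so a file that has only been partly scored still yields a usable dictionary.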

Crunching Post-editing Numbers

I have spent the last few days crunching numbers which relate to the post-editing of a 17,000 source word document that has been machine translated into three languages.

The reason for spending a few days at this is that for each document in each language I have the following data available to me:

  • The time in seconds spent editing each segment;
  • A Review Sentinel Conformance Score for each segment;
  • The raw machine translation output and the post-edited target string for each segment, which allows me to generate TER and GTM scores.
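
As an illustration of the kind of score the third item makes possible, here is a minimal sketch of a TER-style measure: the word-level edit distance between the raw MT output and the post-edited string, divided by the length of the post-edited string. It is a simplification, not the real TER tool. Proper TER also counts a block shift as a single edit, so this version will tend to overestimate whenever the post-editor has moved phrases around. The function names are my own.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over token lists (insert, delete, substitute)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # drop a hypothesis word
                            curr[j - 1] + 1,      # add a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def approx_ter(mt_output, post_edited):
    """Edits per post-edited word; block shifts are NOT modelled."""
    hyp = mt_output.split()
    ref = post_edited.split()
    if not ref:
        return 0.0 if not hyp else float("inf")
    return word_edit_distance(hyp, ref) / len(ref)
```

For example, `approx_ter("the cat sat on mat", "the cat sat on the mat")` is one insertion over six reference words.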

With plenty of data comes plenty of work. Many of the automated metrics utilities work with plain text and have no concept of inline tags. This means lots of work converting from one serialisation format to another and re-formatting or removing tags. Once again I have found PowerGREP very helpful during this process.
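
I did this clean-up with PowerGREP, but the same idea can be sketched in a few lines of Python. This is an assumption-laden example: it treats anything between `<` and `>` as an inline tag, deletes it outright, unescapes entities, and collapses whitespace so that each segment ends up on exactly one line. The name `strip_inline_tags` is hypothetical.

```python
import html
import re

# Assumption: inline tags look like XML/XLIFF elements, e.g. <g id="1">, </g>, <x id="2"/>.
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_inline_tags(segment):
    """Return the segment as single-line plain text with inline tags removed."""
    without_tags = TAG_PATTERN.sub("", segment)
    unescaped = html.unescape(without_tags)
    # Collapse runs of whitespace (including newlines) to single spaces.
    return " ".join(unescaped.split())
```

Deleting tags rather than replacing them with a space avoids stray spaces before punctuation, at the cost of gluing two words together where a tag was the only thing separating them. Which trade-off is right depends on the content.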

A final challenge is that most automatic metrics tools report only line-based measures, keyed by the input line number rather than by any original segment identifier (which probably had to be stripped out anyway). A slight error in the line count or line order can therefore throw the results out badly.
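
One way to guard against this is to keep the identifiers in a parallel list and refuse to continue if the counts ever disagree. A minimal sketch follows. The function names are hypothetical, and it assumes the metrics tool returns exactly one score per input line, in input order.

```python
def to_metric_lines(segments):
    """Split (segment_id, text) pairs into parallel lists of ids and one-line texts."""
    ids, lines = [], []
    for segment_id, text in segments:
        ids.append(segment_id)
        # Force each segment onto one line so line N always means segment N.
        lines.append(" ".join(text.split()))
    return ids, lines


def attach_ids(ids, scores):
    """Re-key line-ordered scores by segment id, failing loudly on a count mismatch."""
    if len(ids) != len(scores):
        raise ValueError(f"expected {len(ids)} scores, got {len(scores)}")
    return dict(zip(ids, scores))
```

This catches a wrong line total but not a reordering, so it is still worth spot-checking a few segments by hand. Note too that an empty segment becomes an empty line, which some tools may silently skip.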

I’m hoping to identify some interesting correlations and insights. Stay tuned.