Tag Archives: Review Sentinel

Polymath Service Provider

Over the Christmas break I started to reflect on the nature of service provision in the Language Services industry, in light of the new technologies emerging from advances in machine learning and artificial intelligence, my own predictions of the influences upon the industry, and the industry’s likely response to them.

There have been recent announcements of adaptive and neural machine translation; pervasive cloud platforms with ubiquitous connectivity and cognitive capabilities; an upsurge in low-cost, high-benefit open source tooling and frameworks; and many mature APIs and standards.

All of these sophisticated opportunities really do mean that as a company providing services you have to be informed, adaptable, and agile; employ clever, enthusiastic people; and derive joy and satisfaction from harnessing disruptive influences to the benefit of yourselves and your customers.

I do have one concern: how do we sustain the level of investment necessary to stay abreast of all these influences, and to produce novel services and solutions from them, in an environment of very small margins and low tolerance for increased or additional costs?

Don’t get me wrong, though. Having spent the last 10 years engaging with world-class research centers such as ADAPT, working alongside thought-leading academics and institutions such as DFKI and InfAI, participating in European-level Innovation Actions and Projects, and generally ensuring that our company has the required awareness, understanding and expertise, I continue to be positive and enthusiastic in my approach to these challenges.

I am satisfied that we are active in all of the spaces that industry analysts see as being currently significant. To wit: ongoing evaluations of adaptive translation environments and NMT, agile platforms powered by distributed services and serverless architectures, Deep Content (semantic enrichment and NLP), and Review Sentinel (machine learning and text classification).

Lest I sound complacent, we have much more in the pipeline and my talented and knowledgeable colleagues are excited for the future.

A Prime Year

So 2017! Let’s hope you turn out to be a good one.

I guess that, traditionally, I should be using this post to make predictions about the industry and technologies I’m engaged in, to demonstrate thought leadership. The truth is I think I’m going to be arrogant and let the industry catch up a bit first with all of the innovations I have spent the last two years working on. Sure, I have forward-looking plans based on what I think will be prevalent trends and requirements in the year ahead, but sometimes you have to live in the moment and execute on what is imminent.

My team will be busy through Q1 with the migration of a large part of our operations to Plunet. Then we have the ramping up of a major new account that we were awarded last year.

Development has restarted slowly, it has to be said, with some small tasks that were started before Christmas taking an annoyingly long time to finish. Or maybe I’m getting increasingly impatient.

Q1 will almost certainly see an update to Ocelot. We are trying to simplify and expand the configuration of plug-ins and the user interface so that Ocelot launches with its windows and tools in a ready-to-go state.

I’ve resumed the Angular 2 and F# learning I started last year. Using Angular I’m writing a web-based string translation editor component which I hope will be simple in operation and blisteringly fast. With F# I started by writing some basic NLP utilities such as string tokenization and n-gram generation. I would like to try to re-write some of my Review Sentinel machine learning algorithms as well, but I’d be surprised if I get all of that done this year.
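
The F# utilities themselves aren’t shown here, but to illustrate the idea, here is a minimal sketch of tokenization and n-gram generation (written in JavaScript rather than F#; the naive tokeniser and function names are purely illustrative):

```javascript
// Purely illustrative, in JavaScript rather than F#: a naive whitespace/
// punctuation tokeniser and a simple n-gram generator.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9']+/)
    .filter(function (token) { return token.length > 0; });
}

function ngrams(tokens, n) {
  var result = [];
  for (var i = 0; i + n <= tokens.length; i++) {
    result.push(tokens.slice(i, i + n).join(' '));
  }
  return result;
}

// Example: bigrams of a short sentence.
// ngrams(tokenize('the cat sat on the mat'), 2)
//   -> ['the cat', 'cat sat', 'sat on', 'on the', 'the mat']
```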

So I’m going to finish this post by wishing you all and your families a safe, enjoyable, productive, and happy year.

Machine Translation Pipeline Meets Business Trends

This week we will carry out final integration and deployment tests on our distributed pipeline for large scale and continuous translation scenarios that heavily leverage the power of machine translation.

We built this platform in response to the demands and trends being reported by industry experts such as Common Sense Advisory and Kantan.

The platform features several configurable services that can be switched on as required. These include:

  • automated source pre-editing prior to passing to a choice of custom machine translation engines;
  • integrated pre-MT translation memory leverage;
  • automated post-edit of raw machine translation prior to human post-edit;
  • in-process, low-friction capture of actionable feedback on MT output from humans;
  • automated post-processing of human post-edit;
  • automated capture of edit distance data for BI and reporting.
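
To give a feel for how services like those listed above might be switched on and composed, here is a rough, hypothetical sketch; the stage names, config shape and runPipeline helper are illustrative, not the platform’s actual API:

```javascript
// Hypothetical composition of optional pipeline stages.
// Each stage is a function from job -> job (or Promise<job>);
// stages switched off in the config are simply skipped.
function runPipeline(job, config, stages) {
  return stages
    .filter(function (stage) { return config[stage.name]; })
    .reduce(function (promise, stage) { return promise.then(stage.run); },
            Promise.resolve(job));
}

// Illustrative stage list mirroring the services above.
var stages = [
  { name: 'sourcePreEdit',    run: function (job) { /* automated source pre-editing */ return job; } },
  { name: 'tmLeverage',       run: function (job) { /* pre-MT translation memory leverage */ return job; } },
  { name: 'machineTranslate', run: function (job) { /* call the chosen custom MT engine */ return job; } },
  { name: 'autoPostEdit',     run: function (job) { /* automated post-edit of raw MT */ return job; } },
  { name: 'humanPostEdit',    run: function (job) { /* human post-edit and feedback capture */ return job; } },
  { name: 'editDistance',     run: function (job) { /* record edit distance for BI/reporting */ return job; } }
];

// Example: run with automated post-editing switched off.
// runPipeline(job, { sourcePreEdit: true, tmLeverage: true, machineTranslate: true,
//                    autoPostEdit: false, humanPostEdit: true, editDistance: true }, stages);
```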

The only component still missing, to be integrated during May, is the set of text analysis and text classification algorithms which will give us the ability to do automated quality assurance of every single segment. Yes, every segment: no spot-checking or limited-scope audits.

The platform is distributed and utilises industry-standard formats including XLIFF and ITS, so it is wholly scalable and extensible. Of note is that this platform delivers on all six of the trends recently publicised by Kantan. Thanks to Olga O’Laoghaire, who made significant contributions to the post-editing components, and to Ferenc Dobi, lead architect and developer.

I’m very excited to see the fruition of this project. It doesn’t just represent the ability for us to generate millions of words of translated content; it delivers a controlled environment in which we can apply state-of-the-art techniques that are highly optimised at every stage, measurable, and designed to target the goal of fully automated usable translation (FAUT).

Review Sentinel Document Profile Dashboard

I have been working for the last few weeks on designing a new intuitive way to visualise Review Sentinel conformance data at a document level.

Ideas have culminated in what I’m calling a Document Profile. This is essentially a scatter chart of the individual segment conformance scores sorted by score in ascending numeric (descending conformance) order. This is plotted and labelled with a single overall numeric indicator for the document.

Conformance scores cannot be naively aggregated (summed or averaged), because a document with a large number of good conformance scores and a few very poor ones could conceivably give an overall result similar to that of a document containing all medium-severity conformance scores.

Instead we have identified a threshold score which ideally the vast majority of segment scores would fall below. We can then express the number of segments below this threshold as a percentage of the whole document.
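
A minimal sketch of that aggregation (illustrative only; the function names and the choice of threshold are not taken from Review Sentinel itself), remembering that a lower score means better conformance:

```javascript
// Overall indicator: the percentage of segment scores that fall below a
// chosen conformance threshold (lower score = better conformance).
function documentConformance(segmentScores, threshold) {
  if (segmentScores.length === 0) { return 0; }
  var good = segmentScores.filter(function (s) { return s < threshold; }).length;
  return (good / segmentScores.length) * 100;
}

// The Document Profile itself is simply the scores sorted ascending
// (best-conforming segments first), ready to be plotted as a scatter chart.
function documentProfile(segmentScores) {
  return segmentScores.slice().sort(function (a, b) { return a - b; });
}
```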

Having prototyped the graphing in Microsoft Excel, we had to find a way of mimicking it on the web. I settled on two relatively new open source JavaScript libraries: D3 and AngularJS.

D3 was easy to come up to speed on. There are masses of sample charts on its web site and, as always, a good Pluralsight course. I was able to prototype quickly and easily using Plunker, JSFiddle, etc.

It’s a testament to Angular’s clear, concise and modular architecture that I was able to learn everything I needed while writing practically no code up front. I have few slots of contiguous focus time these days, but I was able to pick an Angular concept (scopes, binding, directives) and study it for an hour here and there (on planes, buses and walks). I then did all of the coding in practically a single sitting.
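
To give a flavour of how the two libraries fit together, here is a simplified sketch of an AngularJS directive handing the bound scores to D3 for rendering; the module, directive and scope names, sizes and scales are illustrative rather than the production component, and D3 v3 syntax is assumed:

```javascript
// Simplified sketch: an AngularJS directive that renders sorted segment
// conformance scores as a D3 scatter chart. Names are illustrative.
angular.module('rsApp', []).directive('documentProfile', function () {
  return {
    restrict: 'E',
    scope: { scores: '=' },   // two-way binding of the score array
    link: function (scope, element) {
      var width = 600, height = 200, margin = 20;
      // ascending score = descending conformance, as described above
      var data = scope.scores.slice().sort(d3.ascending);

      var x = d3.scale.linear().domain([0, data.length - 1]).range([margin, width - margin]);
      var y = d3.scale.linear().domain([0, d3.max(data)]).range([height - margin, margin]);

      var svg = d3.select(element[0]).append('svg')
        .attr('width', width)
        .attr('height', height);

      // Enter selection: one circle per segment score.
      svg.selectAll('circle').data(data)
        .enter().append('circle')
        .attr('cx', function (d, i) { return x(i); })
        .attr('cy', function (d) { return y(d); })
        .attr('r', 2);
    }
  };
});

// Markup (scores is a property on the surrounding controller's scope):
// <document-profile scores="segmentScores"></document-profile>
```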

The finished article is small, modular, elegant and very easily enhanced.

I’ll leave a deployment here for a short while for people to play with. The data is for one US English source document machine translated into three languages: Spanish, French and Brazilian Portuguese. Three charts show the conformance of the raw MT output against a human-translated reference corpus, and the fourth shows the conformance of the Brazilian Portuguese document after post-editing (using directed post-editing effort in Ocelot).

Old Dog, New Languages

I think I’m becoming a fan of JavaScript – or at least some of its facilities and a few frameworks that are built on top of it.

When you get to know about JavaScript’s beginnings, it’s not so surprising that it is quirky.

My desire to up-skill in JavaScript stems from my desire to render Review Sentinel Document Quality Profiles on the web. So I have been reading and watching a lot about D3.js and AngularJS. D3 has nice function chaining, but I find some of its constructs tricky despite understanding its underlying enter, update, exit pattern. Angular I love, even though I’ve only touched the surface.
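
For anyone meeting it for the first time, the enter/update/exit pattern boils down to three selections over a single data join. A minimal, hypothetical example (D3 v3 syntax, where entered elements join the update selection):

```javascript
// Minimal illustration of D3's enter/update/exit pattern.
function render(values) {
  var bars = d3.select('#chart').selectAll('div.bar')
    .data(values);

  // enter: create elements for new data points
  bars.enter().append('div').attr('class', 'bar');

  // update: applied to every bound element (existing plus newly entered in v3)
  bars.style('width', function (d) { return (d * 10) + 'px'; });

  // exit: remove elements whose data has gone away
  bars.exit().remove();
}
```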

My learning was aided by the Pluralsight courses JavaScript for C# Developers and AngularJS for .NET Developers.

Here’s what the current beta version looks like:

[Screenshot: Review Sentinel Document Profile]

Review Sentinel Workflow Integration

I have been working on improving the physical integration of Review Sentinel with SDL WorldServer using Microsoft Azure Blob Storage and SendGrid notifications.

Programming to the Enterprise Service Bus architectural model always makes me smile: loosely-coupled applications collaborating to provide a distributed, scalable, fault-tolerant workflow.

The diagram below shows the overall architecture. The one small impedance mismatch is the lack of support in SDL WorldServer for the ITS 2.0 Localisation Quality metadata category, which Review Sentinel uses to serialise its conformance scores within XLIFF.

[Diagram: SDL WorldServer / Review Sentinel integration architecture]
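
For context, this is roughly what that metadata category looks like when a conformance score is serialised onto an XLIFF trans-unit. It is an illustrative sketch only; the helper function and the exact placement of the ITS attributes are not Review Sentinel’s actual output format:

```javascript
// Illustrative only - not Review Sentinel's actual output format.
// Serialise a segment's conformance score as an ITS 2.0 Localization
// Quality Rating attribute on an XLIFF trans-unit. Assumes the ITS
// namespace (xmlns:its="http://www.w3.org/2005/11/its") is declared
// on an ancestor element of the XLIFF document.
function serialiseSegment(id, source, target, score) {
  return '<trans-unit id="' + id + '">\n' +
         '  <source>' + source + '</source>\n' +
         '  <target its:locQualityRatingScore="' + score + '">' + target + '</target>\n' +
         '</trans-unit>';
}

// Example:
// serialiseSegment(42, 'Click Save to keep your changes.',
//                  'Cliquez sur Enregistrer pour conserver vos modifications.', 78.5);
```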

Disruptive Behaviour

What a great morning! Just finished a presentation and demonstration of the power of combining two projects that are close to my heart: Review Sentinel and Ocelot.

I walked through the scenario of a post-editing workflow and editing session using Review Sentinel, some configured modules of the Okapi Framework, and Ocelot. I will post more details later, but the customer agreed that the value proposition was totally compelling and proposed a live pilot on the spot.

This is the enjoyable aspect of my job: shaking up existing processes with technology!

And this is just the start. We have at least three other business scenarios where the technology/process combination can pull the rug from under people’s feet.

Review and Post-editing go Analytical

On 27th May we finalised release 1.0 of Reviewer’s Workbench. RW represents the culmination of several strands of research and development that I had been involved with over the last couple of years.

In 2011 I set up Digital Linguistics to sell Review Sentinel. Review Sentinel is the world’s first Text Analytics based Language Quality Assurance technology. I first publicly presented Review Sentinel at the TAUS User Conference held in Seattle in October, 2012.

In January 2012 I became a member of the Multilingual Web Language Technologies Working Group. Funded by the EU, this Working Group of the W3C is responsible for defining and publishing the ITS 2.0 standard. ITS 2.0 is now at the Last Call stage of the W3C process.

I can safely assert that Reviewer’s Workbench is the world’s first editor to utilise text analytics and other metadata that can be encoded with ITS 2.0 – such as machine translation confidence scores, term disambiguation information and translation-process provenance – to bring new levels of performance to the tasks of linguistic review and post-editing. What’s more, Reviewer’s Workbench is completely interoperable with industry standards like XLIFF and toolsets such as the Okapi Framework.

Reviewer’s Workbench allows you to personalise the visualisation of all of this important, contextual and useful data to inform and direct post-editing and linguistic review effort.

[Screenshot: Reviewer’s Workbench 1.0 interface]

This is just the beginning. Feature set planning for release 2.0 is already very advanced and includes more state-of-the-art facilities. Stay tuned!

 

A Week Out West

I’m fortunate that my job gives me the opportunity to travel. Last week it was California. The week started well with a positive customer meeting and the arrival of a new employee in our Mountain View office.

Over the course of a number of years with one of our customers, we have had a great opportunity to automate and integrate a significant number of business processes. Like us, our customer thrives on and enjoys continuously reinventing, iterating and improving tools. (I’m reluctant to use the word “innovate” as it’s becoming over-used, but the term would certainly describe what both of us do regularly.) The exciting possibility that came out of last week’s conversations with their scarily bright engineering team is for us to build a truly scalable, cloud-hosted, service-bus-based business rules engine using data regularly polled from their web services API endpoints.

In addition to the existing business-related discussions, I was also able to use the trip to evangelise my more research-based interests and to present, and get early feedback on, new developments on the horizon such as ITS 2.0, Linked Data, Review Sentinel and Reviewer’s Workbench.

The one potentially tedious aspect of business travel is the actual relocation of your body geographically. I always prepare well to combat the boredom that journeys, aided by delays, can bring. Tooled up with Kindle, iPad and paperbacks (for use during takeoff and landing), I used the time to catch up (somewhat belatedly) on Breeze, Modernizr, Require, Knockout, Font Awesome and Bootstrap, all courtesy of John Papa’s Pluralsight course.

The week also provided the chance to catch up in person with one of our outsource development partners, Spartan Software. Google Hangout doesn’t yet replace the experience of enjoying a beer together. Spartan have been building Reviewer’s Workbench for us. Reviewer’s Workbench is our implementation of the W3C Multilingual Web Internationalization Tag Set (ITS) 2.0.