Tag Archives: Machine Translation

Serverless Machine Translation

It is well known that you can produce relatively good quality machine translations by doing the following:

  • Carry out some processing on the source language.
    For example, remove text which serves no purpose in the translation (say, imperial measurements in content destined for Europe), re-order some lengthy sentences, mark the boundaries of embedded tags, and so on.
  • Use custom, domain-trained machine translation engines.
    This is possible with several machine translation providers. If you have a sufficient volume of good-quality bilingual and monolingual corpora relevant to your subject matter then you can train and build engines which will produce higher-quality output than a general-purpose public engine.
  • Post-process the raw machine translation output to correct recurrent errors.
    For example, to improve overall fluency, replace specific terminology, and so on (see the sketch after this list).
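
To make the pre- and post-editing steps concrete, here is a minimal sketch of how such rules might be expressed as ordered regular-expression replacements. The patterns and rule sets are purely illustrative, not our production rules.

    import re

    # Illustrative, ordered (pattern, replacement) rules; real rule sets are
    # richer and maintained per language pair and subject domain.
    PRE_EDIT_RULES = [
        (re.compile(r"\s*\(\d+(?:\.\d+)?\s*(?:in|ft|lb)\.?\)"), ""),  # drop imperial measurements
    ]
    POST_EDIT_RULES = [
        (re.compile(r"\bordenador\b"), "equipo"),  # enforce target-locale terminology
    ]

    def apply_rules(text, rules):
        """Apply an ordered list of regex rules to a single segment."""
        for pattern, replacement in rules:
            text = pattern.sub(replacement, text)
        return text

    source = "The bracket is 2 m (6.5 ft) long."
    print(apply_rules(source, PRE_EDIT_RULES))    # source pre-edit before MT
    raw_mt = "Reinicie el ordenador antes de continuar."
    print(apply_rules(raw_mt, POST_EDIT_RULES))   # post-edit of the raw MT output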

We decided to implement this in a fully automated Azure Functions pipeline.

NOTE: Some MT providers have this capability built into their services, but we wanted the centralized flexibility to control the pre- and post-editing rules ourselves and to be able to mix and match the MT providers we get the translations from.

The pipeline consists of three functions: preedit, translate and postedit. The JSON payload used for inter-function communication is JLIFF, an open object-graph serialization format being specified by an OASIS Technical Committee.
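
As an illustration, here is roughly what one stage could look like using the Azure Functions Python programming model. Since JLIFF is still being specified, the payload is treated as opaque JSON here and the segments/source field names are placeholders, not the real JLIFF object graph.

    import json
    import azure.functions as func

    def main(req: func.HttpRequest) -> func.HttpResponse:
        """HTTP-triggered 'preedit' stage: apply source-side rules and pass the payload on."""
        payload = req.get_json()
        for segment in payload.get("segments", []):                        # placeholder shape
            segment["source"] = segment["source"].replace("\u00a0", " ")   # example rule
        return func.HttpResponse(json.dumps(payload), mimetype="application/json")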

NOTE: JLIFF is still in the design phase but I’m impatient and it seemed like a good way to test the current snapshot of the format.

The whole thing is easily re-configured and re-deployed, and has all the advantages of an Azure consumption plan.

We can see that this pipeline would be a good candidate for Durable Functions, so once we have time we’ll take a look at those.
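
For what it’s worth, the function-chaining pattern in Durable Functions would map onto the pipeline quite naturally. A minimal sketch with the Python Durable Functions library, assuming the three stages are re-packaged as activity functions:

    import azure.durable_functions as df

    def orchestrator_function(context: df.DurableOrchestrationContext):
        # Chain the three stages; each activity takes and returns the JLIFF payload.
        jliff = context.get_input()
        jliff = yield context.call_activity("preedit", jliff)
        jliff = yield context.call_activity("translate", jliff)
        jliff = yield context.call_activity("postedit", jliff)
        return jliff

    main = df.Orchestrator.create(orchestrator_function)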

Halfway Mark

Wow, June already. Time flies in the enjoyable world of translation and technology.

I embraced the cloud six years ago, having evaluated the benefits of Platform and Software as a Service, and believed in what was then a future vision of all kinds of intelligent, distributed services which would be impossible to achieve with a private, internal infrastructure. It was interesting to see that light bulb flash on for attendees not yet using the cloud at Microsoft’s Red Shirt Dublin event with Scott Guthrie last week.

Scott took us on a whistle-stop tour of Azure facilities, from Functions (a few lines of code executing logic on demand) to arrays of GPUs running deep learning algorithms capable of doing face recognition and sentiment analysis.

Within the development team at work, our utilization of such technologies continues: Neural Machine Translation; Adaptive Machine Translation; Continuous Integration; Distributed Services; and serverless functions and logic.

At the research end of the scale, having successfully completed our most recent European Project, I’ve been re-engaging with local research centers and interest groups. This month’s and last month’s Machine Learning Meetups were a testament to how dominant Deep Learning is in driving business success and competitiveness.

And because working hard has to be balanced by playing hard I’ve ramped up sailing to three times a week.

The Cork 1720s I go out in are just wonderful boats.

We started the year with some operationally complex, significant-impact projects. Progress has been slower than I would have liked, but ensuring we have a solid base upon which to build is critical to overall success. I am impatient to realize some of the potential gains now, but the collateral risk is too high. So, at the midpoint, we are looking at a busy two quarters ahead to get everything we want done, but the team is well capable.

Polymath Service Provider

Over the Christmas break I started to reflect on the nature of service provision in the Language Services industry, in light of the new technologies coming out of advances in machine learning and artificial intelligence, my own predictions of the influences upon the industry, and the industry’s response to them.

There are the recent announcements of adaptive and neural network machine translation; pervasive cloud platforms with ubiquitous connectivity and cognitive capabilities; an upsurge in low-cost, high-benefit open-source tooling and frameworks; and many mature APIs and standards.

All of these sophisticated opportunities really do mean that as a company providing services you have to be informed, adaptable, and agile; employ clever, enthusiastic people; and derive joy and satisfaction from harnessing disruptive influences to the benefit of yourselves and your customers.

I do have concerns: how do we sustain the level of investment necessary to stay abreast of all these influences, and produce novel services and solutions from them, in an environment of very small margins and low tolerance for increased or additional costs?

Don’t get me wrong though. Having spent the last 10 years engaging with world-class research centers such as ADAPT, working alongside thought-leading academics and institutions such as DFKI and InfAI, participating in European-level Innovation Actions and Projects, and generally ensuring that our company has the required awareness, understanding and expertise, I continue to be positive and enthusiastic in my approach to these challenges.

I am satisfied that we are active in all of the spaces that industry analysts see as being currently significant. To wit: ongoing evaluations of adaptive translation environments and NMT, agile platforms powered by distributed services and serverless architectures, Deep Content (semantic enrichment and NLP), and Review Sentinel (machine learning and text classification).

Lest I sound complacent, we have much more in the pipeline, and my talented and knowledgeable colleagues are excited for the future.

Machine Translation Pipeline Meets Business Trends

This week we will carry out final integration and deployment tests on our distributed pipeline for large scale and continuous translation scenarios that heavily leverage the power of machine translation.

We have built this platform as we recognised the demands and trends that are being reported by industry experts like Common Sense Advisory and Kantan.

The platform features several configurable services that can be switched on as required. These include:

  • automated source pre-editing prior to passing to a choice of custom machine translation engines;
  • integrated pre-MT translation memory leverage;
  • automated post-edit of raw machine translation prior to human post-edit;
  • in-process, low-friction capture of actionable feedback on MT output from humans;
  • automated post-processing of human post-edit;
  • automated capture of edit distance data for BI and reporting (see the sketch after this list).
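
One straightforward way to capture that edit distance data is a normalised, character-level Levenshtein distance between the raw MT output and the human post-edit. A minimal sketch of the idea (the exact metric we report on may differ):

    def edit_distance(a, b):
        """Character-level Levenshtein distance between two strings."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                # deletion
                                curr[j - 1] + 1,            # insertion
                                prev[j - 1] + (ca != cb)))  # substitution
            prev = curr
        return prev[-1]

    def post_edit_effort(raw_mt, post_edited):
        """Normalised edit distance in [0, 1]; 0.0 means the MT was left untouched."""
        if not raw_mt and not post_edited:
            return 0.0
        return edit_distance(raw_mt, post_edited) / max(len(raw_mt), len(post_edited))

    print(post_edit_effort("Das ist ein Test.", "Dies ist ein Test."))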

The only component still missing, to be integrated during May, is the text analysis and text classification algorithms which will give us the ability to do automated quality assurance of every single segment. Yes, every segment: no spot-checking or limited-scope audits.
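
The post doesn’t go into how that classification will work, but as a rough illustration of the general idea, a segment-level quality classifier could be as simple as the following scikit-learn sketch (the training data, features and model here are placeholders):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Placeholder training data: segments labelled 1 (acceptable) or 0 (needs review).
    segments = ["Click OK to continue.", "The the file not could open."]
    labels = [1, 0]

    qa_model = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        LogisticRegression(),
    )
    qa_model.fit(segments, labels)

    # Score every translated segment rather than spot-checking a sample.
    print(qa_model.predict_proba(["File could not be opened."])[:, 1])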

The platform is distributed and utilises industry-standard formats including XLIFF and ITS, so it is wholly scalable and extensible. Of note is that this platform delivers on all six of the trends recently publicised by Kantan. Thanks to Olga O’Laoghaire, who made significant contributions to the post-editing components, and to Ferenc Dobi, lead architect and developer.

I’m very excited to see the fruition of this project. It doesn’t just represent the ability for us to generate millions of words of translated content; it delivers a controlled environment in which we can apply state-of-the-art techniques that are highly optimised at every stage, measurable, and designed to target the goal of fully automated usable translation (FAUT).

Rapid Process Change

In January we commenced an enterprise subscription with translation management technology provider, XTM. It is well known that since 2008 our primary workflow backbone has been powered by WorldServer. So what was our motivation to try XTM?

The primary drivers for investing time and effort in XTM are:

  • Out-of-the-box connectivity to any other instance of XTM;
  • A larger choice of connectivity to machine translation systems out-of-the-box;
  • A desire to work with a cloud hosted platform (for several reasons);
  • Unlimited integration opportunities.

Having carried out a superficial evaluation, we felt we had some candidate customers and projects for which we could run comprehensive pilots. However, in this business plans always change, and our first project turned out to be very large: circa 30 million words across 16 languages with a TM/MT+PE workflow.

This project also has some other interesting characteristics:

  • Projects come to us from WorldServer as XLIFF into new components of our distributed workflow platform (currently this step is manual but planned to be automated; see the parsing sketch after this list);
  • By the end of October we plan that all aspects of the workflow, other than some human post-editing, will be completely automated (I say some human post-editing because automatic post-editing will also play a part);
  • It features challenging turnaround times which, despite unforeseen issues, we’ve been able to meet.
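
For the automation of that hand-off, the first step is simply reading the trans-units out of the XLIFF that WorldServer produces. A minimal sketch, assuming an XLIFF 1.2 style document and an illustrative file name:

    import xml.etree.ElementTree as ET

    XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}  # assuming XLIFF 1.2

    def extract_units(path):
        """Yield (id, source, target) for each trans-unit in an XLIFF file."""
        tree = ET.parse(path)
        for unit in tree.getroot().iterfind(".//x:trans-unit", XLIFF_NS):
            source = unit.find("x:source", XLIFF_NS)
            target = unit.find("x:target", XLIFF_NS)
            yield (unit.get("id"),
                   "".join(source.itertext()) if source is not None else "",
                   "".join(target.itertext()) if target is not None else "")

    for unit_id, src, tgt in extract_units("project.xlf"):  # illustrative file name
        print(unit_id, src, "->", tgt)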

Projects this large with a fast ramp-up time will soon identify process and technology weaknesses. XTM support have worked diligently with us to provide new and enhanced facilities and bug fixes to keep everything on track.

By the end of September I hope that we will have completed a second significant customer integration which uses some of this same infrastructure.

A Busy 4 Months

The last four months have been some of the busiest and most productive I’ve known. Achievements relate to Machine Translation, dash-boarding, workflow integration, Ocelot and the FREME Project.

MT, dash-boarding and workflow integration build on our adoption of enterprise integration patterns 18 months ago. Two new strategic hires covering the skill sets of ASP.NET Web API, AngularJS, Java Spring and SOAP services are about to deliver high-impact tools and automation.

Ocelot and FREME have also made great progress following my third recent hire.

I finally joined the XLIFF Technical Committee and look forward to contributing to what I believe is a fundamental technology in our industry.

I’ll try to post more about each of these in the near future. Suffice it to say, the rumors of my demise based on a lack of posts are unfounded.

Machine Translation of Software

Software distinguishes itself as a content type by being serialized in many different formats, along with optional metadata describing the type of user interface control a string will appear on, what user interface real estate it should occupy, and possibly other related data. When working on cross-platform or multi-platform products, one invariably bumps up against several of these serialization (resource) formats.

It is standard practice on localization projects to reuse as much translation as possible from previous releases, saving time and effort in quality assurance and, of course, cost. This process of leveraging work from one product release to the current one should be done with attention to context. It’s a real necessity, then, to have tool support when working on scenarios such as these.
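
To make the idea of context-aware leverage concrete, here is a toy sketch keyed on the resource ID and control type that the resource metadata provides. The keys and data are invented for illustration; tools such as Catalyst implement this far more thoroughly.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class TMEntry:
        source: str
        target: str
        resource_id: str  # e.g. the string identifier in the resource file
        ui_control: str   # e.g. "button", "menu", "dialog-title"

    # Hypothetical memory built from the previous release.
    tm = {
        ("OK", "IDS_OK_BUTTON", "button"): TMEntry("OK", "Aceptar", "IDS_OK_BUTTON", "button"),
    }

    def leverage(source, resource_id, ui_control):
        """Prefer an in-context exact match (same string, ID and control type);
        fall back to a context-free exact match before sending the string to MT."""
        hit = tm.get((source, resource_id, ui_control))
        if hit:
            return hit.target, "in-context exact match"
        for (src, _, _), entry in tm.items():
            if src == source:
                return entry.target, "exact match, context differs (review)"
        return None, "send to MT"

    print(leverage("OK", "IDS_OK_BUTTON", "button"))
    print(leverage("OK", "IDS_OK_LABEL", "label"))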

We are embarking on a large enterprise software project in which we want to utilise machine translation in addition to reusing previous human translations. I’d like to give a shout-out to Cristiano Maggi and Enda McDonnell at long-time business partner Alchemy Software for giving us the facilities and assistance to build a Microsoft Translator Hub connector for Catalyst 11.

Now we can safely and reliably employ a Translation Memory/Machine Translation process across a project which involves many software resource formats.