Monthly Archives: June 2013

Train To Belfast

Train journeys are a great opportunity to experience a place from a different perspective and learn something new. (I find it difficult to engage in activities that don’t have a purpose.)

The initial part of the train journey from Dublin to Belfast follows the eastern coast of Ireland. There are some wonderful, long sandy beaches on this stretch.

As the train nears Drogheda it veers inland to cross the River Boyne, which separates counties Louth and Meath. The segment between Drogheda and Dundalk is characterised by green farmland and large detached houses. Then as you near Newry there are more hills.

I decided to take a second run through Andrew Ng’s Machine Learning course, available free on Coursera. I have taken several of Coursera’s courses (knowledge junkie) and I have to say that Andrew stands out as an awesome teacher. Despite the terminology-laden and mathematical nature of the course content, Andrew puts the concepts in context, explains them clearly and moves through the syllabus in a series of logical, developmental steps which are easy to assimilate.

Newry – my stop. Heading to Freddy-fest where no doubt the liberal consumption of alcohol will destroy precisely the brain cells that hold the recently refreshed knowledge of linear regression.

Fast and Loose

About a year ago we started to think about the cloud and how it could help us. Should we put our relational databases in the cloud and stop having to worry about their size? Should we try to put our network- and compute-heavy processes in the cloud, freeing up internal compute and network bandwidth? Could it just make us more agile and less sensitive to capital expenditure when responding to compute and storage requirements?

We procrastinated on these questions for a while because, in hindsight, we didn’t understand the change of mindset involved. To me it was like moving from procedural, interpreted languages to compiled, object-oriented ones.

We finally “got it” when we faced the challenge of efficiently producing results for an unpredictable number of concurrent, new, compute-heavy tasks.

After optimizing algorithms, our initial perception was that all we needed to do was throw more processors at it. How wrong we were. Once the computation was parallel, the next problem was data storage and memory requirements: the bottleneck had shifted from compute to storage and retrieval (I/O). Seriously, I/O can slow things down considerably. After finding a solution to that, the question became how to scale out rather than up – more processors rather than bigger processors.

Eventually we came to understand that a true cloud architecture makes use of many patterns or paradigms: a fault-tolerant service bus (message queue), several compute instances, NoSQL data storage, web service endpoints and thinking of every operation as asynchronous.
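To make the queue-plus-workers idea concrete, here is a minimal single-process sketch in Python. It is not our platform: an in-memory asyncio.Queue stands in for the service bus, coroutines stand in for compute instances, and the names run_jobs, worker and make_flaky_square are invented purely for illustration. It only shows the shape of the pattern: tasks are queued, several workers pull from the queue asynchronously, and a failed task is put back on the queue instead of bringing anything down.

```python
import asyncio


async def worker(queue, results, handler, max_attempts=3):
    """Pull tasks off the queue forever; re-queue a task when it fails."""
    while True:
        task_id, payload, attempt = await queue.get()
        try:
            results[task_id] = await handler(payload)
        except Exception:
            if attempt < max_attempts:
                # Hypothetical retry policy: put the task back for another go.
                await queue.put((task_id, payload, attempt + 1))
            else:
                results[task_id] = None  # give up on this task
        finally:
            queue.task_done()


async def run_jobs(payloads, handler, worker_count=4):
    """Scale out: many small workers share one queue of tasks."""
    queue = asyncio.Queue()
    results = {}
    for task_id, payload in enumerate(payloads):
        queue.put_nowait((task_id, payload, 1))

    workers = [
        asyncio.create_task(worker(queue, results, handler))
        for _ in range(worker_count)
    ]
    await queue.join()  # wait until every task has succeeded or been abandoned
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return [results[i] for i in range(len(payloads))]


def make_flaky_square():
    """Illustrative handler that fails the first time it sees each input."""
    seen = set()

    async def handler(n):
        await asyncio.sleep(0)  # stand-in for real I/O or compute
        if n not in seen:
            seen.add(n)
            raise RuntimeError("transient failure")
        return n * n

    return handler


if __name__ == "__main__":
    print(asyncio.run(run_jobs([1, 2, 3], make_flaky_square())))
```

In a real deployment the queue would be a managed, durable service and the workers would be separate machines, but the control flow is the same: nothing calls anything else directly, so adding capacity means adding workers.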

What you end up with is a loosely coupled but highly fault-tolerant, highly scalable and flexible configuration with cleanly separated concerns.

We are close to deploying this new platform and I’m very excited about it. We have no single points of failure, unlimited extensibility points – a very robust and scalable infrastructure.

We’ve prototyped bits of this on both AWS and Azure and are confident that deployment on either is workable.

No doubt we’ll hit limitations or problems at some point but right now the return on investment looks unbeatable.

Review and Post-editing go Analytical

On 27th May we finalised release 1.0 of Reviewer’s Workbench. RW represents the culmination of several strands of research and development that I had been involved with over the last couple of years.

In 2011 I set up Digital Linguistics to sell Review Sentinel. Review Sentinel is the world’s first Text Analytics-based Language Quality Assurance technology. I first publicly presented Review Sentinel at the TAUS User Conference held in Seattle in October 2012.

In January 2012 I became a member of the Multilingual Web Language Technologies Working Group. Funded by the EU, this Working Group of the W3C is responsible for defining and publishing the ITS 2.0 Standard. ITS 2.0 is now in Final Call stage of the W3C process.

I can safely assert that Reviewer’s Workbench is the world’s first editor to utilise text analytics and other metadata that can be encoded with ITS 2.0 – such as machine translation confidence scores, term disambiguation information and translation process related provenance – to bring new levels of performance to the tasks of linguistic review and post-editing. What’s more, Reviewer’s Workbench is completely interoperable with industry standards like XLIFF and toolsets such as the Okapi Framework.
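As a rough illustration of the kind of metadata involved, the sketch below reads ITS 2.0 machine translation confidence scores from a small XML document and lists the segments that fall below a threshold. It is not how Reviewer’s Workbench is implemented. The sample document, the seg element, the example.com annotator IRI, the function name and the 0.7 threshold are all made up for the example; only the ITS namespace and the its:mtConfidence and its:annotatorsRef attributes come from the standard.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical sample: <seg> and the annotator IRI are invented for this sketch.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
  <seg id="s1" its:annotatorsRef="mt-confidence|http://example.com/mt-engine"
       its:mtConfidence="0.92">Bonjour le monde</seg>
  <seg id="s2" its:annotatorsRef="mt-confidence|http://example.com/mt-engine"
       its:mtConfidence="0.41">Le train veers vers Belfast</seg>
  <seg id="s3">Segment with no confidence score</seg>
</doc>"""


def segments_needing_review(xml_text, threshold=0.7):
    """Return (id, score) for elements whose MT confidence is below threshold."""
    root = ET.fromstring(xml_text)
    attribute = f"{{{ITS_NS}}}mtConfidence"  # namespaced attribute name
    flagged = []
    for element in root.iter():
        score = element.get(attribute)
        if score is not None and float(score) < threshold:
            flagged.append((element.get("id"), float(score)))
    return flagged


if __name__ == "__main__":
    for seg_id, score in segments_needing_review(SAMPLE):
        print(f"{seg_id}: confidence {score} - send to post-editor")
```

A reviewer-facing tool would use scores like these to direct attention, for example by highlighting the low-confidence segments first.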

Reviewer’s Workbench allows you to personalise how all of this contextual data is visualised, so that it can inform and direct post-editing and linguistic review effort.

[Image: Reviewer’s Workbench release 1.0 interface]

This is just the beginning. Feature set planning for release 2.0 is already very advanced and includes more state-of-the-art facilities. Stay tuned!