Tag Archives: MLW-LT

2014 Ramp Up

The beginning of 2014 has been a hive of activity: San Francisco, Boston, Luxembourg, team re-structure, and new research interests.

Until now a hectic travel schedule has meant generally more input (reading reports, e-learning videos, and thinking through ideas) than physical output. However, with the increasing prevalence of in-flight wifi and Google QuickOffice for iPad, I have been able to work more in real-time despite being on the move.

With the extended travel and inevitably longer periods of being sat on my backside, I wanted to get sorted with a Sit Stand Desk. This is actually quite difficult if you want to avoid spending several hundred euro and getting Allen keys and screwdrivers out. My solution is this:

Budget Sit Stand Table

Seriously, I’m delighted with it. It is portable, easily stored, stable and spacious (450 x 1,350mm), and fully adjustable between 610mm and 1,020mm. Total cost including purchase, transport and installation: €130.

One benefit of travel is of course meeting friends and business associates. San Francisco was the chance to socialize with our friends and development partners, Spartan Software.

Spartan Dinner

Kevin Lew, Yan Yu, Scott Schwalbach, Chase Tingley, Paul Magee, Chris Pimlott, Me.

Luxembourg was an opportunity to meet many of the Multilingual Web LT Working Group. I hope to have an opportunity to work with these people in the future.


Pedro Díez Orzas, Milan Karasek, Me, Stephan Walter, Jirka Kosek, Felix Sasaki, Dave Lewis, David Filip, Tadej Stajner.

Multilingual Europe Technology Alliance

For the last three days I have been at META-FORUM 2013 in Berlin.

Many of my Multilingual Web – Language Technologies partners and I presented our project’s final deliverables. My presentation was well received, though it didn’t generate as much coffee-break interaction as I’d hoped. Nevertheless, one of those conversations is worth following up on. [Update 2013-10-23: My presentation can be viewed on YouTube.]

Overall MLW-LT received great feedback from our Project Manager within the EU. I attribute our success to our lead coordinator, Felix Sasaki of DFKI, and the unique mix of project partners. I would certainly want to work with them again.

The conference was well attended – 260, I think, was the figure – probably because Kimmo Rossi of DG CNECT was presenting on Horizon 2020 and CEF.

One of the highlights of the conference was the keynote by Daniel Marcu, Chief Science Officer of SDL, whose talk was entertaining, engaging, honest and thought-provoking. I wish I had recorded it on my iPad, but that would have been distracting; hopefully it’ll be available on the conference web site eventually.

It was great to meet Sebastian Hellmann of University of Leipzig in person. Sebastian is lead editor of the NLP Interchange Format 2.0 specification.

Good trip though I didn’t get to see much of Berlin this time around.

Review and Post-editing go Analytical

On 27th May we finalised release 1.0 of Reviewer’s Workbench. RW represents the culmination of several strands of research and development that I had been involved with over the last couple of years.

In 2011 I set up Digital Linguistics to sell Review Sentinel. Review Sentinel is the world’s first Text Analytics based Language Quality Assurance technology. I first publicly presented Review Sentinel at the TAUS User Conference held in Seattle in October, 2012.

In January 2012 I became a member of the Multilingual Web Language Technologies Working Group. Funded by the EU, this Working Group of the W3C is responsible for defining and publishing the ITS 2.0 standard. ITS 2.0 is now at the Last Call stage of the W3C process.

I can safely assert that Reviewer’s Workbench is the world’s first editor to utilise text analytics and other metadata that can be encoded with ITS 2.0 – such as machine translation confidence scores, term disambiguation information and translation-process provenance – to bring new levels of performance to the tasks of linguistic review and post-editing. What’s more, Reviewer’s Workbench is completely interoperable with industry standards like XLIFF and toolsets such as the Okapi Framework.

Reviewer’s Workbench allows you to personalise how all of this important contextual data is visualised, informing and directing post-editing and linguistic review effort.


This is just the beginning. Feature set planning for release 2.0 is already very advanced and includes more state-of-the-art facilities. Stay tuned!


Standards in the Park


On the 7th and 8th of May the Multilingual Web – Language Technologies Group met at the Hotel Park in Bled, Slovenia. Bled is a stunningly beautiful town alongside Lake Bled and situated close to the Austrian border.


Specification and implementation work is progressing well on ITS 2.0 and we are giving some focus to outreach activities with the goal of getting broad adoption of the standard. If you haven’t yet heard of ITS 2.0 or, have heard of the project but don’t know how it could help you, I invite you to visit these resources:

We plan to publish targeted flyers on the use, benefits and details of individual aspects of the standard in the near future. These resources will get you started, and I will be sure to post the flyers’ locations when they are available.

Several working group members will be presenting at FEISGILTT 2013 (11-12th June 2013) in London which is once again co-hosted with Localization World (12-14th June 2013). This will be a great opportunity to see applications of the standard demonstrated live and be able to talk with members of the working group.

I am also happy to receive email enquiries at my VistaTEC address.

Brains Trust: Karl Fritsche, Jirka Kosek, Milan Karasek, Pablo Nieto, Yves Savourel, Arle Lommel, Felix Sasaki, David Filip, Mauricio del Olmo, Tadej Štajner, David Lewis and Pedro Luis Díez Orzas


A Week Out West

I’m fortunate that my job gives me the opportunity to travel. Last week it was California. The week started well with a positive customer meeting and the arrival of a new employee in our Mountain View office.

Over the course of a number of years with one of our customers we have had a great opportunity to automate and integrate a significant number of business processes. Like ourselves, our customer thrives on and enjoys continuously reinventing, iterating and improving tools. (I’m reluctant to use the word “innovate” as it’s becoming over-used, but the term would certainly describe what both of us do regularly.) The exciting possibility that came out of last week’s conversations with their scarily bright engineering team is for us to build a truly scalable, cloud-hosted, service-bus-based business rules engine using data regularly polled from their web service API endpoints.
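To make the idea concrete, here is a minimal sketch of the rule-evaluation core of such an engine. All names and data are illustrative, not from the actual system: a set of condition/action rules is evaluated against a payload as it might come back from a polled API endpoint.

```python
# Illustrative sketch only: evaluate condition/action rules against
# a payload polled from a (hypothetical) web service endpoint.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # predicate over the polled payload
    action: Callable[[dict], str]      # what to emit when the rule fires

def evaluate(rules: list, payload: dict) -> list:
    """Run every rule whose condition matches the payload."""
    return [r.action(payload) for r in rules if r.condition(payload)]

# A payload as a status endpoint might return it (invented example).
payload = {"project": "P-123", "wordcount": 12000, "due_in_days": 1}

rules = [
    Rule("rush-job",
         lambda p: p["due_in_days"] <= 2,
         lambda p: f"escalate {p['project']}"),
    Rule("large-job",
         lambda p: p["wordcount"] > 50000,
         lambda p: f"split {p['project']}"),
]

print(evaluate(rules, payload))  # only the rush-job rule fires here
```

In the real system the payload would come from regularly polling the customer’s API endpoints and the actions would publish messages onto the service bus; the sketch shows only the rule-evaluation core.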
In addition to existing business-related discussions, I was also able to use the trip to evangelise my more research-based interests and to present, and get early feedback on, what is on the horizon: ITS 2.0, Linked Data, Review Sentinel and Reviewer’s Workbench.
The one potentially tedious aspect of business travel is the actual geographic relocation of your body. I always prepare well to combat the boredom that journeys – aided by delays – can bring. Tooled up with Kindle, iPad and paperbacks (for use during takeoff and landing), I used the time to catch up (somewhat belatedly) on Breeze, Modernizr, Require, Knockout, Font Awesome and Bootstrap, all courtesy of John Papa’s Pluralsight course.
The week also provided the chance to catch up in person with one of our outsource development partners, Spartan Software. Google Hangout doesn’t yet replace the experience of enjoying a beer together. Spartan have been building Reviewer’s Workbench for us. Reviewer’s Workbench is our implementation of the W3C Multilingual Web Internationalization Tag Set 2.0.

A Personal Contribution to Global Intelligent Content

Global Intelligent Content

As Chief Technology Officer of VistaTEC, I was fortunate to be one of the founding Industrial Partners of the Science Foundation Ireland funded Centre for Next Generation Localisation (CNGL). CNGL has just received support for a further term with the overall research theme of “Global Intelligent Content”. I therefore thought it appropriate that my first post should actively demonstrate and support this vision.

So, what’s so “intelligent” about this post?

If you have a basic understanding of HTML you’ll know that the page you’re reading is composed of mark-up tags (elements) such as <p>, <span> and <h1>. The mark-up allows your browser to display the page so that it is easy to comprehend (headings, paragraphs, bold, italic, etc.) and to interact with (hyperlinks to other related web documents). You may also know that it can contain “keywords” or “tags”: individual words or phrases which indicate to search engines what the subject matter of the post is. This post certainly contains all of these.

The page also includes a lot of “metadata”. This metadata conforms to two standards, each of which is set to transform the way in which multilingual intelligent content is produced, published, discovered and consumed.

Resource Description Framework in Attributes

In layman’s terms RDFa is a way of embedding sense and definition into a document in such a way that non-human agents (machines and computer programs) can read and “understand” the content. RDFa is one mechanism for building the Multilingual Semantic Web.

If you right-click this page in your browser and choose “View Source” you’ll see that it contains attributes (which add machine-readable characteristics to generic HTML tags) such as property and typeof. These allow web robots to understand the parts of the content that I have decorated at a much more fundamental level: for example, that I created the page, the vocabulary I have used to describe the people, organisations and concepts within the document, and details about them. This data can form the basis of wider inferences regarding personal and content relationships.
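As an illustrative sketch of what such decorated markup can look like (the names and values below are made up for this example, not copied from this page’s actual source), RDFa using the schema.org vocabulary reads like ordinary HTML with a few extra attributes:

```html
<!-- Illustrative RDFa example using the schema.org vocabulary. -->
<p vocab="http://schema.org/" typeof="Person">
  <span property="name">Felix Sasaki</span> works at
  <span property="affiliation" typeof="Organization">
    <span property="name">DFKI</span></span>.
</p>
```

A crawler that understands RDFa can extract from this that a Person named “Felix Sasaki” is affiliated with an Organization named “DFKI”, without any natural language processing.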

Internationalization Tag Set 2.0

ITS 2.0 is a brand-new W3C standard being developed by the Multilingual Web (Language Technologies) Working Group, which is funded through the European Commission and forms part of the W3C Internationalization Activity. Its goal is to define categories of metadata relating to the production and publishing of multilingual web content.

To exemplify this, the overview of ITS 2.0 below was translated from German to English using the Microsoft Bing machine translation engine. Viewing the source of this page and searching for “its-” will locate the ITS Localization Quality metadata with which I annotated the translations to capture my review of the target English.

“The goal of MultilingualWeb LT (multilingual Web – language technologies) it is to demonstrate how such metadata encoded, safely passed on and used in various processes such as Lokalisierungsworkflows can be and frameworks such as Okapi, machine translation, or CMS systems like Drupal.
Instead of a theoretical and institutional approach to standardization, LT-Web aims to develop implementations, which concretely demonstrates the value of metadata with real systems and users. The resulting conventions and results are documented and published as a W3C standard, including the necessary documentation, data and test suite, as the W3C standardization process requires it.”
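As an illustration of what that review metadata looks like, here is a sketch of ITS 2.0 Localization Quality Issue markup in HTML5. The issue values are invented for this example, but the attribute names come from the ITS 2.0 specification:

```html
<!-- Illustrative ITS 2.0 Localization Quality Issue annotation;
     the flagged text and values are examples, not this page's markup. -->
<span its-loc-quality-issue-type="untranslated"
      its-loc-quality-issue-comment="German compound left untranslated"
      its-loc-quality-issue-severity="80">Lokalisierungsworkflows</span>
```

Any ITS 2.0-aware tool can pick these attributes up and, for instance, route the flagged segments back into a review workflow.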


I’m very excited about Global Intelligent Content. This post is a very small and personal contribution to the vision but hopefully it illustrates in a simple way what it is about and some of its possibilities.