Monthly Archives: January 2014

Variety is the Spice of Life

January has been wonderfully varied on the work front. Most days have brought new learning.

In thinking about my R&D Manifesto I decided it was time to revisit Neural Networks and Semantics. I'm not averse to learning new development languages and environments, but when you want to evaluate ideas quickly you tend towards the familiar. For this reason Encog, a machine learning framework available for Java and .NET, looks interesting.

The Centre for Global Intelligent Content (CNGL), of which VistaTEC have been industry partners since its inception, received funding for a further two and a half years in October of last year. As a consequence there have been numerous meetings. It's really exciting to see how the centre has evolved and honed its process for bringing innovation to market.

A key element of this process is the d.lab, under the direction of Steve Gotz. In my view Steve has been one of the notable personalities in CNGL. He has a broad knowledge of the technology, innovation and start-up landscapes, and excellent business acumen.

Two interesting pieces of technology were shown to centre members recently. The first, named MTMPrime, is a component which can assess, in real time, translation memory matches alongside machine translation output and, based on confidence scores, recommend which one to use. The second is an incremental learning component for machine translation. It can profile a document and suggest the most efficient order in which to translate it, based on its analysis of the incremental benefit that translating segments in a particular order would realize. Basically, it works out the bang-for-buck of translating each segment.
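To make the two ideas concrete, here is a minimal Java sketch under my own assumptions; it is not MTMPrime's or the d.lab's code, whose internals I haven't seen. It assumes each translation memory match and each machine translation output already carries a confidence score on the same 0 to 1 scale, and that each segment has an estimated benefit and translation cost. `TranslationTriage`, `Segment`, `recommend` and `order` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; not MTMPrime's or the d.lab's implementation.
public class TranslationTriage {

    enum Source { TM, MT }

    // Assumes both confidences are on the same 0..1 scale; ties favour the TM match.
    static Source recommend(double tmConfidence, double mtConfidence) {
        return tmConfidence >= mtConfidence ? Source.TM : Source.MT;
    }

    // Hypothetical per-segment estimates: benefit of translating it, and its cost.
    record Segment(String id, double benefit, double cost) {
        double bangForBuck() { return benefit / cost; }
    }

    // Static greedy order: highest benefit per unit cost first.
    static List<String> order(List<Segment> segments) {
        List<Segment> sorted = new ArrayList<>(segments);
        sorted.sort(Comparator.comparingDouble(Segment::bangForBuck).reversed());
        List<String> ids = new ArrayList<>();
        for (Segment s : sorted) {
            ids.add(s.id());
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(recommend(0.82, 0.74)); // TM
        System.out.println(recommend(0.60, 0.71)); // MT
        List<Segment> segments = List.of(
                new Segment("s1", 4.0, 2.0),  // ratio 2.0
                new Segment("s2", 9.0, 3.0),  // ratio 3.0
                new Segment("s3", 1.0, 1.0)); // ratio 1.0
        System.out.println(order(segments)); // [s2, s1, s3]
    }
}
```

The ordering here is a one-off sort. A genuinely incremental learner would presumably re-estimate the remaining segments' benefits after each one is translated, since translating a segment changes what the system still has to learn.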

In discussing semantics and disambiguation, Steve pointed me at Open Calais. This is a service which, like Enrycher, parses content and automatically adds semantic metadata for the named entities and subjects that it "recognizes". The picture below shows the result of passing this post through the Open Calais Viewer.

[Screenshot: this post annotated in the Open Calais Viewer]
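Open Calais does its recognition with its own NLP pipeline. The toy below only illustrates the shape of that kind of output (entity annotations carrying a type and character offsets) using a hand-made dictionary lookup. `ToyAnnotator`, the entity types and the gazetteer contents are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy gazetteer annotator: shows the form of entity metadata, not how Open Calais works.
public class ToyAnnotator {

    record Annotation(String text, String type, int start, int end) {}

    // Finds whole-word occurrences of each dictionary entry; overlaps are not resolved.
    static List<Annotation> annotate(String content, Map<String, String> gazetteer) {
        List<Annotation> out = new ArrayList<>();
        for (Map.Entry<String, String> e : gazetteer.entrySet()) {
            Pattern p = Pattern.compile("\\b" + Pattern.quote(e.getKey()) + "\\b");
            Matcher m = p.matcher(content);
            while (m.find()) {
                out.add(new Annotation(m.group(), e.getValue(), m.start(), m.end()));
            }
        }
        out.sort(Comparator.comparingInt(Annotation::start)); // document order
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> gazetteer = Map.of(
                "VistaTEC", "Company",
                "Steve Gotz", "Person");
        String text = "Steve Gotz showed VistaTEC two new components.";
        for (Annotation a : annotate(text, gazetteer)) {
            System.out.println(a);
        }
    }
}
```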

We've had some very interesting customer inquiries too. It's too early to talk about them, but I hope that we get more requests for these types of engagements and services. If any come to fruition I'll blog about them later.

Finally, we made some small updates to Ocelot:

  • New configuration file for plug-ins,
  • Native launch experience for Windows and Mac, and
  • Native hot-key experiences within Windows and Mac interfaces.

Long may this variety continue.

Winter Evening Research

I try to make the most of the dark evenings by evaluating tools and researching topics that I don't have time for during a normal working day.

First up for the New Year is LanguageTool. Automated quality assurance is an important aspect of many production processes, and at VistaTEC we have deployed various commercial and in-house tools. My motivation for looking at LanguageTool was its use of part-of-speech (POS) tagging and user-definable rules, which can be combined with regular expressions to encode sophisticated linguistic checks.

My test domain was marketing translations: content full of emotive, symbolic and suggestive language.
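For anyone who hasn't met this style of rule, here is a toy Java sketch of the core idea: each rule element constrains one token by a regular expression over its surface form, its POS tag, or both. This is not LanguageTool's API or XML format. The sentence is tagged by hand with Penn Treebank-style tags, and the double-superlative check ("most best") is a hypothetical example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Toy illustration of POS + regex token rules; not LanguageTool's implementation.
public class PosRegexRule {

    record Token(String word, String pos) {}

    // A null pattern means "any value" for that field.
    record TokenSpec(Pattern wordRegex, Pattern posRegex) {
        boolean matches(Token t) {
            return (wordRegex == null || wordRegex.matcher(t.word()).matches())
                    && (posRegex == null || posRegex.matcher(t.pos()).matches());
        }
    }

    // Returns the start index of every run of consecutive tokens matching the specs.
    static List<Integer> find(List<Token> tokens, List<TokenSpec> specs) {
        List<Integer> hits = new ArrayList<>();
        for (int start = 0; start + specs.size() <= tokens.size(); start++) {
            boolean ok = true;
            for (int k = 0; k < specs.size() && ok; k++) {
                ok = specs.get(k).matches(tokens.get(start + k));
            }
            if (ok) {
                hits.add(start);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hand-tagged sentence (no real tagger involved).
        List<Token> sentence = List.of(
                new Token("the", "DT"), new Token("most", "RBS"),
                new Token("best", "JJS"), new Token("offer", "NN"));
        // Hypothetical check: "most"/"more" followed by a superlative adjective.
        List<TokenSpec> rule = List.of(
                new TokenSpec(Pattern.compile("(?i)most|more"), null),
                new TokenSpec(null, Pattern.compile("JJS")));
        System.out.println(find(sentence, rule)); // [1]
    }
}
```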

The custom rule encoding is necessarily verbose given that the serialisation format is XML. Our internal tool, Cerberus, suffers from the same drawbacks: verbose elements and the need to escape regular expression metacharacters. Some of the advanced rule constructs are initially difficult to grasp, and we ran many small tests to understand how functionality such as skip scope operates. Hopefully the evolving Rule Editor will soon support the more advanced rule constructs, which would aid learning and perhaps speed up rule writing.
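My working reading of skip is that, after an element matches, up to n other tokens may be passed over before the next element must match, with -1 meaning no limit. The sketch below models only that reading as a small backtracking matcher. It is a simplification, not LanguageTool's implementation, and `SkipMatcher`, `Elem` and `word` are hypothetical names.

```java
import java.util.List;
import java.util.function.Predicate;

// Simplified model of a "skip" setting on pattern elements; not LanguageTool's code.
public class SkipMatcher {

    // cond: test the token must pass; skip: tokens allowed before the next element (-1 = unlimited).
    record Elem(Predicate<String> cond, int skip) {}

    // Hypothetical helper: case-insensitive word element.
    static Elem word(String w, int skip) {
        return new Elem(t -> t.equalsIgnoreCase(w), skip);
    }

    // Does element k (and every later element) match starting at token pos? Backtracks over gap sizes.
    static boolean matchFrom(List<String> toks, int pos, List<Elem> pat, int k) {
        if (pos >= toks.size() || !pat.get(k).cond().test(toks.get(pos))) {
            return false;
        }
        if (k + 1 == pat.size()) {
            return true;
        }
        int maxGap = pat.get(k).skip() < 0 ? toks.size() : pat.get(k).skip();
        for (int gap = 0; gap <= maxGap && pos + 1 + gap < toks.size(); gap++) {
            if (matchFrom(toks, pos + 1 + gap, pat, k + 1)) {
                return true;
            }
        }
        return false;
    }

    // True if the pattern matches starting at any token.
    static boolean matches(List<String> toks, List<Elem> pat) {
        for (int start = 0; start < toks.size(); start++) {
            if (matchFrom(toks, start, pat, 0)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> text = List.of("not", "only", "cheap", "but", "also", "fast");
        // "only" may be followed by up to 3 other tokens before "but".
        List<Elem> loose = List.of(word("not", 0), word("only", 3), word("but", 0), word("also", 0));
        // Same rule without skip: "but" must come immediately after "only".
        List<Elem> strict = List.of(word("not", 0), word("only", 0), word("but", 0), word("also", 0));
        System.out.println(matches(text, loose));  // true
        System.out.println(matches(text, strict)); // false
    }
}
```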

We were not able to build rules for all of the constructs that we wanted. Extending the tool via Java to handle these is a possibility, but that will be another day's work.

Next up has been Neural Networks. I've been viewing Andrew Ng's Machine Learning lectures on Coursera. Learning is always helped by consulting several references, and James McCaffrey's articles have been straightforward to understand. I really like that James includes worked examples in his articles; it's a great way to check your own understanding (or lack thereof). Finally, it's nice that the code examples are in C# rather than the ubiquitous Python (I can read it, but let's say it's not my native tongue).
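In the spirit of those worked examples, here is a minimal Java forward pass for a 2-2-1 network (tanh hidden layer, logistic sigmoid output) small enough to check by hand. The weights are made up, and the layout and the `TinyNet` name are my own choices rather than anything taken from the lectures or the articles.

```java
// Minimal 2-2-1 feed-forward pass; weights are illustrative only.
public class TinyNet {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // w1[j][i]: weight from input i to hidden unit j; w2[j]: weight from hidden unit j to the output.
    static double forward(double[] x, double[][] w1, double[] b1, double[] w2, double b2) {
        double out = b2;
        for (int j = 0; j < w1.length; j++) {
            double z = b1[j];
            for (int i = 0; i < x.length; i++) {
                z += w1[j][i] * x[i];
            }
            out += w2[j] * Math.tanh(z); // hidden activation feeds the output sum
        }
        return sigmoid(out);
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0};
        double[][] w1 = {{0.1, 0.2}, {0.3, 0.4}};
        double[] b1 = {0.0, 0.0};
        double[] w2 = {0.5, -0.5};
        double b2 = 0.0;
        // By hand: h1 = tanh(0.1*1 + 0.2*2) = tanh(0.5), h2 = tanh(0.3*1 + 0.4*2) = tanh(1.1),
        // output = sigmoid(0.5*h1 - 0.5*h2).
        System.out.println(forward(x, w1, b1, w2, b2));
    }
}
```

Working the two hidden units out on paper and comparing against the printed value is exactly the kind of self-check that worked examples make possible.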

I'd prefer to live in warmer climes between November and March, so that I could be out more without the need for layers of thermal underwear, but I do love the different seasons.