I am affected by insomnia on a fairly frequent basis. I don’t use any sleep-related gadgets or applications because I’d probably scare the crap out of myself. Let’s see if this post has any cathartic properties.
Tonight’s/this morning’s musings (in no particular order of appearance): graph-based machine learning, building an object model library in TypeScript, my next career move, the health impact of not sleeping well, the industry integration/fragmentation dichotomy, current development projects, Brexit, personal relationships, and quality estimation.
It’s going to be tough getting through tomorrow. Sweet dreams.
I have published a web application where you can submit XLIFF 2.x files and get back a JLIFF serialization as text.
JLIFF is a new XLIFF serialization format currently in working draft with the OASIS Object Model and Other Serializations Technical Committee.
The application uses my open source JliffGraphTools library.
I am working on a conversion of XLIFF 1.2 to JLIFF, but as the content model is structurally different, it’s tricky.
I was careful to implement it so that no data is persisted. I don’t even collect any metadata about what is submitted. That way people can feel confident about the privacy of their data.
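To make the conversion idea a little more concrete, here is a minimal Python sketch that parses an XLIFF 2.0 document in memory and emits JSON. It is not the JliffGraphTools implementation: the output keys (`srcLang`, `trgLang`, `files`, `units`, `segments`) are a simplified, hypothetical shape rather than the normative JLIFF schema, inline markup is flattened to its text, and modules and notes are ignored.

```python
import json
import xml.etree.ElementTree as ET

# XLIFF 2.0 core namespace.
NS = {"x": "urn:oasis:names:tc:xliff:document:2.0"}


def xliff2_to_json(xliff_text: str) -> str:
    """Convert a minimal XLIFF 2.0 document into a simplified, JLIFF-like JSON string.

    Assumptions: only the text of <source>/<target> inside <segment> is kept
    (inline markup is flattened), and the JSON keys are illustrative, not the
    normative JLIFF schema.
    """
    root = ET.fromstring(xliff_text)  # parsed in memory; nothing is written to disk
    doc = {"srcLang": root.get("srcLang"), "trgLang": root.get("trgLang"), "files": []}
    for file_el in root.findall("x:file", NS):
        units = []
        # ".//" also finds units nested inside <group> elements, in document order.
        for unit_el in file_el.findall(".//x:unit", NS):
            segments = []
            for seg_el in unit_el.findall("x:segment", NS):
                source = seg_el.find("x:source", NS)
                target = seg_el.find("x:target", NS)
                segments.append({
                    "source": "".join(source.itertext()) if source is not None else "",
                    "target": "".join(target.itertext()) if target is not None else None,
                })
            units.append({"id": unit_el.get("id"), "segments": segments})
        doc["files"].append({"id": file_el.get("id"), "units": units})
    return json.dumps(doc, ensure_ascii=False, indent=2)
```

A real converter also has to map inline codes, notes and module data, which is where most of the effort goes.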
A belated note that back in March I open sourced my utility library for serializing and deserializing JLIFF.
I was recently asked to talk on a topic that I have some familiarity with. Nevertheless, I thought it would be good to look back at some earlier slide decks, and I came across this.
On October 17 we released Ocelot 3.0. See the Release Notes for details.
I really enjoyed attending Unicode 41 this week. Following changes to my role some years back, and given that the conference is always held on the West Coast of the US, I hadn’t been in a while, but I will definitely put it back on my conference agenda. It was great bumping into customers and old friends and seeing the new generation of researchers and engineers address what is essentially the challenge of worldwide communications.
The conference kicked off with a very interesting, entertaining and thought-provoking keynote entitled “Can We Escape Alphabetic Order”, given by Thomas S. Mullaney.
It is quite incredible the degree to which companies are enabling and adapting their products in order to have them accepted in target regions. I’m not talking about translations and number formats here: it’s about supporting all writing directions, accurate and detailed rendering of complex scripts and perfect fluency in generated messages that involve levels of plurality, gender, formality and style. And the open and collaborative nature of the efforts to document this information in the form of the Common Locale Data Repository is commendable.
I’ve not updated my blog for a couple of months because I’ve been binge learning and sailing the East and South Coasts of Ireland.
After several past attempts to complete a Deep Learning course were derailed by lack of free time, I’m delighted to have now finished and passed Neural Networks and Deep Learning, the first course in deeplearning.ai’s Deep Learning Specialization on Coursera.
I blew most of my annual leave on cruising between Dún Laoghaire and Schull, Co. Cork: my justification being that the only way to improve my technique is to get out onto the sea. A great combination of learning and adventure.
Dolphins off of the South Coast of Ireland
Rounding the Fastnet Rock Lighthouse
It is well known that you can produce relatively good quality machine translations by doing the following:
- Carry out some processing on the source language.
For example: remove text that serves no purpose in the translation (say, imperial measurements in content destined for Europe), re-order lengthy sentences, mark the boundaries of embedded tags, etc.
- Use custom domain trained machine translation engines.
This is possible with several machine translation providers. If you have a sufficient amount of good-quality bilingual and monolingual corpora relevant to your subject matter, you can train and build engines that produce higher-quality output than a generic, general-domain engine.
- Post process the raw machine translation output to correct recurrent errors.
For example: improve overall fluency, replace specific terminology, etc.
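The pre- and post-editing steps above lend themselves to a table of configurable rules. Here is a minimal Python sketch assuming simple regular-expression rules; the `Rule` type, the `apply_rules` helper and both example rules are hypothetical illustrations, not our production rule set.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical edit rule: every match of `pattern` is replaced by `replacement`."""
    pattern: str
    replacement: str


# Hypothetical pre-edit rule: drop parenthesised imperial measurements, e.g. " (3.9 in)".
PRE_EDIT_RULES = [
    Rule(r"\s*\(\d+(?:\.\d+)?\s*(?:in|ft|lb|oz|mi)\)", ""),
]

# Hypothetical post-edit rule: enforce a client's preferred German term in MT output.
POST_EDIT_RULES = [
    Rule(r"\bRechner\b", "Computer"),
]


def apply_rules(text: str, rules: list[Rule]) -> str:
    """Apply each rule in order, so later rules see the output of earlier ones."""
    for rule in rules:
        text = re.sub(rule.pattern, rule.replacement, text)
    return text


print(apply_rules("The panel is 10 cm (3.9 in) wide.", PRE_EDIT_RULES))  # The panel is 10 cm wide.
print(apply_rules("Starten Sie den Rechner neu.", POST_EDIT_RULES))      # Starten Sie den Computer neu.
```

Keeping the rules as data rather than code means they can be maintained centrally and changed without touching the MT integration.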
We decided to implement this in a fully automated Azure Functions pipeline.
NOTE: Some MT providers have this capability built into their services but we wanted the centralized flexibility to control the pre- and post-editing rules and to be able to mix and match which MT providers we get the translations from.
The pipeline consists of three functions: preedit, translate and postedit. The JSON payload used for inter-function communication is JLIFF, an open object graph serialization format specification being designed by an OASIS Technical Committee.
NOTE: JLIFF is still in the design phase, but I’m impatient and it seemed like a good way to test the current snapshot of the format.
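A minimal sketch of how the three stages can be chained, each taking and returning a JSON string, mirroring the inter-function payload described above. Assumptions: the payload is a simplified, hypothetical JLIFF-like shape; the stages are plain Python functions rather than Azure Functions bindings; and `FAKE_ENGINE` and `TERMINOLOGY` are made-up lookup tables, the former standing in for a call to an MT provider.

```python
import json
import re
from typing import Callable


def for_each_segment(payload: str, edit: Callable[[dict], None]) -> str:
    """Deserialize the payload, apply `edit` to every segment in place, re-serialize."""
    doc = json.loads(payload)
    for unit in doc["units"]:
        for segment in unit["segments"]:
            edit(segment)
    return json.dumps(doc, ensure_ascii=False)


def preedit(payload: str) -> str:
    """Stage 1: strip parenthesised inch measurements from the source (example rule)."""
    def edit(segment: dict) -> None:
        segment["source"] = re.sub(r"\s*\(\d+(?:\.\d+)?\s*in\)", "", segment["source"])
    return for_each_segment(payload, edit)


# Hypothetical stand-in for a custom-trained MT engine: a fixed lookup table.
FAKE_ENGINE = {
    "Place the monitor 50 cm from your eyes.":
        "Stellen Sie den Bildschirm 50 cm von Ihren Augen entfernt auf.",
}


def translate(payload: str) -> str:
    """Stage 2: fill in targets; a real pipeline would call the chosen MT provider here."""
    def edit(segment: dict) -> None:
        segment["target"] = FAKE_ENGINE.get(segment["source"], segment["source"])
    return for_each_segment(payload, edit)


# Hypothetical client terminology: preferred term for each MT term.
TERMINOLOGY = {"Bildschirm": "Monitor"}


def postedit(payload: str) -> str:
    """Stage 3: enforce preferred terminology in the raw MT output."""
    def edit(segment: dict) -> None:
        target = segment["target"]
        for mt_term, preferred in TERMINOLOGY.items():
            target = re.sub(rf"\b{re.escape(mt_term)}\b", preferred, target)
        segment["target"] = target
    return for_each_segment(payload, edit)


def run_pipeline(payload: str) -> str:
    """Chain the stages: preedit -> translate -> postedit."""
    return postedit(translate(preedit(payload)))
```

Because every stage consumes and produces the same JSON format, swapping MT providers only means replacing `translate`; the pre- and post-edit stages are untouched.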
The whole thing is easily re-configured and re-deployed, and has all the advantages of an Azure consumption plan.
This pipeline looks like a good candidate for Durable Functions, so once we have time we’ll take a look at those.
Wow, June already. Time flies in the enjoyable world of translation and technology.
I embraced the cloud six years ago, having evaluated the benefits of Platform and Software as a Service and believing in what was then a future vision: all kinds of intelligent distributed services that would be impossible to achieve with a private, internal infrastructure. It was interesting to see that light bulb flash on for attendees not yet using the cloud at Microsoft’s Red Shirt Dublin event with Scott Guthrie last week.
Scott took us on a whistle-stop tour of Azure facilities, from Functions (a few lines of code executing logic on demand) to arrays of GPUs running Deep Learning algorithms capable of face recognition and sentiment analysis.
Within the development team at work our utilization of such technologies continues: Neural Network Machine Translation; Adaptive Machine Translation; Continuous Integration; Distributed Services; and Serverless functions and logic.
At the Research end of the scale, having successfully completed our most recent European Project, I’ve been re-engaging with local research centers and interest groups. This month’s and last month’s Machine Learning Meetups were testament to how dominant Deep Learning is in driving business success and competitiveness.
And because working hard has to be balanced by playing hard I’ve ramped up sailing to three times a week.
The Cork 1720s I go out in are just wonderful boats.
We started the year with some operationally complex, significant impact projects. Progress has been slower than I would have liked but ensuring we have a solid base upon which to build is critical to the overall success. My impatience is to realize some of the potential gains now but the collateral risk is too high. So, at the midpoint we are looking at a busy next two quarters to get everything we want done but the team is well capable.
Article on the success of the latest European Commission-funded Innovation Action that we participated in.