Serializing and Deserializing JLIFF

I’ve been having all kinds of fun saving text (JSON) representations of translation units (pairs of source and target language strings), sending them from one cloud-based service to another, and then rebuilding the in-memory object representations from the text.

I know that any software engineer will be yawning about now because libraries for doing this kind of thing have existed for a long time. However, it’s been fun for me, partly because I’m doing it inside the new Azure Functions service, and partly because some of the objects have abstract relationships (interfaces and sub-classes) which introduce subtleties that took a lot of research to get working.

It relates to the work of the OASIS XLIFF OMOS TC, whose evolving schema for what has been dubbed JLIFF can be seen on GitHub.

The two parts of the object graph requiring special handling are the array containing the Segment and Ignorable objects (which implement the ISubUnit interface in my implementation), and the array containing the text and inline markup elements of the Source and Target containers (which implement the IElement interface and subclass AbstractElement in my implementation).
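
To make the rest of this post easier to follow, here is a rough sketch of how those parts of my object model hang together. Treat it as illustrative only: the type names mentioned above (Fragment, Segment, Ignorable, ISubUnit, IElement, AbstractElement) are real, but the property names and the TextElement class are assumptions made for the purposes of the example rather than the actual JLIFF schema.

namespace JliffModel
{
    using System.Collections.Generic;

    // Illustrative shapes only; see the caveat above.
    public interface ISubUnit { }
    public interface IElement { }

    public abstract class AbstractElement : IElement { }

    public class TextElement : AbstractElement
    {
        public string Text { get; set; }
    }

    public class Segment : ISubUnit
    {
        // Source and Target hold a mix of text and inline markup elements.
        public List<IElement> Source { get; set; } = new List<IElement>();
        public List<IElement> Target { get; set; } = new List<IElement>();
    }

    public class Ignorable : ISubUnit { }

    public class Fragment
    {
        // Mixed array of Segment and Ignorable objects. In the serialized JSON each
        // entry carries a "type" discriminator ("segment", "ignorable", ...).
        public List<ISubUnit> SubUnits { get; set; } = new List<ISubUnit>();
    }
}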

When deserializing the components of these arrays, each needs a converter class which derives from Newtonsoft.Json.JsonConverter.

namespace JliffModel
{
    using System;
    using Newtonsoft.Json;
    using Newtonsoft.Json.Linq;

    public class ISubUnitConverter : JsonConverter
    {
        public override bool CanConvert(Type objectType)
        {
            var canConvert = false;

            // Only the interface type needs custom resolution; concrete classes deserialize normally.
            if (objectType.Name.Equals("ISubUnit")) canConvert = true;

            return canConvert;
        }

        public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
        {
            // Load the raw JSON so that the "type" discriminator can be inspected.
            var jobject = JObject.Load(reader);

            object resolvedType = null;

            // Resolve the concrete class from the discriminator before populating it.
            if (jobject["type"].Value<string>().Equals("segment")) resolvedType = new Segment();
            // Assumed to mirror the segment case; adjust to the schema's actual discriminator value.
            if (jobject["type"].Value<string>().Equals("ignorable")) resolvedType = new Ignorable();

            serializer.Populate(jobject.CreateReader(), resolvedType);

            return resolvedType;
        }

        public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
        {
            // Only reading is customized in this post; writing simply emits the object's string representation.
            writer.WriteValue(value.ToString());
        }
    }
}
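
The converter for the elements of the Source and Target arrays follows exactly the same pattern. Here is a minimal sketch; the "text" discriminator value and the TextElement class are assumptions carried over from the model sketch above, so adjust both to whatever the evolving JLIFF schema actually specifies.

namespace JliffModel
{
    using System;
    using Newtonsoft.Json;
    using Newtonsoft.Json.Linq;

    public class IElementConverter : JsonConverter
    {
        public override bool CanConvert(Type objectType)
        {
            return objectType.Name.Equals("IElement");
        }

        public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
        {
            var jobject = JObject.Load(reader);

            object resolvedType = null;

            // Assumed discriminator; the real schema may distinguish inline elements differently.
            if (jobject["type"].Value<string>().Equals("text")) resolvedType = new TextElement();

            serializer.Populate(jobject.CreateReader(), resolvedType);

            return resolvedType;
        }

        public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
        {
            // As above, only reading is customized here.
            writer.WriteValue(value.ToString());
        }
    }
}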

Instances of the classes derived from JsonConverter are then passed into the DeserializeObject method.

    Fragment modelin = JsonConvert.DeserializeObject<Fragment>(output,
        new ISubUnitConverter(),
        new IElementConverter());
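
Going the other way, to send a graph on to the next service, is just the reverse call. The sketch below assumes that the converters are not registered for writing and that the default serialization of the concrete classes produces the required shape:

    // Serialize the in-memory graph back to JSON before handing it to the next service.
    string roundTripped = JsonConvert.SerializeObject(modelin, Formatting.Indented);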

Polymath Service Provider

Over the Christmas break I started to reflect on the nature of service provision in the Language Services industry, in the light of new technologies coming out of advances in machine learning and artificial intelligence, my own predictions of the influences upon the industry, and the industry’s likely response to them.

There are the recent announcements of adaptive and neural network machine translation; pervasive cloud platforms with ubiquitous connectivity and cognitive capabilities; an upsurge in low-cost, high-benefit open source tooling and frameworks; and many mature APIs and standards.

All of these sophisticated opportunities really do mean that as a company providing services you have to be informed, adaptable, and agile; employ clever, enthusiastic people; and derive joy and satisfaction from harnessing disruptive influences to the benefit of yourselves and your customers.

I do have concerns: How do we sustain the level of investment necessary to stay abreast of all these influences, and produce novel services and solutions from them, in an environment of very small margins and low tolerance for increased or additional costs?

Don’t get me wrong though. Having spent the last 10 years engaging with world-class research centers such as ADAPT, working alongside thought-leading academics and institutions such as DFKI and InfAI, participating in European-level Innovation Actions and Projects, and generally ensuring that our company has the required awareness, understanding and expertise, I continue to be positive and enthusiastic in my approach to these challenges.

I am satisfied that we are active in all of the spaces that industry analysts see as being currently significant. To wit: ongoing evaluations of adaptive translation environments and NMT, agile platforms powered by distributed services and serverless architectures, Deep Content (semantic enrichment and NLP), and Review Sentinel (machine learning and text classification).

Lest I sound complacent, we have much more in the pipeline and my talented and knowledgeable colleagues are excited for the future.

XLIFF Over the Wire

One of the Technical Committees that I participate in is the OASIS XLIFF OMOS TC. This group is currently working on a JSON serialization of XLIFF. This fits nicely with our platform of distributed services, providing a standardized, structured format that these services can consume. I’m pleased that the committee members are aligned on the principle of least astonishment (POLA) and are working towards an API which is consistent and natural.

One of the use cases could be the simple and fast translation editor which I’ve been amusing myself with, shown below in horizontal layout.

[Screenshot: translation editor in horizontal layout]

 

mojito

I took a look at Box’s mojito recently. I really like it.

Mature, full-featured translation management systems are large, monolithic applications that can sometimes be difficult to navigate and learn. It is refreshing to see something simple and lightweight. Yes, mojito’s feature set is limited, but if you’re a start-up looking for a way to turn around translations online and have a limited technology budget, mojito is worth a look.

It was easy to install, comprising two jar files: one for the web application (which includes an in-memory database and embedded web server for trial purposes) and one for the client command line interface.

 

Angular2 Editor

I wanted to put all of my Angular2 learning into practice so I built a translation editor.

It’s just a prototype at this stage but it has basic inline tag handling and protection, and change tracking. The interface is somewhat Ocelot-like.

[Screenshot: translation editor prototype]

I hope to add more functionality and use it as a test-bed for using JLIFF as a backend transport format.

 

A Prime Year

So 2017! Let’s hope you turn out to be a good one.

I guess traditionally I should be using this post to make my predictions about the industry and technologies I’m engaged in to demonstrate thought leadership. The truth is I think I’m going to be arrogant and let the industry catch up a bit first with all of the innovations I have spent the last two years working on. Sure, I have forward looking plans based on what I think will be prevalent trends and requirements in the year ahead but sometimes you have to live in the moment and execute on what is imminent.

My team will be busy through Q1 with the migration of a large part of our operations to Plunet. Then we have the ramping up of a major new account that we were awarded last year.

Development has re-started slowly, it has to be said, with some small tasks that were started before Christmas taking an annoyingly long time to get finished. Or maybe I’m getting increasingly impatient.

Q1 will almost certainly see an update to Ocelot. We are trying to simplify and expand the configuration of plug-ins and the user interface so that you can launch with windows and tools in a ready-to-go state.

I’ve resumed the Angular 2 and F# learning I started last year. Using Angular I’m writing a web-based string translation editor component which I hope will be simple in operation and blisteringly fast. With F# I started by writing some basic NLP utilities such as string tokenization and n-gram generation. I would like to try to re-write some of my Review Sentinel machine learning algorithms, but I’d be surprised if I get all of that done this year.

So I’m going to finish this post by wishing you all and your families a safe, enjoyable, productive, and happy year.

Beginnings

I get a lot of satisfaction from joining new groups and communities. It is an explicit signal of a group’s desire to progress in some way and achieve something. I like that and being a founding member carries some kudos. I have been a founding participant of the Centre for Next Generation Localization (CNGL), now the ADAPT Centre; the Best Practices for Multilingual Linked Open Data Community; and the RDF and XML Interoperability Community.

I am happy that circumstances meant that yesterday I was able to help launch the Think Global Forum Technology Event in San Francisco. When events like these can bring together executives from the likes of NetApp, VMware, LinkedIn and GoPro, the result is always direct, honest and collaborative.

I admit to shamelessly taking the opportunity to introduce attendees to Deep Content, a project that I and some of my team have spent the last 18 months on. I believe that Deep Content is a key enabler of some of the more ambitious goals of Content 4.0.

I wish the forum every success and hope to contribute disruptive ideas in coming meetings.

 

Succession and Invention

Congratulations to Melanie Howes, who was appointed Vistatec’s DevOps Manager on 21st November and for her sins took over the day-to-day management of our Applied Technology Group. This is another significant step in Vistatec’s on-going reorganization aimed at enabling further stable growth, continued operational excellence and faster execution of strategic initiatives.

Having handed over operational responsibility for the highly successful group that I have grown over the last few years, I would like to assert that the rumours of me being put out to stud or dispatched to the knacker’s yard are unfounded. You can take it on authority that I’ll be dedicating the additional bandwidth to increasing the velocity with which we operationalize more of our skunk works projects. Those of you that know me will already know the areas that I’m actively researching and evaluating, but I also hope to keep a few surprises up my sleeve.

Next week I will continue my evangelism of Deep Content. This week I spent a couple of days in Bonn presenting on one of the services that underpins Deep Content: an internationalization round-tripping service which enables HTML5 to be machine translated and semantically enriched in-place without any special preparation.