Content Interoperability

I am working on a project which is very familiar in the localization industry: moving content from the Content Management System (CMS) in which it is authored to a Translation Management System (TMS) in which it will be localized and then moved back to the CMS for publication.

These seemingly straightforward scenarios often require far more effort than seems warranted. As the developer working on the interoperability, you often need:

  • Knowledge of the CMS API and content model. (The content model is the representation that the article has inside the CMS and when exported.)
  • Knowledge of the TMS API and the content formats that it is capable of parsing/filtering.

In this project the CMS is built on top of a “document database” and stores and exports content in JSON format.

One of the complexities is that rich text, meaning text that includes formatting such as emphasis (bold, italic) and embedded metadata such as hyperlinks and pointers to images, causes sentences to become fragmented when exported.

For example, the text:

“For more information refer to our User’s Guide or Community Forum.”

Becomes:

{
  "content": [
    {
      "value": "For more information refer to our ",
      "nodeType": "text"
    },
    {
      "data": { "uri": "https://ficticious.com/help" },
      "content": [
        {
          "value": "User's Guide",
          "nodeType": "text"
        }
      ],
      "nodeType": "hyperlink"
    },
    {
      "value": " or Community Forum.",
      "nodeType": "text"
    }
  ],
  "nodeType": "document"
}

If I simply let the TMS parse the JSON, I know it will present the rich-text sentence as three segments rather than one, and it will be frustrating for translators to relocate the hyperlink within the overall sentence. Ironically, JLIFF suffers from the same problem.
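To make the fragmentation concrete, here is a minimal Python sketch of what a naive JSON filter might do with the export above: collect every "value" string in document order. The function name text_values is hypothetical and does not come from any particular TMS; real filters are configurable, but the default outcome is likely to be similar.

```python
import json

# The exported rich-text document from the example above.
EXPORT = """
{
  "content": [
    {"value": "For more information refer to our ", "nodeType": "text"},
    {
      "data": {"uri": "https://ficticious.com/help"},
      "content": [{"value": "User's Guide", "nodeType": "text"}],
      "nodeType": "hyperlink"
    },
    {"value": " or Community Forum.", "nodeType": "text"}
  ],
  "nodeType": "document"
}
"""


def text_values(node):
    """Yield every text value in document order (hypothetical naive filter)."""
    if node.get("nodeType") == "text":
        yield node["value"]
    for child in node.get("content", []):
        yield from text_values(child)


segments = list(text_values(json.loads(EXPORT)))
for segment in segments:
    print(repr(segment))
```

The single sentence comes out as three separate strings, and the hyperlink's position relative to the rest of the sentence is lost to the translator.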

What I need is a structured format that has the flexibility to enable me to express the sentence as a single string but also have the high fidelity to convert back without information loss. Luckily the industry has the XML Localization Interchange File Format (XLIFF).
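As a rough illustration of the idea, the Python sketch below flattens the example document into one string in which the hyperlink becomes an XLIFF 2.0 paired-code element (pc), while the hyperlink's original data is kept in a side table keyed by the code id so it can be restored later. The function name to_inline and the shape of the side table are my own assumptions for this sketch; a real CMS export would also contain paragraphs and other block nodes that are ignored here.

```python
from xml.sax.saxutils import escape

DOC = {
    "content": [
        {"value": "For more information refer to our ", "nodeType": "text"},
        {
            "data": {"uri": "https://ficticious.com/help"},
            "content": [{"value": "User's Guide", "nodeType": "text"}],
            "nodeType": "hyperlink",
        },
        {"value": " or Community Forum.", "nodeType": "text"},
    ],
    "nodeType": "document",
}


def to_inline(node, codes):
    """Flatten a node to one string; inline nodes become <pc> elements.

    `codes` (hypothetical side table) records what each <pc> id stood for,
    so the original JSON node can be rebuilt after translation.
    """
    node_type = node["nodeType"]
    if node_type == "text":
        return escape(node["value"])  # escapes &, < and >
    if node_type == "document":
        return "".join(to_inline(c, codes) for c in node.get("content", []))
    # Any other node type is treated as an inline code in this sketch.
    code_id = str(len(codes) + 1)
    codes[code_id] = {"nodeType": node_type, "data": node.get("data", {})}
    inner = "".join(to_inline(c, codes) for c in node.get("content", []))
    return f'<pc id="{code_id}">{inner}</pc>'


codes = {}
source = to_inline(DOC, codes)
print(source)
print(codes)
```

The translator now sees one segment with a single movable inline code, and the uri stays out of the translatable text.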

I have three choices for programming the conversion, all of which are open source:

I wanted to exercise my own code a bit so I went with the third option.

JliffGraphTools contains a Jliff builder class and Xliff12 and Xliff20 filter classes (for XLIFF 1.2 and 2.0 respectively). These event-based classes allow a publish/subscribe interaction in which elements in the XLIFF cause subscribing methods in the Jliff builder to be executed, thus creating a JliffDocument.

I decided to use this pattern for the conversion of the above CMS’ JSON content model to XLIFF.
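The shape of that pattern can be sketched in a few lines of Python. This is not the JliffGraphTools API: the class names RichTextFilter and XliffSourceBuilder, the event names, and the handler signatures are all hypothetical, chosen only to show a filter publishing events while it walks the CMS JSON and a builder subscribing to them to assemble an XLIFF-style source string.

```python
from xml.sax.saxutils import escape


class RichTextFilter:
    """Walks the CMS JSON and publishes an event per node (hypothetical)."""

    def __init__(self):
        self._subscribers = {"text": [], "start_inline": [], "end_inline": []}

    def subscribe(self, event, handler):
        self._subscribers[event].append(handler)

    def _publish(self, event, *args):
        for handler in self._subscribers[event]:
            handler(*args)

    def filter(self, node):
        node_type = node["nodeType"]
        if node_type == "text":
            self._publish("text", node["value"])
            return
        inline = node_type != "document"  # sketch: non-root containers are inline
        if inline:
            self._publish("start_inline", node_type, node.get("data", {}))
        for child in node.get("content", []):
            self.filter(child)
        if inline:
            self._publish("end_inline", node_type)


class XliffSourceBuilder:
    """Subscriber that assembles one source string with <pc> inline codes."""

    def __init__(self):
        self.parts = []
        self.original_data = {}

    def on_text(self, value):
        self.parts.append(escape(value))

    def on_start_inline(self, node_type, data):
        code_id = str(len(self.original_data) + 1)
        self.original_data[code_id] = {"nodeType": node_type, "data": data}
        self.parts.append(f'<pc id="{code_id}">')

    def on_end_inline(self, node_type):
        self.parts.append("</pc>")

    @property
    def source(self):
        return "".join(self.parts)


doc = {
    "content": [
        {"value": "For more information refer to our ", "nodeType": "text"},
        {
            "data": {"uri": "https://ficticious.com/help"},
            "content": [{"value": "User's Guide", "nodeType": "text"}],
            "nodeType": "hyperlink",
        },
        {"value": " or Community Forum.", "nodeType": "text"},
    ],
    "nodeType": "document",
}

rich_text_filter = RichTextFilter()
builder = XliffSourceBuilder()
rich_text_filter.subscribe("text", builder.on_text)
rich_text_filter.subscribe("start_inline", builder.on_start_inline)
rich_text_filter.subscribe("end_inline", builder.on_end_inline)
rich_text_filter.filter(doc)
print(builder.source)
```

Keeping the filter ignorant of the output format is the appeal of this design: a different builder could subscribe to the same events and produce another format without touching the traversal code.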


It turns out that this approach wasn’t as straightforward as anticipated, but I’ll have to document that in another post.