I’m not big into singing-oriented reality shows like The X-Factor or American Idol, but I happened to catch an episode of The Voice UK 2016 and I was very entertained by two of the Battles: Jordan Gray vs Theo Llewellyn, and Chloe Castro vs Alaric Green. What added to the appeal was that the acts are young, that they had never sung together before, and that they met the standard of the songs as performed by the original artists while adding something unique. Vocal mashups that kept my attention.
I’ve been spending a lot of time with XML content recently.
In one case I received content in which XML elements have translatable attributes. Within the text of these translatable attributes are custom placeholder tags which, in turn, have translatable attributes of their own.
<cust-ele att="needs-translation" value="Here is text with [proprietary att='I also require translation' dnt="77yf990"] placeholders contained within."/>
Borrowing from XLIFF, my current approach is to mark it up like so:
<cust-ele att="needs-translation" value=""><trans-att id="765fe3">Here is text with <ph id="432ab">[proprietary att='<sub>I also require translation</sub>' dnt="77yf990"]</ph> placeholders contained within.</trans-att></cust-ele>
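For anyone wanting to experiment with this kind of markup, here is a minimal Python sketch of the wrapping step. It is not my production code. The bracket pattern, the rule that only the single-quoted `att` value is translatable, the sequential `id` values and the function names `wrap_placeholder` and `protect` are all assumptions made for illustration.

```python
import re
from xml.sax.saxutils import escape

# Assumption: placeholders are square-bracketed, e.g. [proprietary att='...' dnt="..."]
PLACEHOLDER = re.compile(r"\[[^\]]*\]")
# Assumption: only the single-quoted value of `att` needs translation.
NESTED_ATT = re.compile(r"\batt='([^']*)'")


def wrap_placeholder(raw, ph_id):
    """Wrap one placeholder in <ph>, exposing its translatable text in <sub>."""
    m = NESTED_ATT.search(raw)
    if m is None:
        body = escape(raw)
    else:
        body = (
            escape(raw[:m.start(1)])
            + "<sub>" + escape(m.group(1)) + "</sub>"
            + escape(raw[m.end(1):])
        )
    return f'<ph id="{ph_id}">{body}</ph>'


def protect(text):
    """Return attribute text with every placeholder protected by <ph>."""
    parts, pos = [], 0
    # Sequential ids are a simplification; real ids would come from the pipeline.
    for ph_id, m in enumerate(PLACEHOLDER.finditer(text), start=1):
        parts.append(escape(text[pos:m.start()]))  # translatable text before it
        parts.append(wrap_placeholder(m.group(0), ph_id))
        pos = m.end()
    parts.append(escape(text[pos:]))  # trailing translatable text
    return "".join(parts)


if __name__ == "__main__":
    sample = ("Here is text with [proprietary att='I also require translation' "
              'dnt="77yf990"] placeholders contained within.')
    print(protect(sample))
```

The translatable text outside the placeholder and inside `<sub>` stays exposed, while everything else in the bracket sits inside `<ph>` where a translation tool can lock it.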
Is anyone else seeing this madness?
As we continue to build out our distributed platform for translation, we recently had to direct some extra effort to the conversion, protection and validation of content as it passes through the increasingly automated translation pipeline. The need to give consideration to content characteristics like encoding and structure is not new, but I believe, now more than ever, that unless content is protected it is vulnerable to being corrupted for its intended end purpose.
When I were a lad, content fell into categories: software, help, documentation, marketing… Today we have information that is destined to be re-used and re-purposed for many distribution channels and, supplied out of context, it is difficult to distinguish it from what is more like data.
Two recent examples that led us to once again think about the degree of access to, and types of permissible interactions with, our textual celebrity are: a bundled collection of multi-platform user interface strings, and some chunks of product marketing material. These two examples encompassed several characteristics (some prominent, some obscured) which I thought worthy of capture below:
- The same content chunk (file) may contain strings which will ultimately be integrated back into different technology platforms or content repositories. Considerations here are: character literals versus entities; breaking and non-breaking spaces; and line breaks (UNIX and Mac versus PC).
- Parts of the content structure or tagging may signal that the content is to be merged or tagging expanded into presentational elements of the final content rendering. Tags can come in a limitless number of forms.
- Whilst being human readable, the content may be parsed and consumed by algorithms to greater or lesser effect depending upon particular word usage, phrasing or constraints. Seemingly normal words can be magic strings; cause noise; be invalid or unwanted.
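To make the first two considerations concrete, here is a rough Python sketch of the kind of check that could run after translation. It is an illustration under stated assumptions, not a description of our components: the `TAG` pattern covers only three hypothetical placeholder shapes, tags are assumed to be copied through unchanged, and the names `line_ending` and `validate` are invented for the example.

```python
import re

# Assumption: tags look like <...>, [...] or {...}. Real content has many more forms.
TAG = re.compile(r"<[^>]+>|\[[^\]]*\]|\{[^}]*\}")


def line_ending(text):
    """Report the line-break convention used: PC (CRLF), old Mac (CR) or UNIX (LF)."""
    if "\r\n" in text:
        return "CRLF"
    if "\r" in text:
        return "CR"
    if "\n" in text:
        return "LF"
    return None


def validate(source, target):
    """Compare a source string with its translation and list what was disturbed."""
    issues = []
    if line_ending(source) != line_ending(target):
        issues.append("line endings changed")
    # U+00A0 is the non-breaking space; it is easily replaced by a plain space.
    if source.count("\u00a0") != target.count("\u00a0"):
        issues.append("non-breaking space count changed")
    # Tags may be reordered in translation, so compare them as sorted lists.
    if sorted(TAG.findall(source)) != sorted(TAG.findall(target)):
        issues.append("placeholders or tags changed")
    return issues


if __name__ == "__main__":
    src = "Click <b>Save</b>\u00a0now.\r\nThen {0}."
    tgt = "Cliquez sur <b>Enregistrer</b> maintenant.\nPuis {1}."
    print(validate(src, tgt))
```

The third consideration, words that act as magic strings for a downstream algorithm, is harder to check generically because it depends on what consumes the content.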
None of the above are insurmountable. We’ve had digital bubble wrap in the form of XLIFF for many years now and its second incarnation, XLIFF 2.0, promises further benefits. The awareness that we’ve encoded into our production components also ensures that the textual VIPs entrusted to us are not inconvenienced and arrive safely at their destination to give an ultimate performance.