Tag Archives: TAUS

Deep Content on Tour

I have just returned from presenting Deep Content on both coasts of North America at the TAUS Annual Conference in Portland, Oregon and the 32nd Localization World in Montréal, Canada.


Deep Content is the combination of natural language processing (NLP) tools and Linked Data. Services such as terminology spotting, Named Entity Recognition (NER) and machine translation can consume and produce data in a common format called the NLP Interchange Format (NIF). A digital text document is sent to the services either individually or sequentially in a pipeline. Entities identified by the various services are passed to a graph query service, which searches for related information. Finally, all of this data is used to enrich the document.
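
To make the pipeline idea a little more concrete, here is a minimal sketch in Python. It is not the Deep Content code and it does not produce real NIF: the `Document` and `Annotation` classes are a simplified stand-in I have assumed for a shared interchange format, and `make_spotter`, `make_graph_lookup` and `run_pipeline` are hypothetical names invented for this illustration. The only point it tries to show is that every service takes and returns the same structure, so services can be chained in any order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Annotation:
    # Simplified stand-in for an annotation in a shared format (assumption, not real NIF).
    begin: int   # character offset where the annotated span starts
    end: int     # character offset just past the end of the span
    kind: str    # e.g. "term" or "entity"
    info: Dict[str, str] = field(default_factory=dict)  # enrichment data


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)


# Every service consumes and produces the same structure.
Service = Callable[[Document], Document]


def make_spotter(kind: str, vocabulary: List[str]) -> Service:
    """Build a toy service that annotates every occurrence of known phrases."""
    def service(doc: Document) -> Document:
        for phrase in vocabulary:
            start = doc.text.find(phrase)
            while start != -1:
                doc.annotations.append(Annotation(start, start + len(phrase), kind))
                start = doc.text.find(phrase, start + 1)
        return doc
    return service


def make_graph_lookup(graph: Dict[str, Dict[str, str]]) -> Service:
    """Build a toy stand-in for a graph query service (a dict lookup here)."""
    def service(doc: Document) -> Document:
        for ann in doc.annotations:
            if ann.kind == "entity":
                surface = doc.text[ann.begin:ann.end]
                ann.info.update(graph.get(surface, {}))
        return doc
    return service


def run_pipeline(doc: Document, services: List[Service]) -> Document:
    """Send the document through each service in sequence."""
    for service in services:
        doc = service(doc)
    return doc


if __name__ == "__main__":
    pipeline = [
        make_spotter("term", ["machine translation"]),
        make_spotter("entity", ["Portland"]),
        make_graph_lookup({"Portland": {"type": "City", "region": "Oregon"}}),
    ]
    result = run_pipeline(Document("We discussed machine translation in Portland."), pipeline)
    for ann in result.annotations:
        print(result.text[ann.begin:ann.end], ann.kind, ann.info)
```

In a real deployment each service would presumably be a remote call exchanging NIF rather than a local function, but the chaining logic would look much the same.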

Deep Content uses open standards, and enriched content can be serialized as valid HTML5 and made available like any other page on the web.

We are currently running some beta pilot projects with customers and I’ll post on their results soon. If you’d like to know more, leave a comment.

Language Industry Leaders Meet In Portland

On Monday 14th and Tuesday 15th of October I attended the TAUS Annual Conference in Portland. TAUS has grown from a small gathering of translation buyers into a 140-member-strong group of translation industry thought leaders. I like TAUS because it is attended by people who not only talk about translation innovation but also have the authority within their organizations to implement it.

TAUS’s current strands of research are Data (shared data for machine translation training), Technology (machine translation, workflow), Interoperability (Linport) and Metrics (Dynamic Quality Framework).

The Keynote speaker was Genevieve Bell of Intel.

Her feisty, confident and entertaining presentation covered many anthropological topics and summed up with six dichotomies:

  • Consumption and Creation. Interesting notion of binge consumption.
  • Multi-tasking and “In the Flow”. Applications knowing about aspects of the user and being able to act upon them: e.g. disable phone ring when praying. Understand what’s going on at a particular time. Know if a person can be disturbed. Humans know when they can interrupt each other.
  • Persistence and Disappearance.
  • Collective and Fragmented.
  • Conversion and Accumulation.
  • Tailored and Surprised. Rather than having analytics prompt you to do something familiar, the system prompts you about something you may be interested in or surprised by (in a nice way). Currently, things that machines “predict” can leave users feeling “freaked out” or “uncomfortable”.

First time I have heard under-25s being called “Millennials”.

Human-machine interaction. No more command and control, but human-computer relationships based on listening and negotiation.

The difference between text and touch is that touch has not received a lot of television imagery, so expectations are lower. Translation is expected to be 99.1% accurate; touch is expected to be 70%.

Genevieve’s talk was followed by a series of short presentations by a number of speakers on the topic of Content Strategies for companies born in the last and this century. Snippets:

Kenneth Klein, OmniLingua: Hot air/Cold air collisions. Result: severe weather.
V = (q x s) / (c x t). Buyers determine value. Shift activities to those who provide most value.

Andrew Bredenkamp, Acrolinx: “Transactional Content”. Do something with the content. Readability, liveliness, scannability.

Jessica Roland, Gengo: Different content types need different strategies.

Stuff is hoarded / Fluff is shared. Scarcity versus abundance.

Andrzej Zydroń: Industry leverages academic research: technology and standards.

The afternoon session was definitely a rare occasion to witness (and kudos to TAUS for arranging it): a panel entitled “Criss-cross the globe serving six billion customers all at once.” comprising Alolita Sharma, Wikipedia; Diane Wagner, Microsoft; Francis Tsang, Adobe; Iris Orriss, Facebook; Jack Boyce, Google; and Karen Combe, PTC.


Alolita Sharma: Wikipedia today supports 287 languages. Wikipedia builds sophisticated tools to support the variety of languages and then releases them as open source.

Microsoft: 40 commercial, 65 non-commercial.
Google: 60 core (95% of world population); also looking at physical and infrastructural issues.
Facebook: 80 available to anyone, 100 by subscription.
Adobe: mission to create interesting content. 25-30 production languages.

Jack Boyce, Google: Sustained long-term engagement from community except in isolated instances.

Iris Orriss, Facebook: Communities come to Facebook and ask for help with infrastructure and tools.

Google uses the number of pages that Wikipedia has in a particular language as one of its ROI data points.

Reading, writing, tagging, searching: all have to be available as a native experience.

Human Computation. Carnegie Mellon University.

Day Two presentations that caught my attention:

Mark Seligman, Spoken Translation: Speech-to-speech translation.

Great conference. Input buffers full.


Two Presentations in the Valley

Last week I gave two presentations in Silicon Valley: “Okapi Ocelot – Okapi’s New Editor” at the Localization World “Using Standards to Improve Workflow” pre-conference day, and “Using Open Standards to Automate Quality Management” at the TAUS Translation Quality Evaluation Summit.


I didn’t get to many of the Localization World Conference sessions but did have plenty of interesting conversations with industry friends.