Tag Archives: Linked Data

Deep Content on Tour

I have just returned from presenting Deep Content on both coasts of North America at the TAUS Annual Conference in Portland, Oregon and the 32nd Localization World in Montréal, Canada.


Deep Content is the combination of natural language processing (NLP) tools and Linked Data. Services such as terminology spotting, named entity recognition (NER) and machine translation can consume and produce data in a common protocol called the NLP Interchange Format (NIF). A digital text document is sent to each of the services, either individually or sequentially in a pipeline. Entities identified by the various services are passed to a graph query service, which searches for related information. Finally, all of this data is used to enrich the document.
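As a sketch of the kind of data these services exchange, an NER service might emit a NIF annotation along the following lines. This is a minimal, illustrative fragment only; the document URI and example sentence are mine, not from an actual pipeline run:

```turtle
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .

# The document being processed, represented as a NIF context.
<http://example.com/doc.html#char=0,26>
    a nif:Context , nif:RFC5147String ;
    nif:beginIndex "0" ;
    nif:endIndex "26" ;
    nif:isString "Dublin is worth the visit." .

# An entity spotted by the NER service, linked to a knowledge base
# resource that the graph query service can then explore.
<http://example.com/doc.html#char=0,6>
    a nif:RFC5147String ;
    nif:beginIndex "0" ;
    nif:endIndex "6" ;
    nif:anchorOf "Dublin" ;
    nif:referenceContext <http://example.com/doc.html#char=0,26> ;
    itsrdf:taIdentRef <http://dbpedia.org/resource/Dublin> .
```

Because each service reads and writes this same shape of data, services can be chained in any order without bespoke glue code between them.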

Deep Content uses open standards, and enriched content can be serialized as valid HTML5 and made available like any other page on the web.
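By way of illustration, an entity annotation might surface in the enriched HTML5 using the ITS 2.0 Text Analysis attribute. This is a sketch of one possible serialization, not Deep Content's actual output, and the URL is illustrative:

```html
<!-- The enrichment travels inline with the content: the its-ta-ident-ref
     attribute (defined by ITS 2.0) links the span to a knowledge base. -->
<p>Visit <span its-ta-ident-ref="http://dbpedia.org/resource/Dublin">Dublin</span>
this summer.</p>
```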

We are currently running some beta pilot projects with customers, and I’ll post about their results soon. If you’d like to know more, leave a comment.

What did it all mean?

I gave two presentations at the SEMANTiCS 2016 conference in Leipzig last week. Both were related to the H2020 FREME Project, in which I have been a participant. The first was on the e-Internationalization service, to which we have contributed significantly. The second (containing contributions from Felix Sasaki) was on the use of standards (de facto and ratified) within the FREME e-services in general, and our business-case implementation in particular.

This was my third attendance at the conference and it once again contained interesting and inspiring presentations, ideas and use cases around linked data.

I sometimes return from these types of conferences, full of innovation and enthusiasm for applying new ideas, to the day-to-day operations of work, and become discouraged by the inertia for change and the race to the bottom on price. It is almost impossible to innovate in such an atmosphere. We have looked at the application of machine learning, text classification and various natural language processing algorithms, and whilst people may acknowledge that the ideas are good, no one wants to pilot or evaluate them, let alone pay for them.

Anyhow, I remain inspired by the fields of NLP, Linked Data, deep learning and semantic networks, and maybe my day will come.

Disciple of Semantics and Linked Data

A little over two years ago I heard about, and had my curiosity piqued by, a project being undertaken at Sapienza University of Rome. I was definitely interested in and excited by the project goal. And so began my journey of discovery and belief in the power of relationships between data items in the Internet of Things. For someone who works in the commercial world it can be hard to convince colleagues of the potential of ambitious ideas. But I was determined.

Two years ago this month I met Roberto Navigli in Athens and learnt about BabelNet. At that time, as I recall, he and his team were starting work on Babelfy. Listening to Roberto explain his vision had me hooked and since that time I’ve been a fan.

Then in September of 2014 I attended the MLODE Hackathon in Leipzig. During that event I got the chance to play with the BabelNet API and get a hands-on feel for what was possible using the resource. This event cemented a number of concepts for me and fuelled my imagination and enthusiasm such that soon afterwards I became a partner in the FREME Project. I would say my status at this point was devotee of semantics and linked data.

Today I returned from Luxembourg where I attended the BabelNet Workshop. This was one of the most interesting, stimulating and well-run (Wi-Fi problems apart) events I have ever attended. The presentations were interesting, logically arranged and clear, and had great support materials and follow-along exercises. Roberto himself is a pleasure to listen to. Varied examples that illustrate his points flow like water from his mind.

And so my pilgrimage to becoming a disciple of semantics and multilingual linked data is complete. I have renewed energy and desire to utilize, and contribute to, what is, in my opinion, one of the most fascinating resources in the world for people working in the fields of linguistics, computer science and computational linguistics.

As one of my engineers puts the finishing touches to a beta Ocelot plug-in which performs semantic enrichment of content as it is being translated, I have been able to secure sufficient commercial backing to hire a computer science intern with knowledge and qualifications in linked data and semantic concepts.


On Wednesday 16 and Thursday 17 I attended the SEMANTiCS 2015 conference in Vienna. I attended in order to present a poster for our FREME Project, to demonstrate the Ocelot based application that we have built on top of the FREME services and to catch up on the state of the art from the thought leaders in this space.

It was an enlightening conference with great presentations from large international companies, like Yahoo!, as well as research and public organizations.

Several presentations mentioned Schema.org as the primary semantic vocabulary underpinning their technology. There was also a poster presented by researchers from the Semantic Technology Institute at the University of Innsbruck on the usage of Schema.org by hotels.

Whilst I didn’t get to talk to anyone who would be a natural customer of FREME, I left the conference with a strong feeling that the FREME e-Services, in helping to produce semantically rich digital content, would definitely serve the needs of the Linked Open Data Cloud and new technologies and services that will inevitably be built on top of it.

I reached this conclusion after listening to these presentations:

  • Complex Event Extraction from Real-Time News Streams, Alexandra La Fleur, Freie Universität, Berlin
  • When RDF alone is not enough – triples, documents, and data in combination, Stephen Buxton, MarkLogic
  • Semantic Search at Yahoo!, Peter Mika, Yahoo!
  • Evolution of Semantic Technologies in Scientific Libraries, Prof. Dr. Klaus Tochtermann, Leibniz Information Centre for Economics

All in all, an interesting and productive trip.



The web site for our new European Commission-funded Horizon 2020 project went live on 2015-03-27. I’m very excited about this project. It encompasses many important current topics: big linguistic linked data, the Semantic Web, NLP technologies, linguistic linked data interoperability, and intelligent, enriched content.

My goals for the project include new features for our open-source editor, Ocelot. The planned features will further integrate it with other linguistic technologies and standards, not least the Semantic Web and Linked Linguistic Data Clouds themselves.

Having missed the project kick-off in Berlin in February, I’m looking forward to meeting all of the world-class academic and industry partners.


My crazy idea for NIF

I was recently invited to join a LIDER call to talk about my use-case idea for NIF. Here’s what it is:


We often have to provide translations for content which is non-literal but rather more metaphorical/idiomatic. For example, “My destination is only a ‘hop-and-a-skip’ from my home.” might get translated as “Mein Ziel ist nur ein ‘Katzen-sprung’ von meinem Zuhause.”.

Describing the relationship in NIF

I suggest that you could model this translation as:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix vt: <http://www.vistatec.com/rdf#> .

<http://example.com/exampledoc.html#char=0,55>
    a nif:Context , nif:RFC5147String ;
    nif:beginIndex "0" ;
    nif:endIndex "55" ;
    nif:isString "My destination is only a 'hop-and-a-skip' from my home." .

<http://example.com/exampledoc.html#char=26,40>
    a nif:RFC5147String ;
    nif:beginIndex "26" ;
    nif:endIndex "40" ;
    nif:referenceContext <http://example.com/exampledoc.html#char=0,55> ;
    itsrdf:hasLocQualityIssue [
        a itsrdf:LocQualityIssue ;
        itsrdf:locQualityIssueType "uncategorized"
    ] ;
    vt:translatedAs <http://example.com/exampledoc-de.html#char=23,36> .

<http://example.com/exampledoc-de.html#char=0,57>
    a nif:Context , nif:RFC5147String ;
    nif:beginIndex "0" ;
    nif:endIndex "57" ;
    nif:isString "Mein Ziel ist nur ein 'Katzen-sprung' von meinem Zuhause." .

<http://example.com/exampledoc-de.html#char=23,36>
    a nif:RFC5147String ;
    nif:beginIndex "23" ;
    nif:endIndex "36" ;
    nif:referenceContext <http://example.com/exampledoc-de.html#char=0,57> ;
    itsrdf:hasLocQualityIssue [
        a itsrdf:LocQualityIssue ;
        itsrdf:locQualityIssueType "uncategorized"
    ] .

Surely then this model could be extended to give a comprehensive representation of such non-literal translations in a way that NLP tools could consume.
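One sketch of such an extension might anchor both idiom fragments to a shared concept and type the translation link itself. Note that vt:translationType and the concept URI below are hypothetical, invented here purely for illustration and not part of any published vocabulary:

```turtle
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix vt: <http://www.vistatec.com/rdf#> .

# Both the English and German idioms point at the same (illustrative)
# concept, so an NLP tool can see they express one meaning.
<http://example.com/exampledoc.html#char=26,40>
    itsrdf:taIdentRef <http://example.com/concepts/short-distance> ;
    vt:translatedAs <http://example.com/exampledoc-de.html#char=23,36> ;
    # Hypothetical property: classifies the translation as non-literal.
    vt:translationType "idiomatic" .

<http://example.com/exampledoc-de.html#char=23,36>
    itsrdf:taIdentRef <http://example.com/concepts/short-distance> .
```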


New Year, New Project

Our press release says it all:

Today VistaTEC enthusiastically announced it will be an industrial participant in a second substantial and significant European Commission-funded Horizon 2020 project. The €3.2 million, two-year project entitled “Open Framework of E-services for Multilingual and Semantic Enrichment of Digital Content” (FREME) will see VistaTEC collaborating on the design and implementation of a commercial-grade, web-accessible, linguistic e-services platform. The framework will utilize Big Linguistic Open and Linked Data to deliver valuable multilingual resources upon which a range of e-services can be built. These services cover use cases which span the digital content life-cycle: authoring, translation, curation, publishing and discovery, in addition to bringing some leading-edge content technologies and data models to market.

“This is a key project for my team,” said Phil Ritchie, VistaTEC’s Chief Technology Officer. “The services that will be delivered during the life of this project will provide us with unique and novel paradigms for the way in which we produce multilingual content for our customers.”

The project team includes partners such as the German Research Centre for Artificial Intelligence (DFKI), the Institute for Applied Informatics (InfAI), and Tilde, all fresh from the well-publicized success of the Multilingual Web – Language Technologies project.

Ritchie concluded the announcement saying: “VistaTEC continues to strive to harness disruptive innovations and apply them in unique ways. I’m very excited to be part of such an experienced and knowledgeable consortium which has considerable potential to deliver technological and economic value to the language industry.”


Linking Data in the Mediterranean

I am just back from the European Data Forum and the Linked Data for Language Technologies Workshop. The co-location of these two events meant that many of the leaders in linked data and the digital representation of linguistic and knowledge concepts were in one place.

The presentation that stood out for me at the EDF was the second day keynote by Ralf-Peter Schaefer of TomTom. Seeing how they use their 9 trillion and counting data points to find patterns in, and make predictions for, traffic conditions was very interesting.

The LD4LT Workshop was very productive despite the virtually non-existent free, and totally non-existent pay-as-you-go, Wi-Fi connectivity for the whole duration of the two events. It definitely brought home to me the importance of connectivity these days. I had convinced myself to leave my MiFi at home – I will not do so in future.

Presentations I took note of during LD4LT were about webLyzard and Rozeta.

I presented my three industry challenges and my hopes for how linguistically motivated linked data might help me solve them.

I used the travel time to learn about the AngularJS project. Pretty impressive stuff. I particularly liked the way the framework handles the binding and referencing of data within the rendered HTML UI without explicitly needing to use data-* attributes to store object IDs.

I was hoping to be able to document trials I have been doing with DITA and XLIFF+ITS using the XLIFF-DITA Roundtrip toolkit, but I got stuck at the last hurdle; figuring out the issue would require a deeper understanding of the toolkit.

We finished our assessment of supporting XLIFF 2.0 in Ocelot. It currently looks as though we will introduce a layer of abstract data model façade objects between the Okapi XLIFF filter classes and Ocelot.

I must dedicate some time to my H2020 proposal.

I compensated for my lack of my regular 2 kilometre daily walk with a day-long urban hike around Athens, taking in the Acropolis (naturally), its museum (I’d like to point out that Thomas Bruce, 7th Earl of Elgin, was Scottish) and the chapel of St. George at the summit of Lycabettus Hill, which offers truly stunning 360-degree views over Athens as far as the Aegean Sea. Unfortunately I didn’t have time to visit Kastella Hill near Piraeus.

A Week Out West

I’m fortunate that my job gives me the opportunity to travel. Last week it was California. The week started well with a positive customer meeting and the arrival of a new employee in our Mountain View office.

Over the course of a number of years with one of our customers we have had a great opportunity to automate and integrate a significant number of business processes. Like ourselves, our customer thrives on and enjoys continuously reinventing, iterating and improving tools. (I’m reluctant to use the word “innovate” as it’s becoming over-used, but the term would certainly describe what both of us do regularly.) The exciting possibility that came out of last week’s conversations with their scarily bright engineering team is for us to build a truly scalable, cloud-hosted, service-bus-based business rules engine using data regularly polled from their web services API endpoints.
In addition to existing business-related discussions, I was also able to use the trip to evangelise my more research-based interests, and to present, and get early feedback on, work on the horizon such as ITS 2.0, Linked Data, Review Sentinel and Reviewer’s Workbench.
The one potentially tedious aspect of business travel is the actual relocation of your body geographically. I always prepare well to combat the boredom that journeys – aided by delays – can bring. Tooled up with Kindle, iPad and paperbacks (for use during takeoff and landing), I used the time to catch up (somewhat belatedly) on Breeze, Modernizr, Require, Knockout, Font Awesome and Bootstrap, all courtesy of John Papa’s Pluralsight course.
The week also provided the chance to catch up in person with one of our outsource development partners, Spartan Software. Google Hangouts doesn’t yet replace the experience of enjoying a beer together. Spartan have been building Reviewer’s Workbench for us. Reviewer’s Workbench is our implementation of the W3C Multilingual Web Internationalization Tag Set (ITS) 2.0.