Article on the success of the latest European Commission-funded Innovation Action that we participated in.
Here’s an update on Deep Content.
I have just returned from presenting Deep Content on both coasts of North America: at the TAUS Annual Conference in Portland, Oregon, and at the 32nd Localization World in Montréal, Canada.
Deep Content is the combination of natural language processing (NLP) tools and Linked Data. Services such as terminology spotting, Named Entity Recognition (NER) and machine translation can consume and produce data in a common protocol called the NLP Interchange Format (NIF). A digital text document is sent to each of the services, either individually or sequentially in a pipeline. Entities identified by the various services are passed to a graph query service, which searches for related information. Finally, all of this data is used to enrich the document.
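The flow above can be sketched in a few lines of Python. This is a minimal illustration, not the real FREME API: the stub services, the example entity and the plain dict payload stand in for actual NIF (which is RDF-based) and for real enrichment services.

```python
# Illustrative sketch of a sequential enrichment pipeline.
# A dict of text plus accumulated annotations stands in for a NIF payload.

def spot_terminology(doc):
    # Stand-in for a terminology-spotting service.
    for term in ("machine translation",):
        if term in doc["text"]:
            doc["annotations"].append({"type": "term", "surface": term})
    return doc

def recognize_entities(doc):
    # Stand-in for a Named Entity Recognition (NER) service.
    for entity, uri in {"Dublin": "http://dbpedia.org/resource/Dublin"}.items():
        if entity in doc["text"]:
            doc["annotations"].append(
                {"type": "entity", "surface": entity, "ref": uri})
    return doc

def lookup_related(doc):
    # Stand-in for a graph query service that fetches related information
    # for every entity the earlier services identified.
    related = {"http://dbpedia.org/resource/Dublin": {"country": "Ireland"}}
    for ann in doc["annotations"]:
        if ann.get("ref") in related:
            ann["related"] = related[ann["ref"]]
    return doc

def enrich(text, pipeline):
    # Send the document through each service in turn and return it enriched.
    doc = {"text": text, "annotations": []}
    for service in pipeline:
        doc = service(doc)
    return doc

doc = enrich("Dublin is a hub for machine translation research.",
             [spot_terminology, recognize_entities, lookup_related])
```

Each service reads and extends the same payload, which is what lets them be invoked individually or chained in any order.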
Deep Content uses open standards, and enriched content can be serialized as valid HTML5 and made available like any other page on the web.
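As a sketch of what that serialization can look like, the W3C ITS 2.0 Text Analysis attributes let enrichment results travel inline in HTML5; the entity and identifier below are just examples.

```html
<!-- Hypothetical enriched fragment: its-ta-ident-ref (ITS 2.0 Text
     Analysis) links the phrase to an identity in a Linked Data source. -->
<p>Our office is in
  <span its-ta-ident-ref="http://dbpedia.org/resource/Dublin">Dublin</span>.
</p>
```

Because the annotations are plain attributes, the page remains valid HTML5 and renders normally in any browser.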
We are currently running some beta pilot projects with customers, and I’ll post on their results soon. If you’d like to know more, leave a comment.
I gave two presentations at the SEMANTiCS 2016 conference in Leipzig last week. Both were related to the H2020 FREME Project, in which I am a participant. The first was on the e-Internationalization service, to which we have contributed significantly. The second (containing contributions from Felix Sasaki) was on the use of standards (de facto and ratified) within the FREME e-services in general, and in our business case implementation in particular.
This was my third time attending the conference, and it once again contained interesting and inspiring presentations, ideas and use cases around linked data.
I sometimes return from these types of conference, full of innovation and enthusiasm for applying new ideas, to the day-to-day operations of work, and become discouraged by the inertia for change and the race to the bottom on price. It is almost impossible to innovate in such an atmosphere. We have looked at applying machine learning, text classification and various natural language processing algorithms, and whilst people may acknowledge that the ideas are good, no-one wants to pilot or evaluate them, let alone pay for them.
Anyhow, I remain inspired by the fields of NLP, Linked Data, Deep Learning and semantic networks, and maybe my day will come.
We genuinely surprised attendees at Localization World 31 today with our announcement of our new “Deep Content” service.
I am exhausted from writing about, talking about and demonstrating it, so I’m just going to point to Vistatec’s Press Release and a good review posted by the independent industry analysts at Common Sense Advisory.
A little over two years ago I heard about, and had my curiosity piqued by, a project being undertaken at Sapienza University of Rome. I was definitely interested and excited by the project’s goal, and so began my journey of discovery and belief in the power of relationships between data items in the Internet of Things. As someone who works in the commercial world, it can be hard to convince colleagues of the potential of ambitious ideas. But I was determined.
Two years ago this month I met Roberto Navigli in Athens and learnt about BabelNet. At that time, as I recall, he and his team were starting work on Babelfy. Listening to Roberto explain his vision had me hooked and since that time I’ve been a fan.
Then in September of 2014 I attended the MLODE Hackathon in Leipzig. During that event I got the chance to play with the BabelNet API and get a hands-on feel for what was possible using the resource. This event cemented a number of concepts for me and fuelled my imagination and enthusiasm such that soon afterwards I became a partner in the FREME Project. I would say my status at this point was devotee of semantics and linked data.
Today I returned from Luxembourg, where I attended the BabelNet Workshop. This was one of the most interesting, stimulating and well-run (Wi-Fi problems apart) events I have ever attended. The presentations were interesting, logically arranged and clear, with great supporting materials and follow-along exercises. Roberto himself is a pleasure to listen to; varied examples that illustrate his points flow like water from his mind.
And so my pilgrimage from devotee to disciple of semantics and multilingual linked data is complete. I have renewed energy and desire to utilize, and contribute to, what is, in my opinion, one of the most fascinating resources in the world for people working in the fields of linguistics, computer science and computational linguistics.
As one of my engineers puts the finishing touches to a beta Ocelot plug-in that performs semantic enrichment of content as it is being translated, I have been able to secure sufficient commercial backing to hire a computer science intern with knowledge of, and qualifications in, linked data and semantic concepts.
On Wednesday 16 and Thursday 17 I attended the SEMANTiCS 2015 conference in Vienna. I attended in order to present a poster for our FREME Project, to demonstrate the Ocelot-based application that we have built on top of the FREME services, and to catch up on the state of the art from the thought leaders in this space.
It was an enlightening conference with great presentations from large international companies, like Yahoo!, as well as research and public organizations.
Several presentations mentioned Schema.org as being the primary semantic vocabulary underpinning their technology. There was also a poster presented by researchers from the Semantic Technology Institute at the University of Innsbruck on the usage of Schema.org by hotels.
Whilst I didn’t get to talk to anyone who would be a natural customer of FREME, I left the conference with a strong feeling that the FREME e-Services, in helping to produce semantically rich digital content, would definitely serve the needs of the Linked Open Data Cloud and new technologies and services that will inevitably be built on top of it.
I reached this conclusion after listening to these presentations:
- Complex Event Extraction from Real-Time News Streams, Alexandra La Fleur, Freie Universität Berlin
- When RDF alone is not enough – triples, documents, and data in combination, Stephen Buxton, MarkLogic
- Semantic Search at Yahoo!, Peter Mika, Yahoo!
- Evolution of Semantic Technologies in Scientific Libraries, Prof. Dr. Klaus Tochtermann, Leibniz Information Centre for Economics
All in all, an interesting and productive trip.
I gave two important demonstrations this week to senior management:
- Phase one of our distributed production platform which uses many enterprise integration architecture patterns
- Using the semantic enrichment facilities of the FREME e-services from a proprietary plug-in to Ocelot that we built using its plug-in API.
The distributed platform demonstration went well and showed the potential of the architecture:
- Configurable routes from one micro-service to another
- Fault tolerance
- Composability and reuse.
What I particularly like about this architecture is that we can incorporate discrete processes with blocks of translation management system workflow. For example, we can transform assets from one format to another, carry out validation, pre-edit, post-edit, inject, and generally modify and optimise every aspect of the production process.
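The routing idea can be sketched with the pipes-and-filters pattern. Everything here (step names, asset shape) is invented for illustration, and plain functions stand in for real micro-services:

```python
# Pipes-and-filters: each micro-service is a step registered by name,
# and a route is just configuration (an ordered list of step names).

STEPS = {}

def step(fn):
    """Register a processing step under its function name."""
    STEPS[fn.__name__] = fn
    return fn

@step
def convert(asset):
    asset["format"] = "xliff"              # e.g. transform docx -> xliff
    return asset

@step
def validate(asset):
    asset["valid"] = bool(asset.get("text"))   # e.g. structural validation
    return asset

@step
def pseudo_translate(asset):
    asset["text"] = asset["text"].upper()  # stand-in for a translation stage
    return asset

def run_route(asset, route):
    # A real broker would add retries and dead-lettering here (fault tolerance).
    for name in route:
        asset = STEPS[name](asset)
    return asset

# The route itself is data, so it can be reconfigured, composed and reused
# without touching the steps.
asset = run_route({"text": "hello", "format": "docx"},
                  ["convert", "validate", "pseudo_translate"])
```

Declaring routes as data is what gives the configurability, composability and reuse described above: inserting a pre-edit or validation step is a one-line change to the route.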
The Ocelot presentation went even better than I anticipated, capturing the imagination of two of the attending senior managers. Our Vice President of Global Sales commented that he thought it would open up opportunities to speak to new departments and roles within organisations, which could in turn influence localization stakeholders and buyers.
I’ll be giving both presentations again next week to a customer and the collaborators and Project Officer of the FREME consortium.
I will be presenting at FEISGILTT 2014 in Dublin on the subject of VistaTEC’s work with RDF and Semantic Networks. In my presentation I want to have a series of images which illustrate the build-up of relationships between entities.
As with my approach to development, I tend to start with an idea and iterate and refactor towards the finished article. Given that I want a series of images that I can display as a crude stop-motion animation, I need a quick and consistent way of regenerating the sequence after making an alteration. From my early days in architecture and computer graphics, I know that it is sometimes better to start from the end – with a keyframe – and work backwards.
A graphing tool that I have come to like a lot is Graphviz. It has a very simple text-based language for defining a graph, and its own layout engine, so as you add nodes to the graph it dynamically alters their positioning and layout in the rendering. The effect I want in my animation is not for the positions of the nodes to change; rather, I want new nodes to fade into the graph as if they had been there all along, just hidden.
I started with the end result – that is, the complete graph – with all of its final rendering and labeling. Next I generated the start slide. This was a copy of the definition of the complete graph, but with all the nodes I wanted hidden simply drawn white on white.
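As a sketch, a start slide in DOT might look like this, with the not-yet-revealed node drawn white on white (the node names are illustrative):

```dot
digraph frame1 {
  bgcolor = "white";
  // Visible from the first frame
  Entity [color = "black", fontcolor = "black"];
  // Present in the layout but hidden: white on white
  Relation [color = "white", fontcolor = "white"];
  Entity -> Relation [color = "white"];
}
```

Because every node and edge is present in every frame, Graphviz computes the same layout each time, so revealed items appear in place instead of shifting the rest of the graph.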
Generating the intermediate slides is a case of taking the definition of the starting slide and gradually replacing the white-on-white node definitions with those from the final graph definition – their final visible rendered colour.
For this I used my favourite comparison tool, Beyond Compare, which shows differences at document and line level and has simple ways of moving changes between files. Having generated my series of image definitions, I simply created a batch file to execute Graphviz’s dot program to generate the images previewed below.
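The same replacement could also be scripted. This is just a sketch of the idea under invented names (template, nodes and reveal order are all hypothetical): one DOT template with colour placeholders, from which each frame reveals one more node.

```python
# Generate stop-motion frames from one DOT template by revealing one node
# per frame. Hidden nodes stay white on white so the layout never changes.
# The template and node names here are invented for illustration.

TEMPLATE = """digraph g {{
  A [color="{A}", fontcolor="{A}"];
  B [color="{B}", fontcolor="{B}"];
  C [color="{C}", fontcolor="{C}"];
  A -> B [color="{B}"];
  B -> C [color="{C}"];
}}"""

REVEAL_ORDER = ["A", "B", "C"]

def frames():
    out = []
    for i in range(1, len(REVEAL_ORDER) + 1):
        # Nodes revealed so far get their final colour; the rest stay hidden.
        colours = {n: ("black" if n in REVEAL_ORDER[:i] else "white")
                   for n in REVEAL_ORDER}
        out.append(TEMPLATE.format(**colours))
    return out

# Each frame could then be written to frameN.dot and rendered with, e.g.:
#   dot -Tpng frame1.dot -o frame1.png
```

Note that edges take the colour of their target node, so an edge fades in only when the node it points to does.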
January has been wonderfully varied on the work front. Most days have brought new learning.
In thinking about my R&D Manifesto, I decided it was time to revisit neural networks and semantics. I’m not averse to learning new development languages and environments, but when you want to evaluate ideas quickly you tend towards the familiar. For this reason alone, Encog looks interesting.
The Centre for Global Intelligent Content (CNGL), of which VistaTEC has been an industry partner since its inception, received funding for a further two and a half years in October of last year. As a consequence there have been numerous meetings. It’s really exciting to see how the centre has evolved and honed its process for bringing innovation to market. A key element of this process is the d.lab under the direction of Steve Gotz. In my view Steve has been one of the notable personalities in CNGL: he has a great, broad knowledge of the technology, innovation and start-up landscapes, and excellent business acumen.

Two interesting pieces of technology were shown to centre members recently. The first, named MTMPrime, is a component which can assess translation memory matches alongside machine translation output in real time and, based on confidence scores, recommend which to use. The second is a machine translation incremental learning component which can profile a document and suggest the most efficient path to translating it, given the algorithm’s analysis of the incremental benefit that would be realized by translating segments in a particular order. Basically, it works out the bang for buck of translating segments.
In discussing semantics and disambiguation, Steve pointed me at Open Calais. This is a service which, like Enrycher, parses content and automatically adds semantic metadata for the named entities and subjects that it “recognizes”. The picture below shows the result of passing this post through the Open Calais Viewer.
We’ve had some very interesting customer inquiries too. It’s too early to talk about them, but I hope that we get more requests for these types of engagements and services. If any come to fruition I’ll blog about them later.
Finally, we did some small updates to Ocelot:
- New configuration file for plug-ins,
- Native launch experience for Windows and Mac, and
- Native hot-key experiences within Windows and Mac interfaces.
Long may this variety continue.