Ocelot 2.0

I am pleased and excited to announce the release of Ocelot 2.0 available as source code and binaries. Special thanks go to Kevin Lew, Marta Borriello and Chase Tingley who were the main engineers of this release.

The new features are:

  1. Support for XLIFF 2.0 Core;
  2. Translation Memory and Concordance lookup;
  3. Save XLIFF document as TMX;
  4. Language Quality Issue Grid;
  5. Set the fonts used for source and target content;
  6. Support for notes in XLIFF 2.0;
  7. Serialization of Original Target in XLIFF 2.0 using the Change Tracking Module.

[Screenshot: Ocelot 2.0 overview]

XLIFF 2.0

Ocelot now supports XLIFF 2.0 documents. It still supports XLIFF 1.2 documents and auto-detects the XLIFF version of the document being opened.
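For readers who have not yet seen the new format, here is a minimal XLIFF 2.0 Core document (the content is purely illustrative); the 2.0 namespace and version attribute are what distinguish it from an XLIFF 1.2 file:

  <?xml version="1.0" encoding="UTF-8"?>
  <xliff xmlns="urn:oasis:names:tc:xliff:document:2.0"
         version="2.0" srcLang="en" trgLang="fr">
    <file id="f1">
      <unit id="u1">
        <segment>
          <source>Hello world.</source>
          <target>Bonjour le monde.</target>
        </segment>
      </unit>
    </file>
  </xliff>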

Translation Memory and Concordance lookup

A new window above the main editing grid displays two tabs: one for Translation Memory lookup matches and one for Concordance Search results. If the window is not visible, clicking the splitter bar just below the font selection controls under the menu bar should reveal it.

Ocelot works with Translation Memory eXchange (TMX) files. The View->Configure TM menu option opens the TM Configuration dialog, where you can specify which translation memories to use (in a fallback sequence), the penalty (if any) to apply to each TM, the maximum number of results to display, and the minimum match threshold a match must meet in order to be shown.
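To illustrate how these settings interact (assuming, as is conventional, that the penalty is subtracted from the raw match score): a 95% match from a TM carrying a 10% penalty is reported as an 85% match, and would therefore be hidden by a 90% minimum threshold.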

We have also added the ability to save the document as TMX.
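For anyone unfamiliar with the format, a minimal TMX file holding a single translation unit looks like the sketch below; the header attribute values here are illustrative, not necessarily what Ocelot actually emits:

  <?xml version="1.0" encoding="UTF-8"?>
  <tmx version="1.4">
    <header creationtool="Ocelot" creationtoolversion="2.0"
            segtype="sentence" o-tmf="xliff" adminlang="en"
            srclang="en" datatype="plaintext"/>
    <body>
      <tu>
        <tuv xml:lang="en"><seg>Hello world.</seg></tuv>
        <tuv xml:lang="fr"><seg>Bonjour le monde.</seg></tuv>
      </tu>
    </body>
  </tmx>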

Language Quality Issue Grid

Adding Internationalization Tag Set (ITS) 2.0 Language Quality Issue metadata has been cumbersome, even with the “Quick Add” mechanism of release 1 that could be configured in the rule.properties file. The LQI Grid reduces this to a one-click or one-hotkey operation (excluding any comments you want to add).

The grid is graphically customizable, allowing a matrix of issue severities (columns) and a user-defined selection of issue types (rows) to be configured, along with a hotkey for every combination. Clicking any cell in the grid, or typing its associated hotkey sequence, adds the required Localization Quality Issue metadata. For example, clicking the cell at the intersection of a “Major” severity column and a “style” category row will add an <its:locQualityIssue locQualityIssueSeverity="..." locQualityIssueType="style" /> entry to the relevant segment.
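In full, the serialized metadata might look something like the standoff markup below. ITS 2.0 expresses severity as a value from 0 to 100, so the value 75 is only an assumed mapping for “Major”, and the comment text is invented; the flagged span then points at the standoff block with a locQualityIssuesRef attribute:

  <its:locQualityIssues xml:id="lqi1"
      xmlns:its="http://www.w3.org/2005/11/its">
    <its:locQualityIssue locQualityIssueType="style"
        locQualityIssueSeverity="75"
        locQualityIssueComment="Awkward register for this audience"/>
  </its:locQualityIssues>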

Source and Target Fonts

Just below the menu bar are two comboboxes which allow you to set the font family and size to be used for the source and target content.

XLIFF 2.0 Notes

On the View menu, the Configure Columns option allows you to display a Notes column. Text entered into cells in this column will be serialized as XLIFF <note /> elements.
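In XLIFF 2.0 Core, a unit's notes are grouped under a <notes> element, so a note entered in Ocelot should end up serialized along these lines (the content is illustrative):

  <unit id="u1">
    <notes>
      <note>Check this term against the client glossary.</note>
    </notes>
    <segment>
      <source>...</source>
      <target>...</target>
    </segment>
  </unit>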

Serialization of Original Target

If the target text is modified, Ocelot now captures and serializes the original target as a tracked change using the XLIFF 2.0 Change Tracking Module. One limitation here, which we hope to address as part of XLIFF 2.1, is that only the text (and no inline markup) is saved.
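As a sketch of what this looks like in the file (treat the details as my reading of the Change Tracking Module rather than Ocelot's exact output), the saved original target sits in a changeTrack block inside the unit:

  <ctr:changeTrack
      xmlns:ctr="urn:oasis:names:tc:xliff:changetracking:2.0">
    <ctr:revisions appliesTo="target">
      <ctr:revision>
        <ctr:item property="content">The original target text.</ctr:item>
      </ctr:revision>
    </ctr:revisions>
  </ctr:changeTrack>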

I hope that these enhancements are useful.

XLIFF Is Dead, Long Live XLIFF

XLIFF 2.0 was approved as an OASIS Standard on August 6, 2014.

XLIFF 2.0 aims to address much of the feedback on, and criticism of, XLIFF 1.2, and has a new modular architecture: a mandatory Core plus optional modules that can be developed as independent projects.

Okapi has released a library called the XLIFF 2.0 Toolkit, which has no dependencies on third-party XML parsers and provides an easy-to-use set of classes and methods for reading, modifying and writing XLIFF 2.0 files.

I’m really looking forward to working with XLIFF 2.0. As an advocate of Intelligent Content and distributed workflows and services for localization, I am keen to use the Metadata module in particular.

We plan to start work on building support for XLIFF 2.0 into Ocelot at the beginning of September.

Review and Post-editing Go Analytical

On 27th May we finalised release 1.0 of Reviewer’s Workbench. RW represents the culmination of several strands of research and development that I had been involved with over the last couple of years.

In 2011 I set up Digital Linguistics to sell Review Sentinel, the world’s first text-analytics-based Language Quality Assurance technology. I first publicly presented Review Sentinel at the TAUS User Conference held in Seattle in October 2012.

In January 2012 I became a member of the Multilingual Web Language Technologies Working Group. Funded by the EU, this Working Group of the W3C is responsible for defining and publishing the ITS 2.0 Standard. ITS 2.0 is now at the Last Call stage of the W3C process.

I can safely assert that Reviewer’s Workbench is the world’s first editor to utilise text analytics and other metadata that can be encoded with ITS 2.0 – such as machine translation confidence scores, term disambiguation information and translation-process provenance – to bring new levels of performance to the tasks of linguistic review and post-editing. What’s more, Reviewer’s Workbench is completely interoperable with industry standards like XLIFF and toolsets such as the Okapi Framework.

Reviewer’s Workbench lets you personalise how all of this important contextual data is visualised, so that it informs and directs post-editing and linguistic review effort.

[Screenshot: Reviewer’s Workbench 1.0 interface]

This is just the beginning. Feature set planning for release 2.0 is already very advanced and includes more state-of-the-art facilities. Stay tuned!

Code Warriors Show Agility and Community Enthusiasm

On the 27th we finished work on release 1.0 of Reviewer’s Workbench. RW is a desktop application that aims to bring new levels of productivity to the activities of post-editing, translation and linguistic review/quality assurance by utilising some recently available technologies and standards. I will write further about RW in another posting. The subject of this post is the team behind RW.

RW is the third development project that we have realised with the help of Spartan Software Inc.

Our first engagement with Spartan was in November 2012. The project was full of risks:

  • The project required the use of a new proprietary API,
  • The deadline for completion (including testing, integration and deployment) was one month,
  • It demanded a skill set relatively new to the development team.

Result: a resounding success! The application has been running as an almost continuous scheduled process ever since, without any patches. How did we achieve this?

  • We started with a minimal functional specification which focused on requirement context, user story and outline feature set,
  • We ran development on a weekly sprint with a “stand up” call every other day,
  • We hired the right development team. Spartan are just exceptional developers. I have always believed that if you give good engineers an outline requirement and the freedom to technically specify and code the solution, you will get more than you hoped for. These guys absolutely turned that belief into fact.

It has been a pleasure working with Kevin Lew and Chase Tingley over the last seven months. The pinnacle of that working relationship came today, when we announced the contribution of a significant amount of code to the Okapi open-source project.

I cannot wait until we recommence work on our RW release 1.1 product backlog.