Modelling Simple Calculations

I need to create a simple interactive single page application which allows a user to experiment with different financial scenarios. There are several variables which can be changed in the web page and the calculation should be re-computed.

I want to model the calculation as a series of smaller calculations, and the final answer should be computable with a single call to a compute method which cascades down through the calculation graph. Finally, if any of the sub-calculations are re-computed, I want other objects to be able to be notified.

Having decided that the simplest calculation would consist of two inputs I have come up with this class:


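using System;

// Assumed shape: the event args type is referenced below but was not shown.
public class ComputeEventArgs : EventArgs
{
    public decimal ComputedValue { get; set; }
}

public class TwoPredecessorCalculation
{
    // Inputs are functions so that a predecessor can be either a constant
    // or another calculation's Compute method.
    public Func<decimal> Predecessor1 { get; set; }
    public Func<decimal> Predecessor2 { get; set; }

    public Func<decimal, decimal, decimal> Calculation { get; set; }

    public TwoPredecessorCalculation()
    {
    }

    public TwoPredecessorCalculation(Func<decimal, decimal, decimal> calculation)
    {
        Calculation = calculation;
    }

    // Evaluates both predecessors (cascading down the graph), applies the
    // calculation and notifies any subscribers of the result.
    public decimal Compute()
    {
        decimal computedValue = Calculation(Predecessor1(), Predecessor2());
        ValueComputed?.Invoke(this, new ComputeEventArgs { ComputedValue = computedValue });
        return computedValue;
    }

    // Convenience overload: sets the inputs and the calculation, then computes.
    public decimal Compute(Func<decimal> input1, Func<decimal> input2, Func<decimal, decimal, decimal> calculation)
    {
        Predecessor1 = input1;
        Predecessor2 = input2;
        Calculation = calculation;
        return Compute();
    }

    public event EventHandler<ComputeEventArgs> ValueComputed;
}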

Calculations can then be set up like so:


Func<decimal> myHourlyRate = () => 15.00m;
Func<decimal> hoursWorked = () => 20m;

TwoPredecessorCalculation myNetCharge = new TwoPredecessorCalculation((mhr, hw) => mhr * hw);
myNetCharge.Predecessor1 = myHourlyRate;
myNetCharge.Predecessor2 = hoursWorked;

Func<decimal> vatRate = () => 23m;
TwoPredecessorCalculation myGrossCharge =
    new TwoPredecessorCalculation((mnc, vatr) => mnc + (mnc * (vatr / 100)));
myGrossCharge.Predecessor1 = myNetCharge.Compute;
myGrossCharge.Predecessor2 = vatRate;

myGrossCharge.Compute().Should().Be(369m);
myNetCharge.Predecessor1 = () => 15.75m;

myGrossCharge.Compute().Should().Be(387.45m);

I think this is nice, simple and somewhat fluent. I do realize that this kind of thing would be even better in F#.

Is this a web version of Ocelot?

Perhaps!

Over the Christmas break and in my spare time I wanted to learn Microsoft’s Blazor framework so I thought it would be a great opportunity to try and write an editor which has some of the functionality of our open source desktop editor, Ocelot.

The main difference between Ocelot and this (other than desktop vs web) is that this works with JLIFF as opposed to XLIFF. This means I can do stuff like use CosmosDb as the repository and save native JLIFF documents to it.

It currently has diffing and machine translation, and I’m trying to finish off ITS Quality annotation.

Parsing, Recursion and Observer Pattern

I have worked for a while now with two serializations of the XLIFF Object Model: XLIFF and JLIFF (which is still in draft). I have had occasion to write out each as the result of parsing some proprietary content format in order to facilitate easy interoperability within our tool chain, and round-tripping one serialization with the other.

Whilst both are hierarchical formats, they require different strategies when parsing them recursively.

With XLIFF (XML) each opening element has all of its attributes available immediately. This means you can construct an object graph as you go: instantiate the object, set all of its attributes and make any decisions based on them, and add the object to a stack so that you can keep track of where you are in the object model. This all works nicely with the Observer pattern: you can subscribe to events which fire upon each new element no matter how nested.

<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="" srcLang="">
  <file id="f1">
    <group id="g1" type="cms:field">
      <unit id="u1" type="cms:slug">
        <segment id="s1">
          <source/>
          <target/>
        </segment>
      </unit>
    </group>
  </file>
</xliff>
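
As a rough illustration of this strategy, here is a minimal sketch in Python (chosen only for brevity; it is not how Ocelot or JliffGraphTools do it). It assumes the standard library's xml.etree.ElementTree.iterparse as the streaming reader. The XliffReader class, its subscribe method, the callback signature and the sample attribute values are all hypothetical and exist only for this example. Each "start" event already carries the element's attributes, so subscribers can be notified straight away while a stack records the current position.

```python
import io
import xml.etree.ElementTree as ET

# Sample input; the version and srcLang values are illustrative only.
XLIFF = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en">
  <file id="f1">
    <group id="g1" type="cms:field">
      <unit id="u1" type="cms:slug">
        <segment id="s1">
          <source/>
          <target/>
        </segment>
      </unit>
    </group>
  </file>
</xliff>"""


class XliffReader:
    """Hypothetical event-publishing reader (not a real library class)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        # callback(path, attributes) is called once per opening element.
        self._subscribers.append(callback)

    def read(self, xml_text):
        stack = []  # names of the currently open elements
        source = io.BytesIO(xml_text.encode("utf-8"))
        for event, elem in ET.iterparse(source, events=("start", "end")):
            name = elem.tag.split("}")[-1]  # drop the namespace prefix
            if event == "start":
                stack.append(name)
                # Attributes are already populated on the "start" event.
                for callback in self._subscribers:
                    callback(tuple(stack), dict(elem.attrib))
            else:
                stack.pop()


seen = []
reader = XliffReader()
reader.subscribe(lambda path, attrs: seen.append(("/".join(path), attrs.get("id"))))
reader.read(XLIFF)

for path, element_id in seen:
    print(path, element_id)
```

The subscriber here only records the path and id of each element, but it could just as well instantiate objects and attach them to whatever sits at the top of its own stack.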

With JLIFF (JSON), assuming you’re doing a depth-first token read, you may have to read all of the properties of the nested objects before you have seen all of the properties of their parent. Thus you have to build an object graph first, and only then traverse it again and use the Observer pattern in an efficient way to build another representation.

{
  "jliff": "2.1",
  "srcLang": "en-US",
  "trgLang": "fr-FR",
  "files": [
    {
      "id": "f1",
      "kind": "file",
      "subfiles": [
        {
          "canResegment": "no",
          "id": "u2",
          "kind": "unit",
          "locQualityIssues": {
            "items": []
          },
          "notes": [],
          "subunits": [
            {
              "canResegment": "no",
              "id": "s2",
              "kind": "segment",
              "source": [],
              "target": []
            }
          ]
        }
      ]
    }
  ]
}
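
By way of contrast, here is a minimal sketch of the two-pass JSON approach, again in Python and under stated assumptions. json.loads from the standard library builds the whole object graph first. The walk function, the CHILD_KEYS tuple and the callback signature are hypothetical helpers written for this example. The sample deliberately lists the nested arrays before the parent's own id and kind, which JSON permits, to show why a single streaming pass cannot notify subscribers with complete parent information.

```python
import json

# Nested arrays appear before the parent's "id" and "kind" on purpose.
JLIFF = """
{
  "jliff": "2.1",
  "srcLang": "en-US",
  "files": [
    {
      "subfiles": [
        {
          "subunits": [
            {"id": "s2", "kind": "segment", "source": [], "target": []}
          ],
          "id": "u2",
          "kind": "unit"
        }
      ],
      "id": "f1",
      "kind": "file"
    }
  ]
}
"""

# Assumption: these are the only properties that hold child objects.
CHILD_KEYS = ("files", "subfiles", "subunits")


def walk(node, notify, depth=0):
    """Second pass: visit the already-built graph, parents before children."""
    own_properties = {k: v for k, v in node.items() if k not in CHILD_KEYS}
    notify(depth, own_properties)
    for key in CHILD_KEYS:
        for child in node.get(key, []):
            walk(child, notify, depth + 1)


visited = []
document = json.loads(JLIFF)  # first pass: build the object graph
walk(
    document,
    lambda depth, props: visited.append(
        (depth, props.get("kind", "jliff"), props.get("id"))
    ),
)

for depth, kind, object_id in visited:
    print("  " * depth + kind, object_id)
```

Because the graph is complete before the walk starts, every notification carries all of the object's own properties regardless of the order in which they were serialized.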

Differences are also apparent when dealing with items which require nesting to convey their semantics. This classically happens in localization when trying to represent rich text (text with formatting).

XLIFF handles this nicely when serialized.

<source>Vacation homes in <sc id="fmt1" disp="underline" type="fmt" subType="xlf:u" dataRef=""/>Orlando<ec dataRef=""/></source>

Whilst JLIFF is somewhat fragmented.

"source": [
  {
    "text": "Vacation homes in "
  },
  {
    "id": "mrk1",
    "kind": "sm",
    "type": "term"
  },
  {
    "text": "Orlando"
  },
  {
    "kind": "em",
    "startRef": {
      "token": "mrk1"
    }
  }
]

Content Interoperability

I am working on a project which is very familiar in the localization industry: moving content from the Content Management System (CMS) in which it is authored to a Translation Management System (TMS) in which it will be localized and then moved back to the CMS for publication.

These seemingly straightforward scenarios often require far more effort than seems warranted. As the developer working on the interoperability you often have to have:

Knowledge of the CMS API and content model. (The content model being the representation which the article has inside the CMS and when exported.)
Knowledge of the TMS API and the content formats that it is capable of parsing/filtering.

In this project the CMS is built on top of a “document database” and stores and exports content in JSON format.

One of the complexities is that rich text (text which includes formatting such as text emphasis – bold, italic – and embedded metadata such as hyperlinks and pointers to images) causes sentences to become fragmented when exported.

For example, the text:

“For more information refer to our User’s Guide or Community Forum.”

Becomes:

{
  "content": [
    {
      "value": "For more information refer to our ",
      "nodeType": "text"
    },
    {
      "data": { "uri": "https://ficticious.com/help" },
      "content": [{
        "value": "User's Guide",
        "nodeType": "text"
      }],
      "nodeType": "hyperlink"
    },
    {
      "value": " or Community Forum.",
      "nodeType": "text"
    }],
  "nodeType": "document"
}

If I simply let the TMS parse the JSON I know it will present the rich text sentence as three segments rather than one and it will be frustrating for translators to relocate the hyperlink within the overall sentence. Ironically, JLIFF suffers from the same problem.

What I need is a structured format that has the flexibility to enable me to express the sentence as a single string but also have the high fidelity to convert back without information loss. Luckily the industry has the XML Localization Interchange File Format (XLIFF).
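
To make the idea of "a single string with high fidelity" a little more concrete, here is a minimal sketch in Python. It is not the JliffGraphTools-based conversion, just an illustration under assumptions: the to_source function and the codes lookup are hypothetical; non-text nodes are assumed to contain only plain text children; and the XLIFF 2.0 pc (paired code) inline element is used to wrap the linked text, with the original node data kept on the side so it could be restored on the way back.

```python
import xml.etree.ElementTree as ET

CMS_DOCUMENT = {
    "content": [
        {"value": "For more information refer to our ", "nodeType": "text"},
        {
            "data": {"uri": "https://ficticious.com/help"},
            "content": [{"value": "User's Guide", "nodeType": "text"}],
            "nodeType": "hyperlink",
        },
        {"value": " or Community Forum.", "nodeType": "text"},
    ],
    "nodeType": "document",
}


def to_source(document):
    """Flatten the CMS nodes into one <source> element with inline codes.

    Returns the serialized element and a lookup of code id -> original
    node details (hypothetical side table for round-tripping).
    """
    source = ET.Element("source")
    codes = {}
    last_code = None  # most recent inline element, if any
    for node in document["content"]:
        if node["nodeType"] == "text":
            if last_code is None:
                source.text = (source.text or "") + node["value"]
            else:
                # Text following an inline element is stored as its tail.
                last_code.tail = (last_code.tail or "") + node["value"]
        else:
            code_id = str(len(codes) + 1)
            last_code = ET.SubElement(source, "pc", id=code_id)
            # Assumption: children of a non-text node are text nodes only.
            last_code.text = "".join(child["value"] for child in node["content"])
            codes[code_id] = {"nodeType": node["nodeType"], "data": node["data"]}
    return ET.tostring(source, encoding="unicode"), codes


source_xml, codes = to_source(CMS_DOCUMENT)
print(source_xml)
print(codes)
```

The translator then sees one sentence with a movable pair of tags around "User's Guide", and the side table holds what is needed to rebuild the hyperlink node afterwards.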

I have three choices for programming the conversion, all of which are open source:

I wanted to exercise my own code a bit so I went with the third option.

JliffGraphTools contains a Jliff builder class and Xliff12 and xliff20 filter classes (for XLIFF 1.2 and 2.0 respectively). These event-based classes allow a publish/subscribe interaction in which elements in the XLIFF cause subscribing methods in the Jliff builder to be executed, thereby creating a JliffDocument.

I decided to use this pattern for the conversion of the above CMS’ JSON content model to XLIFF.

It turns out that this approach wasn’t as straightforward as anticipated, but I’ll have to document that in another post.

Counting Sheep

I am affected by insomnia on a fairly frequent basis. I don’t use any sleep related gadgets or applications because I’d probably scare the crap out of myself. Let’s see if this post has any cathartic properties.

Tonight’s/this morning’s musings (in no particular order of appearance): graph based machine learning, building an object model library in TypeScript, my next career move, health impact of not sleeping well, industry integration/fragmentation dichotomy, current development projects, Brexit, personal relationships, and quality estimation.

It’s going to be tough getting through tomorrow. Sweet dreams.

Exploring JLIFF

I have published a web application where you can submit XLIFF 2.x files and get back a JLIFF serialization as text.

JLIFF is a new XLIFF serialization format currently in working draft with the OASIS Object Model and Other Serializations Technical Committee.

The application uses my open source JliffGraphTools library.

I am working on a conversion of XLIFF 1.2 to JLIFF but as the content model is structurally different it’s tricky.

I was careful to implement it in a way that means no data is persisted. I don’t even collect any metadata about what is submitted. That way people can feel confident about the privacy of their data.

JLIFF Library

A belated note that back in March I open sourced my utility library for serializing and de-serializing JLIFF.

plus ça change, plus c’est la même chose

I was recently asked to talk on a topic that I have some familiarity with. Nevertheless, I thought it would be good to check back on some earlier slide decks, and came across this.

Ocelot 3.0

On October 17 we released Ocelot 3.0. See the Release Notes for details.

The Devil is in the Detail

I really enjoyed attending Unicode 41 this week. Following changes to my role some years back and the fact that the conference is always held on the West Coast of the US, I hadn’t been in a while but I will definitely put it back on my conference agenda. It was great bumping into customers and old friends and seeing the new generation of researchers and engineers address what is essentially the challenge of worldwide communications.

The conference kicked off with a very interesting, entertaining and thought-provoking keynote entitled “Can We Escape Alphabetic Order”, given by Thomas S. Mullaney.

The remainder of the conference sessions I attended covered: predictive models used by Google in their Android keyboards; dynamic translation resource bundles developed by Uber for their mobile apps; enhancements to ICU (International Components for Unicode); Netflix’s approaches to bi-directional and vertical subtitles and captions; JavaScript libraries for internationalization; support for Emoji in Unicode; and NLP techniques for identifying fraudulent names across many languages.

It is quite incredible the degree to which companies are enabling and adapting their products in order to have them accepted in target regions. I’m not talking about translations and number formats here: it’s about supporting all writing directions, accurate and detailed rendering of complex scripts and perfect fluency in generated messages that involve levels of plurality, gender, formality and style. And the open and collaborative nature of the efforts to document this information in the form of the Common Locale Data Repository is commendable.

idiomatic prose

Stuff that interests or impresses me.