Troubles with the Combining Diacritical Marks block

The problem

Unicode is more than just a set of characters. It also includes information about how each character should behave. The characters in the Combining Diacritical Marks block are defined to combine with the preceding character. Not all applications respect that behaviour information (yet), but some do.

Of course, this is not just an odd annoyance; it is the desired behaviour. Combining diacritical marks really should combine. However, when other signs are mapped onto the Combining Diacritical Marks block, applications that are capable of combining will still combine them, even if this is not intended. This happens in Tengwar Elfica. That font maps signs that should not combine onto the Combining Diacritical Marks block, which leads to odd behaviour in applications that are capable of combining. And since Tengwar Elfica additionally maps the same signs onto the Private Use Area, the same sign may be displayed in two different ways: either as a combined character or as a plain, uncombined one.
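The behaviour information that Unicode attaches to a code point can be inspected directly. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is made up for this illustration, and the code points are the ones used in the samples in the next section. It only shows what Unicode says about the code points, not what any particular application does with them.

```python
import unicodedata

def describe(ch):
    """Report the Unicode properties that matter for combining behaviour."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        # General category: "Mn" is a nonspacing (combining) mark,
        # "Co" is a private-use character.
        "category": unicodedata.category(ch),
        # Canonical combining class: 0 means "does not reorder as a mark".
        "combining_class": unicodedata.combining(ch),
        # The Combining Diacritical Marks block spans U+0300..U+036F.
        "in_combining_block": 0x0300 <= ord(ch) <= 0x036F,
    }

if __name__ == "__main__":
    for ch in ("\u0309", "\uE87D", "["):
        print(describe(ch))
```

U+0309 is reported as a combining mark, whereas U+E87D and the Left Square Bracket are not. A combining-aware application therefore treats the two mappings of the same Tengwar Elfica sign differently, whatever the glyph looks like.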

Two samples

Consider, for instance, GIMP.app, an application that is not capable of combining. In the following picture, the double grave accent on the first line is mapped onto the Combining Diacritical Marks block, U+0309 (Combining Hook Above). On the second line, the same sign is mapped onto the Private Use Area, U+E87D. However, both look the same because GIMP.app does not perform any combining:

Elfica samples in GIMP.app

TextEdit, however, is an application capable of combining. The following picture shows exactly the same characters as above, but now they are displayed in two different ways. The character on the first line, U+0309 (Combining Hook Above), combines with the preceding character, U+005B (Left Square Bracket). This has several consequences, not all of which can be seen in the picture. TextEdit raises the combining character slightly. It does this in order not to display the Combining Hook Above within the preceding character. This is a justified precaution since that character, the Left Square Bracket, really is a raised character. The raising of the combined character, however, results in a greater line height. This would break the flow of a longer text. Yet another consequence is that the combining mark and the preceding character behave like a single sign for all aspects of text editing: they can only be altered or deleted together. The Private Use Area character on the second line, however, is displayed in the same way GIMP.app displayed it because there is no combining:
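The "single sign" editing behaviour can be approximated in a few lines. The sketch below is a deliberately simplified stand-in for the grapheme cluster rules that real text engines implement (it merely attaches every mark character to the character before it), and the function name `simple_clusters` is made up for this illustration. It is not a description of how TextEdit works internally.

```python
import unicodedata

def simple_clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplification: any character whose general category starts with "M"
    (Mn, Mc, Me) is appended to the preceding cluster.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch   # the mark joins the preceding character
        else:
            clusters.append(ch)  # start a new editing unit
    return clusters

if __name__ == "__main__":
    # Left Square Bracket followed by U+0309: one editing unit
    print(len(simple_clusters("[\u0309")))
    # Left Square Bracket followed by U+E87D: two editing units
    print(len(simple_clusters("[\uE87D")))
```

Under this approximation, the bracket and U+0309 form one unit that is selected and deleted together, while the bracket and U+E87D remain two independent characters.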

Elfica samples in TextEdit

Solutions

I think the problem could be solved by avoiding the Combining Diacritical Marks block. It might be preferable to rely solely on the Private Use Area without remapping any characters that are already assigned in Unicode. A number of common applications, though, seem to be unable to display characters from the Private Use Area. Consider the same characters as in the above pictures put into this HTML page:

11[̉1



If you have installed the proper version of Tengwar Elfica, both lines should display correctly. However, at least on Mac OS X 10.3.9, neither Firefox (2.0.0.4) nor Opera (9.21) is able to display the second line, which comes from the Private Use Area. On the other hand, Safari (1.3.2) and iCab (3.0.3) display both lines correctly:

Display in Firefox:
Elfica samples in Firefox
Display in Opera:
Elfica samples in Opera
Display in Safari:
Elfica samples in Safari
Display in iCab:
Elfica samples in iCab

Given this faulty support for the Private Use Area in some browsers, I think the best solution is to remap the signs that are currently mapped onto the Combining Diacritical Marks block onto another block that does not combine, while of course keeping the Private Use Area assignments because that's where the future lies (hopefully)!
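When looking for a replacement block, it may help to check that a candidate range contains no combining characters. The following is a rough sketch, again using Python's `unicodedata` module; the function name `combining_codepoints` and the ranges tried are only examples, not a recommendation for a particular target block. Note that unassigned code points also pass this check, so an empty result does not by itself make a range a good choice.

```python
import unicodedata

def combining_codepoints(start, end):
    """Return the code points in [start, end] that Unicode treats as marks."""
    found = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        is_mark = unicodedata.category(ch).startswith("M")
        has_class = unicodedata.combining(ch) != 0
        if is_mark or has_class:
            found.append(cp)
    return found

if __name__ == "__main__":
    # Combining Diacritical Marks block: every code point is a mark
    print(len(combining_codepoints(0x0300, 0x036F)))
    # A slice of the Private Use Area: no marks
    print(combining_codepoints(0xE000, 0xE0FF))
```

A range for which the function returns an empty list contains no characters that a combining-aware application should attach to their neighbours.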