fontcrime – less is more

in two ways, really.

i had a talk with clare the other day during the lunch break of dave parnas’s talk at formal methods 2006.

apparently rendering of kanji on ubuntu is really awful. each character looks different from the next. a dirty secret of pango comes out.

i attended behdad’s talk at guadec about how all of this works. pango decides which glyph will be used for a specific character on a character-at-a-time basis, which means that you can easily have different fonts chosen for different characters in the same string. what’s more, pango seems to like to pick from fonts with fewer glyphs (thinking that they’re more specific and therefore better): “less is more”.
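to make the per-character idea concrete, here is a toy sketch in python. it is not pango’s code: the font order and the coverage sets are made up for illustration, and the only thing it models is “walk a preference list and take the first font that has this character”.

```python
# toy model of per-character font fallback.
# assumption: the preference order and coverage sets below are invented
# for illustration; they are not read from any real font or from pango.
FONTS = [
    ("kochi gothic", set("鳴")),
    ("baekmuk dotum", set("鶏")),
    ("arial unicode", set("鳴鶏鳩")),
]


def pick_font(ch, fonts=FONTS):
    """return the first font in preference order that covers ch."""
    for name, coverage in fonts:
        if ch in coverage:
            return name
    return None  # nothing covers this character


def fonts_for_string(text, fonts=FONTS):
    """choose a font for each character independently of its neighbours."""
    return [(ch, pick_font(ch, fonts)) for ch in text]


if __name__ == "__main__":
    for ch, font in fonts_for_string("鳴鶏鳩"):
        print(ch, font)
```

with these made-up tables, three characters that all share the “bird” radical end up in three different fonts, because each lookup stops at the first font that happens to contain that one character.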

the problem is that this is often an evil policy.

these fonts look quite different from each other. if you look at the screenshot, the right radical (“bird”) in each shot should look identical. they are very different.

sure enough, if you put these characters side by side in a string, you get one drawn in each font.

in this case, arial unicode (which i installed myself) has complete coverage of (at least nearly all) kanji. however, on a default ubuntu install you have kochi gothic which is used by preference. remove kochi gothic and you have baekmuk dotum.

if you remove both of these fonts, however, you’re left with just arial unicode (if you’re blessed enough to have this font). now things are beautiful. less is more.

i’m not up on all of this font stuff but why is it that we don’t have a big free font with all of the glyphs for every language in it? if it does exist then why isn’t ubuntu shipping it?

also — why can’t pango do a sort of prescan on the string to look at all of the characters in the string and do its hardest to try to pick a font in which all of the characters exist and use that for every character? even if there are higher quality glyphs for some of the characters in one font, it sort of seems more important that the string is displayed in a consistent font. would this be entirely too expensive?
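the prescan i have in mind could look roughly like this. again, a sketch under assumptions: the font list and coverage sets are hypothetical, and `font_for_whole_string` and `render_plan` are names i made up, not anything pango provides.

```python
# sketch of a "prescan" policy: prefer one font for the whole string.
# assumption: the font list and coverage sets are invented for illustration.
FONTS = [
    ("kochi gothic", set("鳴")),
    ("baekmuk dotum", set("鶏")),
    ("arial unicode", set("鳴鶏鳩")),
]


def font_for_whole_string(text, fonts):
    """return the first font that covers every character in text, or None."""
    needed = set(text)  # the prescan: collect the distinct characters once
    for name, coverage in fonts:
        if needed <= coverage:  # subset test: does this font have them all?
            return name
    return None


def render_plan(text, fonts=FONTS):
    """use a single covering font if one exists, else fall back per character."""
    whole = font_for_whole_string(text, fonts)
    if whole is not None:
        return [(ch, whole) for ch in text]
    # no single font covers the string: choose character by character
    plan = []
    for ch in text:
        name = next((n for n, cov in fonts if ch in cov), None)
        plan.append((ch, name))
    return plan


if __name__ == "__main__":
    for ch, font in render_plan("鳴鶏鳩"):
        print(ch, font)
```

in this toy model the extra work is one pass over the string plus one subset test per candidate font. whether that stays cheap with real fonts and long strings is exactly the part i can’t answer.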

of course, on the other side of this argument, you have the case of a single (for example) kanji character appearing inside a huge block of latin text and causing the latin to be rendered in a lower quality junk font that just happened to be included in the same file as the kanji…

oi.

13 thoughts on “fontcrime – less is more”

  1. I was under the impression that font selection in general was handled by fontconfig, not by Pango (I will accept being corrected).

  2. There is no, repeat, no, “fewer characters is better” mechanism in fontconfig (which does the font selection) or Pango. Fontconfig will prefer a font that has all the characters needed for the current language. But beyond that, it’s all up to the fontconfig config files on your system, which in fact, most likely do list Kochi Gothic explicitly as a good font to use for “sans-serif”.

    (The current language is the language specified by the app/document if it gives one, otherwise your locale’s current language. If, however, the script doesn’t match that language, so that the text couldn’t be in the locale’s language, then Pango tries to guess a language from the script… that is, if the script is Greek, it will guess a language of ‘el’. Since your current language isn’t Chinese, Japanese or Korean, and you can’t really guess what language an isolated Han character is from, Pango will use the generic xx language code for your example above, which disables the preference for fonts with all the characters in the language entirely.)

  3. (Note: I brought up this topic after seeing Ryan having pasta with his chopsticks!)

    I don’t know if the choice of the encoding is related to the problem or not. Please keep in mind there are many encodings for Chinese alone – 5 for simplified and 3 for traditional. (Don’t ask me why there are so many encodings.) Sometimes the browsers, even when using other operating systems, are left to guess which encoding the pages are using because the encoding is not specified.

    It seems to me that you don’t need Chinese or Japanese calligraphy lessons to know the characters are in different fonts. :)

  4. Clare: were you using the desktop/application in a CJK language or were you just displaying some random Kanji in an otherwise english desktop?

  5. I was using firefox displaying (non random) Chinese characters. The font rendering is usually bad in wikipedia’s Chinese articles.

  6. maybe it’s a firefox bug where it could set a language tag in pango but it doesn’t….

  7. If firefox is not correctly choosing Chinese fonts, then there is a Firefox bug, since zh.wikipedia.org has:

    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="zh" lang="zh" dir="ltr"

    So there is information there that zh should be preferred that needs to be passed down to Pango. (It is possible that your system is configured to prefer a traditional Chinese font when given just “zh” and zh.wikipedia.org has simplified Chinese, but certainly twiddling around with Pango can’t fix that.)

  8. Since I don’t have any other browsers installed in my Ubuntu Gnome installation, I can’t verify if this bug is only related to firefox or not.

    I have picked all the Chinese language options in my firefox browser. Doing so does not solve the font rendering problem in firefox at all.

    Note that this problem affects zh.wikipedia.org *only*.
