December 17, 2008

Number of strings to translate in Thunderbird or Lightning

A question that often comes up with new localizers who want to localize Thunderbird or the Calendar products (Lightning and Sunbird) is "How many strings do I need to translate/localize?"

Finally I have the answer to that question, and I'm posting it here because
  1. Someone other than me might be interested in the answer as well
  2. I need a page that I can point people to
So for Thunderbird you need to translate 5749 strings (as of now).
For the Calendar products you need to translate 2065 strings (as of now).

That's really a lot, and it makes the great work that localizers are doing for our products that much more impressive.


Cédric said...


Simon, does that include toolkit strings? If not, it would be useful to mention that for new locales that haven't yet localized Firefox.


Adrianer said...

I think Cédric is right about that, especially because toolkit is, as of yesterday, exactly 3425 strings.

For those interested in other numbers:
Firefox: 1654
SeaMonkey: 5354

The good thing about localizing SeaMonkey when there are existing Tb and Fx localizations: Tb shares more than 2/3 of its strings with Sm. Fx shares some too, but not as many as Tb ;)

Adrianer said...

Sipaq: I'm not sure how you counted the strings, but my script tells me 4246 for Tb and 1754 for Sb (both without toolkit). It would be interesting to know which script counts wrong (and why) ;)

Simon said...

No, this does not include toolkit strings. I left those out specifically because, in all cases that I know of and have experienced, Thunderbird or Sunbird was never the first Mozilla application to be localized. It was always Firefox that was localized first, probably because of its much larger popularity.

I have those numbers from compare-locales. I tested with the German locale from which I had deleted all mail or calendar strings and which only had the toolkit strings left.

Adrianer said...

OK, I'm getting +/- the same numbers with Pike's compare-locales as with my compare-locales (the small differences are due to my version not supporting inc-files and counting keys differently).

In my previous numbers I forgot to add "keys" (access and command keys)...

BTW, the easiest way to count toolkit strings is to run:
compare-locales mozilla-central/toolkit/locales/l10n.ini (...)
The numbers you get from that are for toolkit only.

walter said...

Of course, you can also use moz2po and pocount. That way you'll even know how many words you need to translate. The toolkit also folds in accelerators in DTDs, so they won't be counted as separate messages.

Fjoerfoks said...

And how about the other folders: dom, reporter, netwerk, other-licenses and security? Did you include them in your count?

Also, when starting a localization, there are some HTML files to be translated, like the landing pages and the TB start page. Can you give us a clue about that amount of work?

Dwayne said...

Strings are not a very good measure of the workload. Counting words is better. The strings measure is also bad as @walter mentioned because it probably includes counting accesskeys.

So using pocount and a Thunderbird 3.1b1 PO tarball I found these numbers:

Thunderbird: 29149 words (5977 strings)
Toolkit: 7136 words (2001 strings)

Toolkit, it seems, is the least of your work.

* These numbers include almost all localisable files but exclude things kept out of tree and probably some random HTML files.
* I've included these folders editor/ mail/ netwerk/ other-licenses/branding/thunderbird/ security/ dom/ in the Thunderbird count.
* Almost all accesskeys are folded (we don't fold .properties files). This deflates the string count but doesn't change the word count much.
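
To make the difference between string counts and word counts concrete, here is a minimal Python sketch. It is not how pocount or compare-locales work internally; it assumes a simplified `key = value` layout, and the sample entries, the suffix list and the function names are all hypothetical. "Folding" here simply means not counting access and command keys as separate strings.

```python
# Illustrative only: a simplified "key = value" format, not a real
# DTD or .properties parser. Names and sample data are hypothetical.
KEY_SUFFIXES = (".accesskey", ".commandkey")


def parse_entries(text):
    """Return a dict of key -> value from simple 'key = value' lines."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            entries[key.strip()] = value.strip()
    return entries


def count_strings_and_words(entries, fold_keys=True):
    """Count strings and whitespace-separated words.

    With fold_keys=True, access/command key entries are not counted
    as separate strings.
    """
    strings = 0
    words = 0
    for key, value in entries.items():
        if fold_keys and key.endswith(KEY_SUFFIXES):
            continue
        strings += 1
        words += len(value.split())
    return strings, words


SAMPLE = """\
# hypothetical sample entries
sendButton.label = Send Message
sendButton.accesskey = S
quitPrompt = Do you really want to quit?
"""

if __name__ == "__main__":
    entries = parse_entries(SAMPLE)
    print("folded:  ", count_strings_and_words(entries, fold_keys=True))
    print("unfolded:", count_strings_and_words(entries, fold_keys=False))
```

In this toy sample, folding removes one of three strings but only one of nine words, which is the effect described above: the string count drops noticeably while the word count barely moves.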

Simon said...

I agree that measuring just the string count is not the best way to estimate the workload, but it's at least a decent indicator of the amount of work, and that's what most people were looking for when they asked me the question "How much work is it to localize Thunderbird or Lightning?"

Dwayne said...

@simon: yes and no. In a case where you can't get word counts, it's certainly better than nothing. But it can be grossly deceptive if there are many long strings.

In a case where we can get word counts, and we clearly can, I'd go with word counts any day.

The localisation industry actually looks much deeper. They count characters (a better measure for CJK languages), words and also count other non-translatables like variables, XML, etc.
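
As a rough illustration of that kind of deeper count, the following Python sketch measures characters, words and non-translatable placeholders for a single string. The regular expression is an assumption for illustration: it only recognises printf-style markers such as `%S` or `%1$S` and DTD-style entity references such as `&brandShortName;`, and real tools use much richer rules.

```python
import re

# Assumed placeholder syntax (illustrative, not exhaustive):
#   %S, %d, %1$S      printf-style variables
#   &brandShortName;  DTD-style entity references
PLACEHOLDER = re.compile(r"%(?:\d+\$)?[sSdD]|&[A-Za-z][\w.]*;")


def measure(text):
    """Return character, word and placeholder counts for one string.

    Placeholders are counted separately and excluded from the
    character and word counts, since they are not translated.
    """
    placeholders = PLACEHOLDER.findall(text)
    translatable = PLACEHOLDER.sub(" ", text)
    return {
        "characters": sum(1 for ch in translatable if not ch.isspace()),
        "words": len(translatable.split()),
        "placeholders": len(placeholders),
    }


if __name__ == "__main__":
    print(measure("Move %1$S messages to &brandShortName; folder"))
    # A CJK string without spaces is one "word" by whitespace splitting,
    # which is why character counts are the better measure there.
    print(measure("文件已保存"))
```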