While analyzing the files I got from Debian I ran into a lot of language codes that weren’t in my database already.

It was an interesting exercise: I learned about the existence of languages such as Javanese, and about countries I had already forgotten about.

The problem is that some of the language codes are redundant: they include the country code even though that country is the default one for the language. For example, el_GR means Greek from Greece, no kidding.

I don’t have el_GR in my database and see no point in adding it. So for Debian translation files that are identified as el_GR I have a hardcoded rule along the lines of if(el_GR) replacewith(el). I’ve got about 72 such replacements, each of which I had to figure out one by one.
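To make the replacement idea concrete, here is a minimal sketch of how such a rule could be written as a lookup table instead of a chain of ifs. This is not my actual code: the function name and every entry other than el_GR are made up for illustration.

```python
# Hypothetical replacement table. Only el_GR -> el comes from the post;
# the other entries are illustrative assumptions, not the real list of ~72.
REDUNDANT_CODES = {
    "el_GR": "el",
    "de_DE": "de",  # assumed example
    "it_IT": "it",  # assumed example
}


def normalize_language_code(code: str) -> str:
    """Return the bare language code when the country part is redundant.

    Codes without an entry in the table are returned unchanged.
    """
    return REDUNDANT_CODES.get(code, code)
```

With a table like this, adding the next redundant code is a one-line change, and anything not listed passes through untouched.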

A smaller set of language/country combo codes I did add to the database, such as English from Canada, South Africa and Ireland (not kidding); Catalan from Italy and Andorra; Arabic from Oman and Egypt; and French from Luxembourg.
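The sorting of codes into "already known", "replace with the bare language" and "no idea yet" could be sketched roughly like this. Again, this is an assumption-laden illustration: the function, its parameters, and the exact spelling of the codes as stored in the database are hypothetical.

```python
# Combined codes mentioned in the post as added to the database.
# Writing them as language_COUNTRY with ISO country codes is an assumption.
KEPT_COMBINED_CODES = {
    "en_CA", "en_ZA", "en_IE",  # English: Canada, South Africa, Ireland
    "ca_IT", "ca_AD",           # Catalan: Italy, Andorra
    "ar_OM", "ar_EG",           # Arabic: Oman, Egypt
    "fr_LU",                    # French: Luxembourg
}


def classify_codes(codes, known, replacements):
    """Split codes into those that resolve to a known code and the rest.

    known        -- set of codes present in the database (hypothetical)
    replacements -- dict mapping redundant codes to bare language codes
    Returns (resolved, unknown): a dict of code -> database code, and a
    list of codes that still need to be looked at by hand.
    """
    resolved, unknown = {}, []
    for code in codes:
        target = replacements.get(code, code)
        if target in known:
            resolved[code] = target
        else:
            unknown.append(code)
    return resolved, unknown
```

Whatever ends up in the unknown list is what still has to be researched manually.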

I just wanted to make a note of this, because it took me a hell of a long time to look through the list of unknown codes, figure out what they stand for, and whether they deserve a country specific version or not.

There will be a lot more work needed to clean up the list of PO files from Debian, so this post was just part 1 of hopefully not too long a series.