Christian Perrier from the debian-i18n list has done me a huge favour. He created a tarball with every translation in every language for every piece of software in Debian!

You might imagine it’s huge, as I did, but I was still shocked at just how big it is: almost 2 GB of gzip-compressed PO files from the testing and unstable branches!

I wrote a little one-liner to decompress all the gzipped PO files in the extracted tarball:

find . -type f -name '*.gz' | while IFS= read -r A; do gunzip -v "$A"; done

I’ve no idea how long it’s going to take to run :)

After that I’ll have to write a special PHP script to parse all the PO files and add the translations to the database. There are going to be some challenges with that:

  1. It’s going to be very hard to notice if an error happened during parsing or insertion.
  2. It’s probably going to take a very long time on my current hardware.
  3. I might actually run out of disk space, since my MySQL databases live in /var, which is on the root partition, and that partition is quite small.
  4. If my schema design isn’t great, I might have to scrap it all and go through the exercise again. This is, sadly, quite likely.
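
For the first and third worries, a rough sanity check could look something like the sketch below. It is only an illustration under a few assumptions: the decompressed files end in .po, counting lines that start with msgid " is a good enough proxy for the number of entries (the header entry with the empty msgid is counted too, plural forms and obsolete entries are not), and the function names count_po_entries, count_tree_entries and free_kb are made up for this example. The idea is to compare the count from the files with the number of rows that actually end up in the database, and to look at the free space before starting.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any existing tool.

# Count the entries in a single PO file by counting lines that begin
# with: msgid "
# The trailing space+quote keeps msgid_plural lines out of the count.
count_po_entries() {
    grep -c '^msgid "' "$1"
}

# Count the entries in every .po file below a directory.
# tr strips the padding that some wc implementations add.
count_tree_entries() {
    find "$1" -type f -name '*.po' -exec grep -h '^msgid "' {} + | wc -l | tr -d ' '
}

# Print the free space, in kilobytes, on the filesystem holding a path.
# df -P gives a fixed layout whose fourth column is "Available".
free_kb() {
    df -Pk "$1" | awk 'NR==2 {print $4}'
}
```

Running count_tree_entries on the extracted directory before the import, and free_kb /var before and during it, gives two numbers to watch. If the database ends up with far fewer rows than the files have entries, something went wrong in parsing or insertion, though duplicates across packages may also explain part of the gap.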

All solvable problems, and I’m happy that I already got to the point where I have to seriously worry about scalability.

Thanks Christian!