Christian Perrier from the debian-i18n list has done me a huge favour. He created a tarball with every translation in every language for every piece of software in Debian!
You might imagine it’s huge, as I did, but I was still shocked at just how big it is: almost 2 GB of gzip-compressed PO files from the testing and unstable branches!
I wrote a little one-liner to decompress all the PO files in the unpacked tarball:
find . -type f -name '*.gz' | while IFS= read -r A; do gunzip -v "$A"; done
I’ve no idea how long it’s going to take to run :)
After that I’ll have to write a special PHP script to parse all the PO files and add the translations to the database. There are going to be some challenges with that:
- It’s going to be very hard to notice if an error happened during parsing or insertion.
- It’s probably going to take a very long time on current hardware.
- I might actually run out of disk space, since my MySQL databases are in /var, which sits on the rather small root partition.
- If my schema design isn’t great, I might have to scrap it all and go through the exercise again. This is, sadly, quite likely.
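The first and third worries can at least be softened on the shell side before any parsing starts. What follows is only a rough sketch, not something I have run against the full tarball: the function names decompress_all and free_kb and the log file name are made up for illustration, and it assumes a POSIX shell with the usual find, gunzip, df and awk.

```shell
#!/bin/sh
# Hypothetical helper: decompress every .gz file under a directory and
# write the path of each file that gunzip rejects to a log, so errors
# do not simply scroll past unnoticed.
decompress_all() {
    dir=$1
    log=$2
    : > "$log"    # start with an empty log
    find "$dir" -type f -name '*.gz' | while IFS= read -r f; do
        gunzip "$f" 2>/dev/null || printf '%s\n' "$f" >> "$log"
    done
}

# Hypothetical helper: print the free space, in kilobytes, on the
# filesystem that holds the given path (POSIX df output format).
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

Running `decompress_all . failed.log` and then looking at failed.log shows which files need a second look, and `free_kb /var` gives a number to compare against the size of the decompressed files before importing anything.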
All solvable problems, and I’m happy that I already got to the point where I have to seriously worry about scalability.