One of the things I will need in my database is a table with all the language codes used in Linux locales. Things like en, fr, es, etc. There are lots, but where do I get a reliable list?

I’ve done some searching and found the IANA Language Subtag Registry. It’s a 45,000-line text file with contents in this format:

%%
Type: language
Subtag: ab
Description: Abkhazian
Added: 2005-10-16
Suppress-Script: Cyrl

Of all those lines, only 1155 belong to 2-letter codes, which is what I was interested in. How do I get the language code and English name from there into a database? Piece of cake if you know some basic shell scripting:

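#!/bin/bash

# Print "code name" for each record: the Subtag value, then the first Description.
HAVECODE=0
while read -r LINE
do
  if echo "$LINE" | grep -q '^Subtag: '
  then
    echo -n "$(echo "$LINE" | cut -f 2 -d ' ') "
    HAVECODE=1
  elif echo "$LINE" | grep -q '^Description: '
  then
    # Only the first Description after a Subtag is printed.
    if [ "$HAVECODE" -eq 1 ]
    then
      # -f 2- keeps multi-word names such as "Church Slavic" intact.
      echo "$LINE" | cut -f 2- -d ' '
    fi
    HAVECODE=0
  fi
done < languagelist.txt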
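The same extraction could also be done in a single awk pass. This is only a sketch, not the script I actually ran: the function name two_letter_languages is made up, and it assumes every record starts with a %% line and lists Type and Subtag before Description, as in the sample above. Unlike the loop, it filters on Type: language and on two lowercase letters, so it should cope with the full registry file.

```shell
#!/bin/bash

# Hypothetical helper: print "code name" for every two-letter language subtag
# in a registry file read from standard input.
two_letter_languages() {
  awk '
    /^%%/            { type = ""; code = ""; next }   # start of a new record
    /^Type: /        { type = $2; next }
    /^Subtag: /      { code = $2; next }
    /^Description: / {
      if (type == "language" && code ~ /^[a-z][a-z]$/) {
        sub(/^Description: /, "")                      # keep the full, multi-word name
        print code, $0
      }
      code = ""                                        # use the first Description only
    }
  '
}
```

Usage would be something like: two_letter_languages < languagelist.txt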

And insert it all into the database:

#!/bin/bash

# read puts the first word in CODE and the rest of the line in NAME.
./parselanguagelist.sh | while read -r CODE NAME
do
  # Double any single quotes so the name is a valid SQL string literal.
  SQLNAME=$(echo "$NAME" | sed "s/'/''/g")
  if mysql -u user -ppassword -e "INSERT INTO Language (LanguageCode,LanguageEnglishName) VALUES('$CODE','$SQLNAME');" ostd
  then
    echo "Inserted $CODE ($NAME)"
  else
    echo "Failed to insert $CODE ($NAME)"
  fi
done

Done, 190 records. And next time I want to update the list (who knows, it might happen) I’ll just need to get a new list and use the MySQL feature that either creates or updates a row depending on whether it already exists (INSERT ... ON DUPLICATE KEY UPDATE).
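For that future update, something along these lines might do. It’s a rough sketch I haven’t run against the real table: the function name make_upsert_sql is made up, and it assumes LanguageCode is the primary key (or has a unique index), since otherwise MySQL has no way to notice the duplicate. It prints the statements instead of running them, so they can be looked over first and then sent to mysql in one go.

```shell
#!/bin/bash

# Hypothetical helper: read "code name" lines on standard input and print one
# upsert statement per line. Assumes LanguageCode is a PRIMARY KEY or UNIQUE column.
make_upsert_sql() {
  # Double single quotes first so every name is a valid SQL string literal.
  sed "s/'/''/g" | while read -r CODE NAME
  do
    printf '%s\n' "INSERT INTO Language (LanguageCode,LanguageEnglishName) VALUES('$CODE','$NAME') ON DUPLICATE KEY UPDATE LanguageEnglishName=VALUES(LanguageEnglishName);"
  done
}
```

Usage would be something like: ./parselanguagelist.sh | make_upsert_sql | mysql -u user -p ostd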

I think it would have taken me quite a while to generate this list of SQL commands by hand :)