Archive for the 'OSTD' Category

Plural forms, again

Tuesday, January 17th, 2012

By Andrew Smith

Back in 2009 I wrote a post complaining about the needless complexities plural forms introduce to the i18n process. Now I’ve run into them again.

Working on the OSTD I have to make sure I work with all kinds of PO files, and that has to include PO files with plural forms. The format of PO files is not a standard, mostly because there’s only one implementation for fully handling them (GNU gettext). Heck, it’s not even a spec. I was barely able to piece together an understanding of possible variations of the format using the gettext manual and looking at existing real PO files. That, the fact that the format is kind of loose, and everything else I mentioned in the previous post made me hate plural forms even more than I did before.

My implementations of the PO file parser and writer do not handle plural forms. I am hoping that I’m ignoring them properly when reading, and it’s ok that they’re missing when writing. And I have no plans of supporting them in the future. I will use what little influence I have to convince app maintainers to not use them, or to get rid of them.
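For readers who have not met them: in a PO file a plural entry carries a msgid_plural line and numbered msgstr[0], msgstr[1], ... lines instead of a single msgstr. The following is only a rough sketch of what "ignoring them when reading" could look like, not the OSTD parser. The function name is made up, it assumes entries are separated by blank lines, and it leaves escape sequences such as \n inside the strings untouched.

```javascript
// Hypothetical helper, not the OSTD parser: reads simple msgid/msgstr pairs
// and drops any entry that uses plural forms.
function parsePoSkippingPlurals(poText) {
  var entries = [];
  // Assumption: entries are separated by at least one blank line.
  var blocks = poText.split(/\n\s*\n/);
  blocks.forEach(function (block) {
    var lines = block.split('\n').map(function (l) { return l.trim(); });
    // Plural entries carry msgid_plural and msgstr[N] lines; skip them whole.
    var isPlural = lines.some(function (l) {
      return l.indexOf('msgid_plural') === 0 || /^msgstr\[\d+\]/.test(l);
    });
    if (isPlural) {
      return;
    }
    var msgid = null;
    var msgstr = null;
    var target = null; // which field a bare continuation line belongs to
    lines.forEach(function (line) {
      var m;
      if ((m = line.match(/^msgid\s+"(.*)"$/))) {
        msgid = m[1];
        target = 'msgid';
      } else if ((m = line.match(/^msgstr\s+"(.*)"$/))) {
        msgstr = m[1];
        target = 'msgstr';
      } else if (/^msg/.test(line)) {
        target = null; // some other keyword (msgctxt, for example): not collected
      } else if ((m = line.match(/^"(.*)"$/))) {
        // A quoted line on its own continues the previous string.
        if (target === 'msgid') { msgid += m[1]; }
        else if (target === 'msgstr') { msgstr += m[1]; }
      }
      // Comment lines (starting with #) fall through and are ignored.
    });
    if (msgid !== null && msgstr !== null) {
      entries.push({ msgid: msgid, msgstr: msgstr });
    }
  });
  return entries;
}
```

Dropping the whole entry, rather than trying to keep its singular half, means a plural msgid never ends up stored with a translation that only fits one count.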

 

Scoping in JavaScript

Tuesday, January 17th, 2012

By Andrew Smith

There are a number of articles and blog posts out there trying to explain scope and closures, but overall I’d say that a majority of them aren’t crystal-clear.

No sh**. I’ve been trying to figure out how scope works in JavaScript on-and-off for the last several years. I was never a committed JS programmer though so it wasn’t important enough for me to make sure I learn it once and for all.

Why is it that something as simple as scope is so complicated? That was a rhetorical question, I don’t really want to hear excuses. Last week I was working on some moderately advanced JS code and sure enough I spent hours finding impossible bugs caused by assignments that overwrote values in the wrong variables.
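The post does not say which bugs these were. One common way an assignment ends up in the wrong variable in JavaScript is a forgotten var, so here is a small sketch of that case. The function and variable names are invented for illustration.

```javascript
var total = 100; // an outer variable some other code relies on

function sumLeaky(values) {
  total = 0; // no declaration here: this overwrites the outer `total`
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

function sumSafe(values) {
  var total = 0; // declared inside the function, so it shadows the outer `total`
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}
```

Both functions return the same sum, but only sumLeaky changes the outer total as a side effect. Note also that var is scoped to the whole function, not to the for block it appears in.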

I have no choice but to suck it up, but this blog is a partial registry of my complaints, and this one was definitely worth mentioning.

 

Modifying JSON using a form

Tuesday, December 27th, 2011

By Andrew Smith

If you read the slightly older post and look at its screenshot and do some thinking, you might, like me, wonder this: given a bunch of JSON with multiple selections which can be modified in JavaScript using a form... wait, modified using a form?

One of the nice things about JSON is you can just do myjson[x].whatever[whatever][12] = "ABC" and it works. But if all you have is a <select> element and an onClick handler, that’s not so straightforward.

You can store the string "[x].whatever[whatever][12]" in the value field of the <option>, but sadly you cannot just do myjson"[x].whatever[whatever][12]" = "DEF"; that’s a syntax error.

I had to wonder and look for a while, I even found something called JsonPath, which I got really excited about until I realised it’s only for reading (what exactly is the point of it then?). Today I found the solution: eval!

So continuing the lame example above I would simply do this: eval('myjson[' + x + '].' + whatever + '[' + whatever + '][' + 12 + '] = "' + "DEF" + '"'). Roughly speaking, I haven’t tested this line. But you get the point?

Now that works but luckily because I’m testing with a real pot file I found a bug – all the ‘\n’ literals in the strings are replaced with newlines by eval, which is not what I want. I found a solution that not only fixes that but I hope will prevent my evals from being hacked: I escape all the backslashes and the single quotes in the string before giving it to eval:

eval(jsPath + " = '" + text.toString().replace(/\\/g,'\\\\').replace(/'/g,"\\'") + "';");

Yey!
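For comparison, the same assignment can be done without eval by storing the path as a list of keys and walking it. This is a sketch of that alternative technique, not the code used on the site. It assumes the <option> value holds a JSON array of keys rather than a "[x].whatever" string, and setByPath is a made-up name.

```javascript
// Hypothetical helper: walks `root` along `path` and assigns `value` at the end.
function setByPath(root, path, value) {
  if (path.length === 0) {
    throw new Error('Path must have at least one key');
  }
  var target = root;
  for (var i = 0; i < path.length - 1; i++) {
    target = target[path[i]];
    if (target === null || typeof target !== 'object') {
      throw new Error('No object at path segment: ' + path[i]);
    }
  }
  target[path[path.length - 1]] = value;
}

// Assumption: the <option> value would be the string '[0, "whatever", "key", 12]'.
var myjson = [{ whatever: { key: {} } }];
var optionValue = '[0, "whatever", "key", 12]';
setByPath(myjson, JSON.parse(optionValue), 'line one\\nline two');
```

Because no string is ever evaluated as code, the text needs no escaping and a literal \n in it stays a backslash followed by an n.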

Scary json_encode()

Tuesday, December 27th, 2011

By Andrew Smith

PHP has this really neat function, json_encode(). It can take an object of whatever type, including my own class with child arrays/classes, and make a valid JSON string out of it. I was going to write this function myself but I found PHP already has it.

There’s one concern I have about it – it takes the entire object tree and makes json out of it, and all my member variable names end up as keys in the JSON. So without much difficulty you can look at the JSON and see exactly how I structure the data on the server in the PHP.

I’m not sure I really care but this makes me uneasy. It just doesn’t feel right. Maybe it will be ok.

Translating template files

Tuesday, December 27th, 2011

By Andrew Smith

I got to implementing one of the primary use cases for OSTD – user uploads a template .pot file and gets a bunch of .po files with as many translated strings as possible.

From a design point of view this isn’t a big deal: parse the .pot into a data structure, make a query per string to the database, and save the results.

But wait, what if there is more than one translation (in the same language) per english string? I expect that to happen quite often, so I have to handle that case from the start.

What I decided to do was not create the po files but only create in-memory representations of them, send them over to the user as json, generate the entire body of the webpage from that json, allow the user to make his selections, and post the modified json back to the server.

Only then would I generate the actual po files for the user to download. This will save some disk space (not that I care much about that) but also it’s pretty interesting technically. Possibly this saved me quite a bit of work too – otherwise I’d have had to reparse a bunch of po files I myself have generated.
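The post does not show the JSON itself, so the following is only a guess at what such an in-memory representation could look like. Every field name and every sample string is invented for illustration.

```javascript
// Hypothetical shape: one object per future .po file, one entry per English string.
var poFiles = [
  {
    language: 'fr',
    entries: [
      { english: 'Open', candidates: ['Ouvrir'], selected: 0 },
      { english: 'Cancel', candidates: ['Annuler', 'Abandonner'], selected: 0 },
      { english: 'Frobnicate', candidates: [], selected: 0 }
    ]
  }
];

// Entries with more than one candidate are the ones the user has to decide on.
function entriesNeedingChoice(poFile) {
  return poFile.entries.filter(function (entry) {
    return entry.candidates.length > 1;
  });
}

// Turn the selections into msgid/msgstr pairs, ready to be written to a .po file.
// Strings with no candidate get an empty msgstr, i.e. they stay untranslated.
function resolveEntries(poFile) {
  return poFile.entries.map(function (entry) {
    var chosen = entry.candidates.length > 0 ? entry.candidates[entry.selected] : '';
    return { msgid: entry.english, msgstr: chosen };
  });
}
```

With a shape like this, the form only has to change the selected index, and the server can rebuild each file from the posted JSON without parsing any .po text again.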

Here’s a snippet, the blue string has more than one option. All the tables with all their contents are JS-generated:

I knew there was a good reason I liked C

Saturday, December 24th, 2011

By Andrew Smith

I think I mentioned earlier how I almost started writing a website in C and quickly realised that wasn’t the right tool for the job, and switched entirely to PHP.

For the work I did today I needed a good set of data structures:

  • A set of files
    • Each with a set of english strings
      • Each with a set of 0 or more translations

Piece of cake, right? Right, it would have taken me half an hour in C, took me over six hours in PHP.

One of the problems was a surprise in scoping. Turns out in PHP there is no such thing as block scoping, and they forgot to mention that in the manual (isn’t it obvious there’s only per function and per file scope?). This created some very weird bugs that took a lot of printing to figure out.

Then there are the arrays. How can you have a programming language without 0-indexed arrays? PHP forces you to manage the indices yourself, since their ‘array’ is actually a hash table. No vector either, no list, basically only a hash table. Can get used to it I guess but was it really so hard to have something more structured?

Then there are the classes. I needed to use them because there is no concept of a struct and the arrays are so awkward. I won’t go too much into it, let me just say that I suspect classes in PHP resemble classes in Perl. Sure you can have them, but don’t expect them to be easy to use.

All this complaining, you say, but didn’t it take you years to learn how things work in C, what’s the problem with taking the time to learn how things work in PHP? Well – I would accept that if it weren’t for the prevailing opinion that PHP is easy to learn. It isn’t: you can get started using it very quickly but its similarities to C in syntax make it harder (not easier) to learn it well.

And yeah, the same applies to JavaScript, but whatever. That’s for when I’m ready to do something with this same data I mentioned above in the browser. Gonna be fun :)

Looking at .inc files on an Apache server

Saturday, December 10th, 2011

By Andrew Smith

Ever since I learned how CGI works (a couple of lifetimes ago) I was bothered by the fact that the source code is accessible by the web server, and by extension – by anyone on the internet.

If your CGI module is properly loaded and configured then Apache will execute the files rather than display them, but if there is a problem loading your module then Apache will stupidly display the contents of your source code in the web browser.

This is a real problem because the source code usually includes credentials for your database and who knows what else.

With PHP it got a little better when it became almost completely integrated into Apache and it was very challenging to break PHP without breaking Apache too.

Today I thought – wait a minute. My php will interpret .php files, but what about all those .inc files? They are PHP yeah but with a different extension. Sure enough I looked at ostd/parsePoFile.inc in Firefox and the whole source is dumped right out.

It was an easy enough fix, adding a Files section to my httpd.conf, but come on guys! How hard would it have been to add .inc files to the default config? .ht* is there. Lame.
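The post does not show the section that was added, so this is only a guess at what such a block usually looks like. It uses the Apache 2.2 access directives that were current in 2011.

```apache
# Refuse to serve any file whose name ends in .inc (Apache 2.2 syntax)
<Files ~ "\.inc$">
    Order allow,deny
    Deny from all
</Files>
```

On Apache 2.4 the two access lines would be replaced by a single "Require all denied". Giving the include files a .php extension, or keeping them outside the document root, avoids the problem without touching the server config.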

Number of SQL queries per page

Friday, December 9th, 2011

By Andrew Smith

If someone updates a po file with 100 translations – I need to figure out whether each translation is already in the database, and if not – insert it. The result looks like this (more about “looks” later):

This is just a snippet.

I am concerned that running an SQL query like this:

"SELECT Translation.TranslatedString FROM Translation,Language WHERE " .
"Translation.LanguageID = Language.LanguageID AND Language.LanguageCode = '%s' " .
"AND Translation.EnglishString = '%s'"

for every line uploaded might be a bit too much. Is it?

The prevailing wisdom on the internet seems to be that more than a few queries per page is too much. But I wonder if that’s for viewing content rather than uploading. Perhaps for uploading content this is not too bad. I mean I hope a lot of uploading is going to happen but that’s not the most common use case.

Regardless, I can’t think of an easy way to optimise this. I would have to spend a lot more time on the database design. Maybe when (if?) the site gets popular I will have the motivation to learn more fancy DB optimisations.
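One option that needs no change to the database design is to ask for all the uploaded strings in a single query with an IN list, then work out in memory which ones are missing. The sketch below only builds the query text and compares the results. It is written in JavaScript to keep it short, the same string could be assembled in PHP, and both function names are invented. The table and column names are the ones from the query above, and the ? placeholders assume the database layer supports prepared statements.

```javascript
// Hypothetical helper: builds one parameterised SELECT for a whole upload.
function buildLookupQuery(languageCode, englishStrings) {
  if (englishStrings.length === 0) {
    throw new Error('Need at least one English string');
  }
  var placeholders = englishStrings.map(function () { return '?'; }).join(', ');
  var sql =
    'SELECT Translation.EnglishString, Translation.TranslatedString ' +
    'FROM Translation, Language ' +
    'WHERE Translation.LanguageID = Language.LanguageID ' +
    'AND Language.LanguageCode = ? ' +
    'AND Translation.EnglishString IN (' + placeholders + ')';
  return { sql: sql, params: [languageCode].concat(englishStrings) };
}

// Given the rows that came back, list the uploaded pairs that still need an INSERT.
function findMissing(uploaded, rows) {
  var known = Object.create(null);
  rows.forEach(function (row) {
    known[row.EnglishString + '\u0000' + row.TranslatedString] = true;
  });
  return uploaded.filter(function (pair) {
    return !known[pair.english + '\u0000' + pair.translated];
  });
}
```

For an upload of 100 translations this is one SELECT instead of 100, followed by an INSERT only for the pairs that findMissing returns.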

Smaller PHP-generated HTML

Friday, December 9th, 2011

By Andrew Smith

I will have something like this on one of the pages in the website:

Each of those dropdown lists has the 190 language codes I mentioned in the previous post. It may not sound like a lot but 1900 <option> values really is a lot of HTML. It’s very easy to generate it all in PHP (one loop basically) but the result still has to be pulled over the internet, and suck up my bandwidth.

I had to think about this one for a bit, but I came up with a decent solution.

In my PHP loop I create two javascript arrays: one for visible name and one for value. I also have ten sets of dropdowns printed as HTML, but with the dropdown lists empty.

Then I write some JavaScript that will, on load, populate all the dropdown lists with all the names and values from the arrays.

This made the HTML sent to the browser probably 90% smaller, which made me very happy.
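The populating step could look roughly like this. It is a sketch, not the code from the site: populateSelects is a made-up name, and the document is passed in as a parameter only so the function can be tried outside a browser.

```javascript
// Hypothetical helper: fills every given <select> from two parallel arrays.
function populateSelects(doc, selects, languageCodes, languageNames) {
  for (var s = 0; s < selects.length; s++) {
    for (var i = 0; i < languageCodes.length; i++) {
      var option = doc.createElement('option');
      option.value = languageCodes[i]; // what gets submitted with the form
      option.text = languageNames[i];  // what the user sees in the list
      selects[s].appendChild(option);
    }
  }
}

// In the page this could be called from the onload handler, for example:
//   populateSelects(document, document.getElementsByTagName('select'),
//                   languageCodes, languageNames);
```

The 190 codes and names are then sent once as two arrays, instead of once per dropdown as <option> markup.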

Just because it’s funny, here’s what some of the PHP code looks like. I’m glad no one else will be working with me on this so I won’t have to explain it :)

  $languages = array();
  for ( $rowNum = 0; ($row = mysql_fetch_row($result)) != FALSE; $rowNum++)
  {
    $languages[$row[0]] = $row[1];
  }
?>
    <script type="text/javascript">
    function contentOnLoad()
    {
      var languageCodes = new Array(<?php
  # make the PHP array a Javascript array
  $firstElement = TRUE;
  foreach ($languages as $code => $name)
  {
    if ($firstElement)
      $firstElement = FALSE;
    else
      echo ",";

    echo "'" . $code . "'";
  }

That’s PHP, HTML, and JS all in one, still cracks me up :)

Scraping data from a reliable source

Friday, December 9th, 2011

By Andrew Smith

One of the things I will need in my database is a table with all the language codes used in Linux locales. Things like en, fr, es, etc. There are lots, but where do I get a reliable list?

I’ve done some searching and found the IANA language subtag repository. It’s a 45000 line text file with contents in this format:

%%
Type: language
Subtag: ab
Description: Abkhazian
Added: 2005-10-16
Suppress-Script: Cyrl

Of all those records only 1155 lines are 2-letter codes, which is what I was interested in. How do I get the language code and english name from there into a database? Piece of cake if you know some basic shell scripting:

#!/bin/bash

# Print one "code name" line per record: the Subtag value,
# followed by the first Description of that record.
HAVECODE=0
while read -r LINE
do
  case "$LINE" in
    "Subtag: "*)
      echo -n "${LINE#Subtag: } "
      HAVECODE=1
      ;;
    "Description: "*)
      # Keep the whole description, so multi-word names are not cut short.
      if [ "$HAVECODE" -eq 1 ]
      then
        echo "${LINE#Description: }"
      fi
      HAVECODE=0
      ;;
  esac
done < languagelist.txt

And insert it all into the database:

#!/bin/bash

QUOTE="'"
# The first word of each line is the code, the rest of the line is the name.
./parselanguagelist.sh | while read -r CODE NAME
do
  # Double any single quote so a name containing one cannot break the SQL string.
  SQLNAME=${NAME//$QUOTE/$QUOTE$QUOTE}
  mysql -u user -ppassword -e "INSERT INTO Language (LanguageCode,LanguageEnglishName) VALUES('$CODE','$SQLNAME');" ostd
  if [ $? -eq 0 ]
  then
    echo "Inserted $CODE ($NAME)"
  else
    echo "Failed to insert $CODE ($NAME)"
  fi
done

Done, 190 records. And next time I want to update the list (who knows, it might happen) I’ll just need to get a new list and use the MySQL feature that will let me either create or update a row depending on whether it already exists (INSERT ... ON DUPLICATE KEY UPDATE).

I think it would have taken me quite a while to generate this list of sql commands by hand :)