Archive for the 'Safe For Seneca' Category

A shout-out from Linus Torvalds

Tuesday, May 8th, 2012

Wow, Linus himself used ISO Master and mentioned he liked it in a blog post (or whatever that Google+ thing is). I am so flattered!

I would post a thank-you response there, but I don’t feel like signing up for an account (yes, I am very happy living without most Google services).

Back to NFS

Monday, May 7th, 2012

Grumble grumble, I’m back to NFS from SSHFS (see the previous post). It turns out that:

  1. root has no permissions to do anything on the SSH mount, which sort of makes sense, but I wonder if it’s just security through obscurity. Regardless, I need root to be able to do stuff in there, like “make install”.
  2. For some weird reason I can’t execute anything off the SSH mount, which I learned when I tried to (as the correct user) run a simple shell script I run every month.
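For what it’s worth, problem 2 may not be SSHFS itself: according to mount(8), the `user` option in fstab implies `noexec` (along with `nosuid` and `nodev`) unless `exec` is added explicitly after it. A hypothetical fstab line to test that theory, based on the entry from the previous post:

```
# 'user' implies noexec; appending 'exec' afterwards re-enables execution
sshfs#user@ip:/dir  /home/user/dir  fuse  defaults,idmap=user,noauto,user,exec 0 0
```

I haven’t verified this fixes it, but it’s a cheap experiment before giving up on the filesystem.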

At first I thought I just needed to do some googling and figure it out, but that didn’t help. Some of the more interesting “explanations” for this behaviour are along the lines of “this isn’t a real filesystem”, and bullshit like that turns me off a technology. If it was intended to be used as a toy, then great work, it was fun, but I have to use my filesystem now, thank you very much.

So much potential though… maybe I’ll try again in a few years.

Replacing NFS with SSHFS

Tuesday, April 24th, 2012

For years I knew about the security issues related to NFS, but there wasn’t really a solution to that and I didn’t have a multiuser environment to worry about so I lived with it.

More recently though I’ve experimented with sshfs and found it to work really well. So I figured why not try to replace my NFS entry from /etc/fstab with an SSHFS entry?

Not all that hard to do, though there were a couple of quirks. First I added this to /etc/fstab:

sshfs#user@ip:/dir          /home/user/dir fuse     defaults,idmap=user,noauto,user 0 0

Note that it’s not mounted automatically. That’s because to mount something using SSHFS you need to give a password or use an SSH key, and root (who runs the startup scripts) doesn’t have my user’s private key.

This works great for mounting it manually (mount /home/user/dir) but not so well for an automount at boot. For that I first tried adding the mount command to my XFCE startup scripts, but it turned out they didn’t run early enough. Next I tried adding it to .xinitrc, but that seemed to be ignored.

A bit of searching and I found a solution – use ~/.xprofile instead. So this is what mine looks like:

~$ cat .xprofile
mount /home/user/dir

And it works great! Now if only I could find a solution for that ridiculous plain-text SMTP protocol…

Announcing the Open Source Translation Database

Thursday, March 8th, 2012

Translating software is hard. From starting two new open source projects (ISO Master and Asunder) I know the challenges of learning how to use Gettext, finding volunteers to do the translations, and encouraging and enabling them to translate my software.

The work was worth it for me: I now have almost 70 full translations of my software in 40 languages. But I’d like to make getting your first translation easier, and generally help more software maintainers get more translations with less effort.

The OSTD ( ) is an automatic translation system: it will take your .POT file and populate it with translations based on strings found in other open source software, generating .PO files. Since you can see which software each string comes from, this should be much more accurate than other automatic translation systems such as Google Translate.
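To make the input and output concrete, here is a sketch of what happens to a single gettext entry (the string and the German translation are made up for illustration):

```po
# entry in your .POT file (untranslated template):
msgid "Open an image file"
msgstr ""

# same entry in the generated de.po, filled from a match in another project:
msgid "Open an image file"
msgstr "Eine Abbilddatei öffnen"
```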

I just started the project, so there is a lot of polish still coming, along with some significant features such as updating existing .PO files and a web service interface for other software to use. But it can be useful as it is already. Please try it out!

Any feature requests and bug reports are welcome. My goal is to make it as useful as possible to as many people as possible. I’m doing this part time, but I’m excited about the project and will do my best to improve it as quickly as possible.

Size matters

Thursday, March 8th, 2012

I was going to show the OSTD to Chris Tyler, so earlier that day (because demos never work) I tried it from the Seneca network.

Turns out the OSTD is already a victim of its own success. When translating the ISO Master POT file I get almost 6000 translated strings in 153 languages. If you do a bit of math, that’s a lot of text: 2.8MB in fact.

Well the problem is that 2.8MB of stuff needs to be downloaded from the server, and it can take quite a while.
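Translated strings are exactly the kind of payload that compresses well: lots of short, similar text. A quick sketch with Python’s gzip module (the sample strings are made up) shows the kind of ratio to expect:

```python
import gzip

# fake "translations" payload: many short, similar strings
strings = [f"Translated string number {i} for some language" for i in range(6000)]
payload = "\n".join(strings).encode("utf-8")

compressed = gzip.compress(payload, compresslevel=9)
ratio = len(compressed) / len(payload)
print(f"{len(payload)} -> {len(compressed)} bytes ({ratio:.0%})")
```

Real translation data is less repetitive than this toy sample, but text this structured still typically shrinks by well over half.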

Luckily Chris immediately suggested that I enable gzip compression in Apache. I thought it was enabled by default, but I was wrong. Here’s what I was missing:

<IfModule mod_deflate.c>
        AddOutputFilterByType DEFLATE text/plain
        AddOutputFilterByType DEFLATE text/html
        AddOutputFilterByType DEFLATE text/xml
        AddOutputFilterByType DEFLATE text/css
        AddOutputFilterByType DEFLATE application/xml
        AddOutputFilterByType DEFLATE application/xhtml+xml
        AddOutputFilterByType DEFLATE application/rss+xml
        AddOutputFilterByType DEFLATE application/javascript
        AddOutputFilterByType DEFLATE application/x-javascript

        DeflateCompressionLevel 9

        DeflateFilterNote Input instream
        DeflateFilterNote Output outstream
        DeflateFilterNote Ratio ratio

        LogFormat '"%r" %{outstream}n/%{instream}n (%{ratio}n%%)' deflate
        CustomLog       logs/deflate_log deflate
</IfModule>

Seems to work great, thanks again Chris :)

Debian import complete

Tuesday, March 6th, 2012

A couple of days ago the import of all the translated strings from most of the software in Debian into the OSTD finally completed.

Now there is a grand total of 11236263 translated strings!

It took 1059647 seconds, which is just over 12 days. That’s 0.094 seconds per translation. I’m sure it could be sped up a lot in the future if I had a real need to do that.
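The arithmetic checks out; a quick sanity check for the skeptical:

```python
seconds = 1059647
strings = 11236263

print(seconds / 86400)    # days elapsed: just over 12
print(seconds / strings)  # seconds per translation: about 0.094
```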

Neither my PHP import script nor MySQL crashed during the process, which is pretty cool. It also looks like I had enough memory for all this; I don’t think MySQL was swapping a significant amount of data at any time.

I will probably have to do another pass through the .po files from Debian to establish a many-to-many (N-to-N) relationship between the translated strings and the software they are used in, but what I have already is a great start.
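A many-to-many link between strings and packages is the classic join-table pattern. A minimal sketch using SQLite (the table and column names are hypothetical, not the actual OSTD schema):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE string  (id INTEGER PRIMARY KEY, msgid TEXT);
CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT);
-- join table: one row per (string, package) pair
CREATE TABLE string_package (
    string_id  INTEGER REFERENCES string(id),
    package_id INTEGER REFERENCES package(id),
    PRIMARY KEY (string_id, package_id)
);
""")
db.execute("INSERT INTO string VALUES (1, 'Open')")
db.execute("INSERT INTO package VALUES (1, 'isomaster'), (2, 'gimp')")
db.executemany("INSERT INTO string_package VALUES (?, ?)", [(1, 1), (1, 2)])

# which packages use string 1?
rows = db.execute("""
    SELECT p.name FROM package p
    JOIN string_package sp ON sp.package_id = p.id
    WHERE sp.string_id = 1 ORDER BY p.name
""").fetchall()
print(rows)  # [('gimp',), ('isomaster',)]
```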

Soon I’ll finish up some little things and will be announcing the project to the world.

6 million translated strings and counting

Saturday, February 25th, 2012

Since the 19th of this month (that’s 6 days ago; I don’t know where all that time has gone… oh yeah, tests) I’ve been importing the translated strings from Debian.

Right now I’ve done over 6 million (6036472) and I’ve only got to the end of the projects beginning with the letter “g”. Using some simple (i.e. inaccurate) math, it will take me another 7 days to finish importing everything whose language code I could guess and parse.

The process is driven by a dedicated PHP script I wrote. PHP because the rest of my code is PHP, and I wasn’t going to rewrite things in bash :) It turns out it works pretty well. At first I thought I was going to run out of memory (the script quickly ate up 5% of RAM), but over the next few days that went back down and it now sits at a comfortable 1.8%.

I ran it manually (not through Apache; in fact Apache isn’t allowed to read that script at all) in a screen session, which is one of the reasons I had to stop the first import attempt (that one was in a plain terminal).

The other reason I had to restart the import was my MySQL configuration. Given that I’m not a database guy my MySQL was always using the minimum amount of resources, the defaults from my-small.cnf in Slackware. I’ve replaced that with my-huge.cnf and that had a very nice effect: no more swapping!

In the first attempt, after about a day MySQL was using 120% of my CPU (dual-core). Now, even after 6 days and 6 million strings inserted, it’s using on average 15% of CPU and 25% of RAM. Everything else on the server (Apache, Sendmail, Imapd, etc.) seems to be completely unaffected by the very heavy process.

One sucky thing about migrating from my-small.cnf to my-huge.cnf was that the InnoDB backends are incompatible. So I had to:

  • figure this out,
  • reconfigure the server using the old settings,
  • dump the OSTD database into plain text SQL,
  • delete the backend,
  • reconfigure MySQL with the new settings, and
  • import the old SQL from the plain text
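The dump-and-reimport steps above, roughly, as commands (the database name and file path are illustrative; this is a sketch from memory, not a tested recipe):

```shell
# with the OLD my-small.cnf settings still in place:
mysqldump -u root -p --databases ostd > ostd-backup.sql

# stop MySQL, delete the old InnoDB data/log files,
# switch to my-huge.cnf, restart MySQL, then load the dump back in:
mysql -u root -p < ostd-backup.sql
```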

Luckily the OSTD was the only MySQL user using the InnoDB backend, so none of my blogs were affected. Though it all worked out fine in the end, I’m quite surprised that there is no automagic way to “upgrade” the InnoDB backend. It’s bizarre to me that in this day and age of the cloud and enterprise scalability my storage backend would be tied to MySQL memory settings.

I’ve started to clean up the site in preparation for the completion of the import, when I’ll be announcing its release. Still not sure if I’m going to register a domain for it or not, but probably not at first.

Language codes, part 2

Friday, February 17th, 2012

Most of the po files in the Debian tarball follow the naming convention packagename_version_languagecode.po

So for all of those I could figure out the language code using a regular expression (or three) on the filename. Armed with that, and the exceptions I mentioned in the last post on this topic, I was able to get to this point with my importer:

ostd$ ./manualpoupload.php unstable/
Examining tree... done (19.11 seconds)
85238 files found. Of those:
 - 68412 had a guessable language code
 - 16826 cannot be used because the language code could not be guessed
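A sketch of the kind of regex that handles the naming convention above (this is my illustration, not the importer’s actual patterns, and the filenames are made up):

```python
import re

# packagename_version_languagecode.po, e.g. "apt_0.8.15_de.po"
PO_NAME = re.compile(
    r"^(?P<pkg>.+)_(?P<ver>[^_]+)_(?P<lang>[a-z]{2,3}(?:_[A-Z]{2})?)\.po$"
)

for name in ["apt_0.8.15_de.po", "gtk+_2.24_pt_BR.po", "mystery.po"]:
    m = PO_NAME.match(name)
    print(name, "->", m.group("lang") if m else "cannot guess")
```

Backtracking on the greedy `pkg` group is what lets the version contain dots while the region suffix (`pt_BR`) still lands in the language group.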

Hopefully soon I can run my PO parser against all those 68 thousand files and successfully read all the translated strings from them. They probably won’t all work, but I’m optimistic.

For the remaining 17 thousand files I can’t use right now I will have to come up with a different strategy. Probably after the successful import above I will put the bad ones into a separate tree and work on them separately. There are a few strategies I can try then:

  • Write different regexes for every piece of software. This is probably not realistic and would drive me crazy.
  • Try to find some patterns in the bad filenames that can lower the 16K to something much smaller. This idea seems to have some potential.
  • Attempt to find some metadata inside the po files, not just guess based on the filename. My experience with gettext suggests this is unlikely to succeed.
  • Or maybe wait till I get a better idea.
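On the metadata idea: newer .po files (gettext 0.18 and later) do carry a Language: field in the header entry (the msgstr of the empty msgid), though older files often lack it, so it likely won’t cover everything. A probe is cheap though; a sketch with made-up sample content:

```python
import re

# the header "translation" of a .po file is the msgstr for the empty msgid;
# each header line literally ends in a backslash-n inside the quotes
sample_header = '''msgid ""
msgstr ""
"Project-Id-Version: somepackage 1.0\\n"
"Language: pt_BR\\n"
"Content-Type: text/plain; charset=UTF-8\\n"
'''

m = re.search(r'"Language:\s*([A-Za-z_]+)\\n"', sample_header)
print(m.group(1) if m else "no Language header")  # pt_BR
```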

Anyway – 68 thousand files seems like a good start and it would definitely be enough to launch with, so maybe this problem will take a lower priority once I confirm I can parse all the files I guessed the language for. I look forward to finding out how many translated strings are in those files, how long it will take me to parse them and insert them into SQL, and how long a query on the resulting enormous table will take.


Homebrewed live server migration

Friday, February 17th, 2012

I mentioned that I’d talk about the software migration from the old hardware to the new machine. The neat thing is that I accomplished it with less than a minute of downtime while preserving all my data and metadata.

Here’s the long story (shorter version at the bottom):

  1. First step was to install the OS on the new hardware. This was a full install (just like the old one) of the newest available Slackware version.
  2. At this point I had two servers running on different internal IPs, both claiming to be the same machine, but only one (the old one) being accessible from the internet.
  3. Then I had to remember/relearn how to use rsync (-avx).
  4. My first sync was from the entire root of the old server into a directory on the home partition of the new one. I used this tree to set up the services the way I wanted them. Most services I reconfigured manually rather than using the old config files, partly because I was expecting the newer versions to have different options (which was sort of true with Apache) and partly because I wanted to make sure I did it right the first time (mostly I have).
  5. Don’t underestimate the step above; it was a lot of work. Things I had completely forgotten about, such as my aliases.db file and the stunnel config, had to be accounted for.
  6. Originally I was going to keep all the keys from the old server, but instead I’ve decided to consolidate the keys and now I have one set for most of the services I use. Yeah, yeah, whatever.
  7. I also needed to migrate my MySQL databases (of which I have a few). It turned out that just copying /var/lib/mysql isn’t enough, so I had to make an SQL dump of the old database and restore it on the new server. That approach had these problems:
    • The old database wasn’t as secure as I’d have liked: it still had the test db in it, and though I’m sure I went through the users thoroughly, I wasn’t sure enough.
    • The dump included the “mysql” database, which had some tables that changed slightly in the newer version, so MySQL refused to work properly.
  8. So even after doing a dump, and transferring the dump over to the new server, and importing it, I still needed to run a couple of commands to secure the databases and make MySQL happy.
  9. The second rsync was more complex. Here I had to sync my home directories (lots of static/dynamic data), the SQL, and /var/spool/mail.
  10. And now the magic:

This is the short version, and the interesting part, here’s what I did:

  1. Opened up my router web configuration page in the browser, navigated to the port forwarding page, and changed all the IPs for forwarded services from the old server’s LAN address to the new one’s. But didn’t save the changes yet.
  2. Stopped all the relevant services on the new server (simple script).
  3. Ran the second rsync again, this completed much quicker than the first time because most of the data was unchanged.
  4. Restarted all the relevant services.
  5. Pressed save on my router config.
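In command form, the cutover amounts to something like this (hostnames, paths, and the exact service list are illustrative; the port-forwarding flip itself happens in the router UI):

```shell
# on the new server: stop services so files aren't written mid-sync
/etc/rc.d/rc.httpd stop && /etc/rc.d/rc.sendmail stop

# final incremental rsync from the old box (fast: most data unchanged)
rsync -avx root@oldserver:/home/ /home/
rsync -avx root@oldserver:/var/spool/mail/ /var/spool/mail/

# bring everything back up, then press Save on the router page
/etc/rc.d/rc.httpd start && /etc/rc.d/rc.sendmail start
```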

The trick worked so well I amazed myself :) I happened to be tailing my Apache logs on both the old and new server while doing the final steps of the migration, and the second I saved my new port forwarding settings I saw the logs stop on the old server and start on the new one. It was an awesome feeling.

I’m sure I must have said in the past that rsync is a pain in the ass. I don’t necessarily take it back, but I will say I appreciated having such a powerful tool that day.

How [not] to make a book from your blog

Friday, February 17th, 2012

A couple of things happened recently which got me reading again:

  1. A book arrived at the library that I asked for about 6 months ago: “Nothing to Hide: The False Tradeoff between Privacy and Security” by Daniel Solove. I heard about it in an interview Moira Gunn did with the author.
  2. I started reading Garth Turner’s Greater Fool blog and got all of his books they had at the library; this post is about “Greater Fool: The Troubled Future of Real Estate”.

Both of these books have been heavily based on the respective author’s prior work. Mostly essays by Solove and blog posts by Turner.

Despite the fascinating topic, great ideas, and decent essays, Solove’s book is simply awful. From the introduction, where he said “you can read the chapters in any order”, I got suspicious; a few chapters in I realised this is not a book, it’s simply a collection of unrelated essays. Despite the author’s claim that he rewrote a lot of the material, I saw no evidence of cohesion in either narrative or logic.

Turner on the other hand did a great job of practically making an entire book out of blog posts. The content is the same, but it’s been rewritten and carefully arranged into chapters, with select quotes from blog posts that bring a perceivable timeline to the story. Having read his work for a while, I’m comfortable claiming that he uses the hammer-the-message technique, but whether that’s true or not, good or bad, I actually finished his book because it was so much better put together than Solove’s.

Perhaps academics aren’t as skilled as former journalist-politicians at putting books together, or perhaps Turner is simply better at it than Solove. I don’t know. I do recommend that if you’re considering making a book out of your blog, read these two and see the difference between a good one and a bad one.