{"id":371,"date":"2012-02-25T19:21:26","date_gmt":"2012-02-26T00:21:26","guid":{"rendered":"http:\/\/littlesvr.ca\/grumble\/?p=371"},"modified":"2012-12-05T00:47:16","modified_gmt":"2012-12-05T05:47:16","slug":"6-million-translated-strings-and-counting","status":"publish","type":"post","link":"http:\/\/littlesvr.ca\/grumble\/2012\/02\/25\/6-million-translated-strings-and-counting\/","title":{"rendered":"6 million translated strings and counting"},"content":{"rendered":"<p>Since the 19th of this month (that&#8217;s 6 days ago, I don&#8217;t know where all that time has gone, oh yeah, tests) I&#8217;ve been importing the translated strings from Debian.<\/p>\n<p>Right now I&#8217;ve done over 6 million (6036472) and I&#8217;ve only got to the end of the projects beginning with the letter &#8220;g&#8221;. By some simple (i.e. inaccurate) math, it will take me another 7 days to finish importing everything I could parse and guess the language code for.<\/p>\n<p>The process is driven by a dedicated PHP script I wrote. PHP because the rest of my code is PHP, and I wasn&#8217;t going to rewrite things in bash :) Turns out that it works pretty well. At first I thought I was going to run out of memory: the script quickly ate up 5% of RAM, but over the next few days it went back down and now sits at a comfortable 1.8%.<\/p>\n<p>I ran it manually (not through Apache; actually Apache isn&#8217;t allowed to read that script at all) in a screen session. That is one of the reasons I had to stop the first import attempt, which was running in a plain terminal.<\/p>\n<p>The other reason I had to restart the import was my MySQL configuration. Given that I&#8217;m not a database guy, my MySQL was always using the minimum amount of resources, the defaults from my-small.cnf in Slackware. I&#8217;ve replaced that with my-huge.cnf and that had a very nice effect: no more swapping!<\/p>\n<p>In the first attempt, after about a day, MySQL was using 120% of my CPU (dual-core). 
Now, even after 6 days and 6 million strings inserted, it&#8217;s using on average 15% of CPU and 25% of RAM. Everything else on the server (Apache, Sendmail, Imapd, etc.) seems to be completely unaffected by the very heavy process.<\/p>\n<p>One sucky thing about migrating from my-small.cnf to my-huge.cnf was that the InnoDB backends are incompatible. So I had to:<\/p>\n<ul>\n<li>figure this out,<\/li>\n<li>reconfigure the server using the old settings,<\/li>\n<li>dump the OSTD database into plain text SQL,<\/li>\n<li>delete the backend,<\/li>\n<li>reconfigure MySQL with the new settings, and<\/li>\n<li>import the old SQL from the plain text.<\/li>\n<\/ul>\n<p>Luckily, OSTD was the only MySQL user that was using the InnoDB backend, so none of my blogs were affected. Though it all worked out fine in the end, I&#8217;m quite surprised that there is no automagic way to &#8220;upgrade&#8221; the InnoDB backend. It&#8217;s bizarre to me that in this day and age of the cloud and enterprise scalability my storage backend would be tied to MySQL memory settings.<\/p>\n<p>I&#8217;ve started to clean up the site in preparation for the completion of the import, when I&#8217;ll be announcing its release. Still not sure if I&#8217;m going to register a domain for it or not, but probably not at first.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Since the 19th of this month (that&#8217;s 6 days ago, I don&#8217;t know where all that time has gone, oh yeah, tests) I&#8217;ve been importing the translated strings from Debian. Right now I&#8217;ve done over 6 million (6036472) and I&#8217;ve only got to the end of the projects beginning with the letter &#8220;g&#8221;. 
Using some simple &hellip; <\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5,4],"tags":[],"class_list":{"0":"entry","1":"post","2":"publish","3":"author-andrew","4":"post-371","6":"format-standard","7":"category-ostd","8":"category-safeforseneca"},"_links":{"self":[{"href":"http:\/\/littlesvr.ca\/grumble\/wp-json\/wp\/v2\/posts\/371","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/littlesvr.ca\/grumble\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/littlesvr.ca\/grumble\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/littlesvr.ca\/grumble\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"http:\/\/littlesvr.ca\/grumble\/wp-json\/wp\/v2\/comments?post=371"}],"version-history":[{"count":6,"href":"http:\/\/littlesvr.ca\/grumble\/wp-json\/wp\/v2\/posts\/371\/revisions"}],"predecessor-version":[{"id":376,"href":"http:\/\/littlesvr.ca\/grumble\/wp-json\/wp\/v2\/posts\/371\/revisions\/376"}],"wp:attachment":[{"href":"http:\/\/littlesvr.ca\/grumble\/wp-json\/wp\/v2\/media?parent=371"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/littlesvr.ca\/grumble\/wp-json\/wp\/v2\/categories?post=371"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/littlesvr.ca\/grumble\/wp-json\/wp\/v2\/tags?post=371"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}