For our research project we needed to run pHash on a lot (tens of thousands) of image files. pHash uses ImageMagick internally, probably for simple operations such as resizing and changing the colour scheme.

I am pretty familiar with errors such as these coming from convert or mogrify:

convert.im6: no decode delegate for this image format `Ru-ей.ogg' @ error/constitute.c/ReadImage/544.
convert.im6: no images defined `pnm:-' @ error/convert.c/ConvertImageCommand/3044.
sh: 1: gm: not found

[CImg] *** CImgIOException *** [instance(0,0,0,0,(nil),non-shared)] CImg<unsigned char>::load(): Failed to recognize format of file 'Ru-ей.ogg'

What I wasn’t expecting was to get such errors in one of my own applications, which uses a library (pHash) that uses another library (ImageMagick). What moron prints error messages to stdout from inside a library? Seriously!!??
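For what it's worth, the noise can at least be muted from the calling side. Below is a minimal sketch in Python (not the language of my program; the same dup/dup2 calls exist in C) that temporarily points a file descriptor at /dev/null. It has to work at the descriptor level, because the messages come from native code and from child processes such as convert, which inherit the descriptor. The name silence_fd is made up for this illustration.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def silence_fd(fd=2):
    """Temporarily point a file descriptor (2 = stderr, 1 = stdout) at /dev/null."""
    sys.stdout.flush()                      # push out anything Python has buffered
    sys.stderr.flush()
    saved = os.dup(fd)                      # keep a copy of the real target
    devnull = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull, fd)                # fd now writes to /dev/null
        yield
    finally:
        os.dup2(saved, fd)                  # restore the original target
        os.close(devnull)
        os.close(saved)
```

Wrapping the library call in "with silence_fd(2):" (and another for descriptor 1 if needed) would hide the messages, at the price of also hiding anything useful printed on that descriptor in the meantime.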

But it gets worse. As I put this code in a loop it quickly found a reason (the first was a .djvu file) to eat up all my RAM and then start on the swap. Crappy code, but it’s a complex codebase, I can forgive them. I figured I’d just use ulimit to stop any program from taking more than about half a gig of memory (“ulimit -Sv 500000”) and ran my program again:

[CImg] *** CImgInstanceException *** [instance(0,0,0,0,(nil),non-shared)] CImg<float>::CImg(): Failed to allocate memory (245.7 Mio) for image (6856,9394,1,1).
terminate called after throwing an instance of 'cimg_library::CImgInstanceException'
  what():  [instance(0,0,0,0,(nil),non-shared)] CImg<float>::CImg(): Failed to allocate memory (245.7 Mio) for image (6856,9394,1,1).
Aborted

Aborted? What sort of garbage were these people smoking? You don’t bloody abort from a library just because you ran out of memory, especially in a library that routinely runs out of memory! Bah. Anyway, I found a way to make sure it doesn’t abort: I set ulimit back to unlimited and instead created a global ImageMagick configuration file, /usr/share/ImageMagick-6.7.7/policy.xml:

<policymap>
  <policy domain="resource" name="memory" value="256MiB"/>
  <policy domain="resource" name="map" value="512MiB"/>
</policymap>
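A different way to survive the abort, for anyone who can't touch the global policy file, is to keep the ulimit idea but apply it to a throwaway child process per file, so that only the child dies. This is a rough Python sketch of that idea, not what I actually ran; run_isolated is a made-up name and func stands in for whatever calls the hashing library.

```python
import os
import resource


def run_isolated(func, arg, mem_bytes=512 * 1024 * 1024):
    """Run func(arg) in a forked child with a capped address space.

    Returns "ok", "error" (the child raised or exited non-zero),
    or "killed" (the child died from a signal such as SIGABRT).
    """
    pid = os.fork()
    if pid == 0:                            # child process
        status = 1
        try:
            # Lower only the soft limit, like "ulimit -Sv" does.
            _, hard = resource.getrlimit(resource.RLIMIT_AS)
            resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, hard))
            func(arg)
            status = 0
        except BaseException:
            status = 1
        finally:
            os._exit(status)                # never fall back into the parent's code
    _, wait_status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(wait_status):
        return "killed"
    return "ok" if os.WEXITSTATUS(wait_status) == 0 else "error"
```

The parent loop can then log any file that comes back as "killed" or "error" and move on to the next one. The result of func is thrown away here; a real version would have to pass the hash back through a pipe or a file.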

Now no more aborts and no more running out of memory. Good. Until I got to about file number 31000 and my machine ground to a halt again, as if out of RAM and swapping. What this time? Out of disk space of course, why not!

I had already told ImageMagick to use a specific temporary directory (export MAGICK_TMPDIR=/tmp/magick1 && mkdir -p $MAGICK_TMPDIR) so that my program, after indirectly using the ImageMagick library, can run system("rm -f /tmp/magick?/*"); because, you know, it’s too much to ask ImageMagick to clean up after itself. Barf… But it even got around that: for a single PDF file it used over 65GB of disk space in /tmp.
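The MAGICK_TMPDIR plus rm dance can be made a bit less ugly by tying the scratch directory to a scope. Here is a small Python sketch under the assumption that ImageMagick (or the convert process it spawns) picks up MAGICK_TMPDIR from the environment when it creates its temporary files; magick_tmpdir is a made-up helper name. Note that this only cleans up afterwards. It does nothing to stop a single file from filling the disk while it is being processed.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def magick_tmpdir(parent=None):
    """Give ImageMagick a private scratch directory and remove it afterwards."""
    path = tempfile.mkdtemp(prefix="magick", dir=parent)
    previous = os.environ.get("MAGICK_TMPDIR")
    os.environ["MAGICK_TMPDIR"] = path      # inherited by any child process
    try:
        yield path
    finally:
        shutil.rmtree(path, ignore_errors=True)   # delete leftovers, even after errors
        if previous is None:
            os.environ.pop("MAGICK_TMPDIR", None)
        else:
            os.environ["MAGICK_TMPDIR"] = previous
```

Used as "with magick_tmpdir() as scratch:" around the processing of one file or one batch, the leftovers disappear when the block exits, whether it finished normally or raised.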

And if they had at least said that they’re using other people’s libraries, that it’s not their fault and so on and so forth, maybe I wouldn’t be so pissed. Instead they give me bullshit like “oh, what’s a lot of resources to you is nothing to someone else, we have 1TB of RAM, bla bla”.

Piss off, I’m going to find another solution that doesn’t involve using this garbage.