Why managed code is slow – Grumble Grumble

I’ll pick on Java but this should apply to other languages too. I’m just afraid that due to circumstances beyond my control I’ll end up writing Java code this year – that’s what got me thinking about this.

All Java applications I have ever used (web or local) were dreadfully slow and used way too much memory. Yet every report I read about a performance benchmark of Java vs C claimed that it’s just as fast or faster. I dismissed those reports as written by fanboys and thus irrelevant, but always wondered – how do Java programs pass all the benchmarks, yet feel so slow when I use them?

I think I figured it out, at least part of it. Let’s see if I can express it.

C programmers have a fundamental understanding of how much of which resource any given call uses. There is nothing between most system calls and the kernel that executes them, and there is only a driver (a.k.a. the kernel) between some system calls and the hardware.

I will probably never read the implementation of write() for any filesystem, but I think I understand what it does: call a function in the kernel, have the request queued up, if there is a cache put the data in it, move the hard disk head and write to the platter.

And having used it a thousand times I know how it performs in various circumstances. Small or large blocks vs number of repeated calls, in a loop, in a function, mixed with read()s, etc.

Look at the same scenario for Java. It’s bad enough that there are 25 classes with OutputStream in the name, but even something simple like java.io.OutputStream::write() – no one really knows how it works. Does it just pass the request to the system call? Does it iterate through the array of bytes calling a function each time? Does it serialize the data?

When bringing these things up around Java fans, you get a response like:

“that’s what’s great about Java, you don’t need to know, it just works”, or, less likely:
“experienced Java developers learn that too in time”

There’s also denial, but for simplicity’s sake I’ll pretend those people don’t exist.

The first excuse is used in 80% of the cases, and is 80% responsible for the problem. If you write a line of code that’s a call to one of the most basic functions provided by the programming language and you have no idea how it works, it’s impossible to optimise your code; you just have to trust the runtime to figure it out.

And then comes the experience. Say you do learn as much about java.io.OutputStream::write() as I have learned from experience about write(). What you know is at best 50% likely to apply to java.io.PipedOutputStream::write(), and 25% likely to apply to BufferedOutputStream::write(). Whereas my knowledge about write() will apply to any write() no matter what device, as long as I have a trivial understanding of how that device works.

That’s the remaining 20% of the problem – the belief that you can actually be experienced enough in Java to understand its internals as well as you could understand C. No one has that kind of time! Perhaps some Java language developers do, though since they are hired by an enterprise to create a product for that enterprise, I really doubt that even the implementers need to know. After all, Sun and the like never made a secret of this – safety over speed, ease over understanding; performance is just not a priority.

And to compound the problem – your call to java.io.OutputStream::write() is not guaranteed to work at the same speed on different versions of the runtime (remember it’s the runtime executing your code, not the hardware). So the same program could behave drastically differently on Java 1.4 and Java 5.

Yes you have a similar problem in C where performance depends on the implementation in the kernel (thus the kernel version), but with a runtime you have the kernel version problems and the runtime version problems on top – they don’t cancel each other out :)

One curious note – I have even less experience writing C# than writing Java, but from the user’s point of view C# doesn’t seem to have the same performance troubles. I do wonder why. It could be just better code optimised to death for the operating system, or it could just be that most of the libraries it uses are DLLs loaded by Windows when it boots. That’s something else to think about.

Anyway – I’m not a C fanatic like the Java fanboys I mentioned. I will use a different language when it does a better job. But one factor that’s almost universally important in software problems is performance, and Java will have a hard time scoring high on that scale.