Why managed code is slow
By Andrew Smith
I’ll pick on Java but this should apply to other languages too. I’m just afraid that due to circumstances beyond my control I’ll end up writing Java code this year – that’s what got me thinking of this.
All Java applications I ever used (web or local) were dreadfully slow and used way too much memory. Yet any report I read about a performance benchmark of Java vs C claimed that it’s just as fast or faster. I dismissed those reports as written by fanboys and thus irrelevant, but always wondered – how do they pass all the benchmarks, yet feel so slow when I use them?
I think I figured it out, at least part of it. Let’s see if I can express it.
C programmers have a fundamental understanding of which resources any given call uses, and how much of each. There is nothing between most system calls and the kernel that executes them, and there is only a driver (a.k.a. the kernel) between some system calls and the hardware.
I will probably never read the implementation of write() for any filesystem, but I think I understand what it does: call a function in the kernel, have the request queued up, go through the cache if there is one, move the hard disk head and write to the platter.
And having used it a thousand times I know how it performs in various circumstances. Small or large blocks vs number of repeated calls, in a loop, in a function, mixed with read()s, etc.
Look at the same scenario for Java. It’s bad enough that there are 25 classes with OutputStream in the name, but even something simple like java.io.OutputStream::write() – no one really knows how it works. Does that just pass the request to the system call? Does it iterate through the array of bytes calling a function each time? Does it serialize the data?
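One of those questions can at least be probed from the outside. Here is a minimal sketch, assuming a stock JDK; the names WriteDispatchDemo, CountingStream and singleByteCallsFor are made up for illustration. The subclass implements only the abstract write(int) and counts how often the inherited bulk write(byte[]) falls back to it. For the base java.io.OutputStream that fallback happens once per byte. Concrete streams such as FileOutputStream override the bulk method, so this result says nothing about them, which is rather the point.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class WriteDispatchDemo {
    // Hypothetical helper stream: implements only the abstract
    // single-byte method and counts how often it is invoked.
    static class CountingStream extends OutputStream {
        int singleByteCalls = 0;

        @Override
        public void write(int b) {
            singleByteCalls++;
        }
    }

    // Issues ONE bulk write of `size` bytes and reports how many
    // write(int) calls the inherited write(byte[]) made on our behalf.
    static int singleByteCallsFor(int size) {
        CountingStream out = new CountingStream();
        try {
            out.write(new byte[size]); // inherited from java.io.OutputStream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.singleByteCalls;
    }

    public static void main(String[] args) {
        System.out.println("write(int) calls for one 4096-byte bulk write: "
                + singleByteCallsFor(4096));
    }
}
```

So the answer to "does it iterate through the array?" is "it depends on which subclass you are holding", and the only way to find out is to read the source or measure.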
When bringing these things up around Java fans, you get a response like:
- “that’s what’s great about Java, you don’t need to know, it just works”, or, less likely:
- “experienced Java developers learn that too in time”
There’s also denial, but for simplicity’s sake I’ll pretend those people don’t exist.
The first excuse is used in 80% of the cases, and is 80% responsible for the problem. If you write a line of code that’s a call to the most basic function provided by the programming language and you have no idea how it works, it’s impossible to optimise your code; you just have to trust the runtime to figure it out.
And then comes the experience. Say you do learn as much about java.io.OutputStream::write() as I have learned from experience about write(). What you know is at best 50% likely to apply to java.io.PipedOutputStream::write(), and 25% likely to apply to BufferedOutputStream::write(). Whereas my knowledge about write() will apply to any write() no matter what device, as long as I have a trivial understanding of how that device works.
That’s the remaining 20% of the problem – the belief that you can actually be experienced enough in Java to understand its internals as well as you could understand C. No one has that kind of time! Perhaps some Java language developers do, though since they are hired by an enterprise to create a product for the enterprise, I really doubt that even the implementers need to know. After all Sun and the like never made a secret out of this – safety over speed, ease over understanding; it’s just not a priority.
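To make the difference between stream classes concrete, here is a rough sketch (the names BufferingDemo, CountingSink, callsWhenWritingDirectly and callsWhenBuffered are invented for this example). It writes the same bytes one at a time, first straight into a sink and then through a BufferedOutputStream with an explicitly chosen buffer size, and counts how many calls actually reach the sink. The sink is an in-memory stand-in; what a real file or socket stream does underneath is not shown here.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class BufferingDemo {
    // Hypothetical sink: discards the data and counts the calls reaching it.
    static class CountingSink extends OutputStream {
        int calls = 0;

        @Override
        public void write(int b) {
            calls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
        }
    }

    // n single-byte writes issued directly against the sink.
    static int callsWhenWritingDirectly(int n) {
        CountingSink sink = new CountingSink();
        for (int i = 0; i < n; i++) {
            sink.write(i);
        }
        return sink.calls;
    }

    // The same n single-byte writes, but through a BufferedOutputStream.
    static int callsWhenBuffered(int n, int bufferSize) {
        CountingSink sink = new CountingSink();
        try {
            OutputStream out = new BufferedOutputStream(sink, bufferSize);
            for (int i = 0; i < n; i++) {
                out.write(i);
            }
            out.flush(); // push whatever is still sitting in the buffer
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.calls;
    }

    public static void main(String[] args) {
        System.out.println("direct:   " + callsWhenWritingDirectly(4096));
        System.out.println("buffered: " + callsWhenBuffered(4096, 1024));
    }
}
```

The calling code is identical in both cases, yet the number of calls that reach the sink differs by orders of magnitude. Whatever you learned about one write() does not carry over to the other.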
And to compound the problem – your call to java.io.OutputStream::write() is not guaranteed to work at the same speed on different versions of the runtime (remember it’s the runtime executing your code, not the hardware). So the same program could behave drastically differently on Java 1.4 and Java 5.
Yes you have a similar problem in C where performance depends on the implementation in the kernel (thus the kernel version), but with a runtime you have the kernel version problems and the runtime version problems on top – they don’t cancel each other out :)
One curious note – I have even less experience writing C# than writing Java, but from the user’s point of view C# doesn’t seem to have the same performance troubles. I do wonder why. It could be just better code optimised to death for the operating system, or it could just be that most of the libraries it uses are DLLs loaded by Windows when it boots. That’s something else to think about.
Anyway – I’m not a C fanatic like the Java fanboys I mentioned. I will use a different language when it will do a better job. But one factor that’s almost universally important in software problems is performance, and Java will have a hard time scoring high on that scale.
April 9th, 2008 at 2:26
Funny, I’ve been thinking about this a lot the past week or so. Here are some things to consider:
* Java does JIT by default. This is going to do inlining, loop unrolling, etc. at runtime. You can help it do its job better if you know (and measure) what it’s doing.
* garbage collection may run at any time, and totally screw your performance. It is tunable though, and is fairly predictable with not a lot of work.
* a lot of slow Java apps are still a result of bad code – e.g. creating new Strings instead of using StringBuffer in a loop which may take arbitrarily large input data (this is a general pattern of not reusing objects). This is not only way slower per invocation but will also fill up memory and make the garbage collector work harder.
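The string-building pattern from the last bullet looks roughly like this. It is only a sketch: the class and method names (ConcatDemo, joinWithConcat, joinWithBuffer) are invented for illustration, and the timings it prints will vary by machine and VM.

```java
class ConcatDemo {
    // The slow pattern: every += allocates a new String and copies
    // everything accumulated so far.
    static String joinWithConcat(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // The reuse pattern: one growable buffer, appended to in place.
    // (StringBuilder is the unsynchronized equivalent since Java 5.)
    static String joinWithBuffer(String[] parts) {
        StringBuffer buffer = new StringBuffer();
        for (String part : parts) {
            buffer.append(part);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String[] parts = new String[20000];
        for (int i = 0; i < parts.length; i++) {
            parts[i] = "item" + i + ";";
        }

        long start = System.nanoTime();
        String a = joinWithConcat(parts);
        long concatNanos = System.nanoTime() - start;

        start = System.nanoTime();
        String b = joinWithBuffer(parts);
        long bufferNanos = System.nanoTime() - start;

        System.out.println("same result: " + a.equals(b));
        System.out.println("concat ms: " + concatNanos / 1000000);
        System.out.println("buffer ms: " + bufferNanos / 1000000);
    }
}
```

Both methods return the same string. The difference is in how much garbage they leave behind along the way.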
Wikipedia has a pretty decent article, actually – http://en.wikipedia.org/wiki/Java_performance
However, all that said, Java has some really decent profiling tools, and if you spend time optimizing your critical paths I don’t think that I/O calls (to take your OutputStream::write() example) are going to behave dramatically differently from VM version to version. There are tons of variables when you’re installing on systems that you don’t control, and it gets a lot worse when you’re talking cross-platform (and not just multiple versions, e.g. kernel for a particular OS).
If you want to make performance improvements, the important thing is to be able to measure the current state and continue to measure as you make changes (e.g. by profiling). The big problem with big C or C++ apps tends to be working around strange platform and toolchain quirks that cause undesirable problems, which having a VM somewhat shields you from as it becomes someone else’s problem :) This of course makes it harder for _you_ to solve, though; when you’re compiling the binary you can call a lot more of the shots (which is both good and bad of course).
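For a quick before-and-after number without a full profiler, something like the following can do. It is a crude sketch, not a substitute for a real profiler or benchmark harness; the names TimingSketch and medianNanos are hypothetical, and the lambda in main assumes Java 8 or later. It runs the task a few times first so the JIT has a chance to compile it, then reports the median of several timed runs.

```java
import java.util.Arrays;

class TimingSketch {
    // Runs `task` warmupRuns times untimed, then measuredRuns times timed,
    // and returns the median elapsed time in nanoseconds.
    // measuredRuns must be at least 1.
    static long medianNanos(Runnable task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run(); // give the JIT a chance to compile the hot path
        }
        long[] samples = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[measuredRuns / 2];
    }

    public static void main(String[] args) {
        long nanos = medianNanos(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100000; i++) {
                sb.append(i);
            }
        }, 20, 11);
        System.out.println("median ns per run: " + nanos);
    }
}
```

Taking the median rather than the mean keeps a single garbage collection pause from skewing the result. Run it before and after each change and compare.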
I’m not really sure how to measure .net runtime versus java runtime performance. I would not be surprised if only supporting one OS and being able to focus on supporting its version-specific quirks was a lot more productive than what Sun is trying to do :) The way that you can call unmanaged code is much easier and faster in .net (P/Invoke) than in java (JNI), and .net obviously uses native UI elements so that’s probably going to help user-perceived as well as actual performance right there.
I personally don’t think Java is a reasonable choice for desktop apps for a lot of reasons besides just performance (although that’s surely one). Others include the non-standard UI, lack of distribution channel, and cross-platform focus. .net surely pwns Java on all of these. I think that Java is still reasonable on the server side, but I don’t see how it can possibly compete on the client.