Saturday, May 24, 2008

Cycling through the Integer range - A Fermi problem

After graduating in economics in the summer of 2005, I interviewed for business analyst positions at a couple of business consulting firms (e.g. McKinsey & Company).

Since real-life business dilemmas require estimation and decision making under uncertainty (not all of the required information is available, nor is it accurate), a major part of the interview at these types of firms is confronting you with "How many pay phones are there on the island of Manhattan?" type problems, also known as Fermi problems.
Although these problems seem quite puzzling at first, they get pretty easy as long as you remain focused and methodical and apply a modest amount of common sense. The "trick" is to combine basic facts you already know with some fourth-grade algebra; doing this brings you to good-enough estimates.



Allow me to introduce a quick CS Fermi problem that someone threw around at the office. The problem might also be presented during an interview with fresh graduate candidates. Here goes:

Running on your average home computer (a single 2 GHz core), how long would it take this Java program to complete its operation?

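// Record the start time, spin through (almost) the whole int range, then print the elapsed milliseconds.
long startTime = System.currentTimeMillis();
for (int i = Integer.MIN_VALUE; i < Integer.MAX_VALUE; i++) {
    // empty body
}
System.out.println(System.currentTimeMillis() - startTime);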



How long? Two nanoseconds? Three seconds? Four hours? Five years? Six centuries? Seven millennia? What's important here is the order of magnitude, not the exact answer. You might find this question trivial, but you would be surprised by how many people have no clue how to start answering it. Take thirty seconds and try to come up with your own estimate before reading through mine:

Let's compute a ballpark figure:
Since an int is a 32-bit creature, the loop will cycle almost 2^32 times (2^32 - 1 to be exact, about 4.3 billion; remember that a billion is 10^9). The average home computer CPU of 2006 runs at about 2 GHz, which means it can perform roughly two billion simple instructions per second, assuming one simple instruction per cycle (complex instructions consume several CPU cycles).

The loop does three obvious operations on each cycle: (1) i is incremented, (2) the values of i and the max Integer constant are compared, and (3) we jump back to the beginning of the loop.
All are fairly simple instructions (you don't have to be an assembly programmer to know that), so it's safe to assume that each of them executes within a single CPU cycle.
BTW: Instructions 2 and 3 can be combined into a single instruction (jump if less than).
If the loop were coded in assembly language, my guess is that it would take about 4 seconds to complete: (2 instructions) * (4*10^9 loop cycles) / (2*10^9 instructions/sec) = 4 seconds. Thus, we have just found the lower limit for our answer: as long as the loop is actually executed, the Java code couldn't finish in under 4 seconds.
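
The back-of-the-envelope arithmetic above can be written out as a short Java sketch. The class and method names are made up for illustration, and the figures fed in (two instructions per iteration, one simple instruction per cycle on a 2 GHz core) are the same assumptions used in the text, not measurements.

```java
public class LoopEstimate {
    // Estimated run time in seconds:
    // iterations * instructions per iteration / instructions per second.
    static double estimateSeconds(long iterations,
                                  int instructionsPerIteration,
                                  double instructionsPerSecond) {
        return iterations * (double) instructionsPerIteration / instructionsPerSecond;
    }

    public static void main(String[] args) {
        // The loop runs from Integer.MIN_VALUE up to (but excluding) Integer.MAX_VALUE,
        // which is 2^32 - 1 iterations. Widen to long before subtracting to avoid overflow.
        long iterations = (long) Integer.MAX_VALUE - (long) Integer.MIN_VALUE;

        // Assumption: a 2 GHz core executing one simple instruction per cycle.
        double seconds = estimateSeconds(iterations, 2, 2e9);

        System.out.println("iterations = " + iterations);
        System.out.println("estimated seconds = " + seconds);
    }
}
```

With these inputs the estimate comes out at roughly 4.3 seconds; changing the instruction count or clock rate shows how sensitive the figure is to each assumption.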

My guesstimate would be somewhere between 4 and 40 seconds.

Other possible influencing factors:
(*) We know that Java is not as efficient as machine language and adds some overhead to our code. Depending on the JVM implementation in use, the method might be compiled to native machine code instead of being executed in interpreted mode. This would improve performance, of course.

(*) I first recalled that the JLS obliges the JVM to check for overflow when incrementing integers. It does not: int arithmetic wraps around silently on overflow, so this adds no extra operations per loop cycle.

(*) Since our int isn't volatile (a local variable can't be volatile anyway), its value would probably be kept in one of the CPU's registers throughout the loop's execution. Had it been a volatile field, the JVM would have been forced to read and write its value to the machine's main memory on each operation involving the variable. Since memory (CAS) latency is also measured in nanoseconds, this should, theoretically, add a fixed cost to each loop cycle (~10-100 ns), possibly raising the estimate by an order of magnitude.

(*) Running on a multi-core chip should have no direct positive effect, as this is a single-threaded program.
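
The overflow question in the list above can be checked directly. The sketch below relies on two things: plain int arithmetic in Java wraps around in two's complement without signalling overflow, and `Math.addExact` (available from Java 8, so newer than this post) is an opt-in checked alternative. The class and method names are hypothetical.

```java
public class OverflowCheck {
    // Plain int increment: wraps around silently on overflow.
    static int increment(int value) {
        return value + 1;
    }

    // Checked increment: Math.addExact throws ArithmeticException on overflow.
    static boolean overflowsWhenChecked(int value) {
        try {
            Math.addExact(value, 1);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Incrementing the largest int wraps to the smallest one.
        System.out.println(increment(Integer.MAX_VALUE) == Integer.MIN_VALUE); // prints "true"
        // The checked variant reports the same increment as an overflow.
        System.out.println(overflowsWhenChecked(Integer.MAX_VALUE));           // prints "true"
    }
}
```

The wraparound is also why the loop condition uses `i < Integer.MAX_VALUE`: with `<=` the condition would always hold for an int and the loop would never terminate.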

Here is the relevant disassembled Bytecode:
4: ldc #3; //int -2147483648
6: istore_3
7: iload_3
8: ldc #4; //int 2147483647
10: if_icmpge 19
13: iinc 3, 1
16: goto 7

Actual results:
(1) On my IBM T41 ThinkPad it took 80 seconds to complete.
(2) On my workstation at home, equipped with an Intel Core 2 6300 1.8 GHz CPU, it took only 9 seconds to complete.

I can't explain such a discrepancy yet, so I'll have to check further and update with new information. Try it yourselves!
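
One possible factor, raised in the comments below, is that a JIT compiler may drop a loop that has no side effects, while the interpreter still runs it. The following is a minimal sketch of a harness for experimenting with that idea, not a rigorous benchmark. The class name, the `spin` method, the `full` argument and the shortened default range are all made up for illustration; the `dummy *= 3` side effect is the one suggested in the comments.

```java
public class LoopTimer {
    // Runs the loop over [from, to) and returns an accumulator,
    // so the loop has an observable result.
    static long spin(int from, int to) {
        long dummy = 1;
        for (int i = from; i < to; i++) {
            dummy *= 3; // long overflow simply wraps; only the work matters here
        }
        return dummy;
    }

    public static void main(String[] args) {
        // Pass "full" to cover the whole int range; the default is a shorter run.
        boolean full = args.length > 0 && args[0].equals("full");
        int from = full ? Integer.MIN_VALUE : 0;
        int to = full ? Integer.MAX_VALUE : 100_000_000;

        long start = System.nanoTime();
        long result = spin(from, to);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        // Printing the result means the loop's output is actually used.
        System.out.println("result=" + result + ", elapsed ms=" + elapsedMs);
    }
}
```

Running it once as `java LoopTimer full` and once as `java -Xint LoopTimer full` (interpreter only) should give a rough idea of how much of the gap between machines could come from compilation instead of raw clock speed.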

6 comments:

  1. Software companies love to ask questions like “How many gas stations are in Cleveland?”. Here’s my thoughts on why they do this: http://www.zenternal.com/weblog/index.php/2008/06/15/fermi-problems-ie-how-many-iphones-are-in-austin/

  2. Don't forget optimization. Just because you tell it to run that code doesn't mean it really will.

    Using the 64-bit Linux version of Sun's 1.6.0_03-b05 JDK, it completes in 1ms for me (yes, I mean 0.001s). Using the 64-bit version of OpenJDK's 1.6.0-b09 JDK, it takes 4ms. Both runs were on an otherwise mostly idle Intel Q6600, using one of its 2.4GHz cores. This is pretty consistent on repeated trials.

    However, I agree with your assessment of the approximate amount of time it would have to take to compute that loop, so it must be choosing not to run the loop. It's allowed to make that decision, as there are no side effects.

    According to "javap -c", the loop code is still present in the bytecode. And if I run with the -Xint flag, then those times jump up to 57.4s for both Sun 1.6.0_03-b05 and OpenJDK 1.6.0-b09. The interpreted runs are still performing the loop calculations.

    If I change the code, so it has a side effect that the compiler can't optimize, I can force it to keep the loop (though it may make other changes, like unrolling it partially). I created a long variable "dummy", and then put "dummy *= 3;" inside the loop. That slows both native implementations down to 5.43s, which is in the ballpark you're describing. It's actually running the loop.

    Interestingly, if I only have "++dummy;" inside the loop, the optimizer is able to infer the results of that calculation; it takes only 0.16sec to do that (again with approximately equal times from both JDKs).
