Daniel Pitts’ Tech Blog

Archive for December, 2007

Site back up.

Tuesday, December 18th, 2007

As you can see, the site is back up on schedule. Yippy.

One of these days, when I can afford it, I’ll get a dedicated server somewhere. :-)

Moving Server: Site downtime 12-11-2007 through ~12-18-2007.

Monday, December 10th, 2007

We’re moving tomorrow (12-11).  Since I host this site from my apartment, the site will experience some downtime.  I hope to “colo” the machine at a friend’s house in the interim, but there will be some downtime either way. I’ll try to keep it to a minimum.

Why is C so slow? Java vs. C benchmark.

Saturday, December 8th, 2007

Recently I’ve seen a few attacks on Java’s performance on comp.lang.java.programmer, so I decided to write my own benchmark and test it myself. I expected the C version to perform slightly better than the Java version, or at least to land in the same range. I was surprised that the Java version performed better, on both the client VM and the server VM.

I did my own benchmarks using these files:

bench.c

#include <stdio.h>
#include <time.h>

void bench() {
  long foo = 0;
  clock_t start = clock();
  for (long i = 1; i < 5000; ++i) {
    for (long j = 1; j < i; ++j) {
      if ((i % j) == 0) {
        foo ++;
      }
    }
  }
  clock_t end = clock();
  printf("%ld %dms\n", foo,
     (int) ((end - start) * 1000 / CLOCKS_PER_SEC));
}

int main() {
  for (long i = 1; i < 10; ++i) {
    printf("%ld: ", i);
    bench();
  }
}

Bench.java

public class Bench {
  static final long CLOCKS_PER_SEC = 1000;
  static void bench() {
    int foo = 0;
    long start = System.currentTimeMillis();
    for (int i = 1; i < 5000; ++i) {
      for (int j = 1; j < i; ++j) {
        if ((i % j) == 0) {
          foo ++;
        }
      }
    }
    long end = System.currentTimeMillis();
    System.out.printf("%d %dms\n", foo,
       (int) ((end - start) * 1000 / CLOCKS_PER_SEC));
  }

  public static void main(String[] args) {
    for (int i = 1; i < 10; ++i) {
      System.out.printf("%d: ", i);
      bench();
    }
  }
}

Then I ran these:

-bash-3.00$ java -version
java version "1.5.0_09"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_09-b03)
Java HotSpot(TM) Client VM (build 1.5.0_09-b03, mixed mode, sharing)
-bash-3.00$ javac Bench.java
-bash-3.00$ g++ --version
g++ (GCC) 3.3.3 (NetBSD nb3 20040520)
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-bash-3.00$ g++ bench.c -o bench

Now I’m ready to run the individual tests:

-bash-3.00$ java -server Bench
1: 38357 457ms
2: 38357 416ms
3: 38357 401ms
4: 38357 394ms
5: 38357 394ms
6: 38357 401ms
7: 38357 395ms
8: 38357 401ms
9: 38357 394ms
-bash-3.00$ java -client Bench
1: 38357 421ms
2: 38357 400ms
3: 38357 394ms
4: 38357 400ms
5: 38357 393ms
6: 38357 393ms
7: 38357 400ms
8: 38357 394ms
9: 38357 401ms
-bash-3.00$ ./bench
1: 38357 450ms
2: 38357 440ms
3: 38357 450ms
4: 38357 430ms
5: 38357 450ms
6: 38357 440ms
7: 38357 450ms
8: 38357 440ms
9: 38357 450ms

As you can see, the Java version is approximately 10% faster than the C version. So, here is my challenge: why is C so slow? I thought it was supposed to be faster than Java.
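One thing the numbers above do show: every C timing is a multiple of 10ms, while the Java timings are not, which suggests that clock() ticks in coarser steps than System.currentTimeMillis() on this machine. As a minimal sketch (not what I ran above), the Java loop could be timed with System.nanoTime(), available since Java 5, to avoid depending on the millisecond clock. The class name NanoBench and the method countDivisorPairs are made up for this illustration.

```java
class NanoBench {
  // Counts pairs (i, j) with 1 <= j < i < limit where j divides i,
  // which is the same work as the bench() loops above.
  static int countDivisorPairs(int limit) {
    int count = 0;
    for (int i = 1; i < limit; ++i) {
      for (int j = 1; j < i; ++j) {
        if ((i % j) == 0) {
          count++;
        }
      }
    }
    return count;
  }

  public static void main(String[] args) {
    for (int run = 1; run < 10; ++run) {
      long start = System.nanoTime();
      int count = countDivisorPairs(5000);
      long elapsedNanos = System.nanoTime() - start;
      // Report milliseconds with the fractional part preserved.
      System.out.printf("%d: %d %.3fms%n", run, count, elapsedNanos / 1e6);
    }
  }
}
```

The count printed on each line should match the 38357 reported by both programs above; only the timing column changes.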

More Discussion On Operator Overloading

Wednesday, December 5th, 2007

Updated: See notes below.

I was surprised to see that within one day of posting my previous entry on Operator Overloading, I received several comments. Aviad Ben Dov from Chaotic Java even took my idea and ran with it. Ricky Clarkson suggested using Haskell’s approach of allowing anything that is of the “Num” type to define +, -, /, *, etc. I have a few things to add to this discussion.

Aviad’s idea for operators by interface is not a bad one; it works well for overloading “[]” but it breaks down on a few use cases (such as ‘+’, ‘*’, etc.) that are important (to me). Ricky’s idea for subtypes of a specific class getting to have operator overloading isn’t bad either, but for physical unit manipulation it is too inflexible. The core concept that both of them seem to suggest is that a limited selection of types can have overloaded operators, but the useful operations aren’t limited to the scalar quantities that this restriction would confine the operators to.

Suppose I have the classes Distance and Area, and the built-in “scalar” type Double. I would expect at least these operations:
Distance * Distance => Area
Distance * Double => Distance

If I had to implement the Multipliable<T> interface, I wouldn’t be able to handle both Distance * Distance and Distance * Double. You can’t implement an interface twice, even with different type parameters. I don’t know if this is something that reified generics would fix, but it feels like it might be. Maybe someone could comment on that.
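To make the problem concrete, here is a minimal sketch of what does work in Java today: plain method overloading. The class bodies, the choice of meters as the stored unit, and the method name times are my assumptions for illustration. The commented-out declaration shows the shape the compiler rejects.

```java
// Rejected by javac: a class cannot implement the same generic interface
// with two different type arguments.
// final class Distance implements Multipliable<Distance>, Multipliable<Double> { ... }

final class Area {
  final double squareMeters;
  Area(double squareMeters) { this.squareMeters = squareMeters; }
}

final class Distance {
  final double meters;
  Distance(double meters) { this.meters = meters; }

  // Distance * Distance => Area
  Area times(Distance other) { return new Area(meters * other.meters); }

  // Distance * Double => Distance
  Distance times(double factor) { return new Distance(meters * factor); }
}

class DistanceDemo {
  public static void main(String[] args) {
    Distance side = new Distance(3.0);
    Area square = side.times(side);      // resolves to times(Distance)
    Distance doubled = side.times(2.0);  // resolves to times(double)
    System.out.println(square.squareMeters + " " + doubled.meters); // prints "9.0 6.0"
  }
}
```

Overload resolution picks the right method from the argument type, which is exactly what a single Multipliable<T> parameterization cannot express.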

Also, if Distance had to extend Number, what would doubleValue return? Meters? Inches? Smoots? There might be some way to solve these problems, but I can’t think of a way to prevent abuse while allowing good use.

Actually, now that I have thought a little about it…

The semantics of plus (+), minus (-), times (*), dividedBy (/), moduloOf (%), shiftLeft (<<), shiftRight (>>), unsignedShiftRight (>>>), or (|), and (&), xor (^), negative (-), and inverse (~) are all well-defined enough for so many not-necessarily-numeric types that allowing, even if only through naming conventions, the overloading of those operations seems like a good idea.

I think a good way to go would be to convert at compile time a * b to the method call a.times(b). Assignment operators like a += b would be replaced with a = a.plus(b). This would help reduce abuse while creating a more expressive language. The assignment operator rule is important, as it will help prevent the “clever” idiom of using += for appending elements to a collection.
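As a rough sketch of that rewriting, here is what the compiler would emit, written out by hand. Vec2 is a made-up immutable type, and the a * b and a += b forms appear only in comments because they are not legal Java for user types. Note how the += form rebinds the variable rather than mutating the object, which is what makes it a poor fit for “append to this collection”.

```java
final class Vec2 {
  final double x, y;
  Vec2(double x, double y) { this.x = x; this.y = y; }

  Vec2 plus(Vec2 other) { return new Vec2(x + other.x, y + other.y); }
  Vec2 times(double factor) { return new Vec2(x * factor, y * factor); }
}

class DesugarDemo {
  public static void main(String[] args) {
    Vec2 a = new Vec2(1, 2);
    Vec2 original = a;

    // Proposed source: Vec2 scaled = a * 3.0;
    Vec2 scaled = a.times(3.0);

    // Proposed source: a += scaled;
    a = a.plus(scaled);

    // The object a used to refer to is untouched; only the variable changed.
    System.out.println(original.x + "," + original.y); // prints "1.0,2.0"
    System.out.println(a.x + "," + a.y);               // prints "4.0,8.0"
  }
}
```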

Note on updates: I previously misspelled Aviad as “Avaid”. I also have added clarification for which use-cases Aviad’s Indexer doesn’t work for me, namely for algebraic operators.

Almost Useful: Operator overloading

Tuesday, December 4th, 2007

On Sun’s site, there is an open bug for operator overloading. Many people have pointed out that Java has one special case of operator overloading (String + String), so why not allow the programmer to overload operators?

Operator overloading would become especially useful with the addition of the units and measures API, or of similar custom libraries. It is also helpful when trying to avoid primitive obsession by creating numeric-like types.

Imagine this case:

Speed s = endDistance.minus(startDistance).divide(duration);

could be simplified to

Speed s = (endDistance - startDistance) / duration;

This of course is a simple example, and yet one that I would love to use in some of my existing code-bases.
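For reference, here is a minimal sketch of the kind of classes behind the method-call version above. The units (meters, seconds) and the class bodies are assumptions for illustration and are not taken from any real units API.

```java
final class Duration {
  final double seconds;
  Duration(double seconds) { this.seconds = seconds; }
}

final class Speed {
  final double metersPerSecond;
  Speed(double metersPerSecond) { this.metersPerSecond = metersPerSecond; }
}

final class Distance {
  final double meters;
  Distance(double meters) { this.meters = meters; }

  // Distance - Distance => Distance
  Distance minus(Distance other) { return new Distance(meters - other.meters); }

  // Distance / Duration => Speed
  Speed divide(Duration duration) { return new Speed(meters / duration.seconds); }
}

class SpeedDemo {
  public static void main(String[] args) {
    Distance startDistance = new Distance(100.0);
    Distance endDistance = new Distance(250.0);
    Duration duration = new Duration(30.0);

    Speed s = endDistance.minus(startDistance).divide(duration);
    System.out.println(s.metersPerSecond); // prints "5.0"
  }
}
```

Each method returns a different type, so the compiler still catches unit mistakes such as dividing a Speed by a Distance; the operator form would only change how the call is spelled.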

Another use case would be a cleaner syntax for lists/maps:

myMap["Hey"] = "There";
System.out.println(myList[10]);

And hey, what about a special case for compareTo? Although it might be too dangerous to overload = and ==, I could see overloading <, >, <=, and >=. It might be nice to add a couple of operators to the mix. I’m officially suggesting “#” for concatenation. Maybe “:=” as shorthand for .equals().
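To spell out what those operators would presumably stand for, here is a sketch in today’s Java. The mapping shown in the comments is my guess at a sensible translation, not part of the bug report, and the helper names lessThan and atLeast are made up.

```java
class CompareDemo {
  // Proposed source: a < b
  static <T extends Comparable<T>> boolean lessThan(T a, T b) {
    return a.compareTo(b) < 0;
  }

  // Proposed source: a >= b
  static <T extends Comparable<T>> boolean atLeast(T a, T b) {
    return a.compareTo(b) >= 0;
  }

  public static void main(String[] args) {
    String apple = "apple";
    String banana = "banana";

    System.out.println(lessThan(apple, banana)); // prints "true"
    System.out.println(atLeast(apple, banana));  // prints "false"

    // Proposed source: apple := "apple"
    System.out.println(apple.equals("apple"));   // prints "true"

    // Proposed source: apple # banana
    System.out.println(apple.concat(banana));    // prints "applebanana"
  }
}
```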

The Art of Decoupling.

Monday, December 3rd, 2007

Some people think the power of OO design is that the classes represent real-life concepts that are easier for a human to understand. Others think that its power comes from code reuse. Still others just accept OO as the paradigm that they are supposed to use, simply because they were told so.

The truth is that using any of those approaches to design may well leave you with a brittle, unmanageable, ugly mess. Personally, I strive for the “real-life concepts” as one top-level goal. The other top-level goal is decoupling. Transcending all goals, of course, is creating a “correct” program. Not necessarily “correct” in the academic sense. I don’t verify every algorithm I use, and I don’t necessarily determine valid pre/post conditions. I mean correct inasmuch as it works for what I need it to work for.

For small programs, this is pretty easy. When you get to larger systems, this becomes a slightly different problem to solve. In the days of goto, spaghetti code arose due to poor organization and ad hoc control transfer. Structured programming helped some by organizing control flow into a set of well-defined idioms. Object-oriented programming helped further by organizing responsibility into encapsulated modules. All of these paradigms had something in common: you could write brittle code or malleable code in any one of them. Granted, writing flexible code has gotten easier with object-oriented programming, but only if you know what makes code brittle.

What makes code brittle is over-dependence of one section of code on another. In software engineering, we call this coupling. If the behavior of A is dependent on the behavior of B, then A is dependent on B. Changing the behavior of B could change the behavior of A. This isn’t always a bad thing, but if A and B are written by different people (or the same person at different times) with different goals, it could be catastrophic. It isn’t always feasible to decouple two classes, or two methods, or even two lines of code, but if you can do it easily, consider what might be gained by that.

One syndrome I’ve noticed is that with any concept, some people take it to the extreme. If you stopped at the above paragraph and decided I was right, and that everything should be as decoupled as possible, you might end up with a design that I would call disintegrated. It is possible to create code that is so decoupled that it is impossible to figure out how one thing affects another, even though the overall system is designed for A to affect B. Don’t do this :-)

One approach to decoupling two classes is to have them communicate through an event system. This will couple them both to the event system, but not to each other, which allows you to replace one of the pair without changing the other side at all. This makes a lot of sense for GUI applications, for example, where a user interaction with a component (such as a button) generates an event that can be handled very differently, depending on the needs of the program. Most object-oriented languages, and Java in particular, already have a useful mechanism for invoking behavior on other objects. It’s called method invocation, the act of making a method call. Event systems are useful for decoupling the method call from the class that needs to know about the call; they’re not so useful as a replacement for normal method invocation.
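Here is a minimal sketch of the event approach described above, assuming a hand-rolled bus rather than any particular library. EventBus, Listener, Button, and the "click" event name are all invented for this example. Button and the handler each know about the bus, but neither names the other.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Callback invoked when an event is published.
interface Listener {
  void onEvent(String payload);
}

// The only thing both sides depend on.
final class EventBus {
  private final Map<String, List<Listener>> listeners =
      new HashMap<String, List<Listener>>();

  void subscribe(String eventName, Listener listener) {
    List<Listener> list = listeners.get(eventName);
    if (list == null) {
      list = new ArrayList<Listener>();
      listeners.put(eventName, list);
    }
    list.add(listener);
  }

  void publish(String eventName, String payload) {
    List<Listener> list = listeners.get(eventName);
    if (list == null) {
      return; // nobody is listening, and the publisher does not care
    }
    for (Listener listener : list) {
      listener.onEvent(payload);
    }
  }
}

// Knows about the bus, not about whoever handles the click.
final class Button {
  private final EventBus bus;
  private final String label;

  Button(EventBus bus, String label) {
    this.bus = bus;
    this.label = label;
  }

  void press() {
    bus.publish("click", label);
  }
}

class EventDemo {
  public static void main(String[] args) {
    EventBus bus = new EventBus();
    final List<String> log = new ArrayList<String>();

    // This handler could be swapped out without touching Button.
    bus.subscribe("click", new Listener() {
      public void onEvent(String payload) {
        log.add("clicked " + payload);
      }
    });

    new Button(bus, "OK").press();
    System.out.println(log); // prints "[clicked OK]"
  }
}
```

The cost is visible too: reading Button.press() no longer tells you what happens on a click, which is why I wouldn’t use this in place of an ordinary method call between two classes that always work together.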

There are plenty of other ways to decouple your code (often through use of patterns), but I won’t get into them here. GIYF

Software design is about making difficult choices. If you didn’t have to make choices, then it would be easy to create a simple program that can design software. Your job as an engineer is to decide what should be decoupled from what. Decoupling can be a great tool to create a reusable component, but not everything should be designed to be reusable. If you’ve written something that was coupled, but needs to be made reusable… Well, that’s what refactoring is for. Decoupling two classes that only communicate with each other can actually add complexity for that use case, so unless you strongly believe that at least one of the two classes will be useful in other situations, it is a waste of cognitive power to separate them.

My goal for production-quality software design? Create the simplest design that meets all my requirements, but is flexible enough to adapt to future requirements.