One of the things I can do now is try to define what my experiments with people will look like. To perform these experiments I will need approval from the UofT ethics board, and to get that approval I need to fill out an application describing what sort of information I will collect, from whom, and under what circumstances.

This is the first attempt. Let’s assume for the purpose of this exercise that the software will be an extension of Review Board.

This would give me potential access to the Basie project (my colleagues Zuzel and Mike Conley are working on it), a bunch of open source projects, perhaps even a business or two from those listed on the Review Board site, and whomever else I can find.

Greg mentioned that I need to be in physical proximity to the subjects; I forgot to ask why. Perhaps it’s not necessary for the experiment as I define it. The most likely candidate is Basie.

I would very much prefer a controlled experiment, but in a real environment. That means Review Board has to already be in active use in the project I’m looking at, with at least several reviews done per week. This may be a challenge, but I’ll work with the assumption that I can find such a project.

Oh, the bigger problem is that the reviewers need tablets (using a mouse will be useless). This is probably what Greg meant.

I will have N reviewers participating. For this description I will only mention one, and I’ll call this person Roy the reviewer. I will give Roy a tablet and a URL to four code submissions:

  • Two simple changes (something like a new second-year undergrad hello-world function; see the sketch after this list)
  • Two complex changes (maybe database access, or more than one file affected, or changes spread widely across one file)
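
To make the materials a bit more concrete, here is a hypothetical example of what one of the simple changes could look like: a short, self-contained function or two with a couple of deliberately seeded problems. The names and the problems are made up for illustration; the copy Roy reviews would not flag them.

```python
# Hypothetical "simple change" submission, roughly second-year undergrad level.
# The seeded problems are marked in comments here; the reviewer's copy would
# not contain these hints.

def average(numbers):
    """Return the arithmetic mean of a list of numbers."""
    total = 0
    for n in numbers:
        total += n
    return total / len(numbers)  # seeded bug: crashes on an empty list


def greet(name):
    # seeded style issue: prints instead of returning, odd spacing and punctuation
    print("Hello , " + name + "!!!")
```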

Each of these will have the same number of seeded problems, but the problems themselves will differ so that one review doesn’t affect the other.

I will ask Roy to review one simple and one complex change normally, by typing in comments. This would tell me (see the sketch after this list):

  1. How many substantively different comments Roy provides
  2. How many of those are of each type (design flaw, bug, usability problem, style issue)
  3. How much time is spent on the review
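
A minimal sketch of how I might record these three measures for each review session; the structure and the category names are my own placeholders, not anything Review Board provides.

```python
from dataclasses import dataclass, field
from enum import Enum


class CommentType(Enum):
    DESIGN_FLAW = "design flaw"
    BUG = "bug"
    USABILITY = "usability problem"
    STYLE = "style issue"


@dataclass
class ReviewRecord:
    """One reviewer's pass over one submission (hypothetical structure)."""
    reviewer: str                                 # e.g. "Roy"
    submission: str                               # which of the four submissions
    method: str                                   # "typed" or "stylus"
    comments: int                                 # substantively different comments
    by_type: dict = field(default_factory=dict)   # CommentType -> count
    minutes: float = 0.0                          # time spent on the review


# Example record for one of Roy's typed reviews (made-up numbers).
roy_simple_typed = ReviewRecord(
    reviewer="Roy",
    submission="simple-1",
    method="typed",
    comments=5,
    by_type={CommentType.BUG: 2, CommentType.STYLE: 3},
    minutes=12.5,
)
```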

Then I will ask Roy to try the stylus: first to get used to using it as a pen, then to review the other two submissions by writing comments either inline or in the margin next to the code in question, treating the monitor as a printout.
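
Since the stylus comments would eventually live in the Review Board extension, here is a rough guess at how a single handwritten annotation could be stored. This data model is entirely mine, not something Review Board defines.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class InkAnnotation:
    """A handwritten comment anchored to a region of code (hypothetical model)."""
    submission: str      # which code submission it belongs to
    filename: str        # file the annotation refers to
    first_line: int      # first source line covered by the ink
    last_line: int       # last source line covered by the ink
    placement: str       # "inline" or "margin"
    strokes: List[List[Tuple[float, float]]]  # pen strokes as (x, y) point lists


# Example: a short margin note next to lines 10-12 of foo.py (made-up data).
note = InkAnnotation(
    submission="complex-1",
    filename="foo.py",
    first_line=10,
    last_line=12,
    placement="margin",
    strokes=[[(0.0, 0.0), (4.0, 1.5), (9.0, 0.5)]],
)
```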

Then I would compare the results.

Using a stylus itself should not be a problem, but experienced reviewers have developed a system for providing good feedback with the old-school method, so I have to account for that somehow. Perhaps a few days later I can repeat the experiment and see whether their use of the new system is any different.

Do this with N people, and perhaps some statistically significant conclusions can be drawn.
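
For the comparison across the N reviewers, a paired test is probably the natural starting point. Here is a rough sketch of the analysis I have in mind, assuming I end up with per-reviewer comment counts for the typed and stylus conditions, using SciPy’s Wilcoxon signed-rank test (the numbers are made up):

```python
from scipy.stats import wilcoxon

# Hypothetical per-reviewer totals: substantive comments found with each method.
typed_comments  = [5, 7, 4, 6, 8, 3]   # made-up data
stylus_comments = [6, 5, 7, 10, 3, 9]  # made-up data

# Paired, non-parametric comparison (small N, no normality assumption).
statistic, p_value = wilcoxon(typed_comments, stylus_comments)
print(f"Wilcoxon statistic={statistic}, p-value={p_value:.3f}")
```

If the measurements turn out to be close to normally distributed, a paired t-test (scipy.stats.ttest_rel) would do the same job; the same comparison could also be run on the per-type counts and on time spent.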