An extended example of re-writing code to make it more powerful, flexible, and clear, based on in-class discussion.
Calculating a standard error for the median of a particular Gaussian sample by repeated simulation, "manually" at the R console. Writing a function to automate this task, with everything hard-coded. Adjusting the function to let the number of simulation runs be an argument. Writing an analogous function to do the same job for an exponential distribution. Since this is almost entirely the same, why have two functions? Putting in a logical switch between hard-coded options. Better approach: abstract out the simulation into a separate function, and make the simulator an argument to the standard-error-in-median function. Example of applying the latter function to a much more complicated simulator. Advantages of the modular approach: flexibility, clarity, ease of adjustment. Example: removing a for loop in favor of replicate in the find-the-standard-error function, without having to change any of the simulators. Writing analogous functions to find the interquartile range of the median, or the standard error of the mean. Repeating the process of abstraction: the common element is taking a simulator, estimating some property of the simulation, and summarizing the simulated distribution. All three tasks are logically distinct and should be performed by separate functions. Reduction of bootstrapping to a two-line function taking other functions as arguments.
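As a rough illustration of where this process of abstraction ends up, here is a minimal sketch in R. It is not the code from the handout: the names `bootstrap`, `sim.gaussian`, and `sim.exponential`, the sample size of 100, and the default of 1000 replicates are all assumptions made for this example. Only `replicate`, `rnorm`, `rexp`, `median`, `mean`, `sd`, and `IQR` are standard R functions.

```r
# Hypothetical simulators: each call returns one fresh simulated sample.
# The sample size and parameters are illustrative assumptions.
sim.gaussian <- function(n = 100, mean = 0, sd = 1) {
  rnorm(n, mean = mean, sd = sd)
}
sim.exponential <- function(n = 100, rate = 1) {
  rexp(n, rate = rate)
}

# Hypothetical name for the general function. It does three separate jobs
# by delegating each to a function passed in as an argument:
#   simulator  - generates one simulated data set
#   estimator  - computes the property of interest from a data set
#   summarizer - summarizes the distribution of the B estimates
bootstrap <- function(simulator, estimator, summarizer, B = 1000) {
  estimates <- replicate(B, estimator(simulator()))
  summarizer(estimates)
}

set.seed(42)  # only so that repeated runs give the same numbers

# Standard error of the median, Gaussian samples
se.median.gaussian <- bootstrap(sim.gaussian, median, sd)

# Interquartile range of the median, exponential samples
iqr.median.exp <- bootstrap(sim.exponential, median, IQR)

# Standard error of the mean, exponential samples
se.mean.exp <- bootstrap(sim.exponential, mean, sd)
```

Swapping in a more complicated simulator, a different estimator, or a different summary needs no change to `bootstrap` itself, which is the flexibility the modular approach is meant to buy.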
PDF handout, incorporating R examples
Posted by crshalizi at February 16, 2011 01:48