Anatomy of a Scientific Research Paper (Part 5): The Approach

Today we’ll talk about the approach.  In a typical “build a better mousetrap” paper, this is the only section which contains details about your new idea.  Yes, the only section.  Everything else is setup or evaluation or related work.

And the approach section is at most 2 pages long.  So how do we make effective use of it?  How can we share our great new idea with the world?

Well, you can usually divide the approach into 4 parts:

  1. A high-level overview / summary breakdown
  2. Data collection / key implementation details
  3. Detailed architecture and walkthrough
  4. Example of the input and output

For the first part, you’re looking for a one-paragraph general summary.  Break down how your system works from a very high level.  Now, this can be difficult.  You have had your nose in the details of your project for months or years, and you’re going to want to start in on every tiny detail and nuance.  Stop that now.  Think of telling someone how to jump-start a car.  You 1) open the hood of both cars, 2) connect the positive terminal of the dead battery to the positive terminal of the good battery, 3) connect the second cable to the negative terminal of the good battery, 4) connect the other end of that cable to the chassis of the dead car, and 5) start the dead car!

You could have said that “the crown detachment apparatus is operated to expose the electrical storage box before the X-marked-joint is established to the Electrical Carriage Line (ECL) while also established to an adjacent electrical storage box until an I-shaped-marked-joint established with the ECL is attached, or otherwise connected, to the I-shaped-marked-joint of the adjacent ESB via the metallic frame objects.  However, pressure is not applied to the controller apparatus until this point.  Furthermore, detachments may be necessary regarding controller output depending on feedback.”

But that wouldn’t make any sense at all.

Take a look at a paper Ameer Armaly, Casey Ferris, and I wrote together:

Armaly, A., Ferris, C., McMillan, C., “Reusable Execution Replay: Execution Record and Replay for Source Code Reuse”, in Proc. of 36th IEEE/ACM International Conference on Software Engineering, New Ideas and Emerging Results Track (ICSE’14 NIER), Hyderabad, India, May 31-June 7 2014.

Here’s the first paragraph-and-a-half from the approach section of that paper:

Our approach enables the reuse of functions from C and C++ programs.  Given a function to reuse, our approach works in four steps: 1) from a log file, restore the state of the program containing the function at a point just prior to the function’s execution, 2) modify any parameters or global variables as instructed by the programmer, 3) pass control to the function so that it executes, and 4) catch the function return so that the programmer can read the function’s output.

In this section, we will elaborate on each of these steps.  We will use the example in Figures 2 and 3 to illustrate how these steps work in practice.  These figures show how our approach can reuse the function nearestStar() from Section 2.1.

Now, you may have no idea about the details here, but if you understand programming, you can understand the gist of our idea.  That’s because we started off with a high-level description of the most important steps.
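
To see why a four-step overview like that is enough to convey the gist, here is a toy Python sketch of the same four steps.  It is not the implementation from the paper, which is a C/C++ library that restores real process state.  The names GLOBALS, nearest_star, reuse, and the checkpoint dictionary layout are all invented for illustration, and nearest_star is only a loose stand-in for the paper’s nearestStar().

```python
import copy

# Hypothetical program state. A real tool would restore process memory;
# this toy version keeps everything in one dictionary.
GLOBALS = {"stars": [("Sol", 0.0), ("Sirius", 8.6), ("Vega", 25.0)]}


def nearest_star(max_distance):
    """Toy function to reuse: closest star other than Sol within max_distance."""
    candidates = [s for s in GLOBALS["stars"] if 0.0 < s[1] <= max_distance]
    return min(candidates, key=lambda s: s[1])[0] if candidates else None


def reuse(func, checkpoint, overrides):
    """Run func from a saved checkpoint, following the four steps."""
    # Step 1: restore the state recorded just before the function ran.
    GLOBALS.clear()
    GLOBALS.update(copy.deepcopy(checkpoint["globals"]))
    args = dict(checkpoint["args"])
    # Step 2: modify parameters as instructed by the programmer.
    args.update(overrides)
    # Step 3: pass control to the function so that it executes.
    result = func(**args)
    # Step 4: catch the return value so the programmer can read it.
    return result


# An assumed checkpoint format: a snapshot of the globals plus the call arguments.
checkpoint = {"globals": copy.deepcopy(GLOBALS), "args": {"max_distance": 10.0}}

print(reuse(nearest_star, checkpoint, {}))
print(reuse(nearest_star, checkpoint, {"max_distance": 5.0}))
```

Notice that the sketch only needed the four numbered steps to be written.  That is the test of a good overview paragraph: a reader can build a rough mental model from it before seeing any details.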

Next you will need some preliminary details, to prepare for explaining the big details.  These preliminary details point to background sections you wrote earlier, explain how to set up your new gadgetry, and cover other important technical details.  This is also where you would mention how you actually get the input data (e.g., through some pre-processing, or from a database), what kind of programming languages you’re dealing with, etc.  Take a look at the next three paragraphs from our paper:

3.1 Supporting Technology

We have heavily modified the Jockey library [19] for our approach.  The most important modification we made was to add the ability to “go live.”  Many approaches, such as the one implemented in the GNU debugger, do not actually re-execute the logged instructions.  Instead, they log the output of each instruction and, during replay, restore the state as it was after the instruction.  This restoration produces an identical result when the logs are reviewed for debugging.  For our work in reuse, we alter the state before replay, which means that the instructions will need to be re-executed, rather than restored.  We implement a “go live” system after the state is restored, inspired by an approach described by Laadan et al. [15].

3.2 Preparation

To prepare to reuse a function, a programmer must first record a checkpoint for that function.  The checkpoint must be taken at a point just prior to the function’s execution.  We provide a recording utility based on Jockey’s checkpointing feature.  The utility takes a program and the name of a function in that program.  The utility then executes the program.  The programmer may interact with the program to ensure that some behavior is recorded, or run a test script.  The utility monitors the process — whenever the function is called, the utility directs Jockey to record the state of the program to a checkpoint file.  The function may be called several times, and there will be one checkpoint for each of these.  The programmer can choose a checkpoint that he or she prefers, otherwise the default is the first checkpoint.

3.3 Reusing Functions

We implemented our approach as a userspace C/C++ library for 32-bit Linux 2.6.10.  While implementation for different environments is possible, in this paper we limit the scope to one environment for clarity and reproducibility.  Figure 2 shows an example program using our library.  The remainder of this section will cover the steps of our approach, using this example for context.

Supporting Technology, Preparation, then gritty details about the kernel version our tool runs on and so forth.  Everything important for reproducing our approach.  Nothing extraneous.
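
The Preparation paragraph quoted above is concrete enough that a reader could sketch it, which is a good sign for reproducibility.  Here is a minimal Python sketch of the idea, assuming a checkpoint is just a copy of the call arguments.  The real utility records the whole program state through Jockey, so treat record_checkpoints and choose_checkpoint as hypothetical names made up for this illustration.

```python
import copy
import functools


def record_checkpoints(func):
    """Wrap func so every call first saves a snapshot of its arguments."""
    checkpoints = []

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # One checkpoint per call, taken just before the function executes.
        checkpoints.append(copy.deepcopy({"args": args, "kwargs": kwargs}))
        return func(*args, **kwargs)

    wrapper.checkpoints = checkpoints
    return wrapper


def choose_checkpoint(recorded, index=0):
    """Return the chosen checkpoint; the default is the first one recorded."""
    return recorded.checkpoints[index]


@record_checkpoints
def scale(values, factor):
    """Toy function standing in for the function to be reused."""
    return [v * factor for v in values]


# Running the "program" calls the function twice, so two checkpoints exist.
scale([1, 2], 3)
scale([4], 10)
print(len(scale.checkpoints))
print(choose_checkpoint(scale))
```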

At long last comes part 3 of the approach, which finally shows what your system does internally.  Check out page 3 from our paper.  Here it is for convenience:

[Image: Page 3 of ICSE NIER'14 paper]

Essentially what you want to do here is show a figure or two with either the architecture of your system, or the user interface to the system.  Mark the figures with bubbles labeled 1, 2, 3, …, for each thing that happens.  Then clarify in the text.  Walk through what happens.  At the top-right of the page above: “To make this feature available for reuse, we have provided flashback_load_scene() in our library (Figure 2, area 1).”  Use as many gritty details as you need to explain what is happening.  For example, we point out that we use libdwarf, and that breakpoints are placed in pre-defined memory locations.  Etc.

In part 4 of the approach, the example, you want to make sure that somewhere there is an explicit example of the input to the system, the output of the system, and a walkthrough of how input became output.  In our short paper above, we rolled the example into the background as a motivating example, and into the approach as Figures 2 and 3.  In other papers, you may want to create a separate section with an example if you didn’t get a chance to put it elsewhere.  Do not let your paper end without a clear example of the input and output to your system.  Do NOT let your paper end without a CLEAR example of the INPUT and OUTPUT.

And that’s how it’s done!  To recap:

  1. A high-level overview / summary breakdown
  2. Data collection / key implementation details
  3. Detailed architecture and walkthrough
  4. Example of the input and output

