Motivation

I observe that many undergraduate students get frustrated with programming and give up trying to do more than the minimum. I fear this limits their career possibilities and denies them an activity that can be a great source of pleasure and productivity. The frustration most often comes from their inability to debug their code, so this session is devoted to reviewing my own debugging techniques in the hope that you can learn to enjoy the process.

Typos

Debugging is inevitable. I began computer programming in 1969, over fifty years ago. I took a night school course in programming and then had a summer job as a programmer. One of my first big programs had a bug, and it took me many hours to track it down. I remember reading a printout of the program over and over again, unable to spot an error, until I finally realized that I had misspelled a variable name. Instead of the letter “O”, the variable name contained the digit “0”. Those were the days of dot matrix printers, on which the two characters were almost indistinguishable. Since then, I have trained myself to look for spelling mistakes.

As I help students debug their code, I notice that a very large fraction of the bugs are simply spelling mistakes: variable names that are slightly different from what they should be, for example timesteps versus timessteps. So, when you are scanning a line of code, slow down and make one of the scans a letter-by-letter check for typos.
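Typos like this are especially sneaky in R because a misspelled name on the left of an assignment does not raise an error: it silently creates a new variable, and the bug only shows up later as a wrong value. A minimal sketch using the two names above:

```r
timesteps <- 10

# Intended to update timesteps, but the misspelled name creates a new variable
timessteps <- timesteps + 1

print(timesteps)   # still 10: the update went to the wrong variable
print(timessteps)  # 11

# ls() lists the variables in the current environment; two near-identical
# names side by side are a strong hint that one of them is a typo
print(ls(pattern = "^times"))
```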

‘Google is Your Friend’

I tell students that Google is their friend, in the sense that there is a vast repository of webpages dealing with programming problems. If you get an error message, you are probably not the first person in history to experience that error. Copy and paste the error message into your browser’s search bar and let your search engine find webpages where that error is discussed. For example, consider the message R gives when you refer to a variable that does not exist: “Error: object 'a' not found”.

Often I prefix the search query with “R”, so that I don’t get taken to webpages dealing with unrelated languages or systems.

The problem here was that the variable “a” had not yet been assigned a value.
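The copy-and-paste step can also be done from R itself. The sketch below uses a hypothetical helper, searchErrorURL(), which is not part of any package: it captures an error’s text with conditionMessage(), prefixes it with “R” as suggested above, and encodes it with URLencode(). The search URL format (https://www.google.com/search?q=) is an assumption about how the search engine accepts queries.

```r
# Hypothetical helper: turn the error raised by an expression into a search URL
searchErrorURL <- function(expr) {
  msg <- tryCatch({
    expr    # evaluating the argument here triggers any error it contains
    NULL    # no error: nothing to search for
  }, error = function(e) conditionMessage(e))
  if (is.null(msg)) return(invisible(NULL))
  query <- paste("R", msg)   # prefix with "R" to stay on R-related pages
  paste0("https://www.google.com/search?q=",
         utils::URLencode(query, reserved = TRUE))
}

# Make sure "a" has not been assigned, then refer to it
if (exists("a", inherits = FALSE)) rm(a)
url <- searchErrorURL(a)
print(url)
# To open the page in your browser: utils::browseURL(url)
```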

Syntax Errors

The next most common problem is a syntax error. I do most of my programming in either R or JavaScript, so I am frequently switching from one language to the other. It always takes my brain some time to make the switch: remembering to end my lines in JavaScript with a semicolon (“;”) but in R with just a carriage return; remembering to label comment lines in JavaScript with a double slash (“//”) but in R with a hash symbol (“#”); and so on.

Fortunately, RStudio is a great Integrated Development Environment (IDE), and it flags many common syntax problems for you right in the editor.
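Outside RStudio you can get a similar check from base R: parse() reads code without running it and stops with an error if the syntax is invalid. A minimal sketch; checkSyntax() is a hypothetical helper name, and the JavaScript-style comment is just an example of the kind of slip described above.

```r
# Hypothetical helper: TRUE if the code parses, otherwise the parser's message
checkSyntax <- function(code) {
  tryCatch({
    parse(text = code)   # parse only; nothing is executed
    TRUE
  }, error = function(e) conditionMessage(e))
}

checkSyntax("x <- c(1, 2, 3)")                # TRUE
checkSyntax("x <- c(1, 2, 3")                 # the parser's message (missing parenthesis)
checkSyntax("// a JavaScript-style comment")  # the parser's message (R comments use #)
```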

A more common syntax problem comes from using functions from packages. The R documentation for these functions can be hard to follow, so I typically search for examples either in the documentation or on the web. Look for simple examples and then add complexity as you need it. For advanced packages like the tidyverse, I recommend working through tutorials first, because they are almost like a language in themselves.
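Base R has tools for exactly this search: ?paste opens a function’s help page, args() shows just its arguments and defaults, and example() runs the examples from the bottom of its help page. A short sketch using base R’s paste(); the commented lines show the same calls for a package function and assume stringr is installed.

```r
# Show just the argument list: a quick reminder of names and defaults
args(paste)

# Run the examples from the help page, which usually start simple
example("paste")

# The same works for functions in installed packages, for example:
# args(stringr::str_c)
# example("str_c", package = "stringr")
```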

Put Your Code into Functions

You can do a lot in R using just the console window, but to save your commands you will learn to write them in a script file, which can be saved, and to execute selected lines from the script. It is therefore tempting to add code to your script file as though you were typing it directly into the console. But since no two R sessions are alike, you are unlikely ever to want to execute such a script in its entirety again. A better style is to break your code up into functions. The only code you should place outside a function is code you expect to need in every session within a given project, such as library() calls. You can then run the whole script at the start of every session for that project.

With that in mind, design your script to be used more than once and to be executed in its entirety. The scripts you created for these tutorial sessions may be bad examples because they are not intended to be executed more than once.

For example, in Introduction to Databases we had you add the following to your script file and execute it.

# Open an SQLite connection using the filename shown
conn <- dbConnect(RSQLite::SQLite(),"data/sandboxdata.db")
# First delete the table if it already exists 
if (dbExistsTable(conn,"mtcars")) dbRemoveTable(conn,"mtcars")
# Create a blank table called "mtcars" using the columns of the mtcars data frame.
dbCreateTable(conn,"mtcars",mtcars)
# Now append the data from mtcars into the database table
dbAppendTable(conn,"mtcars",mtcars)
# Close the connection.
isclosed <- dbDisconnect(conn) # dbDisconnect() returns TRUE if successful

# Open an SQLite connection using the filename shown
conn <- dbConnect(RSQLite::SQLite(),"data/sandboxdata.db")
# List all the tables in the database
dbListTables(conn)
isclosed <- dbDisconnect(conn) # dbDisconnect() returns TRUE if successful

If the script is valuable and will be reused, then a better style would be the following:

library(DBI)

createMTCARSTable <- function(){
  # Open an SQLite connection using the filename shown
  conn <- dbConnect(RSQLite::SQLite(),"data/sandboxdata.db")
  # First delete the table if it already exists 
  if (dbExistsTable(conn,"mtcars")) dbRemoveTable(conn,"mtcars")
  # Create a blank table called "mtcars" using the columns of the mtcars data frame.
  dbCreateTable(conn,"mtcars",mtcars)
  # Now append the data from mtcars into the database table
  dbAppendTable(conn,"mtcars",mtcars)
  # Close the connection.
  isclosed <- dbDisconnect(conn) # dbDisconnect() returns TRUE if successful
}

checkTables <- function(){
  # Open an SQLite connection using the filename shown
  conn <- dbConnect(RSQLite::SQLite(),"data/sandboxdata.db")
  # List all the tables in the database
  tables <- dbListTables(conn)
  isclosed <- dbDisconnect(conn) # dbDisconnect() returns TRUE if successful
  # Return the table names: inside a function, dbListTables() does not print by itself
  tables
}

testMTCARS <- function(){
  createMTCARSTable()
  print(checkTables())
}

This whole script can be safely executed without making any changes to the database, because it only defines functions. If we do indeed want to create the mtcars table in a future session, we can execute the whole script and then enter testMTCARS() in the console window.
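You can run a whole script with RStudio’s Source button or with source() from the console. To keep the sketch below self-contained, it writes a tiny hypothetical script to a temporary file and sources it, showing that a script made of functions defines things but does no work until you call them. The function name summariseCars and the temporary file are illustrative only.

```r
# A tiny script in the recommended style: shared setup at the top, work in functions
scriptLines <- c(
  "library(stats)                # setup that every session needs",
  "summariseCars <- function() { # nothing runs until this is called",
  "  summary(mtcars$mpg)",
  "}"
)
scriptPath <- tempfile(fileext = ".R")
writeLines(scriptLines, scriptPath)

# Executing the whole script only defines the function
source(scriptPath)
print(exists("summariseCars"))

# The actual work happens when you ask for it, just like testMTCARS()
print(summariseCars())
```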

Develop Your Functions in Steps, Testing as You Go

The beauty of developing code in RStudio is that you can examine variables and test your code as you write, so you can build your code incrementally. As an example, let’s create a function that queries our database and returns a data frame of big airports. The function will accept an argument numairlines and return all airports having at least that many airlines.

Start with the following code in a fresh script in your sandbox project.

library(DBI)
library(RSQLite)
library(stringr)

getBigAirports <- function(numairlines){
  query <- str_c("SELECT * FROM airports WHERE Airlines >=",numairlines)
}

testBigAirports <- function(){
  numairlines <- 30
  result <- getBigAirports(numairlines)
  print(result)
}

We create the test function at the same time as we start to develop the function of interest. Note that the test function uses a variable named numairlines that matches the argument name of the function. That means we can initialize the argument variable and then step into the function. To see what we mean, select the line numairlines <- 30 and execute it (press Ctrl+Enter). That initializes the variable numairlines.

Now move into the getBigAirports function, select the str_c() expression and execute it to see what it produces. Because numairlines is 30, the console shows the string “SELECT * FROM airports WHERE Airlines >=30”.

That looks like a valid SQL query so we can proceed to add more code.

library(DBI)
library(RSQLite)
library(stringr)

getBigAirports <- function(numairlines){
  query <- str_c("SELECT * FROM airports WHERE Airlines >=",numairlines)
  conn <- dbConnect(RSQLite::SQLite(),"data/sandboxdata.db")
  result <- dbGetQuery(conn,query)
  dbDisconnect(conn)
  result
}

testBigAirports <- function(){
  numairlines <- 30
  result <- getBigAirports(numairlines)
  print(result)
}

After entering this code, we can again select whatever lines we want and execute them, then type result in the console to check that the query returned what we expected.

Now that we have seen that the code within the function works as expected, we can select the entire script and execute it. That will define all the functions. Finally, enter testBigAirports() in the console to verify that the function getBigAirports() works as expected.

To recap, you can build your code incrementally, examining variables and testing your code as you write.
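If you want to try the same build-and-test cycle without data/sandboxdata.db, one option is an in-memory SQLite database. This is a sketch under several assumptions, not the session’s own code: the toy airports table and its rows are made up, getBigAirportsFrom() is a hypothetical variant that takes the connection as an argument so it can be pointed at any database, paste0() stands in for str_c(), and the test function checks its result with stopifnot() rather than relying on you reading the printout.

```r
library(DBI)
library(RSQLite)

# Made-up data for illustration only: three airports and their airline counts
makeToyAirportsDB <- function() {
  conn <- dbConnect(RSQLite::SQLite(), ":memory:")
  dbWriteTable(conn, "airports",
               data.frame(Airport = c("AAA", "BBB", "CCC"),
                          Airlines = c(45L, 12L, 31L)))
  conn
}

# Variant of getBigAirports() that receives an open connection
getBigAirportsFrom <- function(conn, numairlines) {
  query <- paste0("SELECT * FROM airports WHERE Airlines >=", numairlines)
  dbGetQuery(conn, query)
}

testBigAirportsToy <- function() {
  conn <- makeToyAirportsDB()
  on.exit(dbDisconnect(conn))  # close the connection however the function exits
  result <- getBigAirportsFrom(conn, 30)
  print(result)
  # Stop with an error if the result is not what the toy data should give
  stopifnot(identical(sort(result$Airport), c("AAA", "CCC")))
  invisible(result)
}

testBigAirportsToy()
```

Passing the connection in, instead of opening it inside the function, is what makes the function easy to test against a throwaway database.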

Tracking Down a Bug

When your project grows larger and the code becomes more complex (functions calling functions), it can be a challenge to find the source of the error when the program crashes. Then you must become a detective and proceed methodically. For example, suppose we replace the test function as follows:

testBigAirports <- function(){
  numairlines <- 'aa'
  result <- getBigAirports(numairlines)
  print(result)
}

Execute this and then enter the command testBigAirports() in the console. This results in an error with the message “no such column: aa”.

There is no indication of where the error occurred in your code. You could explore the Show Traceback feature of RStudio, but in some circumstances even that is not available.

The important thing is to find what line of code caused the error to surface. Here is where I find students reading their code over and over trying to guess what line is wrong. A more reliable method is simply to insert numbered print() statements throughout the code. The bad line of code must be somewhere after the last successful print statement. Here is what your debugging code might look like:

library(DBI)
library(RSQLite)
library(stringr)

getBigAirports <- function(numairlines){
  print("Working 02")
  query <- str_c("SELECT * FROM airports WHERE Airlines >=",numairlines)
  conn <- dbConnect(RSQLite::SQLite(),"data/sandboxdata.db")
  print("Working 03")
  result <- dbGetQuery(conn,query)
  print("Working 04")
  dbDisconnect(conn)
  print("Working 05")
  result
}

testBigAirports <- function(){
  print("Working 01")
  numairlines <- 'aa'
  result <- getBigAirports(numairlines)
  print(result)
}

If you execute this and enter the command testBigAirports(), you will see “Working 01”, “Working 02” and “Working 03” printed, followed by the error message.

We see that the program successfully printed “Working 03” but failed to print “Working 04”. We conclude that the offending line is result <- dbGetQuery(conn,query). Identifying the line of code that caused the crash is half the battle of fixing the bug. If you cannot spot an obvious spelling mistake or syntax error, then the next step would be to print out the values of critical variables just before the error. For example,

library(DBI)
library(RSQLite)
library(stringr)

getBigAirports <- function(numairlines){
  query <- str_c("SELECT * FROM airports WHERE Airlines >=",numairlines)
  conn <- dbConnect(RSQLite::SQLite(),"data/sandboxdata.db")
  print(query)
  result <- dbGetQuery(conn,query)
  dbDisconnect(conn)
  result
}

testBigAirports <- function(){
  numairlines <- 'aa'
  result <- getBigAirports(numairlines)
  print(result)
}

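Execute this and enter the command testBigAirports() to see the value of the variable query just before the crash: “SELECT * FROM airports WHERE Airlines >=aa”.

Now we see that the query is nonsensical (“Airlines >=aa”). Because aa is not in quotes, SQLite reads it as the name of a column, which explains the message “no such column: aa”. We can trace the error back to the statement numairlines <- 'aa' in the test function.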

This was a simple example, but the principle illustrated is to add print() statements until you locate the offending line of code.
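Once you have more than a few checkpoints, a small helper saves typing and lets you switch them all off at once. This is a sketch with hypothetical names (DEBUG, checkpoint(), buildQuery()). It swaps print() for message(), because messages can be silenced with suppressMessages() and do not mix with a function’s printed results. buildQuery() is just the query-building part of getBigAirports(), with base R’s paste0() in place of str_c().

```r
DEBUG <- TRUE   # set to FALSE to silence every checkpoint

# Print a numbered checkpoint and, optionally, the value of a variable
checkpoint <- function(label, value = NULL) {
  if (DEBUG) {
    if (is.null(value)) {
      message("Working ", label)
    } else {
      message("Working ", label, ": ", paste(format(value), collapse = " "))
    }
  }
  invisible(value)
}

buildQuery <- function(numairlines) {
  checkpoint("01", numairlines)
  query <- paste0("SELECT * FROM airports WHERE Airlines >=", numairlines)
  checkpoint("02", query)
  query
}

query <- buildQuery("aa")
# Messages shown:
# Working 01: aa
# Working 02: SELECT * FROM airports WHERE Airlines >=aa
```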

Add Error Handling Code

If someone else is going to run your code, you can expect to get an e-mail from them someday saying something like “Your program crashed with an error ‘no such column: aa’. Can you fix the code and send it to us?” In that case you will wish you had written code like the following:

library(DBI)
library(RSQLite)
library(stringr)

getBigAirports <- function(numairlines){
  conn <- NULL  # defined up front so the finally block can always test it
  tryCatch({
    query <- str_c("SELECT * FROM airports WHERE Airlines >=",numairlines)
    conn <- dbConnect(RSQLite::SQLite(),"data/sandboxdata.db")
    dbGetQuery(conn,query)  # value returned when everything succeeds
  }, warning = function(w) {
    print(str_c("getBigAirports WARNING:",w$message))
    NULL                    # value returned after a warning
  }, error = function(e) {
    print(str_c("getBigAirports ERROR:",e$message))
    NULL                    # value returned after an error
  }, finally = {
    # Runs whether or not there was a problem: close the connection if it was opened
    if (!is.null(conn)) dbDisconnect(conn)
  })
}

testBigAirports <- function(){
  numairlines <- 'aa'
  result <- getBigAirports(numairlines)
  print(result)
}

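If you enter the command testBigAirports() after executing this code, the error handler prints “getBigAirports ERROR:no such column: aa” and the program carries on instead of crashing.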

Now when someone e-mails you that something is wrong with your program, they can quote this message (“getBigAirports ERROR:no such column: aa”), and you will at least know which function (“getBigAirports”) experienced the error.

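The tryCatch() function in R is a way to include error handling in your functions. It evaluates its first argument; if a warning or an error is signalled, it calls the matching handler instead, and it runs the finally block in either case.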
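The detail that most often trips people up is where the result of tryCatch() comes from: it is the value of the expression if that succeeds, otherwise the value returned by whichever handler ran, while the value of the finally block is discarded. The sketch below shows this with base R’s log() in place of a database call; safeLog() is a hypothetical name.

```r
safeLog <- function(x) {
  tryCatch(
    log(x),                                # normal result
    warning = function(w) {
      message("safeLog WARNING: ", conditionMessage(w))
      NA                                   # returned after a warning
    },
    error = function(e) {
      message("safeLog ERROR: ", conditionMessage(e))
      NA                                   # returned after an error
    },
    finally = message("safeLog finished")  # always runs; its value is ignored
  )
}

safeLog(10)    # 2.302585
safeLog(-1)    # warning "NaNs produced", returns NA
safeLog("a")   # error "non-numeric argument to mathematical function", returns NA
```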

Summary

Many students give up on programming simply because they lack debugging skill
Many bugs are caused by typos so learn to scan a line of code character by character
Working in an IDE like RStudio means that the editor will detect many syntax errors for you
Google is your friend: use your search engine to find help
Build your project by encapsulating your code in functions
Develop your functions incrementally, checking the values of variables as you go
When your program crashes, concentrate your effort on finding the line where it crashed
Insert debugging print() statements to track down the offending line of code
Use error handling procedures in code that could fail depending on user input

Debugging in R

Peter Jackson

2021/8/28