Debugging

Learning objectives:

  • Overall: Find and fix errors.
  • Locate exactly where an error occurred with the traceback() function.
  • Pause the execution of a function and launch an environment to interactively explore what’s happening.
  • Debug non-interactively executed code.
  • Recognize other (non-error) problems that occasionally also need debugging.

Types of errors

  • Parsing errors: you are missing a closing ), } or " so the code cannot even be parsed into an AST (abstract syntax tree)
    • Usually flagged by IDEs such as RStudio and Positron with red error markers
  • Syntax errors:
    • Parentheses are matched, but one of them closes in the wrong place
    • The name of a function argument is misspelled
    • More arguments are passed to a function than it accepts
  • Runtime errors: the code is syntactically correct, but it fails or misbehaves in your actual environment with your actual data
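A minimal sketch of each kind of error, run through tryCatch() so the script keeps going. area() is a hypothetical helper. Note that a misplaced parenthesis does not always produce an error at all:

```r
# Parsing error: the closing ) is missing, so parse() cannot build an AST
parse_err <- tryCatch(
  parse(text = "mean(c(1, 2)"),
  error = function(e) conditionMessage(e)
)

# Parentheses matched but one closes in the wrong place: na.rm = TRUE
# becomes an extra element of c() instead of an argument of mean()
wrong <- mean(c(1, NA, na.rm = TRUE))  # NA, with no error at all
right <- mean(c(1, NA), na.rm = TRUE)  # 1

# Hypothetical helper used for the two argument errors below
area <- function(width, height) width * height

# Misspelled argument name: parses fine, fails when called
typo_err <- tryCatch(area(width = 2, heigth = 3),
                     error = function(e) conditionMessage(e))

# More arguments than the function accepts
extra_err <- tryCatch(area(2, 3, 4),
                      error = function(e) conditionMessage(e))

# Runtime error: valid code, but the data are not what you assumed
runtime_err <- tryCatch("10" + 1, error = function(e) conditionMessage(e))
```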

We want to subset our closures!!!

And then of course there’s this classic:

Explanation: df is a closure, in this case the function stats::df(), the density of the F distribution. (It is not a data frame, as you thought it was, ha!) It does not have elements that you could subset.
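To reproduce the classic, assuming no object called df has been created in the session:

```r
# With no `df` of your own, the name finds stats::df(),
# the density function of the F distribution
is.function(df)  # TRUE

# Treating it like a data frame triggers the classic error
msg <- tryCatch(df$x, error = function(e) conditionMessage(e))
msg
```

Giving your own data frames more specific names (a hypothetical sales_df, say) avoids the clash with stats::df().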

Strategies for finding and fixing errors

Finding your bug is a process of confirming the many things that you believe are true — until you find one which is not true.

—Norm Matloff (emphasis added)

Debugging is like being the detective in a crime movie where you’re also the murderer.

-Filipe Fortes

Google first!

  • Google unfamiliar error messages to help translate them into plain language
  • Packages that help:

Repeatable bugs are more debuggable

  • You may need to execute the code many times
  • Up-front investment to make a minimal {reprex} pays off
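A bug that only shows up for some random inputs becomes repeatable once the seed is fixed, and dput() prints data as R code you can paste into a reprex. A minimal sketch (the seed value is arbitrary):

```r
# Fix the seed so "random" inputs are the same on every run
set.seed(2024)
x1 <- sample(1:100, 5)
set.seed(2024)
x2 <- sample(1:100, 5)
identical(x1, x2)  # TRUE: same data, so the same bug, every time

# dput() prints an object as R code that can be pasted into a reprex
dput(head(mtcars[, c("mpg", "cyl")], 3))
```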

Use the scientific method to find bugs

  • Generate a hypothesis.
  • Design experiments to test that hypothesis.
  • Record your results.

Being systematic often saves time in the end.

Automated tests help

  • Add tests for nearby working code to avoid new bugs.
  • Add tests for specific broken cases.
  • Add tests for hypotheses.
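A minimal sketch of the three kinds of test, written with base R's stopifnot() and a hypothetical prop_true() function; inside a package you would more likely use {testthat}:

```r
# Hypothetical function under test: share of TRUE values in x
prop_true <- function(x) sum(x) / length(x)

# Nearby working code: lock in behaviour that already works
stopifnot(prop_true(c(TRUE, FALSE, TRUE, TRUE)) == 0.75)

# Specific broken case: empty input silently gives NaN (0 / 0);
# this pins down the current behaviour until you decide how to fix it
stopifnot(is.nan(prop_true(logical(0))))

# Hypothesis: "missing values propagate rather than being dropped"
stopifnot(is.na(prop_true(c(TRUE, NA))))
```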

Unit tests are generally associated with R package development; see the chapters on testing in “R Packages”.

Automated testing sometimes brings a dilemma: you add something to a working function, the new code passes its new tests, but an old test breaks. This may not be an error per se, but it reveals a gap in your knowledge of and assumptions about the code (see Norm Matloff’s quote above), and you have to deal with that one way or another.

Common syntax errors


  • Parenthesis mismatches

  • [[...]] vs. [...]

  • == vs. =

  • Comparing floating-point numbers exactly with == after calculations that produce non-integers

  • You expect a single value, but your code gives you a vector

    • You may need identical(), all() or any(), or a more carefully formed condition
  • Type coercion and dropping dimensions
inherits(mtcars[,1], "data.frame")
#> [1] FALSE
inherits(mtcars[,1, drop=FALSE], "data.frame")
#> [1] TRUE
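A few of these pitfalls sketched in base R (the `if ()` error mentioned in the comment applies to R 4.2.0 and later; earlier versions only warned):

```r
# Floating-point numbers: exact comparison after arithmetic is fragile
0.1 + 0.2 == 0.3                   # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE: compares with a tolerance

# A vector where a single TRUE/FALSE is expected
x <- c(1, 5, 10)
# if (x > 2) ... is an error in R >= 4.2.0: "the condition has length > 1"
if (all(x > 2)) "all large" else "not all large"  # "not all large"
if (any(x > 2)) "some large" else "none large"    # "some large"
identical(x, c(1, 5, 10))                         # TRUE: always a single value

# Type coercion: one character element makes the whole vector character
y <- c(1, 2, "3")
class(y)             # "character"
sum(as.numeric(y))   # 6
```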

Use traceback() to locate the error

Example: Chained functions

traceback() shows the call stack that led to the error

In RStudio, you can also click “Show Traceback” next to the error message:

Read it from bottom to top: the first call is at the bottom, and the call where the error occurred is at the top.
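The example code itself is not reproduced here, so the following assumes a chain in the style of Advanced R's example, where f() calls g(), which calls h(), which calls i(). Besides running traceback() interactively, a rough non-interactive sketch is to record sys.calls() from a calling handler at the moment the error is signalled:

```r
f <- function(a) g(a)
g <- function(b) h(b)
h <- function(c) i(c)
i <- function(d) {
  if (!is.numeric(d)) {
    stop("`d` must be numeric", call. = FALSE)
  }
  d + 10
}

# Interactively you would run f("a") and then traceback().
# Here: record the active calls at the moment the error is signalled.
err_calls <- NULL
try(
  withCallingHandlers(
    f("a"),
    error = function(e) err_calls <<- sys.calls()
  ),
  silent = TRUE
)

# One string per call, outermost first (traceback() prints innermost
# first); keep only our own functions, dropping try() and handler frames
calls_txt <- vapply(as.list(err_calls), function(cl) deparse(cl)[1], character(1))
calls_txt[calls_txt %in% c('f("a")', "g(a)", "h(b)", "i(c)")]
```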

traceback() is confusing with lazy evaluation
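A sketch of why, using an f() → g() → h() → i() chain in the style of Advanced R (redefined so the block runs on its own) plus two hypothetical helpers, j() and k(). The argument j() is only evaluated when i() first uses d, so j() and k() appear below i(c) in the call stack, not next to f():

```r
f <- function(a) g(a)
g <- function(b) h(b)
h <- function(c) i(c)
i <- function(d) {
  if (!is.numeric(d)) stop("`d` must be numeric", call. = FALSE)
  d + 10
}
# Hypothetical helpers: the error actually comes from k()
j <- function() k()
k <- function() stop("Oops!", call. = FALSE)

lazy_calls <- NULL
try(
  withCallingHandlers(
    f(j()),
    error = function(e) lazy_calls <<- sys.calls()
  ),
  silent = TRUE
)
lazy_txt <- vapply(as.list(lazy_calls), function(cl) deparse(cl)[1], character(1))

# j() is forced inside i(), so it sits deeper in the stack than i(c)
# even though it was written as an argument to f()
match("i(c)", lazy_txt) < match("j()", lazy_txt)  # TRUE
```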

traceback() limits

In the real world, a traceback may be 25 layers deep:

  • Your code is the first 2-3 layers.
  • The last 5 layers are clearly formatting the error and safely returning appropriate objects.
  • The middle 15 layers are classes and methods from dplyr, rlang, purrr, base and maybe packages you have never heard of, passing very opaque things to one another; your informatively named objects are hidden somewhere in ..., .x and envir.
  • … and you are still left wondering what’s going on.

rlang::global_handle() in .Rprofile makes traceback better
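A minimal sketch of the .Rprofile entry, assuming a recent {rlang} that provides global_handle(). Wrapping it in interactive() and requireNamespace() keeps scripts, and machines without rlang, unaffected. After an error, rlang::last_trace() prints the full backtrace as a tree:

```r
# In your user-level .Rprofile (e.g. ~/.Rprofile)
if (interactive() && requireNamespace("rlang", quietly = TRUE)) {
  # Install rlang's global handlers; among other things, base R errors
  # then record a backtrace you can inspect with rlang::last_trace()
  rlang::global_handle()
}
```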

Interactive debugger

RStudio has tools for debugging

  • Click “Rerun with Debug” in error message
  • Enable Debug > On Error > Break in Code to always jump to error

Use browser() to set a break point in code

browser() can be conditional

A better practice is to define your own debugging flag, which can be turned off or removed in production code:

if (exists("my_debugging_flag")) browser()
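A sketch of a conditional breakpoint inside a loop, assuming a hypothetical running_mean() function and the my_debugging_flag convention above: the browser opens only on the suspicious iteration, and only if the flag has been defined.

```r
# Hypothetical function: running (cumulative) mean of x
running_mean <- function(x) {
  out <- numeric(length(x))
  total <- 0
  for (i in seq_along(x)) {
    # Pause only on the suspicious iteration, and only when the
    # debugging flag exists (e.g. after my_debugging_flag <- TRUE)
    if (is.na(x[i]) && exists("my_debugging_flag")) browser()
    total <- total + x[i]
    out[i] <- total / i
  }
  out
}

running_mean(c(1, 2, NA, 4))  # 1.0 1.5 NA NA, without pausing
```

Running `my_debugging_flag <- TRUE` first makes the same call stop at the third iteration in an interactive session; `rm(my_debugging_flag)` switches it off again.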

browser() provides special commands

These commands work in the Console; RStudio also makes the toolbar buttons available.

  • Next (n): Execute the next step
  • Step into (s): Like n, but if the next step is a function call, step into that function
  • Finish (f): Finish execution of the current loop/function
  • Continue (c): Continue regular execution of the function (leave interactive)
  • Stop (Q): Stop debugging, terminate the function, and return to the global workspace
  • where: Print the stack trace of active calls

browser() practical tips

  • RStudio does its best to show the code that is being executed in the “Source” pane:
    • deparsed code for functions from other packages (no comments, no original indentation);
    • the actual source files of your own package when you are working in the package project.
  • This means you can select code and press Ctrl+Enter to execute it
    • if in doubt about the source of the error, do that rather than pressing n (“Next”)
    • especially if the computations leading to this point in your code are costly in terms of time

Set breakpoints in RStudio for virtual browser()

Activate:

  • Click to left of line number, or
  • Press Shift + F9

Downsides:

Use options(error = recover) for interactive debugging prompt

Turn off with options(error = NULL)
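To switch it on temporarily, a common idiom is to keep the value returned by options(), which holds the previous settings, and restore it afterwards:

```r
# Switch on the recover() prompt, keeping the previous setting
old <- options(error = utils::recover)

# ... run the failing code here; on error, R lists the frames
# and lets you choose one to browse ...

# Restore the previous setting (NULL by default)
options(old)
```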

Use debug(fn_name) to insert browser() in the first line of fn_name()

  • undebug(fn_name) to remove it
  • debugonce(fn_name) to do it once (similar to rerun with debug)
  • utils::setBreakpoint("file_name", line_number) to set a breakpoint at a given line of a sourced file
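A quick way to confirm which functions are currently flagged is base R's isdebugged(); sq() below is a hypothetical function:

```r
# Hypothetical function to debug
sq <- function(x) x^2

debug(sq)               # every call to sq() now opens the browser
flag_on <- isdebugged(sq)
undebug(sq)             # back to normal
flag_off <- isdebugged(sq)

c(flag_on, flag_off)    # TRUE FALSE
```

debugonce(sq) would instead flag only the next call, after which the flag clears automatically.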

Call stacks printed by traceback(), browser() & where, and recover() are not consistent

  • RStudio displays calls in the same order as traceback()
  • {rlang} functions use the same ordering & numbering as recover()
    • Also indent to reinforce hierarchy

Non-interactive debugging

Use callr::r() or a fresh session to look for differences

callr::r(f, list(1, 2)) calls f(1, 2) in a fresh session. Things that often differ between your session and a fresh one:

  • Global env
  • Packages
  • Objects
  • Working directory
  • PATH environment variable
  • R_LIBS environment variable
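A minimal sketch, assuming {callr} is installed (the calls are skipped otherwise): the same question gets different answers depending on which session answers it.

```r
# Only exists in the current session's global environment
x_global <- 42

if (requireNamespace("callr", quietly = TRUE)) {
  # callr::r(f, list(1, 2)) runs f(1, 2) in a fresh R process
  three <- callr::r(function(a, b) a + b, list(1, 2))

  # The fresh session does not see objects from this global environment
  seen_here  <- exists("x_global")                       # TRUE
  seen_fresh <- callr::r(function() exists("x_global"))  # FALSE

  # Packages attached here but not in a fresh session
  extra_pkgs <- setdiff(search(), callr::r(function() search()))
}
```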

dump.frames() is the equivalent of recover() for non-interactive code.
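A sketch of the pattern Advanced R describes for batch scripts, assuming the script is run non-interactively (e.g. with Rscript): on error, save every frame to last.dump.rda and quit with a failing status; afterwards, load the file in an interactive session and browse the frames with debugger(). The name dump_and_quit is only illustrative.

```r
# Near the top of a script run non-interactively (e.g. with Rscript)
dump_and_quit <- function() {
  # Save all frames of the call stack to last.dump.rda
  dump.frames(to.file = TRUE)
  # Quit R with a non-zero status so the failure is visible
  q(status = 1)
}
options(error = dump_and_quit)

# Later, in an interactive session in the same working directory:
# load("last.dump.rda")
# debugger()
```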

Debugging within packages

Recall Chapter 8 on classed conditions:

  • you can create richer, structured condition objects with
rlang::abort(
  message = "cli-formatted-message",
  class   = "your-special-class-to-distinguish-from-other-errors",
  ...
)
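  • you can pass your situation report in ... so that error handlers and/or reporters can peek more effectively into the environment where the error occurred
  • you can test for the class with testthat::expect_error(code, class = "your-class"), or catch the condition and check it with testthat::expect_s3_class()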
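A minimal sketch of the same idea using base R's errorCondition(), swapped in for rlang::abort() so the block has no dependencies; with {rlang} you would pass the class and the extra fields to abort() instead. check_positive() and the class name are hypothetical.

```r
# Hypothetical validator that signals a classed error carrying extra data
check_positive <- function(x) {
  bad <- x[x <= 0]
  if (length(bad) > 0) {
    stop(errorCondition(
      "`x` must be positive",
      class = "mypkg_error_nonpositive",  # hypothetical class name
      bad_values = bad,                   # "situation report" for handlers
      call = sys.call()
    ))
  }
  invisible(x)
}

# Catch the condition object and inspect it
cnd <- tryCatch(check_positive(c(3, -1, 0)), error = function(e) e)
class(cnd)       # "mypkg_error_nonpositive" "error" "condition"
cnd$bad_values   # -1 0

# A handler can target the class and let other errors pass through
handled <- tryCatch(
  check_positive(-5),
  mypkg_error_nonpositive = function(e) paste("handled:", conditionMessage(e))
)
```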

Debugging Rmarkdown/Quarto has challenges

  • Call rmarkdown::render("path/to/file.Rmd") instead of IDE knitting.
    • downside: the content of your current environment propagates into the markdown code
  • Use sink() for tricky error handling when the knitting process captures the output
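A sketch of such a handler, following the approach in Advanced R and assuming the render is capturing output through sink(): ending the diversion first lets recover()'s menu reach the console.

```r
# Set before calling rmarkdown::render(); end the output diversion
# before recover() so its menu appears in the console
options(error = function() {
  sink()
  utils::recover()
})
```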

See also “Markdown test drive” in Jenny Bryan’s Happy Git With R book.

Functions can fail without errors

  • Unexpected warning: options(warn = 2) turns warnings into errors.
  • Unexpected message: Proposed solution in book no longer available.
  • Function might never return. Terminate & traceback().
  • Crashed R = bug in compiled (C, C++, etc) code.
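Minimal sketches of the first two cases: options(warn = 2) for warnings, and, since the book's solution for messages no longer works, a hypothetical message2error() helper built on base R's withCallingHandlers(). f() and g() are hypothetical as well.

```r
# Unexpected warning: with warn = 2, warnings become errors, so execution
# stops where they happen and traceback() can locate them
old <- options(warn = 2)
w <- tryCatch(as.integer("x"), error = function(e) conditionMessage(e))
options(old)
w  # an error message rather than NA

# Unexpected message: turn every message into an error
f <- function() g()
g <- function() {
  message("Hi!")
  "done"
}
message2error <- function(code) {
  withCallingHandlers(
    code,
    message = function(m) {
      stop("unexpected message: ", conditionMessage(m), call. = FALSE)
    }
  )
}
m <- tryCatch(message2error(f()), error = function(e) conditionMessage(e))
m
```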

Some useful resources on debugging