Skip to contents

Motivation

A probability distribution is fully characterized by any one of its standard functions: the PDF ff, CDF FF, survival function S=1FS = 1 - F, quantile function Q=F1Q = F^{-1}, or hazard function h=f/Sh = f / S. In R, base distribution families provide all of these (e.g. dnorm, pnorm, qnorm), but user-defined distributions or distributions from other packages may only supply one or two.

ggfunction lets you pass whichever function type you have to whichever geom you need. For example, if you have a PDF but want to plot the CDF, you can write geom_cdf(pdf_fun = my_pdf) and the conversion happens automatically. This vignette explains which conversions are available, how they work internally, and what accuracy to expect.

Available conversions

Each probability geom accepts its native function type via fun and one or more alternate types via named parameters. The table below summarizes the routes:

Geom Native (fun) Alternates
geom_pdf() PDF cdf_fun (differentiate), survival_fun (1S1 - S then differentiate), qf_fun (interpolate then differentiate)
geom_cdf() CDF pdf_fun (integrate), survival_fun (1S1 - S), qf_fun (interpolate)
geom_survival() Survival cdf_fun (1F1 - F), pdf_fun (integrate, then 1F1 - F), qf_fun (interpolate, then 1F1 - F)
geom_qf() Quantile cdf_fun (root-find), pdf_fun (integrate, then root-find), survival_fun (1S1 - S, then root-find)
geom_hf() Hazard pdf_fun + cdf_fun (ratio, or just one), survival_fun (1S1 - S then differentiate), qf_fun (interpolate then differentiate)
geom_cdf_discrete() CDF pmf_fun (cumulative sum), survival_fun (1S1 - S)
geom_survival_discrete() Survival cdf_fun (1F1 - F), pmf_fun (cumsum, then 1F1 - F)
geom_qf_discrete() Quantile cdf_fun (invert on support), pmf_fun (cumsum, then invert), survival_fun (1S1 - S, then invert)

Exactly one function source must be provided; supplying more than one (e.g. both fun and pdf_fun) is an error.

Example: one PDF, four plots

Suppose you only have the PDF of a Gamma(2, 1) distribution:

my_pdf <- function(x, shape, rate) dgamma(x, shape = shape, rate = rate)
gamma_args <- list(shape = 2, rate = 1)

You can plot the PDF, CDF, survival function, and quantile function without ever writing a CDF or quantile function yourself:

p1 <- ggplot() +
  geom_pdf(fun = my_pdf, xlim = c(0, 8), args = gamma_args) +
  ggtitle("PDF (native)")

p2 <- ggplot() +
  geom_cdf(pdf_fun = my_pdf, xlim = c(0, 8), args = gamma_args) +
  ggtitle("CDF (from PDF)")

p3 <- ggplot() +
  geom_survival(pdf_fun = my_pdf, xlim = c(0, 8), args = gamma_args) +
  ggtitle("Survival (from PDF)")

p4 <- ggplot() +
  geom_qf(pdf_fun = my_pdf, args = gamma_args) +
  ggtitle("Quantile (from PDF)")

(p1 | p2) / (p3 | p4)

How the conversions work

Three internal helper functions perform the numerical conversions. They are not exported, but understanding them helps set expectations for accuracy and performance.

PDF to CDF: numerical integration

Given a PDF ff, the CDF is

F(x)=xf(t)dt.F(x) = \int_{-\infty}^{x} f(t)\, dt.

Internally, pdf_to_cdf() evaluates this integral at each point using R’s stats::integrate(), which applies adaptive Gauss–Kronrod quadrature:

pdf_to_cdf <- function(pdf_fun, lower = -Inf) {
  function(x) {
    vapply(x, function(xi) {
      res <- try(
        stats::integrate(pdf_fun, lower = lower, upper = xi,
                         stop.on.error = FALSE),
        silent = TRUE
      )
      if (inherits(res, "try-error")) NA_real_ else res$value
    }, numeric(1))
  }
}

Each evaluation point requires a separate call to integrate(), so this is the most computationally expensive conversion. For distributions with known finite lower bounds (e.g. the exponential), setting lower = 0 avoids integrating over an infinite domain and improves both speed and accuracy.

Accuracy. For smooth, well-behaved densities the absolute error is typically below 10610^{-6}, well within plotting resolution. Densities with singularities (e.g. the Beta(0.5, 0.5) density, which diverges at 0 and 1) may produce larger errors near the singular points.

CDF to PDF: central finite differences

Given a CDF FF, the PDF is its derivative f(x)=F(x)f(x) = F'(x). Internally, cdf_to_pdf() approximates this with a central difference:

f(x)F(x+h)F(xh)2hf(x) \approx \frac{F(x + h) - F(x - h)}{2h}

with step size h=105h = 10^{-5}:

cdf_to_pdf <- function(cdf_fun, h = 1e-5) {
  function(x) {
    (cdf_fun(x + h) - cdf_fun(x - h)) / (2 * h)
  }
}

This is fast and naturally vectorized—a single call evaluates all grid points at once.

Accuracy. The central difference has O(h2)O(h^2) truncation error, so with h=105h = 10^{-5} the error is roughly 101010^{-10} for smooth CDFs. In practice the dominant error source is floating-point cancellation when F(x+h)F(xh)F(x+h) \approx F(x-h), which occurs in the extreme tails where the CDF is nearly flat. This is harmless for plotting because the density is negligibly small in those regions.

CDF to quantile function: root-finding

Given a CDF FF, the quantile function solves F(x)=pF(x) = p for each probability p(0,1)p \in (0, 1). Internally, cdf_to_qf() uses stats::uniroot() to find the root of g(x)=F(x)pg(x) = F(x) - p:

cdf_to_qf <- function(cdf_fun, search_lower = -10, search_upper = 10) {
  function(p) {
    vapply(p, function(pi) {
      if (pi <= 0) return(-Inf)
      if (pi >= 1) return(Inf)

      lo <- search_lower; hi <- search_upper

      # Adaptively widen bounds until they bracket the target
      for (i in 1:25) {
        if (!is.na(cdf_fun(lo)) && cdf_fun(lo) <= pi) break
        lo <- lo * 2
      }
      for (i in 1:25) {
        if (!is.na(cdf_fun(hi)) && cdf_fun(hi) >= pi) break
        hi <- hi * 2
      }

      res <- try(
        stats::uniroot(function(x) cdf_fun(x) - pi,
                       lower = lo, upper = hi,
                       tol = .Machine$double.eps^0.5),
        silent = TRUE
      )
      if (inherits(res, "try-error")) NA_real_ else res$root
    }, numeric(1))
  }
}

The algorithm starts with initial search bounds [10,10][-10, 10] and adaptively doubles them until the CDF values bracket the target probability. This handles distributions with arbitrary support without requiring the user to specify bounds. The boundary cases p=0p = 0 and p=1p = 1 return -\infty and ++\infty directly.

Accuracy. uniroot() converges to machine precision (1.5×108\approx 1.5 \times 10^{-8}), so the quantile values are effectively exact. The main cost is one CDF evaluation per uniroot() iteration (typically 10–20 iterations), applied independently at each probability point.

Survival to CDF: exact arithmetic

Given a survival function S(x)=1F(x)S(x) = 1 - F(x), the CDF is recovered by simple subtraction:

F(x)=1S(x).F(x) = 1 - S(x).

Internally, survival_to_cdf() wraps this in a closure:

survival_to_cdf <- function(survival_fun) {
  function(x) 1 - survival_fun(x)
}

Accuracy. This conversion is exact (up to floating-point rounding), with errors on the order of machine epsilon (2.2×1016\approx 2.2 \times 10^{-16}).

Quantile function to CDF: interpolation

Given a quantile function Q(p)Q(p), the CDF can be recovered by inverting the relationship: evaluate QQ on a dense grid of probabilities p1,,pnp_1, \ldots, p_n to get the corresponding xx values, then interpolate. Internally, qf_to_cdf() uses stats::approxfun() with rule = 2 (constant extrapolation at boundaries):

qf_to_cdf <- function(qf_fun, n = 10000) {
  p_grid <- seq(1 / (n + 1), n / (n + 1), length.out = n)
  x_grid <- qf_fun(p_grid)
  stats::approxfun(x_grid, p_grid, rule = 2)
}

The grid avoids p=0p = 0 and p=1p = 1 (which would map to ±\pm\infty for unbounded distributions). With n=10,000n = 10{,}000 grid points, the interpolation error is typically below 10310^{-3} for smooth distributions.

Accuracy. Linear interpolation on 10,00010{,}000 points gives roughly three decimal digits of accuracy. This is more than sufficient for plotting, though downstream differentiation (e.g. to obtain a PDF) amplifies the error by one order of magnitude.

Chained conversions

Some geoms require two conversions in sequence. For example, geom_qf(pdf_fun = ...) first integrates the PDF to obtain a CDF (pdf_to_cdf), then inverts that CDF by root-finding (cdf_to_qf). Each root-finding step calls the derived CDF, which itself calls integrate(). This double nesting makes it the most expensive path, but the cost is still modest for the default n=101n = 101 evaluation points.

Similarly, geom_survival(pdf_fun = ...) integrates the PDF to get FF, then computes S(x)=1F(x)S(x) = 1 - F(x). And when geom_hf() receives only pdf_fun, it derives the CDF by integration so that it can compute h(x)=f(x)/(1F(x))h(x) = f(x) / (1 - F(x)).

The new survival_fun and qf_fun paths also produce chains: geom_pdf(survival_fun = ...) computes F=1SF = 1 - S (exact), then differentiates to get ff. geom_pdf(qf_fun = ...) interpolates QQ to get FF, then differentiates. geom_qf(survival_fun = ...) computes F=1SF = 1 - S, then inverts by root-finding.

Accuracy comparison

We can compare derived values against the exact analytic functions to see the conversion error in practice. Here we use the standard normal distribution:

# CDF derived from PDF vs exact
p1 <- ggplot() +
  geom_cdf(pdf_fun = dnorm, xlim = c(-3, 3)) +
  geom_cdf(fun = pnorm, xlim = c(-3, 3), color = "red", linetype = "dashed") +
  ggtitle("CDF: derived (black) vs exact (red)")

# PDF derived from CDF vs exact
p2 <- ggplot() +
  geom_pdf(cdf_fun = pnorm, xlim = c(-3, 3)) +
  geom_pdf(fun = dnorm, xlim = c(-3, 3), color = "red", alpha = 0) +
  ggtitle("PDF: derived (black) vs exact (red)")

# QF derived from CDF vs exact
p3 <- ggplot() +
  geom_qf(cdf_fun = pnorm) +
  geom_qf(fun = qnorm, color = "red", linetype = "dashed") +
  ggtitle("QF: derived (black) vs exact (red)")

p1 | p2 | p3

The derived curves (black) are visually indistinguishable from the exact curves (red dashed), confirming that the numerical methods are accurate enough for plotting.

Example: from a survival function

Suppose you only have the survival function of an exponential distribution:

my_surv <- function(x, rate) 1 - pexp(x, rate = rate)

p1 <- ggplot() +
  geom_cdf(survival_fun = my_surv, xlim = c(0, 5), args = list(rate = 1)) +
  ggtitle("CDF (from survival)")

p2 <- ggplot() +
  geom_pdf(survival_fun = my_surv, xlim = c(0, 5), args = list(rate = 1)) +
  ggtitle("PDF (from survival)")

p3 <- ggplot() +
  geom_qf(survival_fun = my_surv, args = list(rate = 1)) +
  ggtitle("QF (from survival)")

p1 | p2 | p3

Example: from a quantile function

Similarly, if you only have a quantile function:

p1 <- ggplot() +
  geom_cdf(qf_fun = qnorm, xlim = c(-3, 3)) +
  ggtitle("CDF (from QF)")

p2 <- ggplot() +
  geom_pdf(qf_fun = qnorm, xlim = c(-3, 3)) +
  ggtitle("PDF (from QF)")

p3 <- ggplot() +
  geom_survival(qf_fun = qnorm, xlim = c(-3, 3)) +
  ggtitle("Survival (from QF)")

p1 | p2 | p3

Performance considerations

  • Integration (pdf_to_cdf): O(n)O(n) calls to integrate(), each with O(k)O(k) function evaluations where kk depends on the integrand’s smoothness. Typically the slowest single conversion.
  • Differentiation (cdf_to_pdf): A single vectorized evaluation—two CDF calls total. Essentially free.
  • Root-finding (cdf_to_qf): O(n)O(n) calls to uniroot(), each requiring O(m)O(m) CDF evaluations (m15m \approx 15). Comparable to integration in cost.
  • Survival to CDF (survival_to_cdf): A single vectorized evaluation—one subtraction per point. Essentially free.
  • QF to CDF (qf_to_cdf): One-time cost of evaluating the quantile function on 10,000 grid points to build the interpolation table. After setup, CDF lookups are O(1)O(1) via approxfun().
  • Chained (pdf_fun to quantile): O(nm)O(n \cdot m) calls to integrate(). The most expensive path but still completes in under a second for the default n=101n = 101.

When speed matters or when working with distributions that have analytic forms for multiple function types, supplying the native function directly avoids all conversion overhead. For example, geom_cdf(fun = pnorm) is faster than geom_cdf(pdf_fun = dnorm), though both produce visually identical output.

The discrete case

Discrete geoms (geom_cdf_discrete, geom_survival_discrete, geom_qf_discrete) perform exact arithmetic rather than numerical approximation:

  • PMF to CDF: cumulative summation of the PMF values over the integer (or explicit) support. This is exact up to floating-point rounding.
  • PMF to survival: cumulative summation followed by S(x)=1F(x)S(x) = 1 - F(x). Again exact.
  • CDF to quantile (discrete): the CDF is evaluated at each support point and the step function is inverted directly. No root-finding is needed.
  • Survival to CDF (discrete): F(x)=1S(x)F(x) = 1 - S(x) on each support point. Exact.
  • Survival to quantile (discrete): F(x)=1S(x)F(x) = 1 - S(x) on each support point, then invert. Exact.

These operations introduce no approximation error, so there is no accuracy trade-off for discrete conversions.