Skip to contents

This function identifies any change points in the forecast data or in the final observed data point. Change points are identified by any significant change in magnitude or direction of the slope of the time series.

Usage

plane_trend(location, input, seed, sig_lvl = 0.1)

Arguments

location

Character vector with location code; the location must appear in input and seed

input

Input signal data to be scored; object must be forecast

seed

Prepared seed

sig_lvl

The significance level at which to identify change points (between zero and one); default is 0.1

Value

A list with the following values:

  • indicator: Logical as to whether or not the any forecast data or the final observed data point are a significant change point

  • output: An n x 7 tibble. The length of the forecast plus the observed data determine the length of n. The columns are:

    • Location: A character vector with the location code

    • Index: An integer index of all observed and forecast data

    • Date: The dates corresponding to all observed and forecast data (formatted as date)

    • Value: The incidence of all observed and forecast data (e.g., hospitalization rates)

    • Type: Indicates whether the data row is observed or forecast data

    • Changepoint: Logical identifying any change point (whether in observed or forecast data). A TRUE is returned if any point is determined a change point based on the user defined significance level (sig_lvl).

    • Flagged: Logical indicating whether or not the change point was flagged. Change points are only flagged if they are in the forecast data or are the final observed data point. A TRUE is returned if the Changepoint is TRUE and is a final observed data point or any forecast point.

  • flagged_dates: The date of any flagged change point(s). If there are none, NA is returned

Details

This function uses e.divisive(), which implements a hierarchical divisive algorithm to identify change points based on distances between segments (calculated using equations 3 and 5 in Matteson and James, 2014; the larger the distance, the more likely a change point). Then a permutation test is used to calculate an approximate p-value.

The input to e.divisive() is transformed using differencing (i.e., diff(x) instead of the raw data, x). This slightly changes the way that change points are identified, as the index aligns with the gap between points rather than the points themselves. Instead of identifying a change point based on the change in size between two points, it identifies change points based on the change in the change itself. The dataframe below illustrates the difference between x and diff(x):

Indexxdiff(x)
136
290
3928
43737
5741
6750
7750

Given this data, e.divisive(x) would identify index 5 (74) as the change point, because there was a jump of +37 between index 4 and 5. But e.divisive(diff(x)) would pick both index 3 (28) and 5 (1), because there was a jump of +28 from index 2 and 3, and there was a jump of -36 between index 4 and 5.

Internally, the trend function uses an extra argument to e.divisive() for min.size = 2, which requires a gap of at least 2 points between detecting change points. This can indirectly increase the significance level or decrease the number of change points identified.

References

Matteson, D. S., & James, N. A. (2014). A nonparametric approach for multiple change point analysis of multivariate data. Journal of the American Statistical Association, 109(505), 334–345. https://doi.org/10.1080/01621459.2013.849605

Matteson DS, James NA (2013). “A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.” ArXiv e-prints. To appear in the Journal of the American Statistical Association, 1306.4933.

Gandy, A. (2009) "Sequential implementation of Monte Carlo tests with uniformly bounded resampling risk." Journal of the American Statistical Association.

Examples

## read in example observed data and prep observed signal
hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes"))
tmp_hosp <-
  hosp %>%
  dplyr::select(date, location, flu.admits) %>%
  dplyr::mutate(date = as.Date(date))

prepped_observed <- to_signal(tmp_hosp, outcome = "flu.admits",
                             type = "observed", resolution = "weeks")

## read in example forecast and prep forecast signal
prepped_forecast <- read_forecast(system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv",
                                               package = "rplanes")) %>%
   to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4)

## prepare seed with cut date
prepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29")

## run plane component
plane_trend(location = "05", input = prepped_forecast, seed = prepped_seed, sig_lvl = .2)
## change location
plane_trend(location = "09", input = prepped_forecast, seed = prepped_seed, sig_lvl = .2)
## change sig_lvl
plane_trend(location = "06", input = prepped_forecast, seed = prepped_seed, sig_lvl = .05)