| Title: | Changepoint Detection via Modified Genetic Algorithms |
|---|---|
| Description: | The Genetic Algorithm (GA) is used to perform changepoint analysis in time series data. The package also includes an extended island version of GA, as described in Lu, Lund, and Lee (2010, <doi:10.1214/09-AOAS289>). By mimicking the principles of natural selection and evolution, GA provides a powerful stochastic search technique for solving combinatorial optimization problems. In 'changepointGA', each chromosome represents a changepoint configuration, including the number and locations of changepoints, hyperparameters, and model parameters. The package employs genetic operators—selection, crossover, and mutation—to iteratively improve solutions based on the given fitness (objective) function. Key features of 'changepointGA' include encoding changepoint configurations in an integer format, enabling dynamic and simultaneous estimation of model hyperparameters, changepoint configurations, and associated parameters. The detailed algorithmic implementation can be found in the package vignettes and in the paper of Li (2024, <doi:10.48550/arXiv.2410.15571>). |
| Authors: | Mo Li [aut, cre], QiQi Lu [aut] |
| Maintainer: | Mo Li <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.5 |
| Built: | 2026-06-03 03:19:23 UTC |
| Source: | https://github.com/mli171/changepointga |
In this crossover operator designed for AMOC problem, the new child is
produced by taking the average of the changepoint locations from dad
and mom and round to an integer. Note, every chromosome has at most
one candidate changepoint location.
amoc_crossover(mom, dad, prange = NULL, minDist, lmax, N)amoc_crossover(mom, dad, prange = NULL, minDist, lmax, N)
mom |
Among two selected individuals, |
dad |
Among two selected individuals, |
prange |
The default value is |
minDist |
The minimum length between two adjacent changepoints. |
lmax |
The maximum possible length of the chromosome representation. |
N |
The length of time series. |
The child chromosome that produced from mom and dad for
next generation.
In a certain probability, the mutation genetic operator can be applied
to generate a new child. In this AMOC mutation operator, the new child
changepoint location can be down via a "jump" method. The child changepoint
location will jump minDist time units either to the left or right to
produce the changepoint location for the mutated child. The jump direction is
randomly decided with 0.5 probability.
amoc_mutation( child, prange = NULL, minDist, pchangepoint = NULL, lmax = NULL, mmax = NULL, N = NULL )amoc_mutation( child, prange = NULL, minDist, pchangepoint = NULL, lmax = NULL, mmax = NULL, N = NULL )
child |
The child chromosome resulting from the |
prange |
The default value is |
minDist |
The minimum length between two adjacent changepoints in
|
pchangepoint |
An auxiliary argument is needed for |
lmax |
An auxiliary argument is needed for |
mmax |
An auxiliary argument is needed for |
N |
An auxiliary argument is needed for |
The resulting child chromosome representation.
Randomly generate the individuals' chromosomes (changepoint configurations) to construct the first generation population for the at most one changepoint (AMOC) problem.
amoc_population(popSize, prange, N, minDist, pchangepoint, mmax, lmax)amoc_population(popSize, prange, N, minDist, pchangepoint, mmax, lmax)
popSize |
An integer represents the number of individual in each population for GA (or subpopulation for IslandGA). |
prange |
Default is |
N |
The length of time series. |
minDist |
The minimum length between two adjacent changepoints. |
pchangepoint |
The probability that a changepoint can occur. |
mmax |
The maximum possible number of changepoints in the data set. |
lmax |
The maximum possible length of the chromosome representation. |
Given the possible candidate changepoint location set, each chromosome in the first generation population can be obtained by randomly sampling one location from the candidate set. The first element of every chromosome represent the number of changepoints and the last non-zero element always equal to the length of time series plus one (N+1).
A matrix that contains each individual's chromosome.
The genetic algorithm require to select a pair of chromosomes, representing
dad and mom, for the crossover operator to
produce offspring (individual for next generation). Here, the same linear
ranking method in selection_linear_rank is used to select a pair
of chromosomes for dad and mom in the at most one changepoint
(AMOC) problem. By default, the dad has better fit/smaller fitness function
value/larger rank than mom.
amoc_selection(pop, popFit)amoc_selection(pop, popFit)
pop |
A matrix contains the chromosomes for all individuals. The number of
rows is equal to |
popFit |
A vector contains the objective function value (population fit) being associated to each individual chromosome from above. |
A list contains the chromosomes for dad and mom.
The objective function for changepoint search in Autoregressive moving average with model order selection via Bayesian Information Criterion (BIC).
arima_bic(chromosome, plen = 0, XMat, Xt)arima_bic(chromosome, plen = 0, XMat, Xt)
chromosome |
The chromosome consists of the number of changepoints, the order of AR part, the order of MA part, the changepoint locations, and a value of time series length plus 1 (N+1) indicating the end of the chromosome. |
plen |
The number of model order parameters that need to be selected.
If model order selection needs to be performed simultaneously with the
changepoint detection task, |
XMat |
A matrix contains the covariates, but not includes changepoint effects, for time series regression. |
Xt |
The simulated ARMA time series from |
The BIC value of the objective function.
Ts <- 1000 betaT <- c(0.5) # intercept XMatT <- matrix(1, nrow = Ts, ncol = 1) colnames(XMatT) <- "intercept" sigmaT <- 1 phiT <- c(0.5) thetaT <- NULL DeltaT <- c(2, -2) Cp.prop <- c(1 / 4, 3 / 4) CpLocT <- floor(Ts * Cp.prop) myts <- ts_sim( Ts = Ts, beta = betaT, XMat = XMatT, sigma = sigmaT, phi = phiT, theta = thetaT, Delta = DeltaT, CpLoc = CpLocT, seed = 1234 ) # candidate changepoint configuration chromosome <- c(2, 250, 750, 1001) arima_bic(chromosome, XMat = XMatT, Xt = myts)Ts <- 1000 betaT <- c(0.5) # intercept XMatT <- matrix(1, nrow = Ts, ncol = 1) colnames(XMatT) <- "intercept" sigmaT <- 1 phiT <- c(0.5) thetaT <- NULL DeltaT <- c(2, -2) Cp.prop <- c(1 / 4, 3 / 4) CpLocT <- floor(Ts * Cp.prop) myts <- ts_sim( Ts = Ts, beta = betaT, XMat = XMatT, sigma = sigmaT, phi = phiT, theta = thetaT, Delta = DeltaT, CpLoc = CpLocT, seed = 1234 ) # candidate changepoint configuration chromosome <- c(2, 250, 750, 1001) arima_bic(chromosome, XMat = XMatT, Xt = myts)
The objective function for changepoint search in Autoregressive moving average with model order selection via Bayesian Information Criterion (BIC).
arima_bic_order_pq(chromosome, plen = 2, XMat, Xt)arima_bic_order_pq(chromosome, plen = 2, XMat, Xt)
chromosome |
A vector consists of the number of changepoints, the order of AR component (refers to the number of lagged terms used to model the current value of a time series), the order of MA component (refers to the number of lagged error terms used to model the current value of a time series), the changepoint locations, and a value of time series sample size plus 1 ($N+1$) indicating the end of the chromosome. |
plen |
The number of model order parameters that need to be selected.
If model order selection needs to be performed simultaneously with the
changepoint detection task, |
XMat |
A matrix contains the covariates, but not includes changepoint effects, for time series regression. |
Xt |
The simulated ARMA time series from |
The BIC value of the objective function.
Ts <- 1000 XMatT <- matrix(1, nrow = Ts, ncol = 1) Xt <- ts_sim( Ts = Ts, beta = 0.5, XMat = XMatT, sigma = 1, phi = 0.5, theta = 0.8, Delta = c(2, -2), CpLoc = c(250, 750), seed = 1234 ) # one chromosome representation chromosome <- c(2, 1, 1, 250, 750, 1001) arima_bic_order_pq(chromosome, plen = 2, XMat = XMatT, Xt = Xt)Ts <- 1000 XMatT <- matrix(1, nrow = Ts, ncol = 1) Xt <- ts_sim( Ts = Ts, beta = 0.5, XMat = XMatT, sigma = 1, phi = 0.5, theta = 0.8, Delta = c(2, -2), CpLoc = c(250, 750), seed = 1234 ) # one chromosome representation chromosome <- c(2, 1, 1, 250, 750, 1001) arima_bic_order_pq(chromosome, plen = 2, XMat = XMatT, Xt = Xt)
This function is to calculate the distance metric (Shi et al., 2022) between
two changepoint configurations,
and , where 's are the
changepoint locations.
cpt_dist(tau1, tau2, N)cpt_dist(tau1, tau2, N)
tau1 |
A vector contains the changepoint locations for |
tau2 |
A vector contains the changepoint locations for |
N |
The simulated time series sample size. Two changepoint configurations
should have the same |
The pairwise distance was proposed by Shi et al. (2022),
where is the number of changepoints in configuration and
is the number of changepoints in configuration .
The term reflects the cost of matching
changepoint locations between and and can be calculated
using the linear assignment method. Details can be found in Shi et al. (2022).
Note: if one configuration doesn't contain any changepoints (valued NULL),
the distance is defined as . The function can also be used
to examine changepoint detection performance in simulation studies. Given the
true changepoint configuration, , used in generating the time
series, the calculated distance between the estimated multiple changepoint
configuration, , and
can be used to evaluate the performance of the detection algorithm.
dist |
The calculated distance. |
Shi, X., Gallagher, C., Lund, R., & Killick, R. (2022). A comparison of single and multiple changepoint techniques for time series data. Computational Statistics & Data Analysis, 170, 107433.
N <- 100 # both tau1 and tau2 has detected changepoints tau2 <- c(25, 50, 75) tau1 <- c(20, 35, 70, 80, 90) cpt_dist(tau1 = tau1, tau2 = tau2, N = N) # either tau1 or tau2 has zero detected changepoints cpt_dist(tau1 = tau1, tau2 = NULL, N = N) cpt_dist(tau1 = NULL, tau2 = tau2, N = N) # both tau1 and tau2 has zero detected changepoints cpt_dist(tau1 = NULL, tau2 = NULL, N = N)N <- 100 # both tau1 and tau2 has detected changepoints tau2 <- c(25, 50, 75) tau1 <- c(20, 35, 70, 80, 90) cpt_dist(tau1 = tau1, tau2 = tau2, N = N) # either tau1 or tau2 has zero detected changepoints cpt_dist(tau1 = tau1, tau2 = NULL, N = N) cpt_dist(tau1 = NULL, tau2 = tau2, N = N) # both tau1 and tau2 has zero detected changepoints cpt_dist(tau1 = NULL, tau2 = NULL, N = N)
Perform the genetic algorithm for changepoint detection. This involves the minimization of an objective function using a genetic algorithm (GA). The algorithm can be run sequentially or with explicit parallelization.
cptga( ObjFunc, N, prange = NULL, popSize = 200, pcrossover = 0.95, pmutation = 0.3, pchangepoint = 0.01, minDist = 1, mmax = NULL, lmax = NULL, maxgen = 50000, maxconv = 5000, option = "cp", monitoring = FALSE, parallel = FALSE, nCore = NULL, tol = 1e-05, seed = NULL, popInitialize = "random_population", suggestions = NULL, selection = "selection_linear_rank", crossover = "uniform_crossover", mutation = "mutation", ... )cptga( ObjFunc, N, prange = NULL, popSize = 200, pcrossover = 0.95, pmutation = 0.3, pchangepoint = 0.01, minDist = 1, mmax = NULL, lmax = NULL, maxgen = 50000, maxconv = 5000, option = "cp", monitoring = FALSE, parallel = FALSE, nCore = NULL, tol = 1e-05, seed = NULL, popInitialize = "random_population", suggestions = NULL, selection = "selection_linear_rank", crossover = "uniform_crossover", mutation = "mutation", ... )
ObjFunc |
The fitness function to be minimized. Users can specify any R or Rcpp
function as the fitness function, setting the input as the potential solution to
the optimization problem and returning a numerical value as the output/fitness.
Depending on the user-specified chromosome representation, the optimization task
can be changepoint detection only or changepoint detection plus model order selection,
which can be specified via the |
N |
The sample size of the time series. |
prange |
The default value is |
popSize |
An integer represents the number of individuals/chromosomes in each population. |
pcrossover |
The probability that the crossover operator will apply to the two selected parents' chromosomes to produce the offspring. The typical value is close to 1, with the default setting in this package being 0.95. |
pmutation |
The probability that the mutation operator applies on one individual chromosome. Similar to the natural mutation process, new genetic information is introduced to the offspring chromosome with a relatively small probability (close to 0), with a default value of 0.3. |
pchangepoint |
The probability that a changepoint has occurred with a default value of 0.01. User could change this probability based on domain knowledge and the time series length. This probability is used during population initialization and in the creation of new chromosomes by the mutation operator. By default, the mutation operator function generates a new individual as the mutated offspring. |
minDist |
The minimum length between two adjacent changepoints. Default value equals to one. |
mmax |
The maximum number of changepoints allowed in the time series data
corresponds to the maximum possible length of |
lmax |
The maximum possible length of the chromosome representation.
For a time series of length 1000 and we only want to detect the changepoint
( |
maxgen |
The maximum number of generations the GA can run before the search is forcibly terminated. |
maxconv |
If the overall best fitted value doesn't change after
|
option |
A character string controls the optimization task. |
monitoring |
A logical value with either |
parallel |
A logical value with either |
nCore |
An integer with the default value of |
tol |
An numerical value with the default value of |
seed |
An integer with the default value of |
popInitialize |
A function or sourced function name character string.
It should be designed for initializing a population. The default population
initialization is random initialization with some imposed constraints. See
|
suggestions |
A list object. Default value is |
selection |
A function or sourced function name character string. This
GA operator can help select |
crossover |
A function or sourced function name character string. This
GA operator can apply crossover to the chosen parents to produce child for
next generation with specified probability. The default for crossover uses
the uniform crossover method. See |
mutation |
A function or sourced function name character string. This
GA operator can apply mutation to the produced |
... |
additional arguments that will be passed to the fitness function. |
For any pre-specified time series model with a specified set of changepoint locations,
model fit is evaluated using a fitness function ,
where
denotes the full parameter vector. Here, denotes the set
of model hyperparameters, potentially including the AR or MA orders, the degree
of ARIMA differencing, or the periodic AR order for PAR models. The vector
specifies the
changepoint locations, with the number of changepoints inferred as
part of the estimation. Each individual chromosome representation is specified as a vector,
where represents the number of changepoints and is also equivalent to
the length of vector containing all the changepoint
locations. contains the integer-valued parameter for time
series model structure specification, such as AR, MA, or PAR orders.
If no model order selection is desired, then is omitted
and the GA detects changepoints only. The changepoint locations in
are encoded as interger values between 2 and ,
allowing the length of to vary dynamically with .
The value is appended at the end of to serve as a delimiter
marking the boundary of the chromosome.
Return an object class cptga-class. See
cptga-class for a more detailed description.
Ts <- 1000 XMatT <- matrix(1, nrow = Ts, ncol = 1) Xt <- ts_sim( Ts = Ts, beta = 0.5, XMat = XMatT, sigma = 1, phi = 0.5, theta = NULL, Delta = c(2, -2), CpLoc = c(250, 750), seed = 1234 ) ## Multiple changepoint detection without model order selection # without suggestions GAres <- cptga(ObjFunc = arima_bic, N = Ts, XMat = XMatT, Xt = Xt) summary(GAres) plot(GAres, data = Xt) # with suggestions suggestions <- list(NULL, 250, c(250, 500), c(250, 625), c(250, 500, 750)) GAres <- cptga(ObjFunc = arima_bic, N = Ts, suggestions = suggestions, XMat = XMatT, Xt = Xt) summary(GAres) plot(GAres, data = Xt) ## Multiple changepoint detection with model order selection prange <- list(ar = c(0, 3), ma = c(0, 3)) # without suggestions GAres <- cptga( ObjFunc = arima_bic_order_pq, N = Ts, prange = prange, option = "both", XMat = XMatT, Xt = Xt ) summary(GAres) plot(GAres, data = Xt) # with suggestions suggestions <- list(NULL, 250, c(250, 500), c(250, 625), c(250, 500, 750)) GAres <- cptga( ObjFunc = arima_bic_order_pq, N = Ts, prange = prange, suggestions = suggestions, option = "both", XMat = XMatT, Xt = Xt ) summary(GAres) plot(GAres, data = Xt)Ts <- 1000 XMatT <- matrix(1, nrow = Ts, ncol = 1) Xt <- ts_sim( Ts = Ts, beta = 0.5, XMat = XMatT, sigma = 1, phi = 0.5, theta = NULL, Delta = c(2, -2), CpLoc = c(250, 750), seed = 1234 ) ## Multiple changepoint detection without model order selection # without suggestions GAres <- cptga(ObjFunc = arima_bic, N = Ts, XMat = XMatT, Xt = Xt) summary(GAres) plot(GAres, data = Xt) # with suggestions suggestions <- list(NULL, 250, c(250, 500), c(250, 625), c(250, 500, 750)) GAres <- cptga(ObjFunc = arima_bic, N = Ts, suggestions = suggestions, XMat = XMatT, Xt = Xt) summary(GAres) plot(GAres, data = Xt) ## Multiple changepoint detection with model order selection prange <- list(ar = c(0, 3), ma = c(0, 3)) # without suggestions GAres <- cptga( ObjFunc = arima_bic_order_pq, N = Ts, prange = prange, option = "both", XMat = XMatT, Xt = Xt ) summary(GAres) plot(GAres, data = Xt) # with suggestions suggestions <- list(NULL, 250, c(250, 500), c(250, 625), c(250, 500, 750)) GAres <- cptga( ObjFunc = arima_bic_order_pq, N = Ts, prange = prange, suggestions = suggestions, option = "both", XMat = XMatT, Xt = Xt ) summary(GAres) plot(GAres, data = Xt)
S4 Class for Genetic Algorithm-Based Changepoint Detection
## S4 method for signature 'cptga' summary(object, ...)## S4 method for signature 'cptga' summary(object, ...)
object |
An object of class |
... |
Additional arguments (ignored). |
An object of class cptga stores results and configuration settings
for changepoint detection using a Genetic Algorithm (GA), optionally with
simultaneous model order selection. This class records GA control parameters,
intermediate population structures, and the optimal solution found.
An object of class cptga.
callThe matched call that created the object.
NThe sample size of the time series.
prangeA list object. Default is NULL. If specified, it contains the
ranges for each model order parameter (integers). Required when option = "both"
is used for joint changepoint and model selection.
popSizeAn integer representing the number of individuals in each GA population.
pcrossoverThe probability that the crossover operator is applied to two chromosomes.
pmutationThe probability that the mutation operator is applied to a chromosome.
pchangepointThe prior probability that a changepoint has occurred at each location.
minDistThe minimum allowed distance between two adjacent changepoints.
mmaxThe maximum possible number of changepoints. Typically set based on time series length and option.
lmaxThe maximum length of the chromosome. Typically set based on time series length and option.
maxgenThe maximum number of generations the GA is allowed to run.
maxconvIf the optimal fitness value does not improve over this many generations, GA stops.
optionA character string: either "cp" for changepoint detection only, or "both" for changepoint detection and model order selection.
monitoringLogical. If TRUE, prints intermediate GA progress.
parallelLogical. If TRUE, enables parallel computation for fitness evaluation.
nCoreInteger or NULL. Number of cores used for parallel computation when parallel = TRUE.
tolNumeric. Tolerance for determining GA convergence. Default is 1e-5.
seedAn integer or NULL. Random seed for reproducibility.
suggestionsA list or NULL. Each element provides suggested changepoint locations
to guide initial population design and potentially accelerate convergence.
populationA matrix where each row represents an individual chromosome in the current population.
fitnessA numeric vector containing the fitness values of individuals in the current generation.
overbestchromA vector representing the best chromosome found over all generations.
overbestfitA numeric scalar. The best (smallest) fitness value achieved.
bestfitA numeric vector recording the best fitness value in each generation.
countA numeric value indicating the number of generations the GA actually ran.
convgA numeric vector representing convergence information. A value of 0 indicates the algorithm successful completion. A value of 1 indicates the the total number of generations exceeds the pre-specified maxgen limit.
cptga, cptga-class, random_population, selection_linear_rank, uniform_crossover, mutation.
Perform the modified island-based genetic algorithm (IslandGA) for multiple changepoint detection. Minimization of an objective function using genetic algorithm (GA). The algorithm can be run sequentially or in explicit parallelisation.
cptgaisl( ObjFunc, N, prange = NULL, popSize = 200, numIslands = 5, pcrossover = 0.95, pmutation = 0.3, pchangepoint = 0.01, minDist = 1, mmax = NULL, lmax = NULL, maxMig = 1000, maxgen = 50, maxconv = 100, option = "cp", monitoring = FALSE, parallel = FALSE, nCore = NULL, tol = 1e-05, seed = NULL, popInitialize = "random_population", suggestions = NULL, selection = "selection_linear_rank", crossover = "uniform_crossover", mutation = "mutation", ... )cptgaisl( ObjFunc, N, prange = NULL, popSize = 200, numIslands = 5, pcrossover = 0.95, pmutation = 0.3, pchangepoint = 0.01, minDist = 1, mmax = NULL, lmax = NULL, maxMig = 1000, maxgen = 50, maxconv = 100, option = "cp", monitoring = FALSE, parallel = FALSE, nCore = NULL, tol = 1e-05, seed = NULL, popInitialize = "random_population", suggestions = NULL, selection = "selection_linear_rank", crossover = "uniform_crossover", mutation = "mutation", ... )
ObjFunc |
The fitness function to be minimized. Users can specify any R or Rcpp
function as the fitness function, setting the input as the potential solution to
the optimization problem and returning a numerical value as the output/fitness.
Depending on the user-specified chromosome representation, the optimization task
can be changepoint detection only or changepoint detection plus model order selection,
which can be specified via the |
N |
The sample size of the time series. |
prange |
The default value is |
popSize |
An integer representing the total number of individuals in each generation, which
equal to the number of islands multiplied by the size of each island (i.e.,
|
numIslands |
An integer representing the number of islands (sub-populations). |
pcrossover |
The probability that the crossover operator will apply to the two selected parents' chromosomes to produce the offspring. The typical value is close to 1, with the default setting in this package being 0.95. |
pmutation |
The probability that the mutation operator applies on one individual chromosome. Similar to the natural mutation process, new genetic information is introduced to the offspring chromosome with a relatively small probability (close to 0), with a default value of 0.3. |
pchangepoint |
The probability that a changepoint has occurred. User could change this probability based on domain knowledge and the time series length. This probability is used during population initialization and in the creation of new chromosomes by the mutation operator. By default, the mutation operator function generates a new individual as the mutated offspring. |
minDist |
The minimum length between two adjacent changepoints. Default value equals to one. |
mmax |
The maximum number of changepoints allowed in the time series data
corresponds to the maximum possible length of |
lmax |
The maximum possible length of the chromosome representation.
For a time series of length 1000 and we only want to detect the changepoint
( |
maxMig |
An integer indicates the maximum number of migrations. After
conducting |
maxgen |
An integer indicates the maximum number of generations that each
island (subpopulation) undergoes before migration. It also determines the
requency of migration. The migration will occur after |
maxconv |
An integer value is also used for algorithm termination. If
the overall best-fitted value remains unchanged after |
option |
A character string controls the optimization task. |
monitoring |
A logical value with either |
parallel |
A logical value with |
nCore |
An integer indicates the number of cores utilized in parallel
computing. For the island model GA, the number of cores used in parallel is
set to the |
tol |
An numerical value with the default value of |
seed |
An integer with the default value of |
popInitialize |
A function or sourced function name character string.
It should be designed for initializing a population. The default population
initialization is random initialization with some imposed constraints. See
|
suggestions |
A list object. Default value is |
selection |
A function or sourced function name character string. This
GA operator can help select |
crossover |
A function or sourced function name character string. This
GA operator can apply crossover to the chosen parents to produce child for
next generation with specified probability. The default for crossover uses
the uniform crossover method. See |
mutation |
A function or sourced function name character string. This
GA operator can apply mutation to the produced |
... |
additional arguments that will be passed to the fitness function. |
For any pre-specified time series model with a specified set of changepoint locations,
model fit is evaluated using a fitness function ,
where
denotes the full parameter vector. Here, denotes the set
of model hyperparameters, potentially including the AR or MA orders, the degree
of ARIMA differencing, or the periodic AR order for PAR models. The vector
specifies the
changepoint locations, with the number of changepoints inferred as
part of the estimation. Each individual chromosome representation is specified as a vector,
where represents the number of changepoints and is also equivalent to
the length of vector containing all the changepoint
locations. contains the integer-valued parameter for time
series model structure specification, such as AR, MA, or PAR orders.
If no model order selection is desired, then is omitted
and the GA detects changepoints only. The changepoint locations in
are encoded as interger values between 2 and ,
allowing the length of to vary dynamically with .
The value is appended at the end of to serve as a delimiter
marking the boundary of the chromosome.
Return an object class cptgaisl-class. See
cptgaisl-class for a more detailed description.
Ts <- 1000 XMatT <- matrix(1, nrow = Ts, ncol = 1) Xt <- ts_sim( Ts = Ts, beta = 0.5, XMat = XMatT, sigma = 1, phi = 0.5, theta = NULL, Delta = c(2, -2), CpLoc = c(250, 750), seed = 1234 ) ## Multiple changepoint detection without model order selection # without suggestions GAISLres <- cptgaisl(ObjFunc = arima_bic, N = Ts, XMat = XMatT, Xt = Xt) summary(GAISLres) plot(GAISLres, data = Xt) # with suggestions suggestions <- list(NULL, 250, c(250, 500), c(250, 625), c(250, 500, 750)) GAISLres <- cptgaisl(ObjFunc = arima_bic, N = Ts, suggestions = suggestions, XMat = XMatT, Xt = Xt) summary(GAISLres) plot(GAISLres, data = Xt) ## Multiple changepoint detection with model order selection prange <- list(ar = c(0, 3), ma = c(0, 3)) # without suggestions GAISLres <- cptgaisl( ObjFunc = arima_bic_order_pq, N = Ts, prange = prange, option = "both", XMat = XMatT, Xt = Xt ) summary(GAISLres) plot(GAISLres, data = Xt) # with suggestions suggestions <- list(NULL, 250, c(250, 500), c(250, 625), c(250, 500, 750)) GAISLres <- cptgaisl( ObjFunc = arima_bic_order_pq, N = Ts, prange = prange, suggestions = suggestions, option = "both", XMat = XMatT, Xt = Xt ) summary(GAISLres) plot(GAISLres, data = Xt)Ts <- 1000 XMatT <- matrix(1, nrow = Ts, ncol = 1) Xt <- ts_sim( Ts = Ts, beta = 0.5, XMat = XMatT, sigma = 1, phi = 0.5, theta = NULL, Delta = c(2, -2), CpLoc = c(250, 750), seed = 1234 ) ## Multiple changepoint detection without model order selection # without suggestions GAISLres <- cptgaisl(ObjFunc = arima_bic, N = Ts, XMat = XMatT, Xt = Xt) summary(GAISLres) plot(GAISLres, data = Xt) # with suggestions suggestions <- list(NULL, 250, c(250, 500), c(250, 625), c(250, 500, 750)) GAISLres <- cptgaisl(ObjFunc = arima_bic, N = Ts, suggestions = suggestions, XMat = XMatT, Xt = Xt) summary(GAISLres) plot(GAISLres, data = Xt) ## Multiple changepoint detection with model order selection prange <- list(ar = c(0, 3), ma = c(0, 3)) # without suggestions GAISLres <- cptgaisl( ObjFunc = arima_bic_order_pq, N = Ts, prange = prange, option = "both", XMat = XMatT, Xt = Xt ) summary(GAISLres) plot(GAISLres, data = Xt) # with suggestions suggestions <- list(NULL, 250, c(250, 500), c(250, 625), c(250, 500, 750)) GAISLres <- cptgaisl( ObjFunc = arima_bic_order_pq, N = Ts, prange = prange, suggestions = suggestions, option = "both", XMat = XMatT, Xt = Xt ) summary(GAISLres) plot(GAISLres, data = Xt)
S4 Class for Island Model Genetic Algorithm-Based Changepoint Detection
## S4 method for signature 'cptgaisl' summary(object, ...)## S4 method for signature 'cptgaisl' summary(object, ...)
object |
An object of class |
... |
Additional arguments (ignored). |
This class stores results and settings for the island model genetic algorithm ('cptgaisl') used in multiple changepoint detection and optional model order selection.
An object of class cptgaisl
callThe matched call that created the object.
NThe sample size of the time series.
prangeA list or NULL. Ranges for each model order parameter when option = "both".
popSizeInteger. It represents the total number of individuals in each generation, which equal to the number of islands multiplied by the size of each island (i.e., popSize = numIslands × Islandsize).
numIslandsInteger. The number of islands (sub-populations).
IslandsizeNumerical value. The number of individuals in each island.
pcrossoverProbability of crossover operation.
pmutationProbability of mutation operation.
pchangepointPrior probability of changepoint occurrence.
minDistMinimum distance between adjacent changepoints.
mmaxMaximum number of changepoints allowed.
lmaxMaximum length of chromosome.
maxMigMaximum number of migrations allowed.
maxgenMaximum number of generations per island before migration.
maxconvNumber of migrations with no improvement before stopping.
optionEither "cp" or "both".
monitoringLogical. If TRUE, prints intermediate output.
parallelLogical. Whether parallel computation is used.
nCoreInteger or NULL. Number of cores for parallelization.
tolTolerance threshold for fitness improvement.
seedInteger or NULL. Seed for reproducibility.
suggestionsA list or NULL. Suggested changepoint configurations.
IslandA 3D array storing all individual chromosomes in current generation across islands. Dimensions are lmax × Islandsize × numIslands, representing chromosome length, individuals per island, and number of islands, respectively.
IslandFitA matrix of fitness values in current generation with dimensions Islandsize × numIslands, where each column corresponds to one island's population.
overbestchromA vector. The best chromosome ever found.
overbestfitNumeric. The best fitness score obtained.
bestfitNumeric vector recording best fitness per migration.
countMigInteger vector tracking number of migrations.
countInteger vector tracking total generations.
convgInteger vector for convergence diagnostics.
cptgaisl, cptgaisl-class, random_population, selection_linear_rank, uniform_crossover, mutation.
In a certain probability, the mutation genetic operator can be applied
to generate a new child. By default, the new child selection can be
down by the similar individual selection method in population initialization,
select_tau.
mutation(child, prange = NULL, minDist, pchangepoint, lmax, mmax, N)mutation(child, prange = NULL, minDist, pchangepoint, lmax, mmax, N)
child |
The child chromosome resulting from the |
prange |
Default is |
minDist |
The required minimum distance between two adjacent changepoints. |
pchangepoint |
The probability of changepoints for every time series. |
lmax |
The user specified maximum number of changepoints, by default,
as |
mmax |
The user specified maximum length of individual chromosome,
by default, as |
N |
The sample size of the time series. |
A function can apply mutation to the produced child with the specified
probability pmutation in cptga and
cptgaisl. If order selection is not requested
(option = "cp" in cptga and cptgaisl), the default
mutation operator function uses select_tau to select
a completely new individual with a new chromosome as the mutated child.
For details, see select_tau. If order selection is needed
(option = "both" in cptga and cptgaisl), we first
decide whether to keep the produced child's model order with a probability
of 0.5. If the child's model order is retained, the select_tau
function is used to select a completely new individual with a new chromosome
as the mutated child. If a new model order is selected from the candidate
model order set, there is a 0.5 probability to either select a completely new
individual with new changepoint locations or retain the original child's
changepoint locations for the mutated child. Note that the current model
orders in the child's chromosome are excluded from the set to avoid redundant
objective function evaluation. Finally, the function returns a vector
containing the modified chromosomes for mutated child.
The resulting child chromosome representation.
This function visualizes a univariate time series along with the changepoints identified by a basic genetic algorithm, as represented by a 'cptga' object. Vertical dashed lines mark changepoint locations, and segment means are shown as horizontal dashed lines. The optimal fitness value and changepoint locations are displayed as margin text.
## S3 method for class 'cptga' plot( x, data, show_segmean = TRUE, main = NULL, XTickLab = NULL, XTickPos = NULL, XAxisLab = "Time", YAxisLab = "Data", cex.lab = 1.3, cex.axis = 1.3, cex.main = 1.3, lwd = 2, ... )## S3 method for class 'cptga' plot( x, data, show_segmean = TRUE, main = NULL, XTickLab = NULL, XTickPos = NULL, XAxisLab = "Time", YAxisLab = "Data", cex.lab = 1.3, cex.axis = 1.3, cex.main = 1.3, lwd = 2, ... )
x |
An object of class |
data |
A numeric vector representing the observed univariate time series. |
show_segmean |
Binary, whether to include the segments' means. |
main |
Optional main title for the plot. |
XTickLab |
Optional vector (e.g., numeric or date) for custom x-axis labels.
Must be the same length as |
XTickPos |
Optional vector specifying which elements of |
XAxisLab |
Optional label for the x-axis. Default is |
YAxisLab |
Optional label for the y-axis. Default is |
cex.lab |
Text size for axis labels and margin text. Default is |
cex.axis |
Text size for axis tick labels. Default is |
cex.main |
Text size for the main title. Default is |
lwd |
Line width for vertical and horizontal dashed lines. Default is |
... |
Additional graphical parameters passed to |
If XTickLab is supplied and matches the length of data, it is used for
the x-axis; otherwise, the default sequence 1:length(data) is used.
If the genetic algorithm was run with option = "both", the function skips hyperparameters
in the chromosome when extracting changepoint positions.
The plot displays vertical dashed lines at changepoint locations and horizontal dashed lines for the mean of each segment. Fitness and changepoint summaries are shown above the plot.
This function is called for its side effects and returns NULL invisibly.
This function visualizes a univariate time series along with the changepoints identified by a island model genetic algorithm, as represented by a 'cptgaisl' object. Vertical dashed lines mark changepoint locations, and segment means are shown as horizontal dashed lines. The optimal fitness value and changepoint locations are displayed as margin text.
## S3 method for class 'cptgaisl' plot( x, data, show_segmean = TRUE, main = NULL, XTickLab = NULL, XTickPos = NULL, XAxisLab = "Time", YAxisLab = "Data", cex.lab = 1.3, cex.axis = 1.3, cex.main = 1.3, lwd = 2, ... )## S3 method for class 'cptgaisl' plot( x, data, show_segmean = TRUE, main = NULL, XTickLab = NULL, XTickPos = NULL, XAxisLab = "Time", YAxisLab = "Data", cex.lab = 1.3, cex.axis = 1.3, cex.main = 1.3, lwd = 2, ... )
x |
An object of class |
data |
A numeric vector representing the observed univariate time series. |
show_segmean |
Binary, whether to include the segments' means. |
main |
Optional main title for the plot. |
XTickLab |
Optional vector (e.g., numeric or date) for custom x-axis labels.
Must be the same length as |
XTickPos |
Optional vector specifying which elements of |
XAxisLab |
Optional label for the x-axis. Default is |
YAxisLab |
Optional label for the y-axis. Default is |
cex.lab |
Text size for axis labels and margin text. Default is |
cex.axis |
Text size for axis tick labels. Default is |
cex.main |
Text size for the main title. Default is |
lwd |
Line width for vertical and horizontal dashed lines. Default is |
... |
Additional graphical parameters passed to |
If XTickLab is supplied and matches the length of data, it is used for
the x-axis; otherwise, the default sequence 1:length(data) is used.
If the genetic algorithm was run with option = "both", the function skips hyperparameters
in the chromosome when extracting changepoint positions.
The plot displays vertical dashed lines at changepoint locations and horizontal dashed lines for the mean of each segment. Fitness and changepoint summaries are shown above the plot.
This function is called for its side effects and returns NULL invisibly.
summary, print.summary.cptgaisl
Displays key information about the settings and results from a changepoint detection procedure using the Genetic Algorithm (GA) stored in a 'cptga' object. This includes the algorithm configuration, population settings, optimization mode, and final solution such as the number and location of changepoints and model parameters (if applicable).
## S3 method for class 'summary.cptga' print(x, digits = getOption("digits"), max_display = 5, ...)## S3 method for class 'summary.cptga' print(x, digits = getOption("digits"), max_display = 5, ...)
x |
An object of class |
digits |
Number of digits to print for probabilities and fitness. Default taken from |
max_display |
Maximum number of suggested solutions to display if suggestions are provided. |
... |
Additional arguments (currently not used). |
When the GA is run in option = "cp" mode, only changepoint locations are shown.
If option = "both", the output includes the selected model hyperparameters along
with changepoint locations.
The function uses plain text output to print a formatted summary to the console. If
x@suggestions is provided, only up to max_display suggestions will be shown.
Invisibly returns NULL. Called for its side effect of printing to the console.
Displays key information about the settings and results from a changepoint detection procedure using the Island model Genetic Algorithm (IMGA) stored in a 'cptgaisl' object. This includes the algorithm configuration, population settings, optimization mode, and final solution such as the number and location of changepoints and model parameters (if applicable).
## S3 method for class 'summary.cptgaisl' print(x, digits = getOption("digits"), max_display = 5, ...)## S3 method for class 'summary.cptgaisl' print(x, digits = getOption("digits"), max_display = 5, ...)
x |
An object of class |
digits |
Number of digits to print for probabilities and fitness. Default taken from |
max_display |
Maximum number of suggested solutions to display if suggestions are provided. |
... |
Additional arguments (currently not used). |
When the GA is run in option = "cp" mode, only changepoint locations are shown.
If option = "both", the output includes the selected model hyperparameters along
with changepoint locations.
The function uses plain text output to print a formatted summary to the console. If
x@suggestions is provided, only up to max_display suggestions will be shown.
Invisibly returns NULL. Called for its side effect of printing to the console.
cptgaisl, summary, plot.cptgaisl
Randomly generate the individuals' chromosomes (changepoint confirgurations) to construct the first generation population.
random_population(popSize, prange, N, minDist, pchangepoint, mmax, lmax)random_population(popSize, prange, N, minDist, pchangepoint, mmax, lmax)
popSize |
An integer represents the number of individual in each population for GA (or subpopulation for IslandGA). |
prange |
Default is |
N |
The length of time series. |
minDist |
The minimum length between two adjacent changepoints. |
pchangepoint |
Same as |
mmax |
The maximum possible number of changepoints in the data set. |
lmax |
The maximum possible length of the chromosome representation. |
Each population can be stored in a matrix with lmax rows and
popSize columns, where each column represents an individual chromosome
in the format of
in cptga or cptgaisl. This function can randomly
initialize the population matrix with some imposed constraints, such as
ensuring the number of time points between two adjacent changepoints is
greater than or equal to minDist. This prevents unrealistic scenario
of two changepoints being too close to each other and helps reduce the total
number of admissible solutions in the search space. Users can adjust the
level of searching space reduction by setting an appropriate value for
minDist. During population initialization, each changepoint location
in can be selected sequentially. With a specified
probability pchangepoint denoting the probability of a time point
being selected as a changepoint, the first changepoint is
randomly picked between and . Then
is randomly selected from a smaller range between
and . The process is continued until the last admissible changepoint
is exceeded, and the number of changepoints is
obtained automatically. For added flexibility, users can specify their own
population initialization function. The default population initialization uses
select_tau to select the chromosome for the first generation
population.
A matrix that contains each individual's chromosome.
Randomly select the changepoint configuration for population initialization. The selected changepoint configuration represents a changepoint chromosome. The first element of the chromosome represent the number of changepoints and the last non-zero element always equal to the length of time series + 1.
select_tau(N, prange, minDist, pchangepoint, mmax, lmax)select_tau(N, prange, minDist, pchangepoint, mmax, lmax)
N |
The length of time series. |
prange |
A list object containing the possible range for other pre-defined model parameters, i.e. AR/MA order of ARMA models. |
minDist |
The minimum length between two adjacent changepoints. |
pchangepoint |
Same as |
mmax |
The maximum possible number of changepoints in the data set. |
lmax |
The maximum possible length of the chromosome representation. |
A single changepoint configuration format as above.
The genetic algorithm require to select a pair of chromosomes, representing
dad and mom, for the crossover operator to
produce offspring (individual for next generation). The parents chromosomes
are randomly selectd from the initialized population by a linear ranking
method according to each individual's fittness in the input argument
popFit. By default, the dad has better fit/smaller fitness function
value/larger rank than mom.
selection_linear_rank(pop, popFit)selection_linear_rank(pop, popFit)
pop |
A matrix contains the chromosomes for all individuals. The number of
rows is equal to |
popFit |
A vector contains the objective function value (population fit) being associated to each individual chromosome from above. |
A list contains the chromosomes for dad and mom.
This is a function to simulate time series with changepoint effects. See details below.
ts_sim( Ts, beta, XMat, sigma, phi = NULL, theta = NULL, d = 0, Delta = NULL, CpLoc = NULL, seed = NULL )ts_sim( Ts, beta, XMat, sigma, phi = NULL, theta = NULL, d = 0, Delta = NULL, CpLoc = NULL, seed = NULL )
Ts |
A integer indicating simulated time series length (sample size). |
beta |
A parameter vector contains other mean function parameters without changepoint parameters. |
XMat |
The covairates for time series mean function without changepoint indicators. |
sigma |
The standard deviation for time series residuals |
phi |
A vector for the autoregressive (AR) parameters for AR part. |
theta |
A vector for the moving average (MA) parameters for MA part. |
d |
A nonnegative integer for the order of differencing in the ARIMA model. Default is 0, which reduces to an ARMA model. |
Delta |
The parameter vector contains the changepoint parameters for time series mean function. |
CpLoc |
A vector contains the changepoint locations range from |
seed |
The random seed for simulation reproducibility. |
The simulated time series is generated from
The error process can be:
IID Gaussian noise when phi = NULL, theta = NULL,
and d = 0.
An ARMA(p,q) process when d = 0 and at least one of
phi or theta is specified.
An ARIMA(p,d,q) process when d > 0.
The mean structure is specified through XMat and
beta, and changepoint effects can be introduced as
where are changepoint
locations and are changepoint magnitudes.
The simulated time series with attributes:
DesignX: The covariates including changepoint indicators.
mu: The mean vector for the simulated time series.
theta: The true parameter vector including changepoint effects.
CpEff: The true changepoint magnitudes.
CpLoc: The true changepoint locations.
arEff: The AR coefficients.
maEff: The MA coefficients.
diffEff: The differencing order.
seed: The random seed used for simulation.
##### M1: Time series observations are IID Ts <- 1000 betaT <- c(0.5) XMatT <- matrix(1, nrow = Ts, ncol = 1) colnames(XMatT) <- "intercept" sigmaT <- 1 DeltaT <- c(2, -2) Cpprop <- c(1 / 4, 3 / 4) CpLocT <- floor(Ts * Cpprop) myts <- ts_sim( Ts = Ts, beta = betaT, XMat = XMatT, sigma = sigmaT, Delta = DeltaT, CpLoc = CpLocT, seed = 1234 ) ##### M2: ARMA(2,1) model with constant mean Ts <- 1000 betaT <- c(0.5) XMatT <- matrix(1, nrow = Ts, ncol = 1) colnames(XMatT) <- "intercept" sigmaT <- 1 phiT <- c(0.5, -0.5) thetaT <- c(0.8) DeltaT <- c(2, -2) CpLocT <- floor(Ts * c(1 / 4, 3 / 4)) myts <- ts_sim( Ts = Ts, beta = betaT, XMat = XMatT, sigma = sigmaT, phi = phiT, theta = thetaT, d = 0, Delta = DeltaT, CpLoc = CpLocT, seed = 1234 ) ##### M3: ARIMA(1,1,1) model with changepoint effects Ts <- 1000 betaT <- c(0.5) XMatT <- matrix(1, nrow = Ts, ncol = 1) colnames(XMatT) <- "intercept" sigmaT <- 1 phiT <- c(0.5) thetaT <- c(-0.3) DeltaT <- c(2, -2) CpLocT <- floor(Ts * c(1 / 4, 3 / 4)) myts <- ts_sim( Ts = Ts, beta = betaT, XMat = XMatT, sigma = sigmaT, phi = phiT, theta = thetaT, d = 1, Delta = DeltaT, CpLoc = CpLocT, seed = 1234 )##### M1: Time series observations are IID Ts <- 1000 betaT <- c(0.5) XMatT <- matrix(1, nrow = Ts, ncol = 1) colnames(XMatT) <- "intercept" sigmaT <- 1 DeltaT <- c(2, -2) Cpprop <- c(1 / 4, 3 / 4) CpLocT <- floor(Ts * Cpprop) myts <- ts_sim( Ts = Ts, beta = betaT, XMat = XMatT, sigma = sigmaT, Delta = DeltaT, CpLoc = CpLocT, seed = 1234 ) ##### M2: ARMA(2,1) model with constant mean Ts <- 1000 betaT <- c(0.5) XMatT <- matrix(1, nrow = Ts, ncol = 1) colnames(XMatT) <- "intercept" sigmaT <- 1 phiT <- c(0.5, -0.5) thetaT <- c(0.8) DeltaT <- c(2, -2) CpLocT <- floor(Ts * c(1 / 4, 3 / 4)) myts <- ts_sim( Ts = Ts, beta = betaT, XMat = XMatT, sigma = sigmaT, phi = phiT, theta = thetaT, d = 0, Delta = DeltaT, CpLoc = CpLocT, seed = 1234 ) ##### M3: ARIMA(1,1,1) model with changepoint effects Ts <- 1000 betaT <- c(0.5) XMatT <- matrix(1, nrow = Ts, ncol = 1) colnames(XMatT) <- "intercept" sigmaT <- 1 phiT <- c(0.5) thetaT <- c(-0.3) DeltaT <- c(2, -2) CpLocT <- floor(Ts * c(1 / 4, 3 / 4)) myts <- ts_sim( Ts = Ts, beta = betaT, XMat = XMatT, sigma = sigmaT, phi = phiT, theta = thetaT, d = 1, Delta = DeltaT, CpLoc = CpLocT, seed = 1234 )
In uniform crossover, typically, each bit is chosen from either parent with
equal probability. Other mixing ratios are sometimes used, resulting in
offspring which inherit more genetic information from one parent than the
other. In a uniform crossover, we don’t divide the chromosome into segments,
rather we treat each gene separately. In this, we essentially flip a coin
for each chromosome to decide whether or not it will be included in the
off-spring. If model order selection is requested, each child's model order
has the equal probability (0.5) from dad and mom.
uniform_crossover(mom, dad, prange, minDist, lmax, N)uniform_crossover(mom, dad, prange, minDist, lmax, N)
mom |
Among two selected individuals, |
dad |
Among two selected individuals, |
prange |
Default value is |
minDist |
The required minimum distance between two adjacent changepoints. |
lmax |
The user specified maximum number of changepoints, by default,
as |
N |
The length of time series. |
The child chromosome that produced from mom and dad for
next generation.