| Title: | Genetic Algorithms in Regression |
|---|---|
| Description: | Provides a genetic algorithm framework for regression problems requiring discrete optimization over model spaces with unknown or varying dimension, where gradient-based methods and exhaustive enumeration are impractical. Uses a compact chromosome representation for tasks including spline knot placement and best-subset variable selection, with constraint-preserving crossover and mutation, exact uniform initialization under spacing constraints, steady-state replacement, and optional island-model parallelization from Lu, Lund, and Lee (2010, <doi:10.1214/09-AOAS289>). The computation is built on the 'GA' engine of Scrucca (2017, <doi:10.32614/RJ-2017-008>) and 'changepointGA' engine from Li and Lu (2024, <doi:10.48550/arXiv.2410.15571>). In challenging high-dimensional settings, 'GAReg' enables efficient search and delivers near-optimal solutions when alternative algorithms are not well-justified. |
| Authors: | Mo Li [aut, cre], QiQi Lu [aut], Robert Lund [aut], Xueheng Shi [aut] |
| Maintainer: | Mo Li <[email protected]> |
| License: | Apache License (== 2.0) |
| Version: | 0.1.2 |
| Built: | 2026-05-18 07:07:48 UTC |
| Source: | https://github.com/mli171/gareg |
cptga/cptgaisl
Convenience constructor for GA control parameters used by
changepointGA::cptga and changepointGA::cptgaisl. It merges
named overrides into engine-specific defaults
(.cptga.default or .cptgaisl.default), with light validation.

    cptgaControl(
      ...,
      .list = NULL,
      .persist = FALSE,
      .env = asNamespace("GAReg"),
      .validate = TRUE,
      engine = NULL
    )

| Argument | Description |
|---|---|
| ... | Named overrides for control fields (e.g., popSize, pcrossover). |
| .list | Optional named list of overrides (merged with ...). |
| .persist | Logical; default FALSE. |
| .env | Environment where defaults live (defaults to asNamespace("GAReg")). |
| .validate | Logical; validate values/ranges (default TRUE). |
| engine | Character; one of "cptga" or "cptgaisl". Default NULL. |

Unknown names are rejected. When both ... and .list are present,
they are combined, with later entries overwriting earlier ones.
Returns a list of class "cptgaControl".
See also: gareg_knots, .cptga.default, .cptgaisl.default
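
The merge-and-validate behaviour described for cptgaControl can be sketched in base R. This is a minimal illustration under stated assumptions, not the package source: merge_control and the default values are hypothetical, and combining .list before ... (so that ... wins on a clash) is an assumed ordering.

```r
# Hypothetical defaults; the real .cptga.default fields and values may differ.
defaults <- list(popSize = 200, pcrossover = 0.95, maxgen = 1000)

# Hypothetical helper: merge named overrides into a list of defaults.
merge_control <- function(..., .list = NULL, defaults) {
  overrides <- c(.list, list(...))  # assumed order: .list first, then ...
  unknown <- setdiff(names(overrides), names(defaults))
  if (length(unknown) > 0) {
    stop("Unknown control name(s): ", paste(unknown, collapse = ", "))
  }
  out <- defaults
  # Walk the overrides in order so that later entries overwrite earlier ones.
  for (i in seq_along(overrides)) {
    out[[names(overrides)[i]]] <- overrides[[i]]
  }
  out
}

ctrl <- merge_control(popSize = 150,
                      .list = list(popSize = 100, maxgen = 500),
                      defaults = defaults)
str(ctrl)
```

Iterating by position rather than by name is what lets a later duplicate name override an earlier one.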
Crossover (fixed m) with Feasibility-First Restarts
Produces a child chromosome from two fixed-m parents (same number of
knots) by alternately sampling candidate knot locations from the parents and
enforcing the spacing constraint diff(child) > minDist. If a conflict
is encountered, the routine restarts the construction up to a small cap.

    crossover_fixknots(mom, dad, prange = NULL, minDist, lmax, N)

| Argument | Description |
|---|---|
| mom, dad | Integer vectors encoding parent chromosomes: first entry m (the number of knots), followed by the ordered knot locations tau_1, ..., tau_m, the sentinel N+1, and zero padding. |
| prange | Unused placeholder (kept for compatibility with other GA operators). Default NULL. |
| minDist | Integer; minimum spacing between adjacent knots in the child. |
| lmax | Integer; chromosome length (number of rows in the population matrix). |
| N | Integer; series length. Used to place the sentinel N+1. |

Let mom and dad be chromosomes of the form
c(m, tau_1, ..., tau_m, ...). This operator:
1. Initializes an empty child of size m.
2. Picks the first knot at random from mom or dad.
3. For each subsequent position i, considers the pair (mom[i], dad[i]) and keeps whichever value maintains the spacing constraint relative to the previously chosen knot (> minDist); if both do, one is chosen at random.
4. If no feasible choice exists at some step, restarts the construction from the first position (up to a small cap governed internally by up_tol).

The result is written back as a full-length chromosome with the sentinel
N+1 in position m+2, and zeros elsewhere.
Returns an integer vector of length lmax encoding the child chromosome:
c(m, child_knots, N+1, 0, 0, ...).
See also: crossover_fixknots, mutation_fixknots, selectTau_uniform_exact, Popinitial_fixknots, gareg_knots

    N <- 120
    lmax <- 30
    minDist <- 5
    m <- 3
    mom <- c(m, c(20, 50, 90), rep(0, lmax - 1 - m))
    mom[m + 2] <- N + 1
    dad <- c(m, c(18, 55, 85), rep(0, lmax - 1 - m))
    dad[m + 2] <- N + 1
    child <- crossover_fixknots(mom, dad, minDist = minDist, lmax = lmax, N = N)
    child

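
The feasibility-first construction described for crossover_fixknots can be sketched in base R. This is an illustrative re-implementation, not the package source: crossover_sketch, the max_restarts cap (standing in for the internal up_tol), and the fallback of returning mom when every restart fails are all assumptions.

```r
# Hypothetical sketch of a fixed-m crossover with feasibility-first restarts.
crossover_sketch <- function(mom, dad, minDist, lmax, N, max_restarts = 50L) {
  m <- mom[1]
  stopifnot(dad[1] == m, lmax >= m + 2)
  for (attempt in seq_len(max_restarts)) {
    child <- numeric(m)
    ok <- TRUE
    for (i in seq_len(m)) {
      cand <- c(mom[i + 1], dad[i + 1])  # i-th knot of each parent
      # From the second knot on, keep only candidates respecting the spacing.
      if (i > 1) cand <- cand[cand - child[i - 1] > minDist]
      if (length(cand) == 0) {           # conflict: restart the construction
        ok <- FALSE
        break
      }
      child[i] <- cand[sample.int(length(cand), 1)]
    }
    if (ok) {
      out <- numeric(lmax)
      out[1] <- m
      out[1 + seq_len(m)] <- child
      out[m + 2] <- N + 1                # sentinel after the last knot
      return(out)
    }
  }
  mom  # assumed fallback when no feasible child was built
}

set.seed(42)
N <- 120; lmax <- 30; minDist <- 5; m <- 3
mom <- c(m, 20, 50, 90, N + 1, rep(0, lmax - m - 2))
dad <- c(m, 18, 55, 85, N + 1, rep(0, lmax - m - 2))
child <- crossover_sketch(mom, dad, minDist, lmax, N)
child
```

Indexing with sample.int avoids the surprise of sample() treating a single remaining candidate as an upper bound.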
Computes the False Discovery Rate (FDR) and True Positive Rate (TPR, a.k.a. recall)
by comparing a set of true labels to a set of predicted labels.
Labels are treated as positive integer indices in 1, ..., N. Duplicates
are ignored (unique indices are used).

    FDRCalc(truelabel, predlabel, N)

| Argument | Description |
|---|---|
| truelabel | Integer vector of ground-truth positive indices (values in 1:N). |
| predlabel | Integer vector of predicted positive indices (values in 1:N). |
| N | Integer scalar; size of the full index universe (total number of candidates). |

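Let truelabel and predlabel be sets of indices, written $T$ and $P$. The function
derives the confusion-matrix counts:
$$\mathrm{tp} = |T \cap P|, \quad \mathrm{fp} = |P \setminus T|, \quad \mathrm{fn} = |T \setminus P|, \quad \mathrm{tn} = N - \mathrm{tp} - \mathrm{fp} - \mathrm{fn},$$
and returns
$$\mathrm{FDR} = \frac{\mathrm{fp}}{\mathrm{fp} + \mathrm{tp}}, \qquad \mathrm{TPR} = \frac{\mathrm{tp}}{\mathrm{tp} + \mathrm{fn}}.$$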
Inputs are coerced to integer and uniqued. A warning is emitted if any label
is < 1, and an error is thrown if any label exceeds N. If tn < 0,
a warning is issued indicating that N may not reflect the full universe.
A named list with components:
- fdr: False Discovery Rate, fp / (fp + tp) (NaN if fp+tp == 0).
- tpr: True Positive Rate (recall), tp / (tp + fn) (NaN if tp+fn == 0).
- fp, fn, tp, tn: Confusion-matrix counts.

If predlabel is empty, fdr is NaN and tpr is 0 (unless
truelabel is also empty, in which case both fdr and tpr are NaN).
If truelabel is empty and predlabel non-empty, tpr is NaN
and fdr is 1.

    # Simple example
    N <- 10
    true <- c(2, 4, 7)
    pred <- c(4, 5, 7, 7) # duplicates are ignored
    FDRCalc(true, pred, N)

    # Empty predictions
    FDRCalc(true, integer(0), N)

    # All correct predictions
    FDRCalc(true, true, N)

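
As a worked illustration of the counts and edge cases described for FDRCalc, the following base-R sketch recomputes them with set operations. The function fdr_sketch is hypothetical and omits the warnings and errors the real function is documented to raise.

```r
# Hypothetical re-computation of the confusion-matrix counts with set operations.
fdr_sketch <- function(truelabel, predlabel, N) {
  truelabel <- unique(as.integer(truelabel))
  predlabel <- unique(as.integer(predlabel))
  tp <- length(intersect(predlabel, truelabel))
  fp <- length(setdiff(predlabel, truelabel))
  fn <- length(setdiff(truelabel, predlabel))
  tn <- N - tp - fp - fn
  # 0 / 0 evaluates to NaN in R, which gives the documented edge cases for free.
  list(fdr = fp / (fp + tp), tpr = tp / (tp + fn),
       fp = fp, fn = fn, tp = tp, tn = tn)
}

res <- fdr_sketch(c(2, 4, 7), c(4, 5, 7, 7), N = 10)
res$fdr  # 1/3: one false positive (5) among three unique predictions
res$tpr  # 2/3: two of the three true labels (4 and 7) were recovered

empty <- fdr_sketch(c(2, 4, 7), integer(0), N = 10)
empty$fdr  # NaN, because fp + tp == 0
empty$tpr  # 0
```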
Computes an information criterion (BIC, AIC, or AICc) for a regression of
y on a spline basis of x when the number of interior knots is
fixed. This is designed to be used as a fitness/objective function inside a
GA search where the chromosome encodes the indices of the interior knots.

    fixknotsIC(
      knot_bin, plen = 0, y, x, x_unique, x_base = NULL, fixedknots,
      degree = 3L, type = c("ppolys", "ns", "bs"), intercept = TRUE,
      ic_method = "BIC"
    )

| Argument | Description |
|---|---|
| knot_bin | Integer vector (chromosome). Gene 1 stores m, the number of interior knots. Genes 2:(1+m) are indices into x_unique, followed by the sentinel length(x_unique) + 1. |
| plen | Unused placeholder kept for API compatibility with other objective functions. Ignored. |
| y | Numeric response vector of length n. |
| x | Numeric predictor (same length as y). |
| x_unique | Optional numeric vector of unique candidate knot locations (e.g., sort(unique(x))). |
| x_base | Optional matrix (or vector) of additional covariates to include linearly alongside the spline basis. If supplied, it is coerced to a matrix and column-bound to the design. |
| fixedknots | Integer; the fixed number of interior knots m. |
| degree | Integer polynomial degree for type = "ppolys" and "bs" (default 3L); ignored for "ns". |
| type | One of "ppolys", "ns", or "bs". |
| intercept | Logical; forwarded to splineX(). Default TRUE. |
| ic_method | Character; which information criterion to return: "BIC" (default), "AIC", or "AICc". |

We decode the interior indices up to the sentinel length(x_unique)+1,
validate them (finite, interior, non-duplicated), sort the resulting knot
locations internally, and build the design as
X <- cbind(splineX(..., intercept=intercept), x_base).
Invalid chromosomes/inputs return Inf.
A single numeric value: the requested information criterion. Lower is
better. Returns Inf for invalid chromosomes/inputs.
See also: varyknotsIC, splineX, bs, ns

    library(MASS)
    y <- mcycle$accel
    x <- mcycle$times
    x_unique <- sort(unique(x))
    # chromosome encoding 5 interior knot indices with sentinel:
    chrom <- c(5, 24, 30, 46, 49, 69, length(x_unique) + 1)
    fixknotsIC(chrom,
      y = y, x = x, x_unique = x_unique,
      fixedknots = 5, ic_method = "BIC"
    )

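
The decoding step described for fixknotsIC (read m, check the sentinel, validate the indices, map them to sorted knot locations) can be sketched as follows. decode_knots is a hypothetical helper, the reading of "interior" as strictly between the first and last index is an assumption, and the sketch returns NULL where the real objective is documented to return Inf.

```r
# Hypothetical decoder: chromosome c(m, idx_1, ..., idx_m, sentinel, 0, ...).
decode_knots <- function(chrom, x_unique) {
  n_u <- length(x_unique)
  m <- chrom[1]
  # Structural checks: room for m genes plus the sentinel n_u + 1.
  if (!is.finite(m) || m < 1 || length(chrom) < m + 2) return(NULL)
  if (chrom[m + 2] != n_u + 1) return(NULL)
  idx <- chrom[1 + seq_len(m)]
  # Assumption: "interior" means strictly between the first and last index.
  ok <- all(is.finite(idx)) && !anyDuplicated(idx) && all(idx > 1 & idx < n_u)
  if (!ok) return(NULL)
  sort(x_unique[idx])  # knot locations, sorted
}

x_unique <- seq(0, 10, by = 0.5)           # 21 candidate locations
chrom <- c(3, 5, 15, 10, 22, 0, 0)         # m = 3, sentinel = 21 + 1
knots <- decode_knots(chrom, x_unique)
knots                                      # 2.0 4.5 7.0
bad <- decode_knots(c(2, 5, 5, 22, 0), x_unique)  # duplicated index
is.null(bad)                               # TRUE
```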
Runs a GA-based search for changepoints/knots and returns a compact
"gareg" S4 result that stores the backend GA fit
("cptga" or "cptgaisl") plus the essential run settings.

    gareg_knots(
      y, x,
      ObjFunc = NULL,
      fixedknots = NULL,
      minDist = 3L,
      degree = 3L,
      type = c("ppolys", "ns", "bs"),
      intercept = TRUE,
      gaMethod = "cptga",
      cptgactrl = NULL,
      monitoring = FALSE,
      seed = NULL,
      ...
    )

| Argument | Description |
|---|---|
| y | Numeric vector of responses. |
| x | Optional index/time vector aligned with y. |
| ObjFunc | Objective function or its name. If NULL, a default objective is used (see fixknotsIC and varyknotsIC). A custom function must accept the chromosome and needed data via named arguments (see the defaults for a template function). |
| fixedknots | Optional integer; the number of interior knots to hold fixed. Default NULL (varying-knots search). |
| minDist | Integer minimum distance between adjacent changepoints. Default 3L. |
| degree | Integer polynomial degree for type = "ppolys" and "bs" (default 3L); ignored for "ns". |
| type | One of "ppolys", "ns", or "bs". |
| intercept | Logical; include intercept column where applicable. Default: TRUE. |
| gaMethod | GA backend to call: function or name. Supports "cptga" and "cptgaisl". |
| cptgactrl | Control list built with cptgaControl(). |
| monitoring | Logical; print short progress messages (also forwarded into the backend control). |
| seed | Optional RNG seed; also stored into the backend control. |
| ... | Additional arguments passed to the GA backend. |

Engine selection and controls.
The function detects the engine from gaMethod and constructs a
matching control via cptgaControl():
- "cptga" uses .cptga.default.
- "cptgaisl" uses .cptgaisl.default (supports numIslands, maxMig, etc.).

Top-level monitoring, seed, and minDist given to
gareg_knots() take precedence over the control list.

Fix-knots operators.
When fixedknots is provided and the control does not already
override them, the following operators are injected:
Popinitial_fixknots, crossover_fixknots, mutation_fixknots.

Spline basis options.
Spline design matrices are built via splineX:
- type = "ppolys": regression spline of the requested degree via truncated-power piecewise polynomials.
- type = "ns": degree-3 natural cubic spline with zero second derivative at the boundaries.
- type = "bs": B-spline basis of the requested degree (unconstrained).

Returns an object of class "gareg" with key slots:
- call, method ("varyknots" or "fixknots"), N.
- objFunc, gaMethod, gaFit (class "cptga" or "cptgaisl"), ctrl.
- fixedknots, minDist, polydegree, type, intercept.
- bestFitness, bestChrom, bestnumbsol, bestsol.

Use summary(g) to print GA settings and the best solution (extracted
from g@gaFit); show(g) prints a compact header.
Values are combined as control < core < .... That is,
cptgactrl provides defaults, then core arguments from
gareg_knots() override those, and finally any matching names in
... override both.
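
The precedence rule control < core < ... can be illustrated with nested list merges in base R. This is only a sketch of the stated ordering: resolve_args and the example field values are hypothetical, and the package may implement the merge differently.

```r
# Hypothetical helper: later sources override earlier ones, name by name.
resolve_args <- function(control, core, dots) {
  utils::modifyList(utils::modifyList(control, core), dots)
}

control <- list(popSize = 150, minDist = 3, monitoring = TRUE)  # from cptgactrl
core    <- list(minDist = 5, monitoring = FALSE)                # top-level arguments
dots    <- list(popSize = 80)                                   # passed through ...

res <- resolve_args(control, core, dots)
str(res)  # popSize = 80, minDist = 5, monitoring = FALSE
```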
See also: cptgaControl, changepointGA::cptga,
changepointGA::cptgaisl, fixknotsIC, varyknotsIC

    set.seed(1)
    N <- 120
    y <- c(rnorm(40, 0), rnorm(40, 3), rnorm(40, 0))
    x <- seq_len(N)

    # 1) Varying-knots with single-pop GA
    g1 <- gareg_knots(
      y, x,
      minDist = 5,
      gaMethod = "cptga",
      cptgactrl = cptgaControl(popSize = 150, pcrossover = 0.9, maxgen = 500)
    )
    summary(g1)

    # 2) Fixed knots (operators auto-injected unless overridden)
    g2 <- gareg_knots(
      y, x,
      fixedknots = 5,
      minDist = 5
    )
    summary(g2)

    # 3) Island GA with island-specific controls
    g3 <- gareg_knots(
      y, x,
      gaMethod = "cptgaisl",
      minDist = 6,
      cptgactrl = cptgaControl(
        engine = "cptgaisl",
        numIslands = 8, maxMig = 250,
        popSize = 120, pcrossover = 0.9
      )
    )
    summary(g3)

Runs a GA-based search over variable subsets using a user-specified
objective (default: subsetBIC) and returns a compact
"gareg" S4 result with method = "subset".
The engine can be ga (single population) or
gaisl (islands), selected via gaMethod.

    gareg_subset(
      y, X,
      ObjFunc = NULL,
      gaMethod = "ga",
      gacontrol = NULL,
      monitoring = FALSE,
      seed = NULL,
      ...
    )

| Argument | Description |
|---|---|
| y | Numeric response vector (length n). |
| X | Numeric matrix of candidate predictors (n rows; columns are candidate variables). |
| ObjFunc | Objective function or its name. Defaults to subsetBIC. |
| gaMethod | GA backend to call: "ga" or "gaisl". |
| gacontrol | Optional named list of GA engine controls (e.g., popSize, maxiter, run). |
| monitoring | Logical; default FALSE. |
| seed | Optional RNG seed (a convenience alias for the corresponding engine control). |
| ... | Additional arguments forwarded to ObjFunc. |

The fitness passed to GA is ObjFunc itself. Because the engine expects
a function with signature f(chrom, ...), your ObjFunc must interpret
chrom as a 0/1 mask over the columns of X. The function then computes a score
(e.g., negative BIC) using y, X, and any extra arguments supplied via ....
With the default subsetBIC, the returned value is -BIC, so we set
max = TRUE in the GA call to maximize fitness. If you switch to an objective that
returns a quantity to minimize, either negate it in your objective or change
the engine setting to max = FALSE.
Engine controls belong in gacontrol; objective-specific options belong in ....
This separation prevents accidental name collisions between GA engine parameters and
objective arguments.
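
A custom objective following the f(chrom, ...) convention might look like the sketch below, which treats chrom as a 0/1 mask over the columns of X and returns a quantity to maximize. The function negAIC is hypothetical, and the Gaussian AIC form it uses, n log(RSS/n) + 2k, is an assumption chosen for illustration rather than anything defined by the package.

```r
# Hypothetical objective: negative Gaussian AIC for the masked columns of X.
negAIC <- function(chrom, y, X) {
  keep <- which(chrom == 1)                # decode the 0/1 mask
  Xd <- cbind(1, X[, keep, drop = FALSE])  # intercept + selected columns
  fit <- stats::lm.fit(Xd, y)
  n <- length(y)
  rss <- sum(fit$residuals^2)
  -(n * log(rss / n) + 2 * ncol(Xd))       # negated so larger is better
}

# Deterministic toy data: y depends on column 1 only.
X <- cbind(x1 = as.numeric(1:20), x2 = rep(c(0, 1), 10))
y <- 1 + 2 * X[, "x1"] + sin(1:20)

score_x1 <- negAIC(c(1, 0), y, X)  # mask selecting the informative column
score_x2 <- negAIC(c(0, 1), y, X)  # mask selecting the irrelevant column
score_x1 > score_x2                # TRUE
```

Negating inside the objective keeps the engine in its maximizing configuration, which is the first of the two options described above.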
An object of S4 class "gareg" (with method = "subset") containing:
- call: the matched call.
- N: number of observations.
- objFunc: the objective function used.
- gaMethod: "ga" or "gaisl".
- gaFit: the GA fit object returned by the GA engine.
- featureNames: column names of X (or empty).
- bestFitness: best fitness value (the fitnessValue slot of the GA fit).
- bestChrom: c(m, idx), the number of selected variables followed by their indices.
- bestnumbsol: m, the number of selected variables.
- bestsol: vector of selected column indices in X.


    if (requireNamespace("GA", quietly = TRUE)) {
      set.seed(1)
      n <- 100
      p <- 12
      X <- matrix(rnorm(n * p), n, p)
      y <- 1 + X[, 1] - 0.7 * X[, 4] + rnorm(n, sd = 0.5)

      # Default: subsetBIC (Gaussian; negative BIC), engine = GA::ga
      fit1 <- gareg_subset(y, X,
        gaMethod = "ga",
        gacontrol = list(popSize = 60, maxiter = 80, run = 40, parallel = FALSE)
      )
      summary(fit1)

      # Island model: GA::gaisl
      fit2 <- gareg_subset(y, X,
        gaMethod = "gaisl",
        gacontrol = list(popSize = 40, maxiter = 60, numIslands = 4, parallel = FALSE)
      )
      summary(fit2)

      # Logistic objective (subsetBIC handles GLM via ...):
      ybin <- rbinom(n, 1, plogis(0.3 + X[, 1] - 0.5 * X[, 2]))
      fit3 <- gareg_subset(ybin, X,
        gaMethod = "ga",
        family = stats::binomial(), # <- passed to subsetBIC via ...
        gacontrol = list(popSize = 60, maxiter = 80, parallel = FALSE)
      )
      summary(fit3)
    }

S4 Class for Genetic Algorithm-Based Regression
S4 container for GA-based regression/changepoint tasks. Holds the GA backend fit and a normalized summary of the best solution.
Slots:
- call: language. The matched call that created the object.
- method: character. One of "varyknots", "fixknots", "subset".
- N: numeric. The effective size of the x grid used for knot search, i.e., 'length(x_unique)' (typically the number of unique 'x'; also 'sentinel-1').
- objFunc: functionOrNULL. Objective function used.
- gaMethod: character. GA engine name ("cptga", "cptgaisl", "ga", "gaisl").
- gaFit: Backend GA fit object (union of classes from GA and changepointGA).
- ctrl: listOrNULL. Control list used to run the GA (if stored by caller).
- fixedknots: numericOrNULL. Fixed number of interior knots ('m') for fixed-knots mode, or NULL.
- minDist: numeric. Minimum distance between adjacent changepoints.
- polydegree: numericOrNULL. Spline degree for default objectives.
- type: character. One of 'c("ppolys", "ns", "bs")' indicating piecewise polynomials, natural cubic, or B-spline.
- intercept: logical. Whether the spline basis included an intercept column.
- subsetSpec: listOrNULL. Constraints for subset selection (unused for knots).
- featureNames: character. Candidate feature names (subset tasks).
- bestFitness: numeric. Best fitness value found.
- bestChrom: numeric. Raw best chromosome returned by the backend (may include a sentinel equal to 'N+1' and optional padding).
- bestnumbsol: numeric. Count of selected elements (e.g., 'm' for knots).
- bestsol: numericOrChara. For knots: the 'm' interior indices (pre-sentinel); for subset: mask/indices/names.

See also: gareg_knots, cptgaControl
gareg
show(object): Compact header with call, engine, and N.
summary(object, ...): GA settings (when available) and best solution.

    ## S4 method for signature 'gareg'
    show(object)

    ## S4 method for signature 'gareg'
    summary(object, ...)

| Argument | Description |
|---|---|
| object | A "gareg" object. |
| ... | Currently unused. |

Methods for displaying and summarizing 'gareg' objects
show returns invisible NULL; summary invisibly returns object.
Replaces a child with a fresh feasible sample having the same m,
drawn by selectTau_uniform_exact.

    mutation_fixknots(child, p.range = NULL, minDist, Pb, lmax, mmax, N)

| Argument | Description |
|---|---|
| child | Current chromosome (its first entry defines m). |
| p.range, Pb | Unused placeholders (kept for compatibility). |
| minDist | Integer minimum spacing. |
| lmax, mmax | Integers; chromosome length and maximum number of knots. |
| N | Integer series length. |

Returns a new feasible chromosome with the same m.
The 2,000-Year Northern Hemisphere Temperature Reconstruction dataset (Moberg et al., 2005) provides annual temperature anomalies for the Northern Hemisphere from AD 1 to 1979, relative to the 1961–1990 mean. The reconstruction combines high-resolution proxy data (e.g., tree rings) with low-resolution proxies (e.g., sediments) using a wavelet-based method to capture variability across multiple time scales.
nhtr2005
A data frame with 2 variables:
Year AD
Temperature anomaly relative to the 1961–1990 mean
Moberg A, Sonechkin DM, Holmgren K, Datsenko NM, Karlén W. 2,000-Year Northern Hemisphere Temperature Reconstruction. IGBP PAGES/World Data Center for Paleoclimatology Data Contribution Series #2005-019. NOAA/NGDC Paleoclimatology Program, Boulder, Colorado, USA.
Moberg A, Sonechkin DM, Holmgren K, Datsenko NM, Karlén W. (2005). Highly variable Northern Hemisphere temperatures reconstructed from low- and high-resolution proxy data. *Nature*, 433(7026), 613–617.
Initializes a population matrix for the fixed-knots GA. Each column is a feasible chromosome sampled by selectTau_uniform_exact.

    Popinitial_fixknots(
      popSize, prange = NULL, N, minDist, Pb, mmax, lmax, fixedknots
    )

| Argument | Description |
|---|---|
| popSize | Integer; number of individuals (columns). |
| prange | Optional hyperparameter range (unused here). |
| N | Series length. |
| minDist | Integer minimum spacing between adjacent changepoints. |
| Pb | Unused placeholder (kept for compatibility). |
| mmax, lmax | Integers; maximum number of knots and chromosome length. |
| fixedknots | Integer; number of knots to place. |

Returns an integer matrix of size lmax x popSize; each column is a
chromosome c(m, tau_1, ..., tau_m, N+1, ...).
See also: selectTau_uniform_exact, gareg_knots
Samples ordered changepoint indices uniformly from all feasible
configurations on 1:N subject to a minimum spacing minDist.
Encodes the result as a chromosome for downstream GA operators.

    selectTau_uniform_exact(N, m, minDist, lmax)

| Argument | Description |
|---|---|
| N | Integer series length. |
| m | Integer number of changepoints to place. |
| minDist | Integer minimum spacing between adjacent changepoints. |
| lmax | Integer chromosome length. |

Returns an integer vector of length lmax:
c(m, tau_1, ..., tau_m, N+1, 0, 0, ...).
See also: Popinitial_fixknots, mutation_fixknots
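
One standard way to draw spaced index sets exactly uniformly is a shift bijection: sample an ordinary m-subset from a shortened range, sort it, and then push the i-th element right by (i - 1) * minDist. The sketch below assumes the constraint is diff(tau) > minDist with no extra boundary restrictions; sample_spaced is a hypothetical name, and the package's own sampler may use a different construction or additional constraints.

```r
# Hypothetical sampler: uniform over all tau in 1:N with diff(tau) > minDist.
sample_spaced <- function(N, m, minDist, lmax) {
  K <- N - (m - 1) * minDist        # size of the shortened range
  stopifnot(m >= 1, K >= m, lmax >= m + 2)
  s <- sort(sample.int(K, m))       # uniform m-subset of 1:K
  # Shifting is a bijection between m-subsets of 1:K and feasible tau,
  # so uniformity of s carries over to tau.
  tau <- s + (seq_len(m) - 1) * minDist
  out <- numeric(lmax)
  out[1] <- m
  out[1 + seq_len(m)] <- tau
  out[m + 2] <- N + 1               # sentinel
  out
}

set.seed(1)
chrom <- sample_spaced(N = 120, m = 5, minDist = 5, lmax = 30)
tau <- chrom[2:6]
tau
```

Because consecutive elements of s differ by at least 1, consecutive elements of tau differ by at least minDist + 1, and the largest possible tau is K + (m - 1) * minDist = N.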
Unified wrapper to generate spline covariates for three common cases:
- type = "ppolys": regression spline of the requested degree via truncated-power **piecewise polynomials** (uses internal tp_basis()).
- type = "ns": degree-3 **natural cubic spline**; enforces a zero second derivative at the boundary.
- type = "bs": **B-spline** basis of the requested degree (unconstrained).


    splineX(
      x, knots,
      degree = NULL,
      type = c("ppolys", "ns", "bs"),
      intercept = TRUE
    )

| Argument | Description |
|---|---|
| x | Numeric vector of predictor values. |
| knots | Numeric vector of interior knots. |
| degree | Integer polynomial degree for type = "ppolys" and "bs"; ignored for "ns". Default NULL. |
| type | One of "ppolys", "ns", or "bs". |
| intercept | Logical; include intercept column where applicable. Default: TRUE. |

Knots are sorted and de-duplicated, and any knots outside range(x) are
dropped with a warning. For type = "ns", degree is ignored
(natural splines are cubic).
Returns a numeric design matrix with the following attributes attached:
- "knots": the interior knots used
- "boundary": range(x)
- "degree": effective degree (i.e., 3 for "ns")
- "type": the requested spline type


    set.seed(1)
    x <- sort(rnorm(100))
    k <- quantile(x, probs = c(.25, .5, .75))

    # 1) Piecewise polynomials (degree 3)
    X_pp <- splineX(x, knots = k, degree = 3, type = "ppolys", intercept = TRUE)
    dim(X_pp) # n x ((3+1) + 3) = n x 7

    # 2) Natural cubic spline (cubic, degree ignored)
    X_ns <- splineX(x, knots = k, type = "ns", intercept = TRUE)

    # 3) B-spline basis (degree 3)
    X_bs <- splineX(x, knots = k, degree = 3, type = "bs", intercept = TRUE)

    # Fit without a duplicated intercept:
    # fit <- lm(y ~ 0 + X_pp)

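
The column count n x ((3+1) + 3) quoted in the example follows from the textbook truncated-power construction: degree + 1 polynomial columns plus one truncated term per knot. The sketch below builds such a basis directly; tp_basis_sketch is a hypothetical stand-in, and the internal tp_basis() may differ in column order, scaling, or handling of edge cases.

```r
# Hypothetical truncated-power basis: 1, x, ..., x^d, (x - k_j)_+^d.
tp_basis_sketch <- function(x, knots, degree = 3L, intercept = TRUE) {
  stopifnot(degree >= 1)
  first_power <- if (intercept) 0L else 1L
  powers <- outer(x, seq.int(first_power, degree), `^`)   # polynomial part
  truncated <- outer(x, knots, function(xi, k) pmax(xi - k, 0)^degree)
  cbind(powers, truncated)
}

x <- seq(0, 1, length.out = 11)
knots <- c(0.25, 0.5, 0.75)
B <- tp_basis_sketch(x, knots, degree = 3)
dim(B)      # 11 x 7: (3 + 1) polynomial columns + 3 truncated columns
B[11, 5]    # (1 - 0.25)^3 = 0.421875
```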
Computes a BIC-like criterion for a chromosome that encodes a variable subset. The same expression,
built from a family-specific quantity rss_like, is used for all families:
- For Gaussian with identity link, rss_like is the residual sum of squares (RSS), computed via a fast .lm.fit.
- For other GLM families, rss_like is the residual deviance from glm.fit.

The effective parameter count includes the intercept.

    subsetBIC(
      subset_bin, y, X,
      family = stats::gaussian(),
      weights = NULL,
      offset = NULL,
      control = stats::glm.control()
    )

| Argument | Description |
|---|---|
| subset_bin | Integer/numeric 0–1 vector (length ncol(X)). |
| y | Numeric response vector of length nrow(X). |
| X | Numeric matrix of candidate predictors; columns correspond to variables. |
| family | A GLM family object (default stats::gaussian()). |
| weights | Optional prior weights (passed to the model fit). |
| offset | Optional offset (passed to the model fit). |
| control | GLM fit controls; default stats::glm.control(). |

The chromosome subset_bin is a binary vector (0/1 by column),
indicating which predictors from X are included. The design matrix
always includes an intercept. Rank-deficient selections return Inf
(which the GA maximizer treats as a very poor score). The value returned is
-BIC so that GA engines can maximize it.
A single numeric value: -BIC. Larger is better for GA maximizers.
Returns Inf for rank-deficient designs.
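
Using one expression across families works because, for a Gaussian family with identity link, the residual deviance equals the RSS. The sketch below demonstrates that identity and plugs rss_like into an assumed BIC-like form, n log(rss_like / n) + k log(n). Both bic_like and that exact form are assumptions for illustration; the package's expression is not reproduced here, and rank-deficiency handling is omitted.

```r
# Hypothetical BIC-like score; the n*log(rss_like/n) + k*log(n) form is assumed.
bic_like <- function(mask, y, X, family = stats::gaussian()) {
  Xd <- cbind(1, X[, mask == 1, drop = FALSE])  # design always has an intercept
  n <- length(y)
  k <- ncol(Xd)                                 # parameter count incl. intercept
  gaussian_id <- family$family == "gaussian" && family$link == "identity"
  rss_like <- if (gaussian_id) {
    sum(stats::.lm.fit(Xd, y)$residuals^2)      # RSS
  } else {
    stats::glm.fit(Xd, y, family = family)$deviance  # residual deviance
  }
  -(n * log(rss_like / n) + k * log(n))         # negated: larger is better
}

X <- cbind(x1 = as.numeric(1:20), x2 = rep(c(0, 1), 10))
y <- 1 + 2 * X[, "x1"] + sin(1:20)

# Gaussian deviance and RSS coincide.
Xd <- cbind(1, X[, 1, drop = FALSE])
rss <- sum(stats::.lm.fit(Xd, y)$residuals^2)
dev <- stats::glm.fit(Xd, y, family = stats::gaussian())$deviance
all.equal(rss, dev)

s_good <- bic_like(c(1, 0), y, X)
s_poor <- bic_like(c(0, 1), y, X)

# A non-Gaussian family goes through the deviance branch.
ybin <- rep(c(0, 1, 1, 0, 1), 4)
s_bin <- bic_like(c(1, 0), ybin, X, family = stats::binomial())
```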
Evaluates an information criterion (BIC, AIC, or AICc) for a regression of
y on a spline basis of x where the number and locations of
interior knots are encoded in the chromosome. Designed for use as a GA
objective/fitness function. The spline basis is constructed via splineX().

    varyknotsIC(
      knot_bin, plen = 0, y, x, x_unique, x_base = NULL,
      degree = 3L, type = c("ppolys", "ns", "bs"), intercept = TRUE,
      ic_method = "BIC"
    )

| Argument | Description |
|---|---|
| knot_bin | Integer vector (chromosome). Gene 1 stores m, the number of interior knots. Genes 2:(1+m) are indices into x_unique, followed by the sentinel length(x_unique) + 1. |
| plen | Unused placeholder kept for API compatibility; ignored. |
| y | Numeric response vector of length n. |
| x | Numeric predictor (same length as y). |
| x_unique | Optional numeric vector of unique candidate knot locations (e.g., sort(unique(x))). |
| x_base | Optional matrix (or vector) of additional covariates to include linearly alongside the spline basis; coerced to a matrix if supplied. |
| degree | Integer polynomial degree for type = "ppolys" and "bs" (default 3L); ignored for "ns". |
| type | One of "ppolys", "ns", or "bs". |
| intercept | Logical; forwarded to splineX(). Default TRUE. |
| ic_method | Which information criterion to return: "BIC" (default), "AIC", or "AICc". |

If m = 0, the model is a pure-linear baseline using only an intercept
and x_base: X <- cbind(1, x_base) (no spline terms).
For m > 0, the spline block is built with splineX() using the selected
interior knots, with X <- cbind(splineX(..., intercept=intercept), x_base).
The criteria are computed from the sample size $n$, the residual sum of squares
$\mathrm{SSRes}$, and $p$, the number of columns in the design matrix X.
A single numeric value: the requested information criterion (lower is
better). Returns Inf for invalid chromosomes/inputs.
This function allows m = 0 (no spline terms) so that the GA can
compare against a pure-linear baseline (intercept + x_base).
Spacing constraints (e.g., minimum distance between indices) should be
enforced by the GA operators or an external penalty.
See also: fixknotsIC, splineX, bs, ns

    ## Example with 'mcycle' data (MASS)
    # y <- mcycle$accel; x <- mcycle$times
    # x_unique <- sort(unique(x))
    # chrom <- c(5, 24, 30, 46, 49, 69, length(x_unique) + 1)
    # varyknotsIC(chrom, y=y, x=x, x_unique=x_unique,
    #             type="ppolys", degree=3, ic_method="BIC")

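
For orientation, the sketch below computes the three criteria from a design matrix using the common Gaussian forms BIC = n log(SSRes/n) + p log(n), AIC = n log(SSRes/n) + 2p, and AICc = AIC + 2p(p+1)/(n-p-1), where n is the sample size, SSRes the residual sum of squares, and p the number of design columns. These forms are an assumption: the package's exact expressions (additive constants, parameter counting) are not reproduced here, and ic_sketch is a hypothetical name.

```r
# Hypothetical IC computation using assumed textbook Gaussian forms.
ic_sketch <- function(y, X, ic_method = c("BIC", "AIC", "AICc")) {
  ic_method <- match.arg(ic_method)
  n <- length(y)
  p <- ncol(X)
  ssres <- sum(stats::lm.fit(X, y)$residuals^2)
  base <- n * log(ssres / n)
  switch(ic_method,
    BIC  = base + p * log(n),
    AIC  = base + 2 * p,
    AICc = base + 2 * p + 2 * p * (p + 1) / (n - p - 1)
  )
}

# m = 0 style baseline: intercept plus one linear covariate, no spline terms.
n <- 20
x_base <- as.numeric(seq_len(n))
y <- 1 + 0.5 * x_base + sin(x_base)
X0 <- cbind(1, x_base)

bic0  <- ic_sketch(y, X0, "BIC")
aic0  <- ic_sketch(y, X0, "AIC")
aicc0 <- ic_sketch(y, X0, "AICc")
c(BIC = bic0, AIC = aic0, AICc = aicc0)
```

Under these forms the criteria differ only in the penalty term, so BIC - AIC = p (log(n) - 2) regardless of the fit.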