Setup
Mixed, low-rank, and sparse multivariate regression (mixedLSR) provides tools for performing mixture regression when the coefficient matrix is low-rank and sparse. mixedLSR identifies subgroups by alternating optimization, with simulated annealing to encourage convergence to a global optimum. The method is data-adaptive: it performs parameter selection automatically to identify low-rank substructures in the coefficient matrix.
Simulate Data
To demonstrate mixedLSR, we simulate a heterogeneous population where the coefficient matrix is low-rank and sparse and the number of coefficients to estimate is much larger than the sample size.
library(mixedLSR)
sim <- simulate_lsr(N = 100, k = 2, p = 30, m = 35)
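To make the structure of such data concrete, the following base R sketch builds a two-component mixture in which each component has its own low-rank, row-sparse coefficient matrix. It is an illustration only and is not how `simulate_lsr` is implemented; the helper `make_low_rank_sparse`, the rank `r = 2`, the number of non-zero rows `s = 5`, and all `toy_*` names are assumptions chosen for this example.

```r
# Hypothetical helper, not part of mixedLSR: returns a p x m coefficient
# matrix of rank at most r in which only the first s rows are non-zero.
make_low_rank_sparse <- function(p, m, r, s) {
  u <- matrix(rnorm(s * r), nrow = s, ncol = r)
  v <- matrix(rnorm(m * r), nrow = m, ncol = r)
  a <- matrix(0, nrow = p, ncol = m)
  a[seq_len(s), ] <- u %*% t(v)  # product of thin factors has rank <= r
  a
}

set.seed(1)
N <- 100; k <- 2; p <- 30; m <- 35

# One coefficient matrix per mixture component (assumed rank and sparsity).
toy_a <- lapply(seq_len(k), function(j) make_low_rank_sparse(p, m, r = 2, s = 5))

# Latent component label for each observation.
toy_true <- sample(seq_len(k), N, replace = TRUE)

# Predictors, and responses generated from the component-specific model.
toy_x <- matrix(rnorm(N * p), nrow = N, ncol = p)
toy_y <- matrix(0, nrow = N, ncol = m)
for (j in seq_len(k)) {
  idx <- which(toy_true == j)
  noise <- matrix(rnorm(length(idx) * m), nrow = length(idx), ncol = m)
  toy_y[idx, ] <- toy_x[idx, , drop = FALSE] %*% toy_a[[j]] + noise
}

# p * m * k coefficients versus N observations.
c(coefficients = p * m * k, observations = N)
```

With these dimensions there are 2100 coefficients across the two components but only 100 observations, which is why the low-rank and sparsity assumptions matter.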
Compute Model
Next, we fit the model, limiting the number of iterations through the alt_iter, anneal_iter, and em_iter arguments.
model <- mixed_lsr(sim$x, sim$y, k = 2, alt_iter = 1, anneal_iter = 10, em_iter = 10, verbose = TRUE)
#> mixedLSR Start: 1
#> Selecting Lambda..................................................
#> EM Step.....
#> Simulated Annealing Step
#> Full Cycle 1
#> Computing Final Model...
#> Done!
Clustering Performance
Next, we can evaluate the clustering performance of mixedLSR by viewing a cross-tabulation of the partition labels and by computing the adjusted Rand index (ARI). In this case, mixedLSR perfectly clustered the data.
table(sim$true, model$assign)
#>
#>      1  2
#>   1 52  0
#>   2  0 48
ari <- mclust::adjustedRandIndex(sim$true, model$assign)
print(paste("ARI:",ari))
#> [1] "ARI: 1"
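The ARI depends only on the partition, not on how the clusters are numbered, so a fitted model whose labels come out swapped relative to the truth still scores 1. The base R sketch below computes the index from the contingency table using the standard formula; `adjusted_rand` is a hypothetical helper written for this illustration (it is not the mclust implementation and does not handle the degenerate case where both partitions have a single cluster), and the label vectors are made up to mirror the cluster sizes above.

```r
# Hypothetical helper: adjusted Rand index from the contingency table.
adjusted_rand <- function(a, b) {
  tab <- table(a, b)
  n <- sum(tab)
  sum_cells <- sum(choose(tab, 2))           # pairs placed together in both
  sum_rows <- sum(choose(rowSums(tab), 2))   # pairs placed together in a
  sum_cols <- sum(choose(colSums(tab), 2))   # pairs placed together in b
  expected <- sum_rows * sum_cols / choose(n, 2)
  max_index <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (max_index - expected)
}

truth <- rep(c(1, 2), times = c(52, 48))
swapped <- 3 - truth               # same partition, labels 1 and 2 exchanged
one_error <- truth
one_error[1] <- 2                  # move a single observation to the wrong cluster

adjusted_rand(truth, truth)        # 1
adjusted_rand(truth, swapped)      # 1: relabelling does not change the ARI
adjusted_rand(truth, one_error)    # slightly below 1
```

This is also why the cross-tabulation is worth reading alongside the ARI: a perfect clustering can appear on the anti-diagonal rather than the diagonal.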
Coefficient Heatmaps
Lastly, we can view a heatmap of the coefficient matrices and compare them to the true simulated matrices.
plot_lsr(model$a)
plot_lsr(sim$a)
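A heatmap comparison can be complemented with a few numbers. The sketch below, in base R, summarizes a coefficient matrix by its numerical rank and sparsity and measures how far an estimate is from the truth with a relative Frobenius error. The helpers `summarise_coef` and `rel_frobenius` are hypothetical, and the matrices are toy stand-ins: the layout of `model$a` and `sim$a` is not described here, so applying these helpers to them would require extracting one matrix per cluster and matching cluster labels first.

```r
# Hypothetical helper: numerical rank and sparsity of one coefficient matrix.
summarise_coef <- function(a, tol = 1e-8) {
  d <- svd(a)$d
  c(
    rank = sum(d > tol * max(d, 1)),            # singular values above tolerance
    nonzero_rows = sum(rowSums(abs(a)) > tol),  # predictors that are used at all
    nonzero_entries = sum(abs(a) > tol)
  )
}

# Hypothetical helper: relative Frobenius error of an estimate.
rel_frobenius <- function(est, truth) {
  norm(est - truth, type = "F") / norm(truth, type = "F")
}

set.seed(2)
# Toy "true" matrix: 30 x 35, rank 2, only the first 5 rows non-zero.
truth <- matrix(0, nrow = 30, ncol = 35)
truth[1:5, ] <- matrix(rnorm(5 * 2), 5, 2) %*% matrix(rnorm(2 * 35), 2, 35)

# Toy "estimate": the truth plus a small dense perturbation.
est <- truth + matrix(rnorm(30 * 35, sd = 0.01), 30, 35)

summarise_coef(truth)
summarise_coef(est)        # the dense noise makes this full rank and non-sparse
rel_frobenius(est, truth)
```

The second summary shows why a small overall error does not by itself imply that the low-rank and sparse structure was recovered, so both kinds of check are useful.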
Reproducibility
sessionInfo()
#> R version 4.2.2 (2022-10-31)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] mixedLSR_0.1.0
#>
#> loaded via a namespace (and not attached):
#> [1] highr_0.10 bslib_0.4.2 compiler_4.2.2 pillar_1.8.1
#> [5] jquerylib_0.1.4 tools_4.2.2 mclust_6.0.0 digest_0.6.31
#> [9] viridisLite_0.4.1 lattice_0.20-45 jsonlite_1.8.4 evaluate_0.20
#> [13] memoise_2.0.1 lifecycle_1.0.3 tibble_3.2.0 gtable_0.3.1
#> [17] pkgconfig_2.0.3 rlang_1.0.6 Matrix_1.5-1 cli_3.6.0
#> [21] yaml_2.3.7 pkgdown_2.0.7 xfun_0.37 fastmap_1.1.1
#> [25] withr_2.5.0 stringr_1.5.0 knitr_1.42 desc_1.4.2
#> [29] fs_1.6.1 vctrs_0.5.2 sass_0.4.5 systemfonts_1.0.4
#> [33] rprojroot_2.0.3 grid_4.2.2 glue_1.6.2 R6_2.5.1
#> [37] grpreg_3.4.0 textshaping_0.3.6 fansi_1.0.4 rmarkdown_2.20
#> [41] farver_2.1.1 purrr_1.0.1 ggplot2_3.4.1 magrittr_2.0.3
#> [45] scales_1.2.1 htmltools_0.5.4 MASS_7.3-58.1 colorspace_2.1-0
#> [49] labeling_0.4.2 ragg_1.2.5 utf8_1.2.3 stringi_1.7.12
#> [53] munsell_0.5.0 cachem_1.0.7