Setup

Mixed, low-rank, and sparse multivariate regression (mixedLSR) provides tools for mixture regression when the coefficient matrix is low-rank and sparse. mixedLSR identifies subgroups by alternating optimization, using simulated annealing to encourage convergence toward a global optimum. The method is data-adaptive: it performs parameter selection automatically to identify low-rank substructures in the coefficient matrix.

Simulate Data

To demonstrate mixedLSR, we simulate a heterogeneous population of N = 100 samples drawn from k = 2 subgroups. Each subgroup's coefficient matrix is low-rank and sparse, and the number of coefficients to estimate is much larger than the sample size.

sim <- simulate_lsr(N = 100, k = 2, p = 30, m = 35)
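To make the simulated setting concrete, the following base-R sketch generates a two-subgroup population whose coefficient matrices are low-rank and row-sparse. It is an illustration only and not the implementation of simulate_lsr: the helper make_lowrank_sparse, the choice of rank 2 with 10 active rows, the unit-variance noise, and the reading of p as the number of predictors and m as the number of responses are all assumptions.

```r
# Hypothetical helper, not part of mixedLSR: build a p x m coefficient matrix
# of the given rank whose rows outside `active` are exactly zero.
make_lowrank_sparse <- function(p, m, rank, active) {
  u <- matrix(0, nrow = p, ncol = rank)
  u[active, ] <- rnorm(length(active) * rank)
  v <- matrix(rnorm(m * rank), nrow = m, ncol = rank)
  u %*% t(v)
}

set.seed(1)
N <- 100; p <- 30; m <- 35; k <- 2

# Random subgroup labels and one coefficient matrix per subgroup
# (rank 2 and 10 active rows are assumed values for this sketch).
labels <- sample(seq_len(k), N, replace = TRUE)
coefs <- lapply(seq_len(k), function(j) {
  make_lowrank_sparse(p, m, rank = 2, active = sample(p, 10))
})

# Predictors, then responses generated from each sample's own subgroup.
x <- matrix(rnorm(N * p), nrow = N, ncol = p)
y <- matrix(0, nrow = N, ncol = m)
for (j in seq_len(k)) {
  idx <- which(labels == j)
  y[idx, ] <- x[idx, , drop = FALSE] %*% coefs[[j]] +
    matrix(rnorm(length(idx) * m), nrow = length(idx), ncol = m)
}

# Coefficients to estimate versus sample size.
c(coefficients = k * p * m, samples = N)
```

Under these assumptions there are 2 x 30 x 35 = 2100 coefficients for 100 samples, which is why the low-rank and sparse structure is needed to make estimation feasible.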

Compute Model

Next, we fit the model. The alt_iter, anneal_iter, and em_iter arguments limit the number of iterations the model can run.

model <- mixed_lsr(sim$x, sim$y, k = 2, alt_iter = 1, anneal_iter = 10, em_iter = 10, verbose = TRUE)
#> mixedLSR Start: 1 
#> Selecting Lambda..................................................
#> EM Step.....
#> Simulated Annealing Step
#> Full Cycle 1 
#> Computing Final Model...
#> Done!

Clustering Performance

Next, we evaluate the clustering performance of mixedLSR by cross-tabulating the true and estimated partition labels and by computing the adjusted Rand index (ARI). In this case, mixedLSR clusters the data perfectly: every sample falls on the diagonal of the table and the ARI equals 1.

table(sim$true, model$assign)
#>    
#>      1  2
#>   1 52  0
#>   2  0 48
ari <- mclust::adjustedRandIndex(sim$true, model$assign)
print(paste("ARI:", ari))
#> [1] "ARI: 1"
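To show what the ARI measures, here is a minimal base-R sketch that computes it directly from the cross-tabulation using the standard pair-counting formula. The function name adjusted_rand is hypothetical and the sketch is not the mclust implementation. It assumes at least one of the two partitions has more than one cluster, since otherwise the denominator is zero.

```r
# Hypothetical helper: adjusted Rand index from two label vectors, base R only.
adjusted_rand <- function(a, b) {
  tab <- table(a, b)
  n_pairs <- function(n) n * (n - 1) / 2   # same as choose(n, 2)
  both <- sum(n_pairs(tab))                # pairs grouped together in both partitions
  in_a <- sum(n_pairs(rowSums(tab)))       # pairs grouped together in a
  in_b <- sum(n_pairs(colSums(tab)))       # pairs grouped together in b
  expected <- in_a * in_b / n_pairs(length(a))  # agreement expected by chance
  (both - expected) / ((in_a + in_b) / 2 - expected)
}

# Labels with the same group sizes as the cross-tabulation above.
truth <- rep(c(1, 2), times = c(52, 48))
flipped <- 3 - truth  # same partition with labels 1 and 2 swapped

adjusted_rand(truth, truth)
adjusted_rand(truth, flipped)
```

Both calls return 1, because the index depends only on which samples are grouped together and not on the label names. This is why a perfect clustering scores 1 even when the estimated labels are a permutation of the true ones.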

Coefficient Heatmaps

Lastly, we view heatmaps of the estimated coefficient matrices (model$a) and compare them to the true simulated matrices (sim$a).

plot_lsr(model$a)

plot_lsr(sim$a)
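A visual comparison can be complemented by a few numbers. The sketch below defines a hypothetical helper, compare_coef, that reports the numerical rank of an estimate, its count of all-zero rows, and its relative Frobenius error against a reference matrix. It runs on stand-in matrices rather than on model$a and sim$a, because the source does not state how those objects are structured. If they are lists of matrices in matching subgroup order (an assumption), the helper could be applied to each pair.

```r
# Hypothetical helper: numeric summary comparing an estimate to a reference matrix.
compare_coef <- function(est, ref, tol = 1e-8) {
  sv <- svd(est)$d
  c(
    rank = sum(sv > tol * max(sv, 1)),           # numerical rank
    zero_rows = sum(rowSums(abs(est)) <= tol),   # rows that are entirely zero
    rel_error = norm(est - ref, type = "F") / norm(ref, type = "F")
  )
}

set.seed(2)
# Stand-in reference: a 30 x 35 matrix of rank 2 whose rows 11 to 30 are zero.
ref <- rbind(
  matrix(rnorm(10 * 2), 10, 2) %*% matrix(rnorm(2 * 35), 2, 35),
  matrix(0, 20, 35)
)
# Stand-in estimate: the reference shrunk by 10 percent.
est <- 0.9 * ref

compare_coef(est, ref)
```

The numerical rank depends on the tolerance tol, so a noisy estimate may report a higher rank than the heatmap suggests unless tol is raised accordingly.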

Reproducibility

sessionInfo()
#> R version 4.2.2 (2022-10-31)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] mixedLSR_0.1.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] highr_0.10        bslib_0.4.2       compiler_4.2.2    pillar_1.8.1     
#>  [5] jquerylib_0.1.4   tools_4.2.2       mclust_6.0.0      digest_0.6.31    
#>  [9] viridisLite_0.4.1 lattice_0.20-45   jsonlite_1.8.4    evaluate_0.20    
#> [13] memoise_2.0.1     lifecycle_1.0.3   tibble_3.2.0      gtable_0.3.1     
#> [17] pkgconfig_2.0.3   rlang_1.0.6       Matrix_1.5-1      cli_3.6.0        
#> [21] yaml_2.3.7        pkgdown_2.0.7     xfun_0.37         fastmap_1.1.1    
#> [25] withr_2.5.0       stringr_1.5.0     knitr_1.42        desc_1.4.2       
#> [29] fs_1.6.1          vctrs_0.5.2       sass_0.4.5        systemfonts_1.0.4
#> [33] rprojroot_2.0.3   grid_4.2.2        glue_1.6.2        R6_2.5.1         
#> [37] grpreg_3.4.0      textshaping_0.3.6 fansi_1.0.4       rmarkdown_2.20   
#> [41] farver_2.1.1      purrr_1.0.1       ggplot2_3.4.1     magrittr_2.0.3   
#> [45] scales_1.2.1      htmltools_0.5.4   MASS_7.3-58.1     colorspace_2.1-0 
#> [49] labeling_0.4.2    ragg_1.2.5        utf8_1.2.3        stringi_1.7.12   
#> [53] munsell_0.5.0     cachem_1.0.7