This repository contains the implementation of the Group Fused D-trace LASSO (GFDtL) estimator for identifying sparse precision matrices with breaks in time-series data. The estimator is obtained by solving the following optimization problem:
where
For further details on the problem and the algorithm, please refer to our paper:
@article{LPPT24GFDtL,
title={Break recovery in graphical networks with {D}-trace loss},
author={Ying Lin and Benjamin Poignard and Ting Kei Pong and Akiko Takeda},
year={2024},
eprint={2410.04057},
archivePrefix={arXiv},
primaryClass={math.ST},
url={https://arxiv.org/abs/2410.04057},
}
GFDtL
is licensed under the MIT License (see LICENSE).
-
Clone this repository.
git clone --recursive https://github.com/linyopt/GFDtL.git cd GFDtL
-
Use Matlab with version higher than R2023a to run the files with name following the format
runcode_*.m
.Note: The implementation relies on functions pagemtimes and pageeig that are introduced in R2020a and R2023a, respectively, and function argument validation that is introduced in R2019b.
-
Normally, you should be able to run the code directly if your matlab version is higher than R2023a.
If you find there are some dependence errors, please check requirements of three libraries in
GFGL/libs
. The folderGFGL
is for the competing algorithm in Regularized Estimation of Piecewise Constant Gaussian Graphical Models: The Group-Fused Graphical Lasso, which depends on several toolboxes. Although we have tried our best to remove most of dependence, there may still be some omissions.
- The folder
@GFDtL
is the implementation of our algorithm, which is organized as a class. Interested readers can read@GFDtL/GFDtL.m
for the declaration of properties/functions of the class, and read@GFDtL/run.m
for the main content of our ADMM algorithm. - The folder
simu
is the code for simulating data. Three main functions aresimu/DGP_Erdos_Renyi.m
,simu/DGP_iid.m
andsimu/DGP_banded.m
for generating Erdos Renyi dataset, iid dataset and banded dataset, respectively. For more details, please read Section 6.1 in our paper. - The folder
utils
contains some helper functions. The two fileshanning.m
andperform_thresholding.m
are used to remove the toolbox dependence of the competing algorithm. The folderboundedline
is used for plotting line with shaded error/confidence bounds, the credit goes for kakearney/boundedline-pkg. Please callhelp
for more details of these functions. - There are six runcode files:
runcode_simu_er.m
: The runcode for experiments on simulated datasets simulated according to Setting (i) in Section 6.1 of the paper.runcode_simu_banded.m
: The runcode for experiments on simulated datasets simulated according to Setting (ii) in Section 6.1 of the paper.runcode_simu_iid.m
: The runcode for experiments on simulated datasets simulated according to Setting (iii) in Section 6.1 of the paper.runcode_sensitivity_analysis.m
The runcode for empirical sensitivity analysis in Section 6.2 of the paper.runcode_computational_complexity.m
: The runcode for empirical computational complexity analysis in Section 6.3 of the paper.runcode_real.m
: The runcode for experiments on the real datasetSP500.mat
. See Section 7 in our paper for more details about this dataset.
addpath(genpath(pwd));
T = 100; % The sample size.
p = 10; % The dimension.
m = 1; % The number of breaks.
% Generate Erdos Renyi data.
% Y is the training dataset, Y_test is the testing dataset,
% Theta is the sequence of true precision matrices, breaks is the array of true breaks.
[Y, Y_test, Theta, breaks] = DGP_Erdos_Renyi(T, p, m);
% Declare GFDtL.
% Check `help GFDtL` for more details.
% Pass arguments by name, do NOT use positional arguments!
Est = GFDtL(Y=Y, lamb1=1, lamb2=50, lamb3=1, epsilon=0.01, beta_=21, disp_freq=1, maxiter=inf, tol=1e-3, tol_pcg=1e-2, tol_pcg_up=0.9);
% Run the algorithm.
Est.run
% Plot the difference of Theta.
% Check `help GFDtL.plot` for more details about the optional inputs.
Est.plot
% Estimate the breaks.
Est_breaks = Est.EstBreaks;
Let Est
be the GFDtL estimator.
Our algorithm can detect the possible unsolvability of the original problem, which may occur when lamb2 is too small. If this happens, Est.infeas
will be set to true.
The AIC/BIC values for the estimator are stored in Est.AIC
and Est.BIC
. These are computed by calling Est.GetICs
. If the algorithm terminates successfully, it will be called automatically, allowing you to access Est.AIC
and Est.BIC
directly. Otherwise, you can manually call Est.GetICs
to compute these values.
See documents of functions by calling help
in matlab for more details of the three DGP functions, GFDtL
, and methods of GFDtL
.