Projects
Selected Projects
2024
Best Subset Selection via a Modern Optimization Lens – A Review
Description
We reviewed the landmark work of Bertsimas et al. (2016), which proposes scalable Mixed Integer Optimization (MIO) techniques for best subset selection in linear regression. The approach recovers exact sparse solutions by formulating the ℓ₀-constrained least squares problem as a Mixed Integer Quadratic Optimization (MIQO) problem. We also explored their Discrete First Order (DFO) algorithm, which produces fast approximate solutions that serve as warm starts for the MIO solver, along with the theoretical guarantees on convergence and sparsity recovery.
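As a rough sketch of the formulation described above, written in standard notation rather than transcribed from the paper (the big-M bound \(M\) and the step constant \(L\) are assumptions introduced here for illustration), the ℓ₀-constrained problem and one common mixed integer reformulation are:

\[
\min_{\beta \in \mathbb{R}^p} \; \tfrac{1}{2}\lVert y - X\beta \rVert_2^2 \quad \text{subject to} \quad \lVert \beta \rVert_0 \le k
\]

\[
\min_{\beta,\, z} \; \tfrac{1}{2}\lVert y - X\beta \rVert_2^2 \quad \text{subject to} \quad -M z_i \le \beta_i \le M z_i, \quad z_i \in \{0, 1\}, \quad \sum_{i=1}^{p} z_i \le k
\]

A discrete first-order step then takes a gradient step on the least squares loss and keeps only the \(k\) entries that are largest in absolute value:

\[
\beta^{(m+1)} \in H_k\!\left(\beta^{(m)} - \tfrac{1}{L} X^\top \left(X\beta^{(m)} - y\right)\right), \qquad L \ge \lambda_{\max}(X^\top X)
\]

Here \(H_k(c)\) retains the \(k\) largest-magnitude coordinates of \(c\) and sets the rest to zero. The big-M form is only valid when \(M\) is at least as large as the largest optimal coefficient in absolute value.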
Regression | Subset Selection | Optimization
2023
Conformalized Survival Analysis – A Review
Description
In this project, we review the conformal inference methodology proposed by Candès et al. (2023), which can be wrapped around any survival prediction method to produce calibrated, covariate-dependent lower predictive bounds on survival times. We begin with conformal prediction for standard regression models and then adapt the same techniques to the survival analysis setting.
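The regression starting point can be illustrated with a minimal split-conformal sketch for a one-sided lower predictive bound. It assumes fully observed responses (no censoring, which is the complication the survival setting adds), and the toy data, the `predict` stand-in, and all function names are hypothetical.

```python
import math
import random


def conformal_lower_bound(predict, calib_x, calib_y, alpha):
    """Build x -> L(x) with P(Y >= L(X)) >= 1 - alpha on exchangeable data."""
    # One-sided score: how far the point prediction overshoots the true response.
    scores = sorted(predict(x) - y for x, y in zip(calib_x, calib_y))
    n = len(scores)
    # Finite-sample corrected rank of the calibration quantile.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: only the trivial bound is valid.
        return lambda x: -math.inf
    q = scores[rank - 1]
    return lambda x: predict(x) - q


def simulate(n, rng):
    """Toy data: y = 2x + Gaussian noise (hypothetical, for illustration only)."""
    xs = [rng.uniform(0.0, 5.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


rng = random.Random(0)
predict = lambda x: 2.0 * x  # stands in for a model fitted on a separate training split
calib_x, calib_y = simulate(500, rng)
test_x, test_y = simulate(2000, rng)

lower = conformal_lower_bound(predict, calib_x, calib_y, alpha=0.1)
coverage = sum(y >= lower(x) for x, y in zip(test_x, test_y)) / len(test_y)
print(f"empirical coverage of the lower bound: {coverage:.3f}")
```

The guarantee is marginal over the calibration and test draws, so the printed coverage should sit near 0.9 rather than hit it exactly.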
2022
Conditional Calibration for FDR Control Under Dependence – A Review
Description
Studied the theory and methodology proposed by Fithian and Lei (2022), which adaptively calibrates a separate rejection threshold for each p-value to control the overall false discovery rate (FDR) under different dependence structures.
Slides
Multivariate | Review
Sparse Principal Component Analysis and Its Generalization
Description
Principal Component Analysis is a widely used technique for dimension reduction. We discuss Sparse Principal Component Analysis (SPCA), which provides sparse loadings to resolve PCA’s interpretability issues. We also explore GAS-PCA, a generalization of SPCA that performs better in finite samples. Based on Zou et al. (2006) and Leng & Wang (2009).
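For orientation, the SPCA criterion of Zou et al. (2006) can be sketched as a regression-type problem with an elastic net penalty. The notation below is our recollection of the standard form, with \(\lambda\) the ridge penalty and \(\lambda_{1,j}\) the component-specific lasso penalties; it should be checked against the paper before being relied on.

\[
(\hat{A}, \hat{B}) = \arg\min_{A,\, B} \; \sum_{i=1}^{n} \lVert x_i - A B^\top x_i \rVert_2^2 + \lambda \sum_{j=1}^{k} \lVert \beta_j \rVert_2^2 + \sum_{j=1}^{k} \lambda_{1,j} \lVert \beta_j \rVert_1 \quad \text{subject to} \quad A^\top A = I_k
\]

Here \(B = [\beta_1, \ldots, \beta_k]\), and the sparse loadings are the normalized columns \(\hat{v}_j = \hat{\beta}_j / \lVert \hat{\beta}_j \rVert_2\). The \(\ell_1\) terms are what drive individual loadings exactly to zero.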
Bayesian Forecasting of UEFA Champions League
Description
We simulate the UEFA Champions League under alternative seeding schemes using a probabilistic Bayesian forecasting model. The aim is to evaluate the effect of seeding changes on tournament outcomes and team advancement. Based on Corona et al. (2019).
Fast and Robust Bootstrap for Regression Estimate Distribution
Description
This report discusses the fast and robust bootstrap for estimating the sampling distribution of robust regression estimates, specifically MM-estimators. Rather than re-solving a non-convex optimization problem for every bootstrap sample, the method solves a system of linear equations for each one, and it retains robustness through residual-based weights.
Report | Slides | Code
Robustness | Bootstrap | Regression
2021
Robust Variable Selection via Rank-Based LASSO
Description
We present Rank-LASSO, a robust variable selection method for high-dimensional regression settings (\(p \gg n\)), which improves over LAD-LASSO using thresholded score statistics. Based on Rejchel and Bogdan (2020), this approach performs well even with heavy-tailed errors.
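The core idea can be sketched in a few lines: replace the response by its centered ranks and run an ordinary LASSO on them. The version below is a simplified illustration only. It omits the thresholded variant mentioned above, the objective scaling \((1/2n)\lVert r - X\theta \rVert_2^2 + \lambda \lVert \theta \rVert_1\) and the value of `lam` are assumptions, and the coordinate descent solver and all names are hypothetical.

```python
import math
import random


def centered_ranks(y):
    """Map responses to R_i / n - 0.5, where R_i is the rank of y_i (1 = smallest)."""
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])
    ranks = [0.0] * n
    for position, i in enumerate(order, start=1):
        ranks[i] = position / n - 0.5
    return ranks


def soft_threshold(value, lam):
    """Shrink value toward zero by lam, returning exactly 0.0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, r, lam, n_sweeps=200):
    """Minimize (1 / 2n) * ||r - X theta||^2 + lam * ||theta||_1 by cyclic updates."""
    n, p = len(X), len(X[0])
    theta = [0.0] * p
    residual = list(r)  # residual = r - X theta, kept up to date
    col_scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of column j with the partial residual (theta_j removed).
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_scale[j] * theta[j]
            new = soft_threshold(rho, lam) / col_scale[j]
            if new != theta[j]:
                delta = new - theta[j]
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                theta[j] = new
    return theta


rng = random.Random(1)
n, p = 200, 5
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
# Only the first predictor matters; the noise is Cauchy, so it has no finite mean.
y = [3.0 * row[0] + math.tan(math.pi * (rng.random() - 0.5)) for row in X]

theta = lasso_coordinate_descent(X, centered_ranks(y), lam=0.08)
print([round(t, 3) for t in theta])
```

Because the ranks are bounded, a few extreme Cauchy draws cannot dominate the fit the way they would in a least squares LASSO on the raw response. The fitted coefficients identify the support, not the original regression coefficients.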
Nonparametric Modal Regression via KDE
Description
This project reviews nonparametric modal regression using kernel density estimation. We explore its consistency properties, prediction sets, and bandwidth selection strategies. Based on Chen et al. (2016).
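A minimal sketch of how a conditional mode can be located from a kernel density estimate: hold \(x\) fixed and run a mean-shift iteration in \(y\) only. It assumes Gaussian product kernels with hand-picked bandwidths `h_x` and `h_y`; the toy two-branch data and all names are hypothetical, and bandwidth selection is not addressed.

```python
import math
import random


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; constants cancel in the mean-shift ratio."""
    return math.exp(-0.5 * u * u)


def conditional_mode(x0, y_start, xs, ys, h_x, h_y, n_iter=200, tol=1e-8):
    """Climb to a local mode of the KDE in y, holding x fixed at x0."""
    y = y_start
    for _ in range(n_iter):
        weights = [
            gaussian_kernel((x0 - xi) / h_x) * gaussian_kernel((y - yi) / h_y)
            for xi, yi in zip(xs, ys)
        ]
        total = sum(weights)
        if total == 0.0:
            break  # no data close enough to (x0, y) to move anywhere
        y_next = sum(w * yi for w, yi in zip(weights, ys)) / total
        if abs(y_next - y) < tol:
            return y_next
        y = y_next
    return y


# Toy data with two branches, y near +2 and y near -2, at every x.
rng = random.Random(2)
xs, ys = [], []
for _ in range(300):
    x = rng.random()
    branch = 2.0 if rng.random() < 0.5 else -2.0
    xs.append(x)
    ys.append(branch + rng.gauss(0.0, 0.2))

upper = conditional_mode(0.5, 1.5, xs, ys, h_x=0.3, h_y=0.3)
lower = conditional_mode(0.5, -1.5, xs, ys, h_x=0.3, h_y=0.3)
print(round(upper, 2), round(lower, 2))
```

A conditional mean fitted to the same data would sit near zero, where there are no observations. Starting the iteration from several values of `y_start` recovers both branches, which is what makes modal regression useful for multimodal conditional distributions.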
Report | Slides | Code
Nonparametric | Regression | Review
2020
Study of Physicochemical Properties of Protein Tertiary Structure
Description
Modeled the RMSD (root-mean-square deviation) of protein tertiary structures using multiple linear regression with 9 features. We validated the model with diagnostics including multicollinearity tests, residual analysis, and variable selection.
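The description does not say which multicollinearity diagnostic was used. One common choice, shown here purely as an assumed example, is the variance inflation factor of the \(j\)-th feature:

\[
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\]

Here \(R_j^2\) is the coefficient of determination from regressing feature \(j\) on the remaining eight features. A value of 1 means the feature is uncorrelated with the others, and values above roughly 10 are often read as a rule-of-thumb warning sign.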
Report
Regression