S S Banerjee


Projects

Selected Projects

2024

Best Subset Selection via a Modern Optimization Lens – A Review

Description

We reviewed the landmark work of Bertsimas et al. (2016), which proposes scalable Mixed Integer Optimization (MIO) techniques for best subset selection in linear regression. The approach yields exact sparse solutions by formulating the ℓ₀-constrained least-squares problem as a Mixed Integer Quadratic Optimization (MIQO) problem. We also explored their Discrete First Order (DFO) algorithm, which produces fast approximate solutions that can serve as warm starts, along with theoretical guarantees on convergence and sparsity recovery.

Regression, Subset Selection, Optimization
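
The discrete first-order idea mentioned in this project can be illustrated with a minimal sketch: take a gradient step on the least-squares loss, then keep only the k largest coefficients. This is an illustration under stated assumptions, not the authors' implementation. The function names, the toy data, the fixed iteration count, and the use of trace(XᵀX) as a conservative bound on the gradient's Lipschitz constant are all choices made here.

```python
def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and set the rest to zero."""
    keep = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:k]
    out = [0.0] * len(v)
    for j in keep:
        out[j] = v[j]
    return out


def dfo_best_subset(X, y, k, n_iter=200):
    """Projected gradient descent for min 0.5*||y - X b||^2 s.t. ||b||_0 <= k.

    X is a list of rows, y a list of responses (hypothetical plain-list API).
    """
    n, p = len(X), len(X[0])
    # Assumption: trace(X'X) is used as the step-size constant L. It is an
    # upper bound on the largest eigenvalue of X'X, so 1/L is a safe step.
    L = sum(x * x for row in X for x in row)
    beta = [0.0] * p
    for _ in range(n_iter):
        resid = [sum(X[i][j] * beta[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        beta = hard_threshold([beta[j] - grad[j] / L for j in range(p)], k)
    return beta


# Toy data: three orthogonal columns, response generated by the second one only.
X = [[1.0, 1.0, 1.0],
     [1.0, -1.0, 1.0],
     [1.0, 1.0, -1.0],
     [1.0, -1.0, -1.0]]
y = [3.0, -3.0, 3.0, -3.0]
beta = dfo_best_subset(X, y, k=1)
print(beta)
```

On this toy problem the sketch keeps a single nonzero coefficient. In the reviewed work, solutions of this kind are used as warm starts for the MIO solver, which is what certifies optimality.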


2023

Conformalized Survival Analysis – A Review

Description

In this project we review the conformal prediction methodology of Candès et al. (2023), which can be combined with any survival prediction method to produce calibrated, covariate-dependent lower predictive bounds on survival times. We begin with conformal prediction for basic regression models and then adapt similar techniques to the survival analysis setting.

Report, Slides
Survival, Review
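
As a starting point for the regression case mentioned in this project, a one-sided split conformal bound can be sketched as follows. This is a minimal sketch that assumes fully observed responses and ignores censoring entirely, so it is not the survival procedure itself. The function name, the `predict` callable, and the toy calibration data are hypothetical.

```python
import math


def conformal_lower_bound(predict, calib_x, calib_y, alpha=0.1):
    """Return a function x -> lower predictive bound with target coverage 1 - alpha.

    predict: any fitted point predictor (assumed to be trained on separate data).
    calib_x, calib_y: held-out calibration pairs, assumed exchangeable with test data.
    """
    # Score = how far the prediction overshoots the observed response.
    scores = sorted(predict(x) - y for x, y in zip(calib_x, calib_y))
    n = len(scores)
    rank = math.ceil((1 - alpha) * (n + 1))
    # With too few calibration points the only valid bound is trivial.
    margin = math.inf if rank > n else scores[rank - 1]
    return lambda x: predict(x) - margin


# Toy usage: a predictor that overshoots by 1, 2, ..., 9 on the calibration set.
predict = lambda x: 2 * x
calib_x = list(range(1, 10))
calib_y = [2 * x - i for i, x in enumerate(calib_x, start=1)]
lower = conformal_lower_bound(predict, calib_x, calib_y, alpha=0.5)
print(lower(10))
```

The bound subtracts an empirical quantile of the overshoot from the point prediction, so it inherits whatever covariate dependence the underlying predictor has.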


2022

Conditional Calibration for FDR Control Under Dependence – A Review

Description

We studied the theory and methodology of Fithian and Lei (2022), which adaptively calibrates a separate rejection threshold for each p-value so that the overall false discovery rate (FDR) is controlled under different dependence structures.

Slides
Multivariate, Review

Sparse Principal Components Analysis and Its Generalization

Description

Principal Component Analysis (PCA) is a widely used technique for dimension reduction. We discuss Sparse Principal Component Analysis (SPCA), which produces sparse loadings to address PCA’s interpretability issues. We also explore GAS-PCA, a generalization of SPCA that performs better in finite samples. Based on Zou et al. (2006) and Leng & Wang (2009).

Report, Slides, Code
Multivariate, Regression, Review

Bayesian Forecasting of UEFA Champions League

Description

We simulate the UEFA Champions League under alternative seeding schemes using a probabilistic Bayesian forecasting model. The aim is to evaluate the effect of seeding changes on tournament outcomes and team advancement. Based on Corona et al. (2019).

Report, Slides, Code
Bayesian, Sports Analytics

Fast and Robust Bootstrap for Regression Estimate Distribution

Description

This report discusses the Fast and Robust Bootstrap for estimating the distribution of robust regression estimates based on MM-estimators. It avoids re-solving a non-convex optimization problem for each bootstrap sample and instead solves a system of linear equations, with robustness coming from residual-based weights.

Report, Slides, Code
Robustness, Bootstrap, Regression


2021

Robust Variable Selection via Rank-Based LASSO

Description

We present Rank-LASSO, a robust variable selection method for high-dimensional regression settings (\(p \gg n\)), which improves over LAD-LASSO using thresholded score statistics. Based on Rejchel and Bogdan (2020), this approach performs well even with heavy-tailed errors.

Report, Slides, Code
Regression, High-Dimensional, Robustness, Review

Nonparametric Modal Regression via KDE

Description

This project reviews nonparametric modal regression using kernel density estimation. We explore its consistency properties, prediction sets, and bandwidth selection strategies. Based on Chen et al. (2016).

Report, Slides, Code
Nonparametric, Regression, Review
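
One common way to compute KDE-based conditional modes is a partial mean-shift iteration, which moves a starting value of y uphill on the estimated joint density while x stays fixed. The sketch below assumes Gaussian kernels and a single shared bandwidth for x and y; the function names and toy data are hypothetical, and this is an illustration rather than the project's code.

```python
import math


def gauss(u):
    """Unnormalized Gaussian kernel; the constant cancels in the update below."""
    return math.exp(-0.5 * u * u)


def conditional_mode(x0, y0, xs, ys, h=1.0, n_iter=100):
    """Partial mean shift: iterate y toward a local mode of the KDE at x = x0.

    Assumes the starting value y0 is close enough to the data that the
    kernel weights do not all underflow to zero.
    """
    y = y0
    for _ in range(n_iter):
        w = [gauss((x0 - xi) / h) * gauss((y - yi) / h) for xi, yi in zip(xs, ys)]
        y = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    return y


# Toy data with two response branches near x = 0: one around 0, one around 10.
xs = [0.0, 0.1, -0.1, 0.0, 0.1, -0.1]
ys = [0.0, 0.1, -0.1, 10.0, 10.1, 9.9]
upper = conditional_mode(0.0, 9.0, xs, ys)
lower = conditional_mode(0.0, 1.0, xs, ys)
print(upper, lower)
```

Different starting values converge to different local modes, which is why modal regression can return several branches at the same x where mean regression would return one averaged value.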


2020

Study of Physicochemical Properties of Protein Tertiary Structure

Description

We modeled RMSD (root-mean-square deviation of protein tertiary structure) using multiple linear regression with 9 features, and performed diagnostics including multicollinearity tests, residual analysis, and variable selection to validate the model.

Report
Regression
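
A standard multicollinearity diagnostic is the variance inflation factor (VIF), obtained by regressing each feature on all the others. Whether VIF was the diagnostic used in this project is an assumption; the sketch below only illustrates the idea in plain Python, with hypothetical function names and a three-column toy matrix rather than the 9 features of the actual data.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def r_squared(Z, t):
    """R^2 from an OLS fit of t on the columns of Z plus an intercept."""
    n = len(t)
    D = [[1.0] + list(row) for row in Z]
    p = len(D[0])
    A = [[sum(D[i][a] * D[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    rhs = [sum(D[i][a] * t[i] for i in range(n)) for a in range(p)]
    coef = solve(A, rhs)
    fitted = [sum(c * d for c, d in zip(coef, row)) for row in D]
    mean_t = sum(t) / n
    ss_res = sum((ti - fi) ** 2 for ti, fi in zip(t, fitted))
    ss_tot = sum((ti - mean_t) ** 2 for ti in t)
    return 1.0 - ss_res / ss_tot


def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the rest."""
    p = len(X[0])
    out = []
    for j in range(p):
        target = [row[j] for row in X]
        others = [[row[k] for k in range(p) if k != j] for row in X]
        out.append(1.0 / (1.0 - r_squared(others, target)))
    return out


# Toy matrix: the first two columns are nearly identical, the third is not.
X = [[1.0, 1.1, 1.0], [2.0, 2.1, -1.0], [3.0, 2.8, 1.0],
     [4.0, 3.8, -1.0], [5.0, 5.1, 1.0], [6.0, 6.1, -1.0]]
v = vif(X)
print(v)
```

A frequently quoted rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic collinearity, although the threshold is a convention rather than a test.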

Shubha Sankar Banerjee ©
