Á. Cía Mina, J. López Fidalgo
Subsampling is commonly employed to improve computational efficiency in regression models. However, existing methods focus primarily on minimizing parameter estimation error, whereas the main practical goal of a statistical model is often to minimize prediction error. This study introduces a novel approach to selecting subdata for linear models that takes the distribution of the covariates into account. Our method specifically addresses large-sample scenarios in which obtaining labels for the response variable is costly. We introduce the "J-optimality" criterion, which is supported by theoretical justification and aligned with standard linear optimality criteria. We also explore sequential selection. As the theory predicts, our method reduces the prediction mean squared error compared with existing methods. Through simulations, we present empirical evidence of the performance and potential of our approach for improving prediction accuracy.
Keywords: Subsampling, Active Learning, Random-X Regression
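To make the prediction-oriented idea concrete, the following is a minimal sketch and not the authors' J-optimality criterion or algorithm, which the abstract does not define. It assumes a homoscedastic linear model fitted by ordinary least squares on the labelled rows, and uses the standard fact that the variance of the fitted mean, averaged over the covariate sample, is proportional to tr(M_sub^{-1} S), where M_sub is the information matrix of the subsample and S is the empirical second-moment matrix of all covariates. The function names, the greedy forward selection, and the `ridge` parameter are hypothetical choices made for illustration only.

```python
import numpy as np


def integrated_prediction_variance(X_sub, X_full):
    """Return tr(M_sub^{-1} S), with M_sub = X_sub' X_sub and S = X_full' X_full / n.

    For OLS fitted on the labelled rows X_sub with unit noise variance, this is
    the variance of the fitted mean x' beta_hat averaged over the rows of X_full.
    """
    M_sub = X_sub.T @ X_sub
    S = X_full.T @ X_full / X_full.shape[0]
    return float(np.trace(np.linalg.solve(M_sub, S)))


def greedy_subsample(X, k, ridge=1e-6):
    """Pick k row indices of X by greedy forward selection (illustrative heuristic).

    Each step adds the unused row x with the largest rank-one reduction of the
    criterion, (x' M^{-1} S M^{-1} x) / (1 + x' M^{-1} x), which follows from
    the Sherman-Morrison formula. The small ridge (an assumption of this
    sketch) keeps M invertible before p rows have been selected.
    """
    n, p = X.shape
    S = X.T @ X / n
    M = ridge * np.eye(p)
    available = np.ones(n, dtype=bool)
    chosen = []
    for _ in range(k):
        M_inv = np.linalg.inv(M)
        A = M_inv @ S @ M_inv
        num = np.einsum("ij,jk,ik->i", X, A, X)             # x' A x for every row
        den = 1.0 + np.einsum("ij,jk,ik->i", X, M_inv, X)   # 1 + x' M^{-1} x
        gain = np.where(available, num / den, -np.inf)      # never reselect a row
        best = int(np.argmax(gain))
        chosen.append(best)
        available[best] = False
        M += np.outer(X[best], X[best])
    return np.array(chosen)


# Simulated covariates (assumed data): intercept plus two Gaussian covariates.
rng = np.random.default_rng(0)
n, k = 2000, 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])

greedy_idx = greedy_subsample(X, k)
random_idx = rng.choice(n, size=k, replace=False)
print("greedy:", integrated_prediction_variance(X[greedy_idx], X))
print("random:", integrated_prediction_variance(X[random_idx], X))
```

Only the covariates are used to choose the rows, which matches the setting where responses are costly to label: the response would be collected only for the selected indices. Comparing the two printed values gives a rough sense of how much a covariate-aware selection can matter relative to uniform subsampling under these assumptions.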
GT06.DEX1 Invited Session
November 7, 2023 6:40 PM
CC4: Room 2