Á. Cía Mina, J. López Fidalgo

Subsampling is commonly employed to improve computational efficiency in regression models. However, existing methods primarily focus on minimizing errors in parameter estimation, whereas the main practical goal of statistical models often lies in minimizing prediction error. This study introduces a novel approach to selecting subdata for linear models that takes into account the distribution of the covariates. Our method specifically addresses large-sample scenarios where obtaining labels for the response variable is costly. We introduce the "J-optimality" criterion, supported by theoretical justification and aligned with standard linear optimality criteria. We also explore sequential selection. As the theory predicts, our method reduces the prediction mean squared error compared to existing methods. Through simulations, we present empirical evidence of the performance and potential of our approach in enhancing prediction accuracy.
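To illustrate the general idea of prediction-oriented subdata selection, the sketch below greedily picks rows from a large unlabeled covariate pool so as to reduce the average prediction variance tr(A M⁻¹), where A estimates E[xxᵀ] from the pool and M is the information matrix of the subsample. This is a generic integrated-variance heuristic written for illustration only; it is not the paper's J-optimality algorithm, and the function and variable names are our own.

```python
import numpy as np

def select_subdata(X_pool, n, seed=None):
    """Greedily select n rows of X_pool to (approximately) minimize
    tr(A M^{-1}), with A = (1/N) X^T X from the pool and M = X_s^T X_s
    from the subsample. Illustrative heuristic, not the paper's method."""
    rng = np.random.default_rng(seed)
    N, p = X_pool.shape
    A = X_pool.T @ X_pool / N  # estimate of E[x x^T] over the covariate pool

    # Start from a random nonsingular seed of p points.
    idx = list(rng.choice(N, size=p, replace=False))
    M = X_pool[idx].T @ X_pool[idx]
    while np.linalg.matrix_rank(M) < p:
        idx = list(rng.choice(N, size=p, replace=False))
        M = X_pool[idx].T @ X_pool[idx]

    for _ in range(n - p):
        Minv = np.linalg.inv(M)
        # By Sherman-Morrison, adding x reduces tr(A M^{-1}) by
        # (x^T Minv A Minv x) / (1 + x^T Minv x); pick the best candidate.
        Mx = X_pool @ Minv                                   # rows: x_i^T Minv
        num = np.einsum('ij,jk,ik->i', Mx, A, Mx)            # x^T Minv A Minv x
        den = 1.0 + np.einsum('ij,ij->i', Mx, X_pool)        # 1 + x^T Minv x
        gains = num / den
        gains[idx] = -np.inf                                 # no reselection
        j = int(np.argmax(gains))
        idx.append(j)
        M += np.outer(X_pool[j], X_pool[j])
    return np.array(idx)
```

In a labeling-costly setting, one would query the response only at the selected rows and fit the linear model on that subsample; the rank-one updates keep the greedy pass at O(Np) work per added point.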

Keywords: Subsampling, Active Learning, Random-X Regression


GT06.DEX1 Invited Session
November 7, 2023  6:40 PM
CC4: Room 2

Other papers in the same session

Green Algorithms Using Response Surface Analysis

H. Grass Boada, J. López Fidalgo, E. Benitez, C. De La Calle Arroyo

Optimal design in clonogenicity assays.

M. J. Rivas Lopez, J. M. Rodríguez Díaz
