Á. Cía Mina, J. López Fidalgo

Subsampling is commonly employed to improve computational efficiency in regression models. However, existing methods primarily focus on minimizing errors in parameter estimation, whereas the main practical goal of statistical models often lies in minimizing prediction error. This study introduces a novel approach to selecting subdata for linear models that takes into account the distribution of the covariates. Our method specifically addresses large-sample scenarios where obtaining labels for the response variable is costly. We introduce the "J-optimality" criterion, supported by theoretical justification and aligned with standard linear optimality criteria. We also explore sequential selection. As the theory predicts, our method reduces the prediction mean squared error compared to existing methods. Through simulations, we present empirical evidence of the performance and potential of our approach in enhancing prediction accuracy.
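To illustrate the general idea of prediction-oriented subdata selection, the sketch below greedily picks rows from a large unlabeled covariate pool so as to reduce the average prediction variance tr(A M⁻¹), where A estimates E[xxᵀ] from the pool and M is the information matrix of the subsample. This is a generic integrated-variance heuristic written for illustration only; it is not the paper's J-optimality algorithm, and the function and variable names are our own.

```python
import numpy as np

def select_subdata(X_pool, n, seed=None):
    """Greedily select n rows of X_pool to (approximately) minimize
    tr(A M^{-1}), with A = (1/N) X^T X from the pool and M = X_s^T X_s
    from the subsample. Illustrative heuristic, not the paper's method."""
    rng = np.random.default_rng(seed)
    N, p = X_pool.shape
    A = X_pool.T @ X_pool / N  # estimate of E[x x^T] over the covariate pool

    # Start from a random nonsingular seed of p points.
    idx = list(rng.choice(N, size=p, replace=False))
    M = X_pool[idx].T @ X_pool[idx]
    while np.linalg.matrix_rank(M) < p:
        idx = list(rng.choice(N, size=p, replace=False))
        M = X_pool[idx].T @ X_pool[idx]

    for _ in range(n - p):
        Minv = np.linalg.inv(M)
        # By Sherman-Morrison, adding x reduces tr(A M^{-1}) by
        # (x^T Minv A Minv x) / (1 + x^T Minv x); pick the best candidate.
        Mx = X_pool @ Minv                                   # rows: x_i^T Minv
        num = np.einsum('ij,jk,ik->i', Mx, A, Mx)            # x^T Minv A Minv x
        den = 1.0 + np.einsum('ij,ij->i', Mx, X_pool)        # 1 + x^T Minv x
        gains = num / den
        gains[idx] = -np.inf                                 # no reselection
        j = int(np.argmax(gains))
        idx.append(j)
        M += np.outer(X_pool[j], X_pool[j])
    return np.array(idx)
```

In a labeling-costly setting, one would query the response only at the selected rows and fit the linear model on that subsample; the rank-one updates keep the greedy pass at O(Np) work per added point.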

Keywords: Subsampling, Active Learning, Random-X Regression


GT06.DEX1 Invited Session
November 7, 2023  6:40 PM
CC4: Room 2

Other papers in the same session

Green Algorithms Using Response Surface Analysis

H. Grass Boada, J. López Fidalgo, E. Benitez, C. De La Calle Arroyo

Optimal design in clonogenicity assays.

M. J. Rivas Lopez, J. M. Rodríguez Díaz
