P. Delicado, C. Pachón García
We present a set of algorithms implementing multidimensional scaling (MDS) for large data sets with n individuals. When n is large, classical MDS algorithms are unaffordable because of their extremely large memory and time requirements. We overcome these difficulties with three non-standard algorithms built on the central idea of partitioning the data set into small pieces on which classical MDS methods can work. To check the performance of the algorithms and to compare them, we carried out a simulation study. Additionally, we used the algorithms to obtain an MDS configuration for EMNIST, a real large data set with more than 800000 points. We conclude that all three algorithms are appropriate for obtaining an MDS configuration, but we recommend using either of the two new proposals, since they are fast algorithms with satisfactory statistical properties when working with big data. An R package implementing the algorithms has been created.
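The partition-and-align idea can be illustrated with a minimal Python sketch. It is one possible reading of the approach, not the authors' algorithms and not the API of their R package. The function names (`classical_mds`, `procrustes_align`, `divide_conquer_mds`) and the parameters `part_size` and `n_shared` are hypothetical, and Euclidean distances between the rows of `X` are assumed. A small set of shared points is added to every block, classical MDS is run on each block separately, and each block is then mapped onto the first one by a Procrustes transformation estimated from the shared points.

```python
import numpy as np


def classical_mds(X, k):
    """Classical (Torgerson) MDS on the Euclidean distances between rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # squared distances
    n = d2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n                # centering matrix
    B = -0.5 * J @ d2 @ J                              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:k]                   # k largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


def procrustes_align(source, target):
    """Orthogonal matrix R and centroids such that
    (source - mu_s) @ R + mu_t is as close as possible to target."""
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    U, _, Vt = np.linalg.svd((source - mu_s).T @ (target - mu_t))
    return U @ Vt, mu_s, mu_t


def divide_conquer_mds(X, k, part_size, n_shared, seed=0):
    """Hypothetical divide-and-conquer MDS: only blocks of at most
    n_shared + part_size points are ever passed to classical MDS."""
    n = X.shape[0]
    if not 0 < n_shared < n:
        raise ValueError("n_shared must be between 1 and n - 1")
    perm = np.random.default_rng(seed).permutation(n)
    shared, rest = perm[:n_shared], perm[n_shared:]
    out = np.empty((n, k))
    ref_shared = None
    for start in range(0, len(rest), part_size):
        block = rest[start:start + part_size]
        conf = classical_mds(X[np.concatenate([shared, block])], k)
        if ref_shared is None:
            # The first block fixes the reference coordinate system.
            ref_shared = conf[:n_shared]
            out[shared] = ref_shared
            out[block] = conf[n_shared:]
        else:
            # Later blocks are mapped onto the reference via the shared points.
            R, mu_s, mu_t = procrustes_align(conf[:n_shared], ref_shared)
            out[block] = (conf[n_shared:] - mu_s) @ R + mu_t
    return out
```

In this sketch the largest distance matrix ever built has `n_shared + part_size` rows, so memory grows with the block size rather than with n. The number of shared points should be larger than the target dimension `k` so that the Procrustes transformation is well determined.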
Keywords: Computational efficiency, Divide and conquer, Gower’s interpolation formula, Landmark MDS, Procrustes transformation
Scheduled
GT18.SOFTW1 Invited Session
November 7, 2023, 16:50
HC1: Sala Canónigos 1