Multidimensional scaling for big data

P. Delicado, C. Pachón García

We present a set of algorithms implementing multidimensional scaling (MDS) for large data sets with n individuals. When n is large, classical MDS algorithms are unaffordable because of their extremely large memory and time requirements. We overcome these difficulties by means of three non-standard algorithms based on the central idea of partitioning the data set into small pieces on which classical MDS methods can work. To check the performance of the algorithms and to compare them, we have carried out a simulation study. Additionally, we have used the algorithms to obtain an MDS configuration for EMNIST, a real large data set with more than 800000 points. We conclude that all three algorithms are appropriate for obtaining an MDS configuration, but we recommend using either of the two new proposals, since they are fast algorithms with satisfactory statistical properties when working with big data. An R package implementing the algorithms has been created.

Keywords: Computational efficiency; Divide and conquer; Gower’s interpolation formula; Landmark MDS; Procrustes transformation
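
The partitioning idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' R package and not necessarily any of their three algorithms: it is a simplified divide-and-conquer variant under stated assumptions. The function names (classical_mds, procrustes_align, divide_conquer_mds) and the parameters block_size and n_anchor are hypothetical, distances are assumed Euclidean, and every block is assumed to share a common set of anchor points that is used to align its configuration with the first block through an orthogonal Procrustes transformation.

```python
import numpy as np


def classical_mds(points, k):
    """Classical (Torgerson) MDS from the Euclidean distances between rows."""
    sq = np.sum(points ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T  # squared distances
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ d2 @ centering  # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(b)  # eigenvalues come back in ascending order
    top = np.argsort(vals)[::-1][:k]  # indices of the k largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


def procrustes_align(source, target):
    """Orthogonal matrix and shift such that source @ rot + shift ~ target."""
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    u, _, vt = np.linalg.svd((source - mu_s).T @ (target - mu_t))
    rot = u @ vt
    return rot, mu_t - mu_s @ rot


def divide_conquer_mds(points, k, block_size, n_anchor, seed=0):
    """Assumed scheme: per-block classical MDS, aligned through shared anchors."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    perm = rng.permutation(n)
    anchors, rest = perm[:n_anchor], perm[n_anchor:]
    n_blocks = max(1, int(np.ceil(len(rest) / block_size)))
    result = np.empty((n, k))
    reference = None
    for block in np.array_split(rest, n_blocks):
        idx = np.concatenate([anchors, block])
        # Only a (n_anchor + block) x (n_anchor + block) matrix is ever formed.
        config = classical_mds(points[idx], k)
        if reference is None:
            # The first block fixes the coordinate system, anchors included.
            reference = config[:n_anchor]
            result[idx] = config
        else:
            # Map this block into the first block's coordinates via the anchors.
            rot, shift = procrustes_align(config[:n_anchor], reference)
            result[block] = config[n_anchor:] @ rot + shift
    return result
```

A call such as divide_conquer_mds(X, k=2, block_size=1000, n_anchor=20) returns an n × 2 configuration whose rows follow the order of X. In this sketch the largest matrix held in memory has side block_size + n_anchor rather than n, which is the point of the partitioning. The anchors must number at least k + 1 and be in general position for the alignment to be well determined.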

Scheduled

GT18.SOFTW1 Invited Session

November 7, 2023 4:50 PM

HC1: Canónigos Room 1

Other papers in the same session

Hostility measure: a multi-perspective of data complexity

C. Lancho Martín

SurvLIMEpy: A Python package implementing SurvLIME

C. Pachón García, C. Hernández-Pérez, P. Delicado, V. Vilaplana

Selección de variables en Análisis Envolvente de Datos: el paquete adea (Variable selection in Data Envelopment Analysis: the adea package)

F. Fernández Palacín, M. Muñoz Márquez
