Multidimensional scaling for big data

P. Delicado, C. Pachón García

We present a set of algorithms implementing multidimensional scaling (MDS) for large data sets with n individuals. When n is large, classical MDS algorithms are unaffordable because of their extremely large memory and time requirements. We overcome these difficulties by means of three non-standard algorithms based on the central idea of partitioning the data set into small pieces on which classical MDS methods can work. To check the performance of the algorithms and to compare them, we have carried out a simulation study. Additionally, we have used the algorithms to obtain an MDS configuration for EMNIST, a real large data set with more than 800,000 points. We conclude that all three algorithms are appropriate for obtaining an MDS configuration, but we recommend using either of the two new proposals, since they are fast algorithms with satisfactory statistical properties when working with big data. An R package implementing the algorithms has been created.
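The partitioning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' R package or any of their three algorithms as published; it is a generic divide-and-conquer scheme written in Python with NumPy, assuming Euclidean distances and a shared set of anchor points that is used to align each partial configuration with a Procrustes transformation. The names `classical_mds`, `procrustes_fit`, `divide_conquer_mds`, `part_size` and `n_anchor` are hypothetical.

```python
import numpy as np

def classical_mds(X, k):
    """Classical (Torgerson) MDS from Euclidean distances between rows of X."""
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared distances
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    B = -0.5 * J @ D2 @ J                            # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:k]                 # k largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

def procrustes_fit(A, B):
    """Orthogonal matrix R and shift t minimizing ||A @ R + t - B|| (least squares)."""
    mu_a, mu_b = A.mean(axis=0), B.mean(axis=0)
    U, _, Vt = np.linalg.svd((A - mu_a).T @ (B - mu_b))
    R = U @ Vt
    return R, mu_b - mu_a @ R

def divide_conquer_mds(X, k, part_size, n_anchor, seed=0):
    """Assumed scheme: run classical MDS on each partition plus common anchors,
    then map every partial configuration onto the first one via the anchors."""
    n = X.shape[0]
    if n <= n_anchor:
        raise ValueError("need more points than anchors")
    perm = np.random.default_rng(seed).permutation(n)
    anchors, rest = perm[:n_anchor], perm[n_anchor:]
    out = np.empty((n, k))
    ref = None
    for start in range(0, len(rest), part_size):
        part = rest[start:start + part_size]
        Y = classical_mds(X[np.concatenate([anchors, part])], k)
        if ref is None:                  # first partition fixes the reference frame
            ref = Y[:n_anchor]
            out[anchors] = ref
        R, t = procrustes_fit(Y[:n_anchor], ref)
        out[part] = Y[n_anchor:] @ R + t
    return out

# Usage: each call to classical_mds only sees n_anchor + part_size points.
X = np.random.default_rng(1).normal(size=(300, 3))
config = divide_conquer_mds(X, k=3, part_size=100, n_anchor=20)
```

In this sketch the largest matrix ever decomposed has `n_anchor + part_size` rows, which is what keeps memory and time bounded as n grows; the number of anchors should be at least as large as the target dimension k for the alignment to be well determined.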

Keywords: Computational efficiency; Divide and conquer; Gower’s interpolation formula; Landmark MDS; Procrustes transformation

Scheduled

GT18.SOFTW1 Invited Session

November 7, 2023, 16:50

HC1: Sala Canónigos 1

Other papers in the same session

Hostility measure: a multi-perspective of data complexity

C. Lancho Martín

SurvLIMEpy: A Python package implementing SurvLIME

C. Pachón García, C. Hernández-Pérez, P. Delicado, V. Vilaplana

Variable selection in Data Envelopment Analysis: the adea package

F. Fernández Palacín, M. Muñoz Márquez
