M. Comas Cufi, J. Palarea Albaladejo, J. A. Martín Fernández, G. Mateu Figueras
Compositional data analysis (CoDA) of multivariate count data has grown in popularity in recent years, particularly in the molecular biosciences. This approach recognises that the observed abundances represent only a fraction of the actual abundances in the studied environment. However, sparsity is a common issue, with data sets often containing over 70% zero entries. Crucially, zeros prevent the computation of both logarithms (common in ordinary analysis) and logratios (used in CoDA), which has motivated a variety of workarounds. We present a zero replacement method based on the logratio-normal-multinomial distribution, which compounds the logratio-normal and multinomial distributions. It offers a flexible, model-based alternative to common, often oversimplistic, practices. However, estimating the model parameters is computationally demanding. Different formulations that make the method feasible in practice, especially in high-dimensional contexts, are discussed and compared by simulation.
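As a rough illustration of the compound model named in the abstract, the sketch below simulates one count vector by drawing additive-logratio (alr) coordinates from a normal distribution, mapping them to multinomial probabilities, and then sampling counts. It is not the authors' estimation or replacement procedure. The function names (`inv_alr`, `sample_lnm`, `clr`), the parameter values, and the use of independent normal coordinates (a diagonal covariance) are assumptions made only for this example.

```python
import math
import random


def inv_alr(y):
    """Map D-1 alr coordinates to a D-part composition (last part is the reference)."""
    expy = [math.exp(v) for v in y] + [1.0]
    total = sum(expy)
    return [e / total for e in expy]


def sample_lnm(mu, sigma, n_trials, rng):
    """Draw one count vector: normal alr coordinates -> probabilities -> multinomial counts."""
    # Independent coordinates are a simplifying assumption; a full model
    # would use a general covariance matrix.
    y = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
    p = inv_alr(y)
    counts = [0] * len(p)
    for k in rng.choices(range(len(p)), weights=p, k=n_trials):
        counts[k] += 1
    return p, counts


def clr(x):
    """Centred logratio transform; raises ValueError if any part is zero."""
    logs = [math.log(v) for v in x]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


rng = random.Random(0)
mu = [2.0, 0.0, -3.0]    # hypothetical alr means; the third part is rare
sigma = [0.5, 0.5, 0.5]  # hypothetical standard deviations
p, counts = sample_lnm(mu, sigma, n_trials=20, rng=rng)

print("probabilities:", p)   # all strictly positive
print("counts:", counts)     # rare parts may be observed as zero
```

Under this model every underlying probability is strictly positive, so a zero count reflects limited sampling depth rather than true absence. Calling `clr` on the probabilities works, whereas calling it on a count vector that contains a zero fails, which is the obstacle that zero replacement is meant to remove.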
Keywords: compositional data, multivariate analysis, zeros, imputation
Scheduled
GT03.AMC3 Compositional Data
9 November 2023, 16:50
CC3: Room 1