Empirical evaluation of human odor quality datasets supports the use of lexical methods for collecting big data

Emily J. Mayhew1, Joel D. Mainland2

1Michigan State University, East Lansing, MI, USA. 2Monell Chemical Senses Center, Philadelphia, PA, USA

Abstract


Mapping stimulus chemistry to odor percept has long been an elusive goal, but recent strides in machine learning for olfaction suggest that the challenge is surmountable. Progress now requires large-scale perceptual datasets, yet existing olfactory data are limited to tens or hundreds of stimuli collected with unstandardized methods. Critically, the sensory method used to generate such data will profoundly affect data quality, collection efficiency, and model reliability, but no systematic comparison of methods exists. The field has historically used both lexical methods (verbal descriptions) and non-lexical methods (i.e., similarity ratings) to measure odor quality, although lexical methods are often critiqued as subjective and biased by verbal artifacts. In this study, we directly compared the data resolution, test-retest reliability, and data collection efficiency of 6 sensory methods (including lexical Rate-All-That-Apply, RATA, and non-lexical explicit Similarity, SIM) conducted by 2 sensory panels (highly or moderately trained) on a standardized set of 50 odor stimuli. We find that SIM generates the highest resolution (AUC=0.78) and test-retest reliability (R=0.96), followed by RATA (AUC=0.55, R=0.85), but that RATA is orders of magnitude more efficient, especially as the number of stimuli increases (e.g., SIM is 29x slower with n=100 stimuli). Importantly, we also confirmed that the odor spaces and distances generated by each method were highly correlated with one another, with odor distances extracted from RATA PCA closely approximating explicit SIM ratings (R=0.82). We conclude that rapid descriptive methods using a standardized lexicon (RATA) provide high-quality data efficiently, and we recommend their use for the collection of odor quality “big data.”
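
The efficiency gap widens with stimulus count because of how the two designs scale. The sketch below illustrates this by counting trials only. It rests on assumptions that are not stated in the abstract: SIM is taken to be a full pairwise design with one judgment per unordered pair of distinct stimuli, RATA is taken to need one rating trial per stimulus, and both use a single replicate and a single panelist. The function names are hypothetical. Trial counts are not collection time, so the ratios printed here are not expected to match the reported 29x figure, which also depends on per-trial durations that the abstract does not give.

```python
from math import comb


def rata_trials(n_stimuli: int) -> int:
    """Trials for a RATA-style design: one rating trial per stimulus (assumed)."""
    return n_stimuli


def sim_trials(n_stimuli: int) -> int:
    """Trials for a full pairwise similarity design: one per unordered pair (assumed)."""
    return comb(n_stimuli, 2)


if __name__ == "__main__":
    # Compare how the two designs grow as the stimulus set gets larger.
    for n in (10, 50, 100, 1000):
        ratio = sim_trials(n) / rata_trials(n)
        print(f"n={n}: RATA trials={rata_trials(n)}, "
              f"SIM trials={sim_trials(n)}, ratio={ratio:.1f}")
```

Under these assumptions the ratio of trial counts is (n - 1) / 2, so the pairwise design grows quadratically while the descriptive design grows linearly.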
