Presentation Details
A Computational Framework for Multisensory Grounding of Olfactory Representations
Kordel France¹, Tian Yu², Michelle Niedziela³
¹Scentience, Dallas, TX, USA; ²Amai, Denver, CO, USA; ³Nerdoscientist, Chalfont, PA, USA
Abstract
Modern generative AI has made significant progress in modeling language and vision, yet remains largely disconnected from the chemical properties of the physical world. Olfaction and gustation, though central to biological intelligence and environmental understanding, are typically absent from computational world models, leaving AI systems reliant on linguistic descriptions rather than chemical structure. We present a multimodal representational framework integrating olfactory signals at the molecular level with visual objects and linguistic descriptors within a shared joint-embedding space. Using publicly available datasets, we map molecular structures, images of physical objects, and odor-related language into a unified multidimensional representation, enabling systematic exploration of relationships between chemical features, perceptual descriptors, and real-world contexts, and supporting analyses of odor similarity, categorization, and cross-modal inference beyond chemistry alone. While we demonstrate that meaningful cross-modal alignment between chemical, visual, and semantic information is computationally feasible, results also reveal a key bottleneck: the scarcity of high-quality, curated chemosensory datasets that directly link chemical structure to human perceptual experience. By placing chemical senses on equal footing with vision and language, this framework offers a flexible platform for studying olfactory representation, multisensory integration, and semantic grounding. We invite discussion with the chemosensory community on how such models can be refined, validated, and expanded, and how advances in chemosensory science can actively shape the next generation of multimodal AI systems.
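The shared joint-embedding idea described above can be illustrated with a minimal sketch. The projection matrices, dimensions, and feature vectors below are entirely hypothetical stand-ins for trained encoders (the abstract does not specify the architecture); the point is only the mechanics: each modality is projected into one common space where cosine similarity supports cross-modal comparisons such as odor similarity or molecule-to-descriptor retrieval.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions for each modality (illustrative only).
DIM_MOL, DIM_IMG, DIM_TXT, DIM_SHARED = 32, 64, 48, 16

# Random linear projections stand in for trained modality-specific
# encoders that map molecular, visual, and linguistic features into
# one shared embedding space.
W_mol = rng.normal(size=(DIM_MOL, DIM_SHARED))
W_img = rng.normal(size=(DIM_IMG, DIM_SHARED))
W_txt = rng.normal(size=(DIM_TXT, DIM_SHARED))

def embed(features, W):
    """Project a feature vector into the shared space and L2-normalize."""
    z = features @ W
    return z / np.linalg.norm(z)

def cosine(a, b):
    """Cosine similarity; inputs are already unit-normalized."""
    return float(a @ b)

# Toy feature vectors: one molecule, one object image, one odor descriptor.
mol = embed(rng.normal(size=DIM_MOL), W_mol)
img = embed(rng.normal(size=DIM_IMG), W_img)
txt = embed(rng.normal(size=DIM_TXT), W_txt)

# Cross-modal similarity scores lie in [-1, 1]; in a trained system,
# matching molecule/image/descriptor triplets would be pulled toward
# high similarity and mismatched ones pushed apart.
print(cosine(mol, img), cosine(mol, txt))
```

In practice the random projections would be replaced by learned encoders (e.g. a molecular graph encoder, an image encoder, and a text encoder) trained with a contrastive objective over aligned triplets; the retrieval and similarity analyses then operate on the shared space exactly as in this sketch.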
No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the author.