Presentation Details
Can AI Understand the Physical World Without Smelling It? A Multimodal Representational Framework for Olfaction

Kordel France¹, Tian Yu², Michelle Niedziela³

¹University of Texas at Dallas, Dallas, TX, USA; ²Amai Consulting, LLC, Denver, CO, USA; ³Nerdoscientist, LLC, Chalfont, PA, USA
Abstract
Modern generative AI has achieved remarkable success in simulating human-like text and images. However, these models remain “disembodied,” lacking the chemical grounding that is fundamental to biological intelligence. While AI integrates vision and audition to move closer to a “world model,” the chemical senses (olfaction and gustation) remain largely absent. Consequently, AI lacks a true representation of the physical environment, relying on linguistic descriptions of smells rather than the underlying chemical reality.
We present a novel multimodal framework that integrates olfactory signals at the molecular level into a shared “joint-embedding” space alongside visual and linguistic data. Using public datasets, we developed a system in which molecular structures, physical objects, and semantic descriptors are mapped into a unified multidimensional space. Our results demonstrate the feasibility of this cross-modal alignment and serve as a “call to action”: high-fidelity, curated chemosensory datasets are essential to unlock the full predictive potential of these models and bridge the gap between chemical structure and human perception.
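To make the cross-modal alignment concrete, the minimal sketch below shows one common way such a joint-embedding space can be trained: a CLIP-style contrastive objective that pulls each molecule's embedding toward the embedding of its paired odor descriptors. The projection heads, feature dimensions, and InfoNCE loss here are illustrative assumptions, not the implementation behind the results reported in this abstract.

```python
# Illustrative sketch: aligning molecular and linguistic features in a
# shared embedding space with a symmetric contrastive (InfoNCE) loss.
# All sizes and stand-in features are assumptions for demonstration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Projector(nn.Module):
    """Maps a modality-specific feature vector into the shared space."""
    def __init__(self, in_dim: int, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize so cosine similarity reduces to a dot product.
        return F.normalize(self.net(x), dim=-1)

def contrastive_loss(z_mol, z_txt, temperature: float = 0.07):
    """Symmetric InfoNCE: matched molecule/descriptor pairs attract,
    mismatched pairs within the batch repel."""
    logits = z_mol @ z_txt.t() / temperature
    targets = torch.arange(z_mol.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Stand-ins for real features: 2048-bit molecular fingerprints
# (e.g., Morgan/ECFP) and pretrained sentence embeddings of odor
# descriptors such as "floral" or "musky".
mol_encoder, txt_encoder = Projector(2048), Projector(768)
mol_feats = torch.rand(32, 2048)   # batch of 32 molecules
txt_feats = torch.rand(32, 768)    # their paired descriptors

loss = contrastive_loss(mol_encoder(mol_feats), txt_encoder(txt_feats))
loss.backward()  # gradients flow into both projection heads
```

Once trained, nearest-neighbor search in the shared space supports cross-modal queries, e.g., retrieving the semantic descriptors or objects whose embeddings lie closest to that of a novel molecule.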
By placing the chemical senses on equal footing with vision and language, we move beyond building “smarter” AI and invite the chemosensory community to lead the evolution of next-generation AI, in which the digitization of smell and taste fundamentally transforms how we interact with technology, our environment, and each other.