AChemS Virtual Meeting

Presentation Details

Olfaction Benchmark for Large Language Models

Eftychia Makri¹, Nikolaos Nakis², Laura Sisson³, Gigi Minsky⁴, Leandros Tassiulas⁵, Vahid Satarifard⁶, Nicholas A. Christakis⁷.

¹Department of Electrical Engineering, Yale University, New Haven, CT, USA.
²Yale Institute for Network Science, Yale University, New Haven, CT, USA.
³Department of Computer Science, Boston University, Boston, MA, USA.
⁴Department of Ecology, Evolution, and Marine Biology, University of California, Santa Barbara, CA, USA.
⁵Department of Electrical Engineering, Yale University, New Haven, CT, USA.
⁶Yale Institute for Network Science, Yale University, New Haven, CT, USA.
⁷Yale Institute for Network Science, Yale University, New Haven, CT, USA.

Abstract

We introduce the olfaction benchmark, a standardized, multi-task evaluation of large language models (LLMs) on a diverse set of problems in olfaction and odor reasoning. The benchmark spans a range of capabilities, including odor classification and descriptor prediction, perceptual attributes such as intensity and pleasantness, olfactory perception, receptor activation, and multi-label semantic profiling, evaluated under different prompting procedures. We evaluate a broad set of state-of-the-art commercial and open-source models and report both aggregate performance and fine-grained analyses of error patterns. To probe robustness beyond English, we extend a subset of the benchmark to a multilingual setting via translated prompts and compare performance across languages and models. Our results provide a high-level map of the current strengths and limitations of LLMs in olfactory intelligence and establish a reproducible framework for tracking progress in sensory reasoning.
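The abstract reports aggregate performance across tasks, models, and languages but does not state the scoring rule. The sketch below shows one way such an aggregate could be computed. It assumes per-item scores in [0, 1] and a macro average over tasks; this is not the authors' stated method. The record format, the model, language, and task labels, and the use of Jaccard overlap for multi-label descriptor profiling are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical evaluation record: (model, language, task, score in [0, 1]).
# Single-label items score 1.0 (correct) or 0.0 (wrong); multi-label
# descriptor items are assumed here to use a set-overlap score.


def jaccard(predicted: set, gold: set) -> float:
    """Jaccard overlap between predicted and gold descriptor sets."""
    if not predicted and not gold:
        return 1.0  # both empty: treat as perfect agreement
    return len(predicted & gold) / len(predicted | gold)


def macro_scores(records):
    """Average within each task first, then across tasks, per (model, language)."""
    per_task = defaultdict(list)
    for model, lang, task, score in records:
        per_task[(model, lang, task)].append(score)

    per_cell = defaultdict(list)
    for (model, lang, _task), scores in per_task.items():
        per_cell[(model, lang)].append(mean(scores))

    return {cell: mean(task_means) for cell, task_means in per_cell.items()}


if __name__ == "__main__":
    # Hypothetical records for one model evaluated in two languages.
    records = [
        ("model-a", "en", "pleasantness", 1.0),
        ("model-a", "en", "pleasantness", 0.0),
        ("model-a", "en", "descriptors", jaccard({"fruity", "sweet"}, {"fruity", "green"})),
        ("model-a", "de", "pleasantness", 1.0),
    ]
    # model-a/en averages two task means: (0.5 + 1/3) / 2
    for (model, lang), score in sorted(macro_scores(records).items()):
        print(f"{model} [{lang}]: {score:.3f}")
```

Averaging within each task before averaging across tasks keeps tasks with many items from dominating the aggregate. A micro average over all items would instead weight tasks by their size.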

No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the author.
