Presentation Details
Olfaction Benchmark for Large Language Models

Eftychia Makri1, Nikolaos Nakis2, Laura Sisson3, Gigi Minsky4, Leandros Tassiulas5, Vahid Satarifard6, Nicholas A. Christakis7

1 Department of Electrical Engineering, Yale University, New Haven, CT, USA.
2 Yale Institute for Network Science, Yale University, New Haven, CT, USA.
3 Department of Computer Science, Boston University, Boston, MA, USA.
4 Department of Ecology, Evolution, and Marine Biology, University of California, Santa Barbara, CA, USA.
5 Department of Electrical Engineering, Yale University, New Haven, CT, USA.
6 Yale Institute for Network Science, Yale University, New Haven, CT, USA.
7 Yale Institute for Network Science, Yale University, New Haven, CT, USA.

Abstract


We introduce the olfaction benchmark, a standardized, multi-task evaluation of large language models on a diverse set of problems in olfaction and odor reasoning. The benchmark covers odor classification and descriptors, perceptual attributes such as intensity and pleasantness, olfactory perception, receptor activation, and multi-label semantic profiling, under several prompting procedures. We evaluate a broad set of state-of-the-art commercial and open-source models, reporting both aggregate performance and fine-grained analyses of error patterns. To probe robustness beyond English, we extend a subset of the benchmark to a multilingual setting via translated prompts and compare performance across languages and models. Our results provide a high-level map of current LLM strengths and limitations in olfactory intelligence and establish a reproducible framework for tracking progress in sensory reasoning.
