Comparative of LLM-Generated Rewards for Training RL Agents in Robot Manipulation Tasks

Published in IEEE Transactions on Robotics (T-RO), 2025

The advent of Large Language Models (LLMs) has provided researchers with another Artificial Intelligence (AI) tool for solving challenging tasks. With no sign that the pace at which these models are released will slow in the near future, they continue to capture the attention of researchers in the field. Consequently, numerous academic papers have concentrated on improving the efficiency of these models while overlooking a straightforward yet valuable task: comparing their performance. In this journal paper, several State-Of-The-Art (SOTA) LLMs from OpenAI, Google DeepMind, Anthropic, xAI and DeepSeek are evaluated on the same set of robotic grasping tasks with four- and five-fingered robotic hands in order to compare their performance. These models, together with newly developed grasping tasks, have been integrated into the Eureka platform. A comprehensive analysis of their performance has been conducted, covering the Human Normalised Score (HNS), the Consecutive Successes (CS) attained, the number of tokens required to complete each task, and the economic cost associated with training. The results show that the best performance at minimal cost is achieved with the Gemini 2.0 Flash models, while Claude 3.7 Sonnet is the most suitable choice for researchers with limited computational resources.

Keywords: Large language models, manipulation, reinforcement learning, reward design, chain of thought.

Recommended citation: Santiago T. Puente, Ignacio de Loyola Páez-Ubieta, Moisés Fernández-Herrero, Carlos Mateo-Agulló (2025). "Comparative of LLM-Generated Rewards for Training RL Agents in Robot Manipulation Tasks." IEEE Transactions on Robotics (T-RO). Under review.