Reflexion
How it works
In each trial, the agent performs a task and receives a feedback signal. A reflection module (the same LLM) analyzes this signal and generates a verbal reflection describing what went wrong and how to avoid it. The reflection is appended to an episodic memory buffer; in the next trial, the buffer is added to the agent's context, letting it reason over previous experience. The loop is sketched below.
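A minimal sketch of this loop, in Python. Everything here is illustrative rather than the paper's reference implementation: `llm` is a hypothetical text-completion function, `evaluate` is a task-specific stand-in for the feedback signal, and the prompt formats are assumptions.

```python
from typing import Callable, List

def reflexion_loop(
    llm: Callable[[str], str],               # hypothetical LLM completion function
    task: str,
    evaluate: Callable[[str], tuple],        # returns (success: bool, feedback: str)
    max_trials: int = 5,
) -> str:
    memory: List[str] = []  # episodic buffer of verbal reflections
    for trial in range(max_trials):
        # Append accumulated reflections to the agent's context.
        context = "\n".join(f"Reflection {i + 1}: {r}" for i, r in enumerate(memory))
        attempt = llm(f"Task: {task}\nPast reflections:\n{context}\nAttempt:")
        success, feedback = evaluate(attempt)
        if success:
            return attempt
        # The same LLM acts as the reflection module: it analyzes the
        # feedback and verbalizes the errors and how to avoid them.
        reflection = llm(
            f"Task: {task}\nAttempt: {attempt}\nFeedback: {feedback}\n"
            "Explain what went wrong and how to avoid it next time:"
        )
        memory.append(reflection)
    return attempt
```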
Problem solved
Traditional reinforcement learning requires a large number of trials and expensive fine-tuning of model weights. Reflexion aims to let LLM agents learn quickly from trial and error without any parameter updates: the lessons are stored as text in the prompt rather than in the weights.
Limitations
As reflections accumulate, they consume the context window, limiting how many trials the agent can learn from; the original paper mitigates this by capping the buffer at a small number of reflections (see the sketch below).
If the agent generates an incorrect reflection, for example by misattributing a failure to the wrong step, subsequent trials can be steered in the wrong direction.
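For the context-window limit, the Reflexion paper bounds the episodic buffer to a maximum number of stored experiences (typically 1-3), evicting the oldest. A minimal sketch of such a bounded memory; the class and method names are hypothetical:

```python
from collections import deque

class EpisodicMemory:
    """Sliding-window buffer of verbal reflections (hypothetical helper)."""

    def __init__(self, max_reflections: int = 3):
        # deque with maxlen evicts the oldest entry once the cap is reached.
        self._buffer = deque(maxlen=max_reflections)

    def add(self, reflection: str) -> None:
        self._buffer.append(reflection)

    def as_context(self) -> str:
        # Joined into a single string for inclusion in the agent's prompt.
        return "\n".join(self._buffer)
```

Because eviction is automatic, the agent always prompts with only the most recent lessons, trading breadth of experience for a fixed context cost.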
Evolution
Yao et al. (2022) propose ReAct, which interleaves reasoning traces with actions and is a direct precursor to Reflexion.
Shinn et al. (2023) introduce Reflexion: verbal self-reflection stored in episodic memory as a substitute for RL fine-tuning.
Reflexion-style reflection is adopted in LangChain, AutoGen, and other agent frameworks.