Reasoning

ToT

2023 · Active · Updated: 6 May 2026 · Published
Key innovation
Extends Chain-of-Thought to tree-structured exploration of candidate reasoning paths, with state evaluation and backtracking, enabling strategic planning and search over the solution space.
Category
Reasoning
Abstraction level
Pattern
Operation level
Inference
Use cases
Math puzzles requiring planning · Combinatorial problems · Code generation with testing · Agent planning · Logic games

How it works

ToT decomposes reasoning into three components: (1) Thought Generator - LLM produces candidates for the next reasoning step, (2) State Evaluator - LLM or a separate function evaluates the promise of each state, (3) Search Algorithm - BFS or DFS (or beam search) explores the state tree. The model can backtrack to a previous state and try a different branch.
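A minimal sketch of the BFS variant, assuming generic `generate_thoughts` (Thought Generator) and `score_state` (State Evaluator) callables; this illustrates the pattern and is not the official implementation:

```python
# Minimal ToT-BFS sketch. `generate_thoughts` and `score_state` are hypothetical
# stand-ins for the Thought Generator and State Evaluator described above.
from typing import Callable, List, Tuple

def tot_bfs(
    problem: str,
    generate_thoughts: Callable[[str, str], List[str]],  # (problem, state) -> k candidates
    score_state: Callable[[str, str], float],            # (problem, state) -> heuristic value
    beam_width: int = 5,                                  # b: states kept per level
    max_depth: int = 3,                                   # T: reasoning steps
) -> str:
    states = [""]  # each state is the chain of thoughts accumulated so far
    for _ in range(max_depth):
        # Expand every surviving state with its candidate next thoughts.
        candidates = [
            state + thought + "\n"
            for state in states
            for thought in generate_thoughts(problem, state)
        ]
        # Score candidates and keep only the b most promising ones;
        # pruned branches are never expanded again.
        scored: List[Tuple[float, str]] = [
            (score_state(problem, c), c) for c in candidates
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        states = [c for _, c in scored[:beam_width]]
    return states[0]  # highest-scoring reasoning chain
```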

Problem solved

Language models with CoT are confined to left-to-right, token-by-token decisions during inference, which fails on tasks that require exploration or strategic planning, or in which early decisions are pivotal.

Components

Thought Generator
Produces candidates for the next reasoning step from the current tree state.

LLM-based component that generates k candidate thoughts. Two strategies: 'sample' (independent sampling from the same prompt; suited to rich thought spaces, e.g. Creative Writing) or 'propose' (sequential proposal of several thoughts in a single prompt; suited to constrained spaces, e.g. Game of 24). A prompt-level sketch of both strategies follows the list below.

sample: Independent sampling with temperature.
propose: Sequential proposal of multiple thoughts in one call.
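A prompt-level sketch of the two strategies, assuming a hypothetical `call_llm(prompt) -> str` helper (prompts are simplified for illustration):

```python
# Illustrative 'sample' vs 'propose' thought generation. `call_llm` is a
# hypothetical single-completion helper, not a real API.
from typing import Callable, List

def generate_sample(call_llm: Callable[[str], str], problem: str,
                    state: str, k: int = 5) -> List[str]:
    # 'sample': k independent completions of the same prompt (rich thought spaces).
    prompt = f"{problem}\nReasoning so far:\n{state}\nWrite the next step:"
    return [call_llm(prompt) for _ in range(k)]

def generate_propose(call_llm: Callable[[str], str], problem: str,
                     state: str, k: int = 5) -> List[str]:
    # 'propose': one call that lists k distinct next steps (constrained thought spaces).
    prompt = (f"{problem}\nReasoning so far:\n{state}\n"
              f"Propose {k} different possible next steps, one per line:")
    return [line for line in call_llm(prompt).splitlines() if line.strip()][:k]
```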


State Evaluator
Evaluates the promise of each intermediate state (thought) as a heuristic for the search algorithm.

The LLM evaluates each state in one of two ways: 'value' (assigns a numeric or categorical score to a single state, used in Game of 24) or 'vote' (compares multiple states and votes for the best, used in Creative Writing). The heuristic is soft and its reliability depends on the quality of the LLM. A sketch of both evaluation strategies follows the list below.

value: Independent value evaluation of a single state.
vote: LLM voting over a set of candidate states.
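A sketch of the two evaluation strategies under the same hypothetical `call_llm` helper; score parsing is deliberately simplified:

```python
# Illustrative 'value' vs 'vote' state evaluation.
from typing import Callable, List

def evaluate_value(call_llm: Callable[[str], str], problem: str,
                   state: str, n_samples: int = 3) -> float:
    # 'value': score a single state; average several samples to reduce variance.
    prompt = (f"{problem}\nPartial solution:\n{state}\n"
              "Rate how promising this is from 1 to 10. Answer with a number only:")
    scores = []
    for _ in range(n_samples):
        try:
            scores.append(float(call_llm(prompt).strip()))
        except ValueError:
            scores.append(0.0)  # unparseable answer counts as a low score
    return sum(scores) / len(scores)

def evaluate_vote(call_llm: Callable[[str], str], problem: str,
                  states: List[str], n_votes: int = 5) -> List[int]:
    # 'vote': compare all candidate states in one prompt and tally votes per state.
    listing = "\n".join(f"({i}) {s}" for i, s in enumerate(states))
    prompt = (f"{problem}\nCandidate partial solutions:\n{listing}\n"
              "Which candidate is most promising? Answer with its index only:")
    votes = [0] * len(states)
    for _ in range(n_votes):
        answer = call_llm(prompt).strip()
        if answer.isdigit() and int(answer) < len(states):
            votes[int(answer)] += 1
    return votes
```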


Search Algorithm
Explores the tree of thought states using classical graph algorithms.

ToT uses classical search algorithms: BFS (Game of 24, Creative Writing; keeps the b most promising states per level) or DFS (Mini Crosswords; explores deeply and backtracks when a state is judged unpromising). The search algorithm itself is LLM-independent. A minimal DFS sketch follows the list below.

BFS: Breadth-first search with a beam of width b.
DFS: Depth-first search with backtracking.
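A minimal DFS-with-backtracking sketch, using the same hypothetical helper signatures as in the sketches above; the pruning threshold is illustrative:

```python
# Illustrative ToT-DFS: expand the most promising child first, prune weak states,
# and backtrack to a sibling branch when a subtree yields no solution.
from typing import Callable, List, Optional

def tot_dfs(
    problem: str,
    state: str,
    generate_thoughts: Callable[[str, str], List[str]],
    score_state: Callable[[str, str], float],
    is_solution: Callable[[str, str], bool],
    max_depth: int = 10,
    prune_below: float = 3.0,  # states scoring under this are not expanded
) -> Optional[str]:
    if is_solution(problem, state):
        return state
    if max_depth == 0:
        return None
    children = generate_thoughts(problem, state)
    scored = sorted(
        ((score_state(problem, state + c), c) for c in children), reverse=True
    )
    for score, child in scored:
        if score < prune_below:
            continue  # prune unpromising branches
        result = tot_dfs(problem, state + child + "\n", generate_thoughts,
                         score_state, is_solution, max_depth - 1, prune_below)
        if result is not None:
            return result
    return None  # no child worked: the caller backtracks to a sibling branch
```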


Implementation

Implementation pitfalls
High computational cost · Severity: High

ToT requires multiple LLM calls per step (generation plus evaluation across all branches), making it typically 10-100x more expensive than a single CoT pass.
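A back-of-envelope call count, assuming the propose-style generator (one call per state) and value evaluation with n_eval samples per candidate; the exact multiplier varies widely with the chosen strategies and parameters:

```python
# Rough LLM call count for one ToT run vs. a single CoT completion.
# b and n_eval follow the paper's Game of 24 defaults; k=8 is an illustrative choice.
b, k, n_eval, T = 5, 8, 3, 3

calls_per_level = b * (1 + k * n_eval)   # 1 propose call + k candidates x n_eval value calls
tot_calls = T * calls_per_level          # BFS levels run sequentially
print(tot_calls)                         # 375 calls vs. 1 call for plain CoT
```

Cheaper configurations (e.g. vote-based evaluation with a handful of votes per level) land closer to the 10-100x range quoted above.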

Thought evaluator can be unreliable · Severity: Medium

Using the same LLM as both generator and evaluator risks circular reasoning; evaluation quality is bounded by the model's capability.

Evolution

Original paper · 2023 · NeurIPS 2023 · Shunyu Yao
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan
2022
Chain-of-Thought prompting as foundation

Wei et al. establish CoT as the baseline linear reasoning approach that ToT extends.

2022
Self-Consistency - first multi-path exploration

Wang et al. propose sampling multiple CoT paths and voting, a precursor to ToT.

2023
Tree of Thoughts (Yao et al., NeurIPS 2023)
Inflection point

Yao et al. formalize tree-structured deliberate reasoning with evaluation and backtracking.

2024
Graph of Thoughts and successors

Besta et al. extend ToT to arbitrary graph structures for more flexible reasoning topologies.

Technical details

Hyperparameters (configurable axes)

Number of thought candidates (k) · Impact: High

Number of thoughts generated at each step by the generator. Larger k = broader exploration but linearly higher cost.

1: Game of 24 (propose)
5-10: Creative Writing (sample)
Beam width (b) · Impact: Critical

Number of states kept per BFS level. Classical beam search parameter - tradeoff between exploration and cost.

5: Default for Game of 24 in the paper
1: Greedy (degenerates to CoT)
Number of evaluation samples · Impact: Medium

Number of LLM calls to evaluate each state (averaged to reduce variance).

3: Paper default
Tree depth (T) · Impact: High

Maximum number of reasoning steps in the tree. Task-dependent (Game of 24: 3, Crosswords: much deeper).

3: Game of 24
Generator strategy · Impact: High

Choice of thought generation strategy: sample (independent sampling) or propose (sequential proposals in one prompt).

sample: Rich thought space
propose: Constrained thought space
Evaluator strategy · Impact: High

Choice of state evaluation strategy: value (independent scoring) or vote (voting over a set).

value: Game of 24
vote: Creative Writing
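A minimal configuration sketch bundling these axes; the field names are illustrative, not the official implementation's argument names:

```python
# Hypothetical configuration object for the hyperparameters listed above.
from dataclasses import dataclass
from typing import Literal

@dataclass
class ToTConfig:
    n_generate: int = 5                               # k: thought candidates per state
    beam_width: int = 5                               # b: states kept per BFS level
    n_evaluate_samples: int = 3                       # value calls averaged per state
    max_depth: int = 3                                # T: maximum reasoning steps
    generate_strategy: Literal["sample", "propose"] = "propose"
    evaluate_strategy: Literal["value", "vote"] = "value"
    search: Literal["bfs", "dfs"] = "bfs"

# Example: paper-style Game of 24 settings vs. a Creative Writing style run.
game24 = ToTConfig()  # propose + value, b=5, T=3
writing = ToTConfig(generate_strategy="sample", evaluate_strategy="vote",
                    max_depth=2, beam_width=1)
```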

Computational complexity

Time complexity: O(k^T · c_LLM) in the worst case of exhaustive expansion; with beam search of width b, the number of LLM calls grows as O(b · k · T · c_LLM). Space complexity: O(b · T).

Compute bottleneck

LLM API calls

Each ToT step requires many LLM calls: generation of k candidates plus n_evaluate_sample evaluations for each of the b surviving states. API latency and cost grow multiplicatively with the search parameters.

Depends on
Branching factor (k) · Beam width (b) · Tree depth (T)

Execution paradigm

Primary mode
conditional

The computation trajectory is conditional: it depends on state evaluations. Different inputs produce different tree shapes and follow different paths.

Activation pattern
input_dependent
Routing mechanism

Search algorithm routes computation to the most promising tree branches based on evaluator scores. Unpromising branches are pruned (not expanded further).

Parallelism

Parallelism level
partially_parallel

Within a BFS level, generation of k candidates and their evaluation are fully parallel. Between levels the search remains sequential.

Scope
inference · across_tokens
Constraints
Level t+1 requires completed evaluations from level t (to select the top-b states).
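A sketch of this within-level parallelism, assuming synchronous `generate_thoughts`/`score_state` helpers and a thread pool for concurrent API calls (illustrative; the real ceiling is the serving backend's batch size or rate limit):

```python
# Parallel expansion of one BFS level: candidate generation and evaluation for all
# surviving states are issued concurrently; the levels themselves stay sequential.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

def expand_level(
    states: List[str],
    generate_thoughts: Callable[[str], List[str]],  # state -> candidate continuations
    score_state: Callable[[str], float],            # state -> heuristic value
    beam_width: int = 5,
) -> List[str]:
    with ThreadPoolExecutor(max_workers=16) as pool:
        # Generate candidates for every surviving state in parallel.
        per_state = list(pool.map(generate_thoughts, states))
        candidates = [s + t for s, ts in zip(states, per_state) for t in ts]
        # Evaluate all candidates in parallel, then keep the top-b for the next level.
        scores = list(pool.map(score_state, candidates))
    ranked: List[Tuple[float, str]] = sorted(zip(scores, candidates), reverse=True)
    return [c for _, c in ranked[:beam_width]]
```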

Hardware requirements

Primary

ToT is an inference framework - it runs on any backend capable of LLM execution (API, local GPU, TPU). The search algorithm itself is CPU-bound and lightweight.

Good fit

Local ToT execution benefits from GPUs for parallel batching of k candidates and b states in one LLM inference call.