
Why LLMs struggle with hopping multiple documents to draft an answer

The multi-hop problem with LLMs

Thomas Kousholt

What is Multi-Hop Reasoning and Why It Matters

Multi-hop reasoning is when LLMs connect multiple pieces of information to answer complex questions, like finding "the mother of the singer of 'Superstition'" by first identifying Stevie Wonder and then his mother. This is crucial for tasks beyond simple queries, such as advanced question answering or problem-solving, where understanding relationships across data is key.
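The "mother of the singer" example above can be sketched as a chain of single-fact lookups, where each hop's answer becomes the next hop's subject. In this toy sketch a small dictionary stands in for the model's knowledge; the names and structure are illustrative, not a real API:

```python
# Toy fact table standing in for an LLM's parametric knowledge.
FACTS = {
    ("Superstition", "singer"): "Stevie Wonder",
    ("Stevie Wonder", "mother"): "Lula Mae Hardaway",
}

def resolve(entity: str, relation: str) -> str:
    """One 'hop': look up a single fact about an entity."""
    return FACTS[(entity, relation)]

def multi_hop(entity: str, relations: list[str]) -> str:
    """Chain hops: each intermediate answer becomes the next subject."""
    for relation in relations:
        entity = resolve(entity, relation)
    return entity

print(multi_hop("Superstition", ["singer", "mother"]))  # Lula Mae Hardaway
```

The difficulty for LLMs is that the intermediate entity ("Stevie Wonder") is never stated in the question; the model must produce it internally and then reason about it again.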

Challenges LLMs Face

LLMs often struggle with multi-hop reasoning due to:

  • Sensitivity to how information is ordered, which can disrupt their ability to link facts.

  • Difficulties in later processing stages, where they might lose necessary knowledge for complex reasoning.

  • Challenges in combining external knowledge, especially for tasks with many reasoning steps.

  • Sensitivity to Information Order: Li et al. (2024), in "Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context," find that LLMs struggle when supporting documents are misordered: performance is sensitive to document sequence, with a significant impact on F1 scores.

  • Limitations in Later Layers: Yang et al. (2024), in "Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries," found that later layers may lack the functionality needed to complete the second hop; back-patching (re-injecting a later layer's hidden state into an earlier layer) fixed up to 57% of incorrect cases, pointing to a processing bottleneck.

  • Difficulties with External Knowledge: Zhang (2024), in "Large Language Models Still Face Challenges in Multi-Hop Reasoning with External Knowledge," notes challenges in selecting and combining external knowledge, especially for non-sequential tasks and tasks with many hops, showing a clear gap relative to human performance.

  • Generalization Issues: The same study highlights LLMs' struggles to generalize to data with larger numbers of hops, limiting their applicability in complex scenarios.
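One mitigation implied by the order-sensitivity finding is simply repeating the supporting context, so that every fact appears both before and after every other fact at least once. A minimal, hypothetical sketch of that prompt-building step (the function name and parameters are assumptions for illustration, not code from the paper):

```python
def build_prompt(question: str, docs: list[str], repeats: int = 2) -> str:
    """Repeat the (possibly misordered) documents before the question,
    so no single ordering of facts dominates the context."""
    context = "\n\n".join(docs)
    blocks = "\n\n".join([context] * repeats)
    return f"{blocks}\n\nQuestion: {question}\nAnswer:"
```

The trade-off is context length: repeating documents consumes part of the model's context window, so in practice the repeated block would be limited to the retrieved passages most relevant to the question.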

How Chunking Can Help

Chunking involves breaking down large texts into smaller, manageable parts or structuring the reasoning process into steps. Better chunking methods can:

  • Ensure input texts are divided to preserve context, making it easier for LLMs to retrieve and connect relevant information.

  • Use techniques like chain-of-thought prompting to break reasoning into steps, helping LLMs handle complex tasks more accurately.
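The first point can be sketched as fixed-size chunking with overlap, so that a fact straddling a chunk boundary survives intact in at least one chunk. The sizes and function name below are illustrative defaults, not a prescribed configuration:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap by `overlap`
    characters, preserving context across chunk boundaries."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

For example, with `size=200` and `overlap=50`, the last 50 characters of each chunk reappear at the start of the next, so a sentence split by the boundary is still readable in one of the two chunks. Production systems typically chunk on sentence or paragraph boundaries rather than raw character offsets, but the overlap idea is the same.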

Multi-hop reasoning remains a complex challenge for LLMs, with issues like order sensitivity and generalization limiting performance. Better chunking methods, encompassing input text segmentation and reasoning step structuring, offer promising solutions by enhancing context preservation and processing efficiency. As research progresses, integrating these methods with advanced RAG systems and prompting techniques could significantly improve LLMs' reasoning capabilities, aligning them closer to human-level performance.
