🟡 Math

Throughout this course, we have seen many different prompting methods that can be used to improve %%LLM|LLM%% math ability. One recent approach, MathPrompter(@imani2023mathprompter), unifies some of these methods (%%CoT|CoT prompting%%, %%PAL|PAL%%, etc.) into a single technique. The overarching idea is to break a math question down into algebraic terms, then use Python code to solve it in different ways.

MathPrompter has four steps. We will explain them using the following example problem, which is taken directly from the paper.

Q: At a restaurant, each adult meal costs $5 and kids eat free. If a group of 15
people came in and 8 were kids, how much would it cost for the group to eat?

Step 1: Generate Algebraic Template

The first step is to assign a variable to each number in the question. This helps because it allows easier translation of the question into an abstract math question, as well as into programming code.

This can be done via few-shot prompting.
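To make the output of this step concrete, here is a minimal sketch that replaces each number in the example question with a variable and records the mapping. It uses a regular expression as a deterministic stand-in for the few-shot LLM prompt, so it is not the method from the paper, and the helper name `make_template` is hypothetical.

```python
import re
import string


def make_template(question):
    """Replace each integer in the question with A, B, C, ... and record the mapping.

    Hypothetical helper: a regex stand-in for the few-shot LLM call.
    """
    mapping = {}
    names = iter(string.ascii_uppercase)

    def substitute(match):
        # Assign the next unused letter to this number.
        name = next(names)
        mapping[name] = int(match.group())
        return name

    template = re.sub(r"\d+", substitute, question)
    return template, mapping


question = (
    "At a restaurant, each adult meal costs $5 and kids eat free. "
    "If a group of 15 people came in and 8 were kids, "
    "how much would it cost for the group to eat?"
)
template, mapping = make_template(question)
print(template)
print(mapping)
```

For the example problem this yields a question phrased in terms of A, B, and C, together with the mapping {A: 5, B: 15, C: 8}.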

Step 2: Math Prompts

The point of this step is to formulate the problem as both an algebraic statement and as Python code. This step has two simultaneous prompts, which help to give diverse representations of the problem.

2a: Algebraic Statement

We can few-shot prompt the LLM to represent the math problem as an algebraic statement. This is done by asking the LLM to generate the answer format, starting with "Answer =".

2b: Python Code

We can also ask the %%LLM|LLM%% to generate Python code that solves the problem. This is done by asking the LLM to generate a Python function.

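The two prompts in this step can be sketched as plain strings built from the templated question. The instruction wording and the helper name `build_math_prompts` below are assumptions made for illustration, not text quoted from the paper, and no LLM is actually called.

```python
# Illustrative instruction wording (an assumption, not quoted from the paper).
ALGEBRAIC_INSTRUCTION = (
    "Write a mathematical equation and generate the answer format "
    "starting with 'Answer ='"
)
PYTHON_INSTRUCTION = "Write a Python function that returns the answer."


def build_math_prompts(template):
    """Return the algebraic and Python prompts for one templated question."""
    return {
        "algebraic": f"{template}\n{ALGEBRAIC_INSTRUCTION}",
        "python": f"{template}\n{PYTHON_INSTRUCTION}",
    }


template = (
    "At a restaurant, each adult meal costs $A and kids eat free. "
    "If a group of B people came in and C were kids, "
    "how much would it cost for the group to eat?"
)
prompts = build_math_prompts(template)
for name, prompt in prompts.items():
    print(f"--- {name} ---")
    print(prompt)
```

In practice, each prompt would be prefixed with a few worked exemplars and sent to the LLM separately.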

Step 3: Answer Generation

Now we can use the mapping generated in Step 1 to fill in the variables automatically.

Mapping: {A: 5, B: 15, C: 8}

Algebraic:

Answer = 5 * 15 - 5 * 8

Python function:

def restaurant_cost(A=5, B=15, C=8):
  return A * (B - C)

We can evaluate both using Python.

Algebraic:

>>> eval("5 * 15 - 5 * 8")
35

Python function:

>>> restaurant_cost()
35
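The evaluation above can be wrapped into a small script that fills in the mapping and checks that both representations agree. This is a sketch under assumptions: it supposes the LLM returned the algebraic statement in variable form (`Answer = A * B - A * C`), and `evaluate_algebraic` is a hypothetical helper name.

```python
def evaluate_algebraic(statement, mapping):
    """Evaluate an 'Answer = <expression>' line using the variable mapping."""
    expression = statement.split("=", 1)[1].strip()
    # eval mirrors the example above. Builtins are disabled here, but this
    # is still not safe to run on untrusted model output.
    return eval(expression, {"__builtins__": {}}, dict(mapping))


def restaurant_cost(A, B, C):
    return A * (B - C)


mapping = {"A": 5, "B": 15, "C": 8}

# Assumed LLM output in variable form; substituting the mapping gives 5 * 15 - 5 * 8.
algebraic_answer = evaluate_algebraic("Answer = A * B - A * C", mapping)
python_answer = restaurant_cost(**mapping)

agreed = algebraic_answer == python_answer
print(algebraic_answer, python_answer, agreed)
```

When the two answers disagree, one reasonable choice is to discard that run rather than guess which representation is correct.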

Step 4: Self-Consistency

Finally, we will leverage %%Self-Consistency|self_consistency%% to rerun the above process multiple times (~5), then take the majority answer.
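The majority vote itself is simple to sketch. The list of answers below is made up for illustration, and `majority_answer` is a hypothetical helper name.

```python
from collections import Counter


def majority_answer(answers):
    """Return the most common answer and the fraction of runs that agree with it."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


# Hypothetical answers from five independent runs of the pipeline.
runs = [35, 35, 40, 35, 35]
answer, agreement = majority_answer(runs)
print(answer, agreement)
```

The agreement fraction can serve as a rough confidence signal: a low value suggests the runs did not converge on one answer.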

Conclusion

MathPrompter reports 92.5% accuracy on the MultiArith(@roy-roth-2015-solving) dataset. The success of this technique is a great example of how you as a prompt engineer can take methods that you have learned throughout this course and combine them to deal with larger problems.