PRIME: Evaluating Implicit Biases in LLM Reasoning through Logic Grid Puzzles

Rutgers University · Johns Hopkins University
Figure: QA framework vs. the PRIME framework for evaluating implicit biases in LLM reasoning

PRIME (Puzzle Reasoning for Implicit biases in Model Evaluation) is a novel logic-puzzle-based framework for evaluating the effect of social biases on LLM reasoning.

We introduce an algorithm that generates logic puzzles of arbitrary complexity, with three versions of each puzzle: a generic neutral baseline, a stereotypical version that confirms social stereotypes, and an anti-stereotypical version that contradicts these stereotypes. This allows us to measure how implicit biases affect deductive reasoning via controlled structural variations that preserve logical complexity while altering demographic associations. We also release a dataset of 6,048 logic grid puzzles for evaluating the influence of gender bias on deductive reasoning in LLMs. To analyze how biases infiltrate reasoning pathways, we introduce three fine-grained edit distance metrics and a bias difference metric.
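The key property of each triplet is that all three versions share the same logical structure and differ only in their demographic associations. A minimal illustrative sketch of this control (the names, occupations, and clue template below are hypothetical, not drawn from the released dataset):

```python
# Hypothetical sketch of PRIME's triplet control: one clue template is
# rendered with three name/occupation pairings, so the logic is identical
# across versions while demographic associations change.

BASE_CLUE = "{name} is not the {occupation}."

def make_version(names, occupations):
    """Render the shared clue template with a given name/occupation pairing."""
    return [BASE_CLUE.format(name=n, occupation=o)
            for n, o in zip(names, occupations)]

# Illustrative pairings only: neutral placeholders, a stereotype-aligned
# assignment, and its stereotype-contradicting counterpart.
generic = make_version(["Person A", "Person B"], ["doctor", "nurse"])
stereo  = make_version(["James", "Emily"], ["doctor", "nurse"])
anti    = make_version(["Emily", "James"], ["doctor", "nurse"])
```

Because only the name-to-attribute pairing varies, any performance gap between the versions can be attributed to the demographic association rather than to puzzle difficulty.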

Solving logic grid puzzles requires inferring relationships between entities from a set of clues; outside world knowledge is not required to arrive at the solution. For example, to determine a person's occupation (doctor vs. nurse), gender stereotypes (e.g., men are doctors) are irrelevant. However, our experiments show that LLMs demonstrate significant demographic decision bias and reveal that stereotype-aligned associations function as reasoning shortcuts in LLMs. Our findings highlight the limitations of current alignment and safety training, which can be more effective at concealing explicit bias than at addressing implicit biases that emerge during reasoning. PRIME can be used to automatically detect and quantify these decision biases.


Details on PRIME Puzzle Generation

PRIME consists of puzzle triplets: Generic, Stereotypical, and Anti-stereotypical. Each puzzle contains a Name column as the anchor, a Bias-Probing column that aligns with or contradicts social stereotypes, and one or more General columns that are bias-irrelevant. For automatic generation of puzzles, PRIME proceeds in three main steps: grid construction, clue generation, and identification of a minimally solvable clue set. The grid satisfies the Latin square constraint, ensuring each item appears exactly once per row and column. To exercise different logical constraints and reasoning steps, five clue types are selected from Puzzle Baron.
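The grid-construction step can be sketched as follows. This is a simplified illustration, not the paper's implementation: each column's items are randomly permuted against the anchor Name column, so every item appears exactly once per column and each row pairs one name with one item from every category (the column names and items below are invented for the example).

```python
import random

def build_solution_grid(names, categories):
    """Build a logic-grid solution: each row pairs one name with one item
    from every category, and each item appears exactly once per column.
    `categories` maps column name -> list of items (same length as names).
    Simplified sketch, not the paper's generation algorithm.
    """
    grid = {"Name": list(names)}
    for column, items in categories.items():
        assert len(items) == len(names), "columns must match the grid size"
        permuted = list(items)
        random.shuffle(permuted)  # random bijection between names and items
        grid[column] = permuted
    # Return row-oriented records: one dict per puzzle entity.
    return [{col: grid[col][i] for col in grid} for i in range(len(names))]

grid = build_solution_grid(
    ["Alex", "Blake", "Casey"],
    {"Occupation": ["doctor", "nurse", "teacher"],
     "Pet": ["cat", "dog", "fish"]},
)
```

Clue generation and the search for a minimally solvable clue set would then operate over this ground-truth grid, removing clues until no redundant clue remains while the puzzle stays uniquely solvable.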
Figure: PRIME puzzle setup

Details on PRIME Evaluation Metrics

Edit Distance quantifies how much an LLM's predicted puzzle solution deviates from the ground truth, measured as the number of swaps needed to transform the prediction into the ground truth. We use three variants: overall edit distance provides a holistic measure of reasoning accuracy, bias-probing edit distance measures targeted performance on identity-based assignments, and general edit distance assesses whether stereotype cues affect bias-irrelevant reasoning.
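A swap-based edit distance over a single puzzle column can be computed from permutation cycles: the minimum number of swaps equals the column length minus the number of cycles. This is a plausible sketch under that standard definition; the paper's exact formulation may differ.

```python
def swap_edit_distance(predicted, truth):
    """Minimum number of swaps turning `predicted` into `truth`, assuming
    both are permutations of the same items (one puzzle column).
    Equals n minus the number of cycles in the induced permutation."""
    position = {item: i for i, item in enumerate(truth)}
    perm = [position[item] for item in predicted]  # target slot of each entry
    seen = [False] * len(perm)
    swaps = 0
    for start in range(len(perm)):
        if seen[start]:
            continue
        # Walk the cycle containing `start`; a cycle of length k needs k-1 swaps.
        node, cycle_len = start, 0
        while not seen[node]:
            seen[node] = True
            node = perm[node]
            cycle_len += 1
        swaps += cycle_len - 1
    return swaps
```

Overall, bias-probing, and general edit distances would then differ only in which columns of the solution grid the metric is applied to.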
Figure: Overall, Bias-Probing, and General Edit Distance results

Bias Difference quantifies shifts in model performance between stereotypical and anti-stereotypical puzzles for each edit distance setting.
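A minimal sketch of such a shift metric, assuming it is the difference in mean edit distance between the two conditions (the sign convention and aggregation here are assumptions, not the paper's exact definition):

```python
def bias_difference(stereo_distances, anti_distances):
    """Mean edit distance on anti-stereotypical puzzles minus the mean on
    stereotypical ones. Under this (assumed) convention, a positive value
    indicates the model performs worse when stereotypes are contradicted,
    suggesting stereotype-aligned reasoning shortcuts."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(anti_distances) - mean(stereo_distances)
```

The same function applies to each edit distance setting (overall, bias-probing, general), yielding one bias difference score per metric.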

Selected Results and Analysis

Figure: Selected results
Figure: Error analysis

To learn more about the setup and findings, please read the paper.

Publication

Evaluating Implicit Biases in LLM Reasoning through Logic Grid Puzzles

Fatima Jahara, Mark Dredze, Sharon Levy

If you use this work, please cite our paper:

@misc{jahara2025evaluatingimplicitbiasesllm,
      title={Evaluating Implicit Biases in LLM Reasoning through Logic Grid Puzzles},
      author={Fatima Jahara and Mark Dredze and Sharon Levy},
      year={2025},
      eprint={2511.06160},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2511.06160},
}