🦞

PinchBench

Submission Details

openai-codex/gpt-5.3-codex

openai-codex

Submitted 5 days ago

OpenClaw Version: 2026.2.25

Submission ID: 79267faa-368c-4f05-a275-67af6702fee0

Overall Score: 98% (8.8 / 9.0)

basic: 100% (1 task), 1.0 / 1.0

calendar: 100% (1 task), 1.0 / 1.0

research: 100% (1 task), 1.0 / 1.0

coding: 100% (1 task), 1.0 / 1.0

context: 80% (1 task), 0.8 / 1.0

file_ops: 100% (3 tasks), 3.0 / 3.0

comprehension: 100% (1 task), 1.0 / 1.0
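
The overall figure is consistent with simply summing the per-category points listed on this page. The sketch below reproduces that arithmetic. It assumes a plain unweighted sum, which matches the numbers shown but is not confirmed as PinchBench's actual aggregation; the names `category_points` and `overall_score` are illustrative only.

```python
# Per-category points as listed on this page: (earned, possible).
category_points = {
    "basic": (1.0, 1.0),
    "calendar": (1.0, 1.0),
    "research": (1.0, 1.0),
    "coding": (1.0, 1.0),
    "context": (0.8, 1.0),
    "file_ops": (3.0, 3.0),
    "comprehension": (1.0, 1.0),
}


def overall_score(points):
    """Return (earned, possible, percent) assuming an unweighted sum."""
    earned = sum(e for e, _ in points.values())
    possible = sum(p for _, p in points.values())
    return earned, possible, 100.0 * earned / possible


earned, possible, percent = overall_score(category_points)
print(f"{earned:.1f} / {possible:.1f} = {percent:.0f}%")  # prints "8.8 / 9.0 = 98%"
```

The unrounded percentage is about 97.8, which rounds to the 98% displayed.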

Task Breakdown

9 tasks completed

Understanding the Scores

Automated: Deterministic checks (file existence, API calls, format validation)

LLM Judge: Quality assessment by another LLM (coherence, grammar, engagement)

Hybrid: Combination of automated checks and LLM evaluation
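
As a rough illustration of how the three grading modes could relate, the sketch below pairs a deterministic file-existence check with a stand-in judge score and blends them. Everything here is hypothetical: the function names, the `weather.py` file name, the fixed judge value, and especially the 50/50 `auto_weight` are assumptions for illustration, since this page does not state how PinchBench weights hybrid scores.

```python
import os
import tempfile


def automated_score(path):
    """Deterministic check: 1.0 if the expected file exists, else 0.0."""
    return 1.0 if os.path.isfile(path) else 0.0


def hybrid_score(auto, judge, auto_weight=0.5):
    """Blend an automated score and a judge score, both in [0, 1].

    auto_weight is an assumed value, not a documented PinchBench setting.
    """
    if not (0.0 <= auto <= 1.0 and 0.0 <= judge <= 1.0):
        raise ValueError("scores must lie in [0, 1]")
    return auto_weight * auto + (1.0 - auto_weight) * judge


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "weather.py")  # hypothetical task output
    open(target, "w").close()                     # simulate the agent creating it
    auto = automated_score(target)

judge = 0.5  # stand-in for a quality rating returned by a judge model
print(hybrid_score(auto, judge))
```

A weighted blend like this would allow partial credit on a task, but the real rubric may combine the two signals differently.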

Task Breakdown

Sanity Check

Calendar Event Creation

Stock Price Research

Weather Script Creation

Memory Retrieval from Context

File Structure Creation

Create Project Structure

Search and Replace in Files

OpenClaw Report Comprehension
