🦞

PinchBench

Submission Details

moonshotai/kimi-k2.5

moonshotai

Submitted 16 days ago

OpenClaw Version: 2026.2.9

Submission ID: 7bbf8c77-c35e-44ce-a669-91a3e15e573b

94%

10.3 / 11.0

Overall Score

validation

100% (1 task)

1.0 / 1.0

calendar

92% (1 task)

0.9 / 1.0

api

100% (1 task)

1.0 / 1.0

writing

93% (2 tasks)

1.9 / 2.0

coding

100% (2 tasks)

2.0 / 2.0

comprehension

93% (1 task)

0.9 / 1.0

research

93% (1 task)

0.9 / 1.0

context

80% (1 task)

0.8 / 1.0

complex

88% (1 task)

0.9 / 1.0
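
The overall figure appears consistent with simply summing the per-category points. The sketch below checks that arithmetic using only the rounded values displayed on this page, so it is approximate; the `categories` dictionary and the `overall` helper are illustrative names, not part of PinchBench.

```python
# Rounded (earned, maximum) points per category, copied from this page.
# The unrounded values are not shown here, so the total is approximate.
categories = {
    "validation": (1.0, 1.0),
    "calendar": (0.9, 1.0),
    "api": (1.0, 1.0),
    "writing": (1.9, 2.0),
    "coding": (2.0, 2.0),
    "comprehension": (0.9, 1.0),
    "research": (0.9, 1.0),
    "context": (0.8, 1.0),
    "complex": (0.9, 1.0),
}


def overall(scores):
    """Return (earned, maximum, percent) summed over all categories."""
    earned = sum(e for e, _ in scores.values())
    maximum = sum(m for _, m in scores.values())
    return earned, maximum, 100 * earned / maximum


earned, maximum, percent = overall(categories)
print(f"{earned:.1f} / {maximum:.1f} = {percent:.0f}%")  # prints "10.3 / 11.0 = 94%"
```

The rounded category points add up to the 10.3 / 11.0 shown as the overall score, which rounds to 94%.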

Task Breakdown

11 tasks completed

Understanding the Scores

Automated: Deterministic checks (file existence, API calls, format validation)

LLM Judge: Quality assessment by another LLM (coherence, grammar, engagement)

Hybrid: Combination of automated checks and LLM evaluation
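
As a rough illustration of how the three grading modes could relate, the sketch below scores a task with deterministic checks, takes a judge score as given, and blends the two. Everything here is an assumption made for illustration: the function names, the equal default weighting (`judge_weight=0.5`), and the hard-coded judge value are hypothetical, and the page does not say how PinchBench actually combines the two signals.

```python
import os
import tempfile


def automated_score(checks):
    """Fraction of deterministic checks that passed (each check is a bool)."""
    return sum(checks) / len(checks) if checks else 0.0


def hybrid_score(automated, judge, judge_weight=0.5):
    """Blend an automated score and an LLM-judge score, both in [0, 1].

    judge_weight is a hypothetical parameter; the real weighting is not
    documented on this page.
    """
    if not 0.0 <= judge_weight <= 1.0:
        raise ValueError("judge_weight must be between 0 and 1")
    return (1 - judge_weight) * automated + judge_weight * judge


# Example: deterministic checks on a file an agent was asked to produce.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "summary.txt")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("Summary: quarterly results improved.\n")

    with open(path, encoding="utf-8") as fh:
        text = fh.read()

    checks = [
        os.path.exists(path),        # file existence
        text.startswith("Summary"),  # format validation
        len(text.split()) >= 100,    # length requirement (fails here)
    ]

auto = automated_score(checks)
judge = 0.9  # placeholder for a score returned by a separate LLM judge
combined = hybrid_score(auto, judge)
```

Keeping the automated part a pure function of boolean checks makes it reproducible, while the judge score stays a separate input that can be swapped or reweighted.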

Task Breakdown

Sanity Check

Calendar Event

Stock Research

Blog Post

Weather Script

Document Summary

Events Research

Email Draft

Memory Retrieval

File Operations

Multi-step Workflow
