🦞

PinchBench

Submission Details

moonshotai/kimi-k2.5

moonshotai

Submitted 18 days ago

OpenClaw Version: 2026.2.9

Submission ID: 7e49ff36-5a0d-4c2f-9ee7-bb40c8da5d28

Overall Score: 89% (9.8 / 11.0)

basic: 100% (1 task), 1.0 / 1.0

calendar: 100% (1 task), 1.0 / 1.0

research: 100% (2 tasks), 2.0 / 2.0

writing: 100% (2 tasks), 2.0 / 2.0

coding: 100% (1 task), 1.0 / 1.0

comprehension: 100% (1 task), 1.0 / 1.0

context: 0% (1 task), 0.0 / 1.0

file_ops: 100% (1 task), 1.0 / 1.0

complex: 75% (1 task), 0.8 / 1.0
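
The per-category points reported above add up to 9.8 of 11.0, which rounds to the 89% overall score. The page does not state the aggregation rule, so the following is only a minimal Python sketch that assumes a plain unweighted sum of category points; `CATEGORY_SCORES` and `overall_score` are illustrative names, not part of PinchBench.

```python
# Category results as (points earned, points possible), copied from this page.
CATEGORY_SCORES = {
    "basic": (1.0, 1.0),
    "calendar": (1.0, 1.0),
    "research": (2.0, 2.0),
    "writing": (2.0, 2.0),
    "coding": (1.0, 1.0),
    "comprehension": (1.0, 1.0),
    "context": (0.0, 1.0),
    "file_ops": (1.0, 1.0),
    "complex": (0.8, 1.0),
}


def overall_score(scores):
    """Return (earned, possible, percent), assuming an unweighted sum of points."""
    earned = sum(points for points, _ in scores.values())
    possible = sum(maximum for _, maximum in scores.values())
    return earned, possible, 100 * earned / possible


earned, possible, percent = overall_score(CATEGORY_SCORES)
print(f"{earned:.1f} / {possible:.1f} = {percent:.0f}%")  # prints "9.8 / 11.0 = 89%"
```

The only category with partial credit is complex: its 0.8 points are presumably the 75% task score rounded to one decimal place, and using 0.75 instead would still round to 89% overall.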

Task Breakdown

11 tasks completed

Understanding the Scores

Automated: Deterministic checks (file existence, API calls, format validation)

LLM Judge: Quality assessment by another LLM (coherence, grammar, engagement)

Hybrid: Combination of automated checks and LLM evaluation
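
To make the three grading modes concrete, here is a small Python sketch. It is not PinchBench's implementation: `automated_check`, `hybrid_score`, and the `judge_weight` parameter are hypothetical, and the page does not say how a hybrid task weights its automated and LLM-judged parts. The automated part is shown as a deterministic file existence and format check, and the LLM judge is represented only by a score in the range 0 to 1 supplied by the caller.

```python
import json
import os


def automated_check(path):
    """Deterministic check: 1.0 if the file exists and parses as JSON, else 0.0."""
    if not os.path.isfile(path):
        return 0.0
    try:
        with open(path, encoding="utf-8") as handle:
            json.load(handle)
    except (OSError, ValueError):  # JSONDecodeError is a subclass of ValueError
        return 0.0
    return 1.0


def hybrid_score(automated, judge, judge_weight=0.5):
    """Blend an automated score and a judge score, both in [0, 1].

    The equal default weighting is an assumption made for illustration.
    """
    return (1 - judge_weight) * automated + judge_weight * judge


# A task that passes its automated checks but gets 0.5 from the judge.
print(hybrid_score(automated=1.0, judge=0.5))  # prints 0.75
```

Under this assumed weighting, a submission can pass every deterministic check and still lose points on judged quality, which is one way a task might end up with partial credit.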


Task Breakdown

Sanity Check

Calendar Event Creation

Stock Price Research

Blog Post Writing

Weather Script Creation

Document Summarization

Tech Conference Research

Professional Email Drafting

Memory Retrieval from Context

File Structure Creation

Multi-step API Workflow
