Cross-corpus findings

Across 10 papers and 245 claims: 68 claims verified by code, 176 not yet verified, and 1 failed verification.

Key patterns

Where it worked, it worked cleanly.

Of the 68 verified claims, most matched the paper's reported result on first run. Computational papers with well-structured repositories and pre-computed arrays on Zenodo were the most tractable. The failure rate where code and data were both available was near zero.
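Operationally, "verified by code" means a tolerance check of a recomputed value against the number reported in the paper. A minimal sketch of that check; the claim value, tolerance, function names, and stand-in data below are all invented for illustration, not taken from any repository in the corpus:

```python
import numpy as np

def verify_claim(reported: float, recompute, rel_tol: float = 1e-3) -> str:
    """Compare a paper's reported value against a fresh recomputation.

    `reported` is transcribed from the paper; `recompute` would wrap the
    repository's own analysis code. Both are hypothetical stand-ins here.
    """
    try:
        value = recompute()
    except Exception as exc:  # code ran but crashed: counts as not yet verified
        return f"not verified (execution error: {exc})"
    if np.isclose(value, reported, rtol=rel_tol):
        return f"verified ({value:.4g} matches reported {reported:.4g})"
    return f"FAILED ({value:.4g} vs reported {reported:.4g})"

# Stand-in for a pre-computed array deposited on Zenodo.
rates = np.array([4.1, 4.3, 4.2])
print(verify_claim(4.2, lambda: float(rates.mean())))
```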

Data availability is the dominant blocker.

Most claims could not be verified because the underlying data — raw imaging, electrophysiology recordings, or immunostaining scans — was not in any public deposit. Code was typically available; data was not.

API breakage was the most common code issue.

The most common reproducibility failure was a deprecated library API (e.g., matplotlib 3.8 removing Axes3D.w_xaxis). Each is a one-line fix: the breakage blocks automated figure comparison but does not affect the underlying computation.
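For the w_xaxis case specifically, the fix looks like the sketch below (illustrative, not any one repository's code). The w_* aliases on 3D axes were long-deprecated and removed in matplotlib 3.8; the plain xaxis/yaxis/zaxis attributes are the replacement:

```python
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(projection="3d")

# Pre-3.8 code used the removed alias:
#   ax.w_xaxis.set_pane_color((1, 1, 1, 0))
# The one-line fix is the long-standing public attribute:
ax.xaxis.set_pane_color((1, 1, 1, 0))
```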

Observational claims need a different verification model.

Anatomical atlas papers (e.g., Artiushin spider brain atlas) contain claims whose "reproduction" means inspecting the deposited volume, not re-computing a statistic. The claim graph method surfaces this explicitly: these claims are valid but require a different reproduction pathway.
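The claim graph's internals are not detailed in this section, but the pathway distinction amounts to a tag on each claim node. A sketch of that idea, with every field name and example claim below invented for illustration:

```python
from dataclasses import dataclass
from enum import Enum, auto

class Pathway(Enum):
    RECOMPUTE = auto()  # rerun analysis code against deposited data
    INSPECT = auto()    # open the deposited artifact and check it directly

@dataclass
class Claim:
    text: str
    pathway: Pathway

claims = [
    Claim("Deposited volume shows the labeled structure in Fig. 2", Pathway.INSPECT),
    Claim("Decoder accuracy is 87% on held-out trials", Pathway.RECOMPUTE),
]

# Routing verification by pathway keeps atlas-style claims from being
# miscounted as failures just because there is no statistic to recompute.
for c in claims:
    print(c.pathway.name, "-", c.text)
```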

Assessment claims are the most tractable layer.

Of 26 assessment claims (model scope, assumptions, methodological constraints), nearly all could be verified by code inspection alone — no data required. Making these explicit is itself a contribution: the assumptions are in the code, not the paper prose.
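A hypothetical composite of that pattern: an assessment claim such as "the model is purely passive below threshold" typically lives in the simulation code as a constant or a guard, so verifying it is a read, not a run. Everything below is invented for illustration:

```python
# Fixed integration step: an implicit claim about temporal resolution.
DT = 0.1            # ms
V_THRESHOLD = -50.0  # mV

def step(v: float, i_input: float, tau: float = 10.0) -> float:
    """Leaky integrator with purely passive dynamics below threshold.

    The absence of any active conductance term here *is* the assumption;
    checking the assessment claim means reading this function, no data needed.
    """
    if v >= V_THRESHOLD:
        return -65.0  # reset: spiking handled as an instantaneous event
    return v + DT * (-v / tau + i_input)
```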

By paper