Cross-corpus findings
Across 10 papers and 245 claims.
Key patterns
Where it worked, it worked cleanly.
Of the 68 verified claims, most matched the paper's reported result on first run. Computational papers with well-structured repositories and pre-computed arrays on Zenodo were the most tractable. The failure rate where code and data were both available was near zero.
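A first-run match of this kind reduces to recomputing a statistic and comparing it to the paper's reported value within a tolerance. A minimal sketch, where the claim values and tolerances are invented for illustration:

```python
import math

def verify_claim(recomputed, reported, rel_tol=1e-6, abs_tol=1e-9):
    """Return True when a recomputed value matches the paper's
    reported value within tolerance (a 'first-run' match)."""
    return math.isclose(recomputed, reported, rel_tol=rel_tol, abs_tol=abs_tol)

# Hypothetical claim: a paper reports a mean of 12.4 for some measurement.
reported = 12.4
recomputed = sum([12.3, 12.5, 12.4]) / 3   # re-run on the deposited data
print(verify_claim(recomputed, reported, rel_tol=1e-3))  # → True
```

With both code and a pre-computed deposit available, the whole check is mechanical, which is why this layer had a near-zero failure rate.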
Data availability is the dominant blocker.
Most claims could not be verified because the underlying data — raw imaging, electrophysiology recordings, or immunostaining scans — was not in any public deposit. Code was typically available; data was not.
API breakage was the most common code issue.
The most common reproducibility failure was deprecated library APIs (e.g., matplotlib 3.8 removing w_xaxis). These are one-line fixes that prevent automated figure comparison but do not affect the underlying computation.
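The typical repair is a one-line compatibility fallback. The attribute names below come from matplotlib (w_xaxis was removed from Axes3D in 3.8 in favor of the plain xaxis attribute), but the demo uses stand-in classes so the sketch runs without matplotlib installed:

```python
def get_xaxis(ax):
    # matplotlib < 3.8: Axes3D exposes w_xaxis; 3.8+ removed it.
    # Falling back to the modern .xaxis attribute keeps an old
    # plotting script runnable on current releases.
    return getattr(ax, "w_xaxis", None) or ax.xaxis

# Stand-ins for pre- and post-3.8 Axes3D objects (illustrative only).
class OldAxes:
    w_xaxis = "legacy axis object"

class NewAxes:
    xaxis = "modern axis object"

print(get_xaxis(OldAxes()))  # → legacy axis object
print(get_xaxis(NewAxes()))  # → modern axis object
```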
Observational claims need a different verification model.
Anatomical atlas papers (e.g., Artiushin spider brain atlas) contain claims whose "reproduction" means inspecting the deposited volume, not re-computing a statistic. The claim graph method surfaces this explicitly: these claims are valid but require a different reproduction pathway.
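One way a claim graph can surface this distinction is by carrying the reproduction pathway as an explicit field on each claim node. A minimal sketch; the field names and example claims are hypothetical, not the method's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    pathway: str  # "recompute": re-run code; "inspect": examine a deposit

claims = [
    Claim("Statistic Y reproduces from the deposited arrays", pathway="recompute"),
    Claim("Structure X is visible in the deposited volume", pathway="inspect"),
]

# Observational claims route to a different verification workflow
# instead of being marked unreproducible.
to_inspect = [c for c in claims if c.pathway == "inspect"]
print(len(to_inspect))  # → 1
```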
Assessment claims are the most tractable layer.
Of 26 assessment claims (model scope, assumptions, methodological constraints), nearly all could be verified by code inspection alone — no data required. Making these explicit is itself a contribution: the assumptions are in the code, not the paper prose.
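Because these claims live in the code itself, verification can mean reading a signature rather than running anything. A toy illustration, with an invented function and assumption:

```python
import inspect

def simulate(n_steps=1000, noise=0.0):
    # Assumption encoded in code, not prose: the model is
    # deterministic by default (noise defaults to zero).
    return [0.1 * i * (1.0 + noise) for i in range(n_steps)]

# An assessment claim ("the default model is deterministic") is
# checkable from the signature alone: no data, no execution needed.
default_noise = inspect.signature(simulate).parameters["noise"].default
print(default_noise == 0.0)  # → True
```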