Overview
What is this? A set of experiments at Erdős AI Lab at the intersection of deep learning and structural biology - characterizing how modern transformer-based folding stacks behave on small proteins, where they succeed, where they fail, and whether their internal confidence signals track structural fidelity.
And the second thread? A head-to-head against classical molecular-dynamics baselines on the same targets, looking for systematic biases on either side - the kind that only show up when learned and physics-based approaches are scored against each other in earnest.
Threads
- pLDDT-style confidence calibration. Do learned confidence scores actually track RMSD/lDDT on held-out structures? When do they break, and along which axes - loop length, disorder, novelty?
- Folding trajectory dynamics. Rather than treating folding stacks as black boxes, we inspect intermediate representations to understand how structure forms layer by layer.
- Transformer vs. classical MD. A head-to-head comparison on small targets where classical MD is still tractable, to isolate where each approach is reliable.
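The calibration question in the first thread can be made concrete with a small sketch. This is an illustration under stated assumptions, not the lab's pipeline: it assumes per-residue predicted confidence and measured lDDT are both available on a 0-100 scale, and the function names, bin count, and example values are hypothetical.

```python
from statistics import mean

def calibration_table(pred_conf, true_lddt, n_bins=5):
    """Bin residues by predicted confidence and compare with measured lDDT.

    Assumes both inputs are per-residue scores on a 0-100 scale.
    Returns rows of (bin_low, bin_high, count, mean_pred, mean_true),
    skipping empty bins.
    """
    if len(pred_conf) != len(true_lddt):
        raise ValueError("inputs must have the same length")
    width = 100.0 / n_bins
    bins = [[] for _ in range(n_bins)]
    for p, t in zip(pred_conf, true_lddt):
        # Clamp so that 100.0 falls in the top bin and stray values stay in range.
        idx = max(0, min(int(p // width), n_bins - 1))
        bins[idx].append((p, t))
    rows = []
    for i, members in enumerate(bins):
        if not members:
            continue
        rows.append((
            i * width,
            (i + 1) * width,
            len(members),
            mean(p for p, _ in members),
            mean(t for _, t in members),
        ))
    return rows

def expected_calibration_error(rows):
    """Count-weighted mean absolute gap between predicted and measured score."""
    total = sum(count for _, _, count, _, _ in rows)
    return sum(count * abs(mp - mt) for _, _, count, mp, mt in rows) / total

# Hypothetical per-residue values, for illustration only.
pred = [95, 92, 55, 52, 15]
true = [90, 88, 70, 60, 40]
rows = calibration_table(pred, true)
ece = expected_calibration_error(rows)
```

A well-calibrated model has a small gap in every bin. Repeating the table on subsets split by loop length, disorder, or novelty is one way to see along which axis calibration breaks.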
Selected Results
What are we looking at? A representative head-to-head and a folding-trajectory snapshot from one of the active threads. The structural comparison shows where single-chain folding stacks land versus the native complex; the trajectory plots are the observables we use to score folding quality over time.
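The text does not name the trajectory observables, so the following is only one common possibility: the fraction of native contacts, computed per frame from C-alpha coordinates. The function names, the 8 Å cutoff, the minimum sequence separation, and the toy coordinates are all assumptions made for illustration.

```python
from math import dist

def contact_pairs(coords, cutoff=8.0, min_sep=3):
    """Residue pairs (i, j) whose C-alpha atoms lie within `cutoff` angstroms.

    Pairs closer than `min_sep` positions along the chain are skipped,
    since they are in contact in almost any conformation.
    """
    n = len(coords)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + min_sep, n)
        if dist(coords[i], coords[j]) <= cutoff
    }

def fraction_native_contacts(frames, native, cutoff=8.0, min_sep=3):
    """For each frame, the share of native contacts that are also formed."""
    native_set = contact_pairs(native, cutoff, min_sep)
    if not native_set:
        raise ValueError("native structure has no contacts under these settings")
    return [
        len(native_set & contact_pairs(frame, cutoff, min_sep)) / len(native_set)
        for frame in frames
    ]

# Toy four-residue example: a compact "native" state and an extended frame.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
q_over_time = fraction_native_contacts([extended, native], native)
```

The measure needs no structural superposition, so the same function can score frames from an MD trajectory and intermediate structures from a folding stack on equal terms.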
Why It Matters
Hasn't structure prediction been solved? On the surface, yes - the benchmark numbers are impressive. But reliable downstream use requires knowing when a model can be trusted, and the answer is rarely "always". Calibrated confidence and a mechanistic read on folding stacks are what turn benchmark scores into usable scientific tools.