AI × Bio: Protein Folding & Molecular Dynamics

Four experiments in AI × biology, on a single GPU

Folding, pulling, and predicting real proteins.

Four experiments on real proteins, all on one GPU, all reproducible from the code linked at the bottom. I built a structure-prediction and molecular-dynamics pipeline end to end and put it through four regimes that stress completely different parts of it: single-structure folding, confidence calibration, mechanical unfolding, and microsecond equilibrium dynamics. Each result is checked against the published experimental or theoretical value, and each lands in the expected range, with the exceptions and caveats flagged where they occur.

Stack. ESMFold v1 via HuggingFace transformers; AMBER ff14SB + TIP3P + OpenMM 8.5 for MD; mdtraj for analysis; tmtools for TM-scoring; FastAPI + py3Dmol + Plotly for a live dashboard. A single A100 throughout. Code on GitHub, large trajectories on HuggingFace, and a live dashboard. Built at Erdős AI Lab.

TL;DR

ESMFold folded 8 small benchmark proteins in 12.9 seconds total. Four of them (ubiquitin, GB1, titin I27, α-spectrin SH3) hit TM-score ≥ 0.90 against the experimental structures.
ESMFold confidence calibration on 38 small public proteins (10-120 aa): mean pLDDT per protein tracks ground-truth TM-score at Pearson r = 0.709, Spearman ρ = 0.584.
Titin I27 mechanical unfolding by steered MD at three pulling velocities. Peak force at the medium velocity reached 2,316 pN, about 11× the experimental AFM measurement of 204 ± 26 pN (Carrion-Vazquez et al., PNAS 1999), the loading-rate inflation that Bell-Evans theory predicts.
Chignolin (CLN025) equilibrium MD at 340 K for 1 µs of physical time. The 10-residue β-hairpin held its native state for 89.5% of the trajectory; mean Q (native contact fraction) = 0.91; minimum Cα-RMSD to experiment = 0.27 Å.

Experiment 1 - ESMFold on a small-protein benchmark

I picked 8 well-studied small proteins from RCSB, fetched their sequences from the FASTA endpoint, and folded each with ESMFold v1 (Lin et al., Science 2023). For each prediction I fetched the experimental PDB, parsed the Cα trace, computed RMSD by Kabsch superposition, scored TM-score via tmtools (TM-align's optimal alignment, Zhang & Skolnick 2004), and computed GDT-TS.

PDB	Name	Length	TM-score	RMSD (Å)	Mean pLDDT
1UBQ	ubiquitin	76	0.96	0.84	77
2QMT	GB1	56	0.96	0.55	76
1SHG	α-spectrin SH3	62	0.90	12.17 *	78
1TIT	titin I27	89	0.90	1.22	83
1VII	villin headpiece	35	0.66	2.12	79
1L2Y	trp-cage	20	0.55	0.51	77
5AWL	chignolin	10	0.44	0.88	70
1LMB	λ-repressor	87	n/a	n/a	79

* The 1SHG RMSD of 12.17 Å is an artifact of my Kabsch step, not a bad prediction. That RMSD pairs residues one-to-one by position, and for 1SHG the pairing picks up an offset between the predicted and experimental chains; TM-align computes its own optimal alignment and handles the offset correctly. TM-score is the number to trust here, and 0.90 says it is the same fold. 1LMB's native PDB is multi-chain and broke my single-chain parser, so it has no score.

Total folding time: 12.9 seconds for 8 sequences. Mean TM = 0.76 across the 7 that scored cleanly.
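The TM-score normalization is worth seeing in code, because it is length-dependent and that matters for the shortest targets. The sketch below is not the tmtools implementation: it only evaluates the published TM-score sum for residue pairs that are already aligned and superposed. The function names are illustrative, and the 0.5 Å floor on d0 for very short chains is an assumption about how TM-align treats small targets.

```python
def tm_d0(length: int) -> float:
    """Distance scale d0 in angstroms from the TM-score definition.

    Assumption: chains of 21 residues or fewer use a floor of 0.5 A,
    since the cube-root formula is undefined or tiny for short chains.
    """
    if length <= 21:
        return 0.5
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, length: int) -> float:
    """TM-score for already-aligned, already-superposed C-alpha pairs.

    distances: per-pair distances in angstroms after superposition.
    length: length of the reference (native) chain used to normalize.
    """
    d0 = tm_d0(length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / length


# The same uniform 1 A deviation scores very differently at two lengths.
print(round(tm_score([1.0] * 76, 76), 3))  # ubiquitin-sized chain
print(round(tm_score([1.0] * 10, 10), 3))  # chignolin-sized chain
```

Under these assumptions a uniform 1 Å error costs a 10-residue chain far more TM-score than a 76-residue chain, which may be part of why trp-cage and chignolin sit low on TM-score in the table despite sub-ångström RMSD.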

Horizontal bar chart of TM-score for eight proteins. Ubiquitin and GB1 at 0.96, SH3 and titin I27 at 0.90, villin 0.66, trp-cage 0.55, chignolin 0.44. — ESMFold accuracy (TM-score vs native) on the 8 targets; 12.9 s total wall time on one A100. The dotted line at TM = 0.5 marks the same-fold threshold.

Four panels of ESMFold predictions (rainbow) superposed on the gray experimental structures for ubiquitin, GB1, titin I27 and SH3, all near-perfect overlays at TM 0.90 to 0.96. — The four top performers: rainbow predictions overlaid on the gray experimental structures. You can barely see the gray underneath, which is the result.

What's actually new here: nothing. ESMFold on these targets is a known capability; the original paper benchmarked similar systems. What is new for me is a working pipeline that takes a sequence string, returns a 3D PDB in about two seconds, and scores it against ground truth.

Experiment 2 - ESMFold confidence calibration

The interesting question on small proteins is not whether ESMFold can predict the structure but whether it knows when it cannot. The per-residue pLDDT score is the model's self-reported confidence. If pLDDT correlates with actual structural accuracy, the model is calibrated and you can trust its uncertainty. If not, the confidence is decorative.

I built a 50-protein PISCES-style sweep: small (10-120 aa) monomeric proteins from the PDB, single chain, no ligands. Sequences fetched live from the RCSB FASTA endpoint, each folded by ESMFold, each scored against its native structure. After dropping multi-chain, non-standard-residue, and scoring-failure cases, 38 proteins had full pLDDT + TM-score data.

Scatter plot of ESMFold mean pLDDT confidence against true TM-score on 38 small proteins, with a rising linear fit. Pearson r = 0.709. — ESMFold self-confidence (mean pLDDT) vs true accuracy (TM-score to native) on the 38-protein PISCES subset (10-120 aa). Pearson r = 0.709, Spearman ρ = 0.584.

Pearson r = 0.709
Spearman ρ = 0.584
Linear fit slope = 0.018 TM per pLDDT-point
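For reference, the three statistics above can be computed without any dependencies. This is a minimal sketch, not the analysis script from the repository; the function names are illustrative and the example values are made up to show the call pattern, not taken from the 38-protein dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def linear_slope(x, y):
    """Least-squares slope of y on x (here: TM per pLDDT point)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical per-protein values, for illustration only.
mean_plddt = [62.0, 70.0, 76.0, 79.0, 83.0]
tm = [0.41, 0.44, 0.96, 0.66, 0.90]
print(pearson(mean_plddt, tm), spearman(mean_plddt, tm), linear_slope(mean_plddt, tm))
```

Pearson measures how linear the relationship is and Spearman only how monotonic it is, so reporting both, as above, shows whether a few extreme points are carrying the linear fit.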

ESMFold's confidence is meaningfully informative on small proteins. The failures sit in the lower-left cluster: proteins where pLDDT was 60-70 and TM came in under 0.5. Chignolin is one of these. At 10 residues the model is genuinely uncertain about the β-hairpin register, and it says so.

Chignolin: experimental native (green, PDB 5AWL) on the left versus the ESMFold prediction (rainbow, sequence only, no MSA) on the right; the prediction is a folded shape but misses the exact beta-hairpin register. — The chignolin failure case: ESMFold (right, sequence only, no MSA) collapses the chain to a folded shape but misses the exact β-hairpin register of the experimental native (left, PDB 5AWL). TM = 0.44, the classic small-protein limit.

What's actually new here: also nothing fundamental, ESMFold calibration has been studied. But I have not seen this specific plot for this specific small-protein PISCES subset published. A calibration data point for a future writeup, not a paper.

Experiment 3 - Steered MD on titin I27

Titin is the protein that makes muscles springy. The full molecule contains roughly 244 individually folded domains, mostly immunoglobulin-like and fibronectin-like, and the 27th immunoglobulin domain of its I-band (I27, renamed I91 in the current numbering) is the canonical mechanical-unfolding benchmark, pulled by atomic force microscopy (AFM) in classic papers by Rief, Marszalek, Carrion-Vazquez, and Fernandez through the late 1990s.

I built a steered MD pipeline in OpenMM that solvates I27 (PDB 1TIT, Improta et al. 1996; 85,750 atoms after TIP3P solvation + 0.15 M NaCl in a 7 × 7 × 18 nm elongated box), equilibrates, then attaches a harmonic spring (k = 5000 kJ/mol/nm² ≈ 8,300 pN/nm) between the N- and C-terminal Cα atoms. The spring's reference position moves at constant velocity, pulling the chain apart while OpenMM integrates at 300 K with a Langevin thermostat at a 2 fs timestep. I ran the same protocol at three pulling velocities spanning two orders of magnitude:

Velocity	Total sim time	Peak force	Peak at r =
0.1 nm/ps (fast)	300 ps	5946 pN	33.7 nm (post-rupture)
0.01 nm/ps (medium)	3000 ps	2316 pN	9.04 nm
0.001 nm/ps (slow)	30000 ps	2169 pN	36.3 nm (post-rupture)

The medium-velocity peak at r = 9.04 nm (Δ = 24 Å above the native end-to-end distance) is the canonical I27 rupture signature: the A'-G β-strand pair that forms the mechanical "clamp" snaps under load, releasing the rest of the fold.
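The unit conversions in this setup are easy to get wrong by a power of ten, so here is a small sketch of them. It is not the pipeline's code: the function and parameter names are illustrative, and the force expression assumes the simplest constant-velocity form, a spring whose anchor starts at the initial end-to-end distance r0 and advances by v·t.

```python
AVOGADRO = 6.02214076e23  # particles per mole

# 1 kJ/mol/nm expressed in piconewtons (about 1.66 pN).
KJ_PER_MOL_NM_IN_PN = 1e3 / AVOGADRO / 1e-9 * 1e12


def spring_constant_pn_per_nm(k_kj_mol_nm2: float) -> float:
    """Convert a spring constant from kJ/mol/nm^2 to pN/nm."""
    return k_kj_mol_nm2 * KJ_PER_MOL_NM_IN_PN


def velocity_nm_per_s(v_nm_per_ps: float) -> float:
    """Convert a pulling velocity from nm/ps to nm/s."""
    return v_nm_per_ps * 1e12


def spring_force_pn(k_kj_mol_nm2, v_nm_per_ps, t_ps, r0_nm, r_nm):
    """Force (pN) from a moving harmonic spring at time t.

    Assumes the spring anchor started at r0 and has moved v*t, while
    the actual end-to-end distance is r.
    """
    lag_nm = r0_nm + v_nm_per_ps * t_ps - r_nm
    return spring_constant_pn_per_nm(k_kj_mol_nm2) * lag_nm


print(spring_constant_pn_per_nm(5000))  # spring constant in pN/nm
print(velocity_nm_per_s(0.01))          # medium velocity in nm/s
```

With a spring this stiff, a lag of a few tenths of a nanometre between the anchor and the chain ends already corresponds to forces in the thousands of piconewtons.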

Four cartoon snapshots of titin I27 through the medium-velocity pull: compact beta-sandwich at t=0, under tension at t=150 ps, rupture at the 2316 pN peak at t=270 ps, and a post-rupture intermediate beginning to extend at t=600 ps. — Four snapshots through the medium-velocity trajectory, same camera across panels. The compact β-sandwich (t = 0) holds under tension (t = 150 ps), ruptures as the force peaks at 2,316 pN (t = 270 ps), then begins to extend (t = 600 ps).

The number to take seriously: ~11× the AFM value

AFM measurements of I27 unfolding (Carrion-Vazquez et al., PNAS 1999) at a cantilever pull rate of 600 nm/s gave a most-probable rupture force of 204 ± 26 pN. My medium-velocity SMD pulls at 0.01 nm/ps = 10¹⁰ nm/s, about seven orders of magnitude faster than AFM. The peak force I measured, 2,316 pN, is 11.4× the AFM value. That is consistent with Bell-Evans loading-rate theory (Bell 1978; Evans & Ritchie 1997; Dudko, Hummer & Szabo 2006), which predicts the most-probable rupture force scales as:

F*  =  ΔG‡ / Δx‡   +   ( kT / Δx‡ ) · ln( v / v₀ )

so seven orders of magnitude in pulling rate map to a factor of roughly 5-15× in rupture force, depending on the position of the transition state. 11× sits in that band. What I am not claiming: that I fit Bell-Evans to my own data and showed the rate dependence quantitatively. I have three trajectories at three velocities, and the order of magnitude is right. A proper version would run 5-10 replicate trajectories at each velocity and fit the expression, a few more days of GPU time.
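To make "depending on the position of the transition state" concrete, the log-linear expression can be inverted for Δx‡ from two (velocity, force) points. This is a back-of-envelope sketch, not a fit: it assumes a single Bell-Evans regime holds across all seven decades between the AFM and SMD velocities, which three single trajectories cannot establish. The function names are illustrative.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def kt_pn_nm(temperature_k: float) -> float:
    """Thermal energy in pN*nm (1 J = 1e21 pN*nm)."""
    return K_B * temperature_k * 1e21


def implied_dx_nm(f_slow_pn, f_fast_pn, v_slow, v_fast, temperature_k=300.0):
    """Transition-state distance implied by two rupture forces.

    From F*(v) = const + (kT/dx) ln(v/v0), subtracting two points gives
    dx = kT * ln(v_fast / v_slow) / (F_fast - F_slow).
    """
    return kt_pn_nm(temperature_k) * math.log(v_fast / v_slow) / (f_fast_pn - f_slow_pn)


def force_at_velocity(f_ref_pn, v_ref, v, dx_nm, temperature_k=300.0):
    """Extrapolate a rupture force to another velocity, same assumption."""
    return f_ref_pn + kt_pn_nm(temperature_k) / dx_nm * math.log(v / v_ref)


# AFM: 204 pN at 600 nm/s. SMD: 2316 pN at 0.01 nm/ps = 1e10 nm/s.
dx = implied_dx_nm(204.0, 2316.0, 600.0, 1e10)
print(f"implied dx = {dx:.4f} nm")
```

The implied Δx‡ is an effective number under the single-regime assumption. Comparing it with the Δx‡ fitted to AFM data alone is one quick check on whether the fast SMD pulls are probing the same barrier.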

Force-extension curves

Force versus end-to-end distance for titin I27 at three pulling velocities. The fast curve peaks near 5900 pN, the medium near 2300 pN around r = 9 nm, the slow curve stays low until near the contour length. — Pulling force vs end-to-end distance at the three velocities. The medium curve (navy) shows the classic sawtooth peaking near 2,300 pN around r = 9 nm.

The fast curve (rust) is dissipation-dominated: at v = 0.1 nm/ps the chain is yanked far from equilibrium and the force scales with viscous drag more than with the unfolding barrier. The slow curve (olive) stays low until r is essentially the contour length, where the bond-stretching regime begins.

What's actually new here: still nothing fundamental. Lu, Schulten, Marszalek and others did the original SMD on I27 in the late 1990s, including the A'-G strand identification and the Bell-Evans analysis. What I have is a working, reproducible pipeline driven end to end, plus the result that single-trajectory pulling at this timescale recovers the right order of magnitude but not a clean Bell-Evans fit.

Experiment 4 - Chignolin equilibrium MD for 1 µs

Chignolin (and its mutant CLN025, the variant whose structure is PDB 5AWL) is a 10-residue designed mini-protein that folds into a β-hairpin (Honda et al. 2008). It is the smallest synthetic protein with a real folded native, and it became a benchmark for long MD because Lindorff-Larsen et al. (Science 2011) used the Anton supercomputer to fold it from an extended state in tens of microseconds.

I wanted to see one fold with my own eyes. The pipeline loads the native CLN025 PDB, denatures it (1 ns at 600 K in vacuum), re-solvates in TIP3P, equilibrates at 340 K (close to CLN025's melting temperature, so I would see fold/unfold dynamics rather than a frozen native), and runs 1 µs of production Langevin MD with 2 fs steps. Observables logged every 10 ps: Cα RMSD vs native, radius of gyration, and the fraction of native contacts Q (the standard reaction coordinate, Cα pairs within 7.5 Å in the native that stay within 1.2× that cutoff in a given frame).
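The Q definition above translates into a few lines of code. The sketch below is a dependency-free version for illustration, not the mdtraj-based analysis in the repository. Coordinates are plain (x, y, z) tuples in ångströms, the function names are illustrative, and the min_separation filter that skips near neighbours along the chain is an assumption that the text does not specify.

```python
import math


def native_pairs(native, cutoff=7.5, min_separation=3):
    """Index pairs of C-alpha atoms in contact in the native structure.

    min_separation is an assumed filter on sequence distance, so that
    trivially close chain neighbours are not counted as contacts.
    """
    pairs = []
    for i in range(len(native)):
        for j in range(i + min_separation, len(native)):
            if math.dist(native[i], native[j]) <= cutoff:
                pairs.append((i, j))
    return pairs


def q_fraction(frame, pairs, cutoff=7.5, tolerance=1.2):
    """Fraction of native pairs still within tolerance * cutoff in a frame."""
    kept = sum(
        1 for i, j in pairs
        if math.dist(frame[i], frame[j]) <= tolerance * cutoff
    )
    return kept / len(pairs)


def folded_fraction(q_values, threshold=0.7):
    """Share of frames whose Q exceeds the folded threshold."""
    return sum(q > threshold for q in q_values) / len(q_values)


# Toy six-bead hairpin and a fully extended chain, 3.8 A bead spacing.
hairpin = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (7.6, 5, 0), (3.8, 5, 0), (0, 5, 0)]
extended = [(3.8 * i, 0, 0) for i in range(6)]
pairs = native_pairs(hairpin)
print(q_fraction(hairpin, pairs), q_fraction(extended, pairs))
```

Applying q_fraction to every logged frame gives the Q-versus-time trace, and folded_fraction over that trace is the kind of statistic reported as the fraction of the trajectory above Q = 0.7.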

Two-panel plot. Top: native contact fraction Q for chignolin over 1 microsecond, mostly near 1.0 with a few dips, mean 0.91. Bottom: Cα RMSD to native dropping under 1 Angstrom after about 550 ns, minimum 0.27 Angstrom at 646 ns. — Chignolin (CLN025) equilibrium folding, 1 µs at 340 K. Top: native contact fraction Q. Bottom: Cα RMSD to native. AMBER ff14SB + TIP3P, OpenMM 8.5, 100,000 frames.

100,000 observations over 1 µs:

⟨Q⟩ = 0.91 over the full trajectory.
⟨Q⟩ = 0.996 over the final 100 ns (essentially permanently folded by then).
Fraction of trajectory with Q > 0.7 (folded threshold) = 89.5%.
Minimum RMSD to native = 0.27 Å at t = 646 ns, indistinguishable from the X-ray structure at the Cα level.
First sustained folded event at t = 1.04 ns.

Caveat

My denaturation step was too gentle. 1 ns at 600 K in vacuum did not fully unfold the β-hairpin; the protein refolded during the 300 ps NPT equilibration at 340 K. So I missed the canonical "fold from extended state" event that the Anton paper showed. What I captured instead is the equilibrium folded ensemble at 340 K: the small dips in Q around 100, 250, and 400 ns are real fold/unfold dynamics at the simulation temperature. The proper version is 5-10 ns at 1000 K denaturation followed by a quench to 340 K, worth doing if I want to claim the canonical result.

What's actually new here: nothing, again. Lindorff-Larsen 2011 did this and more on Anton. What I have is a working pipeline that runs 1 µs in 14 hours on a single GPU and produces a clean Q-vs-time trace I can point at when someone asks whether I have actually run molecular dynamics.

What I take from this

The infrastructure has matured. ESMFold weights are 8 GB on HuggingFace. pip install openmm gives you a working MD engine, pip install transformers a sequence-to-structure model. The hard problems are not installing things anymore; they are knowing which experiments to run and reporting the numbers as they are when you do.

Single structure is not enough. ESMFold returns one fold per sequence. For most well-folded proteins that is fine. For chignolin (TM = 0.44, where the model genuinely does not know the β-hairpin register) or any protein that switches between two states, single-structure prediction misses the point. The next-wave methods (BioEmu, AlphaFlow, JAMUN, the post-AlphaFold-3 work on conformational ensembles) are where this gets interesting.

Steered MD recovers the right physics at the wrong timescale. My SMD peak force is 11× the AFM value because I pull about 10⁷× faster, the direction and rough magnitude of inflation that Bell-Evans predicts. Recovering equilibrium quantities (the free-energy surface vs extension, the rupture force at AFM rates) needs either much slower pulling (infeasible) or many replicates plus careful reweighting (Jarzynski / Hummer-Szabo). I have the multi-velocity dataset; the reweighter is half-built.
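For orientation, the plain Jarzynski estimator that such a reweighter starts from is short. This is a generic sketch of the equality exp(−ΔG/kT) = ⟨exp(−W/kT)⟩, not the half-built reweighter mentioned above and not the position-resolved Hummer-Szabo variant. The function name and the example work values are illustrative.

```python
import math


def jarzynski_free_energy(works, kt):
    """Free-energy difference from nonequilibrium work values.

    Implements dG = -kT * ln( mean( exp(-W / kT) ) ), using a
    log-sum-exp shift so large work values do not underflow.
    works and kt must share units (for example kJ/mol).
    """
    scaled = [-w / kt for w in works]
    shift = max(scaled)
    log_mean = shift + math.log(
        sum(math.exp(s - shift) for s in scaled) / len(scaled)
    )
    return -kt * log_mean


KT_300K = 2.494  # kJ/mol at 300 K

# Hypothetical work values from replicate pulls, kJ/mol.
works = [812.0, 798.5, 840.2, 805.1, 821.7]
print(jarzynski_free_energy(works, KT_300K))
```

The exponential average is dominated by the lowest-work trajectories, which is why a handful of fast pulls gives a badly biased estimate and why the replicate count matters so much here.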

The bottleneck was never compute. Every result here ran on one A100. The hard part was deciding which experiment to do next, finishing it, and being precise about what the numbers mean. That is a thing nobody can buy.

Scope

Four reproductions, one pipeline, one A100, each checked against the published experimental or theoretical number. The I27 pulling result shows the loading-rate inflation that Bell-Evans theory predicts at the order-of-magnitude level; a quantitative fit needs the replicate runs listed below. That validated pipeline is the foundation the next round of experiments builds on.

What's next

Multi-trajectory Jarzynski / Hummer-Szabo reweighting for the I27 pulling data, n = 5-10 replicates at v = 0.01 nm/ps, for a proper free-energy surface and a clean Bell-Evans fit.
Conformational ensemble sampling with BioEmu on the same 8-protein benchmark, side by side with ESMFold's single structure.
The proper chignolin "fold from extended state" experiment with 5-10 ns at 1000 K denaturation, to catch the actual folding event on tape.
Inverse folding with ProteinMPNN on a few small targets, closing the loop from structure back to sequence.

Resources

Code: github.com/nileshsarkar-ai/ProteinFolding - every script, figure-generation tool, and config used here.
Trajectory data: huggingface.co/datasets/nileshsarkar-ai/protein-folding-experiments - 3.1 GB of SMD trajectories and OpenMM state files, with a dataset card.
Live dashboard: charging-sadly-ether.ngrok-free.dev - FastAPI + py3Dmol + Plotly, three tabs (ESMFold benchmark, SMD, equilibrium MD), interactive 3D viewers, live pLDDT plots, and a paste-a-sequence-fold-it-now panel that runs ESMFold on demand.

Primary sources

ESMFold: Lin et al., "Evolutionary-scale prediction of atomic-level protein structure," Science 379, 1123-1130 (2023).
TM-score: Zhang & Skolnick, "Scoring function for automated assessment of protein structure template quality," Proteins 57, 702-710 (2004).
AlphaFold2: Jumper et al., Nature 596, 583-589 (2021).
AlphaFold3: Abramson et al., Nature 630, 493-500 (2024).
Titin I27 AFM: Carrion-Vazquez, Oberhauser, Fowler et al., "Mechanical and chemical unfolding of a single protein: a comparison," PNAS 96, 3694-3699 (1999).
Bell-Evans / DHS rupture-force kinetics: Dudko, Hummer, Szabo, PRL 96, 108101 (2006).
CLN025 chignolin: Honda et al., JACS 130, 15327-15331 (2008).
Anton folding benchmark: Lindorff-Larsen, Piana, Dror, Shaw, "How fast-folding proteins fold," Science 334, 517-520 (2011).
OpenMM: Eastman et al., PLOS Comput Biol 13, e1005659 (2017).

A ground-up primer: proteins, folding, and AI

The sections above are the project. What follows is the background that makes them readable, written from zero with no biology assumed. It runs in three parts: the biology, the physics of folding, and where AI stands in 2026. If you already know what a protein is and how AlphaFold works, you can skip it.

Part 1. The biology, from zero

What is inside a cell

A living cell is closer to a crowded soup than to the tidy diagram in a textbook. It is roughly 70 percent water, and the rest is molecules colliding tens of millions of times a second. Four families of molecule do nearly all the work. Water is the solvent that every reaction happens in. Sugars and lipids supply fuel and build the membranes that wrap the cell. Nucleic acids, DNA and RNA, carry information. Proteins are the workforce, and almost everything a cell actually does is done by a protein.

Proteins per cell: ~42 million molecules in a yeast cell, and on the order of billions in a larger human cell, drawn in humans from roughly 20,000 protein-coding genes.

What a protein is: a chain of amino acids that folds into one specific 3D shape. The shape is the function.

Where folding happens: ~70% of the cell is water, and every folding event happens in that solvent.

Each protein job needs a particular shape. Enzymes speed up reactions that would otherwise take centuries. Structural proteins are the scaffolding: collagen in skin, keratin in hair, actin in muscle. Transporters carry cargo, the way hemoglobin carries oxygen. Receptors sit in the membrane and read signals from outside, so taste, smell, and sight all begin with one. Antibodies clamp onto invaders, motor proteins walk along filaments, hormones like insulin carry instructions between cells, and regulators switch genes on and off.

DNA, RNA, and the central dogma

A cell knows which proteins to make because the recipes are written in DNA. DNA is a chain of four letters, A, T, G, and C, about three billion of them in a human cell, paired into a double helix and packed into chromosomes. A gene is a stretch of those letters that spells out one protein. Humans have roughly 20,000 genes; most of the remaining DNA does regulatory and structural work.

Information flows in one direction, and it is worth remembering: DNA to RNA to protein. When a cell needs a protein, it copies the relevant gene into a single-stranded working copy called messenger RNA. That step is transcription. The messenger RNA travels to a ribosome, the cell's protein factory, where it is read three letters at a time. Each triplet, called a codon, maps to one of the 20 amino acids. The ribosome adds the matching amino acid to a growing chain, roughly 5 to 20 per second, and the chain begins folding before it has even finished coming out. That step is translation.

DNA    ...ATG GCC AGT TGC GAA GGT TAA...
        transcription
mRNA   ...AUG GCC AGU UGC GAA GGU UAA...
        translation (read in triplets)
amino   Met  Ala  Ser  Cys  Glu  Gly  stop

So a sequence written as MASCEG is shorthand for an order of amino acids the cell read from DNA, copied, and assembled into a chain. The cell does this millions of times a second. Once the chain is out, the part that matters here begins: it folds.
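The transcription and translation steps in the diagram can be reproduced in a few lines. This is a teaching sketch using the standard genetic code, with illustrative function names; it reads one frame from the first base and stops at the first stop codon, ignoring everything else a real cell does.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def transcribe(dna: str) -> str:
    """Coding-strand DNA to mRNA: same letters, with U in place of T."""
    return dna.upper().replace("T", "U")


def translate(dna: str) -> str:
    """Read coding-strand DNA in triplets until the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


gene = "ATGGCCAGTTGCGAAGGTTAA"
print(transcribe(gene))  # AUGGCCAGUUGCGAAGGUUAA
print(translate(gene))   # MASCEG
```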

The 20-letter alphabet

Every protein is built from the same 20 amino acids. They share one backbone, three repeating atoms (nitrogen, carbon, carbon), and differ only in a side chain that gives each its character. Side chains range from a single hydrogen in glycine to a bulky ring in tryptophan. You do not need all 20 by name. You need their chemical flavors, because those flavors decide how a chain folds.

Flavor	Behavior in water	Residues
Hydrophobic	Oily. They bury away from water, and that burial is what drives folding.	L, I, V, F, M, W, A, C
Polar	Comfortable in water. They sit on the surface and form hydrogen bonds.	S, T, N, Q, Y, H
Charged	Surface-facing. They form salt bridges and bind DNA or other proteins.	K, R (positive), D, E (negative)
Special	Proline kinks the chain, glycine is very flexible, and cysteine can lock two points together with a disulfide bond.	P, G, C

One principle covers most of folding. When a chain collapses, hydrophobic residues hide in the core away from water while polar and charged residues face outward. That is hydrophobic collapse, and it explains the bulk of why a protein takes the shape it does. The precise sequence fills in the rest: which helices and sheets form, and how quickly.

Why exactly 20? The number is not magic, but it is not random either. Twenty side chains give enough chemical variety across size, charge, and water-friendliness, while staying small enough that a three-letter genetic code (64 possible codons) can spell them all with room to spare. A few organisms use a 21st or 22nd, but 20 is the standard set across life on Earth.

How a chain becomes a shape

The chain leaves the ribosome floppy and disordered, and within milliseconds for a small protein it settles into its folded shape. Folding happens in layers. First comes local structure: short stretches of 5 to 20 residues snap into one of two patterns held together by backbone hydrogen bonds. The alpha helix is a tight right-handed spiral, about 3.6 residues per turn, and makes up roughly 30 percent of a typical protein. The beta strand is a flat, extended ribbon that is only stable once it pairs with another strand to form a sheet, and accounts for about 20 percent. Loops and turns connect these pieces, and are often where the protein does its work.

While that local structure forms, the whole chain collapses. Hydrophobic side chains cluster into a core, polar ones move to the surface, and the helices and sheets pack against each other into a specific arrangement. That packed 3D form is the tertiary structure, and it is what people usually mean by the structure of a protein. Many proteins then assemble in groups, with several folded chains locking together into a quaternary structure. Hemoglobin, made of four chains, is the classic example.

"The native structure is determined by the amino acid sequence and the environment, not by the path of folding." This is Anfinsen's dogma, from work in the 1960s and early 1970s. Anfinsen unfolded a protein completely with chemicals, removed them, and watched it fold back to the same shape. The sequence alone held the answer. That is why predicting structure from sequence is possible at all, and AlphaFold, sixty years later, is the method that finally did it well.

Why proteins matter

Almost every disease is, at some level, a protein problem, and almost every drug works by binding a protein. Sickle cell anemia comes from a single amino acid change in hemoglobin. Cystic fibrosis most often comes from one deleted residue in a channel protein that then misfolds and never reaches the cell surface. Alzheimer's, Parkinson's, and the prion diseases are all misfolding and aggregation disorders. Most cancers are driven by mutations in proteins that control growth.

Drugs act on this same layer. Roughly 95 percent of approved drugs work by binding a specific protein and switching it on, switching it off, or blocking it. Designing one usually means identifying the protein, learning its 3D structure and especially its binding pocket, and finding a molecule that fits. Before AlphaFold, getting that structure could take years and a great deal of money through X-ray crystallography or cryo-electron microscopy. A good predicted structure now takes minutes, which is part of why the field moved so fast.

Structure is only the snapshot, though, and the snapshot is the easy part. Real proteins do not sit still. They breathe, open and close pockets, switch between states, and change shape when they bind a partner. About 30 percent of human proteins do not even fold into one stable structure. Capturing that motion, not just the still frame, is the open frontier, and it is exactly where these experiments sit.

Part 2. Folding as a physics problem

The energy landscape and the funnel

One question shaped this field for fifty years. Take a chain of 100 residues. Each can sit in roughly three local positions, so the number of possible shapes is about 3¹⁰⁰, or roughly 5 × 10⁴⁷. For scale, that is far larger than the number of seconds since the Big Bang, which is about 4 × 10¹⁷. If a protein had to try shapes one at a time, even a trillion per second, folding would take longer than the age of the universe many times over. Yet real proteins fold in milliseconds. Cyrus Levinthal pointed out this contradiction in 1969, and it is still called Levinthal's paradox.
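The arithmetic behind the paradox is quick to check. The sketch below uses the same round numbers as the paragraph above (three states per residue, a trillion trials per second); the age of the universe is taken as 13.8 billion years, and the function name is illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_S = 13.8e9 * SECONDS_PER_YEAR  # about 4.4e17 s


def levinthal_search_time_s(n_residues, states_per_residue=3, trials_per_s=1e12):
    """Seconds needed to enumerate every conformation one at a time."""
    conformations = states_per_residue ** n_residues
    return conformations / trials_per_s


search_s = levinthal_search_time_s(100)
print(f"conformations: {3 ** 100:.3e}")
print(f"search time:   {search_s:.3e} s")
print(f"in universe ages: {search_s / AGE_OF_UNIVERSE_S:.3e}")
```

Even with these generous assumptions the exhaustive search comes out around a billion billion universe lifetimes, which is the gap the funnel picture has to explain.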

The resolution, worked out by Wolynes, Onuchic, Bryngelson, and Dill in the 1990s, is that folding is not a search, it is a fall. The energy landscape is shaped like a funnel. Picture the wide top as the enormous number of unfolded shapes, all high in energy because hydrophobic residues are exposed to water. The single point at the bottom is the folded state, low in energy with every favorable contact made. From anywhere near the top, almost any step that lowers energy moves the chain downhill toward the bottom. The chain never enumerates its options. The slope carries it.

This funnel is not generic. Random sequences have flat, rugged landscapes and do not fold. Evolution kept the sequences whose landscapes were smooth enough to fold reliably and discarded the rest, which means foldability is itself an evolved property. It is also why AlphaFold works at all: foldable sequences are a small, structured slice of every possible sequence, full of statistical regularities that a large network can learn from the roughly 200 million natural sequences on record.

Molecular dynamics

If folding is physics, you can try to simulate the physics directly. That method is molecular dynamics, the workhorse of computational biophysics, and it is what produced the chignolin trajectory shown above. The idea is simple. Put the protein in a box of water, tens of thousands of atoms in all. Pick a force field that gives the force on every atom from its neighbors. Then step Newton's laws forward in tiny increments, updating every atom's position again and again.
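The "step Newton's laws forward" loop is small enough to write out. The sketch below is a toy velocity Verlet integrator on a single 1D particle, meant only to show the shape of one MD step. It is not how OpenMM is driven in the experiments above, which use a Langevin integrator with a thermostat on tens of thousands of atoms, and the function and parameter names are illustrative.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance one 1D particle with the velocity Verlet scheme.

    force: callable returning the force at position x.
    Returns the final position, final velocity, and the position trace.
    """
    f = force(x)
    trajectory = [x]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # recompute force at new position
        v = v_half + 0.5 * dt * f / mass   # finish the velocity update
        trajectory.append(x)
    return x, v, trajectory


# Toy force field: a harmonic spring with k = 1, mass = 1 (arbitrary units).
x_end, v_end, traj = velocity_verlet(
    x=1.0, v=0.0, force=lambda pos: -pos, mass=1.0, dt=0.01, n_steps=628
)
energy = 0.5 * v_end ** 2 + 0.5 * x_end ** 2
print(x_end, v_end, energy)
```

In a real simulation the single force call is replaced by the full force-field evaluation over every atom, which is where nearly all of the compute goes.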

The catch is the step size. Each step covers about two femtoseconds, two millionths of a billionth of a second, while folding plays out over milliseconds. Reaching one millisecond of simulated time takes hundreds of billions of steps, each recomputing forces across every atom. On a modern A100 GPU a small protein moves forward roughly a microsecond per day. A full millisecond would take about three years on the same card. In 2008 a special-purpose machine called Anton was built only to run molecular dynamics, and it remains the fastest. In 2011 it folded a dozen small proteins from scratch, chignolin among them.

Timestep: 2 femtoseconds per integration step.

A100 throughput: about 1 microsecond per day for a small protein.

A millisecond fold: hundreds of billions of steps, roughly three years on one card.
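These three figures follow from two inputs, the timestep and the daily throughput. A short sketch of the arithmetic, with illustrative function names and the round numbers from the text as defaults:

```python
def steps_needed(sim_time_s: float, timestep_s: float = 2e-15) -> float:
    """Number of integration steps to cover a given simulated time."""
    return sim_time_s / timestep_s


def wall_clock_days(sim_time_s: float, throughput_s_per_day: float = 1e-6) -> float:
    """Days of wall-clock time at a given simulated-seconds-per-day rate."""
    return sim_time_s / throughput_s_per_day


millisecond = 1e-3
print(f"steps for 1 ms: {steps_needed(millisecond):.2e}")
print(f"days for 1 ms:  {wall_clock_days(millisecond):.0f}")
print(f"years for 1 ms: {wall_clock_days(millisecond) / 365.25:.2f}")

# Throughput implied by a 1 microsecond run that took 14 hours.
print(f"microseconds per day at 14 h per microsecond: {24 / 14:.2f}")
```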

For most labs, molecular dynamics is best for local questions: how a binding site flexes, how a loop moves, how a mutation shifts the landscape. A force field is a hand-built formula with terms for bonds, angles, rotations, van der Waals attraction, and electrostatics, plus a separate model for water. These formulas are approximate and carry known biases, since some favor helices and others favor sheets, so picking the right one is part of the craft. A newer wave of machine-learned force fields, trained on quantum chemistry, trades higher cost for better accuracy.

Boltzmann and the ensemble

This is the idea worth slowing down for. A protein at room temperature is never frozen in one shape. It vibrates and occasionally visits other states, and the fraction of time it spends in any state depends on that state's energy through the Boltzmann distribution: the probability of a state falls off exponentially as its energy rises. A state 6 kcal/mol above the ground state is visited about one time in 22,000. So a protein spends nearly all its time at or near the lowest-energy shape and only briefly samples the rest.

P(state)  is proportional to  exp( -E / kT )

   E  = energy of the state
   kT = thermal energy, about 0.6 kcal/mol at body temperature
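The "one time in 22,000" figure is this formula evaluated at 6 kcal/mol with kT = 0.6 kcal/mol. A small sketch, with illustrative function names, that also normalizes a set of state energies into populations:

```python
import math


def boltzmann_ratio(delta_e: float, kt: float = 0.6) -> float:
    """Population of a state relative to one lying delta_e lower.

    delta_e and kt share units (kcal/mol here).
    """
    return math.exp(-delta_e / kt)


def populations(energies, kt: float = 0.6):
    """Normalized Boltzmann populations for a list of state energies."""
    lowest = min(energies)
    weights = [math.exp(-(e - lowest) / kt) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


print(f"1 in {1 / boltzmann_ratio(6.0):.0f}")   # the 6 kcal/mol example
print(populations([0.0, 1.0, 6.0]))             # three-state toy ensemble
```

Because the dependence is exponential, a state only 1 kcal/mol above the ground state still holds a meaningful share of the population, while one at 6 kcal/mol is essentially invisible.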

The precise meaning of the structure of a protein, then, is the whole set of shapes it visits at equilibrium, each weighted by its Boltzmann probability. That set is the conformational ensemble. For a tidy globular protein it clusters tightly around one shape. For a flexible or disordered protein it is broad. The ambition of the newest methods is to predict the entire distribution, not just its single most likely shape. That is the appeal of Boltzmann generators, introduced by Noé and colleagues in 2019: a model that samples directly from the Boltzmann distribution would hand you the populations, the thermodynamics, and the alternative states all at once.

One number makes this concrete. Q is the fraction of native contacts. List the residue pairs that touch in the folded structure, then count what fraction are present in any given snapshot. Q near 0 is unfolded, Q near 0.5 is the transition region, and Q near 1 is fully folded. It is a clean way to track folding over a trajectory, and it is the variable for which the chignolin run above reports a mean of 0.91.

Three problems, three kinds of math

The field has three central problems, and they are not equally hard.

Problem	Input	Output
Forward folding	Sequence	One structure, a single point
Dynamics	Sequence, or sequence plus condition	A distribution of structures
Inverse folding (design)	A desired structure	A sequence, or many, that fold to it

Forward folding outputs a single configuration, which is hard but bounded. Dynamics is harder, because a distribution can have several peaks of different widths that all need to come out weighted correctly. That is the central unsolved problem of 2026, and the current methods, including BioEmu, AlphaFlow, JAMUN, and MDGen, are all approximations. Inverse folding is, surprisingly, the most tractable, because many different sequences fold to the same shape, so the target is wide. ProteinMPNN solved it to high accuracy in 2022, and RFdiffusion added generation of the shape itself in 2023.

Part 3. AI for folding, where things stand in 2026

A short history

1961 to 1972. Anfinsen shows that sequence determines structure (Nobel Prize, 1972).
1969. Levinthal frames the paradox that folding cannot be a blind search.
1970s and 1980s. The first molecular dynamics of a protein (Karplus, Levitt, Warshel; Nobel Prize, 2013), and Dill's hydrophobic-collapse theory.
1990s. The folding-funnel picture resolves Levinthal conceptually.
1994. CASP begins, a blinded benchmark that drives the field for decades.
2000s. Rosetta leads structure prediction, and researchers learn to read co-evolution, where residues that mutate together turn out to sit close in 3D.
2018. AlphaFold 1 wins CASP13 but is still rough.
2020. AlphaFold 2 reaches near-experimental accuracy. For practical purposes the Anfinsen and Levinthal problem is solved (Hassabis and Jumper, Nobel Prize, 2024).
2021 to 2022. Faster alternatives arrive, including RoseTTAFold and ESMFold.
2022 to 2023. Design takes off with ProteinMPNN and RFdiffusion.
2024. AlphaFold 3, Boltz-1, and Chai-1 handle proteins together with DNA, RNA, and small molecules.
2025 to 2026. The focus shifts to dynamics and ensembles (BioEmu, AlphaFlow, DeepJump) and to binding affinity (Boltz-2).

Compressed, the story is sixty years of slow progress, a sudden jump around 2020, and a new frontier around motion opening since.

AlphaFold and what came after

AlphaFold takes a sequence and predicts its 3D structure. It first finds evolutionarily related sequences and builds a multiple sequence alignment, reads which residue pairs vary together (a strong hint that they sit close), refines a joint sequence-and-pair representation through a custom transformer, and finally turns that into atomic coordinates. The accuracy at CASP14 in 2020 was close enough to experiment to count as a step change.

Model	Year	Note
AlphaFold 2	2020	The breakthrough; needs a sequence alignment.
RoseTTAFold	2021	Open-source, from the Baker lab.
ESMFold	2022	Skips the alignment using a protein language model. Faster, slightly less accurate. This is the model used for the structure predictions above.
AlphaFold 3	2024	Adds DNA, RNA, and small molecules; diffusion-based; not fully open.
Boltz-1 and Boltz-2	2024 to 2025	Open AlphaFold 3-class models; Boltz-2 adds binding-affinity prediction.

ESMFold is fast for the same reason it is slightly less accurate. It replaces the slow alignment search with embeddings from a pretrained protein language model, which has already absorbed evolutionary patterns during its own training. That single substitution, a language model in place of the alignment, is the key design choice.

AlphaFold is not magic, though, and it fails in characteristic ways. It forces a structure onto disordered proteins, returns one shape for proteins that really have several, misses changes that only happen on binding, and struggles with membrane proteins and with sequences that have no close relatives. Those failures are where the frontier now sits.

The frontier: ensembles and dynamics

If forward folding is solved well enough for most globular proteins, the next mountain is motion: predicting not one structure but the full Boltzmann ensemble, the alternative states, and the changes a protein goes through as it works. Several methods are pushing on this. BioEmu, from Microsoft in 2025, generates equilibrium ensembles straight from sequence with free-energy errors around 1 kcal/mol. AlphaFlow adds conformational diversity on top of AlphaFold. JAMUN samples small-peptide ensembles quickly. DeepJump learns to advance a trajectory and reaches large speedups, which gets at kinetics rather than just populations. Boltzmann generators aim for the exact distribution through reweighting.
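
To get a feel for what a 1 kcal/mol free-energy error means for an ensemble, the sketch below converts between state populations and free-energy differences using the two-state Boltzmann relation ΔG = -RT ln(p_B / p_A). The function names and the two-state simplification are assumptions for illustration; they are not part of BioEmu or any other method named here.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def free_energy_difference(p_a, p_b, temperature=300.0):
    """Free energy of state B relative to state A, in kcal/mol."""
    return -R_KCAL * temperature * math.log(p_b / p_a)


def population_ratio(delta_g, temperature=300.0):
    """Population ratio p_B / p_A implied by a free-energy difference (kcal/mol)."""
    return math.exp(-delta_g / (R_KCAL * temperature))


# Example: treat the chignolin run above as two-state (an assumption):
# 89.5% native, everything else lumped into one "non-native" state, at 340 K.
dg_unfold = free_energy_difference(0.895, 0.105, temperature=340.0)

# Factor by which a population ratio changes if the free energy
# is off by 1 kcal/mol at 300 K.
error_factor = population_ratio(-1.0, temperature=300.0)

print(f"dG(native -> non-native) = {dg_unfold:.2f} kcal/mol")
print(f"a 1 kcal/mol error changes a population ratio by a factor of {error_factor:.1f}")
```

At 300 K, RT is about 0.6 kcal/mol, so a 1 kcal/mol error moves a population ratio by roughly a factor of five. That is small on the scale of folding free energies, but large on the scale of the population weights an ensemble method is supposed to get right.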

The clearest near-term trend is hybrid: use a neural model to propose good starting states, then let molecular dynamics supply the physics. A January 2026 result showed that seeding molecular dynamics with BioEmu samples captures transitions that plain simulation misses. None of these methods is finished. None yet gives trustworthy population weights, handles external forces cleanly, scales gracefully to membrane proteins, or predicts transition rates at scale. The area is wide open.

Design, and the datasets behind all of it

Design is the flip side of prediction. Instead of asking what a sequence folds into, you ask for a sequence that folds into a chosen shape and does a chosen job. Rosetta-era design was slow and rarely worked. ProteinMPNN in 2022 lifted success rates sharply, RFdiffusion in 2023 generated new backbones from scratch, and later work extended this to all-atom design, antibodies, and proteins that switch between two states on demand. De novo binders to viral and cancer targets, designed enzymes, and small molecular machines have all been built and tested in the lab. A cluster of well-funded companies now sits on top of these tools, which is where most of the commercial pull in the field is.

All of this rests on a few datasets, and it helps to know what each one is and what it cannot tell you.

Dataset	What it is	Used for
PDB	Every experimentally determined structure since 1971, about 230,000.	Training and ground truth for prediction.
UniRef / UniProt	Every known sequence, about 254 million.	Training language models and building alignments.
AlphaFold DB	Predicted structures for about 200 million proteins.	Background structures for almost any protein.
ATLAS and mdCATH	Molecular dynamics trajectories for thousands of proteins.	Training and testing dynamics models.
PISCES	A subset of the PDB culled so that no two chains exceed a chosen sequence identity.	Held-out benchmarks, like a calibration set.
CASP	A blinded benchmark; structures are revealed only after predictions.	Community scoring; where AlphaFold 2 won.

One limitation is worth keeping in mind. Most of these models learn from PDB structures, which are snapshots from crystals or cryo-electron microscopy, not proteins moving in a cell. A crystallized protein is pressed against its neighbors and often caught in a single state. For tidy globular proteins that hardly matters, but for flexible, multi-state, and membrane proteins it introduces a real bias. That is part of why the dynamics models train on simulation trajectories rather than crystal structures alone.

Nilesh Sarkar / Research