NetModel parameters
This model consists of a family of individual nets, each identified by a specific parameter combination. Inspect the available parameters:
Pick a non-default net by specifying the parameters:
Feature extraction
Define two sets of amino acid sequences drawn from different classes of proteins, namely enzymes and structural proteins:
Define a feature extractor that uses the ESMFold embeddings:
Visualize the features of the protein sequences:
Advanced usage
When a protein consists of multiple chains (a multimer), a glycine linker (commonly 25 residues long) is automatically inserted between the chains to create a single continuous sequence suitable for structure prediction. Glycine is typically chosen because it is small and has minimal structural impact. To demonstrate this, we’ll use the antibody-antigen complex 3HFM, which consists of three chains:
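The linker insertion described above can be sketched in plain Python, independently of the NetModel implementation. The helper name `join_chains`, the toy chains and the mask convention (1 for real residues, 0 for linker residues) are assumptions made for illustration only:

```python
GLYCINE_LINKER = "G" * 25  # 25-residue poly-glycine linker, as described in the text


def join_chains(chains, linker=GLYCINE_LINKER):
    """Join chain sequences into one sequence and mark the linker positions.

    Returns the joined sequence and a mask with 1 for real residues and
    0 for linker residues (the mask convention is an assumption).
    """
    sequence = linker.join(chains)
    mask = []
    for i, chain in enumerate(chains):
        if i > 0:
            mask.extend([0] * len(linker))  # linker inserted before every chain but the first
        mask.extend([1] * len(chain))
    return sequence, mask


# Toy chains for illustration; these are not the real 3HFM sequences
sequence, linker_mask = join_chains(["MKT", "AVL", "QE"])
```

The mask makes it possible to discard the linker residues later, for example when reporting confidence or drawing the structure.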
Get the predicted structure and confidence score:
Visualize the structure:
Network result
The following diagram shows the modular flow of the ESMFold inference process, which takes a protein sequence and outputs its 3D structure:
The process starts with a raw amino acid sequence with two chains:
The encodeSequence function converts the input protein sequence into numerical lists representing amino acid types, residue indices, a linker mask and chain identifiers for each residue:
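As a rough illustration of what such an encoding step produces, here is a plain-Python sketch. It is not the page's encodeSequence function: the name `encode_sequence`, the `:` chain separator, the amino acid ordering, the residue index offset between chains and the choice to assign linker residues to the preceding chain are all assumptions.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # assumed ordering, for illustration only
AA_TO_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(multichain, linker_length=25, index_offset=512):
    """Encode a ':'-separated multichain sequence into four parallel lists."""
    aatype, residue_index, linker_mask, chain_index = [], [], [], []
    position = 0
    for chain_id, chain in enumerate(multichain.split(":")):
        if chain_id > 0:
            # Insert the glycine linker; it is assigned to the preceding chain here
            for _ in range(linker_length):
                aatype.append(AA_TO_INDEX["G"])
                residue_index.append(position)
                linker_mask.append(0)
                chain_index.append(chain_id - 1)
                position += 1
            # Jump in the residue index so chains look far apart in sequence
            position += index_offset
        for aa in chain:
            aatype.append(AA_TO_INDEX[aa])
            residue_index.append(position)
            linker_mask.append(1)
            chain_index.append(chain_id)
            position += 1
    return {
        "aatype": aatype,
        "residue_index": residue_index,
        "linker_mask": linker_mask,
        "chain_index": chain_index,
    }


# Small values keep the example readable
encoded = encode_sequence("MK:AV", linker_length=2, index_offset=10)
```

All four lists have one entry per residue of the joined sequence, linker included, so they can be stacked into arrays of the same length.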
The pretrained ESM-2 language model extracts contextual embeddings from the input sequence:
The result includes the initial single (s_s_0) and pair (s_z_0) representations. The single representation encodes features for each residue, while the pair representation encodes features for every pair of residues, so that folding can draw on both per-residue and pairwise information:
The Folding Trunk is a deep neural network composed of 48 Folding Blocks followed by an additional eight-block Structure Module; together, they refine the internal representations through multiple recycling steps, improving the predicted structure progressively. Initialize input metadata (mask and residue indices) and set up tensors for recycling intermediate representations:
The recycling loop runs multiple times, each time passing the current representations and recycled values into the Folding Trunk model, which updates the representations and predicts the structure. Since the Folding Trunk model already includes the Structure Module internally, it returns both the refined internal features and the predicted atomic structure in a single call:
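The control flow of such a recycling loop can be sketched in plain Python. The `toy_trunk` function is a hypothetical stand-in that only blends numbers; it is not the Folding Trunk, and the number of recycles is an illustrative value:

```python
def toy_trunk(single, recycled_single):
    """Stand-in for the trunk: blend the input features with the recycled ones
    and return both the refined features and a placeholder 'structure'."""
    refined = [0.5 * s + 0.5 * r for s, r in zip(single, recycled_single)]
    structure = list(refined)  # placeholder for predicted coordinates
    return refined, structure


def run_recycling(single, trunk, num_recycles=4):
    """Call the trunk repeatedly, feeding each output back in as recycled input."""
    recycled = [0.0] * len(single)  # recycled features start at zero
    structure = None
    for _ in range(num_recycles):
        recycled, structure = trunk(single, recycled)
    return recycled, structure


refined, structure = run_recycling([1.0, 2.0], toy_trunk)
```

Each pass receives the same input features plus the previous pass's output, so the prediction is refined step by step rather than computed once.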
Visualize the initial single and pair representations:
The post-processing step predicts the final atomic coordinates, frames, angles and side chains. It also outputs the per-residue confidence (pLDDT) and prepares the data for visualization as a PDB string:
The final output includes atomic positions and confidence scores, used to render the 3D protein structure. Convert the output to PDB format:
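To make the PDB format concrete, the following plain-Python sketch writes a single alpha-carbon ATOM record in the fixed-width column layout of the PDB format. The helper name `format_ca_atom` and the sample values are hypothetical; storing pLDDT in the B-factor column follows a common convention for predicted structures and is assumed here rather than taken from this page.

```python
def format_ca_atom(serial, resname, chain, resseq, x, y, z, plddt):
    """Format one alpha-carbon ATOM record using fixed-width PDB columns."""
    atom_name = " CA "     # columns 13-16
    alt_loc = " "          # column 17
    insertion_code = " "   # column 27
    occupancy = 1.0        # columns 55-60
    element = "C"          # columns 77-78, right-justified
    return (
        f"ATOM  {serial:5d} {atom_name}{alt_loc}{resname:>3s} {chain}"
        f"{resseq:4d}{insertion_code}   "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{plddt:6.2f}"
        f"          {element:>2s}"
    )


# Sample values for illustration only
line = format_ca_atom(1, "GLY", "A", 1, 1.0, 2.0, 3.0, 87.5)
```

A full PDB string is a sequence of such records, one per atom, usually followed by TER and END lines.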
Get the confidence score:
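One way to summarize per-residue confidence as a single number is to average pLDDT over the real residues only. The sketch below is plain Python with made-up values; the helper name `mean_plddt`, the 0 to 100 scale and the exclusion of linker residues are assumptions, not a description of how the model computes its score.

```python
def mean_plddt(plddt, linker_mask):
    """Average per-residue pLDDT over real residues (mask == 1) only."""
    kept = [p for p, m in zip(plddt, linker_mask) if m == 1]
    if not kept:
        raise ValueError("no real residues to average over")
    return sum(kept) / len(kept)


# Made-up confidence values; the third residue is a linker residue
score = mean_plddt([90.0, 80.0, 20.0, 70.0], [1, 1, 0, 1])
```

Linker residues are artificial and typically predicted with low confidence, so leaving them in the average would understate the confidence of the real chains.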
Plot the result: