Structure

Add SDF to PDB

Adds a small molecule to a protein structure. The docked pose of the molecule in the protein must be calculated beforehand, e.g., with the Diff Dock node. The binding affinity of the combined structure can then be calculated with the Protein Ligand Binding Affinity node (see Chemistry).

Input:

  • Molecule: The molecule to add to the protein structure, as a loaded SDF file.
  • Structure: The protein structure, as a loaded PDB structure.

Output: Combined structure.
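
To make the combination step concrete, here is a minimal sketch of what merging a docked ligand into a protein file can look like. It is not the node's actual implementation: the function names `sdf_atoms` and `add_ligand`, the residue name `LIG`, the chain ID `Z`, and the example data are all assumptions made for illustration. The sketch reads only the first V2000 mol block, appends its atoms as HETATM records, and ignores bonds (no CONECT records are written).

```python
def sdf_atoms(sdf_text):
    """Yield (element, x, y, z) for each atom in the first V2000 mol block."""
    lines = sdf_text.splitlines()
    n_atoms = int(lines[3][0:3])  # counts line: atom count in columns 1-3
    for line in lines[4:4 + n_atoms]:
        # coordinates are three fixed-width fields of 10 characters each
        x, y, z = (float(line[i:i + 10]) for i in (0, 10, 20))
        yield line[31:34].strip(), x, y, z


def add_ligand(pdb_text, sdf_text, res_name="LIG", chain="Z", res_seq=1):
    """Return PDB text with the ligand appended as HETATM records.

    res_name, chain and res_seq are hypothetical defaults for this sketch.
    """
    # keep everything except blank lines and the final END record
    body = [l for l in pdb_text.splitlines() if l.strip() and l.strip() != "END"]
    # continue atom numbering after the last protein atom
    serial = max(
        (int(l[6:11]) for l in body if l.startswith(("ATOM", "HETATM"))),
        default=0,
    )
    for i, (elem, x, y, z) in enumerate(sdf_atoms(sdf_text), start=1):
        serial += 1
        name = f"{elem}{i}"  # simplified atom naming, e.g. C1, O2
        body.append(
            f"HETATM{serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {elem:>2}"
        )
    body.append("END")
    return "\n".join(body) + "\n"


# Made-up example data: one protein atom and a two-atom ligand.
EXAMPLE_PDB = "\n".join([
    "ATOM      1  CA  ALA A   1      11.104   6.134  -6.504  1.00  0.00           C",
    "END",
])
EXAMPLE_SDF = "\n".join([
    "ligand",
    "  hypothetical",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    1.0000    2.0000    3.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.5000    2.0000    3.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "M  END",
    "$$$$",
])

combined = add_ligand(EXAMPLE_PDB, EXAMPLE_SDF)
print(combined)
```

The ligand coordinates are copied unchanged, which is why the docking has to be done first: the SDF file must already be in the protein's coordinate frame.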

Alpha Fold

Computes a protein tertiary structure using AlphaFold2.

Input: FASTA file containing the protein sequence whose structure is to be predicted.

Input Parameters:

  • Max template date: Cutoff date for template structures. Only structures from the Alpha Fold databases released up to this date are included in the structure prediction. Can be useful for repeating older analyses.
  • Models to relax: Determines whether the node applies a molecular dynamics relaxation step to only the best predicted structure (structure 0), to all predicted structures, or to none. Relaxation can refine the Alpha Fold prediction by optimizing bond lengths and other parameters, potentially yielding a more accurate and stable structure.

Output: Predictions of all five Alpha Fold models as PDB files, ranked from best to worst.

Citations:

Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., ... & Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589. DOI: 10.1038/s41586-021-03819-2.

Alpha Fold Multimer

Predicts protein complex structures from multiple FASTA sequences. Each sequence in the supplied FASTA file is treated as one chain of a multichain protein complex.

Input: Multi-FASTA file containing the sequences of the complex, one record per chain.
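
The one-record-per-chain convention can be checked before submitting a file. The following is a minimal sketch using only the Python standard library; the function name `read_fasta` and the example sequences are hypothetical and not part of the node.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) records."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # a new header closes the previous record
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # sequence lines may be wrapped; join them per record
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up two-chain example input.
example = """>chain_A hypothetical
MKTAYIAKQR
QISFVKSHFS
>chain_B hypothetical
GSHMLEDPVD
"""

chains = read_fasta(example)
for header, sequence in chains:
    print(f"{header}: {len(sequence)} residues")
```

Each record returned by this sketch corresponds to one chain of the predicted complex, so the length of `chains` is the number of chains the node would see.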

Input Parameters:

  • Number of models to relax: Which predicted models to relax. Options are all, none, and best.
  • max_template_date: Latest release date of the template files to use. Useful for replicating older Alpha Fold analyses on less recent databases. To predict against the up-to-date database, enter the current date in the format yyyy-mm-dd.
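
As a small illustration of the yyyy-mm-dd format, the sketch below produces and checks such a date string with the Python standard library. The helper names `format_cutoff` and `is_valid_cutoff` are hypothetical and only serve this example.

```python
from datetime import date, datetime


def format_cutoff(d=None):
    """Format a date as yyyy-mm-dd; defaults to today's date."""
    return (d or date.today()).isoformat()


def is_valid_cutoff(text):
    """Return True only for a real calendar date written as yyyy-mm-dd."""
    try:
        parsed = datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        return False
    # strptime tolerates missing zero padding, so compare the round trip
    return parsed.isoformat() == text


print(format_cutoff(date(2021, 7, 15)))  # 2021-07-15
print(format_cutoff())                   # today's date, for an up-to-date prediction
```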

Output:

  • Prediction1 to Prediction5: Separate structure outputs. Alpha Fold comprises five separate models, each of which gives its own prediction. The outputs are ordered by model confidence.

Diff Dock

Performs molecular docking using the Diff Dock-L diffusion-based machine learning model. Docks small-molecule ligands given in an SDF file into a given protein structure. The resulting docked ligand can be combined with the protein using the Add SDF to PDB node.

Input:

  • Receptor: The protein structure to dock the ligand to, as a loaded PDB structure.
  • Ligand: The small molecule to dock, as a loaded SDF structure (ChemicalStructure).

Output:

  • Docked Ligand: Ligand in the predicted docked pose (ChemicalStructure, SDF file).
  • Confidence Score: Confidence score of the prediction as a float.

Citations:

Corso, G., Deng, A., Fry, B., Polizzi, N., Barzilay, R., & Jaakkola, T. (2024). Deep Confident Steps to New Pockets: Strategies for Docking Generalization. arXiv preprint arXiv:2402.18396. DOI: 10.48550/arXiv.2402.18396.

Omega Fold

Computes a protein tertiary structure de novo (without requiring templates or a multiple sequence alignment) using Omega Fold. This is much faster than computing the structure with Alpha Fold, at the cost of slightly lower accuracy.

Input: Protein sequence to predict the structure for.

Input Parameters: Subbatch size, which can be lowered to use less VRAM. The subbatch size determines how much of the structure is computed in one computational batch. For long protein sequences, large batches can cause the structure computation to fail. Set it to -1 to use the number of residues in the sequence and compute everything in one batch.
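
The effect of the subbatch size can be pictured as splitting the residues into consecutive chunks. The sketch below only illustrates that idea and the meaning of -1; it does not show how Omega Fold applies sub-batching internally, and the function name `subbatch_ranges` is hypothetical.

```python
def subbatch_ranges(n_residues, subbatch_size=-1):
    """Return (start, end) index ranges covering n_residues in chunks.

    A subbatch_size of -1 means one batch spanning the whole sequence.
    """
    if subbatch_size == -1:
        subbatch_size = n_residues
    if subbatch_size < 1:
        raise ValueError("subbatch_size must be -1 or a positive integer")
    return [
        (start, min(start + subbatch_size, n_residues))
        for start in range(0, n_residues, subbatch_size)
    ]


print(subbatch_ranges(10, 4))  # [(0, 4), (4, 8), (8, 10)]
print(subbatch_ranges(10))     # [(0, 10)]
```

Smaller chunks mean more batches with less data held in memory at once, which is the trade-off the parameter exposes.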

Output: Predicted protein structure.

Citations:

Wu, R., Ding, F., Wang, R., Shen, R., Zhang, X., Luo, S., ... & Peng, J. (2022). High-resolution de novo structure prediction from primary sequence. bioRxiv, 2022-07. DOI: 10.1101/2022.07.21.500999.

Smiles To Structure

Converts a SMILES (Simplified Molecular Input Line Entry System) string, given as text, into an SDF format structure. The SDF file format stores three-dimensional structural data of molecules such as drugs or metabolites.

Input Parameters:

  • SMILES: SMILES string to convert.
  • Optimize: Determines whether xyna.bio optimizes the three-dimensional geometry of the molecule using molecular mechanics. This can be useful to obtain a more accurate three-dimensional structure of the molecule.

Output: Generated chemical structure as SDF file.