Biology

GC Content

Calculates the fraction of guanine (G) and cytosine (C) bases among all bases (adenine, guanine, cytosine, and thymine, or uracil in RNA) in a given nucleic acid sequence.

Knowing the GC content of a sequence is useful when planning sequencing experiments, PCR reactions, and other experiments that are sensitive to melting temperature, because a higher proportion of G and C bases raises the melting temperature of double-stranded DNA.

Input: Loaded nucleic acid sequence.

Output: GC content as float.
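
The calculation can be sketched as follows. This is a minimal illustration, assuming the sequence is available as a plain string and that the result is a fraction between 0 and 1; the source does not say whether the tool reports a fraction or a percentage. The function name gc_content is hypothetical and is not the tool's own API.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C characters in a nucleic acid sequence.

    Assumption: every character counts toward the length, so ambiguous
    symbols such as N lower the reported fraction.
    """
    seq = sequence.upper()
    if not seq:
        # An empty sequence has no defined GC content; return 0.0 by convention.
        return 0.0
    gc_count = seq.count("G") + seq.count("C")
    return gc_count / len(seq)


if __name__ == "__main__":
    print(gc_content("ATGCGC"))
```

Multiply the result by 100 if a percentage is preferred.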

Molecular Weight of DNA

Calculates the molecular weight of a DNA sequence in daltons (Da).

The weight of a DNA sequence depends on whether the DNA is assumed to be single-stranded or double-stranded, whether it is circular or linear, and which distribution of isotopes is assumed for its atoms.

Input: Loaded protein or nucleic acid sequence.

Input Parameters: Characteristics of the sequence can be ticked if they apply (Double Stranded, Circular, Monoisotopic). A monoisotopic DNA sequence is assumed to contain only the most abundant naturally occurring stable isotope of each element.

Output: Molecular weight as float.
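
The effect of the three options can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: each strand is assumed to carry a 5' monophosphate, the masses are those of the free deoxynucleoside monophosphates rounded to four decimals, and one water is subtracted for every phosphodiester bond. The function names and parameter names are hypothetical.

```python
# Masses in Da of the free deoxynucleoside monophosphates (dAMP, dCMP, dGMP, dTMP).
AVERAGE_MASS = {"A": 331.2218, "C": 307.1971, "G": 347.2212, "T": 322.2085}
MONOISOTOPIC_MASS = {"A": 331.0682, "C": 307.0569, "G": 347.0631, "T": 322.0566}
WATER_AVERAGE = 18.0153
WATER_MONOISOTOPIC = 18.0106
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def strand_weight(seq: str, circular: bool, monoisotopic: bool) -> float:
    """Weight of one strand: nucleotide masses minus one water per bond."""
    masses = MONOISOTOPIC_MASS if monoisotopic else AVERAGE_MASS
    water = WATER_MONOISOTOPIC if monoisotopic else WATER_AVERAGE
    # A linear strand of n nucleotides has n - 1 bonds; closing the circle adds one.
    bonds = len(seq) if circular else len(seq) - 1
    return sum(masses[base] for base in seq) - bonds * water


def dna_molecular_weight(sequence: str, double_stranded: bool = False,
                         circular: bool = False,
                         monoisotopic: bool = False) -> float:
    """Approximate molecular weight of a DNA sequence in Da."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    weight = strand_weight(seq, circular, monoisotopic)
    if double_stranded:
        # Strand order does not affect mass, so the complement is sufficient.
        weight += strand_weight(seq.translate(COMPLEMENT), circular, monoisotopic)
    return weight


if __name__ == "__main__":
    print(dna_molecular_weight("ATGC"))
    print(dna_molecular_weight("ATGC", double_stranded=True, circular=True))
```

Characters other than A, C, G, and T raise a KeyError in this sketch; a real implementation would need a policy for ambiguous bases.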

Protein Stability

Computes the protein instability index, which is based on the observed frequency of dipeptides in stable and unstable proteins. The larger the value, the more unstable the protein.

The index is a heuristic for the stability of a protein sequence, derived from the observed differences in dipeptide frequency between stable and unstable proteins. Values above 40 indicate that the protein is likely to have a short half-life.

Input: Loaded protein sequence.

Output: Protein instability index as float.
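
The index described by Guruprasad et al. (1990) sums a weight for every pair of adjacent residues and scales the sum by 10 divided by the sequence length. The sketch below shows only that arithmetic. It assumes the caller supplies the published dipeptide weight table as a dictionary keyed by two-letter strings; the table itself is not reproduced here, and the function names and the dictionary layout are hypothetical.

```python
from typing import Mapping


def instability_index(sequence: str,
                      dipeptide_weights: Mapping[str, float]) -> float:
    """Return (10 / L) * sum of the weights of all adjacent residue pairs.

    dipeptide_weights is assumed to map every dipeptide, e.g. "AK",
    to its instability weight; a missing dipeptide raises KeyError.
    """
    seq = sequence.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    total = sum(dipeptide_weights[seq[i:i + 2]] for i in range(len(seq) - 1))
    return 10.0 / len(seq) * total


def has_short_half_life(index: float, threshold: float = 40.0) -> bool:
    """Apply the cutoff from the text: values above 40 suggest a short half-life."""
    return index > threshold
```

Keeping the weight table as an argument makes the arithmetic easy to test independently of the published values.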

Citations:

Guruprasad, K., Reddy, B. B., & Pandit, M. W. (1990). Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Engineering, Design and Selection, 4(2), 155-161. DOI: https://doi.org/10.1093/protein/4.2.155