Nodes
Data Handling
Data handling nodes serve as file inputs to the graph. They load data from a file into the pipeline, given the name of the file. Each data handling node represents a file type known to and used by xyna.bio.
Supported file formats
Fasta
Several kinds of -omics data are commonly stored under the same file ending, .fasta, even though the data types themselves are not compatible with each other. This can lead to mistakes such as chaining together incompatible tools. For this reason, xyna.bio automatically recognizes the contents of fasta files and infers the reference type for further processing. The reference type can also be assigned manually.
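Content-based recognition of this kind can be pictured with a simple heuristic. The following Python sketch is not xyna.bio's implementation, which is not described here. The function name infer_fasta_type, the nucleotide alphabet, and the 0.9 threshold are all assumptions chosen for illustration.

```python
# Hypothetical heuristic; xyna.bio's actual detection logic may differ.
NUCLEOTIDE_CODES = set("ACGTUN")


def infer_fasta_type(text: str, threshold: float = 0.9) -> str:
    """Guess whether FASTA text holds gene (nucleotide) or protein sequences."""
    residues = [
        ch
        for line in text.splitlines()
        if line and not line.startswith(">")  # skip header lines
        for ch in line.strip().upper()
        if ch.isalpha()  # ignore gaps ("-"), stop symbols, whitespace
    ]
    if not residues:
        raise ValueError("no sequence data found")
    # Share of letters that are plausible nucleotide codes.
    nucleotide_share = sum(ch in NUCLEOTIDE_CODES for ch in residues) / len(residues)
    return "gene" if nucleotide_share >= threshold else "protein"
```

A heuristic like this can misclassify short or unusual sequences, for example a peptide made up mostly of A, C, G and T, which is one reason a manual override of the reference type is useful.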
Data handling for fasta files is split between gene fasta files and protein fasta files (both with the ending .fasta).
Additionally, there are aligned fasta files and multi fasta files.
Aligned fasta files contain gene or protein sequences that are the output of a multiple sequence alignment. These alignments in fasta format are allowed to contain “-” as a special character indicating a gap in the alignment.
Multi fasta files contain more than one sequence per file and can be manipulated using merge nodes.
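To make the distinction between single, multi and aligned fasta files concrete, here is a minimal Python sketch. The helper names read_fasta, is_multi_fasta and looks_aligned are hypothetical and not part of xyna.bio. The alignment check is only a heuristic: it assumes that aligned sequences all have the same length and that at least one of them contains a gap.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def is_multi_fasta(records):
    """A multi fasta file holds more than one sequence."""
    return len(records) > 1


def looks_aligned(records):
    """Heuristic: several sequences of equal length, at least one with a gap."""
    lengths = {len(seq) for _, seq in records}
    has_gap = any("-" in seq for _, seq in records)
    return len(records) > 1 and len(lengths) == 1 and has_gap
```

An alignment without any gaps would not be recognized by looks_aligned, so a real check would likely rely on how the file was produced rather than on its contents alone.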
Additional file formats
GenBank
The GenBank file format (.gb) allows for the storage of gene sequences along with additional information like region annotations, sample information and references to publications.
Loads a GenBank file containing a gene sequence along with annotations.
Input: filename
Output: GenBank file
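As an illustration of what a GenBank file carries, the Python sketch below pulls the locus name, the feature keys with their locations, and the sequence out of GenBank flat-file text. It is a deliberately small parser written for this example, not the loader used by xyna.bio; the function name parse_genbank is hypothetical, and wrapped feature locations and qualifiers are ignored.

```python
def parse_genbank(text):
    """Extract locus name, feature (key, location) pairs and sequence."""
    locus, features, sequence_parts = None, [], []
    section = None
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if line.startswith("LOCUS"):
            fields = line.split()
            locus = fields[1] if len(fields) > 1 else None
        elif line.startswith("FEATURES"):
            section = "features"
        elif line.startswith("ORIGIN"):
            section = "origin"
        elif line[:1].strip():
            # Any other keyword in the first column ends the current section.
            section = None
        elif section == "features":
            # Feature keys start in column 6; qualifier lines are indented further.
            if line[5:6].strip():
                fields = line.split()
                if len(fields) >= 2:
                    features.append((fields[0], fields[1]))
        elif section == "origin":
            # Sequence lines mix position numbers, spaces and bases.
            sequence_parts.append("".join(ch for ch in line if ch.isalpha()))
    return {
        "locus": locus,
        "features": features,
        "sequence": "".join(sequence_parts).upper(),
    }
```

For production use, an established parser is a safer choice than a hand-written one, since real GenBank files contain many more fields than this sketch handles.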
PDB
The PDB file format (.pdb) contains three-dimensional structural data in the form of atomic coordinates. xyna.bio expects a PDB file to contain a protein structure.
Loads a PDB file containing a protein structure.
Input: filename
Output: PDB file
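The atomic coordinates in a PDB file are stored in fixed-width ATOM and HETATM records. The Python sketch below reads them by column position, following the standard PDB column layout. The Atom type and the function name parse_pdb_atoms are hypothetical and only serve this example; they are not part of xyna.bio.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    name: str
    residue: str
    chain: str
    residue_number: int
    x: float
    y: float
    z: float


def parse_pdb_atoms(text):
    """Collect atoms from the ATOM/HETATM records of PDB-formatted text."""
    atoms = []
    for line in text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append(
                Atom(
                    name=line[12:16].strip(),        # columns 13-16
                    residue=line[17:20].strip(),     # columns 18-20
                    chain=line[21],                  # column 22
                    residue_number=int(line[22:26]), # columns 23-26
                    x=float(line[30:38]),            # columns 31-38
                    y=float(line[38:46]),            # columns 39-46
                    z=float(line[46:54]),            # columns 47-54
                )
            )
    return atoms
```

Splitting on whitespace instead of slicing by column is tempting but unreliable, because neighbouring fields in a PDB record can touch without a separating space.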
SDF
The SDF file format (.sdf) also contains three-dimensional structural data, but instead of protein structures it is more commonly used to store coordinates of smaller molecules like drugs or metabolites.
SDF and PDB files can be combined into one PDB file containing the information of both.
Loads an SDF file containing a molecule structure.
Input: filename
Output: SDF file
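For comparison with PDB, the Python sketch below reads element symbols and coordinates from SDF text. It assumes V2000 molfile blocks terminated by a "$$$$" line, which is the common layout, and rejects anything else. The function names parse_sdf and parse_molblock are hypothetical and not part of xyna.bio.

```python
def parse_molblock(lines):
    """Parse one V2000 molfile block into a name and a list of atoms."""
    counts = lines[3]  # fourth line: atom and bond counts plus version tag
    if "V2000" not in counts:
        raise ValueError("only V2000 molfile blocks are handled in this sketch")
    n_atoms = int(counts[0:3])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        # Coordinates are three 10-character fields, then the element symbol.
        atoms.append(
            (
                line[31:34].strip(),
                float(line[0:10]),
                float(line[10:20]),
                float(line[20:30]),
            )
        )
    return {"name": lines[0].strip(), "atoms": atoms}


def parse_sdf(text):
    """Split SDF text into records at '$$$$' lines and parse each one."""
    molecules, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if len(current) >= 4:
                molecules.append(parse_molblock(current))
            current = []
        else:
            # Keep blank lines: the three header lines may legitimately be empty.
            current.append(line)
    return molecules
```

The resulting (symbol, x, y, z) tuples carry the same kind of information as the coordinates of a PDB record, which is what makes combining both formats into a single PDB file possible in principle.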