Format of the protocol file
The protocol file is an important part of every HyLiTE analysis. It allows the user to define:
the different organisms
their relationship to each other (parent, child)
any biological replicates
the files containing reads or alignments for every organism
For specific examples, please refer to the HyLiTE manual.
General format
1. Columns
The columns of the protocol file are separated by tabs. They are, from left to right:
organism name
organism ploidy (an integer value)
sample name
data type: RNAseq or gDNA
path to the .fastq file containing the reads
Note
The protocol file contains no header line.
The file names must include their complete paths.
The data type is relevant for the computation of expression data.
With the -S option, alignment files (.sam) can be provided instead of read files.
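The column layout above can be sketched as a small parser. This is only an illustration, not part of HyLiTE: the names `ProtocolRow` and `parse_protocol`, the example organisms, and the file paths are all hypothetical, and skipping blank lines is an assumption of the sketch.

```python
from typing import List, NamedTuple


class ProtocolRow(NamedTuple):
    """One line of a protocol file (field names are this sketch's own)."""
    organism: str
    ploidy: int
    sample: str
    data_type: str  # "RNAseq" or "gDNA"
    path: str       # complete path to the read file


def parse_protocol(text: str) -> List[ProtocolRow]:
    """Parse header-less, tab-separated protocol text into rows."""
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption of this sketch: ignore blank lines
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(f"line {lineno}: expected 5 columns, got {len(fields)}")
        organism, ploidy, sample, data_type, path = fields
        if data_type not in ("RNAseq", "gDNA"):
            raise ValueError(f"line {lineno}: unknown data type {data_type!r}")
        # int() raises ValueError if the ploidy is not an integer
        rows.append(ProtocolRow(organism, int(ploidy), sample, data_type, path))
    return rows


# Hypothetical two-line protocol: no header, five tab-separated columns.
example = (
    "child\t4\tsample1\tRNAseq\t/data/child_sample1.fastq\n"
    "parentA\t2\tsample1\tgDNA\t/data/parentA_sample1.fastq\n"
)
rows = parse_protocol(example)
print(rows[0].organism, rows[0].ploidy)
```

Checking the column count and the ploidy type before starting an analysis catches the most common formatting slips, such as spaces used in place of tabs.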
2. Rows
Each line represents a single file, even in the case of paired-end reads or biological replicates.
Note
The child must be the first organism listed.
The lines must be grouped by organism, then by sample.
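The grouping rule can be checked mechanically before running an analysis. The following is a minimal sketch, assuming the organism and sample names have already been read from the first and third columns; the function name `grouping_errors` is hypothetical. It does not check that the child comes first, since that depends on knowing which organism is the child.

```python
from typing import Iterable, List, Tuple


def grouping_errors(keys: Iterable[Tuple[str, str]]) -> List[str]:
    """Report lines that break the 'grouped by organism, then by sample' rule.

    `keys` yields one (organism, sample) pair per protocol line, in file order.
    """
    errors = []
    seen_organisms = set()
    seen_samples = set()
    prev_org = None
    prev_key = None
    for lineno, (org, sample) in enumerate(keys, start=1):
        key = (org, sample)
        if org != prev_org:
            # A new organism block starts; it must not have appeared before.
            if org in seen_organisms:
                errors.append(f"line {lineno}: organism {org!r} is not contiguous")
            seen_organisms.add(org)
        if key != prev_key:
            # A new sample block starts; it must not have appeared before.
            if key in seen_samples:
                errors.append(f"line {lineno}: sample {sample!r} of {org!r} is not contiguous")
            seen_samples.add(key)
        prev_org, prev_key = org, key
    return errors


# Hypothetical orderings: the first is well grouped, the second is not.
good = [("child", "s1"), ("child", "s1"), ("child", "s2"), ("parentA", "s1")]
bad = [("child", "s1"), ("parentA", "s1"), ("child", "s2")]
print(grouping_errors(good))
print(grouping_errors(bad))
```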
Using paired-end reads
Paired-end reads should be stored in two files. In the protocol file, these are listed on two consecutive lines that share the same organism name and the same sample name but have different file names.
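As an illustration, the two lines for one paired-end sample could be generated as follows. The helper `protocol_line`, the organism and sample names, the ploidy, and the file paths are all hypothetical; only the column order and the tab separator come from the format described above.

```python
def protocol_line(organism, ploidy, sample, data_type, path):
    """Join the five protocol columns with tabs, in the documented order."""
    return "\t".join([organism, str(ploidy), sample, data_type, path])


# One paired-end sample: same organism and sample name, two different files.
paired = [
    protocol_line("child", 4, "sample1", "RNAseq", "/data/child_sample1_R1.fastq"),
    protocol_line("child", 4, "sample1", "RNAseq", "/data/child_sample1_R2.fastq"),
]
print("\n".join(paired))
```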
Using biological replicates
If you have several biological replicates for one or more organisms, give each replicate the same organism name but a different sample name. Using several biological replicates is particularly helpful because it increases SNP detection sensitivity, while HyLiTE still tracks the expression of each replicate individually.
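The naming scheme for replicates can be sketched by grouping protocol lines by organism and listing the distinct sample names. The organisms, sample names, and paths below are invented for illustration; the point is only that replicates share an organism name and differ in sample name.

```python
from collections import defaultdict

# (organism, sample, path) for each protocol line, in file order.
# The child has two replicates; each parent has one sample.
lines = [
    ("child", "rep1", "/data/child_rep1.fastq"),
    ("child", "rep2", "/data/child_rep2.fastq"),
    ("parentA", "rep1", "/data/parentA_rep1.fastq"),
    ("parentB", "rep1", "/data/parentB_rep1.fastq"),
]

# Collect the distinct sample names per organism, preserving file order.
samples_by_organism = defaultdict(list)
for organism, sample, _path in lines:
    if sample not in samples_by_organism[organism]:
        samples_by_organism[organism].append(sample)

for organism, samples in samples_by_organism.items():
    print(organism, len(samples), samples)
```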