Parameters file
Aside from the basic options (analysis name, pipeline to use, …) HyLiTE uses many default parameters.
These parameters are all referenced in the params.py file inside the hylite package.
Although the file itself sheds light on how many of the parameters are used, we give a detailed account here for future developers. By convention, parameter variable names are uppercase.
VERSION = '2.0.2'
DEFAULT_NAME = "HyLiTE_"+VERSION+"_"
VERSION is the version number of HyLiTE. DEFAULT_NAME is a prefix for the name of a HyLiTE analysis in case the user does not specify any (it is followed by the current date).
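As an illustration of how the prefix is meant to be completed, the sketch below appends a date to DEFAULT_NAME. The helper name default_analysis_name and the ISO date format are assumptions made for this example; the format HyLiTE actually uses is not specified here.

```python
from datetime import date

VERSION = '2.0.2'
DEFAULT_NAME = "HyLiTE_" + VERSION + "_"


def default_analysis_name(today=None):
    """Return DEFAULT_NAME followed by a date.

    Hypothetical helper: the ISO format (YYYY-MM-DD) is an assumption.
    """
    if today is None:
        today = date.today()
    return DEFAULT_NAME + today.isoformat()


print(default_analysis_name(date(2020, 1, 31)))  # HyLiTE_2.0.2_2020-01-31
```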
HIGH_LEVEL_VERBOSE = False
HIGH_LEVEL_VERBOSE provides important debugging information.
############
#SAMTOOLS variables
SAMTOOLS_PATH = ""
SAMTOOLS_VERSION = get_samtools_version(SAMTOOLS_PATH)
SAMTOOLS_NAME_VIEW = "samtools view -Sb"
SAMTOOLS_NAME_SORT = "samtools sort"
SAMTOOLS_NAME_INDEX = "samtools index"
SAMTOOLS_NAME_MPILEUP = "samtools mpileup"
SAMTOOLS_NAME_FAIDX = "samtools faidx"
SAMTOOLS_OPTION_IN = ""
SAMTOOLS_VIEW_OPTION_OUT = "-o"
SAMTOOLS_MPILEUP_OPTION_NOBAQ = '-B'
SAMTOOLS_MPILEUP_OPTION_MINQUAL = '-Q'
SAMTOOLS_MPILEUP_OPTION_MAXCOV = '-d'
SAMTOOLS_MPILEUP_OPTION_REFERENCE = '-f'
SAMTOOLS_MPILEUP_OPTION_SAMPLE = '-b'
SAMTOOLS_MPILEUP_OPTION_PAIRED = '-A'
if SAMTOOLS_VERSION == 0:
    SAMTOOLS_OPTION_OUT = ""
elif SAMTOOLS_VERSION == 1:
    SAMTOOLS_OPTION_OUT = "-o"
else:
    raise IOError("The version of samtools is not recognized\n")
These parameters handle the differences in options between samtools versions and wrap the samtools options used by HyLiTE. SAMTOOLS_PATH should be changed if you use a local installation of samtools and have not added its directory to your $PATH environment variable. If samtools options change name or usage, these parameters should be updated too. The class that uses these parameters is samtools_wrapper. The version of samtools is determined by a function in utils.py.
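To make the version switch concrete, here is a minimal sketch of how a sort command could be assembled for either version. The function build_sort_command is hypothetical and is not the samtools_wrapper code; the argument order is an assumption based on the samtools command-line usage (0.x takes an output prefix after the input, 1.x takes the output file through -o).

```python
import os

SAMTOOLS_PATH = ""
SAMTOOLS_NAME_SORT = "samtools sort"


def build_sort_command(version, in_bam, out):
    """Return the argument list for 'samtools sort' (hypothetical helper).

    version is the value stored in SAMTOOLS_VERSION (0 or 1).
    """
    if version == 0:
        option_out = ""
    elif version == 1:
        option_out = "-o"
    else:
        raise IOError("The version of samtools is not recognized\n")

    program, subcommand = SAMTOOLS_NAME_SORT.split()
    # Assumption: SAMTOOLS_PATH is the directory containing the executable.
    command = [os.path.join(SAMTOOLS_PATH, program), subcommand]
    if option_out:
        # samtools 1.x: samtools sort -o out.bam in.bam
        command += [option_out, out, in_bam]
    else:
        # samtools 0.x: samtools sort in.bam out_prefix
        command += [in_bam, out]
    return command


print(" ".join(build_sort_command(1, "sample.bam", "sample.sorted.bam")))
```

Returning a list rather than a single string lets the command be passed to subprocess without shell quoting issues.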
############
#bowtie2 variables
MISMATCH_DEFAULT = True
DEFAULT_PHRED64 = False
BOWTIE2_PATH = ''
BOWTIE2_NAME_BUILD = 'bowtie2-build'
BOWTIE2_NAME_ALIGN = 'bowtie2'
BOWTIE2_OPTION_UNPAIRED = '-U'
BOWTIE2_OPTION_PAIRED_1 = '-1'
BOWTIE2_OPTION_PAIRED_2 = '-2'
BOWTIE2_OPTION_MISMATCH = '-N 1'
BOWTIE2_OPTION_BASE = '-x'
BOWTIE2_OPTION_OUT = '-S'
BOWTIE2_OPTION_PHRED64 = '--phred64'
BOWTIE2_OPTION_THREAD = '-p'
BOWTIE2_BUILD_OPTION_REF = ''
BOWTIE2_BUILD_OPTION_OUT = ''
These parameters are the equivalent of the previous set, but for bowtie2. The class that uses them is bowtie2_wrapper.
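The sketch below shows one way these constants could be combined into a bowtie2 alignment command. The function build_align_command and its arguments are hypothetical; in particular, using MISMATCH_DEFAULT and DEFAULT_PHRED64 as keyword defaults is an assumption, and the real bowtie2_wrapper may assemble the command differently.

```python
import os

MISMATCH_DEFAULT = True
DEFAULT_PHRED64 = False
BOWTIE2_PATH = ''
BOWTIE2_NAME_ALIGN = 'bowtie2'
BOWTIE2_OPTION_UNPAIRED = '-U'
BOWTIE2_OPTION_PAIRED_1 = '-1'
BOWTIE2_OPTION_PAIRED_2 = '-2'
BOWTIE2_OPTION_MISMATCH = '-N 1'
BOWTIE2_OPTION_BASE = '-x'
BOWTIE2_OPTION_OUT = '-S'
BOWTIE2_OPTION_PHRED64 = '--phred64'
BOWTIE2_OPTION_THREAD = '-p'


def build_align_command(index, out_sam, reads, mates=None, threads=1,
                        mismatch=MISMATCH_DEFAULT, phred64=DEFAULT_PHRED64):
    """Return the argument list for a bowtie2 alignment (hypothetical helper)."""
    # Assumption: BOWTIE2_PATH is the directory containing the executable.
    command = [os.path.join(BOWTIE2_PATH, BOWTIE2_NAME_ALIGN),
               BOWTIE2_OPTION_BASE, index]
    if mates is None:
        command += [BOWTIE2_OPTION_UNPAIRED, reads]
    else:
        command += [BOWTIE2_OPTION_PAIRED_1, reads,
                    BOWTIE2_OPTION_PAIRED_2, mates]
    if mismatch:
        # '-N 1' holds both the flag and its value, so split it.
        command += BOWTIE2_OPTION_MISMATCH.split()
    if phred64:
        command.append(BOWTIE2_OPTION_PHRED64)
    command += [BOWTIE2_OPTION_THREAD, str(threads),
                BOWTIE2_OPTION_OUT, out_sam]
    return command


print(" ".join(build_align_command("ref_index", "child.sam", "child.fastq")))
```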
############
#Generic variables
PHRED_STANDARD = 33
SAMPLE_TYPE = ('RNAseq','gDNA')
DEFAULT_SAMPLE_TYPE = 'RNAseq'
DEFAULT_NB_NODES = 1
VERBOSE = False
FULL_OUPUT = False
GENE_BETWEEN_PICKLE = 50 #number of new genes to process between the pickling operations
GENE_BETWEEN_WRITING = 1 #number of new genes to process between writing operations
PHRED_STANDARD is the number to add to a PHRED quality score to get its ASCII equivalent. Here, the default assumes the PHRED+33 standard.
SAMPLE_TYPE defines the authorized type of samples, currently RNAseq and gDNA. The difference between the two is that expression data are not computed for gDNA samples.
DEFAULT_SAMPLE_TYPE is set to RNAseq, which means that any sample type not defined in SAMPLE_TYPE is treated as RNAseq.
DEFAULT_NB_NODES is the number of nodes to use for the parallelization of the samtools pipeline; default is 1.
VERBOSE is the default value of the --verbose option.
FULL_OUPUT is the default value of the --full_output option.
GENE_BETWEEN_PICKLE is the number of genes between every pickling operation.
GENE_BETWEEN_WRITING is the number of genes between result writing and memory cleaning.
Warning
Setting a high number of genes between writing operations can make HyLiTE a memory hog.
Setting a number of genes between writing operations greater than the number of genes between pickling operations will make HyLiTE's complexity exponential in the number of genes.
Neither setting is recommended.
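Among the generic variables, PHRED_STANDARD is the one most directly tied to a formula. The following sketch shows the conversion it describes, together with the standard PHRED definition of the error probability. The function names are illustrative and are not taken from HyLiTE.

```python
PHRED_STANDARD = 33


def phred_to_ascii(score, offset=PHRED_STANDARD):
    """Encode a PHRED quality score as its ASCII character."""
    return chr(score + offset)


def ascii_to_phred(char, offset=PHRED_STANDARD):
    """Decode an ASCII quality character back to a PHRED score."""
    return ord(char) - offset


def error_probability(score):
    """Standard PHRED definition: Q = -10 * log10(P)."""
    return 10 ** (-score / 10)


print(phred_to_ascii(40))        # I
print(ascii_to_phred('!'))       # 0
print(error_probability(20))     # 0.01
```

With reads encoded in PHRED+64, the same functions apply with offset=64.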
#SNP detection
MIN_COVERAGE_HAPLOID = 3
MIN_COVERAGE_POLYPLOID = 20 #ensures that differential expression up to a factor 17:3 will be correctly detected
EXPECTED_ERROR_RATE = 0.02/3. #0.02 total error rate. For each position, there is one true base and three errors possible, so each error base has probability error_rate/3
ALPHA = 0.001
#The current settings allow a good detection of the SNPs in haploids (parents)
# and can make false new SNPs appear in the child with an observed frequency of 0.003 per base.
#Note that the detection of new SNPs in the child just raises the N flag and will not change the parental association.
#It could however make a few chimeric reads appear in a random fashion.
These parameters are discussed at length in section 2, Detecting SNPs.
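To give a feel for how EXPECTED_ERROR_RATE and ALPHA interact, the sketch below assumes a one-sided binomial test: a base is treated as a candidate SNP when the probability of seeing at least that many reads from sequencing errors alone falls below ALPHA. This test and both function names are assumptions made for illustration and may differ from the procedure HyLiTE actually implements.

```python
from math import comb

EXPECTED_ERROR_RATE = 0.02 / 3.
ALPHA = 0.001


def error_tail_probability(count, coverage, rate=EXPECTED_ERROR_RATE):
    """P(X >= count) for X ~ Binomial(coverage, rate).

    X models the number of reads showing one specific erroneous base.
    """
    return sum(comb(coverage, k) * rate ** k * (1 - rate) ** (coverage - k)
               for k in range(count, coverage + 1))


def exceeds_error_expectation(count, coverage, alpha=ALPHA):
    """True if 'count' reads are unlikely to be errors only (assumed test)."""
    return error_tail_probability(count, coverage) < alpha


for count in (1, 2, 3):
    print(count, exceeds_error_expectation(count, 20))
```

Under these assumptions, at a coverage of 20 three supporting reads are the smallest count that passes the threshold.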
############
#pileup format
BEGIN_CHAR = '^'
END_CHAR = '$'
REF_CHAR = (',','.','N') #we chose to include N in this even if it does not really mean reference
INDEL_CHAR = ('+','-')
EMPTY_CHAR = '*'
These parameters specify the special characters of the pileup format.
BEGIN_CHAR indicates the beginning of a read.
END_CHAR indicates the end of a read.
REF_CHAR lists the characters encountered when the base matches the reference.
INDEL_CHAR specifies an indel.
EMPTY_CHAR specifies an absence of mapping at this position (it can also mean that we are inside a gap).
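The characters above can be used to walk through the read-bases column of a pileup line. The sketch below is a minimal tokenizer written for illustration, not the parser used by HyLiTE. It relies on two facts of the pileup format: the character after BEGIN_CHAR is a mapping quality, and an indel marker is followed by a length and then that many bases.

```python
BEGIN_CHAR = '^'
END_CHAR = '$'
REF_CHAR = (',', '.', 'N')
INDEL_CHAR = ('+', '-')
EMPTY_CHAR = '*'


def count_pileup_bases(bases):
    """Count reference, alternative, indel and empty entries in a pileup string."""
    counts = {'ref': 0, 'alt': 0, 'indel': 0, 'empty': 0}
    i = 0
    while i < len(bases):
        char = bases[i]
        if char == BEGIN_CHAR:
            i += 2          # skip '^' and the mapping quality character
        elif char == END_CHAR:
            i += 1          # end-of-read marker carries no base
        elif char in INDEL_CHAR:
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1      # read the indel length, e.g. the '2' in '+2AG'
            length = int(bases[i + 1:j])
            counts['indel'] += 1
            i = j + length  # skip the inserted or deleted sequence
        else:
            if char in REF_CHAR:
                counts['ref'] += 1
            elif char == EMPTY_CHAR:
                counts['empty'] += 1
            else:
                counts['alt'] += 1
            i += 1
    return counts


print(count_pileup_bases("^I.,$A+2AG*N-1c,"))
# {'ref': 4, 'alt': 1, 'indel': 2, 'empty': 1}
```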