Parameters file

Aside from the basic options (analysis name, pipeline to use, …), HyLiTE uses many default parameters.

These parameters are all defined in the params.py file inside the hylite package.

The file itself explains the use of many parameters, but we give a detailed account here for future developers. By convention, parameter variable names are uppercase.

VERSION = '2.0.2'
DEFAULT_NAME = "HyLiTE_"+VERSION+"_"

VERSION is the version number of HyLiTE. DEFAULT_NAME is the prefix used to name a HyLiTE analysis when the user does not specify a name; it is followed by the current date.
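As an illustration, the default name can be assembled as below. This is a minimal sketch: the helper default_analysis_name is hypothetical, and the ISO date format (YYYY-MM-DD) is an assumption, since the exact date format HyLiTE appends is not specified here.

```python
import datetime

# Values copied from params.py
VERSION = '2.0.2'
DEFAULT_NAME = "HyLiTE_" + VERSION + "_"


def default_analysis_name(today=None):
    """Hypothetical helper: DEFAULT_NAME followed by the date (ISO format assumed)."""
    if today is None:
        today = datetime.date.today()
    return DEFAULT_NAME + today.isoformat()
```

For example, default_analysis_name(datetime.date(2015, 3, 1)) returns "HyLiTE_2.0.2_2015-03-01".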

HIGH_LEVEL_VERBOSE = False

HIGH_LEVEL_VERBOSE, when set to True, provides important debugging information.

############
#SAMTOOLS variables
SAMTOOLS_PATH = ""
SAMTOOLS_VERSION = get_samtools_version(SAMTOOLS_PATH)
SAMTOOLS_NAME_VIEW = "samtools view -Sb"
SAMTOOLS_NAME_SORT = "samtools sort"
SAMTOOLS_NAME_INDEX = "samtools index"
SAMTOOLS_NAME_MPILEUP = "samtools mpileup"
SAMTOOLS_NAME_FAIDX = "samtools faidx"
SAMTOOLS_OPTION_IN = ""
SAMTOOLS_VIEW_OPTION_OUT = "-o"
SAMTOOLS_MPILEUP_OPTION_NOBAQ = '-B'
SAMTOOLS_MPILEUP_OPTION_MINQUAL = '-Q'
SAMTOOLS_MPILEUP_OPTION_MAXCOV = '-d'
SAMTOOLS_MPILEUP_OPTION_REFERENCE = '-f'
SAMTOOLS_MPILEUP_OPTION_SAMPLE = '-b'
SAMTOOLS_MPILEUP_OPTION_PAIRED = '-A'
if SAMTOOLS_VERSION == 0:
    SAMTOOLS_OPTION_OUT = ""
elif SAMTOOLS_VERSION == 1:
    SAMTOOLS_OPTION_OUT = "-o"
else:
    raise IOError("The version of samtools is not recognized\n")

These parameters handle option differences between samtools versions and wrap the samtools options used by HyLiTE. Change SAMTOOLS_PATH if you use a local installation of samtools whose directory has not been added to your $PATH environment variable. If samtools options change name or usage, these parameters should be updated too. The class using these parameters is the samtools_wrapper class. The samtools version is determined by get_samtools_version, a function in utils.py.
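The sketch below shows how these constants can be turned into a command line. It is not the samtools_wrapper code: parse_samtools_major_version, output_option and build_mpileup_cmd are hypothetical helpers. It assumes the version is read from the "Version: X.Y" line that samtools typically prints in its usage text when run without arguments; the real get_samtools_version may work differently. Note that the SAMTOOLS_NAME_* values contain two words ("samtools mpileup"), so they must be split before being passed to a process launcher.

```python
import os
import re
import shlex

# Values copied from params.py
SAMTOOLS_PATH = ""
SAMTOOLS_NAME_MPILEUP = "samtools mpileup"
SAMTOOLS_MPILEUP_OPTION_NOBAQ = '-B'
SAMTOOLS_MPILEUP_OPTION_MINQUAL = '-Q'
SAMTOOLS_MPILEUP_OPTION_MAXCOV = '-d'
SAMTOOLS_MPILEUP_OPTION_REFERENCE = '-f'
SAMTOOLS_MPILEUP_OPTION_SAMPLE = '-b'
SAMTOOLS_MPILEUP_OPTION_PAIRED = '-A'


def parse_samtools_major_version(usage_text):
    """Hypothetical helper: extract the major version from samtools' usage text,
    e.g. a line such as 'Version: 1.10 (using htslib 1.10)'."""
    match = re.search(r"Version:\s*(\d+)\.", usage_text)
    if match is None:
        raise IOError("The version of samtools is not recognized\n")
    return int(match.group(1))


def output_option(major_version):
    """Mirrors the if/elif block in params.py: version 0 gets an empty option, version 1 gets -o."""
    if major_version == 0:
        return ""
    if major_version == 1:
        return "-o"
    raise IOError("The version of samtools is not recognized\n")


def build_mpileup_cmd(reference, bam_list_file, min_qual, max_cov,
                      no_baq=True, count_orphans=True):
    """Hypothetical helper: assemble an mpileup argument list from the constants."""
    # "samtools mpileup" is stored as one string, so split it into argv items.
    cmd = shlex.split(SAMTOOLS_NAME_MPILEUP)
    if SAMTOOLS_PATH:
        # Point at a local installation that is not on $PATH.
        cmd[0] = os.path.join(SAMTOOLS_PATH, cmd[0])
    if no_baq:
        cmd.append(SAMTOOLS_MPILEUP_OPTION_NOBAQ)
    if count_orphans:
        cmd.append(SAMTOOLS_MPILEUP_OPTION_PAIRED)
    cmd += [SAMTOOLS_MPILEUP_OPTION_MINQUAL, str(min_qual),
            SAMTOOLS_MPILEUP_OPTION_MAXCOV, str(max_cov),
            SAMTOOLS_MPILEUP_OPTION_REFERENCE, reference,
            SAMTOOLS_MPILEUP_OPTION_SAMPLE, bam_list_file]
    return cmd
```

The resulting list can be handed directly to subprocess.run, which avoids shell quoting issues with file paths.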

############
#bowtie2 variables
MISMATCH_DEFAULT = True
DEFAULT_PHRED64 = False

BOWTIE2_PATH = ''
BOWTIE2_NAME_BUILD = 'bowtie2-build'
BOWTIE2_NAME_ALIGN = 'bowtie2'

BOWTIE2_OPTION_UNPAIRED = '-U'
BOWTIE2_OPTION_PAIRED_1 = '-1'
BOWTIE2_OPTION_PAIRED_2 = '-2'
BOWTIE2_OPTION_MISMATCH = '-N 1'
BOWTIE2_OPTION_BASE = '-x'
BOWTIE2_OPTION_OUT = '-S'
BOWTIE2_OPTION_PHRED64 = '--phred64'
BOWTIE2_OPTION_THREAD = '-p'

BOWTIE2_BUILD_OPTION_REF = ''
BOWTIE2_BUILD_OPTION_OUT = ''

These parameters are the bowtie2 equivalent of the previous set. The class using them is the bowtie2_wrapper class.
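A minimal sketch of how the bowtie2 constants can be combined into argument lists follows. The helpers build_index_cmd and build_align_cmd are hypothetical and do not reproduce bowtie2_wrapper. The sketch assumes that an empty option string (as for BOWTIE2_BUILD_OPTION_REF and BOWTIE2_BUILD_OPTION_OUT) means the argument is positional, which matches bowtie2-build's usage of a reference file followed by an index base name. BOWTIE2_OPTION_MISMATCH stores a flag and its value in one string, so it is split.

```python
import os
import shlex

# Values copied from params.py
MISMATCH_DEFAULT = True
DEFAULT_PHRED64 = False
BOWTIE2_PATH = ''
BOWTIE2_NAME_BUILD = 'bowtie2-build'
BOWTIE2_NAME_ALIGN = 'bowtie2'
BOWTIE2_OPTION_UNPAIRED = '-U'
BOWTIE2_OPTION_PAIRED_1 = '-1'
BOWTIE2_OPTION_PAIRED_2 = '-2'
BOWTIE2_OPTION_MISMATCH = '-N 1'
BOWTIE2_OPTION_BASE = '-x'
BOWTIE2_OPTION_OUT = '-S'
BOWTIE2_OPTION_PHRED64 = '--phred64'
BOWTIE2_OPTION_THREAD = '-p'
BOWTIE2_BUILD_OPTION_REF = ''
BOWTIE2_BUILD_OPTION_OUT = ''


def _flag(option, value):
    """An empty option string means the argument is positional (assumption)."""
    return [option, value] if option else [value]


def build_index_cmd(reference_fasta, index_base):
    """Hypothetical helper: bowtie2-build argument list."""
    exe = os.path.join(BOWTIE2_PATH, BOWTIE2_NAME_BUILD)
    return ([exe] + _flag(BOWTIE2_BUILD_OPTION_REF, reference_fasta)
            + _flag(BOWTIE2_BUILD_OPTION_OUT, index_base))


def build_align_cmd(index_base, out_sam, unpaired=None, mate1=None, mate2=None,
                    threads=1, mismatch=MISMATCH_DEFAULT, phred64=DEFAULT_PHRED64):
    """Hypothetical helper: bowtie2 argument list for unpaired or paired reads."""
    exe = os.path.join(BOWTIE2_PATH, BOWTIE2_NAME_ALIGN)
    cmd = [exe, BOWTIE2_OPTION_THREAD, str(threads), BOWTIE2_OPTION_BASE, index_base]
    if mismatch:
        # '-N 1' holds a flag and its value in one string: split it.
        cmd += shlex.split(BOWTIE2_OPTION_MISMATCH)
    if phred64:
        cmd.append(BOWTIE2_OPTION_PHRED64)
    if mate1 and mate2:
        cmd += [BOWTIE2_OPTION_PAIRED_1, mate1, BOWTIE2_OPTION_PAIRED_2, mate2]
    elif unpaired:
        cmd += [BOWTIE2_OPTION_UNPAIRED, unpaired]
    else:
        raise ValueError("give either unpaired reads or both mate files")
    cmd += [BOWTIE2_OPTION_OUT, out_sam]
    return cmd
```

For example, build_align_cmd("idx", "out.sam", mate1="r1.fq", mate2="r2.fq", threads=4) yields bowtie2 -p 4 -x idx -N 1 -1 r1.fq -2 r2.fq -S out.sam.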

############
#Generic variables
PHRED_STANDARD = 33
SAMPLE_TYPE = ('RNAseq','gDNA')
DEFAULT_SAMPLE_TYPE = 'RNAseq'
DEFAULT_NB_NODES = 1
VERBOSE = False
FULL_OUPUT = False
GENE_BETWEEN_PICKLE = 50 #number of new genes to process between the pickling operations
GENE_BETWEEN_WRITING = 1 #number of new genes to process between writing operations

Warning

  • Setting a high number of genes between writing can make HyLiTE a memory hog.

  • Setting a number of genes between writing greater than the number of genes between pickling will make HyLiTE's complexity exponential in the number of genes.

  • Neither setting is encouraged.
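To make the interplay of the two intervals concrete, here is a minimal sketch of a driver loop. It is not HyLiTE's actual loop: process_genes, write and pickle_state are hypothetical, and the sketch assumes that results not yet written are held in memory and included in each checkpoint. Under that assumption, a large write interval keeps more results in memory, and a write interval larger than the pickle interval means each checkpoint carries a growing backlog of unwritten results.

```python
# Values copied from params.py
GENE_BETWEEN_PICKLE = 50
GENE_BETWEEN_WRITING = 1


def process_genes(genes, write, pickle_state,
                  write_every=GENE_BETWEEN_WRITING,
                  pickle_every=GENE_BETWEEN_PICKLE):
    """Hypothetical loop: buffer per-gene results, flush them every
    `write_every` genes, and checkpoint every `pickle_every` genes."""
    buffer = []
    for n, gene in enumerate(genes, start=1):
        buffer.append(gene)  # stands in for the per-gene result
        if n % write_every == 0:
            write(buffer)    # flush buffered results to disk
            buffer = []
        if n % pickle_every == 0:
            # the checkpoint includes whatever has not been written yet
            pickle_state(n, list(buffer))
    if buffer:
        write(buffer)        # final flush of the remaining results
```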

#SNP detection
MIN_COVERAGE_HAPLOID = 3
MIN_COVERAGE_POLYPLOID = 20 #ensures that differential expression up to a factor 17:3 will be correctly detected

EXPECTED_ERROR_RATE = 0.02/3. #0.02 total error rate. For each position, there is one true base and three errors possible, so each error base has probability error_rate/3
ALPHA = 0.001
#The current settings allow good detection of SNPs in haploids (parents)
# and can make false new SNPs appear in the child with an observed frequency of 0.003 per base.
#Note that the detection of new SNPs in the child just raises the N flag and will not change the parental association.
#It could, however, make a few chimeric reads appear randomly.

These parameters are discussed at length in the section "2. Detecting SNPs".
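One natural reading of EXPECTED_ERROR_RATE and ALPHA is a one-sided binomial test per alternative base: a base seen k times out of n reads is kept only if P(X >= k) < ALPHA, where X ~ Binomial(n, EXPECTED_ERROR_RATE). The sketch below implements that test; it is an assumption, not necessarily HyLiTE's exact procedure, and binomial_tail and is_real_variant are hypothetical names. Under this assumed test, 3 alternative reads out of 20 give a tail probability of roughly 3e-4, below ALPHA, whereas 2 out of 20 do not pass, which is consistent with the 17:3 comment on MIN_COVERAGE_POLYPLOID.

```python
from math import comb

# Values copied from params.py
MIN_COVERAGE_HAPLOID = 3
MIN_COVERAGE_POLYPLOID = 20
EXPECTED_ERROR_RATE = 0.02 / 3.
ALPHA = 0.001


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def is_real_variant(alt_count, coverage, haploid):
    """Hypothetical test: enough coverage, and the alternative base is
    unlikely to come from sequencing errors alone."""
    min_cov = MIN_COVERAGE_HAPLOID if haploid else MIN_COVERAGE_POLYPLOID
    if coverage < min_cov:
        return False
    return binomial_tail(alt_count, coverage, EXPECTED_ERROR_RATE) < ALPHA
```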

############
#pileup format
BEGIN_CHAR = '^'
END_CHAR = '$'
REF_CHAR = (',','.','N') #we chose to include N in this even if it does not really mean reference
INDEL_CHAR = ('+','-')
EMPTY_CHAR = '*'

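These parameters specify the special characters of the pileup format: '^' marks the start of a read (it is followed by a character encoding the read's mapping quality), '$' marks the end of a read, '.' and ',' denote a match to the reference on the forward and reverse strands respectively, '+' and '-' introduce an insertion or a deletion, and '*' is a placeholder for a deleted base. As noted in the code comment, HyLiTE also counts N as a reference character.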
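The following sketch shows how these characters can be used to count the bases in the read-bases column of one pileup line. count_pileup_bases is a hypothetical helper, not HyLiTE's parser: it skips read-start markers together with their mapping-quality character, skips read-end markers, skips indel annotations such as "+2AC" by reading their length, and ignores any character it does not recognise.

```python
# Values copied from params.py
BEGIN_CHAR = '^'
END_CHAR = '$'
REF_CHAR = (',', '.', 'N')
INDEL_CHAR = ('+', '-')
EMPTY_CHAR = '*'


def count_pileup_bases(read_bases):
    """Hypothetical helper: count reference matches, A/C/G/T and deletions
    in the read-bases column of a pileup line."""
    counts = {"ref": 0, "A": 0, "C": 0, "G": 0, "T": 0, "del": 0}
    i = 0
    while i < len(read_bases):
        c = read_bases[i]
        if c == BEGIN_CHAR:
            i += 2  # skip '^' and the mapping-quality character after it
            continue
        if c == END_CHAR:
            i += 1
            continue
        if c in INDEL_CHAR:
            # e.g. '+2AC' or '-1t': read the length, then skip that many bases
            j = i + 1
            while j < len(read_bases) and read_bases[j].isdigit():
                j += 1
            i = j + int(read_bases[i + 1:j])
            continue
        if c in REF_CHAR:
            counts["ref"] += 1
        elif c == EMPTY_CHAR:
            counts["del"] += 1
        elif c.upper() in "ACGT":
            counts[c.upper()] += 1
        i += 1
    return counts
```

For example, count_pileup_bases("^F.,$A+2ACc-1t*") counts two reference matches, one A, one C and one deletion; the inserted "AC" and deleted "t" are not counted as bases.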