Advanced: Writing a configuration file

To write a configuration file for eukrhythmic, edit the config.yaml file included in the eukrhythmic base directory. The file is YAML-formatted: each entry consists of a name, a colon, and a value, and you configure the pipeline by changing the value to the right of the colon.
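As a minimal sketch of that name-colon-value pattern, the fragment below sets two of the entries described in this section. The values are placeholders chosen for illustration, not defaults taken from the shipped config.yaml.

```yaml
# Entry name on the left, value to the right of the colon.
jobname: metaT_demo   # hypothetical job name
checkqual: 1          # 1 turns the assembly quality checks on
```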

Configuration file entries

Below is a listing of each supported entry in the configuration file (config.yaml in the base directory) and how to specify each flag when using the pipeline.

Flag in file Meaning & how to specify
metaT_sample The name of the sample file containing sample ids to be used as unique identifiers in the pipeline, descriptive sample names, and input FASTA file names.
inputDIR The directory where the input data is found. Currently, the path should use “/” separators and have no trailing “/”. Begin the path with “/” only if it is an absolute path starting at the root of your file system; omit the leading “/” for a relative path.
checkqual Boolean flag for whether to run quality checking with salmon, QUAST, BUSCO, etc. on assemblies. If 1, these quality checks are performed.
spikefile A path to a FASTA file containing the sequences of any spike-in that might be present in the reads. Whether you need this depends on your experimental setup. If the file is not valid (e.g., if this flag is set to 0), nothing is done.
runbbmap A boolean flag specifying whether to use the spike file to drop spiked reads, according to what was done in your experiment. If 1, the spikefile is used; otherwise, the filtering is either not performed or its result is not used downstream in the pipeline (depending on whether a spike file exists).
kmers A list of k-mer sizes to use, where applicable, in assembly. These should all be integer values (default: 20, 50, 110). When only one k-mer value is required, the median of this list is used (50 for the default list).
assemblers The assemblers to be used to assemble the metatranscriptomes (which will later be merged). All of the specified assemblers in this list should have matching Snakemake rules in the modules folder of the main pipeline directory (named identically), as well as “clean” rules (explained below).
jobname A descriptive name to be used to name jobs on your high-performance computing system, such that you can track the progress of your workflow.
adapter Path to a FASTA file containing the adapter used during sequencing. Defaults to a static adapter file in the static directory.
separategroups A boolean flag. If 1, specified assembly groups in the metaT_sample file are used to co-assemble raw files. Otherwise, each raw file is assembled separately regardless of what is specified in the “AssemblyGroup” column of the input file.
outputDIR The path to a directory where all program output will be stored.
assembledDIR The directory to move assembled files to, relative to the output directory. Defaults to “assembled”; not necessary to specify.
renamedDIR The directory to move “renamed” files to (which are files with the name of the assembler added to each FASTA header), relative to the output directory. Defaults to “assembled”; not necessary to specify.
scratch The location to move unnecessary intermediate files to after computation.
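Putting the entries above together, a config.yaml might look like the sketch below. This is an illustration under stated assumptions rather than the file shipped with eukrhythmic: every path, file name, job name, and assembler name is a hypothetical placeholder, and writing kmers and assemblers as YAML block lists is assumed rather than confirmed by the table.

```yaml
# Illustrative config.yaml; every path and name below is a placeholder.
metaT_sample: input/sampledata.txt     # hypothetical sample file
inputDIR: /home/user/metaT/raw         # absolute path, "/" separators, no trailing "/"
outputDIR: /home/user/metaT/output     # all program output is stored here
scratch: /scratch/user/metaT           # destination for intermediate files
jobname: metaT_demo                    # used to name jobs on the HPC system
checkqual: 1                           # run the assembly quality checks
spikefile: /home/user/metaT/spike.fa   # set to 0 if there is no spike file
runbbmap: 1                            # use spikefile to drop spiked reads
adapter: static/adapters.fa            # hypothetical adapter file name
separategroups: 1                      # co-assemble by the AssemblyGroup column
kmers:                                 # median (50) is used when one value is needed
  - 20
  - 50
  - 110
assemblers:                            # names must match rules in the modules folder
  - megahit
  - trinity
assembledDIR: assembled                # optional; shown with its stated default
```

Setting separategroups to 0 in this sketch would assemble each raw file separately, and setting runbbmap to 0 would skip use of the spike file, as described in the corresponding rows above.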