Making a more general-purpose Nextflow pipeline
Generalizing the Hello pipeline
Learning outcomes
After completing this chapter, you will be able to:

- Understand how `nextflow.config` controls parameters, resources and profiles for the `hello-pipeline`.
- Explain how splitting processes into separate module files improves structure and re-use.
- Recognise how dynamic file naming works in process `output` directives and `script` blocks.
Material
Overview of the hello-pipeline project
Let's go to the project directory and open it in the editor:

```bash
cd /workspaces/nextflow-training/exercises/hello-pipeline
code .
```
The hello-pipeline example (under exercises/hello-pipeline/) is a small but realistic DSL2 workflow that illustrates:
- Configuration in `nextflow.config` (parameters, process resources, profiles, outputs).
- Modular code structure using a `modules/` folder.
- Dynamic file naming based on values flowing through the pipeline.
These are the files in the directory:

```
hello-pipeline
├── hello-pipeline.nf
├── modules
│   ├── collectGreetings.nf
│   ├── convertToUpper.nf
│   ├── cowpy.nf
│   └── sayHello.nf
└── nextflow.config
```
The main components are:

- `hello-pipeline.nf` (workflow definition)
- `nextflow.config` (configuration)
- `modules/*.nf` (process modules)
The workflow at a glance
In hello-pipeline.nf, the workflow:
- Reads greetings from a CSV file (`params.input`).
- Emits a greeting per line (`sayHello`).
- Converts greetings to upper case (`convertToUpper`).
- Collects all greetings into one file and a small report (`collectGreetings`).
- Generates ASCII "art" for the final greetings using `cowpy` (the `cowpy` module).
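Put together, these steps can be sketched as a workflow block. This is a sketch only: the channel and variable names are assumptions, and the real `hello-pipeline.nf` may differ in detail.

```nextflow
workflow {
    // Create a channel with one greeting per CSV row
    greeting_ch = channel.fromPath(params.input)
        .splitCsv()
        .map { line -> line[0] }

    // Chain the processes; each call's output feeds the next
    sayHello(greeting_ch)
    convertToUpper(sayHello.out)
    collectGreetings(convertToUpper.out.collect(), params.batch)
    cowpy(collectGreetings.out.outfile, params.character)
}
```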
The dataflow and basic DSL2 concepts are covered in the Introduction to Nextflow; here we zoom in on:
- How configuration, modules and filenames are wired together.
- How changing `nextflow.config` changes the behaviour of the same code.
- How channels are created.
hello-pipeline.nf
*(code listing: `hello-pipeline.nf`, lines 1–49)*
nextflow.config
*(code listing: `nextflow.config`, lines 1–27)*
Channels
There is a variety of channel factories that we can use to set up a channel, and channels are built in a way that allows us to operate on their contents using operators. You may have noticed the way in which the input file is read: this is standard practice for reading CSV files. First `channel.fromPath` is called (a channel factory), and then the operators `splitCsv()` and `map{}` are applied. This also makes it possible to generate file names dynamically, so that the final file names are unique. We will cover operators in more detail in the next session.
*(code listing: `hello-pipeline.nf`, lines 13–15)*
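A step-by-step view of that channel construction (the variable name is an assumption):

```nextflow
greeting_ch = channel.fromPath(params.input)  // channel factory: emits the CSV file path
    .splitCsv()                               // operator: one item per row (a list of fields)
    .map { line -> line[0] }                  // operator: keep the first field, the greeting string
```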
Make file names unique
A common way to make the file names unique is to use some unique piece of metadata from the inputs (received from the input channel) as part of the output file name. Here, for convenience, we’ll just use the greeting itself since it’s just a short string, and prepend it to the base output filename.
In `hello-pipeline.nf`, keep in mind that each process (`sayHello`, `convertToUpper`, etc.) creates a channel through which its output(s) flow.
*(code listing: `hello-pipeline.nf`, lines 17–21)*
Modules: processes in separate files
Each process is defined in its own module file under `modules/`, and then imported at the top of `hello-pipeline.nf`:
*(code listing: `hello-pipeline.nf`, lines 4–8)*
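The import section plausibly looks like this (a sketch; the module paths are inferred from the `modules/` listing above):

```nextflow
include { sayHello }         from './modules/sayHello.nf'
include { convertToUpper }   from './modules/convertToUpper.nf'
include { collectGreetings } from './modules/collectGreetings.nf'
include { cowpy }            from './modules/cowpy.nf'
```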
This structure:
- Keeps each process short and focused.
- Makes it easier to reuse a process in other pipelines.
- Separates workflow logic (`hello-pipeline.nf`) from implementation details (`modules/*.nf`).
Same pattern at larger scale
In real projects, you often have:
- A `workflow/` or `modules/` folder with many small process modules.
- One or a few entry-point workflows that glue them together using `include` statements.
Dynamic naming of files in modules
The hello-pipeline shows several patterns for dynamic filenames, where outputs depend on values such as:
- The greeting text itself.
- The batch name.
- The input file name.
Filenames based on the greeting
In modules/sayHello.nf:
- The process receives a value `greeting`.
*(code listing: `modules/sayHello.nf`, lines 6–7)*
- The `output` directive declares a file path using string interpolation:
*(code listing: `modules/sayHello.nf`, lines 9–10)*
- The `script` block uses the same pattern when writing the file.
This ensures:
- Each input greeting produces its own output file.
- Filenames are traceable (the greeting is visible in the filename).
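A minimal sketch of what `modules/sayHello.nf` might contain, assuming the input/output declarations described above (the actual module may differ):

```nextflow
process sayHello {
    input:
        val greeting                    // e.g. 'Hello'

    output:
        path "${greeting}-output.txt"   // e.g. Hello-output.txt

    script:
    """
    echo '${greeting}' > '${greeting}-output.txt'
    """
}
```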
Filenames derived from inputs
In modules/convertToUpper.nf:
- The process receives a path `input_file`.
*(code listing: `modules/convertToUpper.nf`, lines 6–7)*
- The `output` directive uses:
*(code listing: `modules/convertToUpper.nf`, lines 10–11)*
Nextflow replaces `${input_file}` with the actual filename, so if the input is `hello-output.txt` the output is:

`UPPER-hello-output.txt`
This pattern:
- Preserves the original filename.
- Adds a clear prefix to show which step produced the file.
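A sketch of `modules/convertToUpper.nf` under the same assumptions (the upper-casing command itself is a guess):

```nextflow
process convertToUpper {
    input:
        path input_file              // e.g. Hello-output.txt

    output:
        path "UPPER-${input_file}"   // e.g. UPPER-Hello-output.txt

    script:
    """
    cat '${input_file}' | tr '[a-z]' '[A-Z]' > 'UPPER-${input_file}'
    """
}
```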
Using parameters in filenames
In modules/collectGreetings.nf:
- Inputs:
*(code listing: `modules/collectGreetings.nf`, lines 6–8)*
- Outputs:
*(code listing: `modules/collectGreetings.nf`, lines 10–12)*
Here:
- The batch name is inserted into the output filenames.
- The `emit` labels (`outfile`, `report`) let the workflow refer to each output by name instead of by position.
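A sketch of `modules/collectGreetings.nf` built from the declarations described above (filenames, the report command and variable names are assumptions):

```nextflow
process collectGreetings {
    input:
        path input_files    // all upper-cased greeting files
        val batch_name      // from params.batch

    output:
        path "COLLECTED-${batch_name}-output.txt", emit: outfile
        path "COLLECTED-${batch_name}-report.txt", emit: report

    script:
    """
    cat ${input_files} > 'COLLECTED-${batch_name}-output.txt'
    wc -l < 'COLLECTED-${batch_name}-output.txt' > 'COLLECTED-${batch_name}-report.txt'
    """
}
```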
Exercise: Stop for a moment and think: where is the batch name coming from?
Answer
Remember that everything is now controlled from `nextflow.config`, where the parameter `params.batch` is defined.
- Changing `--batch` changes both the output directory and the filenames, keeping different runs tidy.
Prefixing filenames with the tool name
In modules/cowpy.nf:
Line to specify the container
For now, just ignore line 4 where the container is specified:
*(code listing: `modules/cowpy.nf`, line 4)*
- Input:
*(code listing: `modules/cowpy.nf`, lines 6–8)*
- Output:
*(code listing: `modules/cowpy.nf`, lines 10–11)*
Again, `${input_file}` is expanded to the actual filename of the collected greetings file, and the `cowpy-` prefix indicates which process created this output.
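Putting the pieces together, `modules/cowpy.nf` plausibly looks like this. This is a sketch: the container directive on line 4 is deliberately left as a comment, and the exact `cowpy` invocation is an assumption.

```nextflow
process cowpy {
    // line 4: container directive goes here (ignored for now)

    input:
        path input_file   // the collected greetings file
        val character     // from params.character

    output:
        path "cowpy-${input_file}"

    script:
    """
    cat '${input_file}' | cowpy -c '${character}' > 'cowpy-${input_file}'
    """
}
```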
Dynamic naming in the workflow output block
The `hello-pipeline.nf` file uses a final `output` block to publish results. Instead of hard-coding paths, it uses:
*(code listing: `hello-pipeline.nf`, lines 33–49)*
The `.name` property refers to the underlying process or module name:
- This keeps the final publication logic short and consistent.
- If you rename a process, you only update it in one place.
Centralizing settings
The nextflow.config file controls:
- Default process resources (memory, CPUs).
- Pipeline parameters (input file, batch name, cow character).
- Where outputs are published.
Process resources and per-process overrides
The `process` block sets default resources for all processes, and then overrides them for specific ones:
Line to enable docker
For now, just ignore line 1 where docker is enabled:
*(code listing: `nextflow.config`, line 1)*
*(code listing: `nextflow.config`, lines 6–12)*
- Global default: `memory = 1.GB` for all processes.
- Process-specific: the process `cowpy` gets dedicated resources with `memory = 2.GB` and `cpus = 2`.
Thus:
- Most processes run with 1 GB RAM and 1 CPU (the implicit default).
- `cowpy` is given more memory and CPUs to handle container start-up and text generation.
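The resource settings described above can be sketched as a `process` scope in `nextflow.config` (values are taken from the text; the layout is an assumption):

```groovy
process {
    // Global default applied to every process
    memory = 1.GB

    // Override for the cowpy process only
    withName: 'cowpy' {
        memory = 2.GB
        cpus   = 2
    }
}
```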
GitHub Codespaces
Keep in mind that we are using the free tier of GitHub Codespaces with only 2 CPUs.
Per-process tuning pattern
This pattern (`process { ... withName: 'X' { ... } }`) is common in larger pipelines:
- Start from sensible global defaults.
- Override only the heavy or special processes.
Pipeline parameters
The params block defines defaults:
*(code listing: `nextflow.config`, lines 17–21)*
- `params.input`: path to the input CSV file (e.g. `data/greetings.csv`).
- `params.batch`: a short name for the current batch/run (used in filenames and the output directory).
- `params.character`: which cowpy character to use in the final ASCII art.
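The `params` block likely has this shape (the default values here are assumptions, except where stated elsewhere in the text):

```groovy
params {
    input     = 'data/greetings.csv'  // input CSV (example path from the text)
    batch     = 'batch'               // batch name used in filenames
    character = 'cow'                 // cowpy character (assumed default)
}
```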
Exercise: Can you imagine how it is possible to override these parameters?
Override parameters
Answer
You can override any of these at run time:
```bash
nextflow run hello-pipeline.nf \
    --input custom_data/my_greetings.csv \
    --batch friday_fun \
    --character tux
```
This will:
- Read greetings from `custom_data/my_greetings.csv`.
- Use `friday_fun` in output filenames and directories.
- Render ASCII art using the `tux` character.
Output configuration
At the end of nextflow.config you will find:
*(code listing: `nextflow.config`, lines 27–28)*
- `outputDir`: all published outputs are grouped under a directory named after the batch.
- `workflow.output.mode`: outputs are copied (not symlinked) into the final results directory.
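These two settings can be sketched as (an assumed shape, consistent with the behaviour described below):

```groovy
outputDir = "results/${params.batch}"  // group outputs per batch
workflow.output.mode = 'copy'          // copy files instead of symlinking
```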
Combined with the output block in hello-pipeline.nf, this means:
- If `params.batch = 'batch'`, outputs are written under `results/batch/`.
- Changing `--batch` creates a clean new results folder for each run.
Exercise: Try to apply what you have learned so far to reuse the process copy_file developed during Introduction to Nextflow. The goal is to copy the file generated by the last process (cowpy) in this pipeline.
Organize your ideas
- Create the module where the other modules are located.
- Call that module in `hello-pipeline.nf`.
- Identify which output the module will take from `cowpy`.
- Adjust the module if necessary to process the output from `cowpy`.
- Define what is going to be generated within `copy_file` and where it is going to be published.
- State in `hello-pipeline.nf` where the output will be published.
Answer
Find below how the files would be modified.
modules/copy_file.nf
*(code listing: `modules/copy_file.nf`, lines 1–10)*
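One possible `modules/copy_file.nf`, adapted from the `copy_file` process in Introduction to Nextflow (a sketch; the names and the `copy-` prefix are assumptions):

```nextflow
process copy_file {
    input:
        path input_file              // the cowpy output file

    output:
        path "copy-${input_file}"

    script:
    """
    cp '${input_file}' 'copy-${input_file}'
    """
}
```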
hello-pipeline.nf
*(code listing: `hello-pipeline.nf`, lines 1–56)*
Summary
- `nextflow.config` centralizes parameters, resources and output settings, allowing you to switch environments without changing code.
- Splitting processes into a `modules/` folder keeps the workflow modular, readable and reusable.
- Dynamic filenames (using string interpolation) make outputs self-describing and integrate naturally with `params` such as `batch`.