The nf-core/variantbenchmarking pipeline (https://github.com/nf-core/variantbenchmarking) is a versatile and comprehensive workflow designed to benchmark variant-calling tools across various use cases. Developed as part of the German Human Genome-Phenome Archive (GHGA) project, this pipeline supports the evaluation of small variants, insertions and deletions (indels), and structural variants for both germline and somatic samples.
Users can benchmark against publicly available truth datasets, such as Genome in a Bottle or SEQC2, or provide custom VCF files, with or without specific regions of interest. The pipeline supports several normalization steps, including variant splitting, deduplication, left or right alignment, and filtration, and it integrates several benchmarking tools, such as hap.py, RTG Tools, Truvari, SVAnalyzer, and Witty.er. This flexibility enables analyses tailored to specific research needs. The workflow reports detailed performance metrics, such as precision, recall, and F1 scores, allowing researchers to assess the strengths and limitations of their variant-calling workflows.
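As an illustration of the reported metrics, the following minimal sketch computes precision, recall, and F1 score from counts of true positives, false positives, and false negatives using their standard definitions. The function name `benchmark_metrics` and the example counts are hypothetical and are not taken from the pipeline; the individual benchmarking tools may count matches differently (for example, per genotype or per breakpoint).

```python
def benchmark_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute precision, recall, and F1 from variant match counts.

    tp: called variants that match the truth set
    fp: called variants absent from the truth set
    fn: truth variants that were not called
    Hypothetical helper for illustration only; not part of the pipeline.
    """
    # Guard against division by zero when a count pair is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up counts: 90 matching calls, 10 spurious calls, 30 missed variants.
    metrics = benchmark_metrics(tp=90, fp=10, fn=30)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

With these made-up counts, precision is 0.900, recall is 0.750, and F1 is approximately 0.818, which shows how a caller can be precise yet still miss a quarter of the true variants.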
GHGA’s architecture is built on cloud computing infrastructure and includes an ethico-legal framework to ensure compliance with data protection requirements. By standardizing bioinformatics workflows and governing data reuse through harmonized metadata schemas, GHGA enables researchers to conduct reproducible, rigorous, and secure research.
Built using Nextflow, the nf-core/variantbenchmarking pipeline is scalable, reproducible, and compatible with diverse computational environments, including local systems, high-performance clusters, and cloud platforms. This portability allows benchmarking analyses to run within secure platforms such as GHGA. Additionally, the pipeline is fully open source and adheres to nf-core community guidelines, which promote peer-reviewed, modular, and extensible code.