Pre-packaged SnpEff annotation pipeline

Setup

Run SnpEff (v4.3) as an Azure Databricks job. Most likely, an Azure Databricks solutions architect will set up the initial job for you. The necessary details are:

Benchmarks

The pipeline has been tested on 85.2 million variant sites from the 1000 Genomes project using the following cluster configuration:

  • Driver: Standard_DS13_v2
  • Workers: Standard_D32s_v3 * 7 (224 cores)
  • Runtime: 2.5 hours
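
To put the benchmark in perspective, the figures above work out to roughly 34 million variant sites per hour, or about 152,000 sites per worker core-hour. The sketch below only redoes that arithmetic. It counts worker cores only (the driver is excluded) and makes no claim that throughput scales linearly on other cluster sizes or datasets.

```python
# Figures taken directly from the benchmark above.
variant_sites = 85_200_000   # 85.2 million variant sites
worker_cores = 7 * 32        # 7 x Standard_D32s_v3 = 224 cores
runtime_hours = 2.5


def throughput(sites: int, cores: int, hours: float) -> tuple[float, float]:
    """Return (sites per hour, sites per worker core-hour)."""
    per_hour = sites / hours
    per_core_hour = sites / (cores * hours)
    return per_hour, per_core_hour


per_hour, per_core_hour = throughput(variant_sites, worker_cores, runtime_hours)
print(f"{per_hour:,.0f} sites/hour")
print(f"{per_core_hour:,.0f} sites/core-hour")
```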

Parameters

The pipeline accepts a number of parameters that control its behavior. The most important and commonly changed parameters are documented here; the rest can be found in the SnpEff annotation pipeline notebook. All parameters can be set for all runs or per-run.

Parameter | Default | Description
inputVariants | n/a | Path of input variants (VCF or Delta Lake).
output | n/a | The path where pipeline output should be written.
exportVCF | false | If true, the pipeline writes results in VCF as well as Delta Lake.
exportVCFAsSingleFile | false | If true, exports VCF as a single file.
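
As an illustration of setting these parameters per run, the sketch below builds a request body in the shape used by the Databricks Jobs API `run-now` endpoint, where `notebook_params` overrides the job's defaults for a single run. It assumes the job was created as a notebook job that reads these four names as string parameters. The job ID and paths are hypothetical placeholders, and nothing is sent over the network.

```python
import json


def build_run_now_payload(job_id, input_variants, output,
                          export_vcf=False, export_vcf_as_single_file=False):
    """Build a run-now request body that overrides parameters for one run.

    Notebook parameters are passed as strings, so booleans are rendered
    as "true" / "false".
    """
    return {
        "job_id": job_id,
        "notebook_params": {
            "inputVariants": input_variants,
            "output": output,
            "exportVCF": str(export_vcf).lower(),
            "exportVCFAsSingleFile": str(export_vcf_as_single_file).lower(),
        },
    }


# Hypothetical job ID and paths, for illustration only.
payload = build_run_now_payload(
    job_id=123,
    input_variants="dbfs:/example/input/variants.vcf",
    output="dbfs:/example/output",
    export_vcf=True,
)
print(json.dumps(payload, indent=2))
```

Values left at their defaults here (`exportVCFAsSingleFile`) could equally be omitted from the request, in which case the job-level setting would apply.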

In addition, you must configure the reference genome using environment variables. To use GRCh37, set the environment variable:

refGenomeId=grch37

To use GRCh38 instead, set the environment variable like this:

refGenomeId=grch38
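
One place to set this variable is the `spark_env_vars` field of the job's cluster specification. That is an assumption about how the job is configured, since this page does not say where the variable must be defined. The sketch below validates the value against the two genomes documented here and returns the corresponding fragment. The helper name `ref_genome_env` is hypothetical.

```python
# Only these two values are documented on this page; whether the
# pipeline accepts others is not stated.
SUPPORTED_REF_GENOMES = ("grch37", "grch38")


def ref_genome_env(ref_genome_id: str) -> dict:
    """Return a cluster-spec fragment that sets refGenomeId.

    Assumes the variable is supplied through the cluster's
    `spark_env_vars` setting.
    """
    normalized = ref_genome_id.strip().lower()
    if normalized not in SUPPORTED_REF_GENOMES:
        raise ValueError(
            f"unsupported reference genome {ref_genome_id!r}; "
            f"expected one of {SUPPORTED_REF_GENOMES}"
        )
    return {"spark_env_vars": {"refGenomeId": normalized}}


print(ref_genome_env("GRCh38"))
```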

Output

The annotated variants are written out to Delta tables inside the provided output directory. If you configured the pipeline to export to VCF, they’ll appear under the output directory as well.

output
|---annotations
    |---Delta files
|---annotations.vcf
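
Given that layout, a minimal sketch for reading the results back in a notebook might look like this. It assumes an active `SparkSession` with Delta Lake support (as on a Databricks cluster) is passed in as `spark`. The helper names and the example path are hypothetical.

```python
def annotation_paths(output_dir: str) -> dict:
    """Derive the result locations from the pipeline's `output` directory."""
    base = output_dir.rstrip("/")
    return {
        "delta": f"{base}/annotations",
        # Present only if the pipeline was configured to export to VCF.
        "vcf": f"{base}/annotations.vcf",
    }


def load_annotations(spark, output_dir: str):
    """Load the annotated variants from the Delta table as a DataFrame."""
    return spark.read.format("delta").load(annotation_paths(output_dir)["delta"])


# Hypothetical output directory, for illustration only.
print(annotation_paths("dbfs:/example/output/"))
```

In a notebook, `load_annotations(spark, "dbfs:/example/output")` would return a DataFrame that can be inspected with `printSchema()` or `show()`.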

SnpEff annotation pipeline notebook

Get notebook