Databricks Runtime 7.4 for Genomics

Databricks released this image in November 2020.

Databricks Runtime 7.4 for Genomics is a version of Databricks Runtime 7.4 optimized for working with genomic and biomedical data. It is a component of the Databricks Unified Analytics Platform for Genomics.

For more information, including instructions for creating a Databricks Runtime for Genomics cluster, see Databricks Runtime for Genomics. For more information on developing genomics applications, see the Genomics guide.

New features

Databricks Runtime 7.4 for Genomics is built on top of Databricks Runtime 7.4. For information on what's new in Databricks Runtime 7.4, see the Databricks Runtime 7.4 release notes.

GloWGR for binary traits

GloWGR can now fit whole-genome regression models for binary traits.

Logistic regression function accepts offset parameter

The logistic_regression_gwas function now accepts an offset parameter. This parameter is equivalent to a feature whose coefficient is fixed at 1. Both the likelihood ratio test and the Firth penalized likelihood ratio test respect this parameter. The output of GloWGR should be passed as the offset.
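To make "equivalent to a feature whose coefficient is fixed at 1" concrete, here is a minimal pure-Python sketch of a logistic regression fit with an offset. It is not Glow's implementation: the function name fit_logistic_with_offset is hypothetical, it fits only an intercept and one covariate by Newton-Raphson, and it runs no hypothesis test. The offset is added to each sample's linear predictor and is never estimated.

```python
import math


def fit_logistic_with_offset(x, y, offset=None, iters=50, tol=1e-10):
    """Hypothetical sketch: fit logit(p_i) = b0 + b1 * x_i + offset_i by Newton-Raphson.

    The offset enters the linear predictor with its coefficient fixed at 1,
    so it shifts each sample's log-odds but is not a fitted parameter.
    """
    n = len(x)
    if offset is None:
        offset = [0.0] * n
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Score vector (gradient of the log-likelihood) and Fisher information.
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for xi, yi, oi in zip(x, y, offset):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi + oi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system [[h00, h01], [h01, h11]] * d = g.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1
```

Because the offset has a fixed coefficient, a constant offset c only lowers the fitted intercept by c and leaves the slope unchanged, while a per-sample offset, such as a GloWGR prediction, adjusts each sample's baseline log-odds individually.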

Hail support

Databricks Runtime 7.4 for Genomics is the first release in the 7.x line to package support for Hail.

Improvements

GloWGR convenience functions

GloWGR 中的 RidgeRegressionLogisticRegression 现提供 transform_loco 函数,用于生成 leave-one-chomosome-out (LOCO) 预测。The RidgeRegression and LogisticRegression classes in GloWGR now provide a transform_loco function to generate leave-one-chomosome-out (LOCO) predictions. 此外,GloWGR 现还包含一个 reshape_for_gwas 函数,用于将 GloWGR 中的预测转换为 Glow 中的关联检验可接受的形式。In addition, GloWGR now includes a reshape_for_gwas function to reshape the predictions from GloWGR into a form that the association tests in Glow can accept.

GloWGR usability improvements

GloWGR for quantitative and binary traits now offers better performance and clearer error messages when validation fails.

Faster VCF reader

Databricks Runtime 7.4 for Genomics includes an experimental fast VCF reader. To activate it, set the Spark configuration io.projectglow.vcf.fastReaderEnabled to true in a notebook or in the cluster configuration.
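A minimal PySpark sketch of enabling the reader for the current notebook session, assuming spark is the notebook's active SparkSession and that you pass your own VCF path. spark.conf.set and spark.read.format("vcf") are standard PySpark and Glow calls; the helper name read_vcf_with_fast_reader is hypothetical.

```python
FAST_VCF_READER_CONF = "io.projectglow.vcf.fastReaderEnabled"


def read_vcf_with_fast_reader(spark, path):
    """Hypothetical helper: enable the experimental fast VCF reader, then read `path`.

    `spark` is assumed to be an active SparkSession with Glow available,
    as on a Databricks Runtime for Genomics cluster.
    """
    # Session-level setting; affects subsequent VCF reads in this session.
    spark.conf.set(FAST_VCF_READER_CONF, "true")
    return spark.read.format("vcf").load(path)
```

To enable the reader for every notebook attached to a cluster, add the line io.projectglow.vcf.fastReaderEnabled true to the cluster's Spark configuration instead.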

Libraries

The following sections list the libraries included in Databricks Runtime 7.4 for Genomics that differ from those included in Databricks Runtime 7.4.

Upgraded libraries

  • Hail: 0.2.40 to 0.2.58

Packaged libraries

Library      Version
ADAM         0.32.0
GATK         4.1.4.1
Hail         0.2.58
Hadoop-bam   7.9.2
samtools     1.9
VEP          96