Databricks Runtime 7.0 (unsupported)

Databricks released this image in June 2020.

The following release notes provide information about Databricks Runtime 7.0, powered by Apache Spark 3.0.

New features

Databricks Runtime 7.0 includes the following new features:

  • Scala 2.12

    Databricks Runtime 7.0 upgrades Scala from 2.11.12 to 2.12.10. The change list between Scala 2.12 and 2.11 is in the Scala 2.12.0 release notes.

  • Auto Loader (Public Preview), released in Databricks Runtime 6.4, has been improved in Databricks Runtime 7.0

    Auto Loader gives you a more efficient way to process new data files incrementally as they arrive on a cloud blob store during ETL. This is an improvement over file-based structured streaming, which identifies new files by repeatedly listing the cloud directory and tracking the files that have been seen, and can be very inefficient as the directory grows. Auto Loader is also more convenient and effective than file-notification-based structured streaming, which requires that you manually configure file-notification services on the cloud and doesn't let you backfill existing files. For details, see What is Auto Loader?.

    In Databricks Runtime 7.0, you no longer need to request a custom Databricks Runtime image to use Auto Loader.
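
    For example, a minimal Scala sketch of an Auto Loader stream that picks up JSON files and appends them to a Delta table (the paths and the schema below are hypothetical placeholders; an explicit schema is assumed here):

      import org.apache.spark.sql.types.{LongType, StringType, StructType}

      // Hypothetical schema of the arriving JSON files.
      val inputSchema = new StructType()
        .add("id", LongType)
        .add("event", StringType)

      val incoming = spark.readStream
        .format("cloudFiles")                    // Auto Loader source
        .option("cloudFiles.format", "json")     // format of the incoming files
        .schema(inputSchema)
        .load("/mnt/raw/events")                 // hypothetical input directory

      incoming.writeStream
        .format("delta")
        .option("checkpointLocation", "/mnt/checkpoints/events")  // hypothetical checkpoint path
        .start("/mnt/delta/events")                               // hypothetical Delta table path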

  • COPY INTO (Public Preview), which lets you load data into Delta Lake with idempotent retries, has been improved in Databricks Runtime 7.0

    Released as a Public Preview in Databricks Runtime 6.4, the COPY INTO SQL command lets you load data into Delta Lake with idempotent retries. Without it, loading data into Delta Lake requires the Apache Spark DataFrame APIs, and you must handle any load failures yourself. The new COPY INTO command provides a familiar declarative interface for loading data in SQL. The command keeps track of previously loaded files, so you can safely re-run it after a failure. For details, see COPY INTO.
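
    A minimal sketch of the command, issued here through spark.sql from Scala (the source and target paths are hypothetical placeholders, and the target Delta table is assumed to already exist):

      // Idempotent load: files already loaded by a previous run of this
      // command are skipped when it is re-run.
      spark.sql("""
        COPY INTO delta.`/mnt/delta/events`
        FROM '/mnt/raw/events'
        FILEFORMAT = JSON
      """)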

Improvements

  • Azure Synapse (formerly SQL Data Warehouse) connector supports the COPY statement.

    The main benefit of COPY is that lower-privileged users can write data to Azure Synapse without needing strict CONTROL permissions on Azure Synapse.

  • The %matplotlib inline magic command is no longer required to display Matplotlib objects inline in notebook cells. They are always displayed inline by default.

  • Matplotlib figures are now rendered with transparent=False, so that user-specified backgrounds are not lost. This behavior can be overridden by setting the Spark configuration spark.databricks.workspace.matplotlib.transparent to true.

  • When running Structured Streaming production jobs on High Concurrency mode clusters, restarts of a job would occasionally fail, because the previously running job wasn't terminated properly. Databricks Runtime 6.3 introduced the ability to set the SQL configuration spark.sql.streaming.stopActiveRunOnRestart true on your cluster to ensure that the previous run stops. This configuration is set by default in Databricks Runtime 7.0.
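
    Both this setting and the Matplotlib transparency setting above are ordinary Spark configurations, typically applied as cluster-level Spark configuration; a minimal Scala sketch setting them via spark.conf.set is shown below for illustration:

      // Restore transparent Matplotlib figures if you prefer the pre-7.0 rendering.
      spark.conf.set("spark.databricks.workspace.matplotlib.transparent", "true")

      // Already the default in Databricks Runtime 7.0; shown only for completeness.
      spark.conf.set("spark.sql.streaming.stopActiveRunOnRestart", "true")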

Major library changes

Python packages

Major Python packages upgraded:

  • boto3 1.9.162 -> 1.12.0
  • matplotlib 3.0.3 -> 3.1.3
  • numpy 1.16.2 -> 1.18.1
  • pandas 0.24.2 -> 1.0.1
  • pip 19.0.3 -> 20.0.2
  • pyarrow 0.13.0 -> 0.15.1
  • psycopg2 2.7.6 -> 2.8.4
  • scikit-learn 0.20.3 -> 0.22.1
  • scipy 1.2.1 -> 1.4.1
  • seaborn 0.9.0 -> 0.10.0

Python packages removed:

  • boto (use boto3)
  • pycurl

Note

The Python environment in Databricks Runtime 7.0 uses Python 3.7, which is different from the installed Ubuntu system Python: /usr/bin/python and /usr/bin/python2 are linked to Python 2.7 and /usr/bin/python3 is linked to Python 3.6.

R packages

R packages added:

  • broom
  • highr
  • isoband
  • knitr
  • markdown
  • modelr
  • reprex
  • rmarkdown
  • rvest
  • selectr
  • tidyverse
  • tinytex
  • xfun

R packages removed:

  • abind
  • bitops
  • car
  • carData
  • doMC
  • gbm
  • h2o
  • littler
  • lme4
  • mapproj
  • maps
  • maptools
  • MatrixModels
  • minqa
  • mvtnorm
  • nloptr
  • openxlsx
  • pbkrtest
  • pkgKitten
  • quantreg
  • R.methodsS3
  • R.oo
  • R.utils
  • RcppEigen
  • RCurl
  • rio
  • sp
  • SparseM
  • statmod
  • zip

Java and Scala libraries

  • Apache Hive version used for handling Hive user-defined functions and Hive SerDes upgraded to 2.3.
  • Previously, the Azure Storage and Key Vault jars were packaged as part of Databricks Runtime, which prevented you from using different versions of those libraries attached to clusters. Classes under com.microsoft.azure.storage and com.microsoft.azure.keyvault are no longer on the class path in Databricks Runtime. If you depend on either of those class paths, you must now attach the Microsoft Azure Storage SDK or Azure Key Vault SDK to your clusters.

Behavior changes

This section lists behavior changes from Databricks Runtime 6.6 to Databricks Runtime 7.0. You should be aware of these as you migrate workloads from lower Databricks Runtime releases to Databricks Runtime 7.0 and above.

Spark behavior changes

Because Databricks Runtime 7.0 is the first Databricks Runtime built on Spark 3.0, there are many changes that you should be aware of when you migrate workloads from Databricks Runtime 5.5 LTS or 6.x, which are built on Spark 2.4. These changes are listed in the "Behavior changes" section of each functional area in the Apache Spark section of these release notes:

Other behavior changes

  • The upgrade to Scala 2.12 involves the following changes:

    • Package cell serialization is handled differently. The following example illustrates the behavior change and how to handle it.

      Running foo.bar.MyObjectInPackageCell.run() as defined in the following package cell will trigger the error java.lang.NoClassDefFoundError: Could not initialize class foo.bar.MyObjectInPackageCell$

      package foo.bar
      
      case class MyIntStruct(int: Int)
      
      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.functions._
      import org.apache.spark.sql.Column
      
      object MyObjectInPackageCell extends Serializable {
      
        // Because SparkSession cannot be created in Spark executors,
        // the following line triggers the error
        // Could not initialize class foo.bar.MyObjectInPackageCell$
        val spark = SparkSession.builder.getOrCreate()
      
        def foo: Int => Option[MyIntStruct] = (x: Int) => Some(MyIntStruct(100))
      
        val theUDF = udf(foo)
      
        val df = {
          val myUDFInstance = theUDF(col("id"))
          spark.range(0, 1, 1, 1).withColumn("u", myUDFInstance)
        }
      
        def run(): Unit = {
          df.collect().foreach(println)
        }
      }
      

      To work around this error, you can wrap MyObjectInPackageCell inside a serializable class.
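
      One possible shape of that workaround, restructuring the package cell above, is sketched below (a sketch only, assuming the goal is to keep SparkSession.builder.getOrCreate() out of any object initializer that might run on executors; the wrapper class name is hypothetical):

      package foo.bar

      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.functions._

      case class MyIntStruct(int: Int)

      // A serializable wrapper class: the SparkSession and the UDF are created
      // when the instance is constructed on the driver, so no object initializer
      // has to run on the executors.
      class MyWrappedLogic extends Serializable {

        val spark = SparkSession.builder.getOrCreate()

        def foo: Int => Option[MyIntStruct] = (x: Int) => Some(MyIntStruct(100))

        val theUDF = udf(foo)

        def run(): Unit = {
          spark.range(0, 1, 1, 1).withColumn("u", theUDF(col("id"))).collect().foreach(println)
        }
      }

      object MyObjectInPackageCell {
        def run(): Unit = new MyWrappedLogic().run()
      }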

    • Certain cases using DataStreamWriter.foreachBatch require a source code update, because Scala 2.12 automatically converts lambda expressions to SAM types, which can cause ambiguity.

      For example, the following Scala code can't compile:

      streams
        .writeStream
        .foreachBatch { (df, id) => myFunc(df, id) }
      

      To fix the compilation error, change foreachBatch { (df, id) => myFunc(df, id) } to foreachBatch(myFunc _) or use the Java API explicitly: foreachBatch(new VoidFunction2 ...).
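
      For example, a minimal sketch of the first option, giving the batch function an explicit, fully typed signature so the SAM ambiguity disappears (streams is the streaming DataFrame from the snippet above; the sink path is a hypothetical placeholder):

      import org.apache.spark.sql.DataFrame

      // Fully typed batch function; foreachBatch(myFunc _) then resolves
      // to the Scala overload without ambiguity.
      def myFunc(batchDF: DataFrame, batchId: Long): Unit = {
        batchDF.write.format("delta").mode("append").save("/mnt/delta/output")
      }

      streams
        .writeStream
        .foreachBatch(myFunc _)
        .start()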

  • Because the Apache Hive version used for handling Hive user-defined functions and Hive SerDes is upgraded to 2.3, two changes are required:

    • Hive's SerDe interface is replaced by an abstract class AbstractSerDe. For any custom Hive SerDe implementation, migrating to AbstractSerDe is required.
    • Setting spark.sql.hive.metastore.jars to builtin means that the Hive 2.3 metastore client will be used to access metastores for Databricks Runtime 7.0. If you need to access Hive 1.2 based external metastores, set spark.sql.hive.metastore.jars to the folder that contains Hive 1.2 jars.
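
      For example, the cluster Spark configuration for an external Hive 1.2 metastore might look like the following sketch (the jar path is a hypothetical placeholder, and spark.sql.hive.metastore.version is assumed to be set to the matching Hive version):

      spark.sql.hive.metastore.version 1.2.1
      spark.sql.hive.metastore.jars /dbfs/hive-1.2-jars/*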

Deprecations and removals

  • Data skipping index was deprecated in Databricks Runtime 4.3 and removed in Databricks Runtime 7.0. We recommend that you use Delta tables instead, which offer improved data skipping capabilities.
  • In Databricks Runtime 7.0, the underlying version of Apache Spark uses Scala 2.12. Since libraries compiled against Scala 2.11 can disable Databricks Runtime 7.0 clusters in unexpected ways, clusters running Databricks Runtime 7.0 and above do not install libraries configured to be installed on all clusters. The cluster Libraries tab shows a status Skipped and a deprecation message that explains the changes in library handling. However, if you have a cluster that was created on an earlier version of Databricks Runtime before Azure Databricks platform version 3.20 was released to your workspace, and you now edit that cluster to use Databricks Runtime 7.0, any libraries that were configured to be installed on all clusters will be installed on that cluster. In this case, any incompatible JARs in the installed libraries can cause the cluster to be disabled. The workaround is either to clone the cluster or to create a new cluster.

Apache Spark

Databricks Runtime 7.0 includes Apache Spark 3.0.

In this section:

Core, Spark SQL, Structured Streaming

Highlights

Performance enhancements

Extensibility enhancements

  • Catalog plugin API (SPARK-31121)
  • Data source V2 API refactoring (SPARK-25390)
  • Hive 3.0 and 3.1 metastore support (SPARK-27970, SPARK-24360)
  • Extend Spark plugin interface to driver (SPARK-29396)
  • Extend Spark metrics system with user-defined metrics using executor plugins (SPARK-28091)
  • Developer APIs for extended Columnar Processing Support (SPARK-27396)
  • Built-in source migration using DSV2: parquet, ORC, CSV, JSON, Kafka, Text, Avro (SPARK-27589)
  • Allow FunctionInjection in SparkExtensions (SPARK-25560)
  • Allows Aggregator to be registered as a UDAF (SPARK-27296)

Connector enhancements

  • Column pruning through nondeterministic expressions (SPARK-29768)
  • Support spark.sql.statistics.fallBackToHdfs in data source tables (SPARK-25474)
  • Allow partition pruning with subquery filters on file source (SPARK-26893)
  • Avoid pushdown of subqueries in data source filters (SPARK-25482)
  • Recursive data loading from file sources (SPARK-27990)
  • Parquet/ORC
  • CSV
    • Support filters pushdown in CSV datasource (SPARK-30323)
  • Hive SerDe
    • No schema inference when reading Hive serde table with native data source (SPARK-27119)
    • Hive CTAS commands should use data source if it is convertible (SPARK-25271)
    • Use native data source to optimize inserting partitioned Hive table (SPARK-28573)
  • Apache Kafka
    • Add support for Kafka headers (SPARK-23539)
    • Add Kafka delegation token support (SPARK-25501)
    • Introduce new option to Kafka source: offset by timestamp (starting/ending) (SPARK-26848)
    • Support the minPartitions option in Kafka batch source and streaming source v1 (SPARK-30656)
    • Upgrade Kafka to 2.4.1 (SPARK-31126)
  • New built-in data sources

Feature enhancements

SQL compatibility enhancements

  • Switch to Proleptic Gregorian calendar (SPARK-26651)
  • Build Spark's own datetime pattern definition (SPARK-31408)
  • Introduce ANSI store assignment policy for table insertion (SPARK-28495)
  • Follow ANSI store assignment rule in table insertion by default (SPARK-28885)
  • Add a SQLConf spark.sql.ansi.enabled (SPARK-28989)
  • Support ANSI SQL filter clause for aggregate expression (SPARK-27986)
  • Support ANSI SQL OVERLAY function (SPARK-28077)
  • Support ANSI nested bracketed comments (SPARK-28880)
  • Throw exception on overflow for integers (SPARK-26218)
  • Overflow check for interval arithmetic operations (SPARK-30341)
  • Throw Exception when invalid string is cast to numeric type (SPARK-30292)
  • Make interval multiply and divide's overflow behavior consistent with other operations (SPARK-30919)
  • Add ANSI type aliases for char and decimal (SPARK-29941)
  • SQL Parser defines ANSI compliant reserved keywords (SPARK-26215)
  • Forbid reserved keywords as identifiers when ANSI mode is on (SPARK-26976)
  • Support ANSI SQL LIKE ... ESCAPE syntax (SPARK-28083)
  • Support ANSI SQL Boolean-Predicate syntax (SPARK-27924)
  • Better support for correlated subquery processing (SPARK-18455)

Monitoring and debugability enhancements

  • New Structured Streaming UI (SPARK-29543)
  • SHS: Allow event logs for running streaming apps to be rolled over (SPARK-28594)
  • Add an API that allows a user to define and observe arbitrary metrics on batch and streaming queries (SPARK-29345)
  • Instrumentation for tracking per-query planning time (SPARK-26129)
  • Put the basic shuffle metrics in the SQL exchange operator (SPARK-26139)
  • SQL statement is shown in SQL Tab instead of callsite (SPARK-27045)
  • Add tooltip to SparkUI (SPARK-29449)
  • Improve the concurrent performance of History Server (SPARK-29043)
  • EXPLAIN FORMATTED command (SPARK-27395)
  • Support Dumping truncated plans and generated code to a file (SPARK-26023)
  • Enhance describe framework to describe the output of a query (SPARK-26982)
  • Add SHOW VIEWS command (SPARK-31113)
  • Improve the error messages of SQL parser (SPARK-27901)
  • Support Prometheus monitoring natively (SPARK-29429)

PySpark enhancements

  • Redesigned pandas UDFs with type hints (SPARK-28264)
  • Pandas UDF pipeline (SPARK-26412)
  • Support StructType as arguments and return types for Scalar Pandas UDF (SPARK-27240)
  • Support Dataframe Cogroup via Pandas UDFs (SPARK-27463)
  • Add mapInPandas to allow an iterator of DataFrames (SPARK-28198)
  • Certain SQL functions should take column names as well (SPARK-26979)
  • Make PySpark SQL exceptions more Pythonic (SPARK-31849)

Documentation and test coverage enhancements

Other notable changes

  • Built-in Hive execution upgrade from 1.2.1 to 2.3.6 (SPARK-23710, SPARK-28723, SPARK-31381)
  • Use Apache Hive 2.3 dependency by default (SPARK-30034)
  • GA Scala 2.12 and remove 2.11 (SPARK-26132)
  • Improve logic for timing out executors in dynamic allocation (SPARK-20286)
  • Disk-persisted RDD blocks served by shuffle service and ignored for Dynamic Allocation (SPARK-27677)
  • Acquire new executors to avoid hang because of blocklisting (SPARK-22148)
  • Allow sharing of Netty's memory pool allocators (SPARK-24920)
  • Fix deadlock between TaskMemoryManager and UnsafeExternalSorter$SpillableIterator (SPARK-27338)
  • Introduce AdmissionControl APIs for StructuredStreaming (SPARK-30669)
  • Spark History Main page performance improvement (SPARK-25973)
  • Speed up and slim down metric aggregation in SQL listener (SPARK-29562)
  • Avoid the network when shuffle blocks are fetched from the same host (SPARK-27651)
  • Improve file listing for DistributedFileSystem (SPARK-27801)

Behavior changes for Spark core, Spark SQL, and Structured Streaming

The following migration guides list behavior changes between Apache Spark 2.4 and 3.0. These changes may require updates to jobs that you have been running on lower Databricks Runtime versions:

The following behavior changes are not covered in these migration guides:

  • In Spark 3.0, the deprecated class org.apache.spark.sql.streaming.ProcessingTime has been removed. Use org.apache.spark.sql.streaming.Trigger.ProcessingTime instead. Likewise, org.apache.spark.sql.execution.streaming.continuous.ContinuousTrigger has been removed in favor of Trigger.Continuous, and org.apache.spark.sql.execution.streaming.OneTimeTrigger has been hidden in favor of Trigger.Once. (SPARK-28199)
  • In Databricks Runtime 7.0, when reading a Hive SerDe table, by default Spark disallows reading files under a subdirectory that is not a table partition. To enable it, set the configuration spark.databricks.io.hive.scanNonpartitionedDirectory.enabled to true. This does not affect Spark native table readers and file readers.
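
A minimal Scala sketch covering both of the changes above (df stands for a hypothetical streaming DataFrame, and the paths are hypothetical placeholders):

  import org.apache.spark.sql.streaming.Trigger

  // Trigger.ProcessingTime replaces the removed ProcessingTime class.
  val query = df.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/q1")
    .trigger(Trigger.ProcessingTime("30 seconds"))
    .start("/mnt/delta/out")

  // Opt back in to reading files under non-partition subdirectories of a Hive SerDe table.
  spark.conf.set("spark.databricks.io.hive.scanNonpartitionedDirectory.enabled", "true")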

Programming guides:

MLlib

Highlights

  • Multiple columns support was added to Binarizer (SPARK-23578), StringIndexer (SPARK-11215), StopWordsRemover (SPARK-29808) and PySpark QuantileDiscretizer (SPARK-22796)
  • Support tree-based feature transformation (SPARK-13677)
  • Two new evaluators MultilabelClassificationEvaluator (SPARK-16692) and RankingEvaluator (SPARK-28045) were added
  • Sample weights support was added in DecisionTreeClassifier/Regressor (SPARK-19591), RandomForestClassifier/Regressor (SPARK-9478), GBTClassifier/Regressor (SPARK-9612), RegressionEvaluator (SPARK-24102), BinaryClassificationEvaluator (SPARK-24103), BisectingKMeans (SPARK-30351), KMeans (SPARK-29967) and GaussianMixture (SPARK-30102)
  • R API for PowerIterationClustering was added (SPARK-19827)
  • Added Spark ML listener for tracking ML pipeline status (SPARK-23674)
  • Fit with validation set was added to Gradient Boosted Trees in Python (SPARK-24333)
  • RobustScaler transformer was added (SPARK-28399)
  • Factorization Machines classifier and regressor were added (SPARK-29224)
  • Gaussian Naive Bayes (SPARK-16872) and Complement Naive Bayes (SPARK-29942) were added
  • ML function parity between Scala and Python (SPARK-28958)
  • predictRaw is made public in all the Classification models. predictProbability is made public in all of the Classification models except LinearSVCModel (SPARK-30358)
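
For example, a minimal Scala sketch of the newly public prediction methods (the toy training data below is hypothetical):

  import org.apache.spark.ml.classification.LogisticRegression
  import org.apache.spark.ml.linalg.Vectors

  // Hypothetical two-point training set.
  val train = spark.createDataFrame(Seq(
    (0.0, Vectors.dense(0.0, 1.0)),
    (1.0, Vectors.dense(1.0, 0.0))
  )).toDF("label", "features")

  val model = new LogisticRegression().fit(train)

  // Public as of Spark 3.0 (SPARK-30358): per-class raw scores and probabilities
  // for a single feature vector, without going through transform().
  println(model.predictRaw(Vectors.dense(0.5, 0.5)))
  println(model.predictProbability(Vectors.dense(0.5, 0.5)))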

Behavior changes for MLlib

The following migration guide lists behavior changes between Apache Spark 2.4 and 3.0. These changes may require updates to jobs that you have been running on lower Databricks Runtime versions:

The following behavior changes are not covered in the migration guide:

  • In Spark 3.0, a multiclass logistic regression in PySpark now (correctly) returns LogisticRegressionSummary, not the subclass BinaryLogisticRegressionSummary. The additional methods exposed by BinaryLogisticRegressionSummary would not work in this case anyway. (SPARK-31681)
  • In Spark 3.0, pyspark.ml.param.shared.Has* mixins no longer provide any set*(self, value) setter methods; use the respective self.set(self.*, value) instead. (SPARK-29093)

Programming guide

SparkR

  • Arrow optimization in SparkR's interoperability (SPARK-26759)
  • Performance enhancement via vectorized R gapply(), dapply(), createDataFrame, collect()
  • "Eager execution" for R shell, IDE (SPARK-24572)
  • R API for Power Iteration Clustering (SPARK-19827)

Behavior changes for SparkR

The following migration guide lists behavior changes between Apache Spark 2.4 and 3.0. These changes may require updates to jobs that you have been running on lower Databricks Runtime versions:

Programming guide

GraphX

Programming guide: GraphX Programming Guide.

Deprecations

Known issues

  • Parsing day of year using the pattern letter 'D' returns the wrong result if the year field is missing. This can happen in SQL functions such as to_timestamp, which parse a datetime string to datetime values using a pattern string. (SPARK-31939)
  • Join/Window/Aggregate inside subqueries may lead to wrong results if the keys have values -0.0 and 0.0. (SPARK-31958)
  • A window query may fail unexpectedly with an ambiguous self-join error. (SPARK-31956)
  • Streaming queries with a dropDuplicates operator may not be able to restart from a checkpoint written by Spark 2.x. (SPARK-31990)

Maintenance updates

See Databricks Runtime 7.0 maintenance updates.

System environment

  • Operating System: Ubuntu 18.04.4 LTS
  • Java: 1.8.0_252
  • Scala: 2.12.10
  • Python: 3.7.5
  • R: R version 3.6.3 (2020-02-29)
  • Delta Lake 0.7.0

Installed Python libraries

Library Version Library Version Library Version
asn1crypto 1.3.0 backcall 0.1.0 boto3 1.12.0
botocore 1.15.0 certifi 2020.4.5 cffi 1.14.0
chardet 3.0.4 cryptography 2.8 cycler 0.10.0
Cython 0.29.15 decorator 4.4.1 docutils 0.15.2
entrypoints 0.3 idna 2.8 ipykernel 5.1.4
ipython 7.12.0 ipython-genutils 0.2.0 jedi 0.14.1
jmespath 0.9.4 joblib 0.14.1 jupyter-client 5.3.4
jupyter-core 4.6.1 kiwisolver 1.1.0 matplotlib 3.1.3
numpy 1.18.1 pandas 1.0.1 parso 0.5.2
patsy 0.5.1 pexpect 4.8.0 pickleshare 0.7.5
pip 20.0.2 prompt-toolkit 3.0.3 psycopg2 2.8.4
ptyprocess 0.6.0 pyarrow 0.15.1 pycparser 2.19
Pygments 2.5.2 PyGObject 3.26.1 pyOpenSSL 19.1.0
pyparsing 2.4.6 PySocks 1.7.1 python-apt 1.6.5+ubuntu0.3
python-dateutil 2.8.1 pytz 2019.3 pyzmq 18.1.1
requests 2.22.0 s3transfer 0.3.3 scikit-learn 0.22.1
scipy 1.4.1 seaborn 0.10.0 setuptools 45.2.0
six 1.14.0 ssh-import-id 5.7 statsmodels 0.11.0
tornado 6.0.3 traitlets 4.3.3 unattended-upgrades 0.1
urllib3 1.25.8 virtualenv 16.7.10 wcwidth 0.1.8
wheel 0.34.2

Installed R libraries

R libraries are installed from the Microsoft CRAN snapshot on 2020-04-22.

Library Version Library Version Library Version
askpass 1.1 assertthat 0.2.1 backports 1.1.6
base 3.6.3 base64enc 0.1-3 BH 1.72.0-3
bit 1.1-15.2 bit64 0.9-7 blob 1.2.1
boot 1.3-25 brew 1.0-6 broom 0.5.6
callr 3.4.3 caret 6.0-86 cellranger 1.1.0
chron 2.3-55 class 7.3-17 cli 2.0.2
clipr 0.7.0 cluster 2.1.0 codetools 0.2-16
colorspace 1.4-1 commonmark 1.7 compiler 3.6.3
config 0.3 covr 3.5.0 crayon 1.3.4
crosstalk 1.1.0.1 curl 4.3 data.table 1.12.8
datasets 3.6.3 DBI 1.1.0 dbplyr 1.4.3
desc 1.2.0 devtools 2.3.0 digest 0.6.25
dplyr 0.8.5 DT 0.13 ellipsis 0.3.0
evaluate 0.14 fansi 0.4.1 farver 2.0.3
fastmap 1.0.1 forcats 0.5.0 foreach 1.5.0
foreign 0.8-76 forge 0.2.0 fs 1.4.1
generics 0.0.2 ggplot2 3.3.0 gh 1.1.0
git2r 0.26.1 glmnet 3.0-2 globals 0.12.5
glue 1.4.0 gower 0.2.1 graphics 3.6.3
grDevices 3.6.3 grid 3.6.3 gridExtra 2.3
gsubfn 0.7 gtable 0.3.0 haven 2.2.0
highr 0.8 hms 0.5.3 htmltools 0.4.0
htmlwidgets 1.5.1 httpuv 1.5.2 httr 1.4.1
hwriter 1.3.2 hwriterPlus 1.0-3 ini 0.3.1
ipred 0.9-9 isoband 0.2.1 iterators 1.0.12
jsonlite 1.6.1 KernSmooth 2.23-17 knitr 1.28
labeling 0.3 later 1.0.0 lattice 0.20-41
lava 1.6.7 lazyeval 0.2.2 lifecycle 0.2.0
lubridate 1.7.8 magrittr 1.5 markdown 1.1
MASS 7.3-51.6 Matrix 1.2-18 memoise 1.1.0
methods 3.6.3 mgcv 1.8-31 mime 0.9
ModelMetrics 1.2.2.2 modelr 0.1.6 munsell 0.5.0
nlme 3.1-147 nnet 7.3-14 numDeriv 2016.8-1.1
openssl 1.4.1 parallel 3.6.3 pillar 1.4.3
pkgbuild 1.0.6 pkgconfig 2.0.3 pkgload 1.0.2
plogr 0.2.0 plyr 1.8.6 praise 1.0.0
prettyunits 1.1.1 pROC 1.16.2 processx 3.4.2
prodlim 2019.11.13 progress 1.2.2 promises 1.1.0
proto 1.0.0 ps 1.3.2 purrr 0.3.4
r2d3 0.2.3 R6 2.4.1 randomForest 4.6-14
rappdirs 0.3.1 rcmdcheck 1.3.3 RColorBrewer 1.1-2
Rcpp 1.0.4.6 readr 1.3.1 readxl 1.3.1
recipes 0.1.10 rematch 1.0.1 rematch2 2.1.1
remotes 2.1.1 reprex 0.3.0 reshape2 1.4.4
rex 1.2.0 rjson 0.2.20 rlang 0.4.5
rmarkdown 2.1 RODBC 1.3-16 roxygen2 7.1.0
rpart 4.1-15 rprojroot 1.3-2 Rserve 1.8-6
RSQLite 2.2.0 rstudioapi 0.11 rversions 2.0.1
rvest 0.3.5 scales 1.1.0 selectr 0.4-2
sessioninfo 1.1.1 shape 1.4.4 shiny 1.4.0.2
sourcetools 0.1.7 sparklyr 1.2.0 SparkR 3.0.0
spatial 7.3-11 splines 3.6.3 sqldf 0.4-11
SQUAREM 2020.2 stats 3.6.3 stats4 3.6.3
stringi 1.4.6 stringr 1.4.0 survival 3.1-12
sys 3.3 tcltk 3.6.3 TeachingDemos 2.10
testthat 2.3.2 tibble 3.0.1 tidyr 1.0.2
tidyselect 1.0.0 tidyverse 1.3.0 timeDate 3043.102
tinytex 0.22 tools 3.6.3 usethis 1.6.0
utf8 1.1.4 utils 3.6.3 vctrs 0.2.4
viridisLite 0.3.0 whisker 0.4 withr 2.2.0
xfun 0.13 xml2 1.3.1 xopen 1.0.0
xtable 1.8-4 yaml 2.2.1

Installed Java and Scala libraries (Scala 2.12 cluster version)

Group ID Artifact ID Version
antlr antlr 2.7.7
com.amazonaws amazon-kinesis-client 1.12.0
com.amazonaws aws-java-sdk-autoscaling 1.11.655
com.amazonaws aws-java-sdk-cloudformation 1.11.655
com.amazonaws aws-java-sdk-cloudfront 1.11.655
com.amazonaws aws-java-sdk-cloudhsm 1.11.655
com.amazonaws aws-java-sdk-cloudsearch 1.11.655
com.amazonaws aws-java-sdk-cloudtrail 1.11.655
com.amazonaws aws-java-sdk-cloudwatch 1.11.655
com.amazonaws aws-java-sdk-cloudwatchmetrics 1.11.655
com.amazonaws aws-java-sdk-codedeploy 1.11.655
com.amazonaws aws-java-sdk-cognitoidentity 1.11.655
com.amazonaws aws-java-sdk-cognitosync 1.11.655
com.amazonaws aws-java-sdk-config 1.11.655
com.amazonaws aws-java-sdk-core 1.11.655
com.amazonaws aws-java-sdk-datapipeline 1.11.655
com.amazonaws aws-java-sdk-directconnect 1.11.655
com.amazonaws aws-java-sdk-directory 1.11.655
com.amazonaws aws-java-sdk-dynamodb 1.11.655
com.amazonaws aws-java-sdk-ec2 1.11.655
com.amazonaws aws-java-sdk-ecs 1.11.655
com.amazonaws aws-java-sdk-efs 1.11.655
com.amazonaws aws-java-sdk-elasticache 1.11.655
com.amazonaws aws-java-sdk-elasticbeanstalk 1.11.655
com.amazonaws aws-java-sdk-elasticloadbalancing 1.11.655
com.amazonaws aws-java-sdk-elastictranscoder 1.11.655
com.amazonaws aws-java-sdk-emr 1.11.655
com.amazonaws aws-java-sdk-glacier 1.11.655
com.amazonaws aws-java-sdk-iam 1.11.655
com.amazonaws aws-java-sdk-importexport 1.11.655
com.amazonaws aws-java-sdk-kinesis 1.11.655
com.amazonaws aws-java-sdk-kms 1.11.655
com.amazonaws aws-java-sdk-lambda 1.11.655
com.amazonaws aws-java-sdk-logs 1.11.655
com.amazonaws aws-java-sdk-machinelearning 1.11.655
com.amazonaws aws-java-sdk-opsworks 1.11.655
com.amazonaws aws-java-sdk-rds 1.11.655
com.amazonaws aws-java-sdk-redshift 1.11.655
com.amazonaws aws-java-sdk-route53 1.11.655
com.amazonaws aws-java-sdk-s3 1.11.655
com.amazonaws aws-java-sdk-ses 1.11.655
com.amazonaws aws-java-sdk-simpledb 1.11.655
com.amazonaws aws-java-sdk-simpleworkflow 1.11.655
com.amazonaws aws-java-sdk-sns 1.11.655
com.amazonaws aws-java-sdk-sqs 1.11.655
com.amazonaws aws-java-sdk-ssm 1.11.655
com.amazonaws aws-java-sdk-storagegateway 1.11.655
com.amazonaws aws-java-sdk-sts 1.11.655
com.amazonaws aws-java-sdk-support 1.11.655
com.amazonaws aws-java-sdk-swf-libraries 1.11.22
com.amazonaws aws-java-sdk-workspaces 1.11.655
com.amazonaws jmespath-java 1.11.655
com.chuusai shapeless_2.12 2.3.3
com.clearspring.analytics stream 2.9.6
com.databricks Rserve 1.8-3
com.databricks jets3t 0.7.1-0
com.databricks.scalapb compilerplugin_2.12 0.4.15-10
com.databricks.scalapb scalapb-runtime_2.12 0.4.15-10
com.esotericsoftware kryo-shaded 4.0.2
com.esotericsoftware minlog 1.3.0
com.fasterxml classmate 1.3.4
com.fasterxml.jackson.core jackson-annotations 2.10.0
com.fasterxml.jackson.core jackson-core 2.10.0
com.fasterxml.jackson.core jackson-databind 2.10.0
com.fasterxml.jackson.dataformat jackson-dataformat-cbor 2.10.0
com.fasterxml.jackson.datatype jackson-datatype-joda 2.10.0
com.fasterxml.jackson.module jackson-module-paranamer 2.10.0
com.fasterxml.jackson.module jackson-module-scala_2.12 2.10.0
com.github.ben-manes.caffeine caffeine 2.3.4
com.github.fommil jniloader 1.1
com.github.fommil.netlib core 1.1.2
com.github.fommil.netlib native_ref-java 1.1
com.github.fommil.netlib native_ref-java-natives 1.1
com.github.fommil.netlib native_system-java 1.1
com.github.fommil.netlib native_system-java-natives 1.1
com.github.fommil.netlib netlib-native_ref-linux-x86_64-natives 1.1
com.github.fommil.netlib netlib-native_system-linux-x86_64-natives 1.1
com.github.joshelser dropwizard-metrics-hadoop-metrics2-reporter 0.1.2
com.github.luben zstd-jni 1.4.4-3
com.github.wendykierp JTransforms 3.1
com.google.code.findbugs jsr305 3.0.0
com.google.code.gson gson 2.2.4
com.google.flatbuffers flatbuffers-java 1.9.0
com.google.guava guava 15.0
com.google.protobuf protobuf-java 2.6.1
com.h2database h2 1.4.195
com.helger profiler 1.1.1
com.jcraft jsch 0.1.50
com.jolbox bonecp 0.8.0.RELEASE
com.microsoft.azure azure-data-lake-store-sdk 2.2.8
com.microsoft.sqlserver mssql-jdbc 8.2.1.jre8
com.ning compress-lzf 1.0.3
com.sun.mail javax.mail 1.5.2
com.tdunning json 1.8
com.thoughtworks.paranamer paranamer 2.8
com.trueaccord.lenses lenses_2.12 0.4.12
com.twitter chill-java 0.9.5
com.twitter chill_2.12 0.9.5
com.twitter util-app_2.12 7.1.0
com.twitter util-core_2.12 7.1.0
com.twitter util-function_2.12 7.1.0
com.twitter util-jvm_2.12 7.1.0
com.twitter util-lint_2.12 7.1.0
com.twitter util-registry_2.12 7.1.0
com.twitter util-stats_2.12 7.1.0
com.typesafe config 1.2.1
com.typesafe.scala-logging scala-logging_2.12 3.7.2
com.univocity univocity-parsers 2.8.3
com.zaxxer HikariCP 3.1.0
commons-beanutils commons-beanutils 1.9.4
commons-cli commons-cli 1.2
commons-codec commons-codec 1.10
commons-collections commons-collections 3.2.2
commons-configuration commons-configuration 1.6
commons-dbcp commons-dbcp 1.4
commons-digester commons-digester 1.8
commons-fileupload commons-fileupload 1.3.3
commons-httpclient commons-httpclient 3.1
commons-io commons-io 2.4
commons-lang commons-lang 2.6
commons-logging commons-logging 1.1.3
commons-net commons-net 3.1
commons-pool commons-pool 1.5.4
info.ganglia.gmetric4j gmetric4j 1.0.10
io.airlift aircompressor 0.10
io.dropwizard.metrics metrics-core 4.1.1
io.dropwizard.metrics metrics-graphite 4.1.1
io.dropwizard.metrics metrics-healthchecks 4.1.1
io.dropwizard.metrics metrics-jetty9 4.1.1
io.dropwizard.metrics metrics-jmx 4.1.1
io.dropwizard.metrics metrics-json 4.1.1
io.dropwizard.metrics metrics-jvm 4.1.1
io.dropwizard.metrics metrics-servlets 4.1.1
io.netty netty-all 4.1.47.Final
jakarta.annotation jakarta.annotation-api 1.3.5
jakarta.validation jakarta.validation-api 2.0.2
jakarta.ws.rs jakarta.ws.rs-api 2.1.6
javax.activation activation 1.1.1
javax.el javax.el-api 2.2.4
javax.jdo jdo-api 3.0.1
javax.servlet javax.servlet-api 3.1.0
javax.servlet.jsp jsp-api 2.1
javax.transaction jta 1.1
javax.transaction transaction-api 1.1
javax.xml.bind jaxb-api 2.2.2
javax.xml.stream stax-api 1.0-2
javolution javolution 5.5.1
jline jline 2.14.6
joda-time joda-time 2.10.5
log4j apache-log4j-extras 1.2.17
log4j log4j 1.2.17
net.razorvine pyrolite 4.30
net.sf.jpam jpam 1.1
net.sf.opencsv opencsv 2.3
net.sf.supercsv super-csv 2.2.0
net.snowflake snowflake-ingest-sdk 0.9.6
net.snowflake snowflake-jdbc 3.12.0
net.snowflake spark-snowflake_2.12 2.5.9-spark_2.4
net.sourceforge.f2j arpack_combined_all 0.1
org.acplt.remotetea remotetea-oncrpc 1.1.2
org.antlr ST4 4.0.4
org.antlr antlr-runtime 3.5.2
org.antlr antlr4-runtime 4.7.1
org.antlr stringtemplate 3.2.1
org.apache.ant ant 1.9.2
org.apache.ant ant-jsch 1.9.2
org.apache.ant ant-launcher 1.9.2
org.apache.arrow arrow-format 0.15.1
org.apache.arrow arrow-memory 0.15.1
org.apache.arrow arrow-vector 0.15.1
org.apache.avro avro 1.8.2
org.apache.avro avro-ipc 1.8.2
org.apache.avro avro-mapred-hadoop2 1.8.2
org.apache.commons commons-compress 1.8.1
org.apache.commons commons-crypto 1.0.0
org.apache.commons commons-lang3 3.9
org.apache.commons commons-math3 3.4.1
org.apache.commons commons-text 1.6
org.apache.curator curator-client 2.7.1
org.apache.curator curator-framework 2.7.1
org.apache.curator curator-recipes 2.7.1
org.apache.derby derby 10.12.1.1
org.apache.directory.api api-asn1-api 1.0.0-M20
org.apache.directory.api api-util 1.0.0-M20
org.apache.directory.server apacheds-i18n 2.0.0-M15
org.apache.directory.server apacheds-kerberos-codec 2.0.0-M15
org.apache.hadoop hadoop-annotations 2.7.4
org.apache.hadoop hadoop-auth 2.7.4
org.apache.hadoop hadoop-client 2.7.4
org.apache.hadoop hadoop-common 2.7.4
org.apache.hadoop hadoop-hdfs 2.7.4
org.apache.hadoop hadoop-mapreduce-client-app 2.7.4
org.apache.hadoop hadoop-mapreduce-client-common 2.7.4
org.apache.hadoop hadoop-mapreduce-client-core 2.7.4
org.apache.hadoop hadoop-mapreduce-client-jobclient 2.7.4
org.apache.hadoop hadoop-mapreduce-client-shuffle 2.7.4
org.apache.hadoop hadoop-yarn-api 2.7.4
org.apache.hadoop hadoop-yarn-client 2.7.4
org.apache.hadoop hadoop-yarn-common 2.7.4
org.apache.hadoop hadoop-yarn-server-common 2.7.4
org.apache.hive hive-beeline 2.3.7
org.apache.hive hive-cli 2.3.7
org.apache.hive hive-common 2.3.7
org.apache.hive hive-exec-core 2.3.7
org.apache.hive hive-jdbc 2.3.7
org.apache.hive hive-llap-client 2.3.7
org.apache.hive hive-llap-common 2.3.7
org.apache.hive hive-metastore 2.3.7
org.apache.hive hive-serde 2.3.7
org.apache.hive hive-shims 2.3.7
org.apache.hive hive-storage-api 2.7.1
org.apache.hive hive-vector-code-gen 2.3.7
org.apache.hive.shims hive-shims-0.23 2.3.7
org.apache.hive.shims hive-shims-common 2.3.7
org.apache.hive.shims hive-shims-scheduler 2.3.7
org.apache.htrace htrace-core 3.1.0-incubating
org.apache.httpcomponents httpclient 4.5.6
org.apache.httpcomponents httpcore 4.4.12
org.apache.ivy ivy 2.4.0
org.apache.orc orc-core 1.5.10
org.apache.orc orc-mapreduce 1.5.10
org.apache.orc orc-shims 1.5.10
org.apache.parquet parquet-column 1.10.1.2-databricks4
org.apache.parquet parquet-common 1.10.1.2-databricks4
org.apache.parquet parquet-encoding 1.10.1.2-databricks4
org.apache.parquet parquet-format 2.4.0
org.apache.parquet parquet-hadoop 1.10.1.2-databricks4
org.apache.parquet parquet-jackson 1.10.1.2-databricks4
org.apache.thrift libfb303 0.9.3
org.apache.thrift libthrift 0.12.0
org.apache.velocity velocity 1.5
org.apache.xbean xbean-asm7-shaded 4.15
org.apache.yetus audience-annotations 0.5.0
org.apache.zookeeper zookeeper 3.4.14
org.codehaus.jackson jackson-core-asl 1.9.13
org.codehaus.jackson jackson-jaxrs 1.9.13
org.codehaus.jackson jackson-mapper-asl 1.9.13
org.codehaus.jackson jackson-xc 1.9.13
org.codehaus.janino commons-compiler 3.0.16
org.codehaus.janino janino 3.0.16
org.datanucleus datanucleus-api-jdo 4.2.4
org.datanucleus datanucleus-core 4.1.17
org.datanucleus datanucleus-rdbms 4.1.19
org.datanucleus javax.jdo 3.2.0-m3
org.eclipse.jetty jetty-client 9.4.18.v20190429
org.eclipse.jetty jetty-continuation 9.4.18.v20190429
org.eclipse.jetty jetty-http 9.4.18.v20190429
org.eclipse.jetty jetty-io 9.4.18.v20190429
org.eclipse.jetty jetty-jndi 9.4.18.v20190429
org.eclipse.jetty jetty-plus 9.4.18.v20190429
org.eclipse.jetty jetty-proxy 9.4.18.v20190429
org.eclipse.jetty jetty-security 9.4.18.v20190429
org.eclipse.jetty jetty-server 9.4.18.v20190429
org.eclipse.jetty jetty-servlet 9.4.18.v20190429
org.eclipse.jetty jetty-servlets 9.4.18.v20190429
org.eclipse.jetty jetty-util 9.4.18.v20190429
org.eclipse.jetty jetty-webapp 9.4.18.v20190429
org.eclipse.jetty jetty-xml 9.4.18.v20190429
org.fusesource.leveldbjni leveldbjni-all 1.8
org.glassfish.hk2 hk2-api 2.6.1
org.glassfish.hk2 hk2-locator 2.6.1
org.glassfish.hk2 hk2-utils 2.6.1
org.glassfish.hk2 osgi-resource-locator 1.0.3
org.glassfish.hk2.external aopalliance-repackaged 2.6.1
org.glassfish.hk2.external jakarta.inject 2.6.1
org.glassfish.jersey.containers jersey-container-servlet 2.30
org.glassfish.jersey.containers jersey-container-servlet-core 2.30
org.glassfish.jersey.core jersey-client 2.30
org.glassfish.jersey.core jersey-common 2.30
org.glassfish.jersey.core jersey-server 2.30
org.glassfish.jersey.inject jersey-hk2 2.30
org.glassfish.jersey.media jersey-media-jaxb 2.30
org.hibernate.validator hibernate-validator 6.1.0.Final
org.javassist javassist 3.25.0-GA
org.jboss.logging jboss-logging 3.3.2.Final
org.jdbi jdbi 2.63.1
org.joda joda-convert 1.7
org.jodd jodd-core 3.5.2
org.json4s json4s-ast_2.12 3.6.6
org.json4s json4s-core_2.12 3.6.6
org.json4s json4s-jackson_2.12 3.6.6
org.json4s json4s-scalap_2.12 3.6.6
org.lz4 lz4-java 1.7.1
org.mariadb.jdbc mariadb-java-client 2.1.2
org.objenesis objenesis 2.5.1
org.postgresql postgresql 42.1.4
org.roaringbitmap RoaringBitmap 0.7.45
org.roaringbitmap shims 0.7.45
org.rocksdb rocksdbjni 6.2.2
org.rosuda.REngine REngine 2.1.0
org.scala-lang scala-compiler_2.12 2.12.10
org.scala-lang scala-library_2.12 2.12.10
org.scala-lang scala-reflect_2.12 2.12.10
org.scala-lang.modules scala-collection-compat_2.12 2.1.1
org.scala-lang.modules scala-parser-combinators_2.12 1.1.2
org.scala-lang.modules scala-xml_2.12 1.2.0
org.scala-sbt test-interface 1.0
org.scalacheck scalacheck_2.12 1.14.2
org.scalactic scalactic_2.12 3.0.8
org.scalanlp breeze-macros_2.12 1.0
org.scalanlp breeze_2.12 1.0
org.scalatest scalatest_2.12 3.0.8
org.slf4j jcl-over-slf4j 1.7.30
org.slf4j jul-to-slf4j 1.7.30
org.slf4j slf4j-api 1.7.30
org.slf4j slf4j-log4j12 1.7.30
org.spark-project.spark unused 1.0.0
org.springframework spring-core 4.1.4.RELEASE
org.springframework spring-test 4.1.4.RELEASE
org.threeten threeten-extra 1.5.0
org.tukaani xz 1.5
org.typelevel algebra_2.12 2.0.0-M2
org.typelevel cats-kernel_2.12 2.0.0-M4
org.typelevel machinist_2.12 0.6.8
org.typelevel macro-compat_2.12 1.1.1
org.typelevel spire-macros_2.12 0.17.0-M1
org.typelevel spire-platform_2.12 0.17.0-M1
org.typelevel spire-util_2.12 0.17.0-M1
org.typelevel spire_2.12 0.17.0-M1
org.xerial sqlite-jdbc 3.8.11.2
org.xerial.snappy snappy-java 1.1.7.5
org.yaml snakeyaml 1.24
oro oro 2.0.8
pl.edu.icm JLargeArrays 1.5
software.amazon.ion ion-java 1.0.2
stax stax-api 1.0.1
xmlenc xmlenc 0.52