Apache Spark guidelines

This article provides guidelines for using Apache Spark on Azure HDInsight.

How do I run or submit Spark jobs?

| Option | Documents |
| --- | --- |
| Visual Studio Code | Use Spark & Hive Tools for Visual Studio Code |
| Jupyter Notebooks | Tutorial: Load data and run queries on an Apache Spark cluster in Azure HDInsight |
| IntelliJ | Tutorial: Use Azure Toolkit for IntelliJ to create Apache Spark applications for an HDInsight cluster |
| IntelliJ | Tutorial: Create a Scala Maven application for Apache Spark in HDInsight using IntelliJ |
| Zeppelin notebooks | Use Apache Zeppelin notebooks with Apache Spark cluster on Azure HDInsight |
| Remote job submission with Livy | Use Apache Spark REST API to submit remote jobs to an HDInsight Spark cluster |
| Apache Oozie | Oozie is a workflow and coordination system that manages Hadoop jobs. |
| Apache Livy | You can use Livy to run interactive Spark shells or submit batch jobs to be run on Spark. |
| Azure Data Factory for Apache Spark | The Spark activity in a Data Factory pipeline executes a Spark program on your own or on-demand HDInsight cluster. |
| Azure Data Factory for Apache Hive | The HDInsight Hive activity in a Data Factory pipeline executes Hive queries on your own or on-demand HDInsight cluster. |
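For remote submission, Livy exposes a REST endpoint on the cluster at `/livy/batches`. The sketch below builds the JSON body for a batch job submission; the cluster name, storage paths, class name, and credentials are all placeholders you would replace with your own values, and the actual HTTP call (shown in comments) requires the `requests` package and your cluster admin login.

```python
import json

# Hypothetical cluster endpoint -- replace <clustername> with your own.
LIVY_URL = "https://<clustername>.azurehdinsight.net/livy/batches"

def build_batch_payload(jar_path, class_name, args=None):
    """Build the JSON body Livy expects when submitting a Spark batch job."""
    payload = {"file": jar_path, "className": class_name}
    if args:
        payload["args"] = args
    return payload

# Example payload for a jar stored in the cluster's default Azure Storage
# account (placeholder container, account, and class name).
payload = build_batch_payload(
    "wasbs://mycontainer@mystorage.blob.core.windows.net/jars/app.jar",
    "com.example.SparkApp",
    args=["--input", "wasbs://mycontainer@mystorage.blob.core.windows.net/data"],
)
print(json.dumps(payload))

# To actually submit (needs the `requests` package and cluster credentials):
# import requests
# resp = requests.post(
#     LIVY_URL,
#     json=payload,
#     auth=("admin", "<password>"),
#     headers={"Content-Type": "application/json"},
# )
# The response body includes a batch id you can poll at /livy/batches/{id}.
```

After submission, polling `GET /livy/batches/{id}` returns the job state (for example `starting`, `running`, or `success`).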

How do I monitor and debug Spark jobs?

How do I make my Spark jobs run more efficiently?

How do I connect to other Azure Services?

What are my storage options?

Next steps