Manage resources for Apache Spark cluster on Azure HDInsight

Learn how to access the interfaces associated with your Apache Spark cluster, such as the Apache Ambari UI, the Apache Hadoop YARN UI, and the Spark History Server, and how to tune the cluster configuration for optimal performance.

Open the Spark History Server

Spark History Server is the web UI for completed and running Spark applications. It's an extension of Spark's Web UI. For complete information, see Spark History Server.

Open the YARN UI

You can use the YARN UI to monitor applications that are currently running on the Spark cluster.

  1. From the Azure portal, open the Spark cluster. For more information, see List and show clusters.

  2. From Cluster dashboards, select Yarn. When prompted, enter the admin credentials for the Spark cluster.

    Launch the YARN UI

    Tip

    Alternatively, you can launch the YARN UI from the Ambari UI. In the Ambari UI, navigate to YARN > Quick Links > Active > Resource Manager UI.

Optimize clusters for Spark applications

Depending on the application requirements, the three key parameters for Spark configuration are spark.executor.instances, spark.executor.cores, and spark.executor.memory. An executor is a process launched for a Spark application. It runs on a worker node and is responsible for carrying out the application's tasks. The default number of executors and the executor sizes for each cluster are calculated based on the number of worker nodes and the worker node size. This information is stored in spark-defaults.conf on the cluster head nodes.

The three configuration parameters can be set at the cluster level (for all applications that run on the cluster) or specified for each individual application.
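
For reference, these cluster-level defaults appear in spark-defaults.conf as plain key-value pairs. The snippet below is only an illustrative sketch; the actual values on your cluster depend on the number and size of your worker nodes.

spark.executor.instances 4
spark.executor.cores 4
spark.executor.memory 4g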

Change the parameters using Ambari UI

  1. From the Ambari UI, navigate to Spark2 > Configs > Custom spark2-defaults.

    Set parameters using Ambari custom

  2. The default values allow four Spark applications to run concurrently on the cluster. You can change these values from the user interface, as shown in the following screenshot:

    Set parameters using Ambari

  3. Select Save to save the configuration changes. At the top of the page, you're prompted to restart all the affected services. Select Restart.

    Restart the affected services

Change the parameters for an application running in Jupyter notebook

For applications running in a Jupyter notebook, you can use the %%configure magic to make configuration changes. Ideally, make such changes at the beginning of the application, before you run the first code cell, so the configuration is applied to the Livy session when it's created. If you want to change the configuration at a later stage in the application, you must use the -f parameter. However, doing so discards all progress in the application.

The following snippet shows how to change the configuration for an application running in Jupyter.

%%configure
{"executorMemory": "3072M", "executorCores": 4, "numExecutors":10}

Configuration parameters must be passed in as a JSON string and must be on the next line after the magic, as shown in the example above.
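
To change the configuration after the Livy session has already been created, add the -f parameter as mentioned earlier. The snippet below is a minimal sketch with illustrative values; forcing a new session this way discards any progress in the notebook.

%%configure -f
{"executorMemory": "4096M", "executorCores": 2, "numExecutors": 5}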

Change the parameters for an application submitted using spark-submit

The following command is an example of how to change the configuration parameters for a batch application that is submitted using spark-submit.

spark-submit --class <the application class to execute> --executor-memory 3072M --executor-cores 4 --num-executors 10 <location of application jar file> <application parameters>

Change the parameters for an application submitted using cURL

The following command is an example of how to change the configuration parameters for a batch application that is submitted using cURL.

curl -k -v -H 'Content-Type: application/json' -X POST -d '{"file":"<location of application jar file>", "className":"<the application class to execute>", "args":[<application parameters>], "numExecutors":10, "executorMemory":"2G", "executorCores":5}' localhost:8998/batches
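
Livy returns an ID for the submitted batch, which you can poll to follow the application's state. The following call is a sketch that assumes the same Livy endpoint as the example above; the batch ID 0 is only a placeholder.

curl -k -v -X GET localhost:8998/batches/0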

Change these parameters on a Spark Thrift Server

Spark Thrift Server provides JDBC/ODBC access to a Spark cluster and is used to service Spark SQL queries. Tools such as Power BI and Tableau use the ODBC protocol to communicate with the Spark Thrift Server and execute Spark SQL queries as a Spark application. When a Spark cluster is created, two instances of the Spark Thrift Server are started, one on each head node. Each Spark Thrift Server is visible as a Spark application in the YARN UI.

Spark Thrift Server uses Spark dynamic executor allocation, so spark.executor.instances isn't used. Instead, Spark Thrift Server uses spark.dynamicAllocation.maxExecutors and spark.dynamicAllocation.minExecutors to specify the executor count. The configuration parameters spark.executor.cores and spark.executor.memory are used to modify the executor size. You can change these parameters as shown in the following steps; a sketch of the resulting property values follows the list:

  • Expand the Advanced spark2-thrift-sparkconf category to update the parameters spark.dynamicAllocation.maxExecutors and spark.dynamicAllocation.minExecutors.

    Configure Spark Thrift Server

  • Expand the Custom spark2-thrift-sparkconf category to update the parameters spark.executor.cores and spark.executor.memory.

    Configure Spark Thrift Server parameters
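
After you save the changes, the corresponding entries in the spark2-thrift-sparkconf configuration look roughly like the following sketch. The values shown are only illustrative; choose them based on your workload and node sizes.

spark.dynamicAllocation.minExecutors 2
spark.dynamicAllocation.maxExecutors 8
spark.executor.cores 4
spark.executor.memory 4g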

Change the driver memory of the Spark Thrift Server

Spark Thrift Server driver memory is configured to 25% of the head node RAM size, provided the total RAM size of the head node is greater than 14 GB. You can use the Ambari UI to change the driver memory configuration, as shown in the following screenshot:

From the Ambari UI, navigate to Spark2 > Configs > Advanced spark2-env. Then provide the value for spark_thrift_cmd_opts.
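
The exact syntax depends on the options already present in that field. As an illustrative sketch only, assuming spark_thrift_cmd_opts accepts spark-submit style options, a driver memory setting could look like the line below; verify against the existing value on your cluster before changing it.

--driver-memory 8g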

Reclaim Spark cluster resources

Because of Spark dynamic allocation, the only resources consumed by the Thrift Server are the resources for its two application masters. To reclaim these resources, you must stop the Thrift Server services running on the cluster.

  1. From the Ambari UI, in the left pane, select Spark2.

  2. On the next page, select Spark2 Thrift Servers.

    Restart thrift server1

  3. You should see the two head nodes on which the Spark2 Thrift Server is running. Select one of the head nodes.

    Restart thrift server2

  4. The next page lists all the services running on that head node. From the list, select the drop-down button next to Spark2 Thrift Server, and then select Stop.

    Restart thrift server3

  5. Repeat these steps on the other head node as well. (A REST-based alternative to these steps is sketched below.)
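
If you prefer to script this instead of clicking through the UI, the Ambari REST API can put the component into the stopped (INSTALLED) state. The call below is only a sketch: the cluster name, head node FQDN, and admin password are placeholders, and the component name SPARK2_THRIFTSERVER is an assumption based on typical HDInsight Spark 2.x clusters.

curl -u admin:<password> -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo":{"context":"Stop Spark2 Thrift Server"},"Body":{"HostRoles":{"state":"INSTALLED"}}}' https://<clustername>.azurehdinsight.net/api/v1/clusters/<clustername>/hosts/<headnode-fqdn>/host_components/SPARK2_THRIFTSERVER

Repeat the call with the other head node's FQDN; setting the state back to STARTED starts the instance again.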

Restart the Jupyter service

Launch the Ambari Web UI as shown at the beginning of this article. From the left navigation pane, select Jupyter, select Service Actions, and then select Restart All. This starts the Jupyter service on all the head nodes.

Restart Jupyter
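
To script the restart instead, the Ambari REST API can stop and start the service by changing its state. The calls below are only a sketch: they assume the service is registered in Ambari under the name JUPYTER, and the cluster name and admin password are placeholders. Wait for the stop request to finish before issuing the start.

curl -u admin:<password> -H 'X-Requested-By: ambari' -X PUT -d '{"ServiceInfo":{"state":"INSTALLED"}}' https://<clustername>.azurehdinsight.net/api/v1/clusters/<clustername>/services/JUPYTER
curl -u admin:<password> -H 'X-Requested-By: ambari' -X PUT -d '{"ServiceInfo":{"state":"STARTED"}}' https://<clustername>.azurehdinsight.net/api/v1/clusters/<clustername>/services/JUPYTER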

Monitor resources

Launch the YARN UI as shown at the beginning of this article. In the Cluster Metrics table at the top of the screen, check the values of the Memory Used and Memory Total columns. If the two values are close, there might not be enough resources to start the next application. The same applies to the VCores Used and VCores Total columns. Also, in the main view, if an application stays in the ACCEPTED state without transitioning to the RUNNING or FAILED state, this could be an indication that it isn't getting enough resources to start.

Resource Limit
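
To check the same numbers from a script, the YARN ResourceManager REST API exposes cluster metrics. The call below is a sketch that assumes the ResourceManager API is reachable through the cluster gateway under the /yarnui path; the cluster name and admin password are placeholders.

curl -u admin:<password> https://<clustername>.azurehdinsight.net/yarnui/ws/v1/cluster/metrics

The response includes fields such as allocatedMB, totalMB, allocatedVirtualCores, and totalVirtualCores, which correspond to the memory and VCore columns in the Cluster Metrics table.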

Kill running applications

  1. In the YARN UI, from the left panel, select Running. From the list of running applications, determine the application to be killed and select the ID.

    Kill App1

  2. Select Kill Application in the top-right corner, then select OK. (A command-line alternative is sketched after these steps.)

    Kill App2
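
As a command-line alternative, you can kill an application after connecting to a head node over SSH. The commands below are a sketch; the application ID shown is a hypothetical placeholder that you replace with the ID from the list output.

yarn application -list -appStates RUNNING
yarn application -kill application_1234567890123_0001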

See also

For data analysts

For Apache Spark developers