Scenario: Jupyter server 404 "Not Found" error due to "Blocking Cross Origin API" in Azure HDInsight

This article describes troubleshooting steps and possible resolutions for issues that occur when you use Apache Spark components in Azure HDInsight clusters.

Issue

When you access the Jupyter service on HDInsight, you see an error box saying "Not Found". The Jupyter logs contain entries similar to the following:

[W 2018-08-21 17:43:33.352 NotebookApp] 404 PUT /api/contents/PySpark/notebook.ipynb (10.16.0.144) 4504.03ms referer=https://pnhr01hdi-corpdir.msappproxy.net/jupyter/notebooks/PySpark/notebook.ipynb
Blocking Cross Origin API request.  
Origin: https://xxx.xxx.xxx, Host: hn0-pnhr01.j101qxjrl4zebmhb0vmhg044xe.ax.internal.chinacloudapp.cn:8001

The "Origin" field in the Jupyter log may also contain an IP address instead of a host name.
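
To see which origins Jupyter is rejecting, the log text can be scanned for these entries. The following is a minimal sketch, assuming the log keeps the layout of the excerpt above (the "Blocking Cross Origin API request" message followed by an "Origin: ..., Host: ..." line). The function name `blocked_origins` is hypothetical, and the log text is passed in as a string because this article does not state where the log file is stored.

```python
import re

# Marker text and field layout are taken from the log excerpt above.
BLOCKED = "Blocking Cross Origin API request"
ORIGIN_HOST = re.compile(r"Origin:\s*(?P<origin>[^,\s]+),\s*Host:\s*(?P<host>\S+)")


def blocked_origins(log_text):
    """Return (origin, host) pairs found in blocked cross-origin log entries."""
    pairs = []
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if BLOCKED in line:
            # Assumption: the Origin/Host details sit on the same line
            # or on the line immediately after the marker.
            window = " ".join(lines[i:i + 2])
            match = ORIGIN_HOST.search(window)
            if match:
                pairs.append((match.group("origin"), match.group("host")))
    return pairs
```

An origin that does not match the standard cluster name (for example an IP address or a custom DNS name) points to one of the causes described in the next section.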

Cause

This error can have either of the following causes:

  • You have configured Network Security Group (NSG) rules to restrict access to the cluster. Restricting access with NSG rules still lets you reach Apache Ambari and other services directly by IP address rather than by cluster name. However, when you access Jupyter, you may see a 404 "Not Found" error.

  • You have given your HDInsight gateway a customized DNS name other than the standard xxx.azurehdinsight.cn.

Resolution

  1. Modify the jupyter.py file in both of these locations:

    /var/lib/ambari-server/resources/common-services/JUPYTER/1.0.0/package/scripts/jupyter.py
    /var/lib/ambari-agent/cache/common-services/JUPYTER/1.0.0/package/scripts/jupyter.py
    
  2. In each file, find the line that says NotebookApp.allow_origin='\"https://{2}.{3}\"' and change it to NotebookApp.allow_origin='\"*\"'.

  3. Restart the Jupyter service from Ambari.

  4. Run ps aux | grep jupyter at the command prompt. The output should show that the service now accepts connections from any URL.
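
The edit in step 2 can also be scripted. The following is a minimal sketch, assuming the line appears in each file exactly as quoted in step 2 and that the script runs with enough privileges to write to the files. The helper names `relax_allow_origin` and `patch_file`, and the `.bak` backup copy, are illustrative additions and not part of the documented procedure.

```python
import shutil
from pathlib import Path

# Exact strings quoted in step 2; raw strings keep the backslashes literal.
OLD = r"""NotebookApp.allow_origin='\"https://{2}.{3}\"'"""
NEW = r"""NotebookApp.allow_origin='\"*\"'"""


def relax_allow_origin(text: str) -> str:
    """Return text with the allow_origin setting changed to accept any origin."""
    if OLD not in text:
        # Fail loudly rather than write an unchanged file.
        raise ValueError("expected allow_origin setting not found")
    return text.replace(OLD, NEW)


def patch_file(path: Path) -> None:
    """Back up a jupyter.py file, then rewrite it with the relaxed setting."""
    patched = relax_allow_origin(path.read_text())
    # Assumption: a sibling .bak copy is an acceptable way to keep the original.
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(patched)
```

Calling `patch_file` once for each of the two paths listed in step 1 applies the change. Steps 3 and 4 are still needed afterward to restart the service and confirm the new setting.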

This setting is less secure than the one it replaces. It relies on the assumption that access to the cluster is already restricted by the NSG rules, which control who can connect to the cluster from outside.

Next steps

If you didn't see your problem or are unable to solve your issue, visit the following channel for more support:

  • If you need more help, you can submit a support request from the Azure portal. Select Support from the menu bar or open the Help + support hub.