Azure Kubernetes Network Policies overview

Network Policies provide micro-segmentation for pods, just as network security groups (NSGs) provide micro-segmentation for VMs. The Azure Network Policy Manager (also known as Azure NPM) implementation supports the standard Kubernetes Network Policy specification. You can use labels to select a group of pods and define a list of ingress and egress rules to filter traffic to and from these pods. Learn more about Kubernetes network policies in the Kubernetes documentation.
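To illustrate label-based selection, here is a minimal sketch of a standard Kubernetes NetworkPolicy manifest. The policy name, namespace, labels (app: backend, app: frontend, app: database), and ports are hypothetical placeholders and do not come from this article.

```yaml
# Hypothetical example: name, namespace, labels, and ports are placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-policy
  namespace: demo
spec:
  # Select the pods this policy applies to.
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  - Egress
  ingress:
  # Allow inbound TCP 8080 only from pods labeled app=frontend
  # in the same namespace.
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
  egress:
  # Allow outbound TCP 5432 only to pods labeled app=database
  # in the same namespace.
  - to:
    - podSelector:
        matchLabels:
          app: database
    ports:
    - protocol: TCP
      port: 5432
```

Because the policy lists the Egress type, any outbound traffic from the selected pods that is not matched by an egress rule (including DNS lookups) is dropped unless another policy allows it.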

[Figure: Kubernetes network policies overview]

The Azure NPM implementation works in conjunction with Azure CNI, which provides VNet integration for containers. NPM is currently supported only on Linux. The implementation enforces traffic filtering by configuring allow and deny IP rules in Linux IPTables based on the defined policies. These rules are grouped together using Linux IPSets.

Planning security for your Kubernetes cluster

When implementing security for your cluster, use network security groups (NSGs) to filter traffic entering and leaving your cluster subnet (North-South traffic). Use Azure NPM for traffic between pods in your cluster (East-West traffic).

Using Azure NPM

Azure NPM can be used in the following ways to provide micro-segmentation for pods.

Azure Kubernetes Service (AKS)

NPM is available natively in AKS and can be enabled at the time of cluster creation. Learn more in Secure traffic between pods using network policies in Azure Kubernetes Service (AKS).

AKS-engine

AKS-Engine is a tool that generates an Azure Resource Manager template for deploying a Kubernetes cluster in Azure. The cluster configuration is specified in a JSON file that is passed to the tool when generating the template. For the full list of supported cluster settings and their descriptions, see Azure Container Service Engine - Cluster Definition.

To enable policies on clusters deployed using acs-engine, set the value of the networkPolicy setting in the cluster definition file to "azure".

Example configuration

The following JSON example configuration creates a new virtual network and subnet, and deploys a Kubernetes cluster in it with Azure CNI. We recommend that you use "Notepad" to edit the JSON file.

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "kubernetesConfig": {
        "networkPolicy": "azure"
      }
    },
    "masterProfile": {
      "count": 1,
      "dnsPrefix": "<specify a cluster name>",
      "vmSize": "Standard_D2s_v3"
    },
    "agentPoolProfiles": [
      {
        "name": "agentpool",
        "count": 2,
        "vmSize": "Standard_D2s_v3",
        "availabilityProfile": "AvailabilitySet"
      }
    ],
    "linuxProfile": {
      "adminUsername": "<specify admin username>",
      "ssh": {
        "publicKeys": [
          {
            "keyData": "<cut and paste your ssh key here>"
          }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "<enter the client ID of your service principal here >",
      "secret": "<enter the password of your service principal here>"
    }
  }
}

Do it yourself (DIY) Kubernetes clusters in Azure

For DIY clusters, first install the CNI plug-in and enable it on every virtual machine in the cluster. For detailed instructions, see Deploy the plug-in for a Kubernetes cluster that you deploy yourself.

Once the cluster is deployed, run the following kubectl command to download and apply the Azure NPM daemon set to the cluster.

kubectl apply -f https://raw.githubusercontent.com/Azure/acs-engine/master/parts/k8s/addons/kubernetesmasteraddons-azure-npm-daemonset.yaml

The solution is also open source, and the code is available in the Azure Container Networking repository.

Monitor and Visualize Network Configurations with Azure NPM

Azure NPM includes informative Prometheus metrics that allow you to monitor and better understand your configurations. It provides built-in visualizations in either the Azure portal or Grafana Labs. You can start collecting these metrics using either Azure Monitor or a Prometheus Server.

Benefits of Azure NPM Metrics

Previously, users could learn about their network configuration only by running the command iptables -L inside a cluster node, which yields verbose output that is difficult to understand. NPM metrics provide the following benefits related to Network Policies, IPTables rules, and IPSets.

  • Insight into how the three relate to one another over time, which helps when debugging a configuration.
  • Number of entries in all IPSets and in each IPSet.
  • Time taken to apply a policy, with IPTable/IPSet level granularity.

Supported Metrics

The following is the list of supported metrics:

Metric Name | Description | Prometheus Metric Type | Labels
----------- | ----------- | ---------------------- | ------
npm_num_policies | number of network policies | Gauge | -
npm_num_iptables_rules | number of IPTables rules | Gauge | -
npm_num_ipsets | number of IPSets | Gauge | -
npm_num_ipset_entries | number of IP address entries in all IPSets | Gauge | -
npm_add_policy_exec_time | runtime for adding a network policy | Summary | quantile (0.5, 0.9, or 0.99)
npm_add_iptables_rule_exec_time | runtime for adding an IPTables rule | Summary | quantile (0.5, 0.9, or 0.99)
npm_add_ipset_exec_time | runtime for adding an IPSet | Summary | quantile (0.5, 0.9, or 0.99)
npm_ipset_counts (advanced) | number of entries within each individual IPSet | GaugeVec | set name & hash

The different quantile levels in the "exec_time" metrics help you differentiate between the general and worst-case scenarios.

Each "exec_time" Summary metric also has a corresponding "exec_time_count" and "exec_time_sum" metric.
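The count and sum series can be combined to estimate an average runtime. The following is a minimal sketch of a Prometheus rules file, not a configuration shipped with Azure NPM. It assumes the usual Summary naming, where the sum and count series for npm_add_policy_exec_time are npm_add_policy_exec_time_sum and npm_add_policy_exec_time_count. The group name, the recorded series names, and the 5-minute window are hypothetical.

```yaml
# Hypothetical rules file: group and record names are placeholders.
groups:
- name: azure-npm-exec-time
  rules:
  # Mean runtime for adding a network policy over the last 5 minutes:
  # growth of the total runtime divided by growth of the observation count.
  - record: npm:add_policy_exec_time:mean5m
    expr: |
      rate(npm_add_policy_exec_time_sum[5m])
        /
      rate(npm_add_policy_exec_time_count[5m])
  # The 0.99 quantile as a worst-case indicator.
  - record: npm:add_policy_exec_time:p99
    expr: npm_add_policy_exec_time{quantile="0.99"}
```

If no policy was added during the window, both rates are zero and the mean evaluates to NaN, so dashboards built on it should tolerate gaps.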

The metrics can be scraped through Azure Monitor for containers or through Prometheus.

Setup for Azure Monitor

The first step is to enable Azure Monitor for containers for your Kubernetes cluster. Once Azure Monitor for containers is enabled, configure the Azure Monitor for containers ConfigMap to enable NPM integration and collection of Prometheus NPM metrics. The ConfigMap has an integrations section with settings to collect NPM metrics. These settings are disabled by default. Setting collect_basic_metrics = true collects basic NPM metrics. Setting collect_advanced_metrics = true collects advanced metrics in addition to basic metrics.

After editing the ConfigMap, save it locally and apply the ConfigMap to your cluster as follows.

kubectl apply -f container-azm-ms-agentconfig.yaml

Below is a snippet from the Azure Monitor for containers ConfigMap, which shows the NPM integration enabled with advanced metrics collection.

integrations: |-
    [integrations.azure_network_policy_manager]
        collect_basic_metrics = false
        collect_advanced_metrics = true

Advanced metrics are optional, and turning them on automatically turns on basic metrics collection. Advanced metrics currently include only npm_ipset_counts.
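For context, the following sketch shows where the integrations section might sit inside the ConfigMap when only basic metrics are wanted. The metadata (name container-azm-ms-agentconfig, namespace kube-system) and the schema-version and config-version keys are assumptions based on the commonly distributed template. Verify them against the template you downloaded before applying anything.

```yaml
# Sketch only: metadata and version keys are assumptions; check them
# against the ConfigMap template provided for Azure Monitor for containers.
kind: ConfigMap
apiVersion: v1
metadata:
  name: container-azm-ms-agentconfig
  namespace: kube-system
data:
  schema-version: v1
  config-version: ver1
  integrations: |-
    [integrations.azure_network_policy_manager]
        collect_basic_metrics = true
        collect_advanced_metrics = false
```

Applying a trimmed ConfigMap like this one would replace any other settings already present in the cluster's copy, so in practice it is safer to edit the integrations section of the full template.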

Visualization Options for Azure Monitor

Once NPM metrics collection is enabled, you can view the metrics in the Azure portal using Container Insights or in Grafana.

Viewing in Azure portal under Insights for the cluster

Open the Azure portal. Once in your cluster's Insights, navigate to "Workbooks" and open "Network Policy Manager (NPM) Configuration".

Besides viewing the workbook (pictures below), you can also directly query the Prometheus metrics in "Logs" under the Insights section. For example, this query against the InsightsMetrics table returns all the NPM metrics collected in the last five hours.

InsightsMetrics
| where TimeGenerated > ago(5h)
| where Name contains "npm_"

You can also query Log Analytics directly for the metrics.

Viewing in Grafana Dashboard

Set up your Grafana Server and configure a Log Analytics Data Source as described here. Then, import the Grafana Dashboard with a Log Analytics backend into your Grafana Labs.

The dashboard has visuals similar to the Azure Workbook. You can add panels to chart and visualize NPM metrics from the InsightsMetrics table.

Setup for Prometheus Server

Some users may choose to collect metrics with a Prometheus Server instead of Azure Monitor for containers. You only need to add two jobs to your scrape config to collect NPM metrics.

To install a simple Prometheus Server, add this helm repo on your cluster:

helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm repo update

Then add a server:

helm install prometheus stable/prometheus -n monitoring \
--set pushgateway.enabled=false,alertmanager.enabled=false \
--set-file extraScrapeConfigs=prometheus-server-scrape-config.yaml

where prometheus-server-scrape-config.yaml consists of:

- job_name: "azure-npm-node-metrics"
  metrics_path: /node-metrics
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - source_labels: [__address__]
    action: replace
    regex: ([^:]+)(?::\d+)?
    replacement: "$1:10091"
    target_label: __address__
- job_name: "azure-npm-cluster-metrics"
  metrics_path: /cluster-metrics
  kubernetes_sd_configs:
  - role: service
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace]
    regex: kube-system
    action: keep
  - source_labels: [__meta_kubernetes_service_name]
    regex: npm-metrics-cluster-service
    action: keep
# Comment from here to the end to collect advanced metrics: number of entries for each IPSet
  metric_relabel_configs:
  - source_labels: [__name__]
    regex: npm_ipset_counts
    action: drop

You can also replace the azure-npm-node-metrics job with the content below or incorporate it into a pre-existing job for Kubernetes pods:

- job_name: "azure-npm-node-metrics-from-pod-config"
  metrics_path: /node-metrics
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace]
    regex: kube-system
    action: keep
  - source_labels: [__meta_kubernetes_pod_annotationpresent_azure_npm_scrapeable]
    action: keep
  - source_labels: [__address__]
    action: replace
    regex: ([^:]+)(?::\d+)?
    replacement: "$1:10091"
    target_label: __address__

Visualization Options for Prometheus

When using a Prometheus Server, only the Grafana Dashboard is supported.

If you haven't already, set up your Grafana Server and configure a Prometheus Data Source. Then, import our Grafana Dashboard with a Prometheus backend into your Grafana Labs.

The visuals for this dashboard are identical to the dashboard with a Container Insights/Log Analytics backend.

Sample Dashboards

The following are some sample dashboards for NPM metrics in Container Insights (CI) and Grafana.

CI Summary Counts

[Figure: Azure Workbook summary counts]

CI Counts over Time

[Figure: Azure Workbook counts over time]

CI IPSet Entries

[Figure: Azure Workbook IPSet entries]

CI Runtime Quantiles

[Figure: Azure Workbook runtime quantiles]

Grafana Dashboard Summary Counts

[Figure: Grafana Dashboard summary counts]

Grafana Dashboard Counts over Time

[Figure: Grafana Dashboard counts over time]

Grafana Dashboard IPSet Entries

[Figure: Grafana Dashboard IPSet entries]

Grafana Dashboard Runtime Quantiles

[Figure: Grafana Dashboard runtime quantiles]

Next steps