What is responsible machine learning? (preview)

In this article, you'll learn what responsible machine learning (ML) is and ways you can put it into practice with Azure Machine Learning.

Responsible machine learning principles

Throughout the development and use of AI systems, trust must be at the core: trust in the platform, the process, and the models. At Microsoft, responsible machine learning encompasses the following values and principles:

  • Understand machine learning models
    • Interpret and explain model behavior
    • Assess and mitigate model unfairness
  • Protect people and their data
    • Prevent data exposure with differential privacy
    • Work with encrypted data using homomorphic encryption
  • Control the end-to-end machine learning process
    • Document the machine learning lifecycle with datasheets

Diagram: the pillars of responsible ML in Azure Machine Learning - interpretability, differential privacy, homomorphic encryption, and audit trail.

As artificial intelligence and autonomous systems integrate more into the fabric of society, it's important to proactively anticipate and mitigate the unintended consequences of these technologies.

Interpret and explain model behavior

Hard-to-explain, or opaque-box, systems are problematic because they make it difficult for stakeholders such as system developers, regulators, users, and business decision makers to understand why a system makes certain decisions. Some AI systems are more explainable than others, and there's sometimes a tradeoff between a system with higher accuracy and one that is more explainable.

To build interpretable AI systems, use InterpretML, an open-source package built by Microsoft. The InterpretML package supports a wide variety of interpretability techniques, such as SHapley Additive exPlanations (SHAP), mimic explainer, and permutation feature importance (PFI). InterpretML can be used inside Azure Machine Learning to interpret and explain your machine learning models, including automated machine learning models.
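As a minimal sketch of what this looks like, the following example trains a scikit-learn model and computes global feature importance with the TabularExplainer meta-explainer from the interpret-community package; the dataset and model choice here are illustrative assumptions, not part of this article.

```python
# Interpretability sketch (assumes: pip install interpret-community scikit-learn)
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from interpret.ext.blackbox import TabularExplainer

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# TabularExplainer chooses an appropriate SHAP-based explainer for the model.
explainer = TabularExplainer(model, X_train, features=data.feature_names)
global_explanation = explainer.explain_global(X_test)

# Print ranked global feature importance values.
for name, value in global_explanation.get_feature_importance_dict().items():
    print(f"{name}: {value:.4f}")
```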

Assess and mitigate model unfairness

As AI systems become more involved in the everyday decision-making of society, it's extremely important that these systems work well in providing fair outcomes for everyone.

Unfairness in AI systems can result in the following unintended consequences:

  • Withholding opportunities, resources, or information from individuals.
  • Reinforcing biases and stereotypes.

Many aspects of fairness can't be captured or represented by metrics. However, there are tools and practices that can improve fairness in the design and development of AI systems.

Two key steps in reducing unfairness in AI systems are assessment and mitigation. We recommend Fairlearn, an open-source package that can assess and mitigate the potential unfairness of AI systems. To learn more about fairness and the Fairlearn package, see the Fairness in ML article.
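As a sketch of those two steps, the following example assesses a per-group metric with Fairlearn's MetricFrame and then trains a mitigated model with the ExponentiatedGradient reduction; the tiny dataset and the sensitive feature used here are illustrative assumptions.

```python
# Fairness assessment and mitigation sketch (assumes: pip install fairlearn scikit-learn)
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

# Illustrative data: X are features, y is the label, sex is the sensitive feature.
X = pd.DataFrame({"feature1": [0.1, 0.4, 0.35, 0.8, 0.6, 0.2],
                  "feature2": [1, 0, 1, 0, 1, 0]})
y = pd.Series([0, 0, 1, 1, 1, 0])
sex = pd.Series(["F", "M", "F", "M", "M", "F"])

# Assessment: compare a metric across groups defined by the sensitive feature.
model = LogisticRegression().fit(X, y)
frame = MetricFrame(metrics=accuracy_score, y_true=y,
                    y_pred=model.predict(X), sensitive_features=sex)
print(frame.by_group)  # per-group accuracy

# Mitigation: retrain under a demographic-parity constraint.
mitigator = ExponentiatedGradient(LogisticRegression(),
                                  constraints=DemographicParity())
mitigator.fit(X, y, sensitive_features=sex)
y_pred_mitigated = mitigator.predict(X)
```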

Prevent data exposure with differential privacy

When data is used for analysis, it's important that the data remains private and confidential throughout its use. Differential privacy is a set of systems and practices that help keep the data of individuals safe and private.

In traditional scenarios, raw data is stored in files and databases, and users typically work with the raw data directly when they analyze it. This is a concern because it might infringe on an individual's privacy. Differential privacy addresses this problem by adding "noise," or randomness, to the data so that users can't identify any individual data points.
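To illustrate the idea, here is a conceptual sketch of the classic Laplace mechanism (this illustrates the principle only; it is not the SmartNoise API discussed below). An aggregate query result is released with noise calibrated to the query's sensitivity and a privacy budget epsilon.

```python
# Conceptual Laplace-mechanism sketch; real systems such as SmartNoise
# also track the privacy budget across queries.
import numpy as np

def private_count(data, epsilon=1.0):
    """Release a count with Laplace noise.

    A count changes by at most 1 when one individual's record is added
    or removed, so its sensitivity is 1 and the noise scale is 1/epsilon.
    """
    sensitivity = 1.0
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return len(data) + noise

ages = [23, 45, 31, 62, 54, 29, 41]
print(private_count(ages, epsilon=0.5))  # noisy count; smaller epsilon -> more noise
```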

Implementing differentially private systems is difficult. SmartNoise is an open-source project that contains different components for building global differentially private systems. To learn more about differential privacy and the SmartNoise project, see the Preserve data privacy by using differential privacy and SmartNoise article.

Work on encrypted data with homomorphic encryption

In traditional cloud storage and computation solutions, the cloud needs to have unencrypted access to customer data in order to compute on it. This access exposes the data to cloud operators. Data privacy relies on access control policies implemented by the cloud and trusted by the customer.

Homomorphic encryption allows computations to be done on encrypted data without requiring access to a secret (decryption) key. The results of the computations are encrypted and can be revealed only by the owner of the secret key. With homomorphic encryption, cloud operators never have unencrypted access to the data they're storing and computing on. Computations are performed directly on encrypted data. Data privacy relies on state-of-the-art cryptography, and the data owner controls all information releases. For more information on homomorphic encryption at Microsoft, see Microsoft Research.
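To make the idea concrete, here is a minimal sketch using TenSEAL, a community Python wrapper built on Microsoft SEAL (swapped in for illustration; the Azure Machine Learning sample below uses the encrypted-inference bindings instead). It encrypts a vector, computes on the ciphertext, and decrypts the result only with the secret key.

```python
# Homomorphic encryption sketch (assumes: pip install tenseal)
import tenseal as ts

# CKKS context for approximate arithmetic on real numbers.
context = ts.context(ts.SCHEME_TYPE.CKKS,
                     poly_modulus_degree=8192,
                     coeff_mod_bit_sizes=[60, 40, 40, 60])
context.global_scale = 2 ** 40
context.generate_galois_keys()  # needed for dot products

weights = [0.5, 1.5, -2.0]                               # plaintext model weights
encrypted_x = ts.ckks_vector(context, [1.0, 2.0, 3.0])   # encrypted input

# The "cloud" computes on the ciphertext without ever seeing the data.
encrypted_score = encrypted_x.dot(weights)

# Only the holder of the secret key can decrypt the result.
print(encrypted_score.decrypt())  # approximately [-2.5]
```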

To get started with homomorphic encryption in Azure Machine Learning, use the encrypted-inference Python bindings for Microsoft SEAL. Microsoft SEAL is an open-source homomorphic encryption library that allows additions and multiplications to be performed on encrypted integers or real numbers. To learn more about Microsoft SEAL, see the Azure Architecture Center or the Microsoft Research project page.

See the following sample to learn how to deploy an encrypted inferencing web service in Azure Machine Learning.

Document the machine learning lifecycle with datasheets

Documenting the right information in the machine learning process is key to making responsible decisions at each stage. Datasheets are a way to document the machine learning assets that are used and created as part of the machine learning lifecycle.

Models tend to be thought of as "opaque boxes," and often there is little information about them. Because machine learning systems are becoming more pervasive and are used for decision making, using datasheets is a step towards developing more responsible machine learning systems.

Some model information you might want to document as part of a datasheet:

  • Intended use
  • Model architecture
  • Training data used
  • Evaluation data used
  • Training model performance metrics
  • Fairness information

See the following sample to learn how to use the Azure Machine Learning SDK to implement datasheets for models.
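One lightweight approach, sketched here under the assumption of an existing Azure ML workspace config and a serialized model file (the linked sample is the authoritative reference), is to record datasheet fields as tags when registering a model with the azureml-core SDK:

```python
# Datasheet-style model registration sketch (assumes: pip install azureml-core,
# a config.json for an existing workspace, and a local model.pkl file)
from azureml.core import Workspace
from azureml.core.model import Model

ws = Workspace.from_config()  # loads the existing workspace from config.json

model = Model.register(
    workspace=ws,
    model_path="model.pkl",          # local file to upload
    model_name="credit-risk-model",  # illustrative name
    tags={
        "intended_use": "Screening aid; not for fully automated decisions",
        "architecture": "Gradient-boosted trees",
        "training_data": "loans-train-2024-01",
        "evaluation_data": "loans-eval-2024-01",
        "accuracy": "0.87",
        "fairness": "Demographic parity difference 0.03 across sex",
    },
    description="Datasheet fields recorded as model registry tags.",
)
print(model.name, model.version)
```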

Additional resources