Where to save and write files for Azure Machine Learning experiments

APPLIES TO: Basic edition, Enterprise edition

In this article, you learn where to save input files and where to write output files from your experiments, so that you avoid storage limit errors and experiment latency.

When you launch training runs on a compute target, they are isolated from outside environments. The purpose of this design is to ensure the reproducibility and portability of the experiment. If you run the same script twice, on the same or another compute target, you receive the same results. With this design, you can treat compute targets as stateless computation resources, each having no affinity to the jobs it ran once those jobs have finished.

Where to save input files

Before you can initiate an experiment on a compute target or your local machine, you must ensure that the necessary files are available to that compute target, such as the dependency files and data files your code needs to run.

Azure Machine Learning runs training scripts by copying the entire source directory. If you have sensitive data that you don't want to upload, use an ignore file (.gitignore or .amlignore) or don't include the data in the source directory. Instead, access your data using a datastore.

The storage limit for experiment snapshots is 300 MB and/or 2000 files.

For this reason, we recommend:

  • Storing your files in an Azure Machine Learning datastore. This prevents experiment latency issues and lets you access data from a remote compute target, with authentication and mounting managed by Azure Machine Learning. To learn more about specifying a datastore as your source directory and uploading files to your datastore, see the Access data from your datastores article.

  • If you only need a couple of data files and dependency scripts and can't use a datastore, place the files in the same folder as your training script. Specify this folder as your source_directory directly in your training script, or in the code that calls your training script.
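
To check ahead of time whether a source directory fits within the snapshot limits stated above, a small standard-library script can be enough. The following is a minimal sketch, not part of the Azure Machine Learning SDK: the function names are hypothetical, it assumes that 300 MB means 300 × 1024 × 1024 bytes, and it counts every file without honoring any ignore file.

```python
import os

# Limits quoted in this article for experiment snapshots.
# Assumption: "300 MB" is interpreted as 300 * 1024 * 1024 bytes.
MAX_SNAPSHOT_BYTES = 300 * 1024 * 1024
MAX_SNAPSHOT_FILES = 2000


def snapshot_usage(source_dir):
    """Return (file_count, total_bytes) for every file under source_dir."""
    file_count = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            path = os.path.join(root, name)
            # Skip entries that are not regular files (e.g. broken symlinks).
            if os.path.isfile(path):
                file_count += 1
                total_bytes += os.path.getsize(path)
    return file_count, total_bytes


def fits_snapshot_limits(source_dir):
    """Return True if source_dir is within both limits (hypothetical helper)."""
    file_count, total_bytes = snapshot_usage(source_dir)
    return file_count <= MAX_SNAPSHOT_FILES and total_bytes <= MAX_SNAPSHOT_BYTES
```

Calling fits_snapshot_limits("./my_project") before submitting a run gives an early warning; if it returns False, moving data files to a datastore is the recommended fix.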

Storage limits of experiment snapshots

For experiments, Azure Machine Learning automatically makes an experiment snapshot of your code based on the directory you specify when you configure the run. This snapshot has a total limit of 300 MB and/or 2000 files. If you exceed this limit, you'll see the following error:

While attempting to take snapshot of .
Your total snapshot size exceeds the limit of 300.0 MB

To resolve this error, store your experiment files on a datastore. If you can't use a datastore, the following table offers possible alternative solutions.

| Experiment description | Storage limit solution |
| --- | --- |
| Fewer than 2000 files and can't use a datastore | Override the snapshot size limit with azureml._restclient.snapshots_client.SNAPSHOT_MAX_SIZE_BYTES = 'insert_desired_size'. This may take several minutes depending on the number and size of files. |
| Must use a specific script directory | To prevent unnecessary files from being included in the snapshot, make an ignore file (.gitignore or .amlignore) in the directory. Add the files and directories to exclude to this file. For more information on the syntax to use inside this file, see syntax and patterns for .gitignore. The .amlignore file uses the same syntax. If both files exist, the .amlignore file takes precedence. |
| Pipeline | Use a different subdirectory for each step. |
| Jupyter notebooks | Create a .amlignore file or move your notebook into a new, empty subdirectory and run your code again. |
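
As an illustration of the ignore-file approach, the sketch below writes a .amlignore file with one exclusion pattern per line, using .gitignore syntax. The helper name and the example patterns are hypothetical; choose patterns that match the files you actually want to keep out of the snapshot.

```python
import os

# Hypothetical example patterns, written in .gitignore syntax.
EXAMPLE_PATTERNS = ["data/", "*.ckpt", ".ipynb_checkpoints/"]


def write_amlignore(source_dir, patterns):
    """Write an .amlignore file in source_dir, one pattern per line."""
    path = os.path.join(source_dir, ".amlignore")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(patterns) + "\n")
    return path
```

For example, write_amlignore("./my_project", EXAMPLE_PATTERNS) creates the file next to the training script, so it applies to the directory that is snapshotted.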

Where to write files

Because training experiments are isolated, changes to files that happen during runs are not necessarily persisted outside of your environment. If your script modifies files local to the compute target, the changes are not persisted for your next experiment run, and they're not propagated back to the client machine automatically. Therefore, changes made during the first experiment run don't, and shouldn't, affect the second.

When writing changes, we recommend writing files to an Azure Machine Learning datastore. See Access data from your datastores.

If you don't require a datastore, write files to the ./outputs and/or ./logs folder.

Important

Two folders, outputs and logs, receive special treatment by Azure Machine Learning. During training, when you write files to the ./outputs and ./logs folders, the files are automatically uploaded to your run history, so that you have access to them once your run is finished.

  • For output such as status messages or scoring results, write files to the ./outputs folder, so they are persisted as artifacts in run history. Be mindful of the number and size of files written to this folder, as latency may occur when the contents are uploaded to run history. If latency is a concern, we recommend writing files to a datastore.

  • To save written files as logs in run history, write them to the ./logs folder. The logs are uploaded in real time, so this method is suitable for streaming live updates from a remote run.
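
Inside a training script, writing to these two folders needs nothing beyond ordinary file I/O. The following is a minimal sketch using only the standard library; the function name and the file names scores.json and progress.log are hypothetical, and the upload to run history is performed by Azure Machine Learning, not by this code.

```python
import json
import os


def write_run_files(metrics, message, base_dir="."):
    """Write results to <base_dir>/outputs and a log line to <base_dir>/logs."""
    outputs_dir = os.path.join(base_dir, "outputs")
    logs_dir = os.path.join(base_dir, "logs")
    os.makedirs(outputs_dir, exist_ok=True)
    os.makedirs(logs_dir, exist_ok=True)

    # Scoring results: stored as a file under ./outputs.
    results_path = os.path.join(outputs_dir, "scores.json")
    with open(results_path, "w", encoding="utf-8") as f:
        json.dump(metrics, f)

    # Progress messages: appended under ./logs so earlier lines are kept.
    log_path = os.path.join(logs_dir, "progress.log")
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(message + "\n")

    return results_path, log_path
```

Keeping the files under ./outputs few and small follows the latency advice above; larger artifacts are better written to a datastore.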

Next steps