Quickstart: Recognize speech in Objective-C on iOS by using the Speech SDK

Quickstarts are also available for speech synthesis.

In this article, you learn how to create an iOS app in Objective-C by using the Azure Cognitive Services Speech SDK to transcribe speech to text from a microphone or from a file with recorded audio.

Prerequisites

Before you get started, you'll need:

  • A subscription key for the Speech service.
  • A macOS machine with Xcode 9.4.1 or later.
  • The target set to iOS version 9.3 or later.
  • Speech SDK version 1.11.0 or later.

Get the Speech SDK for iOS

Important

By downloading any of the Azure Cognitive Services Speech SDKs, you acknowledge its license. For more information, see the Speech SDK license terms.

The Cognitive Services Speech SDK for iOS is currently distributed as a Cocoa framework. It can be downloaded from this website. Download the file to your home directory.

Create an Xcode project

Start Xcode, and start a new project by selecting File > New > Project. In the template selection dialog box, select the iOS Single View App template.

In the dialog boxes that follow, make the following selections.

  1. In the Project Options dialog box:

    1. Enter a name for the quickstart app, for example, helloworld.
    2. Enter an appropriate organization name and organization identifier if you already have an Apple developer account. For testing purposes, use a name like testorg. To sign the app, you need a proper provisioning profile. For more information, see the Apple developer site.
    3. Make sure Objective-C is selected as the language for the project.
    4. Clear all the check boxes for tests and core data.

    Screenshot: Project settings

  2. Select a project directory:

    1. Choose your home directory to put the project in. This step creates a helloworld directory in your home directory that contains all the files for the Xcode project.

    2. Disable the creation of a Git repo for this example project.

    3. Adjust the paths to the SDK on the project settings screen.

      1. On the General tab, under the Embedded Binaries header, add the SDK library as a framework by selecting Add embedded binaries > Add other. Go to your home directory, and select the file MicrosoftCognitiveServicesSpeech.framework. This action automatically adds the SDK library under the Linked Frameworks and Libraries header.

        Screenshot: Added framework

      2. Go to the Build Settings tab, and select All to show all settings.
      3. Add the directory $(SRCROOT)/.. to Framework Search Paths under the Search Paths heading.

      Screenshot: Framework Search Paths setting

Set up the UI

The example app has a very simple UI. It has two buttons to start speech recognition either from file or from microphone input, and a text label to display the result. The UI is set up in the Main.storyboard part of the project. Open the XML view of the storyboard by right-clicking the Main.storyboard entry of the project tree and selecting Open As > Source Code.

Replace the autogenerated XML with this code:

<?xml version="1.0" encoding="UTF-8"?>
<document type="com.apple.InterfaceBuilder3.CocoaTouch.Storyboard.XIB" version="3.0" toolsVersion="14113" targetRuntime="iOS.CocoaTouch" propertyAccessControl="none" useAutolayout="YES" useTraitCollections="YES" useSafeAreas="YES" colorMatched="YES" initialViewController="BYZ-38-t0r">
    <device id="retina4_7" orientation="portrait">
        <adaptation id="fullscreen"/>
    </device>
    <dependencies>
        <deployment identifier="iOS"/>
        <plugIn identifier="com.apple.InterfaceBuilder.IBCocoaTouchPlugin" version="14088"/>
        <capability name="Safe area layout guides" minToolsVersion="9.0"/>
        <capability name="documents saved in the Xcode 8 format" minToolsVersion="8.0"/>
    </dependencies>
    <scenes>
        <!--View Controller-->
        <scene sceneID="tne-QT-ifu">
            <objects>
                <viewController id="BYZ-38-t0r" customClass="ViewController" sceneMemberID="viewController">
                    <view key="view" contentMode="scaleToFill" id="8bC-Xf-vdC">
                        <rect key="frame" x="0.0" y="0.0" width="375" height="667"/>
                        <autoresizingMask key="autoresizingMask" widthSizable="YES" heightSizable="YES"/>
                        <subviews>
                            <button opaque="NO" contentMode="scaleToFill" fixedFrame="YES" contentHorizontalAlignment="center" contentVerticalAlignment="center" buttonType="roundedRect" lineBreakMode="middleTruncation" translatesAutoresizingMaskIntoConstraints="NO" id="qFP-u7-47Q">
                                <rect key="frame" x="84" y="247" width="207" height="82"/>
                                <autoresizingMask key="autoresizingMask" flexibleMaxX="YES" flexibleMaxY="YES"/>
                                <accessibility key="accessibilityConfiguration" hint="Start speech recognition from file" identifier="recognize_file_button">
                                    <accessibilityTraits key="traits" button="YES" staticText="YES"/>
                                    <bool key="isElement" value="YES"/>
                                </accessibility>
                                <fontDescription key="fontDescription" type="system" pointSize="30"/>
                                <state key="normal" title="Recognize (File)"/>
                                <connections>
                                    <action selector="recognizeFromFileButtonTapped:" destination="BYZ-38-t0r" eventType="touchUpInside" id="Vfr-ah-nbC"/>
                                </connections>
                            </button>
                            <label opaque="NO" userInteractionEnabled="NO" contentMode="center" horizontalHuggingPriority="251" verticalHuggingPriority="251" fixedFrame="YES" text="Recognition result" textAlignment="center" lineBreakMode="tailTruncation" numberOfLines="5" baselineAdjustment="alignBaselines" adjustsFontSizeToFit="NO" translatesAutoresizingMaskIntoConstraints="NO" id="tq3-GD-ljB">
                                <rect key="frame" x="20" y="408" width="335" height="148"/>
                                <autoresizingMask key="autoresizingMask" flexibleMaxX="YES" flexibleMaxY="YES"/>
                                <accessibility key="accessibilityConfiguration" hint="The result of speech recognition" identifier="result_label">
                                    <accessibilityTraits key="traits" notEnabled="YES"/>
                                    <bool key="isElement" value="NO"/>
                                </accessibility>
                                <fontDescription key="fontDescription" type="system" pointSize="30"/>
                                <color key="textColor" red="0.5" green="0.5" blue="0.5" alpha="1" colorSpace="custom" customColorSpace="sRGB"/>
                                <nil key="highlightedColor"/>
                            </label>
                            <button opaque="NO" contentMode="scaleToFill" fixedFrame="YES" contentHorizontalAlignment="center" contentVerticalAlignment="center" buttonType="roundedRect" lineBreakMode="middleTruncation" translatesAutoresizingMaskIntoConstraints="NO" id="91d-Ki-IyR">
                                <rect key="frame" x="16" y="209" width="339" height="30"/>
                                <autoresizingMask key="autoresizingMask" flexibleMaxX="YES" flexibleMaxY="YES"/>
                                <accessibility key="accessibilityConfiguration" hint="Start speech recognition from microphone" identifier="recognize_microphone_button"/>
                                <fontDescription key="fontDescription" type="system" pointSize="30"/>
                                <state key="normal" title="Recognize (Microphone)"/>
                                <connections>
                                    <action selector="recognizeFromMicButtonTapped:" destination="BYZ-38-t0r" eventType="touchUpInside" id="2n3-kA-ySa"/>
                                </connections>
                            </button>
                        </subviews>
                        <color key="backgroundColor" red="1" green="1" blue="1" alpha="1" colorSpace="custom" customColorSpace="sRGB"/>
                        <viewLayoutGuide key="safeArea" id="6Tk-OE-BBY"/>
                    </view>
                    <connections>
                        <outlet property="recognitionResultLabel" destination="tq3-GD-ljB" id="kP4-o4-s0Q"/>
                    </connections>
                </viewController>
                <placeholder placeholderIdentifier="IBFirstResponder" id="dkx-z0-nzr" sceneMemberID="firstResponder"/>
            </objects>
            <point key="canvasLocation" x="135.19999999999999" y="132.68365817091455"/>
        </scene>
    </scenes>
</document>
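
The storyboard sets the scene's custom class to ViewController and connects the recognitionResultLabel outlet, so a matching ViewController class must exist in the project. The sample code in the next section declares the outlets and actions privately in the implementation file, so the autogenerated header can stay minimal. A sketch of what ViewController.h is expected to look like (adjust if your autogenerated file differs):

#import <UIKit/UIKit.h>

// Minimal header: the outlets and button actions that the storyboard
// references are declared in the class extension in ViewController.m.
@interface ViewController : UIViewController
@end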

Add the sample code

  1. Download the sample wav file by right-clicking the link and selecting Save target as. Add the wav file to the project as a resource by dragging it from a Finder window into the root level of the Project view. Select Finish in the following dialog box without changing the settings.

  2. Replace the contents of the autogenerated ViewController.m file with the following code (a nonblocking variant that calls the recognizer asynchronously is sketched after this list):

    #import "ViewController.h"
    #import <MicrosoftCognitiveServicesSpeech/SPXSpeechApi.h>
    
    @interface ViewController () {
        NSString *speechHost;
        NSString *speechKey;
    }
    
    @property (weak, nonatomic) IBOutlet UIButton *recognizeFromFileButton;
    @property (weak, nonatomic) IBOutlet UIButton *recognizeFromMicButton;
    @property (weak, nonatomic) IBOutlet UILabel *recognitionResultLabel;
    - (IBAction)recognizeFromFileButtonTapped:(UIButton *)sender;
    - (IBAction)recognizeFromMicButtonTapped:(UIButton *)sender;
    @end
    
    @implementation ViewController
    
    - (void)viewDidLoad {
        [super viewDidLoad];
        // Replace the placeholders below with your service region host and subscription key.
        speechHost = @"wss://YourServiceRegion.stt.speech.azure.cn/";
        speechKey = @"YourSubscriptionKey";
    }
    
    - (IBAction)recognizeFromFileButtonTapped:(UIButton *)sender {
        // Run recognition on a background queue; recognizeOnce blocks until a result is available.
        dispatch_async(dispatch_get_global_queue(QOS_CLASS_DEFAULT, 0), ^{
            [self recognizeFromFile];
        });
    }
    
    - (IBAction)recognizeFromMicButtonTapped:(UIButton *)sender {
        dispatch_async(dispatch_get_global_queue(QOS_CLASS_DEFAULT, 0), ^{
            [self recognizeFromMicrophone];
        });
    }
    
    - (void)recognizeFromFile {
        NSBundle *mainBundle = [NSBundle mainBundle];
        NSString *weatherFile = [mainBundle pathForResource: @"whatstheweatherlike" ofType:@"wav"];
        NSLog(@"weatherFile path: %@", weatherFile);
        if (!weatherFile) {
            NSLog(@"Cannot find audio file!");
            [self updateRecognitionErrorText:(@"Cannot find audio file")];
            return;
        }
    
        SPXAudioConfiguration* weatherAudioSource = [[SPXAudioConfiguration alloc] initWithWavFileInput:weatherFile];
        if (!weatherAudioSource) {
            NSLog(@"Loading audio file failed!");
            [self updateRecognitionErrorText:(@"Audio Error")];
            return;
        }
    
        SPXSpeechConfiguration *speechConfig = [[SPXSpeechConfiguration alloc] initWithHost:speechHost subscription:speechKey];
        if (!speechConfig) {
            NSLog(@"Could not load speech config");
            [self updateRecognitionErrorText:(@"Speech Config Error")];
            return;
        }
    
        [self updateRecognitionStatusText:(@"Recognizing...")];
    
        SPXSpeechRecognizer* speechRecognizer = [[SPXSpeechRecognizer alloc] initWithSpeechConfiguration:speechConfig audioConfiguration:weatherAudioSource];
        if (!speechRecognizer) {
            NSLog(@"Could not create speech recognizer");
            [self updateRecognitionResultText:(@"Speech Recognition Error")];
            return;
        }
    
        SPXSpeechRecognitionResult *speechResult = [speechRecognizer recognizeOnce];
        if (SPXResultReason_Canceled == speechResult.reason) {
            SPXCancellationDetails *details = [[SPXCancellationDetails alloc] initFromCanceledRecognitionResult:speechResult];
            NSLog(@"Speech recognition was canceled: %@. Did you pass the correct key/region combination?", details.errorDetails);
            [self updateRecognitionErrorText:([NSString stringWithFormat:@"Canceled: %@", details.errorDetails ])];
        } else if (SPXResultReason_RecognizedSpeech == speechResult.reason) {
            NSLog(@"Speech recognition result received: %@", speechResult.text);
            [self updateRecognitionResultText:(speechResult.text)];
        } else {
            NSLog(@"There was an error.");
            [self updateRecognitionErrorText:(@"Speech Recognition Error")];
        }
    }
    
    - (void)recognizeFromMicrophone {
        SPXSpeechConfiguration *speechConfig = [[SPXSpeechConfiguration alloc] initWithHost:speechHost subscription:speechKey];
        if (!speechConfig) {
            NSLog(@"Could not load speech config");
            [self updateRecognitionErrorText:(@"Speech Config Error")];
            return;
        }
    
        [self updateRecognitionStatusText:(@"Recognizing...")];
    
        // Create a recognizer that uses the default microphone as audio input.
        SPXSpeechRecognizer* speechRecognizer = [[SPXSpeechRecognizer alloc] init:speechConfig];
        if (!speechRecognizer) {
            NSLog(@"Could not create speech recognizer");
            [self updateRecognitionResultText:(@"Speech Recognition Error")];
            return;
        }
    
        SPXSpeechRecognitionResult *speechResult = [speechRecognizer recognizeOnce];
        if (SPXResultReason_Canceled == speechResult.reason) {
            SPXCancellationDetails *details = [[SPXCancellationDetails alloc] initFromCanceledRecognitionResult:speechResult];
            NSLog(@"Speech recognition was canceled: %@. Did you pass the correct key/region combination?", details.errorDetails);
            [self updateRecognitionErrorText:([NSString stringWithFormat:@"Canceled: %@", details.errorDetails ])];
        } else if (SPXResultReason_RecognizedSpeech == speechResult.reason) {
            NSLog(@"Speech recognition result received: %@", speechResult.text);
            [self updateRecognitionResultText:(speechResult.text)];
        } else {
            NSLog(@"There was an error.");
            [self updateRecognitionErrorText:(@"Speech Recognition Error")];
        }
    }
    
    - (void)updateRecognitionResultText:(NSString *) resultText {
        dispatch_async(dispatch_get_main_queue(), ^{
            self.recognitionResultLabel.textColor = UIColor.blackColor;
            self.recognitionResultLabel.text = resultText;
        });
    }
    
    - (void)updateRecognitionErrorText:(NSString *) errorText {
        dispatch_async(dispatch_get_main_queue(), ^{
            self.recognitionResultLabel.textColor = UIColor.redColor;
            self.recognitionResultLabel.text = errorText;
        });
    }
    
    - (void)updateRecognitionStatusText:(NSString *) statusText {
        dispatch_async(dispatch_get_main_queue(), ^{
            self.recognitionResultLabel.textColor = UIColor.grayColor;
            self.recognitionResultLabel.text = statusText;
        });
    }
    
    @end
    
  3. Replace the string YourSubscriptionKey with your subscription key.

  4. Replace the string YourServiceRegion with the region associated with your subscription. For example, use chinaeast2 for the trial subscription. (A region-based configuration alternative is sketched after this list.)

  5. Add the request for microphone access. Right-click the Info.plist entry of the project tree, and select Open As > Source Code. Add the following lines into the <dict> section, and then save the file. (A sketch that requests microphone permission up front also follows this list.)

    <key>NSMicrophoneUsageDescription</key>
    <string>Need microphone access for speech recognition from microphone.</string>
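
The sample calls the blocking recognizeOnce from a background queue to keep the UI responsive. If you prefer a nonblocking call, the recognizer also exposes an asynchronous variant. A minimal sketch of a file-based method you could add to ViewController, assuming the recognizeOnceAsync: method is available in your SDK version (error handling mirrors the blocking version above):

- (void)recognizeFromFileAsync {
    NSString *weatherFile = [[NSBundle mainBundle] pathForResource:@"whatstheweatherlike" ofType:@"wav"];
    if (!weatherFile) {
        [self updateRecognitionErrorText:@"Cannot find audio file"];
        return;
    }

    SPXAudioConfiguration *audioConfig = [[SPXAudioConfiguration alloc] initWithWavFileInput:weatherFile];
    SPXSpeechConfiguration *speechConfig = [[SPXSpeechConfiguration alloc] initWithHost:speechHost subscription:speechKey];
    SPXSpeechRecognizer *speechRecognizer = [[SPXSpeechRecognizer alloc] initWithSpeechConfiguration:speechConfig audioConfiguration:audioConfig];

    [self updateRecognitionStatusText:@"Recognizing..."];

    // The handler is invoked once a final result arrives; no background queue is needed.
    [speechRecognizer recognizeOnceAsync:^(SPXSpeechRecognitionResult *result) {
        if (SPXResultReason_RecognizedSpeech == result.reason) {
            [self updateRecognitionResultText:result.text];
        } else {
            [self updateRecognitionErrorText:@"Speech Recognition Error"];
        }
    }];
}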
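
This quickstart configures the Speech service through a host URL, which is why YourServiceRegion appears inside speechHost. The SDK also supports configuring by subscription key and region directly. A minimal sketch, assuming the standard initWithSubscription:region: initializer; note that this quickstart uses the host form to target the azure.cn endpoints, so check which form applies to your subscription:

// Alternative: configure by key and region instead of a host URL.
SPXSpeechConfiguration *speechConfig =
    [[SPXSpeechConfiguration alloc] initWithSubscription:@"YourSubscriptionKey"
                                                  region:@"YourServiceRegion"];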
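
The NSMicrophoneUsageDescription entry only supplies the text that iOS displays in the permission prompt; the prompt itself appears the first time the app opens the microphone. If you want to request access explicitly, for example when the view loads rather than when recognition starts, a minimal sketch using AVFoundation's standard permission API:

#import <AVFoundation/AVFoundation.h>

// Ask for microphone access up front instead of on first use.
[[AVAudioSession sharedInstance] requestRecordPermission:^(BOOL granted) {
    if (!granted) {
        NSLog(@"Microphone permission denied; recognition from the microphone will fail.");
    }
}];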
    

Build and run the sample

  1. Make the debug output visible by selecting View > Debug Area > Activate Console.

  2. Choose either the iOS simulator or an iOS device connected to your development machine as the destination for the app from the list in the Product > Destination menu.

  3. Build and run the example code in the iOS simulator by selecting Product > Run from the menu. You can also select the Play button.

  4. After you select the Recognize (File) button in the app, you should see the contents of the audio file, "What's the weather like?", on the lower part of the screen.

    Screenshot: The simulated iOS app

  5. After you select the Recognize (Microphone) button in the app and say a few words, you should see the text you have spoken on the lower part of the screen.

Next steps