3. AI Speech-to-Text: Flask from Build to Takeoff, Manufactured at Part-Level Granularity

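A note before we start:
The author is a lifelong learner whose career spans three industries: environment, education, and IT.
IT is the field the author is currently deepening. The author believes in professionalism and in uniting knowledge with practice.
This series is a tribute to every down-to-earth web developer. The aim is to write articles that are both highly practical and deep enough to clear obstacles from your path. Corrections and technical discussion are welcome.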

1. AI tool setup

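Speech-to-text is a subfield of AI. This article implements it with the free API provided by Baidu and focuses on developing the web server.
First register an account on Baidu AI and create an application in the application list. Here it is named asr_test; asr stands for automatic speech recognition.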

This yields an AppID, an API Key, and a Secret Key, which are the raw materials for the web server features that follow.

2. Package management

First install the Baidu speech SDK and update requirements.txt. Note that baidu-aip depends on other packages, such as requests and urllib3.

$ pip install baidu-aip
$ pip freeze > requirements.txt

To understand package dependencies better, the pipdeptree package can help:

$ pip install pipdeptree
$ pipdeptree
baidu-aip==2.2.10.0
  - requests [required: Any, installed: 2.21.0]
    - certifi [required: >=2017.4.17, installed: 2018.11.29]
    - chardet [required: >=3.0.2,<3.1.0, installed: 3.0.4]
    - idna [required: >=2.5,<2.9, installed: 2.8]
    - urllib3 [required: >=1.21.1,<1.25, installed: 1.24.1]
Flask==1.0.2
  - click [required: >=5.1, installed: 7.0]
  - itsdangerous [required: >=0.24, installed: 1.1.0]
  - Jinja2 [required: >=2.10, installed: 2.10]
    - MarkupSafe [required: >=0.23, installed: 1.1.0]
  - Werkzeug [required: >=0.14, installed: 0.14.1]
pipdeptree==0.13.1
  - pip [required: >=6.0.0, installed: 18.1]
pytest-cov==2.6.0
  - coverage [required: >=4.4, installed: 4.5.2]
  - pytest [required: >=2.9, installed: 4.0.2]
    - atomicwrites [required: >=1.0, installed: 1.2.1]
    - attrs [required: >=17.4.0, installed: 18.2.0]
    - more-itertools [required: >=4.0.0, installed: 4.3.0]
      - six [required: >=1.0.0,<2.0.0, installed: 1.12.0]
    - pluggy [required: >=0.7, installed: 0.8.0]
    - py [required: >=1.5.0, installed: 1.7.0]
    - setuptools [required: Any, installed: 40.6.3]
    - six [required: >=1.10.0, installed: 1.12.0]
wheel==0.32.3
$ pip freeze > requirements.txt

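pipdeptree shows the dependencies between packages clearly. Besides baidu-aip, the project so far has installed two main packages directly: Flask and pytest-cov. Flask is built on two core packages: Jinja2 provides template rendering, and Werkzeug provides the WSGI toolkit (not uWSGI, which is a separate application server). In pytest-cov, code coverage is implemented by coverage and test running by pytest.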
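
The direct dependencies that pipdeptree prints can also be read from the metadata of installed packages. The sketch below assumes Python 3.8 or later (for `importlib.metadata`); `direct_requirements` is a hypothetical helper name and is not part of pipdeptree.

```python
from importlib.metadata import PackageNotFoundError, requires


def direct_requirements(package_name):
    """Return the requirement strings a package declares, or [] if none or not installed."""
    try:
        declared = requires(package_name)
    except PackageNotFoundError:
        return []
    # requires() returns None when the package declares no dependencies
    return list(declared or [])


if __name__ == '__main__':
    for name in ('baidu-aip', 'Flask', 'pytest-cov'):
        print(name, '->', direct_requirements(name))
```

Unlike pipdeptree, this only lists one level of dependencies and does not show which versions are installed.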

3. Config setup

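Next, build a config system for the parameters used to connect to Baidu AI. It has two parts, public and private. The public part is under git version control and holds parameters that apply to all developers and do not need to be secret, such as debug and the switch for printing database queries. The private part is not under git version control and holds secret parameters, such as keys and salts.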

3.1 public config

Create config.py in the project root to load the basic runtime parameters. The directory structure and the parameters are as follows:

- FlaskTemplate
    - .circleci
        - config.yml
    - server
        - __init__.py
        - core.py
    - tests
        - unit_tests
            - __init__.py
            - test_index.py
        - __init__.py
    - venv
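    - .gitignore  # files excluded from git version control
    - config.py  # config parameters under git version control
    - README.md  # readme generated when the git repository was created
    - requirements.txt  # project package installation list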

# -*- coding: utf8 -*-

DEBUG = False  # debug mode off
SQLALCHEMY_ECHO = False  # no database echo output; used later when a database is connected

3.2 private config

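Create an instance directory in the project root, because Flask looks for instance-relative config files under instance/ by default. Under instance, create four Python files: default, development, production, and staging. default is generally used for local development, development for the dev environment, production for the live environment, and staging for testing before a new release.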

- FlaskTemplate
    - .circleci
        - config.yml
    - instance
        - __init__.py
        - default.py  # local config
        - development.py  # dev config
        - production.py  # prod config
        - staging.py  # staging config
    - server
        - __init__.py
        - core.py
    - tests
        - unit_tests
            - __init__.py
            - test_index.py
        - __init__.py
    - venv
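    - .gitignore  # files excluded from git version control
    - config.py  # config parameters under git version control
    - README.md  # readme generated when the git repository was created
    - requirements.txt  # project package installation list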

# default.py

# -*- coding: utf8 -*-

# Default values, to be used for all environments or overridden by individual environments.
# An example might be setting DEBUG = False in config/default.py and DEBUG = True in config/development.py.

DEBUG = True
SQLALCHEMY_ECHO = True

# Baidu Automatic Speech Recognition
APP_ID = "xxx"
API_KEY = "xxx"
SECRET_KEY = "xxx"
# The three parameters above come from the Baidu AI application created in section 1
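
One way the four private config files could be selected at startup is through an environment variable. The sketch below is an assumption rather than the article's code: the variable name `APP_ENV` and the helper `private_config_filename` are hypothetical, and unknown values fall back to `default.py`.

```python
import os

# environment name -> private config file under instance/
CONFIG_FILES = {
    'default': 'default.py',
    'development': 'development.py',
    'staging': 'staging.py',
    'production': 'production.py',
}


def private_config_filename(environ=None):
    """Pick the private config file name from the (hypothetical) APP_ENV variable."""
    environ = os.environ if environ is None else environ
    env_name = environ.get('APP_ENV', 'default').strip().lower()
    return CONFIG_FILES.get(env_name, 'default.py')
```

The app factory could then pass `private_config_filename()` to `app.config.from_pyfile` instead of hard-coding `'default.py'`.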

4. Updating project startup

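With config in place, the way the project starts also needs updating: first, the configuration must be bound to the running instance; second, startup must adapt to different environments and launch methods. So the former core.py is split: the startup code moves to run.py in the project root, while the configuration code stays in core.py, which now binds config to the app instance.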

- FlaskTemplate
    - .circleci
        - config.yml
    - instance
        - __init__.py
        - default.py  # local config
        - development.py  # dev config
        - production.py  # prod config
        - staging.py  # staging config
    - server
        - __init__.py
        - core.py
    - tests
        - unit_tests
            - __init__.py
            - test_index.py
        - __init__.py
    - venv
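    - .gitignore  # files excluded from git version control
    - config.py  # config parameters under git version control
    - README.md  # readme generated when the git repository was created
    - requirements.txt  # project package installation list
    - run.py  # project entry point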

# core.py

# -*- coding: utf8 -*-

from flask import Flask


def create_app():
    # instance_relative_config defaults to False; when True, relative config filenames
    # are resolved against instance/, so the private config loaded from there can override the public one
    app = Flask(__name__, instance_relative_config=True)
    # load public default configuration
    app.config.from_object('config')
    # load private default configuration
    app.config.from_pyfile('default.py')

    @app.route('/')
    def index():
        return "<h1>This is an index page.</h1>"

    return app

# run.py

# -*- coding: utf8 -*-

from server.core import create_app

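app = create_app()  # created at module level so a server can import and start the project later

# usually the entry point for starting the project in local development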
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
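
The module-level `app` matters because a WSGI server imports the module and calls that object once per request. A Flask application object is such a WSGI callable. The framework-free stand-in below shows the calling convention with the standard library only; the function `app` here and the helper `call_wsgi` are illustrative and are not the project's code.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A stand-in WSGI callable; a Flask app object is called the same way."""
    body = b'This is an index page.'
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]


def call_wsgi(application, path='/'):
    """Invoke a WSGI callable the way a server would and collect the response."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal valid WSGI environ
    environ['PATH_INFO'] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = headers

    chunks = application(environ, start_response)
    return captured['status'], b''.join(chunks)
```

A production server would be pointed at the importable object (for this project, the `app` in `run.py`) rather than at `app.run()`.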

Start the project to check the settings:

Connected to pydev debugger (build 182.4505.26)
 * Serving Flask app "server.core" (lazy loading)
 * Environment: production
   WARNING: Do not use the development server in a production environment.
   Use a production WSGI server instead.
 * Debug mode: on
 * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
 * Restarting with stat
pydev debugger: process 68382 is connecting

 * Debugger is active!
 * Debugger PIN: 335-533-553

Debug mode is on, as the configuration intends: the public config sets DEBUG to False, but the private config, loaded afterwards, sets DEBUG to True, so the override works.
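
The override order can be illustrated without Flask. `Config.from_object` copies only upper-case attributes, and a later load overwrites keys set by an earlier one. The sketch below imitates that with plain objects; `load_uppercase` and the two namespaces are hypothetical stand-ins for `config.py` and `instance/default.py`.

```python
from types import SimpleNamespace


def load_uppercase(config, source):
    """Copy upper-case attributes of source into config, overwriting existing keys."""
    for key in dir(source):
        if key.isupper():
            config[key] = getattr(source, key)
    return config


# stand-ins for config.py (public) and instance/default.py (private)
public = SimpleNamespace(DEBUG=False, SQLALCHEMY_ECHO=False, helper='ignored')
private = SimpleNamespace(DEBUG=True, SQLALCHEMY_ECHO=True, APP_ID='xxx')

config = {}
load_uppercase(config, public)   # loaded first
load_uppercase(config, private)  # loaded second, so its values win
```

Swapping the two `load_uppercase` calls would leave `DEBUG` as `False`, which is why the private file is loaded last.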

5. instance_relative_config source walkthrough

Flask's ability to load config from the instance folder is controlled by the instance_relative_config parameter passed when the app instance is created.

flask/app.py

class Flask(_PackageBoundObject):
    """
    :param instance_relative_config: if set to ``True`` relative filenames for loading the config are assumed to be relative to the instance path instead of the application root.
    """
    # The docstring above explains that instance_relative_config decides where config files are loaded from; if True, the private config under instance/ is loaded

    # instance_relative_config defaults to False, i.e. relative config filenames are resolved against the application root path by default.
    def __init__(
        self,
        import_name,
        static_url_path=None,
        static_folder='static',
        static_host=None,
        host_matching=False,
        subdomain_matching=False,
        template_folder='templates',
        instance_path=None,
        instance_relative_config=False,
        root_path=None
    ):
        #: The configuration dictionary as :class:`Config`.  This behaves
        #: exactly like a regular dictionary but supports additional methods
        #: to load a config from files.
        self.config = self.make_config(instance_relative_config)
    
    def make_config(self, instance_relative=False):
        """Used to create the config attribute by the Flask constructor.
        The `instance_relative` parameter is passed in from the constructor
        of Flask (there named `instance_relative_config`) and indicates if
        the config should be relative to the instance path or the root path
        of the application.

        .. versionadded:: 0.8
        """
        root_path = self.root_path
        if instance_relative:  # choose the root path used for loading config
            root_path = self.instance_path
        defaults = dict(self.default_config)
        defaults['ENV'] = get_env()  # set the ENV parameter
        defaults['DEBUG'] = get_debug_flag()
        return self.config_class(root_path, defaults)

The above is a simplified version of the relevant source; see the Flask source for all of the app's parameters.
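
The effect of that branch is easiest to see in terms of paths: `from_pyfile` joins the chosen root with the relative filename. The sketch below reproduces only that join; `resolve_config_path` and the example directories are hypothetical.

```python
import os


def resolve_config_path(app_root, instance_path, filename, instance_relative=False):
    """Return where a relative config filename would be looked up."""
    root_path = instance_path if instance_relative else app_root
    return os.path.join(root_path, filename)


if __name__ == '__main__':
    print(resolve_config_path('/srv/app/server', '/srv/app/instance', 'default.py', True))
    print(resolve_config_path('/srv/app/server', '/srv/app/instance', 'default.py', False))
```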

6. Blueprint setup

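Next comes the actual speech-to-text implementation. With future feature growth in mind, Blueprints are a better fit for organizing the code. First update core.py to load the blueprints: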

# core.py

# -*- coding: utf8 -*-

from flask import Flask


def create_app():
    app = Flask(__name__, instance_relative_config=True)
    # load public default configuration
    app.config.from_object('config')
    # load private default configuration
    app.config.from_pyfile('default.py')

    setup_blueprints(app)  # register blueprints

    @app.route('/')
    def index():
        return "<h1>This is an index page.</h1>"

    return app


def setup_blueprints(app):
    from server.AI.view import blueprint as AI  # MVC layout: the AI feature gets its own view and controller under server

    # list of blueprint settings
    blueprints = [
        {'handler': AI, 'url_prefix': '/AI'}
    ]

    # register every blueprint of the service on the app instance
    for bp in blueprints:
        app.register_blueprint(bp['handler'], url_prefix=bp['url_prefix'])
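
To see how `url_prefix` combines with a blueprint's own routes, here is a small self-contained sketch. It assumes Flask is installed; the blueprint `demo`, its `/ping` route, and `create_demo_app` are hypothetical names used only for illustration.

```python
from flask import Blueprint, Flask

demo = Blueprint('demo', __name__)  # hypothetical blueprint for illustration


@demo.route('/ping')
def ping():
    return 'pong'


def create_demo_app():
    app = Flask(__name__)
    # same list-of-dicts pattern as setup_blueprints
    blueprints = [
        {'handler': demo, 'url_prefix': '/AI'},
    ]
    for bp in blueprints:
        app.register_blueprint(bp['handler'], url_prefix=bp['url_prefix'])
    return app
```

With this registration the route is served at `/AI/ping`, while a request to `/ping` is not matched.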

7. AI speech-to-text

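With the configuration groundwork laid, the feature has structural support and can be developed quickly. Under server, create an AI directory to hold all AI-related features. Inside it, create view.py for the views and controller.py for the logic. To keep the implementation simple, no model layer is introduced for now.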

- FlaskTemplate
    - .circleci
        - config.yml
    - instance
        - __init__.py
        - default.py  # local config
        - development.py  # dev config
        - production.py  # prod config
        - staging.py  # staging config
    - server
        - AI
            - __init__.py
            - controller.py
            - view.py
        - __init__.py
        - core.py
    - tests
        - unit_tests
            - __init__.py
            - test_index.py
        - __init__.py
    - venv
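    - .gitignore  # files excluded from git version control
    - config.py  # config parameters under git version control
    - README.md  # readme generated when the git repository was created
    - requirements.txt  # project package installation list
    - run.py  # project entry point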

# view.py

# -*- coding: utf8 -*-

from flask import Blueprint

from server.AI import controller

blueprint = Blueprint('AI', __name__)  # the AI blueprint instance; core.py imports it and registers it on the app


@blueprint.route('/asr')
def asr():
    rsp = controller.asr().pop()  # call the controller logic; pop() takes the last entry of the result list

    return rsp

# controller.py

# -*- coding: utf8 -*-

from flask import current_app  # proxy to the app handling the current request

from aip import AipSpeech  # Baidu AI speech SDK client

from uploads import uploads_path


def read_file(path):
    """
    Read the content of a local audio file.
    :param path: path of the file to read
    :return: file content as bytes
    """
    with open(path, 'rb') as f:
        return f.read()


def asr():
    """
    automatic speech recognition
    :return: the 'result' entry of the Baidu response
    """
    # Baidu Cloud AI, get config params
    app_id = current_app.config['APP_ID']
    api_key = current_app.config['API_KEY']
    secret_key = current_app.config['SECRET_KEY']

    # create the client that connects to Baidu AI
    client = AipSpeech(app_id, api_key, secret_key)
    # send the audio to the AI model and get the speech-to-text result
    rsp = client.asr(read_file(f'{uploads_path}/stock.wav'), 'wav', 16000, {'dev_pid': 1536})

    return rsp['result']
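
Note that `rsp['result']` raises `KeyError` if the response has no `result` key, which is what one would expect when recognition fails. A hedged sketch of a more defensive extraction follows. It assumes the response dict reports status in `err_no` and `err_msg` fields (confirm the exact fields against the Baidu ASR documentation); `extract_text` and `AsrError` are hypothetical names.

```python
class AsrError(Exception):
    """Raised when the speech service response contains no usable result."""


def extract_text(rsp):
    """Return the first recognized text from a response dict, or raise AsrError."""
    # assumption: err_no == 0 means success and 'result' is a list of candidate strings
    err_no = rsp.get('err_no', 0)
    results = rsp.get('result') or []
    if err_no != 0 or not results:
        raise AsrError(f"ASR failed: err_no={err_no}, err_msg={rsp.get('err_msg')}")
    return results[0]
```

The view could catch `AsrError` and return an error page or status code instead of letting the exception surface as a 500 response.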

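This installment does not cover recording audio in the front end. It first implements speech-to-text for a local recording; the audio file lives under uploads in the project root, laid out as follows: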

- FlaskTemplate
    - .circleci
        - config.yml
    - instance
        - __init__.py
        - default.py  # local config
        - development.py  # dev config
        - production.py  # prod config
        - staging.py  # staging config
    - server
        - AI
            - __init__.py
            - controller.py
            - view.py
        - __init__.py
        - core.py
    - tests
        - unit_tests
            - __init__.py
            - test_index.py
        - __init__.py
    - uploads
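        - __init__.py  # holds the variable for this directory's path
        - stock.wav  # local demo file for speech-to-text
    - venv
    - .gitignore  # files excluded from git version control
    - config.py  # config parameters under git version control
    - README.md  # readme generated when the git repository was created
    - requirements.txt  # project package installation list
    - run.py  # project entry point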
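
The `asr` call in controller.py declares the audio as 16000 Hz WAV, so the demo file should actually match that. The sketch below uses the standard `wave` module to check a file before sending it; `check_wav` is a hypothetical helper, and the mono, 16-bit expectation is an assumption to confirm against the Baidu ASR documentation.

```python
import wave


def check_wav(path, expected_rate=16000):
    """Return a list of problems found in a WAV file; an empty list means it looks fine."""
    problems = []
    with wave.open(path, 'rb') as wav:
        if wav.getframerate() != expected_rate:
            problems.append(f'sample rate is {wav.getframerate()}, expected {expected_rate}')
        if wav.getnchannels() != 1:
            problems.append(f'{wav.getnchannels()} channels, expected mono')
        if wav.getsampwidth() != 2:
            problems.append(f'{wav.getsampwidth() * 8}-bit samples, expected 16-bit')
    return problems
```

Running this against uploads/stock.wav before calling the service can rule out format mismatches as the cause of a failed recognition.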

Because using the file requires its path, uploads/__init__.py defines a variable holding the current directory:

# -*- coding: utf8 -*-

import os

# path of the current directory
uploads_path = os.path.abspath(os.path.dirname(__file__))

8. Result

With the development above, the project can now automatically transcribe uploads/stock.wav under the project root into text.