Hello! AudioStreamBasicDescription

How Apple defines audio

In Core Audio, the following definitions apply:

An audio stream is a continuous series of data that represents a sound, such as a song.

A channel is a discrete track of monophonic audio. A monophonic stream has one channel; a stereo stream has two channels.

A sample is a single numerical value for a single audio channel in an audio stream.

A frame is a collection of time-coincident samples. For instance, a linear PCM stereo sound file has two samples per frame, one for the left channel and one for the right channel.

A packet is a collection of one or more contiguous frames. A packet defines the smallest meaningful set of frames for a given audio data format, and is the smallest data unit for which time can be measured. In linear PCM audio, a packet holds a single frame. In compressed formats, it typically holds more; in some formats, the number of frames per packet varies.

The sample rate for a stream is the number of frames per second of uncompressed (or, for compressed formats, the equivalent in decompressed) audio.

The AudioStreamBasicDescription structure

struct AudioStreamBasicDescription
{
    Float64             mSampleRate;
    AudioFormatID       mFormatID;
    AudioFormatFlags    mFormatFlags;
    UInt32              mBytesPerPacket;
    UInt32              mFramesPerPacket;
    UInt32              mBytesPerFrame;
    UInt32              mChannelsPerFrame;
    UInt32              mBitsPerChannel;
    UInt32              mReserved;
};
typedef struct AudioStreamBasicDescription  AudioStreamBasicDescription;

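For PCM audio, the sampling frequency is called the sample rate.
Each sampling instant yields one sample per channel.
The samples taken at the same instant together form a frame.
One or more consecutive frames together form a packet.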

What each AudioStreamBasicDescription field means

mSampleRate

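The sample rate: how many times per second the recording device samples the sound signal, i.e. the number of frames per second. Common sample rates include 16000, 32000, and 44100 Hz.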

mFormatID

The format of the audio data, such as linear PCM or AAC. Some of the format IDs:

kAudioFormatLinearPCM               = 'lpcm',
kAudioFormatMPEG4AAC                = 'aac ',
kAudioFormatMPEGLayer3              = '.mp3',

mFormatFlags

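Format-specific flags that describe the layout of the sample data (for example, the samples held in an AudioBufferList):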

kAudioFormatFlagIsFloat                     = (1U << 0),     // 0x1
kAudioFormatFlagIsBigEndian                 = (1U << 1),     // 0x2
kAudioFormatFlagIsSignedInteger             = (1U << 2),     // 0x4
kAudioFormatFlagIsPacked                    = (1U << 3),     // 0x8
kAudioFormatFlagIsAlignedHigh               = (1U << 4),     // 0x10
kAudioFormatFlagIsNonInterleaved            = (1U << 5),     // 0x20
kAudioFormatFlagIsNonMixable                = (1U << 6),     // 0x40

kAudioFormatFlagIsFloat

Set if the samples are floating point; if clear, they are integers.

kAudioFormatFlagIsBigEndian

Set if the samples are big-endian; if clear, they are little-endian.

kAudioFormatFlagIsSignedInteger

Set for signed integer samples; if clear, integer samples are unsigned. This flag only applies when kAudioFormatFlagIsFloat is clear.

kAudioFormatFlagIsPacked

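Set if the sample bits occupy all of the bits available to the channel; clear if they are aligned high or low within the channel (see kAudioFormatFlagIsAlignedHigh).
Even when this flag is clear, it is implied to be set if ((mBitsPerChannel / 8) * mChannelsPerFrame) == mBytesPerFrame.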

kAudioFormatFlagIsNonInterleaved

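Set if the data is non-interleaved (planar, one buffer per channel); clear if it is interleaved.

Audio data is laid out either interleaved or planar. For two-channel (stereo) audio, the two possible layouts are:

Interleaved: LRLRLR...
Planar:

Plane 1: LLLLLL...
Plane 2: RRRRRR...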

mChannelsPerFrame

The number of channels in each frame: 1 for mono, 2 for stereo. This value must not be 0.

mBitsPerChannel

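The number of bits in one sample of a single channel (1 byte = 8 bits). Typical values are 8, 16, and 32; set it to 0 for compressed formats.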

mBytesPerFrame

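The number of bytes in one frame of audio data (set it to 0 for compressed formats).
For packed linear PCM it is calculated as:

Interleaved: mBytesPerFrame = mBitsPerChannel / 8 * mChannelsPerFrame
Planar: mBytesPerFrame = mBitsPerChannel / 8 (each buffer holds a single channel)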

mFramesPerPacket

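The number of frames in each packet. For uncompressed audio the value is 1. For compressed formats with a fixed number of frames per packet it is a larger fixed number, such as 1024 for AAC. For formats with a variable number of frames per packet (such as Ogg Vorbis), set it to 0.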

mBytesPerPacket

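The number of bytes in each packet. For linear PCM, mBytesPerPacket = mBytesPerFrame * mFramesPerPacket. For formats whose packets vary in size (such as AAC), set it to 0; the size of each packet is then described separately by an AudioStreamPacketDescription.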

mReserved

Pads the structure out to force an even 8-byte alignment. Must be set to 0.