How Apple defines audio
In Core Audio, the following definitions apply:
- An audio stream is a continuous series of data that represents a sound, such as a song.
- A channel is a discrete track of monophonic audio. A monophonic stream has one channel; a stereo stream has two channels.
- A sample is a single numerical value for a single audio channel in an audio stream.
- A frame is a collection of time-coincident samples. For instance, a linear PCM stereo sound file has two samples per frame, one for the left channel and one for the right channel.
- A packet is a collection of one or more contiguous frames. A packet defines the smallest meaningful set of frames for a given audio data format, and is the smallest data unit for which time can be measured. In linear PCM audio, a packet holds a single frame. In compressed formats, it typically holds more; in some formats, the number of frames per packet varies.
- The sample rate for a stream is the number of frames per second of uncompressed (or, for compressed formats, the equivalent in decompressed) audio.
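To show how these units relate, the sketch below counts samples and packets for a given number of frames. The function names are hypothetical. Rounding a partly filled final packet up to a whole packet is a simplifying assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Samples in `frames` frames: a frame holds one sample per channel. */
uint64_t samples_in_frames(uint64_t frames, uint32_t channels) {
    assert(channels > 0);
    return frames * channels;
}

/* Packets needed for `frames` frames when every packet holds
   `frames_per_packet` frames (1 for linear PCM). Assumes the last
   packet may be only partly filled, so the count is rounded up. */
uint64_t packets_for_frames(uint64_t frames, uint32_t frames_per_packet) {
    assert(frames_per_packet > 0); /* a variable frame count is not handled here */
    return (frames + frames_per_packet - 1) / frames_per_packet;
}
```

For one second of 44100 Hz stereo linear PCM, this gives 88200 samples and 44100 packets. If the same frames are grouped 1024 per packet, they need 44 packets.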
The AudioStreamBasicDescription structure
```c
struct AudioStreamBasicDescription
{
    Float64          mSampleRate;
    AudioFormatID    mFormatID;
    AudioFormatFlags mFormatFlags;
    UInt32           mBytesPerPacket;
    UInt32           mFramesPerPacket;
    UInt32           mBytesPerFrame;
    UInt32           mChannelsPerFrame;
    UInt32           mBitsPerChannel;
    UInt32           mReserved;
};
typedef struct AudioStreamBasicDescription AudioStreamBasicDescription;
```
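The sketch below fills in the structure for interleaved, 16-bit, signed-integer linear PCM. It uses a hypothetical mirror struct (`ASBDMirror`) so it compiles on any platform. The mirror maps Float64 to `double` and the UInt32-based types to `uint32_t`. It also defines local copies of the constant values. On macOS or iOS you would include the real CoreAudio header and drop these stand-ins. The helper `make_lpcm_asbd` is also illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for AudioStreamBasicDescription (CoreAudioTypes.h). */
typedef struct {
    double   mSampleRate;
    uint32_t mFormatID;
    uint32_t mFormatFlags;
    uint32_t mBytesPerPacket;
    uint32_t mFramesPerPacket;
    uint32_t mBytesPerFrame;
    uint32_t mChannelsPerFrame;
    uint32_t mBitsPerChannel;
    uint32_t mReserved;
} ASBDMirror;

/* Local copies of CoreAudioTypes.h values; omit them when using the real header. */
#define kAudioFormatLinearPCM           0x6C70636DU /* 'lpcm' */
#define kAudioFormatFlagIsSignedInteger (1U << 2)
#define kAudioFormatFlagIsPacked        (1U << 3)

/* Describe interleaved, packed, signed-integer linear PCM. */
ASBDMirror make_lpcm_asbd(double sample_rate, uint32_t channels, uint32_t bits) {
    assert(channels > 0 && bits % 8 == 0);
    ASBDMirror d;
    memset(&d, 0, sizeof d);                      /* also sets mReserved to 0 */
    d.mSampleRate       = sample_rate;
    d.mFormatID         = kAudioFormatLinearPCM;
    d.mFormatFlags      = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
    d.mChannelsPerFrame = channels;
    d.mBitsPerChannel   = bits;
    d.mBytesPerFrame    = (bits / 8) * channels;  /* interleaved: all channels per frame */
    d.mFramesPerPacket  = 1;                      /* uncompressed: one frame per packet */
    d.mBytesPerPacket   = d.mBytesPerFrame * d.mFramesPerPacket;
    return d;
}
```

For example, `make_lpcm_asbd(44100.0, 2, 16)` yields 4 bytes per frame, 4 bytes per packet, and flags of 0xC.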
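In PCM, the rate at which the signal is sampled is called the sample rate.
Each sampling instant yields several sample values, one per channel.
The sample values captured at one sampling instant together form a frame.
One or more frames grouped together form a packet.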
What each AudioStreamBasicDescription field means
mSampleRate
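- Sample rate: the number of times per second the recording device samples the sound signal, that is, the number of frames per second. Common sample rates include 16000, 32000, and 44100 Hz.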
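Because the sample rate counts frames per second, converting between frame counts and time is a single division or multiplication. Below is a minimal sketch, assuming a constant sample rate. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds of audio represented by `frames` frames. */
double frames_to_seconds(uint64_t frames, double sample_rate) {
    assert(sample_rate > 0.0);
    return (double)frames / sample_rate;
}

/* Whole frames that fit in `ms` milliseconds (rounded down). */
uint64_t ms_to_frames(double ms, double sample_rate) {
    assert(sample_rate > 0.0);
    return (uint64_t)(ms * sample_rate / 1000.0);
}
```

At 16000 Hz, a 20 ms buffer holds 320 frames. At 44100 Hz, a 10 ms buffer holds 441 frames.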
mFormatID
The format of the audio data, such as linear PCM or AAC. Common values include:
```c
kAudioFormatLinearPCM  = 'lpcm',
kAudioFormatMPEG4AAC   = 'aac ',
kAudioFormatMPEGLayer3 = '.mp3',
```
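Format IDs such as 'lpcm' are four-character codes written as C multi-character constants. The C standard leaves the value of such a constant implementation-defined. Clang and GCC place the first character in the most significant byte. The sketch below builds and decodes these codes with explicit shifts, so the result does not depend on the compiler. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four characters into a 32-bit format ID, first char in the high byte. */
uint32_t fourcc(const char *code) {
    assert(code[0] && code[1] && code[2] && code[3]); /* need four non-NUL chars */
    return ((uint32_t)(unsigned char)code[0] << 24) |
           ((uint32_t)(unsigned char)code[1] << 16) |
           ((uint32_t)(unsigned char)code[2] << 8)  |
            (uint32_t)(unsigned char)code[3];
}

/* Unpack a format ID into four characters plus a terminating NUL. */
void fourcc_to_string(uint32_t id, char out[5]) {
    out[0] = (char)(id >> 24);
    out[1] = (char)(id >> 16);
    out[2] = (char)(id >> 8);
    out[3] = (char)id;
    out[4] = '\0';
}
```

For example, `fourcc("lpcm")` is 0x6C70636D and `fourcc("aac ")` is 0x61616320. Note the trailing space in 'aac '.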
mFormatFlags
Flags that describe the format of the sample data in the AudioBufferList. The available flags are:
```c
kAudioFormatFlagIsFloat          = (1U << 0), // 0x1
kAudioFormatFlagIsBigEndian      = (1U << 1), // 0x2
kAudioFormatFlagIsSignedInteger  = (1U << 2), // 0x4
kAudioFormatFlagIsPacked         = (1U << 3), // 0x8
kAudioFormatFlagIsAlignedHigh    = (1U << 4), // 0x10
kAudioFormatFlagIsNonInterleaved = (1U << 5), // 0x20
kAudioFormatFlagIsNonMixable     = (1U << 6), // 0x40
```
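The flags are bit masks: a format combines them with bitwise OR, and code tests them with bitwise AND. The sketch below uses local copies of the values listed above (the `Flag*` names are hypothetical) to summarize a linear PCM flag word. It checks the float flag before the signed-integer flag, because the signed-integer flag only applies to integer samples.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Local copies of the flag values listed above (CoreAudioTypes.h). */
enum {
    FlagIsFloat          = 1U << 0,
    FlagIsBigEndian      = 1U << 1,
    FlagIsSignedInteger  = 1U << 2,
    FlagIsPacked         = 1U << 3,
    FlagIsAlignedHigh    = 1U << 4,
    FlagIsNonInterleaved = 1U << 5,
    FlagIsNonMixable     = 1U << 6,
};

/* Write a short summary of a linear PCM flag word into `buf`. */
void describe_lpcm_flags(uint32_t flags, char *buf, size_t len) {
    assert(buf != NULL && len > 0);
    const char *kind   = (flags & FlagIsFloat)         ? "float"
                       : (flags & FlagIsSignedInteger) ? "signed int"
                                                       : "unsigned int";
    const char *endian = (flags & FlagIsBigEndian)      ? "big-endian" : "little-endian";
    const char *layout = (flags & FlagIsNonInterleaved) ? "non-interleaved" : "interleaved";
    snprintf(buf, len, "%s, %s, %s", kind, endian, layout);
}
```

Signed, packed integers combine to 0xC and describe as "signed int, little-endian, interleaved". Packed, non-interleaved floats combine to 0x29.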
kAudioFormatFlagIsFloat
Set if the samples are floating point; if not set, they are integers.
kAudioFormatFlagIsBigEndian
Set if the data is big-endian; if not set, it is little-endian.
kAudioFormatFlagIsSignedInteger
Set if integer samples are signed; if not set, they are unsigned. This flag only applies when kAudioFormatFlagIsFloat is clear.
kAudioFormatFlagIsPacked
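Set if the sample bits occupy all of the bits available to the channel. If it is not set, the sample bits are aligned high or low within the channel (see kAudioFormatFlagIsAlignedHigh).
Even when this flag is not set, it is implied whenever ((mBitsPerChannel / 8) * mChannelsPerFrame) == mBytesPerFrame.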
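The implied-packing rule can be written as a small predicate. This is a sketch with a hypothetical function name. It assumes an interleaved layout, where mBytesPerFrame covers every channel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FLAG_IS_PACKED (1U << 3) /* local copy of kAudioFormatFlagIsPacked */

/* True if the samples should be treated as packed: either the flag is set,
   or the sizes leave no room for padding bits in an interleaved frame. */
bool lpcm_is_packed(uint32_t flags, uint32_t bits_per_channel,
                    uint32_t channels_per_frame, uint32_t bytes_per_frame) {
    assert(channels_per_frame > 0);
    if (flags & FLAG_IS_PACKED)
        return true;
    return (bits_per_channel / 8) * channels_per_frame == bytes_per_frame;
}
```

Consider 24-bit stereo samples stored in 4-byte containers (mBitsPerChannel = 24, mBytesPerFrame = 8). These do not satisfy the rule, so they are not packed and need alignment information. 16-bit stereo with 4 bytes per frame is implicitly packed.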
kAudioFormatFlagIsNonInterleaved
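Set if the data is non-interleaved (planar); if not set, the data is interleaved.
Audio data is laid out either interleaved or planar. For two-channel audio, the two possible layouts are:
- Interleaved: LRLRLR...
- Planar:
  - Plane 1: LLLLLL...
  - Plane 2: RRRRRR...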
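Below is a minimal sketch of converting 16-bit stereo between the two layouts. The function names are hypothetical. In Core Audio, non-interleaved data is usually carried in an AudioBufferList with one AudioBuffer per channel. Those per-channel buffers correspond to the `left` and `right` arrays here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split interleaved stereo (LRLR...) into two planes (LL..., RR...). */
void deinterleave_stereo(const int16_t *interleaved, size_t frames,
                         int16_t *left, int16_t *right) {
    assert(interleaved && left && right);
    for (size_t i = 0; i < frames; ++i) {
        left[i]  = interleaved[2 * i];     /* L sample of frame i */
        right[i] = interleaved[2 * i + 1]; /* R sample of frame i */
    }
}

/* Inverse: merge two planes back into one interleaved buffer. */
void interleave_stereo(const int16_t *left, const int16_t *right,
                       size_t frames, int16_t *interleaved) {
    assert(interleaved && left && right);
    for (size_t i = 0; i < frames; ++i) {
        interleaved[2 * i]     = left[i];
        interleaved[2 * i + 1] = right[i];
    }
}
```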
mChannelsPerFrame
The number of channels in each frame: 1 for mono, 2 for stereo. This value must not be 0.
mBitsPerChannel
The number of bits in one sample of one channel (1 byte = 8 bits). Typical values are 8, 16, and 32. Set it to 0 for compressed formats.
mBytesPerFrame
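The number of bytes in one frame of audio data. It is 0 for compressed formats. For uncompressed data it is computed as:
- Interleaved: mBytesPerFrame = mBitsPerChannel / 8 * mChannelsPerFrame
- Planar: mBytesPerFrame = mBitsPerChannel / 8, because each buffer holds only one channel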
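The two formulas fit in one helper. This is a sketch for uncompressed PCM with a hypothetical function name. It assumes each sample fills whole bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* mBytesPerFrame for uncompressed PCM.
   Interleaved: one frame carries a sample for every channel.
   Non-interleaved: each buffer carries one channel, so a frame in a
   buffer is a single sample. */
uint32_t lpcm_bytes_per_frame(uint32_t bits_per_channel,
                              uint32_t channels_per_frame,
                              bool non_interleaved) {
    assert(channels_per_frame > 0 && bits_per_channel % 8 == 0);
    uint32_t bytes_per_sample = bits_per_channel / 8;
    return non_interleaved ? bytes_per_sample
                           : bytes_per_sample * channels_per_frame;
}
```

For 16-bit stereo this gives 4 bytes per frame interleaved and 2 bytes per frame in each planar buffer.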
mFramesPerPacket
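The number of frames in one packet. For uncompressed audio the value is 1. For variable-bit-rate formats it is a larger fixed number, such as 1024 for AAC. For formats whose packets hold a variable number of frames, such as Ogg Vorbis, set it to 0.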
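To show what a fixed frames-per-packet value means in time, the sketch below computes packet duration and packets per second. The function names are hypothetical. At 44100 Hz, an AAC packet of 1024 frames lasts 1024 / 44100 s, about 23.2 ms.

```c
#include <assert.h>

/* Milliseconds of audio in one packet. */
double packet_duration_ms(unsigned frames_per_packet, double sample_rate) {
    assert(frames_per_packet > 0 && sample_rate > 0.0);
    return 1000.0 * frames_per_packet / sample_rate;
}

/* Packets per second for a format with a fixed number of frames per packet. */
double packets_per_second(unsigned frames_per_packet, double sample_rate) {
    assert(frames_per_packet > 0 && sample_rate > 0.0);
    return sample_rate / frames_per_packet;
}
```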
mBytesPerPacket
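The number of bytes in one packet. For formats with a constant packet size, mBytesPerPacket = mBytesPerFrame * mFramesPerPacket. For formats with variable packet sizes, set it to 0 and describe each packet with an AudioStreamPacketDescription.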
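The sketch below covers the constant-size case and sizes a buffer from it. The function names are hypothetical. As in Core Audio, it treats a zero mBytesPerFrame (compressed formats) or a zero mFramesPerPacket (variable frames per packet) as a sign that the packet size is not fixed.

```c
#include <assert.h>
#include <stdint.h>

/* mBytesPerPacket for a constant-size format; returns 0, the
   "variable packet size" marker, when either input is 0. */
uint32_t bytes_per_packet(uint32_t bytes_per_frame, uint32_t frames_per_packet) {
    if (bytes_per_frame == 0 || frames_per_packet == 0)
        return 0;
    return bytes_per_frame * frames_per_packet;
}

/* Buffer size in bytes for `packets` packets of constant-size audio. */
uint64_t buffer_bytes(uint64_t packets, uint32_t bytes_per_packet_value) {
    /* Variable-size formats need per-packet descriptions instead. */
    assert(bytes_per_packet_value > 0);
    return packets * bytes_per_packet_value;
}
```

For 16-bit stereo linear PCM, 1024 packets of 4 bytes need a 4096-byte buffer.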
mReserved
Pads the structure out to force an even 8-byte alignment. Must be set to 0.
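The layout can be checked at compile time. The sketch below uses a hypothetical mirror struct as a stand-in for the CoreAudio header. One 8-byte double plus eight 4-byte fields gives 40 bytes, a multiple of 8, with mReserved at offset 36. The offsets in the comments assume a typical ABI. Zero-initializing the struct is a simple way to meet the must-be-zero rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for AudioStreamBasicDescription (CoreAudioTypes.h). */
typedef struct {
    double   mSampleRate;       /* offset 0  */
    uint32_t mFormatID;         /* offset 8  */
    uint32_t mFormatFlags;      /* offset 12 */
    uint32_t mBytesPerPacket;   /* offset 16 */
    uint32_t mFramesPerPacket;  /* offset 20 */
    uint32_t mBytesPerFrame;    /* offset 24 */
    uint32_t mChannelsPerFrame; /* offset 28 */
    uint32_t mBitsPerChannel;   /* offset 32 */
    uint32_t mReserved;         /* offset 36: explicit padding, must be 0 */
} ASBDMirror;

/* 8 + 8 * 4 = 40 bytes: a multiple of 8 without implicit tail padding. */
static_assert(offsetof(ASBDMirror, mReserved) == 36, "unexpected mReserved offset");
static_assert(sizeof(ASBDMirror) == 40, "unexpected struct size");

/* Return a description with every field, including mReserved, set to 0. */
ASBDMirror asbd_zeroed(void) {
    ASBDMirror d = {0};
    return d;
}
```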