Main Codec Features (Chinese-English Parallel Text)
The main characteristics of Speex can be summarized as follows:
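• Free software/open source, free of patent fees and royalties
• Narrowband and wideband integrated through an embedded bit-stream
• Wide range of usable bit-rates (from 2.15 kbps to 44 kbps)
• Dynamic bit-rate switching (AMR) and variable bit-rate (VBR) operation
• Voice activity detection (VAD, integrated with VBR) and discontinuous transmission (DTX)
• Variable complexity
• Embedded wideband structure (scalable sampling rate)
• Ultra-wideband sampling rate at 32 kHz
• Intensity stereo encoding option
• Fixed-point implementation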
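2.3 Preprocessor
This part refers to the preprocessor module introduced in the 1.1.x branch. The preprocessor is designed to process the audio before the encoder is run. It provides three main functions:
• noise suppression
• automatic gain control (AGC)
• voice activity detection (VAD)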
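Figure 2.1: Acoustic echo model
The denoiser reduces the amount of background noise present in the input signal. This yields higher-quality speech whether or not the denoised signal is then encoded with Speex. Feeding the denoised signal to the codec brings an additional benefit, however: speech codecs in general (Speex included) handle noisy input poorly and tend to amplify the noise, and the denoiser greatly reduces this effect.
Automatic gain control (AGC) deals with the fact that the recording volume can vary widely from one setup to another. The AGC provides a way to adjust a signal to a reference volume. This is useful for voice over IP because it removes the need to adjust the microphone gain by hand. A further advantage is that setting the microphone gain to a conservative (low) level makes clipping easier to avoid.
The voice activity detector (VAD) provided by the preprocessor is more advanced than the one provided directly by the codec.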
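2.4 Adaptive Jitter Buffer
When voice (or any other content, for that matter) is transmitted over UDP or RTP, packets may be lost, arrive with different delays, or even arrive out of order. The purpose of a jitter buffer is to reorder the packets and hold them long enough (but no longer than necessary) for them to be passed on for decoding.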
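2.5 Acoustic Echo Canceller
In any hands-free communication system (Fig. 2.1), speech from the remote end is played through the local loudspeaker, propagates through the room, and is captured by the microphone. If the audio captured by the microphone is sent directly to the remote end, the remote user hears an echo of their own voice. The job of the echo canceller is therefore to remove the echo before it is sent to the remote end. It is important to understand that the echo canceller is meant to improve call quality at the remote end.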
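2.6 Resampler
In some cases it is useful to convert audio from one sampling rate to another. There are many reasons for doing so: mixing streams that have different sampling rates, supporting sampling rates that the sound card does not support, transcoding, and so on. That is why a resampler is now part of the Speex project. It can convert between any two arbitrary rates (the only requirement is that their ratio be a rational number) and lets the user trade quality against complexity.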
2.2 Codec
The main characteristics of Speex can be summarized as follows:
• Free software/open-source, patent and royalty-free
• Integration of narrowband and wideband using an embedded bit-stream
• Wide range of bit-rates available (from 2.15 kbps to 44 kbps)
• Dynamic bit-rate switching (AMR) and Variable Bit-Rate (VBR) operation
• Voice Activity Detection (VAD, integrated with VBR) and discontinuous transmission (DTX)
• Variable complexity
• Embedded wideband structure (scalable sampling rate)
• Ultra-wideband sampling rate at 32 kHz
• Intensity stereo encoding option
• Fixed-point implementation
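To make the bit-rate and sampling-rate figures above more concrete, the following sketch converts them into per-frame sizes. It assumes 20 ms frames and the conventional 8 kHz (narrowband) and 16 kHz (wideband) rates, none of which are stated in this section; the function names are illustrative only.

```python
FRAME_MS = 20  # assumption: one codec frame covers 20 ms of audio


def bits_per_frame(bitrate_bps, frame_ms=FRAME_MS):
    """Number of bits one frame occupies at the given bit-rate."""
    return bitrate_bps * frame_ms // 1000


def samples_per_frame(sample_rate_hz, frame_ms=FRAME_MS):
    """Number of audio samples one frame covers at the given sampling rate."""
    return sample_rate_hz * frame_ms // 1000


if __name__ == "__main__":
    for kbps in (2.15, 44):
        bps = int(kbps * 1000)
        print(f"{kbps} kbps: {bits_per_frame(bps)} bits per frame")
    for rate in (8000, 16000, 32000):  # 8 and 16 kHz are assumed values
        print(f"{rate} Hz: {samples_per_frame(rate)} samples per frame")
```

Under these assumptions, the lowest bit-rate leaves only a few dozen bits to describe each frame, which is why the usable range matters so much for speech quality.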
2.3 Preprocessor
This part refers to the preprocessor module introduced in the 1.1.x branch. The preprocessor is designed to be used on the
audio before running the encoder. The preprocessor provides three main functionalities:
• noise suppression
• automatic gain control (AGC)
• voice activity detection (VAD)
Figure 2.1: Acoustic echo model
The denoiser can be used to reduce the amount of background noise present in the input signal. This provides higher quality
speech whether or not the denoised signal is encoded with Speex (or at all). However, when using the denoised signal with the
codec, there is an additional benefit. Speech codecs in general (Speex included) tend to perform poorly on noisy input and
often amplify the noise. The denoiser greatly reduces this effect.
Automatic gain control (AGC) is a feature that deals with the fact that the recording volume may vary by a large amount
between different setups. The AGC provides a way to adjust a signal to a reference volume. This is useful for voice over
IP because it removes the need for manual adjustment of the microphone gain. A secondary advantage is that by setting the
microphone gain to a conservative (low) level, it is easier to avoid clipping.
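The idea of adjusting a signal to a reference volume can be sketched as follows. This is a minimal per-frame illustration, not the Speex preprocessor itself: the target RMS level, the gain cap, and the function names are all assumptions, and a practical AGC would also smooth the gain over time rather than recompute it independently for each frame.

```python
import math


def agc_gain(frame, target_rms=8000.0, max_gain=30.0):
    """Gain that would bring `frame` to `target_rms`, capped at `max_gain`."""
    if not frame:
        return 1.0
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    if rms == 0.0:
        return 1.0  # pure silence: nothing to scale
    return min(target_rms / rms, max_gain)


def apply_agc(frame, target_rms=8000.0, max_gain=30.0):
    """Scale 16-bit samples toward the reference level, saturating on overflow."""
    gain = agc_gain(frame, target_rms, max_gain)
    # Clamp to the signed 16-bit range so boosted peaks saturate instead of wrapping.
    return [max(-32768, min(32767, int(round(s * gain)))) for s in frame]


if __name__ == "__main__":
    quiet = [1000, -1000] * 80  # 160 samples recorded at a low level
    print(agc_gain(quiet), apply_agc(quiet)[:2])
```

The gain cap keeps near-silent frames from being boosted into audible noise, and the clamp reflects the point made above: a conservative microphone gain leaves headroom, so the AGC rather than the hardware decides how loud the signal ends up.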
The voice activity detector (VAD) provided by the preprocessor is more advanced than the one directly provided in the
codec.
2.4 Adaptive Jitter Buffer
When transmitting voice (or any content for that matter) over UDP or RTP, packets may be lost, arrive with different delays,
or even arrive out of order. The purpose of a jitter buffer is to reorder packets and buffer them long enough (but no longer
than necessary) so they can be sent to be decoded.
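The reorder-and-hold behaviour can be sketched with a small priority queue keyed on sequence number. This is a simplified, fixed-depth illustration under stated assumptions: the class and method names are hypothetical, the depth does not adapt to network conditions as an adaptive jitter buffer would, and sequence-number wraparound is ignored.

```python
import heapq


class JitterBuffer:
    """Fixed-depth reorder buffer (hypothetical sketch, not the Speex API)."""

    def __init__(self, depth=3):
        self.depth = depth    # packets to accumulate before playback starts
        self.heap = []        # min-heap of (sequence number, payload)
        self.next_seq = None  # sequence number owed to the decoder next

    def put(self, seq, payload):
        """Store an arriving packet unless its playback slot has already passed."""
        if self.next_seq is not None and seq < self.next_seq:
            return  # too late: that frame was already played or concealed
        heapq.heappush(self.heap, (seq, payload))

    def get(self):
        """Called once per frame period.

        Returns None while still buffering, otherwise (seq, payload), where
        payload is None if the packet was lost or has not arrived in time.
        """
        if self.next_seq is None:
            if len(self.heap) < self.depth:
                return None
            self.next_seq = self.heap[0][0]
        # Discard duplicates of frames that were already handed out.
        while self.heap and self.heap[0][0] < self.next_seq:
            heapq.heappop(self.heap)
        payload = None
        if self.heap and self.heap[0][0] == self.next_seq:
            payload = heapq.heappop(self.heap)[1]
        seq = self.next_seq
        self.next_seq += 1
        return seq, payload


if __name__ == "__main__":
    jb = JitterBuffer(depth=3)
    for seq, data in [(2, "b"), (1, "a"), (4, "d")]:  # 3 never arrives
        jb.put(seq, data)
    print([jb.get() for _ in range(4)])
```

A larger depth tolerates more delay variation at the cost of added latency, which is the "long enough, but no longer than necessary" trade-off; a missing payload tells the decoder to conceal the lost frame.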
2.5 Acoustic Echo Canceller
In any hands-free communication system (Fig. 2.1), speech from the remote end is played through the local loudspeaker,
propagates in the room and is captured by the microphone. If the audio captured by the microphone is sent directly to the
remote end, then the remote user hears an echo of their own voice. An acoustic echo canceller is designed to remove the
acoustic echo before it is sent to the remote end. It is important to understand that the echo canceller is meant to improve
the quality on the remote end.
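The principle can be illustrated with a textbook normalized LMS (NLMS) adaptive filter, which estimates the loudspeaker-to-microphone path from the far-end signal and subtracts the predicted echo from the microphone signal. This is a teaching sketch and not necessarily the algorithm Speex uses; the function names, the filter length, the step size, and the simulated room response are all assumptions.

```python
import random


def simulate_echo(far, path):
    """Microphone signal containing only the far-end echo through `path`."""
    return [
        sum(path[k] * far[n - k] for k in range(len(path)) if n - k >= 0)
        for n in range(len(far))
    ]


def nlms_echo_cancel(far, mic, taps=8, mu=0.5, eps=1e-6):
    """Return the microphone signal with the estimated echo removed."""
    w = [0.0] * taps     # current estimate of the echo path
    hist = [0.0] * taps  # most recent far-end samples, newest first
    out = []
    for x, d in zip(far, mic):
        hist = [x] + hist[:-1]
        echo_est = sum(wi * hi for wi, hi in zip(w, hist))
        e = d - echo_est  # residual: this is what gets sent to the remote end
        norm = eps + sum(h * h for h in hist)
        # NLMS update: step along the input vector, normalized by its energy.
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, hist)]
        out.append(e)
    return out


if __name__ == "__main__":
    random.seed(0)
    far = [random.uniform(-1.0, 1.0) for _ in range(3000)]
    mic = simulate_echo(far, [0.0, 0.5, -0.3, 0.1])  # hypothetical room response
    residual = nlms_echo_cancel(far, mic)
    before = sum(d * d for d in mic[-500:])
    after = sum(e * e for e in residual[-500:])
    print(f"echo energy before: {before:.3f}, after: {after:.3e}")
```

The output of the canceller is what travels back to the remote user, which is why the benefit is heard at the remote end rather than locally. A real canceller must also cope with much longer room responses and with both parties talking at once, neither of which this sketch addresses.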
2.6 Resampler
In some cases, it may be useful to convert audio from one sampling rate to another. There are many reasons for doing so: mixing
streams that have different sampling rates, supporting sampling rates that the sound card doesn’t support, transcoding, etc.
That’s why there is now a resampler that is part of the Speex project. This resampler can be used to convert between any two
arbitrary rates (the only requirement is that the ratio be a rational number) and there is control over the
quality/complexity tradeoff.
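The rational-ratio requirement can be illustrated as follows. The sketch reduces the ratio of the two rates to lowest terms and then resamples by linear interpolation, which stands in for a real resampling filter and sits at the low-quality, low-complexity end of the tradeoff. The function names are hypothetical and this is not the Speex resampler API.

```python
from fractions import Fraction


def rate_ratio(in_rate, out_rate):
    """Output/input rate ratio reduced to lowest terms."""
    return Fraction(out_rate, in_rate)


def resample_linear(samples, in_rate, out_rate):
    """Resample by linear interpolation between neighbouring input samples."""
    ratio = rate_ratio(in_rate, out_rate)
    num, den = ratio.numerator, ratio.denominator
    n_out = len(samples) * num // den
    out = []
    for i in range(n_out):
        # Output sample i sits at input position i * den / num; integer
        # arithmetic keeps that position exact for any rational ratio.
        k, rem = divmod(i * den, num)
        frac = rem / num
        nxt = samples[min(k + 1, len(samples) - 1)]  # hold the last sample at the end
        out.append(samples[k] * (1.0 - frac) + nxt * frac)
    return out


if __name__ == "__main__":
    print(rate_ratio(44100, 48000))
    print(resample_linear([0, 2, 4, 6], 8000, 16000))
```

Because the ratio is rational, the fractional positions repeat with a period equal to the reduced numerator, so a higher-quality design can precompute one set of filter coefficients per phase instead of recomputing them for every output sample.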
2013.2.5