Licensed & Open Source Voice Codecs in IP Telephony (VOIP) – A Brief Introduction

In this article, we will see why it's important to select the right voice codec for IP Telephony (VOIP) implementations, followed by a short introduction to some of the popular voice codecs: G.711, G.729, G.723.1, iLBC, Speex and GSM. These are narrowband codecs.

What is a Voice Codec?

A codec digitally encodes and compresses analog audio signals using complex mathematical models. A codec’s primary role is to seek a balance between the transmission efficiency (bandwidth) and the quality of voice signals. That is to say, transmitting the best quality of digital voice signals at the lowest bandwidth possible.

The word codec is a contraction of both coder/decoder and compression/decompression.

Why is a Voice Codec required?

A voice codec is required mainly for compressing the digital voice signals so that they can be transmitted across IP networks (including lossy networks like the Internet) with the least possible bandwidth and acceptable quality. When codecs are used in IP Telephony, they generally introduce processing delays, because the complex mathematical formulas used for encoding/compression are CPU intensive.

Codecs are especially required in large IP Telephony/multi-location VOIP installations, as each call on the IP network consumes a certain amount of bandwidth and the total bandwidth consumed by uncompressed voice packets could be enormous.

If you are going with a default IP Telephony installation, there is a good chance that you are dealing with the G.711 voice codec for all your calls. But it's important to note that on larger/multi-site VOIP installations it might be better to go with one of the licensed codecs or even the open-source codecs that are mentioned below (even if only partially). These codecs make a considerable difference to the overall voice quality and bandwidth consumption of IP Telephony implementations.

Some techniques used to compress the digital voice data:

  • VAD – Voice Activity Detection: In IP Telephony, both the conversations as well as the silence in between the conversations are digitized. So, we have both packets containing voice and packets containing silence. Using VAD, packets of silence can be discarded after their duration is appropriately marked. So, the total number of packets transmitted after compression is lower (generally around 30% lower).
  • CNG – Comfort Noise Generation: This is not a compression technique. But when voice is compressed using VAD, the awkward silence between stretches of speech might be interpreted as a lost connection, and hence white noise is generated locally at both ends using CNG. This makes the call appear connected to both the parties during silence, as some background noise is audible during that time.
  • CELP – Code Excited Linear Prediction: In this method, various human sounds are mathematically modeled and a code book of all possible sounds is produced. So, instead of sending the actual sound samples across, only their codes are sent across. This is a very simplistic explanation, and a lot more techniques are involved.
  • In some compression techniques, the packet headers can be compressed separately from the payload, which provides additional bandwidth efficiency during transmission.

These are some of the methods by which voice is compressed (to be decompressed at the other end), and the list is by no means comprehensive. It is given here to give a feel for how voice compression is done in digital networks.
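To make the VAD idea a little more concrete, here is a minimal sketch in Python. It assumes a very simple energy threshold decides whether a frame is speech or silence; real codecs use far more elaborate detectors, and the function names, frame length and threshold below are purely illustrative.

```python
import math

def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)

def apply_vad(samples, frame_len=160, threshold=1e-4):
    """Split samples into frames and keep only frames above the threshold.

    Returns (frame_index, frame) pairs for the frames judged to contain
    voice. The gaps in the indices tell the receiver how long each
    discarded stretch of silence lasted.
    """
    kept = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if frame_energy(frame) > threshold:
            kept.append((start // frame_len, frame))
    return kept

# Hypothetical test signal at 8000 samples per second:
# three 20 ms frames of a tone, two frames of silence, one more frame of tone.
RATE = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / RATE) for n in range(160)]
silence = [0.0] * 160
signal = tone * 3 + silence * 2 + tone

voiced = apply_vad(signal)
voiced_indices = [index for index, _ in voiced]
print(voiced_indices)  # frames 3 and 4 (the silence) are missing
```

At the receiving end, the missing frame numbers are where CNG would fill in locally generated background noise.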

G.711:

G.711 is considered to be the base codec in IP Telephony. It gives the highest quality of voice among the codecs described here but takes up the largest bandwidth. A small amount of compression is provided by a technique called companding, and CPU utilization is the lowest for this codec. This codec is recommended for smaller IP Telephony implementations and in places with large network backbone bandwidths where high quality voice is required. It could also be used in multi-vendor IP Telephony projects.

G.711 uses a technique called Pulse Code Modulation (PCM) to give a theoretical bit-rate of 64 Kbps. Practically though, after the addition of all overheads, it might consume around 87.2 Kbps per channel. G.711 is susceptible to packet loss when compared with other codecs. Further, the end-to-end delay increases with the number of concurrent calls, as the processing delay (due to larger packet sizes) grows with each additional concurrent call that has to be processed by the VOIP server.
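The 87.2 Kbps figure can be reproduced with some simple arithmetic. The sketch below is one way to arrive at it, assuming 20 ms of voice per packet, 40 bytes of IP/UDP/RTP headers and 18 bytes of Ethernet framing per packet. These values are assumptions and are not stated in this article; other link types and packet intervals will give different totals.

```python
def per_call_bandwidth_kbps(codec_bps, packet_ms, overhead_bytes=58):
    """Estimate one-way bandwidth of a call, including per-packet overhead.

    codec_bps      -- codec bit-rate in bits per second
    packet_ms      -- milliseconds of voice carried in each packet (assumed)
    overhead_bytes -- header bytes added to every packet; 58 is an assumption
                      (40 bytes IP/UDP/RTP + 18 bytes Ethernet)
    """
    payload_bytes = codec_bps * packet_ms / 8000  # bits/s * ms / (8 * 1000)
    packets_per_second = 1000 / packet_ms
    total_bps = (payload_bytes + overhead_bytes) * 8 * packets_per_second
    return total_bps / 1000

g711_kbps = per_call_bandwidth_kbps(64000, 20)
print(round(g711_kbps, 1))
```

The same function can be reused for other codecs by changing the bit-rate and the packet interval.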

There are two types of G.711 companding: µ-law (mu-law), popular in North America and Japan, and A-law, popular elsewhere. There are additional variants and annexes to this standard that give more flexibility and variation.
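Companding can be illustrated with the continuous µ-law formula. The sketch below is only that textbook curve, with samples assumed to be normalized to the range -1 to 1; actual G.711 implementations use a segmented 8-bit approximation of it, which is not shown here. The function names are illustrative.

```python
import math

MU = 255  # value commonly used for the mu-law curve

def mu_law_compress(x, mu=MU):
    """Compress a sample in [-1, 1]; quiet samples get more of the output range."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress: recover the original sample."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)

quiet = 0.01
compressed = mu_law_compress(quiet)
restored = mu_law_expand(compressed)
print(compressed, restored)
```

A quiet sample at 1% of full scale is mapped to more than a fifth of the output range, which is why quiet speech survives quantization to 8 bits reasonably well.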

G.729:

G.729 uses a technique called CS-ACELP (Conjugate Structure Algebraic Code Excited Linear Prediction). This technique of compression, as you might guess, uses complex mathematical formulas and hence takes more CPU processing resources. So, it introduces a higher encoding delay than G.711. G.729 is recommended for networks with many users and heavy data traffic, but the quality of voice may not be as good as with G.711. The theoretical bit-rate achievable by G.729 is 8 Kbps, with a bandwidth of 31.2 Kbps including all overheads. As you can see, that is much lower than G.711.
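To see what this difference means in practice, the per-call figures quoted in this article (87.2 Kbps for G.711 and 31.2 Kbps for G.729) can be divided into the capacity of a link. The sketch below assumes a hypothetical 2000 Kbps link reserved entirely for voice and ignores signaling traffic, so treat the numbers as rough upper bounds.

```python
def max_concurrent_calls(link_kbps, per_call_kbps):
    """Whole number of calls that fit into the link in one direction."""
    return int(link_kbps // per_call_kbps)

LINK_KBPS = 2000  # hypothetical link capacity, chosen only for illustration

calls_g711 = max_concurrent_calls(LINK_KBPS, 87.2)
calls_g729 = max_concurrent_calls(LINK_KBPS, 31.2)
print(calls_g711, calls_g729)
```

Under these assumptions the same link carries roughly three times as many G.729 calls as G.711 calls.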

G.729 is a licensed codec and is not free. Licenses need to be purchased either by the end user or by the manufacturer, based (for example) on the total number of concurrent calls expected in the network. G.729 has annexes like G.729A, G.729D and G.729E, which support bit-rates of 8 Kbps, 6.4 Kbps and 11.8 Kbps respectively. G.729D and E add support for multiple bit-rates.

G.723.1:

G.723.1 is a dual rate speech codec supporting bit rates of 6.3 Kbps and 5.3 Kbps, with a practical bandwidth consumption of around 20-22 Kbps per channel. Though it is recommended that both sides transmit at the same rate, the codec will still work if one side transmits at 5.3 Kbps and the other side at 6.3 Kbps. G.723.1 has the lowest bit rate of the three codecs described so far (at the cost of a longer, 30 ms frame) and hence is suitable for very large installations across unreliable networks with low bandwidth.

G.723.1 provides good immunity against network imperfections like packet loss, lost frames and bit errors. This codec is quite popular with audio/video conferencing applications as well. It is not a free codec and is licensed. Either the end users or the manufacturers would have to purchase licenses based on the maximum number of concurrent calls expected in the network.

Open Source and Free Codecs:

GSM:

GSM here refers to the speech codec of the Global System for Mobile Communications used by cell phone providers (who are also shifting to IP based networks, by the way). The bit rate is 13 Kbps and the speech signal is divided into blocks of 20 ms. This is a free and open source codec with a good compression rate but average voice quality. Since it is already widely used in cell phone communications, large enterprise networks could use it if they are expecting a huge number of concurrent calls within their VOIP networks.

iLBC:

iLBC refers to Internet Low Bit-rate Codec. iLBC results in a payload bit-rate of 13.3 Kbps or 15.2 Kbps, with encoding frame lengths of 30 ms and 20 ms respectively. It provides a mix of low bandwidth usage and decent quality, especially in lossy networks like the Internet. However, its compatibility with common VOIP systems needs to be checked before implementation, and this codec is quite CPU intensive. iLBC is license free and can be used without paying royalty fees. It is also open source. Still, it's better to check the special terms and conditions given on their site before using this one in your network.
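The bit-rate and the frame length together determine how many bytes each encoded frame occupies, since bit-rate (in Kbps) multiplied by frame length (in ms) gives bits per frame. The sketch below applies this to the two iLBC modes using the rounded figures quoted above; the function name is illustrative, and the result is rounded to whole bytes because 13.3 Kbps is itself a rounded value.

```python
def frame_payload_bytes(bitrate_kbps, frame_ms):
    """Approximate size of one encoded frame in bytes.

    Kbps multiplied by ms gives bits; dividing by 8 gives bytes.
    """
    return round(bitrate_kbps * frame_ms / 8)

bytes_20ms = frame_payload_bytes(15.2, 20)
bytes_30ms = frame_payload_bytes(13.3, 30)
print(bytes_20ms, bytes_30ms)
```

The longer frame packs more speech into each packet, so fewer packets (and fewer headers) are sent per second, at the cost of extra delay.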

Speex:

Speex is a variable bit-rate codec which can operate with bit-rates from 2.15 Kbps to 24.6 Kbps in its narrowband mode. It can dynamically modify its bit-rate in that range according to the changing network conditions. There are both narrowband and wideband versions of this codec, available free of cost. It is also an open source codec. Speex uses CELP as its encoding technique and is designed for VOIP and packet based networks. Speex supports a number of features like intensity stereo encoding, integration of multiple sample rates in the same bit stream, variable bit-rate operation, etc. in addition to the other usual compression features. Be sure to check if you can implement this codec before purchasing the licensed codecs. Open source PBX systems like Asterisk offer native support for this codec.

Limitations of Voice Compression techniques:

As mentioned earlier, almost all the compression techniques achieve lower bandwidth transmission by sacrificing some amount of voice quality. This is inevitable. Also, some compression methods may not support DTMF (Dual Tone Multi Frequency) touch functions, Fax Transmissions, High quality audio and MOH – Music On Hold functions. So, it is better to check for these before selecting one.

excITingIP.com
