Every system implementing VOIP/IP Telephony uses an audio codec to compress the audio signal at one end and decompress it at the other. Although most codecs are standardized, VOIP vendors implement proprietary codecs too. Popular standardized codecs include G.723 and G.729a.
The type of codec used is an important factor affecting VOIP call quality: the higher the compression, the less data has to be transmitted to the other side. But there is a flip side too: voice quality generally suffers at higher compression rates. Most codecs can accommodate different target bit rates such as 8 Kbps, 6.4 Kbps or 5.3 Kbps (compare the standard 64 Kbps required to transmit a single PCM voice channel over T1 lines). These bit rates are for audio only; protocol overhead is added on top, so the actual bit rate on the network is considerably higher.
Codecs also introduce a digitizing delay, as each algorithm requires a certain amount of data to be buffered before it can be processed. If the codec is complex to implement, more CPU resources are required, and this too affects VOIP call quality.
Network latency is caused both by the distance that a packet needs to travel and by changing network conditions. The greater the distance a packet has to traverse (e.g. across continents), the higher the delay. The delay also depends on the number of router hops that a packet takes to reach the destination: the higher the number of hops, the greater the delay.
The compression algorithms also cause their own delays. For example, the G.723 codec generally adds a fixed 30 ms delay. For a VOIP call to remain clear, the total network latency (two-way round trip delay) should stay within roughly 150 ms to 500 ms, and a delay of more than 250-300 ms is not preferred for most VOIP systems.
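The delay components described above can be added up into a rough budget. The sketch below is only an illustration: the function names, the 0.5 ms per-hop figure, the 40 ms jitter buffer and the 300 ms round trip budget are assumptions chosen for the example, and the only physical constant used is that light covers roughly 200 km per millisecond in optical fiber.

```python
# Rough one-way delay estimate for a VOIP path (all inputs are assumptions).
FIBER_KM_PER_MS = 200.0  # light covers roughly 200 km per millisecond in fiber


def one_way_delay_ms(distance_km, hops, codec_ms, jitter_buffer_ms,
                     per_hop_ms=0.5):
    """Add up the delay components along one direction of a call."""
    propagation = distance_km / FIBER_KM_PER_MS
    forwarding = hops * per_hop_ms  # hypothetical average delay per router hop
    return propagation + forwarding + codec_ms + jitter_buffer_ms


def within_budget(one_way_ms, round_trip_budget_ms=300.0):
    """Compare the round trip (twice the one-way delay) with a budget."""
    return 2 * one_way_ms <= round_trip_budget_ms


if __name__ == "__main__":
    delay = one_way_delay_ms(distance_km=8000, hops=15, codec_ms=30,
                             jitter_buffer_ms=40)
    print(f"one-way: {delay:.1f} ms, round trip: {2 * delay:.1f} ms")
    print("within budget:", within_budget(delay))
```

Real paths also see queuing delay that varies with load, so a figure like this is best treated as a lower bound.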
Jitter and Jitter buffer:
When packets leave the codec after compression, they are sent at a constant rate with equal spacing between them. The decompression algorithm at the other end likewise expects the packets to arrive equally spaced and in the order in which they were sent. But since the network imposes delays at packet level, the packets may arrive at varying intervals and out of order. To compensate, a small jitter buffer at the receiving end holds packets for a short, calculated delay: it collects a certain number of packets, rearranges them in the proper order and restores equal spacing between them before passing them on for decompression.
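A jitter buffer of the kind described above might be sketched as follows. This is a simplified illustration, not how any particular product implements it: the class name `JitterBuffer` and its `depth` parameter are hypothetical, the buffer depth is counted in packets rather than milliseconds, and sequence number wrap-around is ignored. The caller is assumed to call `pop()` once per playout interval, which is what restores the equal spacing.

```python
import heapq


class JitterBuffer:
    """Reorder packets by sequence number and release them in order.

    `depth` is the number of packets collected before playout starts.
    """

    def __init__(self, depth=3):
        self.depth = depth
        self._heap = []        # min-heap of (sequence number, payload)
        self._next_seq = None  # sequence number due at the next playout tick

    def push(self, seq, payload):
        """Store an arriving packet; drop it if its slot has already played."""
        if self._next_seq is not None and seq < self._next_seq:
            return  # arrived too late to be useful
        heapq.heappush(self._heap, (seq, payload))

    def pop(self):
        """Return the payload for this playout tick, or None if unavailable."""
        if self._next_seq is None:
            if len(self._heap) < self.depth:
                return None  # still filling the buffer
            self._next_seq = self._heap[0][0]
        # Discard duplicates of packets that were already played.
        while self._heap and self._heap[0][0] < self._next_seq:
            heapq.heappop(self._heap)
        payload = None
        if self._heap and self._heap[0][0] == self._next_seq:
            payload = heapq.heappop(self._heap)[1]
        self._next_seq += 1  # the slot is consumed even if the packet is missing
        return payload


if __name__ == "__main__":
    buf = JitterBuffer(depth=3)
    for seq in (1, 3, 2, 5, 4):  # packets arriving out of order
        buf.push(seq, f"frame-{seq}")
    print([buf.pop() for _ in range(5)])
```

A `None` returned after playout has started marks a slot whose packet was lost or late, which the decoder then has to conceal.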
Some packets are always lost in an IP network. This may happen for many reasons, such as excessive collisions, physical media errors or overloaded links. Some protocols like TCP account for such losses and allow lost packets to be recovered, while others like UDP do not. Voice packets are normally carried over RTP on UDP, so lost voice packets are not retransmitted.
Codecs perform certain operations to compensate for lost packets, such as replaying the previous packet in place of the lost one or performing more sophisticated interpolation to approximate the missing audio. Generally, packet losses of up to 5% can be compensated for, and the user may not experience a significant degradation in voice quality. But a packet loss of more than 5% might lower the voice quality or induce noticeable delays.
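The simplest of these compensation strategies, replaying the previous packet, can be illustrated in a few lines. This is a minimal sketch under assumptions: lost frames are represented as `None`, the function names are made up for the example, and the 20-byte silence frame is an arbitrary placeholder rather than a real codec frame.

```python
def conceal_losses(frames, silence=b"\x00" * 20):
    """Replace lost frames (None) with a copy of the last good frame.

    Falls back to `silence` when no earlier frame has been received.
    """
    output = []
    last_good = silence
    for frame in frames:
        if frame is None:
            output.append(last_good)  # replay the previous packet
        else:
            output.append(frame)
            last_good = frame
    return output


def loss_percent(frames):
    """Share of lost frames, as a percentage."""
    if not frames:
        return 0.0
    return 100.0 * sum(frame is None for frame in frames) / len(frames)


if __name__ == "__main__":
    received = [b"f1", b"f2", None, b"f4"]
    print(conceal_losses(received))
    print(f"loss: {loss_percent(received):.1f}%")
```

Tracking the loss percentage alongside the concealment makes it easy to see when a call drifts past the level that concealment can hide.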
Packet size poses an interesting trade-off. Every packet that goes out needs a substantial amount of overhead (header information) added to it, so if the RTP packet size is larger, more voice data is packed into each packet and the overall bandwidth required is reduced. For low bit rate codecs, this header overhead can be almost thrice the size of the voice payload itself! So a bigger packet is better for bandwidth, but if the packet is too big, a packetization delay is induced, as the sender needs to wait longer to fill up the payload.
It is generally better to send bigger packets, as the overall bandwidth required is reduced. But that is done by increasing the interval between packets, so check that the delay budget allows for it. On certain point-to-point links, cRTP (Compressed RTP) is preferred, as it compresses the header information sent with every packet. cRTP brings down the size of each packet by almost half, but it generates additional processing load on the routers, and it is used only on certain types of point-to-point WAN links because the compressed packet does not carry the IP address information and hence is not routable.
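The trade-off between packet size, bandwidth and packetization delay is easy to work out numerically. The sketch below assumes an 8 Kbps codec, plain IPv4 + UDP + RTP headers (20 + 8 + 12 = 40 bytes) and a 4-byte compressed header for cRTP (compressed headers are typically 2 to 4 bytes). Layer 2 framing is left out, and the function names are made up for the example.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, no options
CRTP_HEADER_BYTES = 4                  # assumed size of a compressed header


def payload_bytes(codec_kbps, packet_ms):
    """Voice bytes carried in one packet holding `packet_ms` of audio."""
    return codec_kbps * packet_ms / 8  # Kbps times ms gives bits


def bandwidth_kbps(codec_kbps, packet_ms,
                   header_bytes=IP_UDP_RTP_HEADER_BYTES):
    """Bit rate on the wire, including the per-packet header overhead."""
    packet_bits = (payload_bytes(codec_kbps, packet_ms) + header_bytes) * 8
    return packet_bits / packet_ms  # bits per ms is the same as Kbps


if __name__ == "__main__":
    print("interval  payload  RTP Kbps  cRTP Kbps")
    for packet_ms in (10, 20, 30, 40):
        print(f"{packet_ms:>5} ms"
              f"{payload_bytes(8, packet_ms):>8.0f} B"
              f"{bandwidth_kbps(8, packet_ms):>9.1f}"
              f"{bandwidth_kbps(8, packet_ms, CRTP_HEADER_BYTES):>11.1f}")
```

Each row also shows the cost: the packet interval in the first column is the time the sender has to wait to fill one payload.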
Since only one person talks at any given point of time in a two-way conversation, it is better not to transfer any packets for the other person, who is silently listening. Several vendors take advantage of this (a technique known as silence suppression) to reduce the overall bandwidth required to transport voice packets across WAN links.
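One simple way to decide whether a frame contains speech is to compare its energy with a threshold. The following is a sketch only, not a vendor's algorithm: the function names and the threshold value are assumptions, and frames are taken to be lists of 16-bit PCM samples.

```python
def frame_energy(samples):
    """Mean squared amplitude of one frame of 16-bit PCM samples."""
    return sum(s * s for s in samples) / len(samples)


def frames_to_send(frames, threshold=250_000):
    """Yield (index, frame) only for frames louder than the assumed threshold."""
    for index, samples in enumerate(frames):
        if frame_energy(samples) > threshold:
            yield index, samples


if __name__ == "__main__":
    speech = [4000, -4000] * 80   # loud frame, stands in for talking
    silence = [10, -10] * 80      # near-silent frame, stands in for listening
    sent = [index for index, _ in frames_to_send([speech, silence, speech])]
    print("frames sent:", sent)
```

Practical detectors are more careful than this, for instance by continuing to send for a short while after speech stops so that word endings are not clipped.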
IP Telephony invariably involves the conversion of IP media to analog/digital and vice versa, and echo is induced by this conversion at various points in the network. There are two types of echo. Hybrid echo is generated by impedance mismatches at the various analog/digital conversion points in the network. Acoustic echo is generated at the phone, when the voice leaving the speaker is picked up by the microphone. Echo is generally difficult to monitor and contain, but certain vendors provide echo cancellation modules (hardware and software) at the gateway level, where the translation takes place.
Most of the above parameters can be monitored with specialized tools, and adjustments can be made accordingly.
The overall network load is one important parameter that determines the quality of voice communications: the higher the network load, the more collisions and the lower the quality of transmission. Though this aspect may not always be under the control of network administrators, the following can be done to make the transport of voice packets more efficient.
It is always recommended to give voice packets traversing the network a higher priority than data packets (like mail traffic). This is because voice/video packets are delay sensitive, and even a slight delay might degrade quality, whereas a delay in data packets like mail (SMTP) makes no noticeable difference to the user. Real-time packets need to be prioritized at every stage of their transport (switches, routers, WAN links etc).
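On an IP network, one common way for an application to request this priority is to mark its voice packets with a DSCP value such as Expedited Forwarding (46). The sketch below shows how that might look with Python's standard `socket` module. The helper names are made up for the example, and setting `IP_TOS` is assumed to be permitted by the operating system (it works on Linux and most Unix-like systems, while some platforms ignore or restrict it).

```python
import socket

DSCP_EF = 46  # Expedited Forwarding, the code point commonly used for voice


def dscp_to_tos(dscp):
    """DSCP occupies the upper six bits of the IP TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp << 2


def open_voice_socket(dscp=DSCP_EF):
    """Create a UDP socket whose outgoing packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


if __name__ == "__main__":
    print(f"TOS byte for EF: {dscp_to_tos(DSCP_EF):#04x}")
    voice = open_voice_socket()
    voice.close()
```

The marking is only a request: the packets get priority only where switches, routers and WAN links are configured to honor it.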
Another alternative is to use bandwidth reservation or bandwidth limiting techniques in the network, based on the application/protocol. This ensures that some bandwidth is always reserved for voice packets, so that a sudden spurt in the usage of certain applications (like P2P) does not interfere with the sending of voice packets over the IP network.
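Bandwidth limiting is often built on a token bucket. The sketch below illustrates that general technique and is not the configuration of any specific router: the class name, the rate and the burst size are assumptions, and time is passed in explicitly (in seconds, always moving forward) to keep the example deterministic.

```python
class TokenBucket:
    """Limit a traffic class to `rate_bps`, allowing bursts of `burst_bytes`."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bytes_per_s = rate_bps / 8
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last_time = 0.0

    def allow(self, packet_bytes, now):
        """Return True if a packet of this size may be sent at time `now`."""
        elapsed = now - self.last_time
        self.last_time = now
        # Refill in proportion to elapsed time, never beyond the capacity.
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False  # over the limit: drop or queue the packet


if __name__ == "__main__":
    bulk = TokenBucket(rate_bps=8000, burst_bytes=1500)  # 1000 bytes/s
    for now in (0.0, 0.5, 1.5):
        print(f"t={now}: 1500-byte packet allowed = {bulk.allow(1500, now)}")
```

Applying a limiter like this to bulk traffic such as P2P leaves the remaining link capacity available for voice.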
Other parameters are also important to VOIP call quality: call set-up time (the time taken from the initial dialing of digits to the establishment of a voice connection), call success ratio (the ratio of successful connects to dial attempts) and call set-up rate (the number of calls that can be set up per second in the network). The type of signaling protocol used, such as SIP or H.323, may also affect performance, as each handles various processes differently.