Help! I sound like a robot!

August 7th, 2014 by

When our ISP connection was force-upgraded to fibre optics, we got to enjoy the beauties of VoIP phone service. These include missed calls, temporary inability to make calls, and all the other delights of Internet telephony. Glad I kept the scabby old DECT phone and connected it to the router; it is the only one that works reliably. I then installed a SIP client and made my first call. But I sounded like a robot from outer space; nobody understood me. Don’t despair, though: I figured out the solution.

First of all, if you are also looking for an Internet telephony client, look no further: just download Jitsi and be happy. I have tried a couple of free and paid phone clients, but Jitsi beats them all. Of everything I tried, it offers the highest-quality audio options, a sleek and simple interface, and the best privacy options. The paid ones don’t give you the edge that you’d expect from a commercial product. Overall it seems there are only two or three client implementations out there, and most of the commercial products seem to just put a different skin or UI on one of them.

But back to the robot voice problem. When I made a call to another Internet phone, or to a mobile phone, the quality of my voice was just pristine. But when I called a land line, I would sound like a robot: a very low voice, dull, with all detail removed, like in early text-to-speech systems. Unintelligible to anyone. I could hear the other side just fine, though, and the speed of what I was saying was not changed. This last aspect made me think. Apparently some data was getting through, but somehow not enough to properly reproduce my voice.

And then enlightenment arrived. Being a telecoms engineer, I realized that if data is lost and the pitch is changed, but the playback speed is right, then it must be a sample rate conversion problem! I checked out the technical call info in Jitsi, and voilà, it was using the G.722 codec at a 16 kHz sampling rate.

So I opened up Jitsi’s account preferences and went to the settings for the SIP account on our router, through which I make calls to regular phone networks (i.e. land line and mobile). On the encodings tab, I checked “override global encoding settings” (of course I want to keep the high-quality audio for Internet calls), which makes the list of codecs below it editable. There, I checked all codecs bearing an “8000” label and un-checked all others. I made a test call, and my voice sounded just like it should. Problem solved.
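
If you prefer to see that selection rule spelled out, here is a minimal Python sketch of it. The codec labels in the list are made up for illustration (they follow the common name/rate pattern and are not copied from Jitsi’s actual list), and `narrowband_only` is a hypothetical helper, not anything Jitsi provides.

```python
def narrowband_only(codecs):
    """Keep only the codecs whose label ends in the 8000 rate."""
    return [c for c in codecs if c.rsplit("/", 1)[-1] == "8000"]


# Hypothetical codec labels, for illustration only.
available = ["opus/48000", "SILK/16000", "PCMU/8000", "PCMA/8000", "GSM/8000"]

print(narrowband_only(available))
```

Everything that survives the filter is what I would leave checked for the account that talks to the regular phone network.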

The “8000” label means that 8,000 audio samples are generated and processed every second. This is what all land line and mobile phone systems do by default, and hence what our router speaks to the phone service provider. My client, however, had produced 16,000 audio samples per second when talking to the SIP server in our router. Hence, our poor little router had to translate between the 16,000 incoming audio samples and the 8,000 outgoing ones. What it apparently ended up doing was dropping every other audio sample. That nicely explains the lowered pitch and the maintained speed. The aliasing effects introduced by the mismatched sample rates generated the robot sound.
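
To make the aliasing part concrete, here is a small Python sketch of naive sample dropping. It is only an illustration under my own assumptions: I have no idea what the router’s firmware really does, the 6 kHz test tone is an arbitrary choice, and the function names are mine. It generates 20 ms of tone at 16,000 samples per second, throws away every other sample without any low-pass filtering, and then looks for the strongest frequency in each stream with a plain DFT.

```python
import cmath
import math


def tone(freq_hz, rate_hz, n_samples):
    """Generate a sine tone as a list of float samples."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(n_samples)]


def drop_every_other(samples):
    """Naive 2:1 downsampling: keep samples 0, 2, 4, ... with no filtering."""
    return samples[::2]


def dominant_frequency(samples, rate_hz):
    """Return the frequency (Hz) of the strongest DFT bin below Nyquist."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * rate_hz / n


wideband = tone(6000, 16000, 320)        # 20 ms at 16,000 samples per second
narrowband = drop_every_other(wideband)  # 160 samples, played at 8,000 per second

print(len(wideband) / 16000, len(narrowband) / 8000)  # same duration in seconds
print(dominant_frequency(wideband, 16000))
print(dominant_frequency(narrowband, 8000))
```

The duration comes out the same for both streams, so the speed is maintained. But the 6 kHz tone lies above the 4 kHz limit that an 8,000 samples per second stream can represent, so it folds down and shows up at 2 kHz instead. Do that to all the high-frequency detail in a voice and you get garbage mixed into the speech band. A proper converter would low-pass filter before dropping samples.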

Apparently the mobile networks can handle better-quality audio, as my calls with 16,000 samples per second to mobiles worked fine. The 8,000 samples per second seems to be the lowest common denominator that will work with any phone system. Lose a little quality, and gain lots of interoperability in exchange. If you see similar problems, just limit the audio quality to 8,000 samples per second in your client when making calls to regular phone networks via your cheapo DSL router box.