How super-wideband is super-wideband enough?
With the recent discussions of super-wideband codecs (cf. Mats’ post), I had the notion to find, at least for me, at what bandwidth a speech signal would be transparent with respect to its source. Or, more accurately, at what bandwidth I could no longer declare the signal to be non-transparent.
Transparency, in the context of subjective evaluation, means being indistinguishable from a reference. In a general sense it is ultimately the goal of all lossy compression schemes (of which most audio codecs are examples). The encoded and decoded output of audio codecs such as AAC can be transparent relative to the reference at practical bitrates.
I prepared a small ABX test, in which the listener is presented with a series of unknown signals, each of which they must correctly identify as the reference or the alternative. As the reference I used a female speech source sampled at 48 kHz; the alternatives were the same source resampled to a series of lower bandwidths. For each bandwidth I ran 10 trials, striking a balance between statistical significance and the time required for the test.
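For anyone wanting to try something similar, here is a minimal sketch of the bookkeeping behind such a test. It is not the tool I used; the function names `make_abx_trials` and `score`, the "A"/"B" labels, and the fixed random seed are all assumptions made for illustration. Audio playback and the resampling itself are left out.

```python
import random

def make_abx_trials(n_trials, rng):
    """For each trial, randomly pick whether the unknown X is the
    reference ("A") or the bandlimited alternative ("B")."""
    return [rng.choice("AB") for _ in range(n_trials)]

def score(trials, answers):
    """Count how many of the listener's answers match the hidden X."""
    return sum(t == a for t, a in zip(trials, answers))

# Bandwidths tested in this post, 10 trials each.
bandwidths_khz = [8, 10, 12, 14, 16, 18]
rng = random.Random(0)  # fixed seed only so the sketch is repeatable
plan = {bw: make_abx_trials(10, rng) for bw in bandwidths_khz}
```

The listener's answers for one bandwidth would then be passed to `score` together with the corresponding entry of `plan` to obtain the number of correct identifications.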
On to the results! A correct score here means I was able to correctly identify the unknown signal as reference or alternative. (The frequency listed is the signal bandwidth, that is, half the sampling frequency.)
8 kHz — 10 correct
10 kHz — 10 correct
12 kHz — 10 correct
14 kHz — 10 correct
16 kHz — 8 correct (still ~95% confidence)
18 kHz — 3 correct
I stopped there, since I was clearly no longer able to make a distinction. Furthermore, I probably can’t hear much higher than 18 kHz (yet another interesting test!). To be honest, I was very surprised by the 16 kHz result, so surprised in fact that I repeated the test, only to arrive at the same result (which merely serves to improve the confidence…). I was, however, using a high-quality USB sound device and accurate headphones. To make the score somewhat more relevant to a practical VoIP scenario, I retook the 16 kHz test using my laptop sound card and a cheap headset. This time I scored 5 correct, which is no better than flipping a coin.
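The confidence figures can be checked with a one-sided binomial test: if I were purely guessing, each trial would be a fair coin flip, so the question is how likely a given score (or better) is by chance. The sketch below is one way to compute that; the helper name `abx_p_value` is my own, and the pooled 16-of-20 line assumes the repeated 16 kHz run also scored exactly 8 out of 10.

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of getting at least `correct` answers right out of
    `trials` by pure guessing (each trial a fair coin flip)."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

for correct in (10, 8, 5, 3):
    print(f"{correct}/10 correct: p = {abx_p_value(correct, 10):.4f}")

# Assumption: both 16 kHz runs scored 8/10, pooled into 16 of 20.
print(f"16/20 correct: p = {abx_p_value(16, 20):.4f}")
```

A score of 8 out of 10 gives p of about 0.055, which is where the roughly 95% confidence comes from, while 5 out of 10 gives p of about 0.62, entirely consistent with guessing.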
All of this should be taken with a grain of salt, but it suggests that i) it’s possible to distinguish between the reference and 16 kHz bandlimited speech given the best equipment, and ii) a typical VoIP user might consider a somewhat lower bandwidth, such as 14 kHz bandlimited speech, to be transparent.





