Author Archive for andrew

Security by isolation has an ideal partner in Chrome OS

Andrew MacDonald
Posted by Andrew MacDonald
on November 19th, 2009 in Technology

The GOOG blog announced the open-sourcing of Chromium OS today. Though I’m more interested in the release of Chrome OS proper, this did remind me of what seems to be an overlooked use for it: as a lightweight “virtualized” operating system.

Being naturally skeptical, I’m dubious of the protection offered by anti-virus software. Although such stuff certainly provides some value, it is not the panacea that a typical user might believe it to be. At the least, I want something a bit more proactive. Through reading an interesting interview with a security expert, I was exposed to a completely different strategy to everyday computing security: through isolation. In this context, the strategy boils down to running another guest operating system in a virtual machine on your host OS.

The guest OS is completely unaware of the virtual environment. As far as it knows, it’s running on a physical machine as usual. The isolation idea, then, is to perform as many risky activities as possible on this guest OS. For most of us that largely reduces to web browsing. We don’t store anything important on the guest OS and treat it as throw-away. If and when it becomes compromised, we simply have the virtual machine reset it to the last good state.

You can get as intricate as you want with this idea. The interviewee mentioned above describes using three varying levels of security on her personal systems. The problem you start to encounter is a lack of system resources: running multiple OSes on a single physical machine can become quite a drag. This is where Chrome OS comes in. It’s lightweight and designed to run web applications exclusively, with speed and security. Its features read like a wish-list for our guest OS.

As for the virtual machine itself, there are several to choose from. The most popular are probably those made by VMware, but I use VirtualBox for its wide platform support, rich feature set, active development and cost (free…).

Chrome, and how to make the internet not suck (as much)

Posted by Andrew MacDonald
on October 22nd, 2009 in Technology

I’m a fan of Google’s Chrome. The clean interface, fast rendering and frantic pace of development are big draws for me. Linux users will be happy to know that the associated Chromium open-source project is, as far as I can tell, now stable enough for everyday use (and awesome in general). For Ubuntu in particular, you can grab daily builds through apt from the Chromium PPA (look for “Adding this PPA to your system”). This allowed me to ditch Firefox, which is sluggish and renders horribly on Ubuntu.

Anecdotally, even a staunch IE user in the office (who will remain nameless) switched to Chrome recently, saying “I just couldn’t take it anymore”.

I suspect that a large portion of potential adopters of the browser are current Firefox users. A common transition concern for these users is the loss of the rich collection of extensions that has evolved around the Firefox platform. The popular AdBlock is one extension that I missed at first. I soon discovered, however, that there are other, and possibly better, ways of achieving the same effect.

I’ll present two options, the more sophisticated of which is certainly Privoxy. This software operates a proxy server on your machine. You simply instruct your browser to use this local proxy for all HTTP traffic and Privoxy takes care of the rest. It has a large feature set to maintain privacy and generally improve the browsing experience; the most interesting feature for casual users is probably ad blocking. I found the out-of-the-box configuration to be effective for this, but it broke some of my commonly used websites. It takes some tweaking of the configuration to restore these to health.

A less elegant but easier solution is to modify your hosts file, which your OS uses to map hostnames to IP addresses. Populating the file with entries for known malicious and ad-serving hosts redirects requests for them to an invalid address (0.0.0.0 is common), preventing your browser from connecting to them at all. Several services and altruistic folks maintain hosts files for download. One I like is appropriately entitled “how to make the internet not suck (as much)”.
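To make the hosts-file idea concrete, here is a minimal Python sketch that turns a list of hostnames into hosts-file lines. The hostnames are made up for illustration and the helper name `hosts_entries` is hypothetical; a real block list would come from one of the maintained files mentioned above.

```python
def hosts_entries(hostnames, sink="0.0.0.0"):
    """Return hosts-file lines that point each hostname at a sink address."""
    lines = []
    seen = set()
    for name in hostnames:
        name = name.strip().lower()
        # Skip blanks and duplicates (hostnames are case-insensitive).
        if not name or name in seen:
            continue
        seen.add(name)
        lines.append(f"{sink} {name}")
    return lines


# Hypothetical ad-serving hostnames, for illustration only.
blocked = ["ads.example.com", "tracker.example.net", "ADS.example.com"]
block_lines = hosts_entries(blocked)
print("\n".join(block_lines))
```

The resulting lines would be appended to the hosts file by hand (typically /etc/hosts on Linux and Mac OS X, which requires administrator rights). The sketch deliberately does not write to the file itself.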

The challenges of server-based AEC

Posted by Andrew MacDonald
on September 10th, 2009 in Technology

Mr. Brunberg, my sales engineering colleague who suggested the topic of this post, informs me that server-based acoustic echo control (AEC) is a frequent source of inquiry. It’s a technically feasible task, but not without its share of difficulties. This post will provide an overview of the associated challenges.

First, allow me to better qualify what I mean when I say server-based AEC. During a voice call, acoustic echo is generated at client endpoints when the playout and capture devices are acoustically coupled (i.e. can “hear” each other). It would typically be the role of the client to process the captured stream to remove the echo. Server-based AEC proposes to do this processing somewhere in the network at a mediating server.

There are several reasons it’s preferable to handle AEC at the endpoints:

Complexity

Echo control is a computationally intensive task, and adopting the philosophy of the internet, we would like to distribute complexity by pushing it to the endpoints of the network whenever possible. Significant load would be added to a server required to perform AEC on every conference channel.

Intermediary processing stages

A client-side AEC sees the capture signal early in the processing chain, usually immediately after being provided by the system. Assuming that saturation has been avoided in the analog-to-digital conversion stage, the only degradation to the signal should be due to the actual acoustic channel. At the server, however, several additional processing stages will have been performed. The degradation this causes to the signal affects the quality of echo control possible. Refer to the diagram below to follow the signal path.

[Figure: block diagram of the signal path for server-based AEC]

First assume we have access to a signal which has been received from one client and is destined to be played out at another. This is the farend signal. It’s encoded at the server, transmitted over the network, and decoded at the client. The client jitter buffer might perform some time-stretching as it adjusts its buffer size. It’s then played out and the echo is captured (the unavoidable degradation also seen by a client-based AEC). The captured signal might undergo various speech enhancement processing, such as noise suppression. We then have another encoding/decoding and jitter buffer processing at the server before we finally receive the nearend signal to provide to the AEC.

Delay estimation

Perhaps the most crucial problem for server-side AEC is delay estimation. As described more fully in an earlier post, the farend and nearend signals must be synchronized in time. This is necessary for the filter to adapt to the channel and provide an estimate of the echo. An AEC operating at an endpoint must compensate for the system render and capture buffers to provide this synchrony. A server-based AEC must go further yet. The network, jitter buffer and additional processing stages such as encoding/decoding add latency between farend and nearend signals which must be accounted for.

The networking delay can be obtained from the round trip delay derived from RTCP reports. The jitter buffer at the server can report its delay to the AEC. However, unless signaled somehow to the server, the latency of the client jitter buffer, the render and capture buffers, and other algorithmic latencies are unknown. A standard AEC can handle a latency offset up to some proportion of the length of its adaptive filter. So there is some play here, and by spending enough complexity (that is, by lengthening the filter) it might be possible to operate without knowledge of these latencies. Another approach is to use a dedicated delay estimator operating on the signals themselves. This stage should be of lower complexity than the AEC filter (which provides implicit “delay estimation” itself), allowing us to avoid an excessively complex solution.
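As an illustration of what a delay estimator operating on the signals themselves could look like, here is a minimal sketch based on brute-force cross-correlation. This is only one possible technique and not necessarily what a production AEC uses; the signals are synthetic, and the delay, echo gain and noise level are assumptions chosen for the example.

```python
import random


def estimate_delay(farend, nearend, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate the farend with the nearend shifted back by `lag` samples.
        n = min(len(farend), len(nearend) - lag)
        score = sum(farend[i] * nearend[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


rng = random.Random(0)
# White noise standing in for the farend signal.
farend = [rng.uniform(-1.0, 1.0) for _ in range(2000)]

true_delay = 137  # samples; unknown to the estimator
echo_gain = 0.4   # assumed attenuation of the echo path
nearend = [0.0] * true_delay + [echo_gain * x for x in farend]
# Add uncorrelated noise standing in for nearend background noise.
nearend = [y + rng.gauss(0.0, 0.05) for y in nearend]

estimated = estimate_delay(farend, nearend, max_lag=400)
print(estimated, true_delay)
```

Real speech is far from white and the echo path is not a pure delay, so a practical estimator would need to be considerably more robust. The point is only that a coarse search over lags is much cheaper than adapting a very long filter.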

Relaying

What if the server is simply relaying packets and has no access to the decoded data? I have read about some attempts to perform echo control in the coded domain, but I think the best you can hope for is a very crude half-duplex suppressor. More likely, the relaying server would be forced to decode the signal in order to remove the echo before re-encoding and sending the packet on its way.

As we’ve seen, there are a number of challenges to server-based AEC. Although these are not insurmountable, it’s certainly clear that we should perform echo control at the endpoint whenever possible.

Cross-platform audio development

Posted by Andrew MacDonald
on July 23rd, 2009 in Technology

Writing a cross-platform application is certainly a non-trivial task. When the app must interface with hardware such as the audio system, the difficulties grow. OS abstraction layers attempt, with varying degrees of success, to hide hardware-dependent behaviour. Almost inevitably, though, there will be hardware-specific peculiarities to deal with. The abstraction layers themselves, such as CoreAudio on Mac OS X and DirectSound on Windows, have different interfaces and design philosophies which must be considered. There may even be a requirement to support multiple abstraction layers on a single OS, such as OSS, ALSA and PulseAudio on Linux.

The demands of real-time audio add further complexity. To minimize end-to-end latency, the I/O buffers must be kept as small as possible without introducing under/over-runs. This forces the app to meet strict scheduling demands. Additionally, and as discussed at greater length in another post, echo control requires obtaining precise timing information from the OS. Part of the value offered by our VoiceEngine product is in handling these tasks across an array of PC and mobile platforms.
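To put rough numbers on the buffer-size trade-off, the latency contributed by an I/O buffer is simply its length divided by the sample rate. A small sketch follows; the buffer sizes and the 48 kHz rate are illustrative assumptions, and the function name is hypothetical.

```python
def buffer_latency_ms(frames_per_buffer, sample_rate_hz, num_buffers=1):
    """Latency in milliseconds contributed by `num_buffers` buffers of the given size."""
    return 1000.0 * frames_per_buffer * num_buffers / sample_rate_hz


# Illustrative buffer sizes at a 48 kHz sample rate.
for frames in (128, 480, 2048):
    print(frames, round(buffer_latency_ms(frames, 48000), 2), "ms")
```

A smaller buffer lowers latency but gives the app correspondingly less time to deliver or consume each block before an under/over-run occurs, which is where the strict scheduling demands come from.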

There are several cross-platform audio libraries such as libao and OpenAL. In particular, I am following the development of PortAudio with interest, as it seems the most suitable for real-time audio apps. The full audio hardware functionality required by VoiceEngine isn’t specified in the interface, and some of that which is hasn’t yet been implemented on every supported platform. However, it looks like it can offer a good foundation for the core functionality of real-time audio projects.

Google Voice + Gizmo

Posted by Andrew MacDonald
on June 29th, 2009 in Technology

Google Voice has been much discussed with its impending widespread public release. I gave it a whirl over the weekend, and it seems quite nifty. To tie in loosely with my last post about novel methods of reducing mobile phone costs, it seems you can use Google Voice and Gizmo to make free PSTN-terminated calls.

Gizmo is a free VoIP client, which can be installed on a variety of PC and mobile platforms and provides you with a SIP number. Google Voice can be directed to forward and even connect calls to this number. Since domestic calls made through Google Voice are free, we seemingly have a neat system of free calling here.

Some instructions can be found on a Google forum, though some users in the thread report problems connecting the calls to Gizmo. Very interestingly, a blogger describes using this method to make free calls on his Android phone over the 3G/EDGE network.

Perhaps free-as-in-speech VoIP nirvana is not as far off as I had feared…

Circumventing your telco?

Posted by Andrew MacDonald
on May 28th, 2009 in Technology

I read an interesting post on Slashdot yesterday about an experiment to use a WiMAX-connected netbook as a phone. Given that a) I personally find current 3G data plans a little too close to highway robbery, and b) mobile hardware and OS providers are marginalizing VoIP applications (as discussed here), I find notions of this sort intriguing.

With Clear rolling out WiMAX connections in US cities, the idea is technically feasible. Predictably however, as mentioned in the story’s responses, there are some problems. Battery life, spotty coverage and having to carry around an enormous excuse for a mobile phone are a few of the more significant examples (though if you regularly carry a laptop anyway, this last point can be largely overlooked). There are also some more pedestrian hardware issues like getting your OS to stay active when you close the netbook’s lid.

One reader even suggests that Clear in Seattle is blocking Skype traffic to support its own VoIP service [unsubstantiated].

Looks like free-as-in-speech VoIP nirvana may not yet be upon us.

Finding (and keeping) your hearing range

Posted by Andrew MacDonald
on May 14th, 2009 in Technology

In a previous post, I suggested I likely couldn’t hear sounds much higher than 18 kHz. As luck would have it, a colleague shortly thereafter directed me to a blog post which allows you to test the upper limit of your hearing. It contains a series of tones; listening to them in progression will demonstrate the point above which you lose hearing. I surprised myself by being able to hear the 19 kHz tone. Arguments could be made about the validity of this method, but it’s interesting nonetheless.

A significant determining factor in hearing range is, of course, hearing damage we’ve sustained. Because we are biased towards conflating effect with observability, hidden deterioration such as hearing damage can be easily overlooked.

But it’s never too late to start saving your hearing! I’ve read that the real killer is chronic exposure to high but not necessarily uncomfortable noise levels, such as that experienced in some workplaces. Presumably (and according to law) the employer has taken the appropriate protection measures.

For the rest of us, loud music is probably the largest concern, whether inflicted at a concert or from our iPods. To address the former, you should carry earplugs to a show. If you actually enjoy the music you’re listening to though, cheap foam earplugs aren’t the ticket: they exhibit a frequency response far from flat, resulting in muffled sounds. For a more enjoyable experience, try a step up like these Etymotics, which have a much flatter response.

When listening on headphones, being conservative about setting the volume is usually protection enough. However, in high-noise situations, such as on the train to work or on a plane, we often unconsciously raise the volume to dangerous levels in order to hear the music over the noise. The solution again comes in the form of earplugs: in-ear headphones suppress the unwanted noise, allowing you to listen at a safe level. Headroom has a good selection.

Dedicated VoIP and the new USB mics

Posted by Andrew MacDonald
on April 2nd, 2009 in Technology

Although we generally recommend using a headset for VoIP calls, I prefer using separate playout and capture devices. One of the barriers to adoption of computer-based voice communication is the lack of a dedicated channel. If my phone rings, I pick up the receiver and talk. If my computer “rings” it could be that I first need to fool around ensuring the correct devices are connected and will be used by my software. As a result, I want my VoIP devices to align as closely with my usual computing habits as possible. In this way I can get close to having a dedicated channel. Since I’m often wearing a pair of headphones, I use a standalone mic for VoIP.

I’ve embarked on a bit of a search for an ideal mic for this purpose, so allow me to share a few tips I’ve found. The first is that unless you’ve taken special care with your sound card, it’s not unlikely that it will suffer some electrical noise. The inside of a computer is a busy place electrically, particularly in a laptop. Which brings up a second point: avoid using your built-in laptop mic. These tend to be perfectly positioned to pick up your laptop’s fan noise, and the fan often kicks into high gear during a video call.

So what should you use? USB mics seem to be able to deliver the most consistent quality. Their electronics are isolated from the noisy computer innards and most computers are now equipped with a plethora of USB ports. This means the device can stay permanently connected, which brings us closer to the desired dedicated channel. On to a recommendation, then: a basic, affordable and time-tested mic from Logitech.

Mostly in response to the number of budding podcasters (at least I assume based on the marketing material) some higher quality USB mics are now appearing on the scene. VoIP adopters can benefit from this trend. Especially when coupled with higher bandwidth calls, a good mic can evoke a real lifelike voice quality. Blue Microphones, known for their iconic mic designs, has entered this consumer-grade USB mic field. This includes the Snowball, but of possibly more interest to VoIPers is the portable Snowflake. For conference use we’ve had good results with the Samson UB1. I can only hope this trend of decent and reasonably priced USB mics continues.

A few caveats: i) I’ve discovered that many USB mics have insufficient gain. This simply means you don’t want to be too far from them for best results. ii) Although it is convenient and USB-based, I normally caution against using the mic built into your webcam. These tend to be of lower quality and suffer from clock drift. In my experience, dedicated USB mics are less susceptible to the latter phenomenon.

On a broader and related note, Citrix has a comprehensive overview of audio best practices for VoIP here.

How super-wideband is super-wideband enough?

Posted by Andrew MacDonald
on February 26th, 2009 in Technology

With the recent discussions of super-wideband codecs (cf. Mats’ post), I had the notion to find, at least for me, at what bandwidth a speech signal would be transparent to its source. Or, more accurately, at what bandwidth I could no longer declare the signal to be non-transparent.

Transparency, in the context of subjective evaluation, means to be indistinguishable from a reference. In a general sense it is ultimately the goal of all lossy compression schemes (of which most audio codecs are examples). The encoded and decoded output of audio codecs such as AAC can be transparent to reference at practical bitrates.

I prepared a small ABX test, in which the listener is presented with a series of unknown signals which they must correctly identify as a reference or alternative. I used a female speech source sampled at 48 kHz as the reference, and the same source resampled to a series of different bandwidths as the alternatives. For each bandwidth, I used 10 trials, striking a balance between statistical significance and time required for the test.

On to the results! A correct score here means I was able to correctly identify the unknown signal as reference or alternative. (The frequency listed is the signal bandwidth, i.e. half the sampling frequency.)

8 kHz — 10 correct
10 kHz — 10 correct
12 kHz — 10 correct
14 kHz — 10 correct
16 kHz — 8 correct (still ~95% confidence)
18 kHz — 3 correct

I stopped there, since I was clearly no longer able to make a distinction. Furthermore, I probably can’t hear much higher than 18 kHz (yet another interesting test!). To be honest, I was very surprised by the 16 kHz result, so surprised in fact that I repeated the test, only to arrive at the same result (which merely serves to improve the confidence…). I was, however, using a high quality USB sound device and accurate headphones. To make the score somewhat more relevant to a practical VoIP scenario, I retook the 16 kHz test using my laptop sound card and a cheap headset. This time I scored 5 correct, which is no better than flipping a coin.
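For the curious, the confidence figures can be checked with a one-sided binomial test: if the listener were purely guessing, each trial would be a fair coin flip. The sketch below assumes that model (10 independent trials at 50% chance); the function name is hypothetical.

```python
from math import comb


def p_value(correct, trials):
    """Probability of scoring at least `correct` out of `trials` by pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


for correct in (10, 8, 5, 3):
    print(correct, "correct:", round(p_value(correct, 10), 4))
```

Under this model, 8 of 10 has a guessing probability of 56/1024, or about 0.055, which corresponds to the roughly 95% confidence quoted for the 16 kHz result. A score of 5 of 10 has a guessing probability of about 0.62, so it carries no evidence of an audible difference.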

All of this should be taken with a grain of salt, but it suggests that i) it’s possible to distinguish between reference and 16 kHz bandlimited speech given the best equipment, and ii) that a typical VoIP user might consider something less such as 14 kHz bandlimited speech to be transparent.

Practical concerns in acoustic echo control for PC

Posted by Andrew MacDonald
on January 29th, 2009 in Technology

Echo control differs from many other audio processing tasks in that it depends on two streams: the farend (audio to be played on the speaker) and the nearend (audio recorded from the microphone). In a hands-free call, the nearend typically contains an echoed version of the farend. In order to identify and remove this echo component from the stream, the signals must be time-aligned in some fashion. It is this need for time-alignment that is at the root of many of the practical difficulties in acoustic echo control (AEC) that are not apparent for other tasks such as noise suppression and coding which operate on only a single stream.

There are two crucial and sometimes unmentioned factors which contribute to this time-alignment problem for AEC on a PC platform:

1. The AEC will be running on a non-real-time operating system. This means that processing is not guaranteed to take place at any particular time, but will instead be performed in some kind of best-effort manner. The effect is that the delay between the farend and nearend signals is unknown a priori and will probably change over time. When the CPU is heavily loaded this is aggravated; it’s even possible that buffers will overflow and we’ll lose some of the stream data. It’s necessary to compensate for this delay to achieve our desired time-alignment.

2. There is a wide array of available hardware devices which can be used in combination. Recording and playout devices (often soundcards, but alternately webcams and other USB devices) run on hardware clocks just as the CPU does. This controls the rate at which data is recorded or played out. If these clocks differ, and data is recorded at a different rate than it is played out, the farend and nearend streams will drift away from each other. This phenomenon is aptly labeled clock drift. Again, it’s necessary to compensate for this effect to achieve time-alignment.
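To get a feel for the magnitude of clock drift, here is a back-of-the-envelope sketch. The 100 ppm mismatch between the capture and playout clocks is an assumed figure for illustration, not a measurement of any particular device, and the function names are hypothetical.

```python
def drift_samples(nominal_rate_hz, offset_ppm, seconds):
    """Samples of misalignment accumulated between two clocks after `seconds`."""
    return nominal_rate_hz * offset_ppm * 1e-6 * seconds


def drift_ms(offset_ppm, seconds):
    """The same misalignment in milliseconds (independent of the sample rate)."""
    return offset_ppm * 1e-3 * seconds


# Assumed 100 ppm mismatch at a nominal 48 kHz sample rate.
print(drift_samples(48000, 100, 60), "samples after one minute")
print(drift_ms(100, 600), "ms after ten minutes")
```

Under this assumption the streams slip by a few hundred samples per minute, so the misalignment grows steadily over the course of a call unless it is tracked and compensated.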

The wide variety of hardware leads to another issue. In any practical scenario there will be some amount of non-linear distortion in the echo path. Poor or overdriven speakers and microphones are usually the cause. This type of distortion can be heard for instance if a user speaks very loudly into the microphone, causing signal saturation. The traditional echo canceller uses a linear filter to remove echo, which by definition cannot model this distortion. An effective algorithm must be prepared for this eventuality.
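The limitation of a linear filter can be demonstrated with a toy experiment. The sketch below uses a textbook NLMS adaptive filter, chosen only as a familiar example of a linear canceller and not as a description of any particular product. The impulse response, clipping level and step size are assumptions for illustration. It compares the residual echo for a purely linear echo path against the same path followed by hard clipping.

```python
import random


def nlms_residual_energy(farend, nearend, taps=8, mu=0.5, eps=1e-8, tail=500):
    """Run an NLMS canceller; return mean residual energy over the last `tail` samples."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent farend samples, newest first
    residuals = []
    for x, d in zip(farend, nearend):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate
        # Normalize the step by the input energy (the "N" in NLMS).
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residuals.append(error)
    return sum(e * e for e in residuals[-tail:]) / tail


def echo_path(farend, impulse_response):
    """Convolve the farend with a (hypothetical) room impulse response."""
    out = []
    for n in range(len(farend)):
        out.append(sum(h * farend[n - k]
                       for k, h in enumerate(impulse_response) if n - k >= 0))
    return out


def clip(signal, limit):
    """Hard-limit the signal, as a saturating speaker or microphone would."""
    return [max(-limit, min(limit, s)) for s in signal]


rng = random.Random(1)
farend = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
linear_echo = echo_path(farend, [0.6, 0.3, -0.2, 0.1])
clipped_echo = clip(linear_echo, 0.5)

linear_residual = nlms_residual_energy(farend, linear_echo)
clipped_residual = nlms_residual_energy(farend, clipped_echo)
print("linear path residual: ", linear_residual)
print("clipped path residual:", clipped_residual)
```

With the linear path the filter converges and the residual becomes negligible. With clipping in the path a noticeable residual remains no matter how long the filter adapts, because no set of linear weights can reproduce the saturation.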

In the literature these practical considerations are sometimes not given their due weight. An AEC algorithm that performs well “in the lab” can be surprisingly underwhelming in a real scenario. It is therefore important that we use actual field recordings including time-alignment mismatches to test performance.  

GIPS relies on an integration between our AEC algorithm and VoiceEngine’s cross-platform sound device handling to effectively contend with these practical considerations.