Sequoia Mobile Technology

Designers’ Frequently Asked Questions is intended as a resource to share recent information and design tips with developers and program managers.

Topic List:

Voice System Development

Q1: How do I decide which voice development environment is best for my application?
Q2: How can I maximize the number of Asterisk channels handled by a single processor?
Q3: What should I keep in mind when I design an IVR menu?
Q4: How do I make a T1/E1 crossover cable?

Q: How do I decide which voice development environment is best for my application?

A: The answer to this question depends on several factors, including:

(a) The complexity of your application

(b) How quickly you must deploy it

(c) Your staff’s telephony development experience

There are two main flavors of voice development environment: GUI (graphical user interface) and shell/program-oriented (i.e., text-based), with a number of choices in each category. Over the last few years at SMT, we’ve worked primarily in two environments that we like: Envox (www.envox.com) and Asterisk (www.asterisk.org). Envox’s GUI approach works well when (a) you have a fairly complex app to develop, (b) you don’t have much time, and (c) the value of each transaction is relatively high. I note the issue of value because working with Envox means purchasing licenses for the software and (usually) fairly expensive hardware from Dialogic and others. But the GUI approach and the built-in functions for an extraordinary number of tasks mean that you can grind out the application quickly, often without writing external code in C, VB, Perl, etc.

On the other hand, the rapid growth and maturation of Asterisk in the past year or two has dramatically changed the cost model for many applications. Originally conceived as an open-source telephone PBX package, the software has grown to include a large number of built-in telephony and IVR (interactive voice response) functions. Voice over IP (VoIP) has been built in right from the start, rather than added as an afterthought as in many other environments. Most importantly, Asterisk embodies the design philosophy that most of the processing power needed to do the job should reside in the main computer, not in the peripheral cards. This allows systems that connect to the switched telephone network to be built with inexpensive cards from Digium, Sangoma, and other vendors, at a fraction of the cost of Dialogic cards that perform the same function.

Implementing a system in Asterisk may be daunting for some users because it means getting your feet wet in the Linux environment and possibly implementing more complex tasks in languages like Perl and ‘C’. The reward, though, is a system that can be built relatively inexpensively and yet handle large call loads and complex tasks like transcoding for VoIP fairly effortlessly.

Finally, if your budget allows and the task is complex, you might consider prototyping your application in Envox, tweaking it with your trial customer(s), and later rolling it out in large scale using Asterisk.

Q: How can I maximize the number of Asterisk channels that can be handled by a single processor?

A: Scaling an Asterisk application beyond 120 channels (four E1s) or so usually means at least thinking about a multi-processor architecture. Recent advances in processor power and design improvements in both the Digium and Sangoma interfaces have pushed performance and channel capacity beyond this limit, but system resiliency should also be considered. In an Asterisk environment, if the processor fails, everything stops instantly: all the conversations are cut off and the telecom interfaces go down. A bit of redundancy is cheap insurance.

That being said, there are several design considerations to keep in mind when trying to maximize the channel capacity of a single Asterisk box, including:

(1) In our experience, one significant processor drain in an Asterisk environment is “transcoding”: changing from one voice encoding to another, for example from GSM to aLaw PCM or vice versa. So if bandwidth is cheap, for example when routing a call that arrives from the local telco via an E1 channel on one Asterisk system out to another local machine, it’s better to keep the native encoding of the telco network (either uLaw or aLaw PCM) and carry the voice all the way through without re-coding it. Similarly, if an Asterisk-based IVR system plays prompts over an E1 connection, the prompts themselves should be recorded in raw PCM, again either aLaw or uLaw, to maximize efficiency.

(2) Use AGI (“Asterisk Gateway Interface”) only when you have to. The Asterisk dialplan (see http://www.voip-info.org/tiki-index.php?page=Asterisk for more info) includes commands to do just about everything you can think of in a PBX. Dialplan command execution is implemented via a “state machine” architecture, which means that it runs efficiently. If you have something that can’t be accomplished by the dialplan, you can write an AGI program instead and call it conveniently from the dialplan, which works great. However, keep in mind that every time an AGI is called from the dialplan, a new process is created for your AGI program, and it runs until you’ve done what you have to. So if you’ve got 78 calls in your system at the same time, you’ve got 78 new processes running! We’ve found that the Linux kernel quickly tires of all this context switching when you’re running a large number of AGI programs alongside Asterisk. The bottom line, for us anyway: we don’t use AGIs when we have a lot of lines (more than 80 or so) running the same code on one processor.

At EVT we’ve had better luck using dialplan commands in combination with a database interface, so that other processes can access dialplan data and get things done outside the dialplan. Another approach, if you’re doing something that’s only slightly different from the way a standard dialplan command works, is to consider changing the Asterisk code itself, assuming you feel comfortable with C programming!
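To make the per-call cost of AGI concrete, here is a minimal sketch of the conversation an AGI program has with Asterisk, written in Python. When the program starts, Asterisk sends a block of "agi_name: value" lines ending with a blank line; after that the program writes one command per line and reads one reply per line. The function names are made up for this illustration, and the streams are passed in as arguments so the logic can be tried without a running Asterisk.

```python
def parse_agi_environment(stream):
    """Read the 'agi_name: value' header block that Asterisk sends at startup."""
    env = {}
    while True:
        line = stream.readline().strip()
        if not line:  # a blank line (or end of input) ends the header block
            return env
        name, _, value = line.partition(":")
        env[name.strip()] = value.strip()


def handle_call(stdin, stdout):
    """Log the caller ID through Asterisk and return Asterisk's reply line."""
    env = parse_agi_environment(stdin)
    caller = env.get("agi_callerid", "unknown")
    # One AGI command per line; Asterisk answers with a line such as "200 result=1".
    stdout.write('VERBOSE "call from %s" 1\n' % caller)
    stdout.flush()
    return stdin.readline().strip()
```

A real script would finish with handle_call(sys.stdin, sys.stdout). Since Asterisk starts one such process for every call that reaches the AGI, 78 simultaneous calls mean 78 copies of the interpreter running alongside Asterisk.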

(3) There are other little things to remember, like turning off any graphical interface in your Linux setup, along with anything else that generates lots of interrupts.

Q: What should I keep in mind when I design an IVR menu?

A: Sometime in the late 1980s, I came across a book (subsequently lost!) that laid out principles of IVR menu design, sort of the audio equivalent of Apple's superb Human Interface Guidelines manual of the same era. Although IVR design has evolved with the introduction of new technologies like speech recognition, the well-accepted underlying rules still apply. The following may be of interest to people just getting into IVR menu design:

A Few General IVR Design Rules
(a) Keep the number of menu choices low (and never greater than 5). People simply can't remember more than this many. Rather than having long lists of choices, separate complex menus into a hierarchy of simpler menus, with related categories grouped together.

(b) Say each menu choice first, followed by what you press. Example: "For customer service, press 1. For sales, press 2...etc", although slightly wordier, is easier to follow and remember in a menu than "Press 1 for customer service, 2 for sales...". The only exception to this might be: "Press 1 for Yes, 2 for No" because it's very short and faster to say it this way.

(c) Don't require a pound/hash ("#") terminator for entries, except when accepting a variable-length field like a password. Do accept one anywhere callers happen to enter it out of habit, and ignore it where appropriate. Even in a variable-length field, either a pound/hash or a timeout of a few seconds should terminate the entry. For a variable-length entry, be sure to tell callers that they can press pound/hash to finish, so they save time.

(d) Be polite, but don't overdo it. Say "Please" in the first prompt, but don't make things unnecessarily wordy by saying it in every prompt.

(e) Allow dialing ahead in all but the most complicated situations. When someone presses a digit during a prompt, it should stop the prompting and let them finish the current input.
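Rules (a), (b), and (c) are easy to check mechanically. The following Python sketch shows one way to do it; the function names and the five-choice constant are just this example's way of encoding the rules above, not part of any IVR product.

```python
MAX_CHOICES = 5  # rule (a): never more than five choices in one menu


def build_menu_prompt(choices):
    """Return the spoken text for a menu.

    `choices` is a list of (digit, description) pairs. Each option is
    announced description first, then the key to press (rule b).
    """
    if len(choices) > MAX_CHOICES:
        raise ValueError("too many choices: split into a hierarchy of menus")
    return " ".join(
        "For %s, press %s." % (description, digit)
        for digit, description in choices
    )


def clean_entry(keys, variable_length=False):
    """Apply rule (c) to the keys a caller pressed."""
    if variable_length:
        return keys.split("#", 1)[0]  # '#' ends a variable-length entry
    return keys.replace("#", "")      # elsewhere a habitual '#' is ignored
```

For example, build_menu_prompt([("1", "customer service"), ("2", "sales")]) produces "For customer service, press 1. For sales, press 2."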

IVR Prompt Preparation
The tone and wording of prompts can have an important effect on how your system (and your company) is perceived. It's not really necessary to use a professional recording artist to achieve good results, but attention should be paid to the selection of voice, and the tone and cadence used. "Friendly, efficient, and forgiving" is the tone that we look for at EVT in our customer recordings.

Prompts should be recorded using a good-quality unidirectional microphone, to minimize background noise. When using recording software, we suggest that you record at a higher quality (higher sample rate) and later convert the recording to whatever telephone-quality rate you need. For normal telephone prompting, a good encoding choice for the final prompts is "Raw PCM, 8000 Hz", in either MuLaw format (US and some Asian countries) or ALaw format (Europe and most other countries).
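As a rough illustration of what the MuLaw format stores, the following Python sketch compands 16-bit linear samples into 8-bit G.711 mu-law bytes, one byte per sample. It assumes the audio is already mono at 8000 Hz; the function names are made up for this example, and in practice your recording software does this conversion for you.

```python
BIAS = 0x84   # offset added before the segment search
CLIP = 32635  # largest magnitude that can be encoded


def linear_to_ulaw(sample):
    """Encode one signed 16-bit sample as one mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent): the position of the highest set bit.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # The byte is stored bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def encode_prompt(samples):
    """Encode a sequence of 16-bit samples as raw mu-law bytes."""
    return bytes(linear_to_ulaw(s) for s in samples)
```

Each sample shrinks from two bytes to one, which is why a raw 8000 Hz MuLaw or ALaw prompt can be played onto an E1 or T1 channel without any transcoding.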

Good recording software should have a "level equalization" (normalizing) feature, so that minor level differences in the recordings can be automatically adjusted for. For best results, prompts will often have to be trimmed at the beginning and at the end to eliminate silence - either manually or automatically by the software. When recording digits that will be strung together in various ways, we suggest recording them in one recording ("1...2...3...etc"), and then cutting them up with voice editing software. This achieves a more consistent speed and intonation when they are used together.
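The trimming and level adjustment described above can be sketched in a few lines of Python. This is a simplified illustration that works on a list of 16-bit sample values; the silence threshold and target peak are arbitrary assumptions, and real recording software typically uses more careful methods.

```python
def trim_silence(samples, threshold=500):
    """Drop leading and trailing samples quieter than `threshold`."""
    start = 0
    end = len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


def normalize_peak(samples, target_peak=30000):
    """Scale the samples so the loudest one reaches `target_peak`."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # nothing to scale in an empty or silent clip
    gain = target_peak / peak
    return [int(round(s * gain)) for s in samples]
```

Running every prompt through the same two steps keeps levels consistent when prompts recorded at different times are played back to back.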

Q: How do I make a T1/E1 crossover cable?

A: E1 and T1 connections use only pins 1, 2, 4, and 5.

In a straight-through cable, which is usually what’s required when you connect your voice card to your T1/E1 provider, those four pins are wired end-to-end, with the same pins at both ends. Although longer cable runs should have pins 1 and 2 in one twisted pair and 4 and 5 in the other pair, shorter runs can use a standard CAT5 ethernet cable instead. This cable has the proper pins connected, but 4 and 5 are not in a twisted pair which would upset the balance of the signals in a longer run (over 5 meters).

A crossover cable is usually used to connect one voice system to another voice system, or possibly to an on-site PBX. Although the same pins are used, the wiring is different:

Pin 1 --- Pin 4

Pin 2 --- Pin 5

Pin 4 --- Pin 1

Pin 5 --- Pin 2

Note that you cannot use a crossover CAT5 cable as a crossover E1/T1 cable, because the pins are not connected properly. You either have to buy one or make one. For shorter runs, I usually cut one end off a regular CAT5 cable and crimp on a new connector wired as shown above, with the help of a multi-meter and my trusty crimping tool.
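When checking a finished cable with a multi-meter, it helps to write down which pin on one end connects to which pin on the other and compare that against the table above. A small Python sketch of that comparison follows; the function name is made up, and the Ethernet crossover pinout in the usage note (pins 1 and 3 swapped, pins 2 and 6 swapped) is the common 10/100 arrangement, stated here as an assumption.

```python
T1_E1_CROSSOVER = {1: 4, 2: 5, 4: 1, 5: 2}
T1_E1_STRAIGHT = {1: 1, 2: 2, 4: 4, 5: 5}


def classify_cable(continuity):
    """Classify a cable from measured continuity.

    `continuity` maps each pin on end A to the pin it reaches on end B.
    Only pins 1, 2, 4, and 5 matter for T1/E1.
    """
    measured = {pin: continuity.get(pin) for pin in (1, 2, 4, 5)}
    if measured == T1_E1_CROSSOVER:
        return "crossover"
    if measured == T1_E1_STRAIGHT:
        return "straight-through"
    return "miswired"
```

For instance, an Ethernet crossover measured as {1: 3, 2: 6, 3: 1, 6: 2, 4: 4, 5: 5, 7: 7, 8: 8} comes back as "miswired", which is why it can't stand in for a T1/E1 crossover.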