Programming new Internet telephony services requires decisions regarding such things as where the code executes and how it interfaces with network protocols. The authors propose a CGI solution for trusted user/developers and the Call Processing Language for untrusted user/developers.
PROGRAMMING INTERNET TELEPHONY SERVICES
INTERNET TELEPHONY
JONATHAN ROSENBERG, Bell Laboratories
JONATHAN LENNOX AND HENNING SCHULZRINNE, Columbia University
Internet telephony enables a wealth of new service possibilities.1 Traditional services, such as call forwarding and 800 numbers, can be enhanced by integrating them with e-mail, the Web, instant messaging, and directory services. Users can have calls redirected to Web pages, use streaming media tools to record voice mail, use instant messages in place of call waiting notifications, or have call logs reported via e-mail. IP telephony can offer improved speech quality through advanced speech and audio codecs. Communications can encompass not just voice, but video, shared applications, and even virtual reality. Much more powerful user interfaces can make these services easily accessible. Gateways to the public switched telephone network (PSTN) can extend these services to traditional landline phones, cellular phones, and pagers.
With such a wide range of services possible, it becomes critical to provide a means for rapidly conceiving, developing, and deploying them. It should not be necessary to add new network elements for each new service, nor to reinvent the interfaces to existing elements. In addition, it should be easy for third parties to create new services. By third parties, we mean individuals or organizations besides those that own or build the routers, hubs, switches, and servers that actually implement the service. This separation allows end users flexibility: they can purchase different services from different providers.
In this article we consider this problem in more detail. Since the services are ultimately realized through Internet telephony signaling protocols, we discuss them first. We then discuss the design decisions necessary in developing a programming mechanism, and argue that two solutions are required. We propose our solution for service creation by trusted users, the common gateway interface (CGI), and for service creation by untrusted users, the Call Processing Language (CPL). We conclude by summarizing and highlighting some of the open issues.
IEEE INTERNET COMPUTING 1089-7801/99/$10.00 1999 IEEE http://computer.org/internet/ MAY JUNE 1999
Figure 1. A Session Initiation Protocol (SIP) transaction begins when the caller (user agent client) forwards an INVITE request to a local SIP proxy server (1). [Figure labels: location service; university.edu; SIP proxy for cs.university.edu; user agent; user agent server]
SIGNALING PROTOCOL
There is a strong connection between programming IP telephony services and the protocols used to deliver them. Delivery of IP telephony services requires a number of different protocols, including transport protocols, such as the Real-Time Transport Protocol (RTP),2 used to carry voice on IP networks; quality of service (QoS) protocols, such as the Resource Reservation Protocol (RSVP)3 and differentiated services,4 used to provide low delay and loss for voice transport; and directory access protocols, used for user location and policy. Most important, however, are the signaling protocols.
Signaling protocols are used to set up and tear down calls, carry information required for them to progress (such as media codecs and addresses), locate users to be called, negotiate capabilities, and invoke services like hold, mute, and transfer. Since these protocols are what ultimately provide services, understanding them is key to programming services.
There are three protocols currently used for signaling IP telephony services: H.323,5 the Media Gateway Control Protocol (MGCP),6 and the Session Initiation Protocol (SIP).7 H.323 was developed in the International Telecommunication Union (ITU). It was originally conceived for multimedia conferencing on a LAN, but has since been extended to cover Internet telephony. It provides call control, conferencing functions, call management, capabilities negotiation, and supplementary services.
MGCP is a control protocol, allowing a central coordinator to monitor events in IP phones and gateways and instruct them to send media to specific addresses.
SIP was developed in the Internet Engineering Task Force (IETF) and engineered for lightweight distributed call control and capabilities negotiation.
SIP offers many advantages as a platform for programming telephony services. Its clean request-response model is amenable to simple programming, and its textual formatting and simple header structure make it easy to use text processing languages, such as Perl, and textual interfaces, such as CGI, for developing services. Finally, its ability to work in a fully distributed fashion, avoiding routing loops and maintaining consistent behavior across servers, helps avoid feature interactions when programming services. (Schulzrinne and Rosenberg8 compare SIP and H.323; a second paper offers a detailed description of SIP.9)
Figure 1 depicts a typical SIP transaction. The caller, also known as the user agent client (UAC), creates an INVITE request for some user, sip:[email protected]. This request is forwarded to a local SIP proxy server (1). The proxy looks up company.com in the Domain Name System (DNS), and obtains the IP address of a server handling SIP requests for this domain. It then proxies the request to this server (2). The redirect server for company.com knows about user joe, but this user is currently logged in as [email protected]. The redirect server at company.com can know this through a static configuration, database entry, or dynamic binding set up by the user using a SIP REGISTER message. Thus, the server redirects the proxy (3) to try this address. The local proxy
looks up university.edu in DNS, and obtains the IP address of its SIP server. The request is then proxied there (4). The university server consults a local database (5), which indicates (6) that [email protected] is known locally as [email protected], so the main university server proxies the request to the computer science server (7). This server knows the IP address where the user is currently logged in, so it proxies the request there (8). The user accepts the call, and the response is returned through the proxy chain (9), (10), (11), (12) to the caller. The caller then acknowledges receipt of the response (not shown), and media can flow directly between the parties.
PROGRAMMING SIP SERVICES
The key to programming Internet telephony services with SIP is adding logic that guides behavior at each of the system elements. In a SIP proxy server, this logic would dictate where the requests are proxied to, how the packet is formatted, and how the responses are processed. For example, a simple service such as call forwarding based on time of day would require logic in the SIP server to obtain the time when a call setup arrives and, based on the time, to proxy the request to a chosen destination. In general, the logic can direct the server's actions based on many inputs: time of day, caller, call subject, session type, call urgency, media composition, data obtained from Web pages, and data obtained from directories. The logic may also instruct the server to generate new requests or responses.
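The time-of-day forwarding example can be sketched in a few lines of Python. This is only an illustration of the decision itself, not part of any SIP server API: the function name, the business-hours window, and both destination URIs are assumptions invented for the sketch.

```python
from datetime import time

# Hypothetical destinations, invented for this sketch.
DAYTIME_TARGET = "sip:joe@office.example.com"
AFTER_HOURS_TARGET = "sip:joe@voicemail.example.com"


def choose_destination(arrival: time,
                       start: time = time(9, 0),
                       end: time = time(17, 0)) -> str:
    """Pick the URI to proxy a call setup to, based on its arrival time."""
    if start <= arrival < end:
        return DAYTIME_TARGET
    return AFTER_HOURS_TARGET


if __name__ == "__main__":
    print(choose_destination(time(10, 30)))   # inside the window
    print(choose_destination(time(22, 15)))   # outside the window
```

A server would call such a function when the call setup arrives and proxy the request to the URI it returns.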
Logic can also be added to user agents (that is, end-system software). However, user agents are usually owned by end users rather than network service providers, so providing logic for them is a different problem. The breadth of platforms used, the security implications, and the trust models are substantially different. For this reason, we consider only network servers for the remainder of this article. However, the approaches for programming services advocated herein could also be implemented in a user agent server (UAS).
The basic model for providing logic for SIP services is shown in Figure 2. A SIP server has been augmented with service logic (a program responsible for creating the services), and an interface exists between the two. When requests and responses arrive, the server passes information to the service logic, which makes decisions based on this information and information gathered from other resources, and passes instructions back to the server. The server then executes the instructions.
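One way to picture this model is as a callback: the server hands a description of the message to the service logic and acts on the instruction it gets back. The sketch below is a toy under stated assumptions; every class, field, and action name is hypothetical, and the "network actions" are simply recorded in a list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MessageInfo:
    """What the server tells the service logic about an arriving message."""
    kind: str                                   # e.g. "INVITE", or a response code such as "180"
    headers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Instruction:
    """What the service logic tells the server to do next."""
    action: str                                 # "proxy", "respond", or "default"
    targets: List[str] = field(default_factory=list)
    status: int = 0


ServiceLogic = Callable[[MessageInfo], Instruction]


class Server:
    def __init__(self, logic: ServiceLogic) -> None:
        self.logic = logic
        self.log: List[str] = []                # stands in for real network actions

    def on_message(self, info: MessageInfo) -> None:
        instruction = self.logic(info)          # server -> logic -> server
        if instruction.action == "proxy":
            for uri in instruction.targets:
                self.log.append(f"proxy {info.kind} to {uri}")
        elif instruction.action == "respond":
            self.log.append(f"respond {instruction.status}")
        else:
            self.log.append("standard protocol handling")


def block_anonymous(info: MessageInfo) -> Instruction:
    """Example logic: reject INVITEs without a From header, proxy the rest."""
    if info.kind != "INVITE":
        return Instruction("default")
    if not info.headers.get("From"):
        return Instruction("respond", status=403)
    return Instruction("proxy", targets=["sip:desk@example.com"])
```

The questions that follow in the article (where the logic runs, when it runs, what it sees, and what it may control) correspond to design choices hidden inside `Server.on_message` and the two data classes.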
Filling in the details of this model requires answers to a number of questions:
• Where does the logic reside?
• When does the logic execute?
• What are the restrictions on the resources available to the program?
• What information about the SIP messages is provided to the program?
• What level of control does the program have over the server's execution?
There is no one solution that addresses each of these issues. In particular, the solution for the last three issues depends greatly on the level of trust between the server and the program. If the level of trust is low (as it may be with consumer-defined logic), very specific structured information should be passed from the server to the program, and a very narrowly defined set of controls should be exposed to the program. This restricts the set of services that can be defined, but increases the level of server security and ensures that the program cannot perform malicious operations or cause the server itself to crash. For trusted users, such as administrators or privileged users in a corporate environment, the trust levels are higher and greater flexibility is warranted.
Program Location
The service logic can reside either on the servers themselves or in special computers separate from the servers. In the latter case, the interface between the server and the service logic requires some protocol. It can be a special-purpose protocol or some form of remote procedure call (RPC). The interface can also be through a distributed computing platform, such as Common Object Request Broker Architecture (CORBA) or Distributed Component Object Model (DCOM). This allows the service logic's location to be independent of the interface.
Figure 2. Model for programming SIP services. A SIP server is augmented with service logic. [Figure labels: service logic; programming interface; requests; responses; SIP server function]
When the service logic and server are co-resident, their interface can be a simple application programming interface (API). Placing the logic in
an external server has many advantages. For example, it increases security since malicious or buggy code that crashes has less effect on the physically separated servers. Multiple computers can execute the logic for a single server, which provides load balancing and improves scalability. On the other hand, executing the service logic on the same server simplifies the interface. Network issues such as loss, delay, and encryption can be ignored. Execution time for the logic also improves, since it is not necessary to traverse a network.
Program Invocation Times
Not all services require the service logic to be consulted for every event or message received. A large class of services require the logic to execute only when the initial INVITE message is received. Subsequent message-processing rules can follow standard procedures defined by the protocol itself. Furthermore, some calls will not require any services at all, in which case the SIP server should behave as it normally would, not consulting the service logic at any point. It is necessary, therefore, to have some means of specifying the point when service logic is executed. The execution points can be either defined by some administratively set policy or controlled dynamically by the service logic.
A related issue is whether or not the service logic is persistent. If it runs as a separate process, it can remain active for the duration of the call (and beyond), and therefore be persistent, which would mandate an asynchronous interface between the logic and the server. It also introduces cleanup issues. Protocol or server errors could cause a particular call's service logic process to remain active long after the call is over. Some means of cleanup is needed to destroy these old processes. The advantage is that the service logic can pass control instructions back to the server at any time, rather than depending on the server to execute the service logic only on specific events. This enables numerous services (such as the click-to-dial service defined in Petrack and Conroy10) that would otherwise be impossible to support.
As an alternative, the service logic can be executed synchronously. When the server receives a message, it executes the logic. The logic passes the control information back to the server and ceases execution. This is most easily accomplished by having the service logic executed as a function call from the server. However, it can also be executed as a separate process that terminates once the control information is passed back to the server.
Resource Restrictions
The service logic can have access to a large number of resources. On the Internet, these include name services (that is, DNS), Web pages, directories, mail servers, media servers, QoS controls, policy repositories, presence systems, and instant messaging services, to name a few. The logic can also have access to resources on other networks, such as the telephone network. The ability to query 800-number databases, for example, would allow migration of free phone services to the Internet.
With a breadth of resources comes a wide range of failure modes. The likelihood of bugs, malicious actions, and unusual and untested scenarios increases. The right operating point, as we have indicated above, depends on the level of trust between the server and the logic. For end-user-defined services, access to resources will often need to be restricted. For administrator-defined services, access can be more flexible.
Interface
The server will need to pass information about the SIP transaction, including message information and call states, to the service logic. This information may range from brief to verbose. In the abbreviated case, only the message types (whether INVITE, ACK, or BYE for requests, or the response code for responses) and current state might be passed. In the verbose case, the entire message might be passed along with a copy of the server state. The right operating point, once again, depends on the level of trust and desired flexibility. Verbosity adds flexibility, but increases complexity and the possibility of error.
Control data, instructing the server what to do next, must also be passed from the service logic back to the server. This can also range from simple (a list of uniform resource identifiers, URIs, to proxy to) to complex (an entire message to be sent).
Existing Models
The concept of separating the service logic from the server is certainly not new. This idea is at the heart of the intelligent network (IN),11 a key component of the telephone network. The IN arose from the need to separate services from the telephone switches, enabling rapid development of new services. When a call setup message arrives at a telephone switch, the switch contacts a separate device, a service control point (SCP), to receive instructions about call progress. The IN standards define a basic call state model (BCSM), which is used to define the controls the SCP has over the switch. This model contains the basic states for a call and the events that cause the model to move between states. The switch is configured with a number of decision points (DPs), state changes at which the switch should ask for input from the SCP. When a DP is reached, the current call state and relevant information are reported to the SCP. The SCP can then make a decision about what to do next. It exercises control over the switch by instructing it to proceed to a particular state. The SCP can also instruct the switch to arm or disarm DPs for the remainder of the call.
The concept of separating service logic from servers also exists in the Web. Web servers separate the generation of content (the services) from the detailed protocol handling by using CGI, Java servlets, active server pages (ASPs), or server-side JavaScript. In the case of CGI, the response content is generated by a separate process. When the Web server receives a request, it spawns a separate process to execute the script. The standard output of the script process is connected to a handle on the server, as is the standard input. This means that when the script process reads from its standard input or writes to its standard output, the data actually comes from or goes to the server.
The server also sets a number of environment variables before spawning the coprocess. These variables are used to pass information, such as request details and user information, to the script. The body of the request is written to the standard input of the script. The script writes the response to be sent to its standard output and terminates. The server reads the output, and sends it to the Web browser.
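The CGI hand-off described here (environment variables in, request body on standard input, response on standard output) can be imitated with a child process. The sketch below is not a Web server; it only spawns a tiny stand-in script the way a server might. `REQUEST_METHOD` and `CONTENT_LENGTH` are standard CGI variable names, while the script body and the `run_cgi` helper are assumptions made for illustration.

```python
import os
import subprocess
import sys

# A tiny stand-in for a CGI script: it reads request details from environment
# variables and the request body from standard input, then writes a response
# to standard output and exits.
SCRIPT = r"""
import os, sys
method = os.environ["REQUEST_METHOD"]
body = sys.stdin.read()
sys.stdout.write("Content-Type: text/plain\n\n")
sys.stdout.write(f"{method} carried {len(body)} characters\n")
"""


def run_cgi(method: str, body: str) -> str:
    """Spawn the script as a separate process and return what it wrote."""
    env = dict(os.environ, REQUEST_METHOD=method, CONTENT_LENGTH=str(len(body)))
    result = subprocess.run(
        [sys.executable, "-c", SCRIPT],
        input=body, env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(run_cgi("POST", "hello"))
```

Because the interface is just process environment and standard streams, the script could be written in any language, which is the language-independence property the article attributes to CGI.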
SIP CGI
We have concluded that two mechanisms are needed for a complete service programming solution: a flexible, general-purpose mechanism for trusted users, primarily targeted at administrators, and a simpler, more restricted mechanism targeted at untrusted users, such as consumers. In the Web, CGI is the most flexible mechanism for creating dynamic content.12 SIP's similarity to HTTP makes applying CGI to Internet telephony straightforward and advantageous because CGI possesses the following characteristics:
• Language independence. CGI works with Perl, C, Visual Basic, Tcl, and many other languages.
• Exposure of all headers. CGI exposes the application to all header content in an HTTP request through environment variables. This approach can be directly applied to SIP because its methods of encoding messages are similar to those in HTTP.
• Creation of responses. CGI can control all aspects of a response, including headers, response codes, and reason phrases, as well as content. This flexibility helps in SIP where services are defined largely through response headers.
• Access to any resources. The CGI script is an ideal starting point for creating IP telephony services because it is a general-purpose program whose flexible interface can use existing APIs to let the service logic access an unlimited set of network services.
• Component reuse. Much CGI componentware provides easy reading of environment variables and easy parsing and generation of header fields. As SIP reuses the basic syntax of HTTP, these tools are immediately available to SIP CGI.
• Environment familiarity. Many Web programmers are familiar with CGI.
• Easy extensibility. Because CGI is an interface rather than a language, it is easy to extend and reapply to other protocols, such as SIP.
Basic Operation of SIP CGI
Like traditional HTTP CGI, a SIP CGI script is first invoked when a SIP request arrives at a server. The server passes the body of the message to the script through its standard input, and it sets environment variables containing information on the message headers, user information, and server configuration. The script performs some processing and generates some data, which is written to the standard output of the script. This data is then read by the server, and the script terminates.
Unlike HTTP CGI, however, the script output need not be the response to send. A script can also
instruct the server to proxy a request or to create an entirely new request. In fact, the script can instruct the server to generate multiple messages by using the SIP multiplexing rules to place several messages in the script output.
Another important difference between SIP CGI and HTTP CGI is the persistence model. In HTTP CGI, a request arrives, the script executes, a response is generated, and the script terminates. The server generates a response, and the transaction is complete. In SIP, however, a script can cause requests to be proxied. This means that the server will eventually receive responses to these requests, and these responses must be passed to the script for processing. The implication is that after generating its output, the script must somehow persist and continue interacting with the server to process subsequent responses. One option is to keep the script process active for the duration of the transaction, but this departs substantially from the HTTP CGI model. Instead, after processing a message, the script passes a state token (called a script cookie) to the server through a SIP CGI metaheader (a directive passed from the script to the server inside a SIP message; metaheaders are removed by the server before forwarding requests). When the script is reexecuted at some later point, the server passes the cookie back to it through environment variables. This token is opaque to the server and can contain anything of use to the script. In essence, the script's execution is a procedure call, and the particular procedure is dependent on the semantics of the cookie.
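The "procedure call selected by the cookie" idea can be sketched as a function that is run afresh for every message and keeps no state of its own. The message forms `CGI-PROXY-REQUEST` and `CGI-SCRIPT-COOKIE` are taken from the example output later in this article; everything else is an assumption for illustration: the environment variable names `SCRIPT_COOKIE` and `RESPONSE_STATUS`, the addresses, the cookie strings, and the convention that empty output leaves the server to apply its standard SIP processing.

```python
from typing import Mapping

PRIMARY = "sip:joe@desk.example.com"      # hypothetical addresses
BACKUP = "sip:joe@mobile.example.com"


def _messages(*messages: str) -> str:
    """Join messages with blank lines, as in the article's example output."""
    return "\n\n".join(messages) + "\n"


def run_script(environ: Mapping[str, str]) -> str:
    """One execution of the script; all state arrives through the cookie."""
    cookie = environ.get("SCRIPT_COOKIE", "")        # hypothetical variable name
    status = environ.get("RESPONSE_STATUS", "")      # hypothetical variable name

    if cookie == "":
        # First execution: the INVITE has just arrived.
        return _messages(f"CGI-PROXY-REQUEST {PRIMARY} SIP/2.0",
                         "CGI-SCRIPT-COOKIE tried-primary SIP/2.0")
    if cookie == "tried-primary" and status == "486":
        # The primary address was busy: try the backup and remember that.
        return _messages(f"CGI-PROXY-REQUEST {BACKUP} SIP/2.0",
                         "CGI-SCRIPT-COOKIE tried-backup SIP/2.0")
    # Assumption of this sketch: with no output, the server handles the
    # message according to its standard SIP rules.
    return ""
```

The process can exit after each call because the server stores the cookie and hands it back on the next execution.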
Not all services require the CGI script to be executed for each message that is received. For example, a call-forward unconditional service only requires special logic to be executed when the INVITE is first received. Responses to the proxied INVITE can be processed based on the rules in the SIP specification. To avoid needlessly executing scripts in these cases, the script can instruct the server not to reexecute it when subsequent messages arrive. This feature (which is similar to event DPs in the IN) is also implemented by means of a SIP CGI metaheader containing a description of the conditions under which the script should be executed in the future, applicable to this call only.
In addition to proxying requests, creating new requests, and generating a response, the script can instruct the server to forward a previously received response upstream toward the caller. Each response that is received is stored in the server until the transaction is complete. The server associates a unique identifier with each response. When the script is executed, the identifier of the response that triggered execution is passed to the script through environment variables. The script can store these identifiers in the script cookie to be recalled at a later time. When the script is reexecuted later (upon the arrival of another response), the script can instruct the server to return a previously received response. The script does this by placing a special message in its output. This message contains the identifier for the response in the Request-URI field of the message.
In addition to controlling which messages get sent and when, a script can control the headers in these messages. By default, a SIP server will fill in all of the headers in proxied requests, forwarded responses, or generated responses, according to the rules in the SIP specification. If this behavior is acceptable, the script need not specify any headers in the messages it outputs. A script can, however, instruct the server to place a specific header in a message, replace a header with a new one, or delete a header from a message. The script also has controls on whether the body of the message should be copied, updated, or removed.
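The header controls can be summarized as a merge of the script's headers over the server's defaults. The sketch below assumes the convention used in the article's example, where a header given with no value means "remove it"; the function name is hypothetical, and the dictionary model is a simplification that ignores repeated headers such as Via and case-insensitive header names.

```python
from typing import Dict


def apply_script_headers(default: Dict[str, str],
                         from_script: Dict[str, str]) -> Dict[str, str]:
    """Merge script-supplied headers into the server's default headers.

    An empty value deletes the header; any other value adds or replaces it.
    Headers the script does not mention keep the server's default value.
    """
    merged = dict(default)
    for name, value in from_script.items():
        if value == "":
            merged.pop(name, None)      # delete, ignoring headers that are absent
        else:
            merged[name] = value        # add or replace
    return merged


if __name__ == "__main__":
    defaults = {"Subject": "Io's orbit", "Contact": "sip:x@example.com", "CSeq": "1 INVITE"}
    print(apply_script_headers(defaults, {"Contact": "", "Subject": "Earth's rotation"}))
```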
Details on the operation of SIP CGI can be found in Lennox, Rosenberg, and Schulzrinne.13
Example SIP CGI Operation
Assume the following request was received, triggering the execution of a SIP CGI script:
INVITE sip:[email protected] SIP/2.0
Via: SIP/2.0/UDP ganymede.university.edu
Subject: Io's orbit
From: sip:[email protected]
To: sip:[email protected]
Call-ID: [email protected]
CSeq: 1 INVITE
Contact: sip:[email protected]
The script outputs the following:
CGI-PROXY-REQUEST sip:b.jacobs@physics.university.edu SIP/2.0
Contact:
Subject: Earth's rotation

SIP/2.0 180 Ringing

CGI-SCRIPT-COOKIE asd-9unas SIP/2.0
The script output contains three short messages, separated by blank lines. The first instructs the server to proxy the received request to b.jacobs@physics.university.edu. The Contact header with no value instructs the server to remove the Contact header from the proxied request. The Subject header instructs the server to replace the Subject header in the proxied request with the one specified. The server will perform these operations and then generate the following proxied request:
INVITE sip:[email protected] SIP/2.0
Via: SIP/2.0/UDP lab2.university.edu
Via: SIP/2.0/UDP ganymede.university.edu
Subject: Earth's rotation
From: sip:[email protected]
To: sip:[email protected]
Call-ID: [email protected]
CSeq: 1 INVITE
The second message in the script output instructs the server to generate a ringing response toward the caller. The server will fill in all of the required header fields and send the following response:
SIP/2.0 180 Ringing
Via: SIP/2.0/UDP ganymede.university.edu
From: sip:[email protected]
To: sip:[email protected]
Call-ID: [email protected]
CSeq: 1 INVITE
The last message in the script output instructs the server to set the script cookie to asd-9unas, a string with meaning only for the script. Next time the script is invoked (when a response to the proxied request arrives), the server passes this cookie back to the script in an environment variable.
The header processing rules provide a script flexibility in choosing its level of control. A simple script can let the server handle all header processing. A more complex script can completely manage the server processing by generating all the headers.
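For concreteness, a script that emits the three-message output of this example might look like the following. It is a sketch only: the `build_output` helper is hypothetical, it ignores the incoming request entirely, and it uses plain newlines for readability, whereas a real SIP CGI implementation may expect different line terminators.

```python
import sys


def build_output(target: str, subject: str, cookie: str) -> str:
    """Build script output: proxy the request, send 180 Ringing, set the cookie."""
    proxy = "\n".join([
        f"CGI-PROXY-REQUEST {target} SIP/2.0",
        "Contact:",                     # no value: remove Contact from the proxied request
        f"Subject: {subject}",          # replace the Subject header
    ])
    ringing = "SIP/2.0 180 Ringing"     # the server fills in the required headers
    set_cookie = f"CGI-SCRIPT-COOKIE {cookie} SIP/2.0"
    # Messages are separated by blank lines.
    return "\n\n".join([proxy, ringing, set_cookie]) + "\n"


if __name__ == "__main__":
    sys.stdout.write(build_output("sip:b.jacobs@physics.university.edu",
                                  "Earth's rotation", "asd-9unas"))
```

This is the "simple script" end of the spectrum: it names only the two headers it wants to change and leaves the rest to the server.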
CALL PROCESSING LANGUAGE
While SIP CGI is an ideal service creation tool for trusted users, it is too flexible for service creation by untrusted users. We have therefore developed a new scripting language, called the Call Processing Language (CPL), which allows untrusted users to define services. Users can upload CPL scripts to network servers. The logic can be read in and verified, and the service instantiated instantly. In this section we overview the requirements for a language that can be used in this fashion, describe its design, and discuss its primitive constructs.
Language Requirements
Because CPL scripts are generated by untrusted parties and run on a service provider's platform, some requirements are imposed on the language.
Verifiability. The service provider must be able to verify automatically that a user-described service is well formed and that its server can successfully execute it. The verification must occur at the time the script is submitted, since delaying it to execution time can keep a user from being able to receive calls. Of course, it is not possible to guarantee successful execution at submission time (unexpected network failures, for example, can cause a service to be unsuccessful), but a server can confirm that it is able and willing to carry out all the parts of the specified service.
Completion. The service provider must also be able to determine, at submission time, that the service specified in the CPL will be completely executed in a finite amount of time. This implies that the language in which services are specified cannot be Turing-complete, and in particular, certain constructs (such as generalized looping or calls to external services without time-outs) cannot be present. If these constructs were present, guaranteeing completion would be an undecidable problem.
Safety of execution. The service description should not be able to represent unsafe actions, such as modifying other users' data or examining arbitrary files on the server. Furthermore, it should not be possible for it to interfere with the operation of the server by using large amounts of CPU time, memory, storage, network bandwidth, or other resources.
Standardized representation. Because customers and service providers may well have software from different vendors, the service descriptions must be compatible between different tools. It is also desirable for the language to be readable and producible by both humans and machines. This facilitates the use of automatic authoring tools, but still allows hand-authoring for advanced users.
Figure 3. Example CPL decision graph. [Node labels: String-switch (field: from); Match: *@example.com; Otherwise; Call; Busy]
Unlike SIP CGI, where the details of messages are exposed to the script, only select information and control are made available to the CPL. To support this, the language itself provides a set number of commands that give it access to information or control of the server. These commands are SIP-independent, since they define services at a sufficiently high level. This means the CPL is portable across different signaling protocols and servers.
Language Design
Based on these requirements, we chose to follow the IN service creation models and design a language that represents services in a decision graph. The individual nodes of the decision graph are the primitives of the language. They are specific decisions to be made or actions to be taken in the course of specifying the service. These decisions and actions are arranged in a directed acyclic graph (DAG), which defines the service. Control begins at a single root node, and each node can have several outputs, depending on the result of the choice or action taken at that node. Node outputs then lead down the tree to further actions or decisions. It is possible for some or all outputs of a node to be left unspecified. The server should then take the normal or default action for the current call state, as it would in the absence of a CPL script. Similarly, many parameters to conditions or actions can be left unspecified, also taking default values. An example graph for a simple caller-based forward or redirect service is shown in Figure 3.
This representation of services as a DAG also implicitly guarantees most of the CPL's safe execution requirements. As the flow of control moves only downward in a decision tree, we can conclude (if the decision tree is well formed) that the service must eventually reach a leaf node and terminate. We can also guarantee that the resources the service can use are finite and proportional to the length of the tree's longest branch in the worst case. This means the safety of a CPL script can be checked by searching for cycles in the graph it represents and computing the tree's maximum depth. To ensure bounds on the running time, each action's required time for execution must be restricted. This means actions that interface with external resources, such as database queries, must have time-outs. We have also removed generalized programming constructs, like looping, recursion, and variables.
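The two checks named here, searching for cycles and computing the maximum depth, can be done in a single depth-first traversal. The sketch below assumes the decision graph has already been extracted into a plain dictionary from node name to successor names; that representation and the function name are ours, not part of the CPL.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]    # node name -> nodes that its outputs lead to


def max_depth(graph: Graph, root: str) -> int:
    """Return the number of nodes on the longest path starting at root.

    Raises ValueError if a cycle is reachable from root, that is, if the
    graph is not a DAG and termination cannot be guaranteed.
    """
    depth: Dict[str, int] = {}  # finished nodes -> longest path starting there
    on_path = set()             # nodes on the current depth-first path

    def visit(node: str) -> int:
        if node in depth:
            return depth[node]              # already checked (converging outputs)
        if node in on_path:
            raise ValueError(f"cycle through {node!r}")
        on_path.add(node)
        longest = 1 + max((visit(n) for n in graph.get(node, [])), default=0)
        on_path.remove(node)
        depth[node] = longest
        return longest

    return visit(root)
```

A server could reject a submitted script when this raises, or when the returned depth exceeds whatever limit it is willing to execute. Memoizing finished nodes keeps the check linear in the size of the graph even when several outputs converge on one node.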
We decided to represent the decision graphs using a scripting language based on Extensible Markup Language (XML).14 XML is similar to HTML; it contains tags that describe the data in the document. We considered using traditional scripting languages, such as Perl,15 Tcl,16 or Python17; portable programming languages, such as Java; and application-specific languages, such as Sieve,18 which is used for e-mail filtering. However, XML had several important features. XML documents are perfect for representing structured data, particularly tree structures with optional links, which is the exact structure needed to represent the DAGs that define call services. In addition, since XML contains no specific keywords, we were able to define a precise set representing control primitives and information accesses. XML was also useful since the syntax and semantics of a script can be verified using XML validation against a document type definition (DTD). XML can be produced and read by both humans and machines, satisfying another design goal of CPL.
XML is also a good choice because it is easily extended. Every tag and attribute has an explicitly specified name; thus, a parser can immediately determine whether it can support all requested features, and decide what to do if it cannot support them. Furthermore, XML has built-in mechanisms for adding new tags and attributes, which can come from namespaces specified in the head of the document.
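As a rough illustration of the feature check described above, a server might compare the tag names used in a script against the set it implements before running anything. The tag names, the supported set, and the function name below are hypothetical; the sketch also ignores namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of primitives this particular server implements.
SUPPORTED_TAGS = {"call", "time-switch", "location", "proxy", "redirect"}


def unsupported_tags(script_text: str) -> set:
    """Return the tag names used in the script that the server does not know."""
    root = ET.fromstring(script_text)
    # iter() walks the root element and every descendant.
    return {element.tag for element in root.iter()} - SUPPORTED_TAGS


# Illustrative script using one primitive ("fax-forward") the server lacks.
script = (
    '<call><location url="sip:alice@example.com">'
    "<proxy/><fax-forward/>"
    "</location></call>"
)
```

An empty result means every requested feature is available; otherwise the server can reject the script or fall back to a default, whichever policy it chooses.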
XML is by no means perfect. It tends to be verbose, requiring relatively long programs for simple services. In addition, since XML is not a programming language, but rather a syntax, inclusion of certain language features, such as variable assignment, is awkward. However, its limited flexibility is more an advantage than a disadvantage in this application.
Mapping the CPL onto XML is straightforward. There is an enclosing XML tag named call that contains an entire CPL script, indicating the point where execution begins. Both nodes and their outputs are represented as XML tags; parameters are represented as XML tag attributes. Node tags typically contain output tags, and vice versa, representing descent down the decision tree. Convergence (where several outputs point to a single node) is represented with links.
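To make the mapping concrete, here is a sketch that parses a small script and prints its structure. Only the enclosing call tag and the rule that parameters are attributes come from the text; every other tag name, attribute name, and URL is invented for illustration, and links for convergence are not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical script: forward calls from one caller to a desk phone,
# and redirect to voice mail if the desk phone is busy.
SCRIPT = """
<call>
  <address-switch field="origin">
    <address is="sip:boss@example.com">
      <location url="sip:me@desk.example.com">
        <proxy timeout="10">
          <busy>
            <location url="sip:voicemail@example.com">
              <redirect/>
            </location>
          </busy>
        </proxy>
      </location>
    </address>
  </address-switch>
</call>
"""


def outline(element, depth=0):
    """Return one indented line per tag, listing its attributes as parameters."""
    params = ", ".join(f"{key}={value}" for key, value in element.attrib.items())
    line = "  " * depth + element.tag + (f" ({params})" if params else "")
    lines = [line]
    for child in element:  # nested tags represent descent down the tree
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(ET.fromstring(SCRIPT.strip()))))
```

Nesting in the XML text corresponds directly to descent in the decision tree, which is why a generic XML parser is enough to recover the service's structure.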
Language Primitives
There are four broad classes of language primitives in the CPL. First, there are switch nodes, which represent decisions a script can make. Second, location nodes indicate where users can be found, either directly or by reference. Third, signaling actions are the core of the language; they control the behavior of the underlying signaling protocol. Finally, nonsignaling actions allow actions unrelated to call signaling to be taken.
Switch nodes. Switch nodes allow the CPL script to make decisions that determine future actions to perform. Two types of decisions exist. The first type depends on the parameters of the original call that triggered the script, such as its sender, its recipient, the types of media involved, the total bandwidth required, and so forth. The other type includes those based on global state independent of the call; the only present example of this is whether the current date or time falls within a given range.
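The two kinds of decision can be sketched as simple functions that return the name of the output to follow. The Call fields, the function names, and the output names ("in-range", "otherwise") are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import time
from typing import Dict


@dataclass
class Call:
    """Minimal stand-in for the parameters of the call that triggered the script."""
    sender: str
    recipient: str


def address_switch(call: Call, cases: Dict[str, str], default: str = "otherwise") -> str:
    """Decision on a call parameter: choose the output registered for the sender."""
    return cases.get(call.sender, default)


def time_switch(now: time, start: time, end: time) -> str:
    """Decision on global state: is the current time within [start, end]?

    This sketch does not handle ranges that wrap around midnight.
    """
    return "in-range" if start <= now <= end else "otherwise"
```

For example, `time_switch(time(10, 30), time(9, 0), time(17, 0))` selects the "in-range" output, which a script could use to ring a desk phone only during office hours.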
Location nodes. Location nodes specify the locations that subsequent signaling actions should contact. Locations can be specified in two ways: directly, as literal URLs, or indirectly. Indirect lookup allows the server to retrieve a list of locations to contact from an external source such as a database server or a SIP registrar associated with the CPL server.
A CPL script always specifies a set of locations as an implicit global variable. Location nodes modify this implicit variable by either adding to the set, or clearing the set and adding new values to it.
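One way to picture this implicit variable is as a small object that location nodes update. The class and method names below are hypothetical, and keeping the URLs in insertion order is a choice made for the sketch rather than something the article specifies.

```python
from typing import Iterable, List


class LocationSet:
    """The implicit set of locations that later signaling actions will contact."""

    def __init__(self) -> None:
        self._urls: List[str] = []

    def apply(self, urls: Iterable[str], clear: bool = False) -> None:
        """Apply a location node: optionally clear the set, then add new URLs."""
        if clear:
            self._urls = []
        for url in urls:
            if url not in self._urls:  # a set holds each location once
                self._urls.append(url)

    def current(self) -> List[str]:
        """Return a copy of the locations currently in the set."""
        return list(self._urls)
```

An indirect location node would call `apply` with whatever list the database or registrar lookup returned, so signaling actions never need to know how the locations were obtained.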
Signaling actions. Signaling actions form the core of the CPL. They control the broad behavior of the underlying signaling protocol. There are three basic signaling actions: proxy, redirect, and response. Proxy is the most powerful; it causes the CPL server to forward the call to the currently specified location set and wait for responses from it. The server automatically picks the best of these responses. If the best response was success (that is, the call was picked up), the script terminates, since call setup is complete. If not, some output of the node, such as busy, noanswer, or failure, is indicated, and the subsequent nodes pointed to by that output are executed.
Redirect and response are both simpler actions. They immediately terminate the execution of the script, since both actions imply that this call server is done handling the call. Redirect sends a redirection request to the current set of locations; response allows the server to send a failure condition or reject the call.
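The control flow of the proxy action can be sketched as follows. The article does not define how the "best" response is chosen, so the preference order here is purely an assumption, as are the function signature and the sequential contact loop (a real server would typically contact locations in parallel and enforce time-outs).

```python
from typing import Callable, List, Optional

# Assumed preference order, best first. The article does not specify one.
PREFERENCE = ["success", "busy", "noanswer", "failure"]


def proxy(locations: List[str], contact: Callable[[str], str]) -> Optional[str]:
    """Forward the call to every location and act on the best response.

    Returns None when the call succeeded (the script terminates), or the
    name of the node output to follow next. `contact` is a hypothetical
    callback returning one of the PREFERENCE strings for a given URL.
    """
    if not locations:
        return "failure"  # nothing to contact
    responses = [contact(url) for url in locations]
    best = min(responses, key=PREFERENCE.index)
    return None if best == "success" else best
```

Redirect and response would need no return value in such an interpreter, since both end the script as soon as they are executed.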
Nonsignaling actions. Nonsignaling actions allow a script to record events or notify a user of them. For instance, a record can be stored in a log at the server, allowing a user to categorize received calls. Alternatively, an action can send e-mail or an instant message to a user when some event occurs; this allows a script to warn users about failure conditions, when a malfunctioning script might prevent them from receiving calls, or it can alert them of incoming calls when they are not in a position to be reached by telephone.
Note that because XML is easily extended, adding new primitives or parameters to existing primitives is simple, and does not harm backward compatibility. Example CPL scripts are given in the IC Online addenda to this issue (see p. 108). Details on the CPL can be found in Lennox and Schulzrinne.19
CONCLUSION
Internet telephony is much more than point-to-point voice transport on the Internet. It has the potential to combine the best of both traditional telephony services and Internet applications. This will enable new classes of services that do not exist in either network today. However, such flexibility introduces new challenges. How will such services be programmed? How can existing tools for programming Internet services be leveraged for Internet telephony?
We have investigated this problem from two perspectives: programming services as a trusted user and as an untrusted user. The requirements for the two are quite different: flexibility is paramount in the former case, and security in the latter. Recognizing this, we developed the SIP common gateway interface for programming services in the former case, and the Call Processing Language for the latter. SIP CGI is based on the successful Web CGI model, and affords the same flexibility as HTTP CGI. CPL is an XML-based language, which can be verified automatically to provide security.
A number of open issues remain. SIP CGI has drawbacks that we are working to resolve. First, the script must run on the same machine as the server. Second, SIP CGI scripts cannot easily provide asynchronous directives to the server. We are looking at modifications to SIP CGI to rectify these problems. CPL is still under definition; choosing the right set of primitives is a complex issue. For both CGI and CPL, feature interaction issues require further study. We have already identified a few, but more investigation is needed. Finally, some means for transporting SIP CGI and CPL must be developed, allowing users to upload these scripts to servers. Issues such as authenticity and privacy are paramount here. An initial proposal is to use SIP REGISTER messages for this purpose.20
SIP CGI has been implemented on two separate SIP servers, and CPL is under development on both.
REFERENCES
1. H. Schulzrinne, Re-engineering the Telephone System, Proc. IEEE Singapore Int'l Conf. Networks, Singapore, Apr. 1997.
2. H. Schulzrinne et al., RTP: A Transport Protocol for Real-Time Applications, RFC 1889, Internet Engineering Task Force, Jan. 1996.
3. B. Braden et al., Resource Reservation Protocol (RSVP) Version 1 Functional Specification, RFC 2205, IETF, Oct. 1997.
4. S. Blake et al., An Architecture for Differentiated Services, RFC 2475, IETF, Dec. 1998.
5. ITU Recommendation H.323, Visual Telephone Systems and Equipment for Local Area Networks which Provide a Non-Guaranteed Quality of Service, Geneva, May 1996.
6. M. Arango et al., Media Gateway Control Protocol (MGCP), Internet draft, IETF, Feb. 1999, work in progress.
7. M. Handley et al., SIP: Session Initiation Protocol, RFC (Proposed Standard) 2543, IETF, Mar. 1999.
8. H. Schulzrinne and J. Rosenberg, A Comparison of SIP and H.323 for Internet Telephony, Proc. NOSSDAV, Cambridge, UK, July 1998.
9. H. Schulzrinne and J. Rosenberg, The Session Initiation Protocol: Providing Advanced Telephony Services across the Internet, Bell Labs Tech. J., Vol. 3, Oct.-Dec. 1998, pp. 144-160.
10. S. Petrack and L. Conroy, The PINT Profile of SIP and SDP: A Protocol for IP Access to Telephone Call Services, Internet draft, IETF, Nov. 1998, work in progress.
11. I. Faynberg et al., The Intelligent Network Standards, McGraw-Hill, New York, 1997.
12. K. Coar and D. Robinson, The WWW Common Gateway Interface Version 1.1, Internet draft, IETF, Dec. 1998, work in progress.
13. J. Lennox, J. Rosenberg, and H. Schulzrinne, Common Gateway Interface for SIP, Internet draft, IETF, Nov. 1998, work in progress.
14. T. Bray, J. Paoli, and C.M. Sperberg-McQueen, Extensible Markup Language (XML) 1.0, W3C REC-xml-19980210, Feb. 1998; http://www.w3.org/TR/REC-xml.
15. L. Wall, T. Christiansen, and R.L. Schwartz, Programming Perl, 2nd edition, O'Reilly, Sebastopol, Calif., 1996.
16. J.K. Ousterhout, Tcl and the Tk Toolkit, Addison-Wesley, Reading, Mass., 1994.
17. M. Lutz, Programming Python, O'Reilly, Sebastopol, Calif., 1996.
18. T. Showalter, Sieve: A Mail Filtering Language, Internet draft, IETF, Mar. 1999, work in progress.
19. J. Lennox and H. Schulzrinne, CPL: A Language for User Control of Internet Telephony Services, Internet draft, IETF, Mar. 1999, work in progress.
20. J. Lennox and H. Schulzrinne, Transporting User Control Information in SIP REGISTER Payloads, Internet draft, IETF, Mar. 1999, work in progress.
Jonathan Rosenberg is a member of technical staff in the High Speed Networks Research Department, Bell Laboratories, Lucent Technologies, in Holmdel, New Jersey. He conducts research on technologies related to multimedia communications on the Internet, including transport and error recovery, signaling, architectures, protocols, and service creation. Rosenberg received BS and MS degrees in electrical engineering from the Massachusetts Institute of Technology, Cambridge, and is a PhD candidate at Columbia University, New York. He is chair of the Internet Engineering Task Force IP Telephony working group.
Jonathan Lennox is a graduate student in computer science at Columbia University. He is researching service creation for Internet telephony.
Henning Schulzrinne is a professor in the Computer Science and Electrical Engineering departments at Columbia University, New York. (See page 43 for a full biography.)
Readers can contact Rosenberg at [email protected]. Readers can contact Lennox and Schulzrinne at {lennox, hgs}@cs.columbia.edu.
72 MAY JUNE 1999 http://computer.org/internet/ IEEE INTERNET COMPUTING
Copyright IEEE Computer Society May 1999