Wednesday, August 20, 2014

An Undecidable Problem in SIP

A few years back, one of my colleague at 8x8 made an interesting suggestion. We were at the time discussing a recurring problem of SIP call loops in the Packet8 service and his suggestion was to write a program that would analyze all the various forwarding rules installed in the system, and simply remove those that were the cause for the loops. I wish I remember what I responded at the time and that I had the insight to say that writing such program is, well, impossible, but that's probably not what happened.

Now "impossible" is a very strong word and I must admit that I have spent most of my career in computers thinking that nothing was impossible to code and that, worst case, I just needed a better computer. It just happens that there is a whole class of problems that are impossible to code - not just difficult to code, or not that the best possible code will take forever to return an answer without using a quantum computer, but that it is impossible to write a program that always return correct answers for these problems. The SIP call loop problem is one of them.

To make sense of this we will need to define a lot of concepts, so lets start with a SIP call. SIP is, for better or for worse, the major session establishment protocol for VoIP. A SIP client (e.g. a phone or a PSTN gateway) establishes a call more or less like how a Web browser contacts a web site, with the difference that in the case of SIP the relationship with the server lasts for the duration of the call. One of the (deeply broken) features of SIP is that there may exist intermediate network elements called SIP Proxies that can, during the establishment of a call, redirect the call to a new destination. In this case the server, instead of answering the call itself, creates a new connection to a different destination which can itself creates a new connection and so on. This mechanism obviously can create loops, especially when the forwarding rules in these servers are under the control of the end-user - which was the problem we encountered at 8x8. There is many mechanisms a SIP proxy can use to decide if, when, and where to redirect a call, but for the sake of simplicity we will consider only a subset of all these possibilities, and assume that all the SIP proxies involved use the Call Processing Language (CPL), a standard XML-based language designed to permit end-users to control, among other things, how a SIP proxy forwards calls on their behalf. Jive's users can think of a CPL script as equivalent to the dialplan editor, but for just one user and with the additional constraint that it is not possible to create loops inside a CPL script.

Here an example of CPL script taken from the standard (RFC 3880) that will forward incoming calls to the user's voicemail if she does not answer the calls on her computer:

<cpl xmlns="urn:ietf:params:xml:ns:cpl" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:ietf:params:xml:ns:cpl cpl.xsd">
  <incoming>
    <location url="sip:jones@jonespc.example.com">
      <proxy>
        <redirection>
          <redirect/>
        </redirection>
        <default>
          <location url="sip:jones@voicemail.example.com">
            <proxy/>
          </location>
        </default>
      </proxy>
    </location>
  </incoming>
</cpl>

One interesting feature of CPL is that it is extensible, so even if only a subset of all the capabilities of a standard compliant SIP proxy can be implemented using baseline CPL, it is possible to add extensions to be able to use any legal feature of the SIP standard. As an example, the following CPL script (also taken from RFC 3880) rejects calls from specific callers by using such an extension:

<cpl xmlns="urn:ietf:params:xml:ns:cpl" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:ietf:params:xml:ns:cpl cpl.xsd">
  <incoming>
    <address-switch field="origin" subfield="user" xmlns:re="http://www.example.com/regex">
      <address re:regex="(.*.smith|.*.jones)">
        <reject status="reject" reason="I don't want to talk to Smiths or Joneses"/>
      </address>
    </address-switch>
  </incoming>
</cpl>

For the rest of this discussion, we will consider that we can only use CPL extensions that are legal in a standard SIP proxy.

Now that we established the perimeter of our problem, we can formally formulate the question as follows: If we consider a VoIP system made only of endpoints (phones) and SIP proxies that run CPL scripts, and with the complete knowledge of all the (legal) CPL extensions in use, is it possible to write a program that takes as input all these scripts and an initial call destination (known as a SIP URI) and that can tell us if the resulting call will loop or not?

That seems simple enough: Try to build an oriented graph of all the forwarding rules, and if the graph is not a directed acyclic graph, then it is looping.

Let's say that my former colleague took up the challenge and wrote this program. Using Java, he would have written something like this:

boolean isLooping(Map<URI, Document> configuration, URI initial) {
  // some clever code here...
  }

The "configuration" parameter carries the whole configuration of our SIP system as a list of mappings between an Address-Of-Record (AOR, the identifier for a user) and the CPL script that is run for this user. The "initial" parameter represents the initial destination for a call (which may contain the AOR of a user). The Java function returns true if a call to this user will loop, or false if the call is guaranteed to reach someone or something, other then the originator of the call, without looping.

Now let's change the subject for a little bit and talk about something called the Cyclic Tag System (CTS). CTS is a mechanism to create new bit strings from an initial bit string and an ordered list of bit strings called productions, using the following rules:

1. Remove the leftmost bit of the current bit string.
2. If the removed bit was equal to 1 then concatenate the next production in the list to the current bit string (restarting at the first production when all are used).
3. Continue at rule (1) unless the current bit string is empty.

The following example from Wikipedia uses an initial bit string of 11001 and a list of productions of { 010, 000, 1111 }. Following the rules above, we generate the following bit strings:

11001
1001010
001010000
01010000
1010000
010000000
10000000
...

Simple enough.

Now, let's say that we want to implement CTS in SIP. The CPL language is not powerful enough for that but we can define a very simple extension (still conforming to the SIP standard) that permits to copy a character substring into a parameter for the next destination of a call, e.g.:

<location m:url="sip:c.org;p={destination.string:2:10}" />

Here we extract the "string" parameter from the original destination and copy ten characters starting in second position into the new destination. With this new extension, we can now write three CPL scripts that will implement the CTS example above:

Script attached to user `sip:p1@jive.com`:

<address-switch field=”destination”>
  <address contains=”;bitstring=1”>
    <location m:url=”sip:p2@jive.com;bitstring={destination.bitstring:1}010” />
  </address>

  <otherwise>
    <address-switch field=”destination”>
      <address contains=”;bitstring=0”>
        <location m:url=”sip:p2@jive.com;bitstring={destination.bitstring:1}” />
      </address>

      <otherwise>
        <reject />
      </otherwise>
    </address-switch>
  </otherwise>
</address-switch>

Script attached to user `sip:p2@jive.com`:

<address-switch field=”destination”>
  <address contains=”;bitstring=1”>
    <location m:url=”sip:p3@jive.com;bitstring={destination.bitstring:1}000” />
  </address>

  <otherwise>
    <address-switch field=”destination”>
      <address contains=”;bitstring=0”>
        <location m:url=”sip:p3@jive.com;bitstring={destination.bitstring:1}” />
      </address>

      <otherwise>
        <reject />
      </otherwise>
    </address-switch>
  </otherwise>
</address-switch>

Script attached to user `sip:p3@jive.com`:

<address-switch field=”destination”>
  <address contains=”;bitstring=1”>
    <location m:url=”sip:p1@jive.com;bitstring={destination.bitstring:1}1111” />
  </address>

  <otherwise>
    <address-switch field=”destination”>
      <address contains=”;bitstring=0”>
        <location m:url=”sip:p1@jive.com;bitstring={destination.bitstring:1}” />
      </address>

      <otherwise>
        <reject />
      </otherwise>
    </address-switch>
  </otherwise>
</address-switch>

On our Java function isLooping(), these three scripts would go on the first parameter (indexed by the address of the proxy) and the initial parameter would contain "`sip:p1@jive.com;bitstring=11001`".

It is trivial to write a program that can generate the CPL scripts needed to implement any list of productions. No need to even write a program, a simple XSLT stylesheet can do the job.

The reason we implemented CTS in SIP is that in 2000, Matthew Cook published a proof that CTS is Turing-Complete. Turing-Complete is a fancy expression that means that it is computationally equivalent to a computer, which basically means that any computation that can be done on a computer can also be done by the CTS system - obviously doing it multiple orders of magnitude slower than a computer, but both a computer (or a quantum computer, or a Turing machine, or lambda calculus, or any other Turing-Complete system) and the CTS system can compute exactly the same things. Because only a Turing-Complete system can simulate another Turing-Complete system, by showing that we can implement any CTS production list in SIP we just proved that SIP is equivalent to a computer. In turn that means that it is possible to convert any existing program unto a list of CPL scripts (with our small extension) and an initial SIP URI. So knowing this and knowing that the Java bytecode is also Turing-Complete (it is trivial to implement CTS in Java), we now can write this new function:

Map<URI, Document> convert(Runnable runnable) {
  // complex, but definitively implementable
  }

The function takes a Java class (i.e. a list of bytecodes) as parameter and returns a list of { SIP URI, CPL scripts } mappings as result. The class to convert implements Runnable so we have a unique entry point in it (i.e. its run() method), an entry point that by convention becomes the SIP URI on the first mapping returned by the function.

Now that we have these two basic functions, let's write a complete Java class that is using them:

class Paradox implements Runnable {
  boolean isLooping(Map<URI, Document> configuration, URI initial) {
    // some clever code here
    }

  Map<URI, Document> convert(Runnable runnable) {
    // complex, but definitively implementable
    }

  public void run() {
    Map<URI, Document> program = convert(this);
    if (!isLooping(program, program.keySet().iterator().next())) {
      while (true);
      }
    }
  }

What we are doing here is converting the whole class into a SIP proxy configuration, and passing it to the code that can predict if this program will loop or not. Then we do something a bit tricky with the result, which is to immediately exit if we detect that it will loop. But because this is exactly the code that was under the scrutiny of the isLooping() implementation, it should have returned false, which should have executed the "while (true);" code. But executing the "while (true);" code means that the code is looping, which again contradicts the result we have.

Both cases are impossible which can have only one explanation: The isLooping() implementation is unable to find if this program loops or not. Thus, we have shown that it is possible to write a Java program that loops but cannot detect that it is looping. And because we previously proved that SIP with CPL is Turing-Complete, we know that if it is possible to write such a program in Java, it is theoretically possible to do so in SIP configuration. Because of this, we now know that it is impossible to write a program that can reliably predict if a SIP configuration loops or not. More precisely, for any implementation of isLooping(), it is always possible to find an instance of the "configuration" and "initial" parameters for which this version of isLooping() will return an incorrect response.

Now that we have the answer to our question, let's have a better look at what really happen in a VoIP system. A SIP call never really loop forever (well, unless the PSTN is involved, but that's a different story), because there is a mechanism to prevent that. Each call contains a counter (Max-Forwards) that is decremented when it traverses a SIP proxy, and the call ends when this counter reaches zero. Max-Forwards is a little like CPU quota. It does not improve the quality of a program; rather it just prevents it from making things worse. Putting aside the fact that we are in the business of establishing communications between people, not finding fancy ways to prevent that, the Turing-Completeness of SIP still gets in the way as, for the same reasons than before, it is also impossible to write a program that will reliably predict if a call will fail because the Max-Forwards value will reach zero.

This is a sad state of affairs that the only way to reliably predict a possible failure is to let this failure happen, but the point of this article is that this is not really the fault of the programmer, but just the consequence of the limitations of computation in this universe.

Note that at least some SIP systems are not necessarily Turing-Complete - here we had to add an extension to our very limited system based on CPL to make it Turing-Complete. But it is far easier to prove that something is Turing-Complete than proving it is not, so even if I could not find a way to prove that a pure CPL system is Turing-Complete, that does not prove it is not. But worse, we saw that there is not much difference between a Turing-Complete system and one that is not, so even a small and seemingly irrelevant modification to a system can make it Turing-Complete. So basically writing a program that tries to reach definitive conclusions on the behavior of a system moderately complex - like SIP - is an exercise in futility.

Many thanks to Matt Ryan for his review of this article.