There’s a raging argument going on in WebSocketLand (i.e., the hybi@ietf.org list), between Shelby Moore and – well – everyone, about layered designs in protocols. I shared my views there, and thought they might interest some of you lot, my imaginary readers, so I’m reposting them here.
To give you some context, channel binding had been raised as an interesting, and positive, example of a layer violation.
On Tue Aug 24 03:32:28 2010, Shelby Moore wrote:
In the ideal way, one would first connect with say TLS, then Upgrade to the Authentication protocol, which would interact with the encrypt layer in a standard API, and then the higher layer would interact with the Authentication layer through another standard API.
FWIW, this is more or less how it happens, just with an ordering change.
One first connects with the application protocol (and there are several we might choose, such as XMPP). Next, a layer insertion happens, inserting TLS. Next, a second layer insertion occurs, inserting SASL. The SASL layer may communicate with the TLS layer for authentication or channel binding – both are abstract concepts, and could be done with an abstract API. Finally the application layer continues.
If you accept the conceit of layer insertion – and why not – then you can use the existing layer model just fine. Adding layer insertion also helps conceptually with compression techniques like XEP-0138 or RFC 4978, which insert a compression layer into the stack.
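For the code-minded, here is a minimal sketch of layer insertion as described above. Everything in it (the `Layer`, `CompressionLayer` and `Stack` names, and the rule that a new layer always goes directly beneath the application) is my own illustration rather than any real library's API. The TLS and SASL layers are placeholders that do no security at all; only the zlib layer actually changes the bytes. Where a compression layer really sits relative to TLS and SASL is a matter for the relevant spec, not this toy.

```python
import zlib


class Layer:
    """One layer in the stack; by default it passes bytes through untouched."""

    def __init__(self, name: str):
        self.name = name

    def outbound(self, data: bytes) -> bytes:
        return data

    def inbound(self, data: bytes) -> bytes:
        return data


class CompressionLayer(Layer):
    """A layer that really transforms the bytes, using zlib."""

    def __init__(self):
        super().__init__("compression")

    def outbound(self, data: bytes) -> bytes:
        return zlib.compress(data)

    def inbound(self, data: bytes) -> bytes:
        return zlib.decompress(data)


class Stack:
    """Layers held top to bottom, beneath a fixed application protocol."""

    def __init__(self, application: str, transport: str):
        self.application = application
        self.layers = [Layer(transport)]

    def insert(self, layer: Layer) -> None:
        # Layer insertion: the new layer goes directly beneath the application.
        self.layers.insert(0, layer)

    def describe(self) -> list:
        return [self.application] + [layer.name for layer in self.layers]

    def send(self, data: bytes) -> bytes:
        for layer in self.layers:            # top to bottom
            data = layer.outbound(data)
        return data

    def receive(self, data: bytes) -> bytes:
        for layer in reversed(self.layers):  # bottom to top
            data = layer.inbound(data)
        return data


stack = Stack("XMPP", "TCP")       # connect with the application protocol first
stack.insert(Layer("TLS"))         # placeholder only, performs no encryption
stack.insert(Layer("SASL"))        # placeholder only, performs no authentication
order_after_sasl = stack.describe()

stack.insert(CompressionLayer())   # a later insertion that does change the bytes
payload = b"<presence/>" * 20
wire = stack.send(payload)
round_trip = stack.receive(wire)
```

The application code calling `send` and `receive` never changes; only the contents of the stack beneath it do, which is the whole point of the conceit.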
However, it’s important to recognise the distinction between a model, which is a method for people to understand and discuss particular areas of the overall functionality, and the reality.
In reality, the authentication provided by TLS, if any, is mediated by the application – which controls authorization, and therefore needs to ascertain whether the TLS credentials (typically an X.509 certificate) can be used to authorize the session. Similarly, SASL communicates back to the application protocol to translate from a username to whatever the application deals in (which can be similar or wildly different – for XMPP, a JID; for LDAP, a DN). So as long as you don’t look too closely, the layer model holds, but in many areas the boundaries are blurred.
There are two important things to remember, though:
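To make the blurring concrete, here is a rough sketch of the callback a generic SASL layer might make into the application to turn a username into an application identity. The function names, the `uid=` attribute and the example domain and base DN are all hypothetical; real deployments also involve string preparation, escaping and usually a directory lookup, none of which is attempted here.

```python
from typing import Callable, Optional, Set

# The application hands the generic layer a mapper; the layer itself knows
# nothing about JIDs or DNs. (Hypothetical interface, for illustration only.)
IdentityMapper = Callable[[str], str]


def xmpp_mapper(domain: str) -> IdentityMapper:
    """Map a SASL username to a bare JID, assuming a simple user@domain form."""
    def mapper(username: str) -> str:
        return f"{username}@{domain}"
    return mapper


def ldap_mapper(base_dn: str) -> IdentityMapper:
    """Map a SASL username to a DN, assuming a uid= naming attribute."""
    def mapper(username: str) -> str:
        return f"uid={username},{base_dn}"
    return mapper


def authorize(username: str, mapper: IdentityMapper, allowed: Set[str]) -> Optional[str]:
    """Authorization stays with the application: map first, then decide."""
    identity = mapper(username)
    return identity if identity in allowed else None


jid = authorize("alice", xmpp_mapper("example.org"), {"alice@example.org"})
dn = authorize("alice", ldap_mapper("dc=example,dc=org"), {"uid=alice,dc=example,dc=org"})
denied = authorize("mallory", xmpp_mapper("example.org"), {"alice@example.org"})
```

The same username comes out as two quite different identities, and in both cases it is the application, not the SASL layer, that decides whether the session is authorized.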
- In abstract terms, the layers exist, and are reusable. Thus generic TLS and SASL libraries can, and do, exist, and applications can use them.
- But the layer model is just a model; it is there for humans, and doesn’t produce any inherently better overall solution. By limiting the amount of knowledge required at each layer, though, it allows us to work around human limitations and achieve good solutions by combining specialized expertise. So I get to treat TLS as a “magically secure” thing for exchanging certificates and “doing security”, and TCP is “just a pipe”.
If anything, it’s the human expertise that really follows the layer model – as an application protocol guy, every time I need to know something new about TLS or TCP in order to properly code a client or server, that’s the kind of layer violation that I really care about.
Of course, the worst layer violation is the one where we force the user to have a significant understanding of some portion of the stack. (We do this far too often, by trying to explain X.509 strong authentication to users in Firefox, and by trying to explain the intricacies of unauthenticated addressing in internet mail.)