WebRTC: Not Quite Magic
In the past year I have been hearing more and more people talk about WebRTC and its plugin-less and auto-magical nature. At hackathons, kids pump out amazing projects that allow for rich server-less communication between browsers, polished in just a few hours. Scores of people talk about how you can get a WebRTC-based application up and running in 5 minutes flat.
All of this is true when you are developing an application for communicating with people in the same room.
About a month ago I decided it was time to dive into the awesome and easy world of WebRTC. A friend of mine had an idea for an application that involved audio communication between users. Of course I told him I could make it in 5 minutes and only a few dozen lines of code, basing my presumptuous statement on all the buzz I had been hearing at hackathons and around the internet. An hour later we were video chatting. Great! Those people were obviously exaggerating with the whole 5 minutes thing, but this was still really fast and really easy.
I moved on, expanding the application to allow for video chat between multiple people, basically anyone who visited a unique URL. This took a day or two to put together. I showed my friend the application and we were soon chatting with seven other people. I was so impressed with WebRTC and whatever framework I had been using at that point.
This excitement turned to confusion when the eighth friend tried connecting. He messaged me saying he couldn’t see me. On my end too, all I saw was the inactive peer-view populated by my web server that kept track of users. I went on to test the application between more people and connection issues came up between some more pairs of peers.
By this point I had a basic understanding of how WebRTC tries to establish a connection between two peers. Using a signaling server or another intermediary, a peer exchanges session descriptions with a second peer. Using this information, the peers attempt to connect directly. When a peer is behind a NAT or firewall, it needs some external help. That help comes in the form of a STUN server. A STUN server’s job is to enable NAT traversal (UDP hole punching) by making a peer aware of its public IP address and port.
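To make the signaling step concrete, here is a minimal sketch of what that intermediary does: it forwards offers, answers, and ICE candidates between peers and never touches the media itself. Everything in it is an assumption for illustration (the `SignalingRoom` class, the message shapes, the peer names); a real deployment would carry the same messages over something like a WebSocket, and the SDP strings would come from the browser rather than being placeholders.

```typescript
// Hypothetical message shapes; the field names are assumptions, not a standard.
type SignalMessage =
  | { kind: "offer"; sdp: string }
  | { kind: "answer"; sdp: string }
  | { kind: "candidate"; candidate: string };

type SignalHandler = (from: string, message: SignalMessage) => void;

// In-memory stand-in for a signaling server. It only forwards messages
// between peers that joined the same room.
class SignalingRoom {
  private handlers = new Map<string, SignalHandler>();

  join(peerId: string, onMessage: SignalHandler): void {
    this.handlers.set(peerId, onMessage);
  }

  // Returns false when the target peer is not in the room.
  send(from: string, to: string, message: SignalMessage): boolean {
    const handler = this.handlers.get(to);
    if (handler === undefined) return false;
    handler(from, message);
    return true;
  }
}

// Usage: alice sends an offer, bob replies with an answer.
const signalingRoom = new SignalingRoom();
const received: string[] = [];

signalingRoom.join("alice", (from, msg) => {
  received.push(`alice got ${msg.kind} from ${from}`);
});
signalingRoom.join("bob", (from, msg) => {
  received.push(`bob got ${msg.kind} from ${from}`);
  if (msg.kind === "offer") {
    signalingRoom.send("bob", from, { kind: "answer", sdp: "placeholder answer SDP" });
  }
});

signalingRoom.send("alice", "bob", { kind: "offer", sdp: "placeholder offer SDP" });
```

Only after this exchange do the peers try to reach each other directly, which is where STUN (and the trouble described next) comes in.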
I had a functional STUN server set up.
It turns out STUN is not a fix-all. Depending on the network topology and security between two peers, a direct peer-to-peer connection may just be out of the question. STUN only connected about 70% of peers in my tests. In the unsuccessful cases, a server is necessary to relay data. The typical solution is the TURN protocol (Traversal Using Relays around NAT, creative name eh?).
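One way to see which path a given pair ended up on is to look at the `typ` field of the ICE candidates that were selected: `host` is a local address, `srflx` is an address learned from STUN, and `relay` is an address allocated on a TURN server. The sketch below is a small, hedged helper for that; the function names and the sample candidate lines (which use documentation-only IP addresses) are made up for illustration.

```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";

// Extracts the "typ" field from an ICE candidate line, or null if absent.
function candidateType(candidateLine: string): CandidateType | null {
  const match = /\btyp (host|srflx|prflx|relay)\b/.exec(candidateLine);
  return match ? (match[1] as CandidateType) : null;
}

// A pair is relayed if either selected candidate is a TURN relay candidate.
function isRelayed(localLine: string, remoteLine: string): boolean {
  return candidateType(localLine) === "relay" || candidateType(remoteLine) === "relay";
}

// Sample lines in the usual candidate format; addresses are placeholders.
const viaStun =
  "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.5 rport 46154";
const viaTurn =
  "candidate:3098175849 1 udp 33562367 198.51.100.9 50231 typ relay raddr 203.0.113.7 rport 46154";

console.log(candidateType(viaStun), candidateType(viaTurn), isRelayed(viaStun, viaTurn));
```

Logging this per pair makes a figure like "STUN connected about 70%" something you can measure rather than guess at.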
At this point in my adventure I started realizing that getting a reliable and consistent application deployed would take much more infrastructure than a simple web server for loading up some WebRTC code on each client. The idea of setting up some TURN servers to stream between peers in a peer-to-peer application seemed contradictory, but knowing that it would let me achieve a 100% connection rate, I went ahead and researched TURN servers for a few days. It turned out TURN servers are not very easy to set up or deal with or secure or anything.
With my web server, signaling server, STUN servers, and now TURN streaming servers all set up, I was ready to get people video chatting. I gathered the group of people that had been testing out my application and made a chat room…
No luck. Let’s just say TURN servers do not let you achieve a 100% connection rate - or anything near it. Of the pairs of peers that could not connect using STUN, TURN only connected a handful. Even with TURN servers I was getting around a 95% connection rate. This seemed outrageous since TURN servers were advertised as the solution for “that small percentage of your users stuck inside an oppressive corporate network”. None of the peers testing my application were behind corporate firewalls* and still TURN, the all-powerful firewall-bypassing solution, was not connecting everyone.
I decided that I must have configured my TURN servers incorrectly. To check whether these issues were simply due to my own misconfiguration, I took the same group of people and checked if the problematic pairs could connect using a handful of common WebRTC video chatting applications that certainly had TURN servers enabled, including the demo from the 5 minute people. The same exact pairs could not connect using any other application I could find… except for one (but that’s for later).
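Another way to check a TURN setup in isolation is to tell the browser to use relay candidates only, via the standard `iceTransportPolicy: "relay"` option of the `RTCPeerConnection` configuration. If two peers on an ordinary network cannot connect in that mode, the TURN server or its credentials are the likely suspects. The sketch below just builds such a configuration; the helper name, server URLs, and credentials are placeholders, and the interfaces are a local subset of the browser types so the code also type-checks outside a browser.

```typescript
// Local structural subset of the browser's RTCConfiguration.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface IceConfig {
  iceServers: IceServer[];
  iceTransportPolicy: "all" | "relay";
}

// Hypothetical settings object for one TURN server.
interface TurnSettings {
  url: string;
  username: string;
  credential: string;
}

function buildIceConfig(stunUrl: string, turn: TurnSettings, relayOnly: boolean): IceConfig {
  return {
    iceServers: [
      { urls: stunUrl },
      { urls: turn.url, username: turn.username, credential: turn.credential },
    ],
    // "relay" makes the browser ignore host and STUN-derived candidates.
    iceTransportPolicy: relayOnly ? "relay" : "all",
  };
}

// Placeholder hosts and credentials; substitute your own servers.
const turnCheckConfig = buildIceConfig(
  "stun:stun.example.org:3478",
  { url: "turn:turn.example.org:3478", username: "demo-user", credential: "demo-secret" },
  true,
);

// In a browser this would be passed as: new RTCPeerConnection(turnCheckConfig)
console.log(JSON.stringify(turnCheckConfig, null, 2));
```

Switching `relayOnly` back to `false` restores the normal behaviour, where TURN is only a fallback.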
I had been putting this application together and researching WebRTC-related technologies for about a week at this point, and it was becoming apparent that I would not be able to use WebRTC to create a reliable video chatting tool without all the required fallback servers plus a solution for connecting the remaining 5% of peers that TURN could not handle. I felt frustrated. But the frustration was caused by false expectations accidentally formed out of all the surface-level information, naive frameworks, and hype readily available around the internet. All of these information sources conveniently or unknowingly avoided the real challenges presented when using WebRTC.
WebRTC is not broken. The problem is that too many people, including myself, have deemed it a solution with no strings attached. This audience has advertised how it is “so easy” and “just works”. All the buzz has made it seem like a few lines of code can be an outright solution when really a robust infrastructure is required.
WebRTC In Production
As mentioned above, while testing other applications to see if they mirrored the same connection issues, I came across one WebRTC video chatting application that had zero connection problems: vLine. After doing a little research I quickly figured out that the application was a proof of concept for a PaaS solution aimed at closing the performance gap reflected in my findings. vLine makes WebRTC a viable technology by adding a TCP tunneling fallback layer in addition to the typical STUN and TURN network. The TCP tunneling network allows for a virtually 100% connection rate since it simulates UDP streaming through the HTTPS port.
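This post does not describe how vLine's tunneling layer is built, so the following is not their implementation. The closest standard option is to offer TURN over TCP and over TLS on port 443 alongside plain UDP, using the `turn:` and `turns:` URL schemes with a `transport` parameter. The sketch below builds such a list; the helper name, the host, and the assumption that your TURN server actually listens on these ports are all hypothetical.

```typescript
// Builds TURN URLs for one host: plain UDP, TCP on the usual port, and
// TLS over TCP on port 443 for networks that only allow HTTPS-like traffic.
function turnUrlsWithTcpFallback(host: string): string[] {
  return [
    `turn:${host}:3478?transport=udp`,
    `turn:${host}:3478?transport=tcp`,
    `turns:${host}:443?transport=tcp`,
  ];
}

// Placeholder host; the server must be configured to listen on these ports.
const fallbackUrls = turnUrlsWithTcpFallback("turn.example.org");

// In a browser, the list would go into a single iceServers entry:
//   { urls: fallbackUrls, username: "...", credential: "..." }
console.log(fallbackUrls.join("\n"));
```

Relaying over TCP costs latency compared to UDP, so it makes sense only as a last resort for pairs that cannot connect any other way.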
vLine even has a few blog posts similar to this one describing the same steps and realizations that I had gone through. It felt like I had wasted a lot of time and should have discovered their product or found their blog posts earlier, but I guess I have shown myself in detail the current state of WebRTC and why I need vLine’s product to run a modern WebRTC-based application.
STUN and TURN are not enough.
Please let me know what your experiences have been!
If anything in this article looks incorrect, please let me know via Twitter.
*The people testing my application were located all around the world in different countries, which may have had some effect on the lower-than-average TURN connection rate.