Snowflake Technical Overview – Latest Draft – August 2016

Snowflake is a new WebRTC Pluggable Transport. This document provides a technical overview of Snowflake in terms of the system’s components, interactions, and code. The intent is to introduce Snowflake to the moderately technical reader, and those interested in contributing to this project and Internet Freedom in general. Specifically, this document will discuss Snowflake’s use of WebRTC, its approach to Rendezvous using Domain Fronting, its method in traversing NAT using ICE negotiation, and a number of additional considerations, without assuming significant prior knowledge in these topics. Snowflake and this live document are a work in progress. As everything in the Snowflake project develops further, this document will be updated, and additional documents will be made available to discuss metrics, further topics, and other relevant results given future work on this project.

1 Introduction
2 Overview
3 Snowflake Circumvention Process
4 Contribution
- 4.1 Source Code
- 4.2 Building with Tor Browser Bundle
History
Contact

1 Introduction

Snowflake is a new circumvention tool which provides access to the free and open internet. As a Pluggable Transport, it provides easy-to-use access to a censorship circumvention system such as Tor. It is inspired by and builds upon the previous work of Flashproxy. Snowflake is much like a hybrid of previous Pluggable Transports, and this document will serve as a guide for exploring this system.

To illustrate in the context of Tor, Snowflake allows anyone to leave a browser tab open to become an ephemeral Tor bridge. Much like the Flashproxy design, Snowflake involves a large network of highly ephemeral volunteer proxies, with the goal of outpacing the censor’s ability to block proxy IP addresses and providing a very easy to use, reliable, and hard-to-filter method of circumventing censorship. Previously, users faced difficulties in manually configuring port-forwarding, which limited adoption of older tools like Flashproxy. Snowflake addresses NAT traversal by making it automatic and not the user’s responsibility, among a number of new advantages.

Usability and reliability are important to the Snowflake system, both in simplifying the process for people in censored regions to connect and in allowing volunteers to very easily help others connect. This allows the circumvention network to scale up more easily in both the number of volunteers and the number of clients. In this way, the Snowflake system grows ever stronger in circumvention capacity, bandwidth, and resiliency as its volunteer network grows.

2 Overview

Using Tor as the example use-case, the sequence of interactions involved in a snowflake session may be as follows:

1. A user in the filtered region wishes to access the free and open internet. They open Tor Browser, selecting snowflake as the Pluggable Transport. This starts the snowflake client.
2. Volunteers outside the filtered region browse websites which host the snowflake proxy code. These volunteers’ browsers then become temporary proxies available to serve some snowflake client.
3. The filtered user’s snowflake client automatically finds some of these remote in-browser volunteer proxies using a secure rendezvous strategy, which also traverses NAT automatically.
4. These two snowflake peers establish a peer-to-peer connection over WebRTC.
5. Once WebRTC is ready, the snowflake client exposes the WebRTC transport for Tor’s use.
6. Meanwhile, the volunteer’s snowflake proxy connects to a destination Tor relay and begins passing traffic between the snowflake client and the Tor relay.
7. Tor builds a circuit and the user can now circumvent.

To further clarify: It is not the website hosting snowflake which acts as the snowflake proxy. Rather, it is the visitor to the website – their browser tab becomes the volunteer proxy.

Snowflake involves three components which allow this process to occur:

The snowflake client, which is a client transport plugin conforming to the Pluggable Transport specification (ptspec). Tor utilizes this just like any other Pluggable Transport. Any other ptspec-aware system may as well. This component is written in Golang.
The snowflake proxy, which is a miniature in-browser WebRTC proxy. It conveys data between snowflake clients and some destination — for Tor, this would be a Tor Relay. This component is written in CoffeeScript.
The broker, responsible for Rendezvous. It is similar to the “Facilitator” from Flashproxy, but exclusively uses Domain Fronting for now. This component is written in Golang.

The snowflake client and snowflake proxy may also be referred to as snowflake peers.

In Snowflake, WebRTC occurs only between the snowflake peers: some snowflake client and some snowflake in-browser proxy, as WebRTC serves as the transport crossing the filter boundary. Communication from the proxy to the destination is currently via websocket. Communication to the Broker is over HTTPS / Domain Fronting.

Here is a diagram to further illustrate the snowflake circumvention process:

Many more details are involved in these processes and components, which shall be explored in greater depth below:

3 Snowflake Circumvention Process

3.1 Pluggable Transport Client Behavior

Snowflake is a Pluggable Transport, conforming to the Pluggable Transport specification.

Specifically, Snowflake contains the client transport plugin which provides a localhost SOCKS server as the interface between the client application and the transport. In the Snowflake plus Tor Browser context, this snowflake client transport plugin creates a localhost SOCKS server which the client application, Tor Browser, sets as its proxy setting.

The snowflake client is also responsible for ensuring connections to remote snowflake proxies are available, so that the SOCKS server may handle requests from the Tor browser by passing traffic to a snowflake proxy. It is assumed that these remote peers pass the traffic on to a Tor Relay, allowing the whole system to satisfy expected behavior as a “WebRTC Transport” for Tor.

However, before the snowflake client can utilize the transport, the local snowflake client and remote peer must first establish connectivity using WebRTC.
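
The client-side behavior described in this section can be sketched roughly as follows. This is a minimal illustration and not the actual snowflake client (which is written in Golang): it opens a localhost listener and produces the status lines a managed Pluggable Transport client reports to its parent process. The function names are hypothetical, and the choice of "socks5" is an assumption made for the example.

```python
import socket

TRANSPORT = "snowflake"


def open_socks_listener(host="127.0.0.1"):
    """Open a localhost TCP listener; port 0 lets the OS pick a free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))
    listener.listen()
    return listener


def handshake_lines(env, host, port):
    """Build the lines a managed PT client writes to stdout for its parent.

    env is a mapping such as os.environ. The SOCKS version announced here
    (socks5) is an assumption for this sketch.
    """
    versions = env.get("TOR_PT_MANAGED_TRANSPORT_VER", "").split(",")
    if "1" not in versions:
        return ["VERSION-ERROR no-version"]
    lines = ["VERSION 1"]
    wanted = env.get("TOR_PT_CLIENT_TRANSPORTS", "").split(",")
    if TRANSPORT in wanted:
        lines.append(f"CMETHOD {TRANSPORT} socks5 {host}:{port}")
    lines.append("CMETHODS DONE")
    return lines
```

In use, the client would call open_socks_listener(), pass the address from listener.getsockname() to handshake_lines(os.environ, host, port), and print each line. The client application then points its proxy setting at the announced address.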

3.2 WebRTC Connection Establishment

WebRTC is a fairly recent standard providing robust peer-to-peer Real-Time Communication, involving streaming video, audio, and arbitrary binary data. For the purposes of current Snowflake, only binary data channels are utilized via WebRTC DataChannels; media channels are not. WebRTC DataChannels utilize SCTP and DTLS to provide a fairly reliable, encrypted transport. Of course, there are many other aspects to consider here, including but not limited to fingerprinting. In the future, it is possible that WebRTC’s RTP Media Channels could be useful as alternative transports.

Originally, WebRTC was only available either through the JavaScript APIs in modern versions of Chrome and Firefox, or via the native code C++ library. For the purposes of developing Snowflake, it was necessary to create a Golang library which adapts the C++ WebRTC library using cgo.

3.2.1 Session Descriptions

Programs in the web browser cannot passively listen for incoming connections; they must initiate the outgoing connection. Since both the snowflake client and proxy are WebRTC peers subject to this particular constraint, prior to being connected via WebRTC, there must be some way for these peers to send signals and discover each other in order to bootstrap their WebRTC PeerConnection. This signalling over some bi-directional communication channel is required for every WebRTC scenario, though not designed into WebRTC itself. Rendezvous is not part of WebRTC’s scope; all consumers of the WebRTC API are expected to handle Rendezvous for their own use case.

For Snowflake, this process begins when a snowflake client creates a new WebRTC PeerConnection that is not yet connected to a remote peer. This PeerConnection then creates a single unique DataChannel, which triggers a series of events that prepare a local SDP (Session Description Protocol) Offer. With the Offer ready, the rendezvous process, described in more detail below, can begin. The SDP Offer primarily describes the peer and its capabilities, along with instructions for some remote peer on how to potentially reach the client over the network.

Through the Rendezvous process, the snowflake client’s SDP Offer reaches a snowflake proxy, which generates an SDP Answer in response. The Answer contains information similar to the SDP Offer, but describes the proxy instead. When Rendezvous succeeds, the snowflake client receives this SDP Answer, and both WebRTC endpoints now have each other’s SDP messages. At this point, the snowflake peers can attempt to establish a direct connection.

3.2.2 Completing the Circuit

Once the direct connection attempt succeeds, the WebRTC PeerConnection and its DataChannel open on both peers and become available for streaming bytes. The snowflake client then wires its DataChannel to the localhost SOCKS proxy mentioned earlier to expose the transport to the client application. The snowflake proxy establishes a simple websocket connection to a Tor relay. At this point, the WebRTC transport is ready to carry arbitrary user traffic; in the Tor case, Tor uses it to build a circuit.

However, in order to successfully establish a WebRTC PeerConnection, the SDP messages described above must be transmitted correctly and securely between peers. There are many other considerations here, as the adversary could interfere with various rendezvous strategies.

3.3 Rendezvous

Rendezvous is essentially the process of clients and proxies finding each other. To connect two snowflake peers, it is necessary to exchange signalling messages consisting of the SDP Offers and Answers described above. This allows the peers to know where to begin negotiating a P2P connection.

3.3.1 The Broker

In Snowflake, Rendezvous is managed by the Broker, which is a server running on a third party web service. The Broker is responsible for securely matching snowflake clients with snowflake proxies by exchanging SDP Offers and Answers between them while maintaining book-keeping of the snowflake peers. An arbitrarily high number of proxies and clients may be engaging in this process with the Broker concurrently, which requires additional scrutiny in order to keep the Broker scalable, robust, and resilient to DDoS while remaining secure.

An individual rendezvous process consists of a series of interleaved HTTP requests.

1. A fresh snowflake proxy sends a POST request to the Broker as a long-poll, indicating it’s looking for a client to serve.
2. A snowflake client sends a POST request to the Broker containing its SDP Offer. The Broker holds that request open and forwards the SDP Offer as the response to one of the pending proxy polls from step 1.
3. The snowflake proxy receives the SDP Offer and composes an SDP Answer. It then sends another POST request to the Broker, containing the Answer as a reply intended for the client which sent the Offer.
4. The Broker forwards the SDP Answer to the original snowflake client as the response to the POST request held open in step 2.
5. Both the snowflake proxy and snowflake client now have each other’s SDP messages, which is sufficient to begin establishing a direct WebRTC PeerConnection. If any step of this process takes too long, the requests safely time out, and the snowflake peers retry.
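
The matching logic in the steps above can be modelled in memory, leaving out HTTP entirely. The following is only a sketch under stated assumptions: the class and method names are hypothetical, blocking queue reads stand in for held-open POST requests, and it is not the real Broker (which is written in Golang). A real broker would also discard proxy polls that have already timed out, which this sketch does not do.

```python
import queue
import threading


class Session:
    """One pending match: carries an offer to the proxy and an answer back."""

    def __init__(self):
        self.offer = queue.Queue(maxsize=1)
        self.answer = queue.Queue(maxsize=1)


class Broker:
    def __init__(self, timeout=2.0):
        self.timeout = timeout
        self.idle_proxies = queue.Queue()  # proxies waiting for a client

    def proxy_poll(self):
        """Step 1: a proxy long-polls until an offer arrives or it times out."""
        session = Session()
        self.idle_proxies.put(session)
        try:
            return session, session.offer.get(timeout=self.timeout)
        except queue.Empty:
            return None, None

    def client_offer(self, offer):
        """Steps 2 and 4: hand the offer to a waiting proxy, wait for its answer."""
        try:
            session = self.idle_proxies.get(timeout=self.timeout)
            session.offer.put(offer)
            return session.answer.get(timeout=self.timeout)
        except queue.Empty:
            return None  # timed out; the client is expected to retry

    def proxy_answer(self, session, answer):
        """Step 3: the proxy posts its answer for the matched client."""
        session.answer.put(answer)


def demo():
    broker = Broker()

    def proxy():
        session, offer = broker.proxy_poll()
        if session is not None:
            broker.proxy_answer(session, "answer-to:" + offer)

    worker = threading.Thread(target=proxy)
    worker.start()
    answer = broker.client_offer("offer-from-client")
    worker.join()
    return answer
```

Calling demo() runs one proxy and one client against the same Broker object; the strings stand in for real SDP messages.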

Furthermore, this exchange of signalling messages assumes a pathway that is also highly resistant to filters. In particular, direct connections to the Broker from the client in the filtered region are assumed to be blocked by the adversary, without impeding the functionality of the broker. This is possible due to Domain Fronting.

3.3.2 Domain Fronting

Much like Meek, another Pluggable Transport, Snowflake uses Domain Fronting. Domain Fronting is a collateral-freedom based method of circumvention. It takes advantage of HTTPS and the behavior of large third party web services. Large internet companies such as Google, Amazon, and Microsoft offer web services using CDNs (Content Delivery Networks) tailored to their needs. These CDNs serve not only their own web services, but also services that users may host on their platforms, such as App Engine. Snowflake currently hosts the Broker on App Engine, but will also do so on other services.

To illustrate using App Engine, let’s say a Snowflake Broker is located at snowflake-123.appspot.com, and let’s assume the censor already blocks direct connections to it. When a snowflake client wishes to communicate with this Broker, it opens a TLS connection not to snowflake-123.appspot.com, but to a valid root domain instead, google.com. The App Engine instance is named only in the Host header of the HTTP request, so the HTTP request looks like this:

GET / HTTP/1.1
host: snowflake-123.appspot.com

When Google’s serving infrastructure receives this request, it recognizes that it can serve the desired App Engine instance. (Typically, if the host header contains some arbitrary address that is not available through this domain, it would return some sort of error like 403 Forbidden). Amazon and Microsoft can do something similar with their respective services as well. Since the HTTP request is sent over TLS, the censor cannot see the host header, so the request looks like an innocuous request just to google.com. This implies the censor cannot block the broker without blocking all of Google, or all of Amazon, hence collateral freedom.
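
To make the split between what the censor sees and what it does not more concrete, here is a small sketch that keeps the two names apart. The helper names are hypothetical, and this is not how the snowflake client is implemented. The send function uses only Python standard library calls; whether a given provider honours a fronted request is outside the scope of this example.

```python
import socket
import ssl
from dataclasses import dataclass


@dataclass
class FrontedRequest:
    connect_host: str    # used for DNS and TLS SNI: visible to the censor
    http_request: bytes  # sent inside TLS: hidden from the censor


def build_fronted_request(front_domain, hidden_host, path="/"):
    """Pair an innocuous front domain with a Host header naming the real service."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {hidden_host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return FrontedRequest(connect_host=front_domain, http_request=request)


def send(req, port=443):
    """Open TLS to the front domain, then send the hidden request inside it."""
    context = ssl.create_default_context()
    with socket.create_connection((req.connect_host, port)) as raw:
        with context.wrap_socket(raw, server_hostname=req.connect_host) as tls:
            tls.sendall(req.http_request)
            return tls.recv(4096)
```

With the example names from the text, build_fronted_request("google.com", "snowflake-123.appspot.com") yields a request whose only reference to the Broker sits inside the encrypted payload.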

Refer here for a more comprehensive view of Domain Fronting.

Furthermore, since Snowflake uses Domain Fronting only for Rendezvous, rather than for the transport itself as Meek does, resource utilization is far lower: brief signalling messages to the Broker consist of far fewer bytes than the entirety of user traffic. This greatly decreases third-party costs and CDN fees, which affords the ability to scale the tool up to support far more users. This is one of the two main new benefits Snowflake provides over older circumvention tools.

3.4 NAT Traversal

The second main advantage of Snowflake is its approach to NAT traversal. One of the assumptions in Snowflake is that both the client and proxy are behind NAT (Network Address Translation), and that traversing it is Snowflake’s responsibility rather than the user’s.

Most devices sit behind a router which implements NAT, which is widely deployed around the world. Though NAT provides various benefits, including overcoming the address space limitation of IPv4, it also introduces a barrier to establishing peer-to-peer connections, making it more difficult to determine a direct pathway between peers.

Snowflake discovers this pathway without requiring the user to manually configure port forwarding. Since NAT traversal is now automatic and no longer any user’s responsibility, this addresses the usability issues which limited adoption of previous circumvention tools.

Automatic NAT traversal in Snowflake is possible due to ICE negotiation.

3.4.1 ICE Negotiation

Snowflake approaches NAT traversal using WebRTC’s ICE negotiation (Interactive Connectivity Establishment).

When a peer engages in ICE, it first gathers ICE candidates through a series of fallbacks. Each ICE candidate is a local or translated public address (IP and port) which could potentially allow other devices to reach it either directly, through UDP hole punching via STUN, or via a TURN relay as a last resort. ICE then expects these ICE candidates to reach the remote peer somehow. In WebRTC, these ICE candidates are a component of the SDP messages, whose delivery is handled by the Rendezvous process via the Broker as mentioned above. Once both peers have each other’s ICE candidates, the peers try each ICE candidate until they are able to establish a P2P connection.
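
As an illustration of what these candidates look like inside an SDP message, the sketch below parses "a=candidate:" lines and orders them by priority. The sample addresses are made up (taken from documentation address ranges), and the function names are hypothetical. The priority formula and type preferences follow the recommended values from the ICE specification. Real ICE checks pairs of local and remote candidates rather than single candidates, so this shows only the ordering idea.

```python
# Recommended type preferences: direct host addresses first, relays last.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type, local_pref=65535, component=1):
    """priority = 2^24 * type preference + 2^8 * local preference + (256 - component)."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component)


def parse_candidates(sdp):
    """Extract ICE candidates from SDP text, highest priority first."""
    found = []
    for line in sdp.splitlines():
        if not line.startswith("a=candidate:"):
            continue
        # Fields: foundation component transport priority address port "typ" type ...
        fields = line[len("a=candidate:"):].split()
        found.append({
            "transport": fields[2].lower(),
            "priority": int(fields[3]),
            "address": fields[4],
            "port": int(fields[5]),
            "type": fields[7],
        })
    return sorted(found, key=lambda c: c["priority"], reverse=True)


SAMPLE_SDP = "\n".join([
    "v=0",
    "a=candidate:3 1 udp 16777215 203.0.113.9 3478 typ relay raddr 198.51.100.7 rport 40001",
    "a=candidate:1 1 udp 2130706431 10.0.0.5 50000 typ host",
    "a=candidate:2 1 udp 1694498815 198.51.100.7 40001 typ srflx raddr 10.0.0.5 rport 50000",
])
```

Here "host" is a local address, "srflx" is the translated public address learned via STUN, and "relay" is an address on a TURN server, matching the fallback order described above.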

3.4.2 Caveats for STUN and TURN

The ICE negotiation process has further implications for the availability of STUN and TURN servers on the public internet, as the circumvention process now depends on the availability of at least one of these servers. STUN servers are fairly inexpensive to run, so many are available on the public internet. TURN servers are rarer, and are required for the roughly 10% of peer combinations where STUN does not work, specifically in symmetric NAT cases which prevent normal UDP hole punching.

Right now, Snowflake is only configured to utilize STUN by default. Including TURN servers in the configuration is trivial when available.
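
For illustration, a STUN-only configuration and one extended with TURN might be assembled as below. The dictionary shape follows the standard WebRTC configuration object (an "iceServers" list whose entries carry "urls", plus "username" and "credential" for TURN). The helper names, the TURN hostname, and the credentials are hypothetical placeholders, and the STUN server shown is simply a well-known public one, not necessarily the one Snowflake ships with.

```python
import json


def ice_configuration(stun_urls, turn_servers=()):
    """Build a WebRTC-style configuration from STUN URLs and optional TURN entries.

    Each TURN entry is a dict with "url", "username" and "credential" keys.
    """
    servers = []
    if stun_urls:
        servers.append({"urls": list(stun_urls)})
    for turn in turn_servers:
        servers.append({
            "urls": [turn["url"]],
            "username": turn["username"],
            "credential": turn["credential"],
        })
    return {"iceServers": servers}


def to_json(config):
    """Serialize the configuration, e.g. for handing to a WebRTC library."""
    return json.dumps(config, indent=2)


STUN_ONLY = ice_configuration(["stun:stun.l.google.com:19302"])

WITH_TURN = ice_configuration(
    ["stun:stun.l.google.com:19302"],
    [{"url": "turn:turn.example.org:3478", "username": "user", "credential": "secret"}],
)
```

Adding TURN support is then a matter of appending entries, which is why the configuration change itself is small; the harder part is having TURN servers available at all.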

While it is quite possible for the censor to also block STUN and TURN servers, these servers are typically required for every flavor of peer-to-peer connectivity establishment common to various other domains and applications. This means there is some amount of collateral freedom already involved in STUN and TURN, but there are no guarantees there. Providing a highly available population of unblocked, performant STUN and TURN servers to the client remains a question to approach as deployment proceeds. The addresses of the STUN and TURN servers may also be provided through a domain fronted channel.

More substantial future work with TURN and Snowflake is likely required to provide a good experience for this remaining 10% of users.

3.5 Recovery and Multiplexing

Once Snowflake has prepared WebRTC and engaged as a Pluggable Transport, it also needs to recover quickly and reliably upon disconnect. This matters in every session, as the snowflake proxies are assumed to be ephemeral.

For the snowflake client, there are two primary ways for the transport to fail:

The remote snowflake proxy closes, disconnects, or has an error because the volunteer navigates away from the page, closes the tab, loses connectivity, or some other scenario which ends the WebRTC DataChannel remotely.
There is a local error on the SOCKS side. This is much rarer.

In any case, snowflake seeks to maintain high reliability and connectivity, and a high quality browsing experience for the user in the Tor Browser use-case, by having snowflake clients and proxies multiplex each other. When an individual WebRTC DataChannel fails, the snowflake client renews with a new WebRTC peer. If the WebRTC DataChannel was actively in-use as the transport, the snowflake client triggers a renewed SOCKS handler which switches the transport to a different WebRTC DataChannel.

The number of multiplexed WebRTC DataChannels to seek out and maintain on the snowflake client can be configured using the -max N flag.
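
The recovery behavior described in this section can be sketched as a small pool that keeps a fixed number of channels and swaps the active one on failure. The class, its method names, and the connect callable are hypothetical stand-ins for this example; the real client's bookkeeping may differ.

```python
class ChannelPool:
    """Keep up to `capacity` channels ready; one of them is the active transport."""

    def __init__(self, connect, capacity):
        self.connect = connect    # hypothetical callable returning a new channel
        self.capacity = capacity  # plays the role of the N in "-max N"
        self.channels = []
        self.active = None

    def fill(self):
        """Top the pool back up and make sure some channel is active."""
        while len(self.channels) < self.capacity:
            self.channels.append(self.connect())
        if self.active is None and self.channels:
            self.active = self.channels[0]

    def on_close(self, channel):
        """Handle a failed channel: drop it, switch transports if needed, refill."""
        self.channels.remove(channel)
        if channel is self.active:
            # The in-use transport died: fail over to another ready channel.
            self.active = self.channels[0] if self.channels else None
        self.fill()
```

With a capacity above one, a spare channel is already open when the active one fails, so the switch does not have to wait for a fresh rendezvous.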

4 Contribution

Snowflake is under active development, and there is plenty of work required for the foreseeable future. Much of this is described in the updated Snowflake OTF proposal. This includes but is not limited to reproducible builds, auditing, metrics on usage and adoption, traffic fingerprintability, a headless snowflake proxy implementation, a browser extension implementation or integration with existing extensions like Cupcake, and other additions or independent implementations for a wide variety of use cases.

If you are reading this and are excited or curious about pushing Snowflake’s approach towards Internet Freedom forward, please feel free to contact Serene or send a pull request.

4.1 Source Code

All components of Snowflake are free software under FOSS-style licenses and the source code is available in these two primary locations:

4.2 Building with Tor Browser Bundle

This section is a work in progress, as the process for reproducibly building snowflake with Tor is currently not fully determined. Information on this is available at:

https://trac.torproject.org/projects/tor/wiki/doc/Snowflake#IntegrationwithTorBrowser

History

Snowflake began in 2015 as a collaboration between Serene Han, Arlo Breault, and David Fifield. It was made possible due to the support of Open Tech Fund’s Information Controls Fellowship Program, with many thanks to the folks there, at CDT, Vern Paxson of ICSI, the Tor Project, and others. It is currently under active development.

Contact

serene [at] torproject [dot] org
