WebRTC: Peer-to-Peer Media Without a Media Server

What WebRTC is and isn't

WebRTC enables real-time audio, video, and data exchange directly between browsers (or native clients) without routing media through a central server. Call signaling still requires a server — WebRTC doesn't specify how peers find each other. What it does specify: how they negotiate media format, how they punch through NAT, and how media flows once connected.

The media path in a WebRTC call is peer-to-peer. The server only facilitates setup.

The WebRTC protocol stack

Concept: Real-Time Networking

WebRTC is not a single protocol — it is a stack of protocols working together. Each layer solves a different problem: ICE for network path discovery, DTLS for encryption, SRTP for media transmission, SCTP for data channels.

Prerequisites

UDP basics
NAT and private IP addresses
TLS/DTLS
browser APIs

Key Points

ICE (Interactive Connectivity Establishment): discovers viable network paths between peers using STUN and TURN servers.
DTLS (Datagram TLS): runs the handshake that authenticates the peers and derives the encryption keys. All WebRTC media is encrypted; there is no opt-out.
SRTP (Secure RTP): carries audio and video over UDP with timing and sequencing information.
SCTP over DTLS: carries data channel traffic (text, binary). Reliability and ordering are optional and configured per channel.

The connection flow: signaling and ICE

WebRTC connection setup has two phases: signaling (out-of-band negotiation) and ICE (network path discovery).

Phase 1: Signaling

Signaling is how two peers exchange session descriptions before they can communicate. WebRTC does not define the signaling mechanism: you build it using WebSockets, HTTP, or anything else. The content is SDP (Session Description Protocol), which describes codecs, media directions, ICE credentials, and DTLS fingerprints, and can also carry network candidates.

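// Caller side
const pc = new RTCPeerConnection(iceConfig);
// Add tracks or a data channel before this point, or the offer will contain no media sections
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);

// Send offer.sdp to the other peer via your signaling channel (WebSocket, etc.)
signalingChannel.send(JSON.stringify({ type: 'offer', sdp: offer.sdp }));

// Callee side (runs on the other peer, which has its own RTCPeerConnection named pc)
await pc.setRemoteDescription({ type: 'offer', sdp: receivedSdp });
const answer = await pc.createAnswer();
await pc.setLocalDescription(answer);

// Send answer back via signaling channel
signalingChannel.send(JSON.stringify({ type: 'answer', sdp: answer.sdp }));

// Caller side again: apply the answer once it arrives over signaling
await pc.setRemoteDescription({ type: 'answer', sdp: receivedSdp });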

The SDP offer/answer exchange follows JSEP (JavaScript Session Establishment Protocol), the model that defines how browsers create and apply session descriptions. An offer or answer encodes:

Media capabilities: codecs supported (Opus for audio, VP8/H.264 for video)
Directions: sendrecv, sendonly, recvonly
ICE credentials and fingerprints for DTLS authentication
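
To make the list above concrete, here is a minimal sketch that pulls those fields out of an SDP string. The sample SDP is hand-written and heavily abbreviated (the ufrag, password, and fingerprint values are made up), and summarizeSdp is a hypothetical helper, not part of the WebRTC API.

```javascript
// Abbreviated, hand-written SDP for illustration only; real offers are much longer.
const sampleSdp = [
    'v=0',
    'o=- 4611731400430051336 2 IN IP4 127.0.0.1',
    's=-',
    't=0 0',
    'm=audio 9 UDP/TLS/RTP/SAVPF 111',
    'a=ice-ufrag:F7gI',
    'a=ice-pwd:x9cml/YzichV2+XlhiMu8g',
    'a=fingerprint:sha-256 4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB:4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF',
    'a=sendrecv',
    'a=rtpmap:111 opus/48000/2',
    'm=video 9 UDP/TLS/RTP/SAVPF 96',
    'a=recvonly',
    'a=rtpmap:96 VP8/90000',
].join('\r\n');

// Hypothetical helper: collects codecs, directions, and security attributes.
function summarizeSdp(sdp) {
    const summary = { codecs: [], directions: [], iceUfrag: null, fingerprint: null };
    for (const line of sdp.split(/\r?\n/)) {
        if (line.startsWith('a=rtpmap:')) {
            // a=rtpmap:<payload type> <codec>/<clock rate>[/<channels>]
            summary.codecs.push(line.split(' ')[1].split('/')[0]);
        } else if (/^a=(sendrecv|sendonly|recvonly|inactive)$/.test(line)) {
            summary.directions.push(line.slice(2));
        } else if (line.startsWith('a=ice-ufrag:')) {
            summary.iceUfrag = line.slice('a=ice-ufrag:'.length);
        } else if (line.startsWith('a=fingerprint:')) {
            summary.fingerprint = line.slice('a=fingerprint:'.length);
        }
    }
    return summary;
}

console.log(summarizeSdp(sampleSdp));
```

In a browser you could pass pc.localDescription.sdp to the same function to see what your own offer advertises.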

Phase 2: ICE

ICE discovers how to actually reach the remote peer. Both peers gather ICE candidates — possible network paths — and exchange them through the signaling channel.

Candidate types:
  host:      local IP/port (works on the same network)
  srflx:     server-reflexive — the peer's public IP/port as seen from a STUN server
  relay:     a TURN server relay (last resort when direct paths fail)

pc.onicecandidate = (event) => {
    if (event.candidate) {
        // Send this candidate to the remote peer via signaling
        signalingChannel.send(JSON.stringify({
            type: 'candidate',
            candidate: event.candidate
        }));
    }
};
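
The snippet above covers the sending side only. On the receiving side, each candidate is handed to pc.addIceCandidate(), which rejects if no remote description has been set yet. A minimal sketch of one way to handle that ordering, assuming a hypothetical createCandidateQueue helper and the same message shape as above:

```javascript
// Hypothetical helper: buffers remote candidates that arrive before the
// remote description, then applies them once it is safe to do so.
function createCandidateQueue(pc) {
    const pending = [];
    return {
        // Call for every 'candidate' message received over signaling.
        async add(candidateInit) {
            if (!pc.remoteDescription) {
                pending.push(candidateInit);
                return 'queued';
            }
            await pc.addIceCandidate(candidateInit);
            return 'added';
        },
        // Call right after pc.setRemoteDescription() resolves.
        async flush() {
            while (pending.length > 0) {
                await pc.addIceCandidate(pending.shift());
            }
        },
    };
}

// Usage sketch (assumes a WebSocket-like signalingChannel):
// const queue = createCandidateQueue(pc);
// signalingChannel.onmessage = async ({ data }) => {
//     const message = JSON.parse(data);
//     if (message.type === 'candidate') await queue.add(message.candidate);
// };
```

Buffering is a design choice, not a requirement: if your signaling layer guarantees that the offer or answer always arrives before any candidate, calling addIceCandidate() directly is enough.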

ICE uses connectivity checks (STUN binding requests) to test each candidate pair. The best working path is selected. If direct connection fails due to NAT, it falls back to a TURN relay.
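
"Best" here is driven by candidate priority. As a rough sketch, the priority formula recommended in RFC 8445 can be written as below. The type preference values are the RFC's recommended ones (prflx is a peer-reflexive candidate discovered during the checks themselves); the local preference default of 65535 and the sample candidate string are assumptions for illustration, and real browsers pick their own local preferences.

```javascript
// Recommended type preferences from RFC 8445 (higher is preferred).
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

// priority = 2^24 * typePreference + 2^8 * localPreference + (256 - componentId)
function candidatePriority(type, localPreference = 65535, componentId = 1) {
    return TYPE_PREFERENCE[type] * 2 ** 24 + localPreference * 2 ** 8 + (256 - componentId);
}

// Extract the candidate type from a candidate attribute string.
function candidateType(candidateLine) {
    const match = / typ (host|srflx|prflx|relay)\b/.exec(candidateLine);
    return match ? match[1] : null;
}

// Illustrative candidate string using documentation-range addresses.
const sample = 'candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.10 rport 46154';

console.log(candidateType(sample));       // srflx
console.log(candidatePriority('host'));   // 2130706431
console.log(candidatePriority('srflx'));  // 1694498815
console.log(candidatePriority('relay'));  // 16777215
```

Because relay candidates get the lowest type preference, a TURN path is only chosen when no higher-priority pair passes its connectivity check.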

NAT traversal: the hard part

Most browsers are behind NAT — they have private IPs (192.168.x.x, 10.x.x.x) not reachable from the public internet. Two peers behind different NATs cannot simply connect to each other's private IPs.

STUN (Session Traversal Utilities for NAT): a peer contacts a STUN server on the public internet. The server reflects back the peer's public IP and port as it appears to the internet. The peer includes this as a srflx candidate. If both peers have compatible NAT types, they can reach each other's public IP/port directly.

Peer A (behind NAT A) ──→ STUN server → learns public IP:port → includes in SDP
Peer B (behind NAT B) ──→ STUN server → learns public IP:port → includes in SDP

Peer A attempts to connect to Peer B's public IP:port
NAT B: does it recognize this inbound packet?
  Yes (if Peer B sent something to Peer A first) → connection succeeds
  No → connection fails, fall back to TURN

STUN works for the majority of NAT types. Symmetric NAT (common in corporate networks and some mobile carriers) defeats it: the NAT allocates a different external port for each destination, so the port the STUN server observed is not the port the peer's traffic will arrive from.
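
The difference is easier to see in a toy model. The sketch below is not a real NAT implementation: ToyNat is a hypothetical class that models only port mapping (not filtering), and the addresses are placeholders. It shows why the port learned via STUN stays valid behind a cone-style NAT but not behind a symmetric one.

```javascript
// Toy NAT model: maps an internal socket to an external port.
// 'cone' reuses one mapping per internal socket regardless of destination;
// 'symmetric' allocates a new mapping per (internal socket, destination) pair.
class ToyNat {
    constructor(kind, firstPort = 40000) {
        this.kind = kind;
        this.nextPort = firstPort;
        this.mappings = new Map();
    }

    externalPortFor(internalSocket, destination) {
        const key = this.kind === 'symmetric'
            ? `${internalSocket}->${destination}`
            : internalSocket;
        if (!this.mappings.has(key)) {
            this.mappings.set(key, this.nextPort++);
        }
        return this.mappings.get(key);
    }
}

function stunCandidateIsUsable(kind) {
    const nat = new ToyNat(kind);
    const internal = '192.168.1.10:5000';
    // Port the STUN server observes; this is what gets advertised as srflx.
    const advertised = nat.externalPortFor(internal, 'stun.example.org:3478');
    // Port the NAT actually uses when the same socket sends to the remote peer.
    const actual = nat.externalPortFor(internal, '203.0.113.7:46154');
    return advertised === actual;
}

console.log(stunCandidateIsUsable('cone'));      // true
console.log(stunCandidateIsUsable('symmetric')); // false
```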

TURN (Traversal Using Relays around NAT): a relay server that both peers connect to outbound. Media flows through the TURN server rather than peer-to-peer. Latency increases (media travels two legs, peer to relay and relay to peer, instead of one direct path), and the TURN server bears the bandwidth cost.

Peer A ──→ TURN server ←── Peer B
           (relay)

TURN is a fallback but a required one — without it, ~10-15% of calls fail (symmetric NAT environments).
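
In code, STUN and TURN servers are both supplied through the iceServers list of the RTCPeerConnection configuration. A minimal sketch follows; buildIceConfig is a hypothetical helper, the host names and credentials are placeholders, and the ports are conventional choices rather than requirements.

```javascript
// Hypothetical helper: builds the configuration object passed to RTCPeerConnection.
function buildIceConfig({ stunHost, turnHost, username, credential }) {
    return {
        iceServers: [
            // STUN: used to discover server-reflexive (srflx) candidates.
            { urls: `stun:${stunHost}:3478` },
            // TURN: relay fallback. Offering UDP, TCP, and TLS on 443 gives
            // ICE more options on networks that restrict UDP.
            {
                urls: [
                    `turn:${turnHost}:3478?transport=udp`,
                    `turn:${turnHost}:3478?transport=tcp`,
                    `turns:${turnHost}:443?transport=tcp`,
                ],
                username,
                credential,
            },
        ],
    };
}

const iceConfig = buildIceConfig({
    stunHost: 'stun.example.org',   // placeholder
    turnHost: 'turn.example.org',   // placeholder
    username: 'demo-user',          // placeholder
    credential: 'demo-secret',      // placeholder
});

// In a browser: const pc = new RTCPeerConnection(iceConfig);
console.log(JSON.stringify(iceConfig, null, 2));
```

When testing a TURN deployment, adding iceTransportPolicy: 'relay' to the configuration forces every call through the relay, which makes misconfiguration visible immediately.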

⚠ TURN server bandwidth costs in production

TURN relays full bidirectional media. A video call at 1 Mbps produces 2 Mbps through the TURN server (inbound + outbound). At scale, TURN bandwidth costs dominate WebRTC infrastructure costs.

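Rough estimate: 10,000 concurrent video calls, 15% needing TURN, 1 Mbps average = 1,500 relayed calls. Counting inbound and outbound, that is 3 Gbps through TURN, 1.5 Gbps of it outbound. At AWS data transfer rates (~$0.09/GB, billed on outbound traffic), 1.5 Gbps is about 675 GB per hour, or roughly $60 per hour of sustained load.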
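
The same estimate as a small calculator, with the inbound and outbound legs of the relay counted separately. This is a sketch under stated assumptions: estimateTurnLoad is a hypothetical helper, each relayed call is modeled as its media arriving at the relay once and leaving once (as in the paragraph above), and only outbound traffic is assumed to be billed.

```javascript
// Hypothetical helper for back-of-the-envelope TURN sizing.
function estimateTurnLoad({ concurrentCalls, relayFraction, mbpsPerCall, usdPerGb }) {
    const relayedCalls = concurrentCalls * relayFraction;
    const inboundMbps = relayedCalls * mbpsPerCall;  // media arriving at the relay
    const outboundMbps = inboundMbps;                // the same media forwarded on
    // Mbps -> MB/s (divide by 8) -> MB/hour (times 3600) -> GB/hour (divide by 1000)
    const outboundGbPerHour = (outboundMbps / 8) * 3600 / 1000;
    return {
        relayedCalls,
        totalGbps: (inboundMbps + outboundMbps) / 1000,
        outboundGbPerHour,
        usdPerHour: outboundGbPerHour * usdPerGb, // assumes only outbound is billed
    };
}

console.log(estimateTurnLoad({
    concurrentCalls: 10000,
    relayFraction: 0.15,
    mbpsPerCall: 1,
    usdPerGb: 0.09,
}));
```

With the inputs from the text this gives 1,500 relayed calls, 3 Gbps of total relay traffic, and 675 GB of outbound transfer per hour.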

Strategies to control TURN costs:

Run TURN on AWS EC2 (coturn is the standard open-source server) in the same region as users
Set TURN credentials with short expiry (30 min) to prevent credential reuse
Monitor relay vs srflx candidate usage — high relay % means network conditions or NAT issues worth investigating
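
For the monitoring point, one possible approach is to read the selected candidate pair from pc.getStats() once a call is connected and report its local candidate type to your metrics system. The sketch below assumes a hypothetical selectedCandidateType helper; it takes the Map-like RTCStatsReport as input so the logic can be exercised outside a browser.

```javascript
// Returns the local candidate type ('host', 'srflx', 'prflx', 'relay') of the
// selected candidate pair, or null if no pair has been selected yet.
// `report` is the Map-like RTCStatsReport resolved by pc.getStats().
function selectedCandidateType(report) {
    let pair = null;
    for (const stat of report.values()) {
        if (stat.type === 'transport' && stat.selectedCandidatePairId) {
            pair = report.get(stat.selectedCandidatePairId) || null;
        }
    }
    if (!pair) {
        // Fallback for browsers that do not expose transport stats.
        for (const stat of report.values()) {
            if (stat.type === 'candidate-pair' && stat.nominated && stat.state === 'succeeded') {
                pair = stat;
            }
        }
    }
    if (!pair) return null;
    const local = report.get(pair.localCandidateId);
    return local ? local.candidateType : null;
}

// Usage sketch in a browser:
// const type = selectedCandidateType(await pc.getStats());
// if (type === 'relay') { /* count this call as TURN-relayed */ }
```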

Media transport: why UDP

WebRTC uses UDP for media (via SRTP) rather than TCP. For real-time audio and video, stale data is worse than missing data: a video frame retransmitted 100 ms late has usually missed its playout deadline. It is better to conceal the loss, or recover it with forward error correction, than to stall playback waiting for a retransmit.

TCP's head-of-line blocking is especially damaging for real-time media: a single lost packet holds up all subsequent data in the stream until the retransmit arrives.

For the data channel (arbitrary data between peers), WebRTC uses SCTP over DTLS. Reliability and ordering are configured per channel: one channel can deliver messages reliably and in order (like TCP) while another on the same connection sends them unreliably (like UDP), depending on the use case.

// Create a data channel with optional reliability settings
const reliableChannel = pc.createDataChannel('chat', {
    ordered: true,
    // No maxRetransmits = reliable delivery
});

const unreliableChannel = pc.createDataChannel('game-state', {
    ordered: false,
    maxRetransmits: 0  // fire-and-forget, like UDP
});

RTP and adaptive bitrate

Audio and video travel in RTP (Real-time Transport Protocol) packets. Each RTP packet carries:

Payload type (which codec)
Sequence number (detect lost packets)
Timestamp (synchronize audio/video playback)
SSRC (identifies the stream source)
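
These fields sit in a fixed 12-byte header defined by RFC 3550. As an illustration, the sketch below decodes that header from raw bytes. The packet is hand-built for the example, and parseRtpHeader is a hypothetical helper; browsers do not normally expose raw RTP packets to page scripts, so this is mainly useful for reading packet captures or server-side code.

```javascript
// Decode the fixed 12-byte RTP header (RFC 3550). All fields are big-endian.
function parseRtpHeader(bytes) {
    if (bytes.length < 12) {
        throw new RangeError('RTP header needs at least 12 bytes');
    }
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    return {
        version: bytes[0] >> 6,              // top 2 bits of byte 0
        marker: (bytes[1] & 0x80) !== 0,     // top bit of byte 1
        payloadType: bytes[1] & 0x7f,        // low 7 bits of byte 1
        sequenceNumber: view.getUint16(2),
        timestamp: view.getUint32(4),
        ssrc: view.getUint32(8),
    };
}

// Hand-built example packet header.
const packet = new Uint8Array([
    0x80, 0x6f,             // version 2, no padding/extension/CSRC; marker 0, payload type 111
    0x12, 0x34,             // sequence number 0x1234 = 4660
    0x00, 0x01, 0x5f, 0x90, // timestamp 90000
    0xde, 0xad, 0xbe, 0xef, // SSRC 0xdeadbeef
]);

console.log(parseRtpHeader(packet));
```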

The receiving peer sends RTCP feedback: reception reports (packet loss rate, jitter), PLI (picture loss indication — request a keyframe), and REMB/TWCC (bandwidth estimation).

The sender uses this feedback to adjust bitrate in real time. Packet loss spikes → reduce bitrate. Bandwidth headroom → increase bitrate. This adaptive bitrate control is what makes WebRTC calls usable on variable networks. In 1:1 calls it runs directly between the peers; in multi-party calls it is typically combined with simulcast and an SFU.
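
The feedback loop can be sketched as a simple loss-based rule. This is a deliberately simplified illustration, loosely modeled on the loss-based part of Google Congestion Control: the thresholds (2% and 10%), the 5% increase step, and the bitrate bounds are assumptions for the example, and browsers run considerably more sophisticated estimators internally.

```javascript
// Simplified loss-based bitrate adjustment (illustrative thresholds).
// lossFraction is the packet loss rate from RTCP reception reports, 0..1.
function nextBitrateKbps(currentKbps, lossFraction, { minKbps = 100, maxKbps = 2500 } = {}) {
    let next = currentKbps;
    if (lossFraction > 0.10) {
        // Heavy loss: back off in proportion to the loss rate.
        next = currentKbps * (1 - 0.5 * lossFraction);
    } else if (lossFraction < 0.02) {
        // Little or no loss: probe upward for spare bandwidth.
        next = currentKbps * 1.05;
    }
    // Between 2% and 10% loss, hold the current rate.
    return Math.round(Math.min(maxKbps, Math.max(minKbps, next)));
}

console.log(nextBitrateKbps(1000, 0.20)); // 900
console.log(nextBitrateKbps(1000, 0.05)); // 1000
console.log(nextBitrateKbps(1000, 0.01)); // 1050
```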

Direct WebRTC (P2P) vs SFU (Selective Forwarding Unit)

For 1:1 calls, direct P2P is simple and has no server bandwidth cost for media. For group calls, an SFU becomes necessary to avoid the N² media stream problem.

Direct P2P

Media goes directly between browsers — no server bandwidth for media
Works for 1:1 or very small groups
N participants = N-1 upload streams per participant
5+ participants: browser upload bandwidth becomes the bottleneck
No server-side media processing (recording, transcoding)

SFU (Janus, mediasoup, LiveKit)

Each participant uploads once to the SFU; SFU forwards to each recipient
Supports large groups (50+ participants)
Enables simulcast: participants send multiple quality layers, SFU forwards the right layer per recipient
Server bears bandwidth cost; enables recording and transcoding
More infrastructure complexity and cost
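
The scaling difference between the two topologies comes down to simple counting. A small sketch, assuming every participant sends one stream to every other participant (no simulcast layers, no muted participants); meshLoad and sfuLoad are hypothetical helper names.

```javascript
// Stream counts for a full-mesh P2P call with n participants.
function meshLoad(n) {
    return {
        uploadsPerParticipant: n - 1,
        downloadsPerParticipant: n - 1,
        totalStreams: n * (n - 1),
        serverOutboundStreams: 0,
    };
}

// Stream counts for the same call routed through an SFU.
function sfuLoad(n) {
    return {
        uploadsPerParticipant: 1,
        downloadsPerParticipant: n - 1,
        totalStreams: n + n * (n - 1),       // n uploads plus the forwarded copies
        serverOutboundStreams: n * (n - 1),
    };
}

for (const n of [2, 5, 10]) {
    console.log(n, meshLoad(n).uploadsPerParticipant, sfuLoad(n).uploadsPerParticipant);
}
```

The quadratic term does not disappear with an SFU; it moves from each participant's uplink to the server, where bandwidth is easier to provision.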

Verdict

For 1:1 calls: P2P. For 3-4 participants: P2P is still viable if upload bandwidth is sufficient. For 5+ participants: SFU is required. Open-source SFUs (mediasoup, Janus) or managed services (LiveKit, Daily.co) are both reasonable options depending on operational preference.

A WebRTC application connects calls reliably between users behind home routers, but calls fail to connect whenever one participant is on a corporate network. What is the most likely cause?

Difficulty: medium

The application uses STUN for NAT traversal but has not configured a TURN server. Both users are on modern browsers.

A. WebRTC is blocked by the HTTPS requirement on corporate networks
Incorrect. WebRTC requires HTTPS for getUserMedia, but this applies to both networks equally. It would not explain why corporate networks specifically fail.
B. Corporate firewalls use symmetric NAT, which STUN cannot traverse
Correct! Symmetric NAT assigns a different external port for each outbound connection destination. STUN discovers the external address for the STUN server connection, but that port mapping is only valid for traffic to the STUN server, not to the peer. Without TURN, the ICE negotiation fails to find a working path. Corporate NAT is commonly symmetric; home routers typically use full-cone or restricted-cone NAT, which STUN can traverse.
C. Corporate networks block UDP entirely, so RTP cannot be established
Incorrect. Some corporate networks do block UDP, which would require TURN over TCP (port 443). But the question states TURN is not configured, so the more fundamental issue is the absence of a fallback relay.
D. The SDP offer is rejected because corporate browsers have different codec support
Incorrect. Codec negotiation failures result in media not flowing after connection, not in connection failures. And codec support in modern browsers is largely standardized.

Hint: Think about what STUN can and cannot do for different NAT types, and what the fallback mechanism would be.