
VoIP Quality: Jitter, Latency & MOS Explained

SIPNEX

When your agents say “the calls sound bad” or customers complain about choppy audio, the problem is almost never your dialer software, your carrier’s network, or your phone system. It is the network path between your system and the carrier — specifically the jitter, latency, and packet loss on that path. Understanding these three metrics and knowing how to measure and fix them is the difference between an operation that sounds professional and one that loses conversations to audio problems.

SIPNEX is an FCC-licensed carrier that provides SIP trunks optimized for high-volume calling. Our media infrastructure is built for predictive dialer workloads with hundreds of concurrent calls. But our network quality only matters from our edge to the PSTN — the path from your server to our edge is your network, and that is where most quality problems originate.

The three metrics that determine call quality

Latency is the time it takes for a voice packet to travel from your system to the carrier (one-way) or from your system to the carrier and back (round-trip). It is measured in milliseconds (ms). For voice, one-way latency under 150ms is acceptable and under 80ms is good. Over 200ms, the delay becomes noticeable: the conversation starts to feel like a satellite call where both parties talk over each other.

Latency is caused by physical distance (packets traveling across the country take longer than across the city), routing hops (each router in the path adds processing delay), congestion (packets queue at overloaded routers), and encoding/decoding time (the codec processing on each end).

Jitter is the variation in latency between packets. If packet 1 arrives in 30ms, packet 2 in 45ms, and packet 3 in 25ms, the delay varies by up to 20ms (the gap between the fastest and the slowest packet). Voice communication requires packets to arrive at regular intervals. Jitter disrupts this regularity, forcing the receiving end’s jitter buffer to either drop packets that arrive too late (causing audio gaps) or add delay to smooth out the variation.

Jitter under 30ms is acceptable for voice. Under 15ms is good. Over 50ms produces noticeable audio quality degradation — choppy speech, words cut off, robotic-sounding audio.
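
To make the jitter figures concrete, here is a minimal Python sketch that uses the three packet delays from the jitter example above. It shows two ways of putting a number on jitter: the simple spread between the slowest and fastest packet, and a smoothed running estimate in the style of the interarrival jitter calculation in RFC 3550. The function names are illustrative, and real monitoring tools may compute jitter differently.

```python
def delay_spread(delays_ms):
    """Difference between the slowest and the fastest packet, in ms."""
    return max(delays_ms) - min(delays_ms)


def smoothed_jitter(delays_ms):
    """Running jitter estimate in the style of RFC 3550.

    Each step moves the estimate 1/16 of the way toward the latest
    absolute change in delay between two consecutive packets.
    """
    jitter = 0.0
    for previous, current in zip(delays_ms, delays_ms[1:]):
        change = abs(current - previous)
        jitter += (change - jitter) / 16.0
    return jitter


delays = [30, 45, 25]  # one-way delays from the example, in ms
print("spread (ms):", delay_spread(delays))
print("smoothed jitter (ms):", round(smoothed_jitter(delays), 2))
```

The smoothed estimate reacts slowly by design, so with only three packets it stays well below the raw spread. Over a real call it settles toward the typical packet-to-packet variation.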

Packet loss is the percentage of voice packets that are sent but never arrive at the destination. Voice uses UDP (User Datagram Protocol), which does not retransmit lost packets. When a voice packet is lost, it is gone — the audio for that 20ms frame is missing. The receiving end either inserts silence (noticeable gap) or uses packet loss concealment (PLC) algorithms to interpolate the missing audio (works for occasional loss, degrades at higher rates).

Packet loss under 1 percent is acceptable. Under 0.5 percent is good. Over 3 percent produces clearly degraded audio — gaps, stuttering, and the audio quality that people associate with “bad VoIP.”
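
The thresholds for all three metrics can be collected into one small lookup, sketched below in Python. The numbers come directly from the text above. The "marginal" label for values that fall between "acceptable" and the degraded range is an assumption, because the text does not name that band, and the function and dictionary names are illustrative.

```python
# (good below, acceptable below, poor above), as stated in the text.
THRESHOLDS = {
    "latency_ms": (80, 150, 200),
    "jitter_ms": (15, 30, 50),
    "loss_pct": (0.5, 1.0, 3.0),
}


def rate(metric, value):
    """Classify one measurement against the article's thresholds."""
    good, acceptable, poor = THRESHOLDS[metric]
    if value < good:
        return "good"
    if value < acceptable:
        return "acceptable"
    if value > poor:
        return "poor"
    return "marginal"  # assumption: the text does not name this band


# Hypothetical measurements, for illustration only.
for metric, value in [("latency_ms", 60), ("jitter_ms", 20), ("loss_pct", 4.0)]:
    print(metric, value, rate(metric, value))
```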

MOS score: the single quality number

MOS (Mean Opinion Score) is a composite metric that expresses voice quality on a scale from 1 (unintelligible) to 5 (excellent). Originally, MOS was determined by having humans listen to call samples and rate them. Today, MOS is calculated algorithmically using models like PESQ (Perceptual Evaluation of Speech Quality) or POLQA (Perceptual Objective Listening Quality Assessment) that simulate human perception based on the audio signal characteristics.

For VoIP, the most commonly referenced MOS variant is the estimated MOS-CQ (Conversational Quality), which is calculated from network metrics rather than from listening tests. The score bands are typically read as follows:

  • MOS 4.0 to 4.5: Excellent. Equivalent to landline quality or better. Achievable on G.711 codec with good network conditions (latency under 80ms, jitter under 15ms, packet loss under 0.5%).
  • MOS 3.5 to 4.0: Good. Acceptable for business communication. Some listeners may notice slight quality reduction compared to landline. Typical for G.729 codec or G.711 with moderate network impairment.
  • MOS 3.0 to 3.5: Fair. Noticeable quality reduction. Calls are intelligible but listeners are aware of degradation. Common on networks with 1 to 2 percent packet loss or jitter above 30ms.
  • Below 3.0: Poor. Significant degradation — choppy audio, missing words, difficulty understanding the speaker. Calls at this quality level damage your professional image and reduce conversion rates.

For call center operations, target MOS 4.0 or above. Agent conversations at MOS 3.5 are intelligible but sound unprofessional. At MOS 3.0, agents start losing calls because customers hang up due to poor audio.
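
As a rough illustration of how a MOS estimate can be derived from network metrics, the Python sketch below uses a widely circulated simplification. It is not the full ITU-T G.107 E-model and not necessarily what any particular monitoring tool or carrier uses. The latency, jitter, and loss penalties are heuristic assumptions. Only the final conversion from R-factor to MOS follows the standard G.107 mapping. Function names are illustrative.

```python
def estimate_r_factor(latency_ms, jitter_ms, loss_pct):
    """Heuristic R-factor (0 to 100) from network metrics.

    Assumption: jitter is weighted double and 10 ms is added for
    codec processing; every 1% of packet loss costs 2.5 R points.
    """
    effective_latency = latency_ms + 2 * jitter_ms + 10
    if effective_latency < 160:
        r = 93.2 - effective_latency / 40
    else:
        r = 93.2 - (effective_latency - 120) / 10
    r -= 2.5 * loss_pct
    return max(0.0, min(100.0, r))


def r_to_mos(r):
    """Standard E-model mapping from R-factor to MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


def estimate_mos(latency_ms, jitter_ms, loss_pct):
    return r_to_mos(estimate_r_factor(latency_ms, jitter_ms, loss_pct))


# Hypothetical inputs: a clean path and an impaired one.
print(round(estimate_mos(40, 5, 0.0), 2))
print(round(estimate_mos(200, 50, 3.0), 2))
```

With these inputs the clean path comes out at about 4.4 and the impaired path at about 3.4, which lines up with the bands listed above. Treat the result as a trend indicator rather than a measurement.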

Codec impact on quality

The codec you use determines the baseline audio quality before network effects:

G.711u (ulaw): 64 kbps uncompressed PCM audio. MOS baseline: 4.4 (excellent). This is the standard for call centers and the codec SIPNEX recommends. Each call uses approximately 85 kbps with IP overhead. It gives the highest baseline quality of the traditional telephony codecs and the best AMD (answering machine detection) accuracy on VICIdial.

G.729: 8 kbps compressed audio. MOS baseline: 3.9 (good). Uses about 32 kbps with overhead. Saves bandwidth at the cost of slightly lower audio quality. The compression artifacts are subtle but can affect AMD accuracy and may be noticeable to listeners on extended calls. Use only when bandwidth is severely constrained.

Opus: Adaptive bitrate from 6 to 128 kbps. At higher bitrates, Opus exceeds G.711 quality. At lower bitrates, it degrades gracefully. It is not supported on every carrier and PBX platform, but it is increasingly common. SIPNEX supports Opus where your system does.

Diagnosing quality problems

When call quality degrades, diagnose systematically:

Step 1: Measure your network. Run a VoIP-specific quality test, not a generic speed test: you need jitter, packet loss, and latency measurements, not just download/upload speed. Run ping with 100 or more packets to your carrier’s SIP proxy to measure latency and packet loss. Use iperf in UDP mode for bandwidth and jitter testing. Alternatively, use a VoIP-specific testing service that simulates real call traffic.

Step 2: Check if the problem is consistent or intermittent. Consistent degradation (every call sounds bad) usually indicates a bandwidth or configuration problem. Intermittent degradation (some calls bad, others fine) usually indicates network congestion, jitter, or a routing issue that affects some paths but not others.

Step 3: Identify the segment. Is the problem between your server and your internet provider? Between your ISP and the carrier? Between the carrier and the terminating network? Use traceroute to your carrier’s SIP proxy to identify where latency or packet loss increases along the path. If the degradation is on your local network, it is your problem to fix. If it is between ISPs, you may need a different internet provider or a dedicated voice circuit.

Step 4: Check for bandwidth saturation. If your internet connection is shared between voice and data traffic, data usage (file uploads, video streaming, backups) can consume bandwidth and starve voice packets. Monitor your bandwidth utilization during quality events. If utilization exceeds 70 to 80 percent of your connection capacity during the bad periods, bandwidth saturation is the likely cause.
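
The saturation check in Step 4 is simple arithmetic, sketched below in Python. The 70 percent default is the low end of the range given above. The function names and the sample figures are hypothetical, and the throughput reading itself would come from your router or monitoring system.

```python
def utilization_pct(used_mbps, capacity_mbps):
    """Share of the link in use, as a percentage."""
    if capacity_mbps <= 0:
        raise ValueError("capacity_mbps must be positive")
    return 100.0 * used_mbps / capacity_mbps


def saturation_suspected(used_mbps, capacity_mbps, threshold_pct=70.0):
    """True when utilization is above the threshold from Step 4."""
    return utilization_pct(used_mbps, capacity_mbps) > threshold_pct


# Hypothetical reading taken during a period of bad call quality.
print(utilization_pct(42, 50))        # percent of a 50 Mbps link
print(saturation_suspected(42, 50))   # compare against the threshold
```

Check upload and download separately, since a link can be saturated in one direction only.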

Fixing quality problems

Implement QoS (Quality of Service). Configure your router to prioritize voice traffic (SIP and RTP packets) over data traffic. QoS ensures that voice packets are processed first, even when the connection is congested. This is the single most effective fix for quality problems on shared internet connections.

Dedicate bandwidth for voice. If QoS is not sufficient, dedicate a separate internet connection for voice traffic. A 10 to 20 Mbps dedicated connection for a 100-agent operation with G.711 is modest and inexpensive compared to the cost of lost conversations due to poor audio.

Reduce jitter. Jitter is most commonly caused by network congestion and competing traffic. QoS helps. Wired connections (Ethernet) produce less jitter than wireless (WiFi). If your VoIP equipment is on WiFi, switch to wired Ethernet. If your connection traverses many routing hops, consider a more direct path (different ISP, or a network provider that offers SIP-optimized routing).

Address packet loss. Packet loss above 1 percent on a wired connection usually indicates a hardware problem (failing network interface, bad cable, overloaded router/switch) or ISP issue. Run packet loss tests during quality events and compare to off-hours baselines. If loss is consistent, check physical infrastructure. If loss is only during peak hours, it is congestion-related (ISP or local network).

Check codec settings. Verify your PBX is using the codec you expect. Codec mismatches or unwanted transcoding can degrade quality. On VICIdial with SIPNEX, verify allow=ulaw is first in your sip.conf codec order and that no transcoding is occurring at the carrier level (SIPNEX passes through G.711 natively).
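
One way to sanity-check codec order is to read the peer stanza and list the codecs in the order they would be offered. The Python sketch below does this for a hypothetical sip.conf stanza. The peer name and contents are invented for illustration, and the parser is a simplification: it only understands disallow=all and allow= lines, not every form Asterisk accepts.

```python
# Hypothetical peer stanza, for illustration only.
SAMPLE_STANZA = """
[sipnex-trunk]
type=peer
disallow=all   ; clear the default codec list first
allow=ulaw
allow=alaw
"""


def effective_codecs(stanza_text):
    """Return the allowed codecs in preference order.

    Simplified model: 'disallow=all' clears the list and each
    'allow=' line appends to it. Text after ';' is a comment.
    """
    codecs = []
    for raw in stanza_text.splitlines():
        line = raw.split(";", 1)[0].strip()
        if "=" not in line:
            continue  # section headers and blank lines
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "disallow" and value == "all":
            codecs.clear()
        elif key == "allow":
            codecs.extend(c.strip() for c in value.split(",") if c.strip())
    return codecs


codecs = effective_codecs(SAMPLE_STANZA)
print(codecs)
print("ulaw preferred:", bool(codecs) and codecs[0] == "ulaw")
```

This only confirms what the configuration asks for. To confirm what a live call actually negotiated, check the channel details on the PBX itself.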

Frequently asked questions

What is a good MOS score for VoIP?

For call center operations, target MOS 4.0 or above. MOS 4.0 to 4.5 is excellent, equivalent to landline quality. MOS 3.5 to 4.0 is good and acceptable for business communication. Below 3.5, callers notice quality degradation. Below 3.0, audio quality is poor enough to impact conversations and conversion rates. MOS is influenced by codec choice (G.711 = 4.4 baseline, G.729 = 3.9 baseline), network conditions (latency, jitter, packet loss), and equipment quality. With G.711 on a network with under 80ms latency, under 15ms jitter, and under 0.5% packet loss, MOS should stay above 4.0.

What causes choppy VoIP audio?

Choppy audio is caused by jitter (variation in packet arrival timing) and packet loss. When voice packets arrive irregularly, the receiving end’s jitter buffer either drops late packets (creating gaps) or introduces buffering delay. When packets are lost entirely, the audio for those frames is missing. Common causes: shared internet connection with competing data traffic (solution: QoS), WiFi interference (solution: wired Ethernet), ISP congestion during peak hours (solution: dedicated voice connection), and network equipment issues (solution: check router, switch, cables). Measure jitter and packet loss to diagnose — jitter over 30ms or loss over 1% will produce audible degradation.

How much bandwidth do I need per VoIP call?

G.711 codec: approximately 85 kbps per call (64 kbps audio plus roughly 21 kbps of packet header overhead; the exact figure depends on the packetization interval and link-layer framing). Plan for 100 kbps per call to account for signaling and other overhead. G.729: approximately 32 kbps per call. Multiply by your peak concurrent call count: 100 concurrent G.711 calls = 10 Mbps dedicated to voice in each direction. More important than raw bandwidth is quality: a 50 Mbps connection with 3% packet loss is worse for voice than a 10 Mbps connection with 0.1% packet loss. Prioritize connection quality (low jitter, low loss, low latency) over raw speed.
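
The per-call figures above can be reproduced with a short calculation, sketched here in Python. The assumptions are a 20ms packetization interval (50 packets per second), 40 bytes of IPv4, UDP, and RTP headers per packet, and 18 bytes of Ethernet framing. Other link types or packet intervals give different totals, and the function names are illustrative.

```python
def per_call_kbps(payload_bytes, ptime_ms=20, header_bytes=40, link_bytes=18):
    """One-direction bandwidth for a single call, in kbps.

    header_bytes: IPv4 (20) + UDP (8) + RTP (12), an assumption.
    link_bytes:   Ethernet header and checksum, an assumption.
    """
    packets_per_second = 1000 / ptime_ms
    bits_per_packet = (payload_bytes + header_bytes + link_bytes) * 8
    return bits_per_packet * packets_per_second / 1000


def planned_mbps(concurrent_calls, kbps_per_call=100):
    """Capacity plan using the 100 kbps per call planning figure."""
    return concurrent_calls * kbps_per_call / 1000


G711_PAYLOAD = 160  # 64 kbps of audio in a 20 ms packet
G729_PAYLOAD = 20   # 8 kbps of audio in a 20 ms packet

print(per_call_kbps(G711_PAYLOAD))  # close to the 85 kbps quoted
print(per_call_kbps(G729_PAYLOAD))  # close to the 32 kbps quoted
print(planned_mbps(100))            # Mbps per direction for 100 calls
```

Under these assumptions the results land within a few kbps of the figures quoted above, which is why planning at 100 kbps per G.711 call leaves a comfortable margin.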

Does my carrier affect VoIP quality?

Your carrier affects quality on the path from their network to the PSTN: codec support, media gateway capacity, peering arrangements, and network architecture all matter. SIPNEX’s media infrastructure is built for high-concurrency predictive dialing with G.711 pass-through (no transcoding) and low post-dial delay (PDD). However, the path from your server to the carrier’s edge is your network, and that is where most quality problems originate. If your call quality is poor, diagnose your local network first (bandwidth, jitter, packet loss, QoS configuration) before investigating carrier-side issues. If local metrics are good and quality is still poor, contact your carrier to investigate their media path.


SIPNEX provides SIP trunks with carrier-grade media infrastructure — G.711 pass-through, no transcoding, low-latency media paths optimized for high-concurrency predictive dialing. We handle the carrier side. You handle the local network. Get started or see our rates.

SIPNEX

FCC-licensed carrier with its own STIR/SHAKEN SP certificate. Operator-owned. SIP trunks built for operators who dial at volume.