WebSocket Traffic Management on EKS
WebSocket provides a persistent, full-duplex connection between client and server. This is different from request-response protocols like HTTP, and these long-lived connections require special attention when scaling.
Below are the main challenges and their solutions when managing WebSocket traffic on Amazon EKS:
1. Load Balancer Setting (Stickiness)
A WebSocket connection starts as an HTTP Upgrade handshake and then lives on a single TCP connection, so the Load Balancer keeps it pinned to one pod for its lifetime. The problem arises when a client reconnects: the new handshake may land on a different pod that holds none of the client's session state.
- Solution: Enable Sticky Sessions (Session Affinity) on the Application Load Balancer (ALB) so that reconnecting clients return to the same pod. Note that stickiness only influences which pod receives a new handshake; once established, the connection is a single TCP stream and stays on that pod anyway.
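With the AWS Load Balancer Controller, stickiness is configured through Ingress annotations. A minimal sketch (the Ingress name and the specific duration values are illustrative, not prescriptive):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ws-app  # hypothetical name
  annotations:
    # Route each client to the same target via an LB-managed cookie
    alb.ingress.kubernetes.io/target-group-attributes: stickiness.enabled=true,stickiness.lb_cookie.duration_seconds=86400
    # Raise the idle timeout for long-lived connections (see the next section)
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=3600
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ws-service  # hypothetical backend service
                port:
                  number: 8080
```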
2. Connection Timeouts
Load Balancers usually ship with a default idle timeout of 60 seconds. If no data flows over the WebSocket for that long, the LB silently cuts the connection.
- Solution: Increase the ALB idle timeout (e.g., to 3600 seconds), and keep the connection alive with an application-level "Heartbeat" (Ping/Pong) mechanism.
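The heartbeat logic boils down to two timers: send a ping after a period of silence, and close the connection if no pong comes back. A minimal, library-agnostic sketch in Python (the `Heartbeat` class, its interval values, and the injected clock are all illustrative; a real server would wire `tick()` into its event loop and call `on_message()` whenever a frame, including a pong, arrives):

```python
import time


class Heartbeat:
    """Decides when to send a ping and when to give up on a silent peer."""

    def __init__(self, ping_interval=30.0, pong_timeout=10.0, clock=time.monotonic):
        self.ping_interval = ping_interval  # seconds of idle before pinging
        self.pong_timeout = pong_timeout    # seconds to wait for a pong
        self.clock = clock
        self.last_activity = clock()
        self.ping_sent_at = None

    def on_message(self):
        # Any frame from the peer (including a pong) counts as activity.
        self.last_activity = self.clock()
        self.ping_sent_at = None

    def tick(self):
        """Call periodically; returns 'ping', 'close', or None."""
        now = self.clock()
        if self.ping_sent_at is not None:
            if now - self.ping_sent_at >= self.pong_timeout:
                return "close"  # peer never answered the ping
            return None
        if now - self.last_activity >= self.ping_interval:
            self.ping_sent_at = now
            return "ping"
        return None
```

Keep the ping interval well below the LB idle timeout so that even a fully idle connection generates traffic before the LB's timer fires.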
3. Scaling Difficulty
Scaling down a WebSocket server is hard: a pod (Pod A) holding 10,000 live connections cannot simply be terminated and its users moved to Pod B, because the connections are live and often stateful.
- Solution: When the pod is shutting down (receives a SIGTERM signal), it should stop accepting new connections and either wait for existing connections to finish (within the pod's terminationGracePeriodSeconds) or move them to other pods by sending clients a "Disconnect and Reconnect" message.
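The shutdown flow above can be sketched as a small connection registry: a SIGTERM handler flips a draining flag, new connections are refused, and existing clients are told to reconnect through the load balancer. Everything here is illustrative: the `ConnectionRegistry` name, the `{"type": "reconnect"}` message format, and the `send`/`close` methods stand in for whatever WebSocket library the server actually uses:

```python
import signal
import threading


class ConnectionRegistry:
    """Tracks live WebSocket connections and drains them on SIGTERM."""

    def __init__(self):
        self._conns = set()
        self._lock = threading.Lock()
        self.draining = False

    def accept(self, conn):
        """Register a new connection; refuse it while draining."""
        with self._lock:
            if self.draining:
                return False  # readiness probe should already be failing
            self._conns.add(conn)
            return True

    def remove(self, conn):
        with self._lock:
            self._conns.discard(conn)

    def drain(self, *_):
        """SIGTERM handler: stop accepting, ask clients to reconnect."""
        with self._lock:
            self.draining = True
            conns = list(self._conns)
        for conn in conns:
            conn.send('{"type": "reconnect"}')  # client reconnects via the LB
            conn.close()


registry = ConnectionRegistry()
signal.signal(signal.SIGTERM, registry.drain)
```

Pair this with a readiness probe that fails once draining starts, so the LB removes the pod from rotation before connections are closed.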
4. Ingress Annotations
If using the NGINX Ingress Controller, WebSocket upgrades work out of the box, but add these annotations so long-lived connections are not dropped by the default 60-second proxy timeouts:
nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
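In context, the annotations sit on the Ingress resource routing to the WebSocket service. A minimal sketch (the resource names, host, and port are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ws-app  # hypothetical name
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
  ingressClassName: nginx
  rules:
    - host: ws.example.com  # hypothetical host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ws-service  # hypothetical backend service
                port:
                  number: 8080
```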
WebSocket is indispensable for the modern web, and with the correct configuration on EKS it can support very large numbers of concurrent connections.