You deploy a new version. Kubernetes kills the old pod. A user’s request was mid-flight. They see a 502.

Your deploy just caused an error. Not a bug in your code. Just bad timing.

Graceful shutdown is the fix. Stop accepting new work, finish what you started, then die.

The Kill Sequence

When Kubernetes (or any orchestrator) wants to stop your pod:

  1. SIGTERM: “Please shut down.” Your app should start cleanup.
  2. Grace period: Default 30 seconds. Time to finish up.
  3. SIGKILL: “Die now.” Forced termination. No cleanup possible.

If you ignore SIGTERM, you get 30 seconds of doing nothing, then a hard kill. In-flight requests die mid-execution.
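Outside Spring, the SIGTERM half of this is a JVM shutdown hook: the JVM runs registered hooks on SIGTERM (and on normal exit), but nothing runs after SIGKILL. A minimal sketch, where drain() is a placeholder for real cleanup:

```java
public class ShutdownDemo {

    // Placeholder for real cleanup: finish in-flight requests,
    // close pools, flush buffers. Returns a marker for testing.
    static String drain() {
        return "drained";
    }

    public static void main(String[] args) {
        // The JVM runs shutdown hooks on SIGTERM and on normal exit.
        // SIGKILL skips them entirely -- no cleanup is possible.
        Runtime.getRuntime().addShutdownHook(new Thread(ShutdownDemo::drain));
        System.out.println("serving traffic");
    }
}
```

The hook only gets whatever grace period the orchestrator allows; if it is still running when SIGKILL arrives, it simply stops.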

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#000000','primaryTextColor':'#00ff00','primaryBorderColor':'#00ff00','lineColor':'#00ff00','secondaryColor':'#000000','tertiaryColor':'#000000'}}}%%
sequenceDiagram
    autonumber
    participant K as Kubernetes
    participant App as Your App
    participant LB as Load Balancer
    K->>App: SIGTERM
    Note over App: Stop accepting new requests
    K->>LB: Remove from endpoints
    Note over App: Finish in-flight requests
    Note over App: Close connections, flush buffers
    App->>K: Exit 0
    Note over K: If timeout: SIGKILL

The Race Condition

Here’s what catches people: SIGTERM and load balancer update happen in parallel. Not sequentially.

Kubernetes sends SIGTERM to your pod. Kubernetes also starts removing the pod from the Service endpoints. But the load balancer might still send traffic for a few seconds while it catches up.

If your app stops accepting connections immediately on SIGTERM, those last few requests get connection refused.

The fix: preStop hook.

lifecycle:
  preStop:
    exec:
      command: ["sleep", "5"]

The preStop hook runs before SIGTERM. A 5-second sleep gives the load balancer time to stop sending traffic. Then SIGTERM fires. Then you shut down.

It’s a hack. But it works.
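In context, the hook sits in the container spec. A sketch (names are illustrative): note the preStop time counts against the grace period, and the image must actually contain a sleep binary; distroless images don't have one.

```yaml
spec:
  terminationGracePeriodSeconds: 30   # preStop + shutdown must both fit in here
  containers:
    - name: api
      image: my-api:latest
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "5"]
```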

Spring Boot Graceful Shutdown

Spring Boot 2.3+ has built-in support:

server:
  shutdown: graceful

spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s

When SIGTERM arrives:

  1. Stop accepting new requests
  2. Wait up to 30 seconds for in-flight requests to complete
  3. Shut down

No custom bean is required; those two properties are the whole feature. For extra cleanup, hook into shutdown yourself:

@PreDestroy
public void onShutdown() {
    log.info("Shutting down, finishing in-flight work...");
    // flush queues, close connections, etc.
}

What to Clean Up

In-flight HTTP requests. Let them finish. Don’t kill mid-response.

Database connections. Return them to the pool. Close the pool cleanly.

Message consumers. Stop polling for new messages. Finish processing current ones. Commit offsets.
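For the consumer case, the pattern is: a shutdown flag stops the poll loop from fetching more, but any message already taken is processed and its offset committed. A broker-free sketch; the class and queue are stand-ins for a real Kafka or RabbitMQ consumer:

```java
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

public class ConsumerLoop {
    private final AtomicBoolean running = new AtomicBoolean(true);
    private int committed = 0;

    // Poll loop: stops taking new messages once shutdown is requested,
    // but never abandons a message it has already taken.
    public int run(Queue<String> source) {
        while (running.get() && !source.isEmpty()) {
            String msg = source.poll(); // take one message
            process(msg);               // finish processing it
            committed++;                // then commit its offset
        }
        return committed;
    }

    private void process(String msg) { /* real work here */ }

    // Called from the shutdown path (e.g. @PreDestroy).
    public void shutdown() { running.set(false); }
}
```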

Scheduled tasks. Don’t start new ones. Let running ones complete.

Background threads. Signal them to stop. Wait for them. Don’t leave work half-done.

@PreDestroy
public void cleanup() {
    // Stop accepting new work
    executor.shutdown();

    // Wait for in-flight work
    try {
        if (!executor.awaitTermination(25, TimeUnit.SECONDS)) {
            executor.shutdownNow();
        }
    } catch (InterruptedException e) {
        executor.shutdownNow();
        Thread.currentThread().interrupt(); // restore the interrupt flag
    }

    // Close resources (assumes a closeable pool like HikariDataSource;
    // the plain javax.sql.DataSource interface has no close())
    dataSource.close();
}

The Timeout Trap

Your grace period is 30 seconds. Your longest request takes 60 seconds.

Those requests will never finish. They’ll get SIGKILL at 30 seconds.

Options:

  1. Increase grace period. Match your longest expected request.
  2. Add request timeouts. Don’t let requests run forever.
  3. Accept some loss. For very long operations, maybe a hard cut is okay.

In the pod spec:

terminationGracePeriodSeconds: 60
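Option 2 can be enforced in code as well as config. A sketch using CompletableFuture.orTimeout (Java 9+), with hypothetical names; it bounds how long the caller waits, it does not interrupt the worker:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWork {

    // Bounds a unit of work so shutdown is never blocked by a runaway
    // request. The worker thread itself is not interrupted on timeout.
    public static String runWithTimeout(Callable<String> task, long millis) {
        CompletableFuture<String> f = CompletableFuture.supplyAsync(() -> {
            try {
                return task.call();
            } catch (Exception e) {
                throw new CompletionException(e);
            }
        });
        try {
            return f.orTimeout(millis, TimeUnit.MILLISECONDS).join();
        } catch (CompletionException e) {
            if (e.getCause() instanceof TimeoutException) {
                return "timed out";
            }
            throw e;
        }
    }
}
```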

At Oracle, we had a batch endpoint that could run for minutes. We moved it to a separate deployment with a 5-minute grace period. Regular API pods kept the 30-second default.

Health Check Coordination

One more piece: tell the load balancer you’re unhealthy before you stop.

private volatile boolean shuttingDown = false;

@GetMapping("/health/ready")
public ResponseEntity<String> readiness() {
    if (shuttingDown) {
        return ResponseEntity.status(503).body("Shutting down");
    }
    return ResponseEntity.ok("Ready");
}

@PreDestroy
public void startShutdown() {
    shuttingDown = true;
    // Give the load balancer time to notice the failing readiness probe
    try {
        Thread.sleep(5000);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}

Readiness probe fails. Kubernetes stops sending traffic. Then you shut down. Clean.
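How quickly Kubernetes notices the 503 depends on the probe settings. A sketch (path, port, and timings are illustrative):

```yaml
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  periodSeconds: 2     # probe every 2 seconds
  failureThreshold: 1  # a single 503 marks the pod NotReady
```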

What I’m Learning

Graceful shutdown is where all the pieces connect. Connection pools, timeouts, health checks, load balancing. They all matter when you’re trying to die cleanly.

The insight that stuck: deployments are controlled failures. Every deploy kills processes. If you’re not handling that gracefully, you’re dropping requests on every release. Your “zero downtime deployment” has downtime you’re just not measuring.

How often do you deploy? Have you ever measured errors during deployments?