You have 200 microservices. Each produces metrics. How do those metrics reach your monitoring system? Two fundamentally different approaches, and the choice affects service discovery, failure modes, and scalability.

Pull Model (Prometheus-Style)

Each service exposes a /metrics endpoint. The monitoring system knows about all services (via service discovery) and scrapes each one on a schedule: every 15 seconds, hit /metrics, parse the response, store the data.

// Spring Boot Actuator exposes metrics automatically
// GET /actuator/prometheus returns:
// http_server_requests_seconds_count{method="GET",uri="/api/users"} 1523
// http_server_requests_seconds_sum{method="GET",uri="/api/users"} 45.7
// jvm_memory_used_bytes{area="heap"} 234881024

The collector is in control. It decides how often to scrape, what to scrape, and can detect when a service is down (scrape fails). If a service is unhealthy, the collector knows immediately.
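What the collector does with a scrape response can be sketched as a tiny parser for the exposition format shown above. This is a simplified illustration, not Prometheus's actual parser; real parsers also handle escaping, timestamps, and `# HELP`/`# TYPE` lines:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch: parse one line of the Prometheus text format, e.g.
// http_server_requests_seconds_count{method="GET",uri="/api/users"} 1523
public final class ScrapedSample {
    public final String name;
    public final Map<String, String> labels;
    public final double value;

    private ScrapedSample(String name, Map<String, String> labels, double value) {
        this.name = name;
        this.labels = labels;
        this.value = value;
    }

    public static ScrapedSample parse(String line) {
        Map<String, String> labels = new LinkedHashMap<>();
        String name;
        String rest;
        int brace = line.indexOf('{');
        if (brace >= 0) {
            name = line.substring(0, brace);
            int close = line.indexOf('}', brace);
            // Split the label block: method="GET",uri="/api/users"
            for (String pair : line.substring(brace + 1, close).split(",")) {
                String[] kv = pair.split("=", 2);
                labels.put(kv[0], kv[1].replace("\"", ""));
            }
            rest = line.substring(close + 1).trim();
        } else {
            int space = line.indexOf(' ');
            name = line.substring(0, space);
            rest = line.substring(space + 1).trim();
        }
        return new ScrapedSample(name, labels, Double.parseDouble(rest));
    }
}
```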

Push Model (StatsD/Datadog Style)

Each service actively sends metrics to the collector. Fire and forget. The service calls the collector’s API or sends a UDP packet.

// Push metrics from the service (fire-and-forget, typically over UDP)
public void recordRequest(String endpoint, long durationMs) {
    // Durations belong in a histogram/timing metric, not a gauge:
    // a gauge would only keep the last value, losing the distribution
    metricsClient.histogram("request.duration", durationMs,
        "endpoint:" + endpoint, "service:order-service");
    metricsClient.increment("request.count",
        "endpoint:" + endpoint, "service:order-service");
}

The service is in control. It decides what to report and when. No service discovery needed on the collector side. Services behind firewalls or NATs can push out without the collector needing inbound access.

graph TD
    subgraph "Pull Model"
        C1["Collector"] -->|Scrape /metrics| S1["Service A"]
        C1 -->|Scrape /metrics| S2["Service B"]
        C1 -->|Scrape /metrics| S3["Service C"]
    end
    subgraph "Push Model"
        S4["Service A"] -->|Send metrics| C2["Collector"]
        S5["Service B"] -->|Send metrics| C2
        S6["Service C"] -->|Send metrics| C2
    end
    style C1 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style S1 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style S2 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style S3 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style C2 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style S4 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style S5 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
    style S6 fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff

Trade-offs That Actually Matter

Service discovery is the big one. Pull requires the collector to know where every service instance lives. In Kubernetes, this is straightforward (pod discovery). In a static environment with services behind load balancers, it’s harder.

Failure detection: pull gives you this for free. If the scrape fails, the service is probably down. Push requires a separate mechanism: “if we haven’t received metrics from service X in 60 seconds, alert.”
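That push-side liveness check is essentially a dead-man's switch: track the last time each service reported, then flag the silent ones. A minimal sketch; the class name and 60-second threshold are illustrative, not from any particular library:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Flags services whose pushed metrics have gone silent beyond a threshold.
public final class StalenessDetector {
    private final Map<String, Long> lastSeenMillis = new ConcurrentHashMap<>();
    private final long thresholdMillis;

    public StalenessDetector(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    // Call whenever any metric arrives from a service.
    public void recordHeartbeat(String service, long nowMillis) {
        lastSeenMillis.put(service, nowMillis);
    }

    // Services not heard from within the threshold: candidates for alerting.
    public List<String> silentServices(long nowMillis) {
        List<String> silent = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeenMillis.entrySet()) {
            if (nowMillis - e.getValue() > thresholdMillis) {
                silent.add(e.getKey());
            }
        }
        return silent;
    }
}
```

Note the asymmetry: with pull this logic comes built into the collector (a failed scrape), while with push someone has to write and operate it.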

Backpressure behaves differently too. With pull, if the collector is overwhelmed, it just scrapes less frequently. No data loss on the service side, the metrics are still there on /metrics. With push, an overwhelmed collector drops incoming metrics. Services might not even know their data was lost.
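That silent loss can at least be made visible with a bounded send buffer: when the buffer is full, new samples are dropped and counted instead of blocking the service. A sketch of the failure mode, with illustrative names:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// A push client's send buffer: offer() returns false instead of blocking
// when full, so metrics are dropped under backpressure unless you count them.
public final class MetricBuffer {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    public MetricBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    public boolean push(String sample) {
        boolean accepted = queue.offer(sample); // non-blocking: false when full
        if (!accepted) {
            dropped.incrementAndGet(); // the only trace that data was lost
        }
        return accepted;
    }

    public long droppedCount() {
        return dropped.get();
    }
}
```

A drop counter is the push-side minimum: without it, the service genuinely cannot tell that its data never arrived.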

Short-lived jobs are awkward with pull. A batch job runs for 30 seconds and terminates. If the collector scrapes every 15 seconds, it might miss the job entirely. Push handles this naturally: the job sends its metrics before exiting.

In Practice: Both

Most production systems use both. Pull for long-running services (the Prometheus model works well here). Push for batch jobs, lambdas, and anything ephemeral. A push gateway bridges the gap: short-lived jobs push to the gateway, the collector scrapes the gateway.
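The gateway's role is simple enough to sketch: remember the last metrics pushed per job and replay them on every scrape, so the collector sees a short-lived job's final numbers long after the job has exited. A toy in-memory version (the real Prometheus Pushgateway does this per job over HTTP, with grouping labels and more):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy push gateway: batch jobs push their final metrics here before exiting;
// the collector scrapes this store on its usual schedule.
public final class PushGatewayStore {
    private final Map<String, String> metricsByJob = new ConcurrentHashMap<>();

    // Called by a short-lived job just before it exits;
    // a later push from the same job replaces the earlier one.
    public void push(String job, String expositionText) {
        metricsByJob.put(job, expositionText);
    }

    // Called by the collector's scrape: replay the last push from every job.
    public String scrape() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : metricsByJob.entrySet()) {
            sb.append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```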

At Salesforce, we chose between push and pull for our config validation results. The validator ran as a batch process across 4,000+ service configs. Initially we used push: each validation result was sent to the collector as it completed. The problem? When the validator processed a large batch, it would overwhelm the metrics collector with thousands of results in seconds. Classic backpressure problem. We switched to a hybrid: the validator exposes a /validation-status endpoint with aggregated results (pass/fail counts, latest run timestamp), and the collector scrapes it every 30 seconds. Individual failure details are still pushed, but only failures, which are rare thanks to the validation framework (80% reduction in review cycles means fewer failures to report).

What I’m Learning

The push vs pull decision seems small but ripples through your entire monitoring stack. Pull gives you better control and failure detection. Push gives you better firewall friendliness and short-lived job support. Most teams end up using both. The key insight: the choice affects how you handle failure in your monitoring itself. If your monitoring can’t monitor, you’re flying blind.

Does your team use push, pull, or both for metrics collection?