Understanding and Reducing False Positives

A false positive is an alert that fires when your service is actually healthy — a check failed briefly but the service recovered on its own. False positives erode trust in your monitoring because teams start ignoring alerts.

This article explains what causes false positives and what you can do to reduce them.


What Causes False Positives

Transient Network Issues

The internet has brief, random failures. A single packet loss, a DNS hiccup, or a momentary routing problem can cause one check to fail even when your service is perfectly healthy. These failures typically last one check cycle and then disappear.

Server Hiccups

Application servers sometimes take slightly longer to respond during garbage collection, connection pool saturation, or brief CPU spikes. A single check during that window may time out even though the service recovers immediately.

DNS Propagation

If you recently updated DNS records, different resolvers may temporarily return different results. Checks from PulseAPI's infrastructure may hit a resolver that is still serving the old, cached record.

Deployments and Restarts

During a rolling deploy or server restart, there's a brief window where some checks may fail as processes come up and down.


How to Reduce False Positives

Option 1: Use a Status Code Check (Most Effective)

If you're alerting on response time thresholds, consider whether the configured threshold is appropriate. A single spike to 2.5s doesn't necessarily mean your service is down. Either raise the threshold or switch to a status code rule, which fires only on actual HTTP failures, not on latency spikes.
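The difference between the two rule types can be sketched in a few lines of Python. This is an illustration only, not PulseAPI's implementation: the `CheckResult` type, the 2.0s threshold, and the choice to treat 2xx and 3xx responses as healthy are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Hypothetical record of one check (not a PulseAPI type)."""
    status_code: int         # HTTP status returned by the check
    response_time_s: float   # response time in seconds


def latency_rule_fires(check: CheckResult, threshold_s: float = 2.0) -> bool:
    # Fires whenever a single check is slower than the threshold.
    return check.response_time_s > threshold_s


def status_code_rule_fires(check: CheckResult) -> bool:
    # Assumption: 2xx and 3xx count as healthy, everything else as a failure.
    return not (200 <= check.status_code < 400)


checks = [
    CheckResult(200, 0.31),
    CheckResult(200, 2.5),   # slow, but the request succeeded
    CheckResult(200, 0.28),
    CheckResult(503, 0.12),  # actual HTTP failure
]

latency_alerts = [c for c in checks if latency_rule_fires(c)]
status_alerts = [c for c in checks if status_code_rule_fires(c)]

print("latency rule fired on:", latency_alerts)
print("status code rule fired on:", status_alerts)
```

In this sample the latency rule fires on the 2.5s check even though it returned 200, while the status code rule fires only on the 503.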

Option 2: Adjust the Rule Cooldown

If a check fails and then immediately recovers, the cooldown period on your alert rule determines how quickly a notification is sent. A short cooldown (e.g., 1 minute) will always notify on the first failure, including transient ones. Consider whether that trade-off, faster notification in exchange for more noise, is acceptable for your team.
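As a rough illustration of how a cooldown affects alert volume, the sketch below assumes the common meaning of the term: the minimum time between two notifications from the same rule. PulseAPI's exact cooldown semantics may differ, and the function name and sample timings are hypothetical.

```python
def notifications_sent(failure_times_s, cooldown_s):
    """Return the failure timestamps (in seconds) that produce a notification.

    Assumption: a notification is suppressed if the previous notification
    from the same rule was sent less than cooldown_s seconds earlier.
    """
    sent = []
    last_sent = None
    for t in sorted(failure_times_s):
        if last_sent is None or t - last_sent >= cooldown_s:
            sent.append(t)
            last_sent = t
    return sent


# A flapping service: failed checks at 0s, 120s, and 600s.
failures = [0, 120, 600]

short_cooldown = notifications_sent(failures, cooldown_s=60)
long_cooldown = notifications_sent(failures, cooldown_s=300)

print(short_cooldown)  # [0, 120, 600]
print(long_cooldown)   # [0, 600]
```

Under this assumption the first failure notifies regardless of the cooldown length. A longer cooldown only reduces repeat notifications while a service is flapping.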

Option 3: Use Uptime Percentage Rules Instead of Per-Check Rules

Instead of alerting on every failed check, create a rule that fires when uptime drops below a percentage over a time window (e.g., uptime < 99% over the last 24 hours). This smooths out transient blips — you only get alerted if there's a sustained degradation, not a one-off hiccup.

To create this type of rule, see Alert Rule Conditions: Uptime Percentage.
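To see why a windowed rule ignores one-off blips, here is a minimal sketch. It assumes uptime is the share of successful checks within the window, which may not match PulseAPI's exact calculation. The function names, the one-check-per-minute interval, and the sample data are hypothetical.

```python
from datetime import datetime, timedelta


def uptime_percentage(checks, now, window):
    """checks is a list of (timestamp, succeeded) pairs.

    Returns the percentage of successful checks inside the window,
    or None if the window contains no checks.
    """
    recent = [ok for ts, ok in checks if now - window <= ts <= now]
    if not recent:
        return None
    return 100.0 * sum(recent) / len(recent)


def uptime_rule_fires(checks, now, window=timedelta(hours=24), threshold_pct=99.0):
    pct = uptime_percentage(checks, now, window)
    return pct is not None and pct < threshold_pct


now = datetime(2024, 1, 2, 0, 0)


def make_checks(failed_minutes):
    # One check per minute for the last 24 hours (1440 checks).
    return [(now - timedelta(minutes=m), m not in failed_minutes)
            for m in range(1, 1441)]


blip = make_checks({10})                  # a single failed check
outage = make_checks(set(range(1, 31)))   # 30 consecutive failed checks

print(uptime_rule_fires(blip, now))    # False: 1439/1440 is about 99.93%
print(uptime_rule_fires(outage, now))  # True: 1410/1440 is about 97.92%
```

With a 99% threshold over 24 hours at one check per minute, the rule tolerates up to 14 failed checks before firing, so the window length and threshold together set how much transient failure is absorbed.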

Option 4: Use the Alert Accuracy Dashboard (Pro/Team)

On Professional and Team plans, the Alert Accuracy Dashboard tracks your false positive rate over time and can suggest threshold adjustments. You can also use the Auto-Tune feature to let PulseAPI automatically adjust thresholds based on your historical data.

See Alert Accuracy Dashboard and Auto-Tuning Alert Thresholds.


When to Accept Some False Positives

For mission-critical services, it's often better to accept a low false positive rate than to miss a real failure. Tune aggressively if you need to, but don't tune so aggressively that real incidents go undetected.

The right balance depends on your tolerance for alert noise versus your tolerance for missing real failures. High-severity rules (Critical, High) should keep tighter thresholds and accept some false positives. Lower-severity rules can use looser thresholds to cut noise.


Still have questions? Contact support.


Related Articles

    • My Monitor Shows Down But My Site Is Up
    • Glossary of Terms
    • Understanding Response Times
    • Understanding Teams
    • Understanding Uptime Percentage