Alerts¶

Since the control plane has access to real-time PgDog metrics, it can detect outliers and trigger automatic alerts.

How it works¶

The control plane evaluates the metrics it receives from all PgDog instances on a continuous loop. When they exceed pre-configured thresholds, an incident is created using one of the configured integrations.

Integrations¶

The following incident management providers are supported:

Provider	Description
incident.io	The control plane will create an incident with the highest available priority.

Configuration¶

Alerts are disabled by default. To enable them, configure a provider's API key and set thresholds for each supported metric you would like to monitor:

values.yaml

control:
  config:
    alerts:
      evaluation_window_secs: 300
      thresholds:
        clients_waiting: 10
        cpu: 90.0
        memory: 2048
        server_connections: 100
      incident_io:
        api_key: inc_live_xxx

Parameters¶

Disabled by default

Thresholds have no defaults. If a threshold is not configured, its metric will not be monitored.

The following parameters are configurable via the Helm chart:

Parameter	Description	Unit
`evaluation_window_secs`	The metrics evaluation window. Metrics are averaged over this period to produce an alert signal.	Seconds, e.g., `300` (5 minutes)
`clients_waiting`	Average number of clients waiting for a connection from a pool.	Clients, e.g., `10`
`cpu`	Average CPU utilization of each PgDog pod.	Percentage, e.g., `90.0` (90%)
`memory`	Average memory utilization of each PgDog pod.	MiB, e.g., `2048` (2048 MiB)
`server_connections`	Average number of open connections from PgDog to Postgres.	Connections, e.g., `100`

Evaluation window¶

Longer evaluation windows reduce the chance of false positives in spiky workloads. Shorter windows will be evaluated quicker and will trigger faster notifications in case of an incident.