- 首先,创建一个包含时间序列数据的表。例如,创建一个名为 time_series_data 的表,其中包含时间戳(timestamp)和数值(value)两个字段:
CREATE TABLE time_series_data (
timestamp TIMESTAMP NOT NULL,
value NUMERIC NOT NULL
);
- 使用窗口函数计算移动平均值和移动标准差。例如,可以使用以下查询计算每个数据点的 5 个时间单位的移动平均值(moving_avg)和移动标准差(moving_stddev):
SELECT
timestamp,
value,
AVG(value) OVER (ORDER BY timestamp ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS moving_avg,
STDDEV(value) OVER (ORDER BY timestamp ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS moving_stddev
FROM
time_series_data;
- 计算每个数据点的 Z 分数(z_score),以确定其与移动平均值的距离。可以使用以下查询计算 Z 分数:
WITH moving_stats AS (
SELECT
timestamp,
value,
AVG(value) OVER (ORDER BY timestamp ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS moving_avg,
STDDEV(value) OVER (ORDER BY timestamp ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS moving_stddev
FROM
time_series_data
)
SELECT
timestamp,
value,
(value - moving_avg) / moving_stddev AS z_score
FROM
moving_stats;
- 根据 Z 分数判断数据点是否异常。例如,可以认为 Z 分数大于 3 或小于 - 3 的数据点是异常值。可以使用以下查询筛选出异常数据点:
WITH moving_stats AS (
SELECT
timestamp,
value,
AVG(value) OVER (ORDER BY timestamp ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS moving_avg,
STDDEV(value) OVER (ORDER BY timestamp ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS moving_stddev
FROM
time_series_data
),
z_scores AS (
SELECT
timestamp,
value,
(value - moving_avg) / moving_stddev AS z_score
FROM
moving_stats
)
SELECT
timestamp,
value,
z_score
FROM
z_scores
WHERE
z_score > 3 OR z_score < -3;
- 这个方法基于统计学原理,假设时间序列数据符合正态分布。你可以根据实际情况调整移动窗口的大小(例如,改变 ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)以及异常判断的阈值(例如,改变 z_score > 3 OR z_score < -3)。