I build a VERY simple model with only a time series and a data field of always 1 to find anomalies.
CREATE OR REPLACE MODEL `mytest.dummy`
OPTIONS(
model_type='arima_plus',
TIME_SERIES_DATA_COL='cnt',
TIME_SERIES_TIMESTAMP_COL='ts',
DATA_FREQUENCY='HOURLY',
DECOMPOSE_TIME_SERIES=TRUE
)
AS
select ts, 1 cnt
from UNNEST(GENERATE_TIMESTAMP_ARRAY('2022-05-01', '2022-05-02', INTERVAL 1 HOUR)) as ts;
Model works fine unless I use a custom select query to find anomalies. Even if the query is the exact same that was used to create the model.
SELECT *
FROM ML.DETECT_ANOMALIES(
MODEL `mytest.dummy`,
STRUCT (0.9 AS anomaly_prob_threshold),
(select ts, 1 cnt
from UNNEST(GENERATE_TIMESTAMP_ARRAY('2022-05-01', '2022-05-02', INTERVAL 1 HOUR)) as ts)
)
Result
| Row | ts | cnt | is_anomaly | lower_bound | upper_bound | anomaly_probability |
|---|---|---|---|---|---|---|
| 1 | 2022-05-01 00:00:00 UTC | 1.0 | null | null | null | null |
| 2 | 2022-05-01 01:00:00 UTC | 1.0 | null | null | null | null |
| 3 | .... |
Does anyone know what I need to do to get expected results of is_anomaly = false.
CodePudding user response:
After a closer look into the documentation I found out that anomaly detection works only outside of the training range - at least for new queries and only as far as as the HORIZONgoes (by the time of writing the default is 1.000).
Historical data can be also classified, but only without a query and only if the parameter DECOMPOSE_TIME_SERIES is set to true.
The example above would look like this:
CREATE OR REPLACE MODEL `mytest.dummy`
OPTIONS(
model_type='arima_plus',
TIME_SERIES_DATA_COL='cnt',
TIME_SERIES_TIMESTAMP_COL='ts'
)
AS
select ts, 1 cnt
from UNNEST(GENERATE_TIMESTAMP_ARRAY('2022-05-01', '2022-05-02', INTERVAL 1 HOUR)) as ts;
The query with the next days
SELECT *
FROM ML.DETECT_ANOMALIES(
MODEL `mytest.dummy`,
STRUCT (0.9 AS anomaly_prob_threshold),
(select ts, 1 cnt
from UNNEST(GENERATE_TIMESTAMP_ARRAY('2022-05-03', '2022-05-04', INTERVAL 1 HOUR)) as ts)
)
Result
| Row | ts | cnt | is_anomaly | lower_bound | upper_bound | anomaly_probability |
|---|---|---|---|---|---|---|
| 1 | 2022-05-03 00:00:00 UTC | 1.0 | false | 1.0 | 1.0 | 0.0 |
| 2 | 2022-05-04 01:00:00 UTC | 1.0 | false | 1.0 | 1.0 | 0.0 |
| 3 | .... |
