mirror of
https://github.com/louislam/uptime-kuma.git
synced 2026-03-02 22:57:00 -05:00
Race condition in push monitors #937
Labels
No labels
A:accessibility
A:api
A:cert-expiry
A:core
A:dashboard
A:deployment
A:documentation
A:domain expiry
A:incidents
A:maintenance
A:metrics
A:monitor
A:notifications
A:reports
A:settings
A:status-page
A:ui/ux
A:user-management
Stale
ai-slop
blocked
blocked-upstream
bug
cannot-reproduce
dependencies
discussion
duplicate
feature-request
feature-request
good first issue
hacktoberfest
help
help wanted
house keeping
invalid
invalid-format
invalid-format
question
releaseblocker 🚨
security
spam
type:enhance-existing
type:new
wontfix
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
starred/uptime-kuma#937
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @kaysond on GitHub (Mar 27, 2022).
⚠️ Please verify that this bug has NOT been raised before.
🛡️ Security Policy
Description
There is a race condition in the code for push monitors that can cause repeat notifications within the same second or shortly thereafter. (see some background in #922)

http logs show that the GET calls are generally 60s apart, but because of discrepancies in system time down to the ms level, it can appear that they happen a second apart, even if they're very close together (ie.e 17:58:05.999 to 17:59:06.000)
👟 Reproduction steps
Repro is tricky, but it could be possible by monitoring uptime kuma debug logs and synchronizing to them. Basically, the push URL calls have to happen just shortly after the monitor's calls to
beat()orsafe_beat()👀 Expected behavior
Uptime kuma should be robust against very small clock discrepancies
😓 Actual Behavior
Get double notifications
🐻 Uptime-Kuma Version
1.12.1
💻 Operating System and Arch
Docker on Ubuntu 20.04
🌐 Browser
Firefox
🐋 Docker Version
Latest
🟩 NodeJS Version
No response
📝 Relevant log output
No response
@kaysond commented on GitHub (Mar 27, 2022):
The race condition is caused because the monitor is running on a
setTimeoutthat is totally independent of the call of the push URL. That means thatbeat()androuter.get("/api/push/:pushToken")can both get called at the same time in seconds (e.g. both at 17:15:02), but ifbeat()happens first, it will mark the monitor as down, then it will immediately mark it as up whenrouter.get()is run.This is partly an issue because
setTimeoutis not a guaranteed execution time. Node will run the function at least as late as the timeout, but it could be later. This means that the time at whichbeat()/safeBeat()are run will drift! Suppose we start withbeat()being called at 16:15:30, 16:16:30, and so on, androuter.get()being called at 16:15:50, 16:16:50, and so on. This won't cause any issues because every call tobeat()will see the previousheartbeat.Because of the behavior of
setTimeout,beat()will start being called later and later (or if you just have bad luck the calls could get synchronized somehow on startup). At some point,beat()will also start getting called at xx:xx:50, causing the problem.github.com/louislam/uptime-kuma@12237dec6e/server/model/monitor.js (L309-L329)github.com/louislam/uptime-kuma@12237dec6e/server/model/monitor.js (L468-L475)github.com/louislam/uptime-kuma@12237dec6e/server/routers/api-router.js (L20-L84)I would recommend following the approach of healthchecks. Healthchecks waits for the first "ping" (aka push URL call), then 60s after that ping, checks again to make sure another ping has happened. There is also a grace period to accommodate clock differences. For example, if you have a 5s grace time, the monitor won't be marked as down unless 65s has elapsed since the last ping.
I think the right thing to do, though it may be a little difficult with the way you have the architecture set up, is to do the first
setTimeoutonmonitor.start(), but every othersetTimeoutshould be run fromrouter.get("/api/push:pushToken"). The timeout would be set to interval + grace period, likesetTimeout(safebeat, (interval + gracePeriod)*1000).@kaysond commented on GitHub (Apr 11, 2022):
technically this is still not finished yet until #1428 is finished and gets merged!
@louislam commented on GitHub (Apr 11, 2022):
It was closed automatically. Sounds horrible because it could be cross repo.