mirror of
https://github.com/louislam/uptime-kuma.git
synced 2026-03-02 22:57:00 -05:00
Make RADIUS timeout value configurable #2203
Originally created by @aLTeReGo-SWI on GitHub (May 22, 2023).
⚠️ Please verify that this feature request has NOT been suggested before.
🏷️ Feature Request Type
UI Feature
🔖 Feature description
The RADIUS monitor appears to have a hard-coded 2500 ms timeout, though it could instead be two 1-second timeouts and another 30-second timeout.
We have instances where RADIUS requests can take as much as 10 seconds to respond. It's not performant, but it isn't 'down' either. Making this value configurable would alleviate a lot of the false positives I'm seeing.
✔️ Solution
Add a new UI element to monitor that allows for the input of a user-defined integer timeout value
❓ Alternatives
Increase the hard coded timeout values to be higher. Not a good solution, but it is an alternative.
📝 Additional Context
No response
@CommanderStorm commented on GitHub (May 22, 2023):
Could you explain further how such a high ping could happen?
For a user who has nothing to do with RADIUS: is such abnormally high latency expected behaviour?
@I71d0r commented on GitHub (May 23, 2023):
@CommanderStorm for basic scenarios, RADIUS will verify access quickly using internal means.
However, RADIUS implementations allow more advanced scenarios that verify identity against external services such as Active Directory, Okta, Google Workspace, etc. Typically such information is cached, but the cache may have expired or been invalidated on purpose.
This may cause latency spikes that are evaluated as failures, although the requests would eventually succeed after a delay. To distinguish a sluggish service from one that is not working, fine-tuning the request timeout is essential to minimize false positives.
@CommanderStorm commented on GitHub (May 23, 2023):
So basically the avg number you would expect for latency is below the current value, right?
Is the use case you are talking about not better solved via the Retries option?
What you say would be a good help text in the monitor setup to distinguish between Timeout and Retries?
@aLTeReGo-SWI commented on GitHub (May 24, 2023):
@I71d0r is 100% spot on. While most RADIUS requests should take less than 2.5 seconds to complete, there are instances where this simply takes more time. It's not 'Down', as the response is eventually sent. Sometimes that takes as much as 10 seconds, but this is normal and expected behavior, even if it's not optimal.
That means you shouldn't receive an alert for something that is normal/expected behavior. That's what causes alert fatigue and causes people to ignore alerts because they're not confident they are accurate.
Retries, as I understand them, aren't going to solve the problem if the response is going to take 10 seconds to complete. What retries do is 'continue retrying X number of times, or until the response takes only 2.5 seconds'. That's not the same thing as a configurable timeout value, especially for other instances where the normal average response time is greater than 2.5 seconds. You could retry forever, but it might never complete in 2.5 seconds.
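The argument above can be checked with a small model. This is only an illustrative sketch, not Uptime Kuma code: the function name `first_successful_attempt` and the per-attempt latency list are hypothetical, and it assumes an attempt counts as successful exactly when its latency is within the timeout.

```python
from typing import Optional, Sequence


def first_successful_attempt(
    latencies_s: Sequence[float], timeout_s: float, retries: int
) -> Optional[int]:
    """Return the 1-based number of the first attempt that finishes
    within timeout_s, or None if every allowed attempt times out.

    latencies_s[i] is the (hypothetical) response time of attempt i + 1.
    Only the first try plus `retries` further attempts are considered.
    """
    allowed = latencies_s[: 1 + retries]
    for attempt, latency in enumerate(allowed, start=1):
        if latency <= timeout_s:
            return attempt
    return None


# Every response takes 10 s: no number of retries helps at a 2.5 s timeout.
always_slow = first_successful_attempt([10.0] * 4, timeout_s=2.5, retries=3)

# The same 10 s response succeeds on the first try with a 12 s timeout.
longer_timeout = first_successful_attempt([10.0], timeout_s=12.0, retries=0)

# Retries only help when a later attempt happens to be fast (e.g. a cache hit).
cache_warmed = first_successful_attempt([10.0, 0.4], timeout_s=2.5, retries=1)
```

In this model `always_slow` is `None`, `longer_timeout` is `1`, and `cache_warmed` is `2`.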
@CommanderStorm commented on GitHub (May 25, 2023):
@aLTeReGo-SWI please answer all my questions
So basically, the avg number you would expect for latency of RADIUS is below the current value, right? (As in, latency > 2.5 s is the absolute exception?)
Is the use case (cache miss => long latency) you are talking about not better solved via the Retries option?
What you say would be a good help text in the monitor setup to distinguish between Timeout and Retries?
@aLTeReGo-SWI commented on GitHub (May 29, 2023):
@CommanderStorm Latency varies based on the request. If a request comes in that has cached data, the response is relatively quick: less than a second on average. Requests that are not cached take longer to be served, upwards of ~10 seconds.
Increasing the retries simply results in hammering the server with the same request, stacking these requests up in the queue and further delaying the response.
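A toy queue model makes the stacking effect concrete. It rests on several assumptions that are not from this thread: the server handles requests one at a time in FIFO order, it keeps working on requests the client has already given up on, every uncached request needs the same service time, and the client sends its next attempt exactly when the previous one times out. The function name `queued_latencies` is hypothetical.

```python
from typing import List


def queued_latencies(
    service_time_s: float, timeout_s: float, attempts: int
) -> List[float]:
    """Latency each attempt would see on a single-worker FIFO server.

    Attempt i is sent at i * timeout_s; the server starts it only after
    finishing all earlier attempts, so later attempts wait in the queue.
    """
    latencies = []
    server_free_at = 0.0
    for i in range(attempts):
        arrival = i * timeout_s
        start = max(arrival, server_free_at)
        server_free_at = start + service_time_s
        latencies.append(server_free_at - arrival)
    return latencies


# 10 s per uncached request, a 2.5 s client timeout, three attempts.
stacked = queued_latencies(service_time_s=10.0, timeout_s=2.5, attempts=3)
```

Under these assumptions `stacked` is `[10.0, 17.5, 25.0]`: each retry waits behind the earlier ones, so it is slower than the attempt it replaced.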
A 'retry' means: this was down, i.e. it exceeded the timeout value (currently 2.5 seconds), but it might come back, so try again.
A 'timeout' is how long I should wait for a response to my request before giving up, and then retrying if a retry value is configured.
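The two definitions can be sketched as a per-attempt timeout wrapped in a retry loop. This is a minimal illustration using Python's `asyncio.wait_for`, not the monitor's actual implementation. `probe_with_timeout` and `slow_probe` are hypothetical names, and the probe sleeps for 0.2 s as a scaled-down stand-in for a ~10 s RADIUS response.

```python
import asyncio
from typing import Awaitable, Callable


async def probe_with_timeout(
    probe: Callable[[], Awaitable[None]],
    timeout_s: float,
    retries: int,
) -> bool:
    """Run `probe` up to 1 + retries times.

    The timeout bounds how long a single attempt may take; the retry
    count bounds how many attempts are made after a timeout.
    """
    for _ in range(1 + retries):
        try:
            await asyncio.wait_for(probe(), timeout=timeout_s)
            return True  # answered within the timeout
        except asyncio.TimeoutError:
            continue  # this attempt timed out; try again if attempts remain
    return False


async def slow_probe() -> None:
    # Hypothetical stand-in for a slow but healthy RADIUS request.
    await asyncio.sleep(0.2)


# Short timeout with retries: every attempt is cut off before it can answer.
short_timeout_ok = asyncio.run(probe_with_timeout(slow_probe, 0.05, retries=2))

# Longer timeout, no retries: the single attempt is allowed to finish.
long_timeout_ok = asyncio.run(probe_with_timeout(slow_probe, 1.0, retries=0))
```

Here `short_timeout_ok` ends up `False` and `long_timeout_ok` ends up `True`, which is the distinction between the two settings described above.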
Also, I may be mistaken but the little bits of yellow on my availability charts suggest that retries count against overall availability. An extended timeout value should not if the request was serviced within the user-definable timeout period.
@CommanderStorm commented on GitHub (May 30, 2023):
Linking a few PRs/Issues:
Current state of timeouts: #2142
Timeouts are generally tracked in #877
⇒ once #2142 and #3188 are merged, adding a timeout to RADIUS is easy