Make RADIUS timeout value configurable #2203

Closed
opened 2026-02-28 02:46:29 -05:00 by deekerman · 7 comments

Originally created by @aLTeReGo-SWI on GitHub (May 22, 2023).

⚠️ Please verify that this feature request has NOT been suggested before.

  • I checked and didn't find similar feature request

🏷️ Feature Request Type

UI Feature

🔖 Feature description

The RADIUS monitor appears to have a hard-coded 2500 ms timeout, though it could instead be two 1-second timeouts followed by another 30-second timeout.

We have instances where RADIUS requests can take as much as 10 seconds to respond. It's not performant, but it isn't 'down' either. Making this value configurable would alleviate a lot of the false positives I'm seeing.

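Screenshot "2023-05-22_10-09-53": https://github.com/louislam/uptime-kuma/assets/26488241/da22d003-9a9b-495d-9382-947b167687ee

Screenshot "image": https://github.com/louislam/uptime-kuma/assets/26488241/fd49ab29-5abd-4764-8d6a-934e3074e088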

✔️ Solution

Add a new UI element to the monitor that allows a user-defined integer timeout value to be entered.
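To make the request concrete, here is a minimal sketch of a check whose timeout is a parameter rather than a constant. It is written in Python purely for illustration and is not the monitor's actual code: `fake_radius_request`, `check`, `run_check`, and `DEFAULT_TIMEOUT_S` are hypothetical names, and the RADIUS exchange is replaced by a simple sleep.

```python
import asyncio

# Assumption: 2.5 s is the fixed value this issue reports observing.
DEFAULT_TIMEOUT_S = 2.5


async def fake_radius_request(latency_s: float) -> str:
    """Hypothetical stand-in for a RADIUS Access-Request: wait, then answer."""
    await asyncio.sleep(latency_s)
    return "Access-Accept"


async def check(latency_s: float, timeout_s: float = DEFAULT_TIMEOUT_S) -> str:
    """Return 'up' if the reply arrives within timeout_s, otherwise 'down'."""
    try:
        await asyncio.wait_for(fake_radius_request(latency_s), timeout=timeout_s)
        return "up"
    except asyncio.TimeoutError:
        return "down"


def run_check(latency_s: float, timeout_s: float = DEFAULT_TIMEOUT_S) -> str:
    """Synchronous convenience wrapper around check()."""
    return asyncio.run(check(latency_s, timeout_s))
```

With the default, a simulated 10-second reply is reported as 'down' after 2.5 seconds; calling `run_check(10.0, timeout_s=15)` would report 'up' once the reply arrives.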

Alternatives

Increase the hard-coded timeout values. Not a good solution, but it is an alternative.

📝 Additional Context

No response


@CommanderStorm commented on GitHub (May 22, 2023):

Could you explain further how such a high ping could happen?
For a user who has nothing to do with radius: Is this expected behaviour to have such abnormally high latency?


@I71d0r commented on GitHub (May 23, 2023):

@CommanderStorm for basic scenarios, the RADIUS server will verify access quickly using internal means.
However, RADIUS implementations also allow more advanced scenarios that verify identity against external services like Active Directory, Okta, Google Workspace, etc. Typically such information would be cached, but the cache may have expired or been invalidated on purpose.
This may cause latency spikes that are evaluated as failures, although the requests would eventually succeed after a delay. To distinguish a sluggish service from one that is not working, fine-tuning the request timeout is essential to minimize false positives.


@CommanderStorm commented on GitHub (May 23, 2023):

So basically, the average latency you would expect is below the current value, right?

Is the use case you are talking about not better solved via the Retries option?
Would what you say make a good help text in the monitor setup to distinguish between Timeout and Retries?


@aLTeReGo-SWI commented on GitHub (May 24, 2023):

@I71d0r is 100% spot on. While most RADIUS requests should take less than 2.5 seconds to complete, there are instances where this simply takes more time. It's not 'Down', as the response is eventually sent. Sometimes that takes as much as 10 seconds, but this is normal and expected behavior, even if it's not optimal.

That means you shouldn't receive an alert for something that is normal/expected behavior. That's what causes alert fatigue and causes people to ignore alerts because they're not confident they are accurate.

Retries, as I understand them, aren't going to solve the problem if the response is going to take 10 seconds to complete. What retries do is 'continue retrying X number of times, or until the response takes only 2.5 seconds'. That's not the same thing as a configurable timeout value, especially for other instances where the normal average response time is greater than 2.5 seconds. You could retry forever, but it might never complete in 2.5 seconds.
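A small sketch of that argument, in Python for illustration only. It assumes each attempt is judged solely by whether its latency fits inside the timeout; `first_successful_attempt` is a hypothetical helper, not part of any monitor's code.

```python
from typing import Optional, Sequence


def first_successful_attempt(
    latencies_s: Sequence[float], timeout_s: float
) -> Optional[int]:
    """Return the 1-based attempt whose reply fits within timeout_s.

    latencies_s holds the (simulated) server latency for the first try and
    each retry. None means every attempt timed out.
    """
    for attempt, latency in enumerate(latencies_s, start=1):
        if latency <= timeout_s:
            return attempt
    return None


# A server that consistently needs 10 s never passes a 2.5 s timeout,
# no matter how many retries are configured.
assert first_successful_attempt([10.0] * 50, timeout_s=2.5) is None

# With a 10 s timeout the very first attempt succeeds.
assert first_successful_attempt([10.0] * 50, timeout_s=10.0) == 1
```

Retries only help under this model when a later attempt happens to be faster, for example `first_successful_attempt([10.0, 0.4], timeout_s=2.5)` returns 2.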


@CommanderStorm commented on GitHub (May 25, 2023):

@aLTeReGo-SWI please answer all my questions

So basically, the average latency you would expect for RADIUS is below the current value, right? (As in, latency > 2.5 s is the absolute exception?)
Is the use case you are talking about (cache miss => long latency) not better solved via the Retries option?

Would what you say make a good help text in the monitor setup to distinguish between Timeout and Retries?


@aLTeReGo-SWI commented on GitHub (May 29, 2023):

@CommanderStorm Latency varies based on the request. If a request comes in for data that is cached, the response is relatively quick: less than a second on average. Requests that are not cached take longer to be served, upwards of ~10 seconds.

Increasing the retries simply hammers the server with the same request, stacking these requests up in the queue and delaying the response even further.

A 'retry' means: this check was down (i.e., it exceeded the timeout value, which right now is 2.5 seconds), but it might come back, so try again.

A 'timeout' is how long I should wait for a response to my request before giving up, and then retrying if a retry value is configured.

Also, I may be mistaken, but the little bits of yellow on my availability charts suggest that retries count against overall availability. An extended timeout should not count against availability if the request was serviced within the user-definable timeout period.
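The distinction can be sketched as follows, in Python for illustration only. Two things here are assumptions rather than confirmed monitor behavior: that a timed-out check is recorded as 'pending' while retries remain and 'down' afterwards, and that only 'up' checks count toward availability (which is my guess from the charts above). `heartbeat_statuses` and `availability` are hypothetical helpers.

```python
from typing import List, Sequence


def heartbeat_statuses(
    latencies_s: Sequence[float], timeout_s: float, max_retries: int
) -> List[str]:
    """Map consecutive check latencies to 'up', 'pending', or 'down'.

    Assumed model: a check answered within timeout_s is 'up' and resets the
    failure counter; a timed-out check is 'pending' while retries remain and
    'down' once max_retries consecutive retries are used up.
    """
    statuses: List[str] = []
    failures = 0
    for latency in latencies_s:
        if latency <= timeout_s:
            failures = 0
            statuses.append("up")
        else:
            failures += 1
            statuses.append("pending" if failures <= max_retries else "down")
    return statuses


def availability(statuses: Sequence[str]) -> float:
    """Fraction of checks recorded as 'up' (assumes at least one check)."""
    return statuses.count("up") / len(statuses)


# Three cached replies and one 10 s cache miss.
latencies = [0.4, 0.6, 10.0, 0.5]
short = heartbeat_statuses(latencies, timeout_s=2.5, max_retries=1)
long_ = heartbeat_statuses(latencies, timeout_s=10.0, max_retries=1)
```

Under these assumptions `short` is `['up', 'up', 'pending', 'up']` with an availability of 0.75, while `long_` is all 'up' with an availability of 1.0.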


@CommanderStorm commented on GitHub (May 30, 2023):

Linking a few PRs/Issues:

Current state of timeouts: #2142
Timeouts are generally tracked in #877

⇒ once #2142 and #3188 are merged, adding a timeout to RADIUS is easy
