Hash checking on multiple disks at the same time #15587

Closed
opened 2026-02-22 02:16:52 -05:00 by deekerman · 9 comments
Owner

Originally created by @ghost on GitHub (Mar 16, 2024).

qBittorrent & operating system versions

winver 26058.1300

qbt 4.6.3

Qt: 6.4.3
Libtorrent: 2.0.9.0
Boost: 1.83.0
OpenSSL: 1.1.1w
zlib: 1.3

What is the problem?

Hash checking on multiple disks at the same time.

Watching the disks in Task Manager's Performance tab, they do not get properly saturated with reads: the reads alternate between the disks so only one runs at a time, or, when the advanced settings are increased, there is heavy random I/O that slows the process down.

Steps to reproduce

Load 300+ torrents from different locations and gracefully restart the client a few times. This triggered an automatic recheck that could not be stopped on some of the torrents.

Additional context

It's a huge seedbox that will only get bigger.

Log(s) & preferences file(s)

Logs show nothing related to the issue.

deekerman 2026-02-22 02:16:52 -05:00
  • closed this issue
  • added the
    Duplicate
    label

@HanabishiRecca commented on GitHub (Mar 17, 2024):

Preferences > BitTorrent > Max active checking torrents

P.S. Keep your mental disorders in check.


@thalieht commented on GitHub (Mar 17, 2024):

Duplicate of #12210


@glassez commented on GitHub (Mar 17, 2024):

@infr-automation
Until you stop trying to resurrect old Issues without providing enough details about your current case, you are unlikely to get any solution. The behavior of qBittorrent in terms of processing missing files at startup has changed significantly since then, so it makes no sense to just post "me too" in all similar topics.


@HanabishiRecca commented on GitHub (Mar 17, 2024):

I also suggest some free advice: GTFO, your majesty. No one cares about your "precious" time and your seedbox, especially after being mean to the devs (and deleting the offensive comments won't help you). This is free open source software; nobody owes you anything.

If you think you are smart enough to implement it (which I doubt; there is clearly some Dunning-Kruger effect in action, as https://github.com/qbittorrent/qBittorrent/issues/12210#issuecomment-602191511 describes lots of possible edge cases), prove it by contributing the code.


@ghost commented on GitHub (Mar 17, 2024):

Sorry bro I was rude.


@ghost commented on GitHub (Mar 17, 2024):

Well, at least it's sequential performance that is needed.

What do you think about this: the user tags the torrents per drive, and qBittorrent serializes the recheck requests per tag group?
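The tag-group idea could be sketched as a small scheduler. This is a hypothetical helper (not qBittorrent code) that assumes each torrent carries a user-assigned drive tag:

```python
from collections import defaultdict

def recheck_batches(torrents, per_group=1):
    """torrents: iterable of (torrent_hash, drive_tag) pairs.

    Yields successive batches of hashes to recheck. Each batch takes at
    most `per_group` torrents from every drive tag, so rechecks on the
    same drive are serialized while different drives run in parallel.
    """
    queues = defaultdict(list)
    for torrent_hash, tag in torrents:
        queues[tag].append(torrent_hash)
    while any(queues.values()):
        batch = []
        for queue in queues.values():
            batch.extend(queue[:per_group])
            del queue[:per_group]
        yield batch
```

With three torrents tagged by drive, `[("a1", "D:"), ("a2", "D:"), ("b1", "E:")]`, the first batch is `["a1", "b1"]` and the second is `["a2"]`: the two D: torrents never check simultaneously.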

For NAS, LVM, and more complex setups where the blocks are spread across devices, I can suggest a throughput-discovery algorithm similar to TCP and uTP. The input signal is the speed of each recheck stream: if one stream's speed is affected too much by activating another recheck stream, the recheck function auto-pauses a stream when it degrades another by more than 10%.

Eventually it's about probabilities: if the recheck speed stays high, the blocks are probably on different media; if the speed drops, they are probably on the same media.

ZFS and VMware do similar storage block management; maybe something can be learned from there and adapted further to maximize seedbox efficiency.


@HanabishiRecca commented on GitHub (Mar 17, 2024):

That's more adequate reasoning. I'm inclined to think that the manual approach is the way to go here, whether it's tied to tags, categories, or something else. Having controllable, well-defined behavior is better than trying to outsmart the user with heuristics, imo.

As for your problem in #20562, it is probably caused by a premature process kill. When you exit the client gracefully, it does not end the process immediately. It takes some time to save all resume data and send stop requests to all trackers. With lots of active torrents it could be a fairly long time, so you just need to be sure that the process has exited before you perform a reboot/shutdown/etc. Yes, this involves opening the task manager and monitoring the process until it disappears. There is no better way atm unfortunately.
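The "watch the process until it disappears" step can be scripted instead of eyeballed. A minimal sketch assuming the Windows `tasklist` command; the parsing helper is split out so it can be checked without a live process:

```python
import subprocess
import time

def process_listed(tasklist_output: str, image_name: str) -> bool:
    """True if image_name appears in `tasklist` output (case-insensitive)."""
    return image_name.lower() in tasklist_output.lower()

def wait_for_exit(image_name="qbittorrent.exe", poll=2.0, timeout=600.0):
    """Poll the Windows task list until the process disappears.

    Returns True once the process is gone, False if `timeout` elapses.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        out = subprocess.run(
            ["tasklist", "/FI", f"IMAGENAME eq {image_name}"],
            capture_output=True, text=True,
        ).stdout
        if not process_listed(out, image_name):
            return True
        time.sleep(poll)
    return False
```

Run `wait_for_exit()` after closing the client and only reboot once it returns True.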


@ghost commented on GitHub (Mar 18, 2024):

So the minimal logic would be:

- Try not to start rechecking at all, because there is no real reason to. The actual files did not get corrupted in any way: even if the resume metadata was somehow cut mid-write on a forced exit, nobody tampered with the files we seed, so we have no indication to suspect the files and should not start rechecking.

Say we decide not to touch that code or the new experimental SQLite code. We could instead add another indicator of completeness: set the ARCHIVE flag on the filesystem, as well as the READONLY flag. This can be the last I/O operation on a freshly completed torrent and should be the easiest to implement.
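Setting those attributes could look like the sketch below. `mark_complete` is a hypothetical helper, not existing qBittorrent code; the ARCHIVE bit needs the Win32 `SetFileAttributesW` call, while on other platforms only the read-only chmod applies:

```python
import os
import stat
import sys

FILE_ATTRIBUTE_READONLY = 0x01
FILE_ATTRIBUTE_ARCHIVE = 0x20

def mark_complete(path: str) -> None:
    """Flag a finished file read-only as a cheap completeness indicator."""
    # Clear all write bits, keep read bits (works on any platform).
    os.chmod(path, stat.S_IREAD | stat.S_IRGRP | stat.S_IROTH)
    if sys.platform == "win32":
        # Additionally set READONLY + ARCHIVE via the Win32 API.
        import ctypes
        ctypes.windll.kernel32.SetFileAttributesW(
            path, FILE_ATTRIBUTE_READONLY | FILE_ATTRIBUTE_ARCHIVE)
```

A later startup could then skip rechecking any payload file that still carries the read-only bit.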

If we decide the files are suspect we can do:

(I don't know what feedback data we get when we do a recheck; do we get MB/s from libtorrent?)

Here's a heuristic, recursive logic in plain English for managing sequential file reads:

1. Start reading: begin reading the first file.
2. Measure speed: record the speed of the read operation.
3. Initiate next read: start reading the next file while the previous read is still ongoing.
4. Evaluate impact: compare the speed of the current read operation with the previous one.
5. Decision making:
   - If the current read does not affect the speed of the previous read, continue both reads.
   - If the current read slows down the previous one, pause the current read.
6. Recursive logic: apply the same logic to each new file read operation.
7. Resume paused reads: once the previous read operation finishes, resume any paused reads and re-evaluate their impact on the ongoing operations.
8. Continue until done: repeat this process until all files have been read.

This approach lets multiple files be read in sequence without any single operation hindering the speed of the others. It's a balance between progressing with new tasks and maintaining the efficiency of ongoing ones.

The heuristic part of the logic is the decision-making process that involves evaluating the impact of the current read operation on the previous one and then choosing whether to continue or pause the current read. This is heuristic because it’s based on trial and error, observing the system’s performance, and making judgments accordingly, rather than following a strict algorithmic rule. It’s a practical method to find a satisfactory solution that may not be perfect but works well enough in most cases.
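The decision step described above reduces to one comparison. A sketch using the 10% threshold from the earlier comment; the measuring side is left open, so this is a pure function over two observed speeds:

```python
def should_pause(baseline_mbps: float, current_mbps: float,
                 tolerance: float = 0.10) -> bool:
    """Pause a newly started read if it slows an ongoing one by more
    than `tolerance` (as a fraction of its baseline speed)."""
    if baseline_mbps <= 0:
        return False  # no meaningful baseline to compare against
    drop = (baseline_mbps - current_mbps) / baseline_mbps
    return drop > tolerance

# Same spindle: starting a second read nearly halves the first one's speed.
assert should_pause(200.0, 110.0) is True
# Different disks: the first read is barely affected, keep both running.
assert should_pause(200.0, 195.0) is False
```

The scheduler would call this each time a new read stream is started, keeping a queue of paused streams to retry as ongoing reads finish.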

However, people have reported using external NAS setups that spread blocks across disks, so a somewhat more granular heuristic would indeed be nice; we can do it.


@ghost commented on GitHub (Mar 19, 2024):

@HanabishiRecca any way I cut it, it's about a fine-tuned variation of a throttling algorithm such as the uTP algorithm (LEDBAT), applied to file read speed, with a preference for high throughput over low latency.
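A LEDBAT-style controller (RFC 6817) keeps a target queuing delay and nudges the rate up when the measured delay is under target, down when over. Adapted to read throughput it might look like the sketch below; the constants are illustrative, not tuned:

```python
def ledbat_step(rate_mbps: float, queuing_delay_s: float,
                target_s: float = 0.1, gain: float = 1.0,
                floor: float = 1.0, ceiling: float = 500.0) -> float:
    """One LEDBAT-flavored rate update for a recheck read stream.

    off_target > 0 (delay under target): ramp the rate up.
    off_target < 0 (disk congested):     back the rate off.
    """
    off_target = (target_s - queuing_delay_s) / target_s
    rate_mbps += gain * off_target
    return max(floor, min(rate_mbps, ceiling))
```

An uncongested disk (50 ms measured delay) ramps from 100 to 100.5 MB/s per step, while a congested one (300 ms) backs off to 98.0, matching the stated preference for throughput over latency.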

This is because when we have massive data archives on cost-effective systems, we also want to run data analysis, content recognition, tagging, indexing, and video-to-text on them, not just rechecks, and this algorithm would help with that as well.

Every OS has IO optimizations in it so many ideas can be adapted.

Storage systems also have interesting optimization algorithms that we could get insight from.

I wonder how this could be simulated across many scenarios to test variations of such algorithms?

For sure the massive userbase is an asset to enable this technology.
