mirror of
https://github.com/qbittorrent/qBittorrent.git
synced 2026-03-02 22:57:32 -05:00
hashchecking on multiple disks at once at the same time #15587
Originally created by @ghost on GitHub (Mar 16, 2024).
qBittorrent & operating system versions
Windows: 26058.1300 (winver)
qBittorrent: 4.6.3
Qt: 6.4.3
Libtorrent: 2.0.9.0
Boost: 1.83.0
OpenSSL: 1.1.1w
zlib: 1.3
What is the problem?
Hash-checking runs on multiple disks at once, at the same time.
Watching the Performance tab in Task Manager, the disks do not get properly saturated with reads: either the reads alternate between the disks so only one disk is active at a time, or, when the relevant advanced settings are increased, there is heavy random I/O that makes the whole process slow.
Steps to reproduce
Load 300+ torrents from different locations, then gracefully restart the client a few times. This triggers an automatic recheck that cannot be stopped on some of the torrents.
Additional context
It's a huge seedbox that will only get bigger.
Log(s) & preferences file(s)
The logs show nothing related to the issue.
@HanabishiRecca commented on GitHub (Mar 17, 2024):
Preferences > BitTorrent > Max active checking torrents
P.S. Keep your mental disorders in check.
@thalieht commented on GitHub (Mar 17, 2024):
Duplicate of #12210
@glassez commented on GitHub (Mar 17, 2024):
@infr-automation
As long as you keep trying to resurrect old issues without providing enough details about your current case, you are unlikely to get any solution. The behavior of qBittorrent with respect to processing missing files at startup has changed significantly since then, so it makes no sense to just post "me too" in all similar topics.
@HanabishiRecca commented on GitHub (Mar 17, 2024):
I'll also offer some free advice: GTFO, your majesty. No one cares about your "precious" time and your seedbox, especially after you were mean to the devs (and deleting the offensive comments won't help you). This is free open-source software; nobody owes you anything.
If you think you are smart enough to implement it (which I doubt; there is clearly some Dunning-Kruger effect in action, as https://github.com/qbittorrent/qBittorrent/issues/12210#issuecomment-602191511 describes lots of possible edge cases), prove it by contributing the code.
@ghost commented on GitHub (Mar 17, 2024):
Sorry bro I was rude.
@ghost commented on GitHub (Mar 17, 2024):
Well, at least it's sequential performance that is needed.
What do you think about this: the user tags the torrents per drive, and qBittorrent serializes the recheck requests within each tag group?
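That per-drive serialization could be sketched roughly as follows (a Python illustration with hypothetical names, not qBittorrent code): rechecks are queued per drive tag, and each round runs at most one recheck per tag, so one disk never services two rechecks at once while different disks proceed in parallel.

```python
from collections import defaultdict, deque

def recheck_rounds(pending):
    """pending: list of (torrent, drive_tag) pairs, in request order.
    Returns a list of rounds; each round holds at most one torrent
    per drive tag, so rechecks on the same disk stay sequential."""
    queues = defaultdict(deque)
    for torrent, tag in pending:
        queues[tag].append(torrent)
    rounds = []
    while any(queues.values()):
        # take the next torrent from every non-empty drive queue
        rounds.append([q.popleft() for q in queues.values() if q])
    return rounds
```

For example, `recheck_rounds([("t1", "D:"), ("t2", "D:"), ("t3", "E:")])` yields `[["t1", "t3"], ["t2"]]`: the two rechecks on drive D: never overlap, while E: runs in parallel with the first of them.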
For NAS, LVM, and more complex situations where the blocks are spread across devices, I can suggest a throughput-discovery algorithm similar to TCP's and uTP's congestion control. The input signal is the speed of each recheck stream: if activating another recheck stream reduces the speed of an existing stream by more than 10%, the recheck function auto-pauses the new stream.
It ultimately comes down to probabilities: if the recheck speed stays high, the blocks are probably on different media; if it drops, they are probably on the same media.
ZFS and VMware do similar storage-block management, so maybe something can be learned from them and adapted further to maximize seedbox efficiency.
@HanabishiRecca commented on GitHub (Mar 17, 2024):
That's more adequate reasoning. I'm inclined to think that the manual approach is the way to go here, be it tied to tags, categories, or something else. Having controllable and well-defined behavior is better than trying to outsmart the user with heuristics, imo.
As for your problem in #20562, it is probably caused by a premature process kill. When you exit the client gracefully, it does not end the process immediately. It takes some time to save all resume data and send stop requests to all trackers. With lots of active torrents it could be a fairly long time, so you just need to be sure that the process has exited before you perform a reboot/shutdown/etc. Yes, this involves opening the task manager and monitoring the process until it disappears. There is no better way atm unfortunately.
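The "watch the process until it disappears" step can be automated with a small poll loop. This is a generic sketch, not a qBittorrent feature: the `still_running` probe is caller-supplied (on Windows it could, say, scan the OS process list for `qbittorrent.exe`; the exact check is an assumption left to the reader).

```python
import time

def wait_for_exit(still_running, timeout=300.0, poll=1.0, sleep=time.sleep):
    """Poll until still_running() reports False (process gone) or
    `timeout` seconds pass. Returns True if the process exited in
    time, False on timeout, so a shutdown script can bail out safely."""
    waited = 0.0
    while still_running():
        if waited >= timeout:
            return False
        sleep(poll)
        waited += poll
    return True
```

A shutdown script would call this after asking the client to exit gracefully, and only proceed with the reboot once it returns True.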
@ghost commented on GitHub (Mar 18, 2024):
So the minimal logic would be:
- Try not to start rechecking at all, because there is no real reason to. The actual files did not get corrupted in any way; even if the resume metadata was somehow cut mid-write on a forced exit, nobody tampered with the files we seed. We have no indication to suspect the files, so we should not start rechecking.
Now, say we decide not to touch that code or the new experimental SQLite code. We could instead add another indicator of completeness: set the filesystem ARCHIVE flag as well as the READONLY flag. This can be the last I/O operation on a freshly completed torrent and should be the easiest thing to implement.
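A cross-platform sketch of that "mark completeness on the filesystem" idea, using only the standard library (on Windows, `os.chmod` maps the cleared write bit to the READONLY attribute; setting the ARCHIVE bit would additionally require `SetFileAttributesW` and is not shown here):

```python
import os
import stat

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH

def mark_complete(path):
    """Clear all write permission bits on a finished file as a cheap
    completeness marker; a later startup can check the flag instead
    of rechecking. On Windows this sets the READONLY attribute."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~WRITE_BITS)

def looks_complete(path):
    """True if the file carries the marker (no write bits set)."""
    return os.stat(path).st_mode & WRITE_BITS == 0
```

The marker is only a hint, of course: anything that rewrites permissions (backup tools, copies across filesystems) would clear or forge it, which is the kind of edge case the linked discussion warns about.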
If we decide the files are suspect, we can do the following.
(I don't know what feedback data we get during a recheck; do we get MB/s from libtorrent?)
Here’s a heuristic recursive logic in English for managing sequential file reads:
Start Reading: Begin reading the first file.
Measure Speed: Record the speed of the read operation.
Initiate Next Read: Start reading the next file while the previous read is still ongoing.
Evaluate Impact: Compare the speed of the current read operation with the previous one.
Decision Making:
If the current read does not affect the speed of the previous read, continue both reads.
If the current read slows down the previous one, pause the current read.
Recursive Logic: Apply the same logic for each new file read operation.
Resume Paused Reads: Once the previous read operation finishes, resume any paused reads and re-evaluate their impact on the ongoing operations.
Continue Until Done: Repeat this process until all files have been read.
This approach ensures that multiple files can be read in sequence without any single operation hindering the speed of the others. It’s a balance between progressing with new tasks and maintaining the efficiency of ongoing ones.
The heuristic part of the logic is the decision-making process that involves evaluating the impact of the current read operation on the previous one and then choosing whether to continue or pause the current read. This is heuristic because it’s based on trial and error, observing the system’s performance, and making judgments accordingly, rather than following a strict algorithmic rule. It’s a practical method to find a satisfactory solution that may not be perfect but works well enough in most cases.
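The steps above boil down to an admission-control loop. Here is a simulation sketch (the names and the `measure` probe are hypothetical; in a real client the speeds would be sampled from live recheck streams, and the 10% threshold is the figure from the earlier comment):

```python
def admit_streams(baseline, measure, max_slowdown=0.10, max_streams=8):
    """Greedily grow the number of concurrent read streams.
    baseline: MB/s of a single stream running alone.
    measure(n): per-stream MB/s observed with n streams active.
    Stops growing (i.e. pauses the newest stream) as soon as the
    per-stream speed drops more than max_slowdown below baseline."""
    active = 1
    while active < max_streams:
        speed = measure(active + 1)  # trial-start one more stream
        if speed < baseline * (1.0 - max_slowdown):
            break                    # the new stream hurt the others
        active += 1
    return active
```

For instance, if two streams land on independent disks but a third causes contention, `measure` might report 100 MB/s for two streams and 50 MB/s for three; with `baseline=100` the loop settles on 2 streams. Paused streams would then be resumed and re-probed as earlier rechecks finish, matching the "Resume Paused Reads" step.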
However, people have reported using external NAS setups that spread blocks across disks, so a somewhat more granular heuristic would indeed be nice; we can do that.
@ghost commented on GitHub (Mar 19, 2024):
@HanabishiRecca Any way I cut it, it's about a fine-tuned variation of a throttling algorithm such as the uTP algorithm (LEDBAT), applied to file read speed, with a preference for high throughput over low latency.
This matters because when we have massive data archives on cost-effective systems, we also want to run data analysis, content recognition, tagging, indexing, and video-to-text on them, not just rechecks, and this algorithm would help with that as well.
Every OS has I/O optimizations built in, so many ideas can be adapted.
Storage systems also have interesting optimization algorithms that we could draw insight from.
I wonder how this could be simulated across many scenarios to test variations of such algorithms.
For sure, the massive user base is an asset that can enable this technology.