qbt with lt 1.1 has worse torrent rechecking/creating speed on Windows #7391
Originally created by @airium on GitHub (Jun 6, 2018).
This issue extends #8181 (the only related issue I found). It identifies what is probably a libtorrent-side performance degradation in torrent rechecking on Windows. Perhaps it should be forwarded to libtorrent; I think it is better forwarded by the main qbt devs, otherwise I will do it later.
qBittorrent version and Operating System
Windows 10 1709/1803, but I think it applies to at least Windows 7 and later
qBittorrent 3.3.16, 4.0.3, 4.0.4, 4.1.0, 4.1.1, but I think it applies to qbt 3.3+ and 4+ in general
libtorrent: 1.0.11 (at 62c9679) and 1.1.7, but I think it applies to 1.0.6+ and 1.1+ in general
What is the problem
I benchmarked torrent rechecking/creating with qbt 4.1.1 + lt 1.0.11/1.1.7, and additionally with uT 2.2.1 for comparison. The results below show that qBittorrent 4.1.1 has worse torrent rechecking/creating performance with libtorrent 1.1.7. This slower speed applies to any qBittorrent 3.3+ and 4+ built against libtorrent 1.1.x. Built against libtorrent 1.0.x, e.g. 1.0.11, qBittorrent is much faster. The potential disk read and hash capability of libtorrent 1.0.11 should be much higher than uTorrent's, as the 1000MB/s creating speed on the NVMe SSD shows, but there is a degradation when rechecking, plus a further drive-dependent degradation on the SATA3 SSD that uTorrent does not show.
speed / active time    SM951   X400   ST8000DM   ST2000LM
                       100%    ~60%   ~80%       ~60%
                       100%    ~85%   ~97%       ~96%
                       100%    ~90%   90-95%     ~95%

speed / active time    SM951   X400   ST8000DM   ST2000LM
                       100%    ~60%   ~80%       ~60%
                       ~30%    ~65%   ~76%       ~85%
                       100%    ~90%   90-95%     ~95%
*The performance here is strange, but I tested it multiple times and the value is correct. I have no idea why qbt 4.1.1 + lt 1.1.7 should be slower on the ST8000 than on the ST2000.
4.1.1+1.1.7 = the qbt official x64 build of qbt 4.1.1, lt 1.1.7, boost 1.67.0, Qt 5.10.1
4.1.1+1.0.11 = my x64 build of qbt 4.1.1, lt 1.0.11 (at 62c9679), boost 1.65.1, Qt 5.10.1
2.2.1 = uTorrent 2.2.1 build 25302
all using default configuration
Sequential read capability (the outermost cylinder for the HDDs):
SM951 = Samsung SM951 512GB MLC M.2 NVMe SSD, ~1500MB/s, internal PCIe 3.0 x4
X400 = SanDisk X400 1TB TLC M.2 SATA3 SSD, ~500MB/s, internal SATA3
ST8000 = Seagate ST8000DM004, 3.5", 8TB, 5425 RPM, 256MB cache, ~200MB/s, mounted via USB 3.0
ST2000 = Seagate ST2000LM003, 2.5", 2TB, 5400 RPM, 32MB cache, ~135MB/s, internal SATA3
CPU is a 6700K, RAM 48GB
The test is based on 30GB of large files (typically 2GB each), with a 4MiB piece size
Speed and active time are the values shown in Task Manager
Both HDDs were reformatted before the test
What is the expected behaviour
Libtorrent 1.1.x should have at least the same disk performance when creating/rechecking torrents as 1.0.x.
Maybe qBittorrent should reconsider which libtorrent version to use until libtorrent 1.1 improves.
Steps to reproduce
You need at least an SSD (assuming you are not using some "flash disk"-level SSD of 300MB/s or lower). For comparison, the lt 1.1.x group can include any official qbt 4.x build (all of which are built against lt 1.1.x), and the lt 1.0.x group can include the official qbt 3.3.16 build (which uses lt 1.0.11) and my builds of qbt 4.1.1 against lt 1.0.11. You can also build other qbt+lt combinations. Note that lt 1.0.11 at 4e90eb1 (the one on the releases page) or earlier seems not to compile with boost 1.65.0 or above, but 1.64.0 is fine; lt 1.0.11 at 62c9679 does not compile with boost 1.66.0 or above, but 1.65.1 is fine.
Also note that, because of the Windows system read cache, you should do enough other read/write work before creating/rechecking to evict the torrent content from memory; otherwise the operation will hit the in-memory cache directly and run at a much higher speed.
Extra info (if any)
I can upload all screenshots later (maybe in a day or two) if necessary.
As for Linux, I did not run any similar test, as I always compile libtorrent 1.0.11 there. If necessary I can test it on my seedboxes, but this might take a week or more.
Having noticed the earlier discussion about Windows I/O and cache issues, I also investigated the related libtorrent session settings (which mostly became settings_pack in the 1.1 branch). I altered settings including
enable_os_cache, low_priority_disk and coalesce_reads (via the qBittorrent GUI or by modifying the qbt source code), but the results showed no change. This needs further investigation. I have been trying to fix this issue, but I don't think I can do it faster or better than the current main qbt and lt devs. If the main devs are willing to fix it, I can help run benchmarks.
@thalieht commented on GitHub (Jun 7, 2018):
Thanks for the benchmarks! Although I have nothing to comment on, I'd like to say that maybe you want to benchmark libtorrent master, which (I think) can be used with this qBt repo: https://github.com/zeule/qbit
But maybe you should first ask arvidn if there are any improvements in that regard in libtorrent master.
@Seeker2 commented on GitHub (Jun 7, 2018):
There may have been a change to make force rechecking less "hard" on drives and CPUs.
@Chocobo1 commented on GitHub (Jun 7, 2018):
@arvidn
Ping, you might want to see this.
@arvidn commented on GitHub (Jun 7, 2018):
@airium what is the percent figure in the table?
My first guess at the factor dominating this throughput is the settings_pack::checking_mem_usage configuration option. This essentially caps the buffer size of pieces in flight being checked. It's specified in blocks of 16kiB, and defaults to 256 blocks (i.e. 4 MiB). If the bandwidth delay product exceeds this, you will experience degraded performance. In libtorrent 1.1.x, this delay includes the communication between the main thread and the disk thread(s), which likely explains the degraded performance; the bandwidth delay product is larger.
Perhaps the default should be 8 MiB. @airium would you mind doubling or quadrupling this setting and seeing how it affects the checking throughput?
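As a rough illustration of the bandwidth-delay argument, the sketch below converts checking_mem_usage (counted in 16 KiB blocks, as stated above) into bytes and computes the throughput ceiling when only that much data can be in flight per round trip between the main thread and the disk thread(s). It is a simplified model, not libtorrent code; the function names and the round-trip latency used in the usage note are assumptions for illustration, not measurements from this thread.

```cpp
#include <cassert>
#include <cstdint>

// Assumption (from the comment above): checking_mem_usage counts 16 KiB blocks.
constexpr std::int64_t kBlockSize = 16 * 1024;

// Bytes that may be in flight while checking, for a given setting value.
constexpr std::int64_t checking_buffer_bytes(std::int64_t blocks) {
    return blocks * kBlockSize;
}

// Throughput ceiling (bytes/second) when at most `buffer_bytes` can be
// outstanding and each request/response round trip takes `rtt_seconds`.
// This is the bandwidth-delay product rearranged: throughput <= buffer / delay.
double max_checking_throughput(std::int64_t buffer_bytes, double rtt_seconds) {
    assert(rtt_seconds > 0.0);
    return static_cast<double>(buffer_bytes) / rtt_seconds;
}
```

With the default of 256 blocks (4 MiB) and an assumed 10 ms round trip, the ceiling is 4 MiB / 0.01 s = 400 MiB/s; doubling the setting doubles the ceiling, which is the effect the suggested experiment would reveal if this limit were the bottleneck.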
@airium commented on GitHub (Jun 7, 2018):
@arvidn
It is simply the disk utilisation percentage shown in Task Manager. I just logged it but this value might not really indicate something. Empirically, a 100% does not necessarily say the disk throughput has been exhausted.

Of course, I will upload the result maybe later today.
@airium commented on GitHub (Jun 9, 2018):
@arvidn Sorry for the delay. Here are the results, which I put on Google Sheets.
https://docs.google.com/spreadsheets/d/1q5eL3j4HVGU_wZel23M7bvcRomJRdAnNylo0yqYHKCQ/edit?usp=sharing
All data in the table are from re-runs. For comparison, I also added some entries for the official deluge 1.3.15 build, which is built on libtorrent 1.0.11. For qbt 4.1.1+1.1.7 the piece size doesn't seem to matter much, though a slight difference is visible. For libtorrent 1.0.11, however, both qbt and deluge improve somewhat with smaller piece sizes.
@airium commented on GitHub (Jun 9, 2018):
Also, sorry I don't have time to fill in every entry in the table, but if you want specific entries just tell me.
@arvidn commented on GitHub (Jun 9, 2018):
I'm primarily interested in whether settings_pack::checking_mem_usage improves checking throughput or not. I don't see this reflected in the table.
@airium commented on GitHub (Jun 9, 2018):
@arvidn Oh, sorry, I misunderstood your reply. I will now change settings_pack::checking_mem_usage. Stand by, and I will update the results in an hour.
@airium commented on GitHub (Jun 9, 2018):
@arvidn So this time I altered settings_pack::checking_mem_usage to:
- SET(checking_mem_usage, 1024, 0), i.e. 4x the default
- SET(checking_mem_usage, 32, 0), i.e. 1/8 of the default
- SET(checking_mem_usage, 256000, 0), i.e. 1000x the default, as an aggressive one
However, the 3 builds with 32, 1024 and 256000 show no observable difference in torrent rechecking/creating speed compared with the default build with 256.
Here are my builds: https://drive.google.com/open?id=16cerOCIJOAW2ksQhu8PgslDJxWLZvUx_
@khnielsen commented on GitHub (Jun 12, 2018):
I am actually seeing the same thing here. It is not limited to checking speed: actual torrent speeds are also much faster and more stable when libtorrent 1.0.11 is used rather than 1.1.X.
Arvid, I have been emailing you about this, as I can reproduce it not only on Windows but also on Linux (using deluge); 1.0.11 outperforms the newer versions in all of my testing.
I have tried nearly everything I can think of, including nearly all options via "ltconfig" on my Linux installs, and I have simply not been able to get 1.1.X anywhere near the speeds of 1.0.11, both in network throughput and in creating and checking torrents.
@arvidn commented on GitHub (Jun 13, 2018):
@khnielsen would you be able to enable session stats logging (in the alert mask) and post the logs here?
@khnielsen commented on GitHub (Jun 13, 2018):
I would if you could provide some documentation on how to do that; I am a bit lost on how to proceed. My findings on Linux are based on nothing but testing and experimenting, and I know very little about how to get the proper logs for you.
@Chocobo1 commented on GitHub (Jun 16, 2018):
@airium
I'm not sure if this is relevant, but have you tried bumping settings_pack::aio_threads?
https://github.com/arvidn/libtorrent/issues/3005
@arvidn commented on GitHub (Jun 16, 2018):
I ran some tests to see what I could find on macOS with a spinning disk. Creating and checking torrents are slightly different. For checking, there is currently logic to concentrate the checking jobs in a single thread (as long as there are 4 or more disk threads).
There's an effect where spreading the checking out over more threads slows it down, presumably because the disk I/O requests are not perfectly sequential. It's possible that having aio_threads < 4 primarily makes it worse by spreading the hashing out across 3 threads. However, when creating a torrent, the hashing happens in a single thread.
In my testing, the CPU used for hashing is insignificant compared to the time of the disk read operations. This is on a spinning disk on macOS. People who experience this problem may have a system with different characteristics, like an SSD for instance.
@arvidn commented on GitHub (Jun 16, 2018):
looking closer at the code, I think the main reason for the poor performance is probably that libtorrent reads 16kiB at a time from the disk. I think the original motivation for this is to keep the code simple and to not use excessive amounts of memory when the piece size is large.
I will try to improve this.
@arvidn commented on GitHub (Jun 16, 2018):
anyone wants to give this patch a try? https://github.com/arvidn/libtorrent/pull/3112
@arvidn commented on GitHub (Jun 17, 2018):
iirc, Windows especially is very sensitive to small read calls. This reads entire pieces at a time rather than 16kiB, but it still falls back to the generic (16 kiB at a time) logic in case it fails to allocate enough space in the cache.
@airium commented on GitHub (Jun 18, 2018):
Sorry for the delayed response.
@Chocobo1 Thank you for the advice, but this setting actually makes no difference. You can find my test packages at the link below; see the ones labelled "aio_thread_1" and "aio_thread_64".
@arvidn I tried this one and it works! The creating/rechecking speed on SSD is almost 2x that of the current RC_1_1, close to the speed of R_1_0. As for the strange observation noted earlier, i.e. the abnormally slow speed on my 3.5" HDD, I have now identified it as a fault on my side, so nothing needs to be done about it.
Here is the test package, the one labelled "fix_hash_perf_1.1": test builds
@arvidn commented on GitHub (Jun 18, 2018):
thanks for testing and making these builds! This will be part of the 1.1.8 release
@airium commented on GitHub (Jun 18, 2018):
@arvidn btw, before I close this issue, do you have any immediate ideas for further improving this performance? As we can see, uTorrent still outperforms libtorrent in some cases (e.g. 500MB/s vs 200-300MB/s on a SATA3 SSD). I am still interested in beating uT and plan to look into how libtorrent implements this part; maybe you could offer a starting point.
@arvidn commented on GitHub (Jun 18, 2018):
the only other thing I can think of would be to bump the settings_pack::checking_mem_usage setting.
@khnielsen commented on GitHub (Jun 18, 2018):
What kind of things changed in the past, Arvid? I can confirm that the patch improves performance, but in my testing it is still ~250MB/s for 1.1.7 with the patch vs 420MB/s for 1.0.11, which is a massive difference.
I would assume that newer versions bring better performance, not worse. In my testing, 1.0.11 generally outperforms the 1.1.x branch in basically everything, from connecting to peers and network throughput on 10G interfaces to hashing speed.
@arvidn commented on GitHub (Jun 18, 2018):
@khnielsen a lot of things changed, perhaps the most important one was to support more than one disk I/O thread.
would you like to build and run performance regression tests, to avoid changes that negatively impact performance in the future?
You can try the master branch today, and there's even a new-disk-io branch which overhauls the disk I/O subsystem in an attempt to make it simpler and faster.
@airium commented on GitHub (Jun 18, 2018):
@arvidn I have some new findings, by chance. On top of the hash-job-performance-1.1 branch, a large increase in aio_threads and checking_memory_usage at the same time effectively boosts disk throughput when rechecking torrents. For example, when I increase aio_threads to 64 (16x the default) and checking_memory_usage to 25600 (100x the default), the speed reaches 1.5+GB/s on the SM951 and 550MB/s on the X400, both at their theoretical maximum bandwidth and fully utilised.
Screenshots first:

*The columns are for different torrents, since 1.7GB/s checks any torrent very quickly; note that CPU usage is correspondingly high
*Stable ~550MB/s on the SATA3 SSD
However, increasing only one of the two has no effect, i.e. increasing only aio_threads or only checking_memory_usage. Apart from this, an identical change (4->64 and 256->25600) without the patch gives a weaker improvement (~250MB/s -> 300~400MB/s). Further tests show that smaller increases do not have a similar effect; e.g. only doubling aio_threads and checking_memory_usage to 8 and 512 respectively gives almost no improvement. Finally, the change has no effect on creating (maybe because creating a torrent only uses 1 thread, as mentioned).
Windows binaries for testing: the same Google Drive link
I also noticed that a few more I/O-related patches were introduced in the past few hours, but I won't have time to test them or other changes in the next few days. Sorry in advance.
Just a note here: I also found some other problems during the tests. One is that libtorrent 1.1 runs all queued torrent rechecking jobs simultaneously, unlike 1.0. This is probably undesirable, especially on HDDs, which are slower in that case. Another is that, when creating a torrent, clicking "Cancel" does not actually stop the job immediately. The hard disk is still being read until the job finishes (though no torrent is produced). These 2 problems should be discussed in new issues.
@arvidn commented on GitHub (Jun 19, 2018):
It's not supposed to do this by default. It suggests that the active_checking setting is greater than 1. Another possible explanation is that force-started torrents also apply to checking. They will be force-checked, so to speak.
@arvidn commented on GitHub (Jun 19, 2018):
@ssiloti I can't recall the motivation for concentrating hash jobs into specific threads. I imagine it has the benefit of making disk I/O more sequential when checking a whole torrent (by serialising all jobs in a single thread). But I can also imagine a benefit being that it prevents hash jobs from starving out other disk jobs, just because there are so many of them.
I get the feeling, though, that in this case, with an SSD, the bottleneck is not the disk I/O but the CPU cost of SHA-1. Does that sound reasonable? I can't think of a good architecture that would satisfy both the I/O-bound and CPU-bound cases.
As for the checking_memory_usage setting, I believe this is a direct consequence of moving the loop over the pieces out into the network thread (out of the disk thread, where it was in 1.0). Perhaps there should be some more sophisticated logic to determine the number of outstanding hash jobs, based on the bandwidth delay product. Or maybe not; maybe the limit should just be raised 100x. As long as hash jobs are posted to dedicated threads and a dedicated job queue, there isn't any significant cost to having a long backlog of hash jobs in there. They don't allocate any memory up-front.
@Chocobo1 commented on GitHub (Jun 19, 2018):
Just want to add that qbt doesn't touch that parameter; it should be at the default of 1.
@ssiloti commented on GitHub (Jun 20, 2018):
I think the splitting of hash jobs into their own threads was before my time, but preventing them from starving other jobs seems like the most plausible justification.
A 6700K CPU is capable of computing SHA1 at well over 2GB/s with four hash threads, so it shouldn't be a bottleneck even with the NVMe drive, as long as aio_threads is set to at least 16.
I suspect the biggest drag on performance here is insufficient pipelining of hash jobs combined with insufficient parallelism. SSDs, especially high-speed NVMe drives, really need to have multiple requests outstanding to hit their peak performance, so it makes sense that @airium needed both an increased number of active jobs and multiple hash threads to get the most out of those drives.
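To make the pipelining point concrete, here is a minimal model (not libtorrent's code) of a per-torrent cap on outstanding hash jobs during checking. It assumes the cap is the number of whole pieces that fit in checking_mem_usage, with a floor of one, and then applies a multiplier by the number of hash threads, as ssiloti goes on to suggest. The function and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model: how many hash jobs one torrent keeps outstanding while
// checking. Both checking_mem_usage and blocks_per_piece are in 16 KiB blocks.
int outstanding_hash_jobs(int checking_mem_usage, int blocks_per_piece, int hash_threads) {
    assert(blocks_per_piece > 0 && hash_threads > 0);
    // Whole pieces that fit in the memory budget, but always at least one.
    const int per_thread = std::max(1, checking_mem_usage / blocks_per_piece);
    // Scale by the number of hash threads so each thread has work queued.
    return per_thread * hash_threads;
}
```

With 4 MiB pieces (256 blocks) and the default of 256, only one piece fits, so a single hash thread would have just one job in flight; with four hash threads the same budget keeps four in flight, and 25600 blocks would allow 100 per thread.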
I think increasing aio_threads is probably a good idea, at least on master where we have a scalable thread pool. We should also probably increase the minimum number of active hash jobs when checking to 4 or 8 or maybe even 16. I think increasing checking_memory_usage 100x is probably overkill; our latency should not be high enough to need that many active jobs. In fact, I wonder if we even need the checking_memory_usage setting. Calculating the number of active checking jobs by applying a fixed multiplier to aio_threads should Just Work for any reasonable situation I can think of.
@ssiloti commented on GitHub (Jun 20, 2018):
Actually, never mind about increasing the minimum active jobs and removing checking_memory_usage; I think it does make sense the way it is. I still think the number of active jobs should be multiplied by the number of hash threads, though.
@arvidn commented on GitHub (Jun 24, 2018):
could someone make a new test build with the latest RC_1_1 of libtorrent? I believe the situation has improved now.
@airium commented on GitHub (Jun 29, 2018):
Sorry, I was busy with life. Here is my build of qBittorrent 4.1.1 + lt 1.1.8, both at release: Google Drive
As far as I can tell, this one makes no difference compared with the previous one on hash-job-performance-1.1. In other words, the performance of disk reading / torrent rechecking / creating is similar to that of 1.0.11 (maybe still 15% lower, but much better than 1.1.7 and earlier).
This time I remembered to use the statistics panel of qBittorrent. On 1.1.8, during rechecking it shows queued I/O jobs at 3 and total_buffer_size at 4MB, giving 200-300MB/s on a SATA3 SSD. Older versions with the default 1.1.7 settings show queued I/O jobs at 1 and total_buffer_size at 4MB, with 70-140MB/s. The build I compiled with aio_threads = 64 or checking_memory_usage = 25600 raises these to 84 and several hundred MB, giving 500+MB/s (and an amazing 2.2GB/s on the NVMe SSD if the cache size is also set larger than 64MB). Maybe this hints at further improvements (e.g. there are simply too few I/O jobs). I am curious about the libtorrent internals that lead to different queued I/O job counts, but I cannot spare the time to look into it.
This time I still encountered the concurrent rechecking problem. Specifically, when multiple torrents are queued for rechecking, they should normally run one by one while the others wait in the queued for checking status. However, with libtorrent 1.1.x currently none of them enters this queued state. I noticed #9120 discussed the rechecking behaviour; a better policy might be to allow concurrent rechecking only on different hard disks, while torrents on the same disk run one by one. I don't know if libtorrent can do this. A similar symptom should be noted here: moving torrents is also done concurrently. That is, if one moves several torrents, they are now all moved at the same time instead of one by one, which is especially undesirable when moving to HDDs. I have no idea whether this problem comes from libtorrent or qBittorrent.
Finally, some of my friends reported crashes with 1.1.8 under both deluge and qBittorrent. I may also have encountered one yesterday on one of my seedboxes, but we didn't collect any trace info.
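The per-disk policy airium suggests (recheck concurrently only across different disks, one at a time per disk) can be sketched as a small scheduling step. This is a hypothetical illustration, not an existing libtorrent or qBittorrent API: the `QueuedCheck` type, the string disk identifier and `startable_checks` are all assumed names, and a real implementation would also need a reliable way to map save paths to physical disks.

```cpp
#include <set>
#include <string>
#include <vector>

struct QueuedCheck {
    std::string torrent;  // torrent name
    std::string disk;     // hypothetical identifier of the physical disk holding its data
};

// Return the queued torrents that may start checking now: at most one per
// disk, and none on a disk that already has a check running.
// `busy_disks` is taken by value because it is updated as checks are chosen.
std::vector<std::string> startable_checks(const std::vector<QueuedCheck>& queue,
                                          std::set<std::string> busy_disks) {
    std::vector<std::string> start;
    for (const auto& q : queue) {
        // insert().second is false if this disk is already busy.
        if (busy_disks.insert(q.disk).second) {
            start.push_back(q.torrent);
        }
    }
    return start;
}
```

With two torrents queued on one disk and a third on another, the first and third start together while the second stays queued for checking until its disk is free.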
@seiferflo commented on GitHub (Jul 10, 2018):
I just want to report that I tested airium's latest build and it's night and day. Before, I was constantly having error/checking issues, mainly with big files (>30GB) on a NAS. Now I have almost no issues.
Download speed is much better as well, almost maxed out, though I need to keep tweaking cache / connections per torrent, as the speed drops after a couple of hours or with more than 2 files.
Also, when I exit qBittorrent, it really stops now. Before, it used to keep running in the background for 5 minutes, and launching qBittorrent again led to "checking" on all files.
So thanks for fixing all this; I was about to drop qBittorrent. Nice one!
@airium commented on GitHub (Aug 2, 2018):
Considering that disk performance is now much better, I am closing this issue.
Thank you devs for improving things.