mirror of
https://github.com/qbittorrent/qBittorrent.git
synced 2026-03-02 22:57:32 -05:00
Torrent rechecks to 100% despite appended data - files are not truncated #7990
Labels
No labels
Accessibility
AppImage
Bounty
Build system
CI
Can't reproduce
Code cleanup
Confirmed bug
Confirmed bug
Core
Crash
Data loss
Discussion
Docker
Documentation
Duplicate
Feature
Feature request
Feature request
Feature request
Filters
Flatpak
GUI
Has workaround
I2P
Invalid
Libtorrent
Look and feel
Meta
NSIS
Network
Not an issue
OS: *BSD
OS: Linux
OS: Windows
OS: macOS
PPA
Performance
Project management
Proxy/VPN
Qt bugs
Qt6 compat
RSS
Search engine
Security
Temp folder
Themes
Translations
Triggers
Waiting diagnosis
Waiting info
Waiting upstream
Waiting web implementation
Watched folders
WebAPI
WebUI
autoCloseOldIssue
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
starred/qBittorrent#7990
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @Rne163 on GitHub (Oct 29, 2018).
Please provide the following information
qBittorrent version and Operating System
qBittorrent v4.1.1 Windows 10 64-bit (10.0.1xxxx)
What is the problem
When a small file (around 1KB in size) is modified, the torrent rechecks to 100%, with the modified file not being replaced. Deleting the file correctly downloads the original file again
What is the expected behavior
The torrent should not recheck to 100% and attempt to download and replace the modified file.
Steps to reproduce
Extra info(if any)
I specifically noticed this issue on .sfv files. I cannot reproduce it on larger files.
@erasmux commented on GitHub (Oct 29, 2018):
Tested changing one letter in a small nfo file (78 bytes) and v4.1.3 correctly rechecks to 99% complete with the said file at 0%. Restoring the original file contents and doing another recheck restores 100% complete.
Check with the latest version v4.1.3, possibly this is an issue that has already been resolved.
EDIT: adding a few characters at the end of the file does exhibit the behavior you describe. I am not sure if thats a bug or a feature (i.e. in my case the first 78 bytes of the file are indeed correct, and I certainly don't want qbittorrent to truncate files on recheck like other clients do - this is one of the things I like about qbittorrent, you can recheck without fear that i'll will damage files).
@sandersaares commented on GitHub (Nov 8, 2018):
What is the scenario where you do not want qB to fix the file (e.g. by truncation)?
@FranciscoPombal commented on GitHub (Nov 8, 2018):
@Chocobo1 @glassez
This sounds like a pretty serious bug in the rechecking code. Can any of you take a look at it?
Is it part of Qbittorrent or libtorrent code?EDIT: after taking a quick look at qBittorrent's code, this seems to be an issue with libtorrent.
@arvidn Can you take a look at this? Rechecking should fail for a file if that file has extra bytes appended to it. However, I do not think it should fail for a directory if there are extra files in it, as it would break cross-seeding functionality. Perhaps this last behavior could depend on a "strict recheck" option.
@arvidn commented on GitHub (Nov 8, 2018):
I see three possible behaviors:
libtorrent used to implement (2), but people were experiencing problems when they accidentally started another torrent downloading to the same filename, causing loss of data. I think there should be a thread either on qbt or deluge to dig up on this issue. This prompted a change in libtorrent to implement (1). @FranciscoPombal I assume you're suggesting (3).
In your opinion, how should the rechecking fail? should some pieces be missing (even though they're not)? Should all pieces be missing? If not, should the existing pieces be able to be seeded? Perhaps it should put the torrent in upload-only mode?
Or should there be some new kind of failure that just blocks the torrent from seeding and downloading until the user, manually, truncates the files and start the checking over again?
@erasmux commented on GitHub (Nov 8, 2018):
This is exactly what I was talking about. I had this behavior back when I was using utorrent and it was really annoying when a simple typo or selecting the wrong destination folder and then running a supposedly non-destructive recheck on a paused torrent would truncate the file, resulting in loss of data.
Is there a way that the recheck could mark it as 100% complete and yet leave the file in some incomplete error state (or some other error state)? Then when we actually start the torrent truncating the file is not only acceptable but desirable.
Another "workaround" I can think of: Mark the last piece of the file as missing if the file on the disk is bigger. Conceptually you can think of it as the file including a EOF marker and that marker is a mismatch.
The important part is that the recheck alone does not alter the existing file and only when it is started can the file can be truncated (if this ends up redownloading one piece of the torrent this seems like a decent compromise to me).
@FranciscoPombal commented on GitHub (Nov 8, 2018):
@arvidn
I would actually prefer approach (1).
In that case, shouldn't the program warn the user "Hey, if you save to this directory, you will be overwriting existing files/folders, are you sure you want to proceed?"(*) instead of silently overwriting/appending/whaterver? Whatever the decision, since it is silent, it would probably not be someone would expect to happen.
Besides, when would people legitimately want to download another torrent to some existing filename? I think this is a very very niche use case that should not be expected by default.
Even in the cross-seeding scenario, what happens is that two torrents have similar folders, but one has extra files. So you just symlink one to the other, add both torrents and you're good to go.
If users really want data appended, they should have to say so explicitly, in my opinion.
Note on (*): libtorrent/qBittorrent would probably have to distinguish between two cases:
1 - New torrent appends data to existing files from another torrent.
2 - New torrent adds files in existing directory belonging to another torrent.
@sandersaares commented on GitHub (Nov 8, 2018):
I see a 1:1 relationship between files on disk and files in torrent.
When adding a torrent, conflicts with existing non-torrent files should notify the user that existing content will be overwritten.
When adding a torrent, conflicts with files from other known torrents should reject the add and require the user to specify a new location (or automatically do so by adding a subdirectory, as is done in case of multi-file torrents that are not wrapped in a directory).
When a torrent is active, it owns the file and the client should do everything necessary to ensure that file contents match torrent 100%.
@erasmux commented on GitHub (Nov 8, 2018):
I can tell you that I experienced loss of data using utorrent's approach (1) in exactly this scenario. In cross-seeding the file names don't always match, so I rename/link the file to match and rely on the recheck to tell me if the content do actually match (in qB you don't even have to make any change on the disk, you can rename the torrent files in the qB interface).
Why should I live in fear that a recheck on a paused torrent would cause data loss?
What could be acceptable, is if there were an additional popup during the recheck claiming the file sizes don't match and asking the user to approve or deny truncating the files. Personally I would probably prefer other solutions (like the ones I suggested above).
@arvidn commented on GitHub (Nov 8, 2018):
Well, I can only speak to what libtorrent should do. And I don't think it can help that much in this scenario. Also, keep in mind that the normal way one resumes a torrent is by "overwriting" existing files.
For example, there's no check to make sure that pieces we don't have are all zeroes, so there may be some data there that's overwritten. On the other hand, the normal behavior of downloading a piece that fails the hash check is to just leave it there (not overwrite it by zeroes, as that would not be very efficient).
I don't think it was legitimate, I think it was accidental.
what do you mean by having data appended? libtorrent never appends data.
when I hear "append", I think adding bytes at the end of a file. But you mean adding additional files to a directory?
The problem of figuring out whether the files torrent A is writing to also belongs to torrent B is not solved in libtorrent. I believe there's a poor-man's check that if B already has a file open and A tries to open it, some error is generated.
@FranciscoPombal commented on GitHub (Nov 8, 2018):
@arvidn
Well, more reason then to simply warn the user, with a dialog or something, which is probably done at the application level and not library level.
If I understood your previous comment correctly, libtorrent would append data if users added a torrent with the same filename as an existing one.
I really mean adding bytes at the end of the file.
I think @sandersaares expressed the idea more clearly:
So, for example, Suppose you have torrent1 and torrent2.
torrent1 references a file, file1 with the following content:
012345torrent2 references a file, file1 (same name), with the following content:
012345DEADBEEFIf torrent1 is already added and downloaded, and I try to add torrent2, qBittorrent should warn me that proceeding can result in data loss for torrent1 (this is only related to qBittorrent's code, I think). If I ignore the error and later recheck torrent1, the extra
DEADBEEFshould get truncated (this is the part related to libtorrent). Of course, this results in data loss with respect to torrent2, but it is the users fault for ignoring this. Rechecking and resuming torrent2 would then trigger the redownload of the missing pieces.Again, I think this is the key point:
There could be any number of reasons for files to get (garbage) data appended to them. Files may be rendered inoperable because of that, and the user would never know why.
More importantly, they would also fail checksums, which would be really confusing for everyone given that libtorrent/qBittorrent said they passed the recheck. This could potentially have security implications.
@arvidn commented on GitHub (Nov 8, 2018):
I think overwrite is a better description. It just won't truncate the existing files, but it may extend it. But if none of the existing bytes in the file match the piece hashes, they will all be replaced.
That's right, except libtorrend won't truncate it. And if the two torrents happen to try to open the files at the same time, there will be an error. libtorrent just doesn't keep an extra data structure for the sole purpose of detecting conflicts on files.
from libtorrent's point of view, it would be the client warning about this, not libtorrent. There is a cost associated with keeping this extra data structure, that I don't think is relevant to all users of libtorrent.
@arvidn commented on GitHub (Nov 8, 2018):
would it make more sense to always truncate a file when writing the last piece to it? That could still cause data loss, but not from a paused torrent that's just being checked.
@erasmux commented on GitHub (Nov 8, 2018):
This would be catastrophic and would disable any option to cross-seed without creating an actual copy on the disk (even symbolic links won't help if qB would lock the file).
qB's current cross-seeding behavior is near perfect (imho), I can and do cross-seed a torrent which adds additional files while actively seeding a torrent with less files, which results in the last piece of a file in the first torrent to be redownloaded by the second one. I have no idea how it works but it works flawlessly.
Could this be abused to cause one torrent to cause an error on a different one? yes. But I don't see a problem with this. The behavior is clean and simple and as far as I know inline with other torrent clients.
Please don't break this relatively important feature for some very rare edge case.
Is there a logical scenario which brings to the situation we are discussing (rechecking a file which has the exact correct contents just with a few extra characters)?
The problem is that currently rechecking such a file would result in 100% complete and the torrent would start seeding without redownloading the last piece. Like I offered before, I think that failing the check on the last piece of a file if it is longer than the torrent data could be a good solution/workaround (of course coupled with truncating the file when the last piece is redownloaded).
@FranciscoPombal commented on GitHub (Nov 8, 2018):
@erasmux
No, it wouldn't (besides, nobody said anything about "locking" the file?). What me and @sandersaares were saying was that qBittorrent wouldn't append bytes to existing files, but would let you save extra files to already existing folders. This way, cross seeding would work just fine.
@arvidn
Wouldn't you agree that
Is a problem?
EDIT:
This is exactly what I am talking about. This would cause said files to fail checksums
@FranciscoPombal commented on GitHub (Nov 8, 2018):
Sure, I agree the part about keeping track of conflicts between files is at most a responsibility of the application.
However, what I am saying is libtorrent not truncating files on recheck (in case they have bytes appended to them) is most likely not the behavior that people expect.
@erasmux commented on GitHub (Nov 8, 2018):
qB / libtorrent causing data loss (i.e. by truncating a file) is definitely not the behavior anyone could expect from a recheck on a paused torrent.
@arvidn commented on GitHub (Nov 8, 2018):
@FranciscoPombal yeah, I think people's model of what it means to check a torrent is to ask the question: do I have the data? i.e. that it should be a read-only operation
@FranciscoPombal commented on GitHub (Nov 8, 2018):
@erasmux
It's not data loss, it is discarding incorrect data with respect to a certain torrent. Note that I am not saying that files should be deleted, just that they should be truncated.
@arvidn
Maybe it is just me, but my expectation for the recheck functionality is a bit higher; it answers the question: Do I have the correct data? A file that has garbage appended to it should be truncated when answering this question.
@FranciscoPombal commented on GitHub (Jul 31, 2020):
I think my opinion about this behavior has changed since my previous posts.
The current behavior of ignoring appended data (i.e., treating recheck as a read-only operation) is the better one for cross-seeding and a purely file-sharing standpoint - libtorrent will never send the appended data to any peer anyway.
However, I think ignoring appended data may subvert some expectations of file integrity/verification. I think that people expect that a rechecked torrent also means "if I run these files through
sha256sum, I will always get the same result". This will not happen in general if arbitrary data can be appended.So here's what I'm thinking: perhaps the recheck feature could be extended to put the torrent in some kind of warning state/category of "data appended" (this could be some kind of per-torrent flag that libtorrent sets). This would mean "the data is all there and the torrent is able to upload/download correctly, but be warned that, from a file-level integrity standpoint, there are extra bytes here and there". Perhaps libtorrent could also expose which files in a given torrent have data appended and how much.
Additionally, a control to "delete all appended data" would probably be nice (this could be a context-menu option in qBIttorrent, similar to "force recheck"), otherwise the user is always forced to delete the appended data manually, if they decide they no longer needed.
@arvidn thoughts?
@arvidn commented on GitHub (Jul 31, 2020):
I'm not sure about a state, but an alert might make sense. Perhaps there should be a separate operation than "recheck" too, something like "repair".
I can also imagine a flag to
force_recheck(), similar tomove_storage(), that enables truncation of files.@FranciscoPombal commented on GitHub (Aug 4, 2020):
@arvidn
libtorrent could fire an alert, and qBittorrent could change states based on that (internal qBittorrent states, nothing to do with libtorrent). This alert could fire for a certain torrent:
👍
Do you mean that would be a possible way to implement the "repair" operation? Sounds good.
@jbruchon commented on GitHub (Aug 26, 2020):
This seems pretty straightforward. Give the user the choice over the behavior. If they want truncation or want appended data to be allowed then they can make that choice in the preferences.
There also needs to be an option to auto-rename when there is a file collision between two torrents in the same program. I have regularly attempted to download separate files that have the same name only for the program to then merge them naively. The current behavior doesn't make sense and needs to be addressed. I was shocked to discover that this issue is so old.
I don't know what is meant by "cross-seeding" but I assume it means loading multiple torrents with overlapping data. If the files differ in size then they are not duplicates. Sure, the smaller one could be identical to the same sized segment at the start of the larger one, but to always act like this is the case is dangerous. If you download two differently sized MP4 videos, for example, they're not the same file with an extremely high degree of certainty. In the case of MP4 files in particular, the smaller file's moov atom (start or end) can get nuked by the larger file data, and that renders the entire smaller file unusable. Likewise, if the header is the one for the smaller file, the larger file becomes unusable even if the moov atom is at the end.
tl;dr: differently sized files are not identical and shouldn't be treated as such by default.
@FranciscoPombal commented on GitHub (Aug 26, 2020):
Cross-seeding means downloading torrent A from tracker X, and seeding it to both tracker X and another tracker Y. If tracker Y contains torrent B, such that A is a superset of B, A is also good for seeding B. In this context, "superset" means "additional files/directories", and very rarely "bytes appended to files". This latter case is the most relevant to this discussion, as I wouldn't expect a truncation function to actually delete additional files, just additional bytes in certain files. But this could also be configurable, e.g. "No truncation" | "Truncate files" | "Truncate directory contents" | "Truncate files and directory contents".
@escape0707 commented on GitHub (Feb 5, 2022):
Wow, actually got hit by this bug this week. Can't rehash to get an updated working copy of a program.
Really wish this will be fix in some way. Half of my week was ruined and wasted on this... Have to use other clients like aria2 for this now:
@escape0707 commented on GitHub (Feb 5, 2022):
I'd suggest to implement some flag / queue for truncation that only take place when the user really want to download / modifying.
Sometimes I like to I add a torrent stopped and manually force a recheck. After the recheck qbittorrent just report me with what it think is wrong / missing without actually modifying them. Modification only happens when I say "download" later.
I like to have this kind of safety and I hope truncation only happens after user confirmation of some kind of change could happen.
But I do think that silently ignoring appending bits which actually changes the checksums of a file when "rehashing" / "rechecking" is extremely counterintuitive.
@escape0707 commented on GitHub (Feb 5, 2022):
Not the feature we deserve, but the feature we need.
@arvidn commented on GitHub (Feb 5, 2022):
there are very compelling arguments against truncating files and discard data as well.
An option in libtorrent seems like a much more complex solution than it needs to be. iterating the files in a torrent and checking the file sizes on disk, and truncate, doesn't need to be part of the normal download flow.
That could just be a function, separate from downloading, could it not?
@escape0707 commented on GitHub (Feb 5, 2022):
Torrent-wise flag seems too complicated for this rare use case. I think a dedicated function in lib and a separated button for enforcing hash-perfection is okay.
Although other BT clients don't seem to even bother and just truncate.Aria2 just truncate, but Transmission BT will behave the same as libtorrent.I do think an alert should be implemented both in the lib and the client.
@arvidn commented on GitHub (Feb 7, 2022):
how does this look? https://github.com/arvidn/libtorrent/pull/6726
@glassez commented on GitHub (Feb 8, 2022):
Like what?
The only argument, IMO, is that some file can be valid as a whole, or some continuous part of it from the beginning to a certain position, so that it can be in one torrent completely, and in another - in a truncated form. But the question is, what types of files (with the exception of plain text) are suitable for this? And how likely is the situation with torrents described above?
Another question is how libtorrent behaves if it checks a file that has a suitable (or smaller size), while some of its initial part satisfies the torrent hashes, and the subsequent part does not? Why not make the behavior consistent?
Of course, no changes should be applied at the "checking" stage, but at the "downloading" stage.
@escape0707 commented on GitHub (Feb 8, 2022):
Can't understand why people would download load through torrenting and expect to get files different from listed in the torrent.
https://cs.rin.ru/forum/viewtopic.php?p=2537298#p2537298
@arvidn commented on GitHub (Feb 8, 2022):
There's a long thread somewhere on github on this issue. my recollection of the main gist is that you may have files you care about in your download folder, and one torrent (that perhaps shouldn't be trusted) can wipe them out. Without truncating files, the damage done is proportional to how long it takes you to realize it's a bad/malicious torrent.
@escape0707 commented on GitHub (Feb 8, 2022):
I think trying to update with rehashing and failed to do so because the bt client doesn't do checksum perfect recheck is a lot harder to realize than downloading a bad torrent without cautious.
If the latter happened, the user will think like "damn, my fault". But when the former happened, the user will have no idea why the update won't work as expected until they happened / managed to get a file verification other than the torrent it self, and have to switch to another bt client for the update.
@arvidn commented on GitHub (Feb 9, 2022):
Just a note on the suggestion in the ticket; rechecking needs to be read-only. It needs to support read-only media for instance. That said, a higher level (user facing) function could still combine rechecking with trimming/truncating.
@glassez commented on GitHub (Feb 9, 2022):
Of course.
But why not provide more checking? IMO, it should provide more detections for inconsistencies between torrent and existing data and prevent torrent from being started in case if any (unless user explicitly tell it to overwrite it, at least by setting default policy about it). I mean the following error cases:
@arvidn commented on GitHub (Feb 9, 2022):
You mean something like this? https://github.com/arvidn/libtorrent/pull/6726
It's not obvious to me that the checking of torrents needs more options. It seems reasonable for clients to compose the various features in libtorrent into higher level functionality. I'm open to be convinced it belongs in the rechecking logic.
I would be a bit uneasy of failing rechecking because of appended garbage. That would prevent seeding the content, which I think would be a disservice to the swarm. Having some user visible notification might make sense though.
@glassez commented on GitHub (Feb 9, 2022):
It just truncates. So you need either apply it unconditionally or check files by yourself.
It's acceptable in case if it is possible in a more or less convenient way.
If we're talking about oversized files then we can check them at application level, notify the user and truncate them if needed. There is no problem if such files remain untruncated for some time.
But how about the files with the same names as in torrent (and maybe even the same size) but different content? If someone points such a torrent to this content by mistake (or maybe there is malicious torrent) how libtorrent will behave? I would expect the data to remain untouched unless I explicitly allow it to overwrite.
@arvidn commented on GitHub (Feb 9, 2022):
Maybe it would make sense to issue an alert during checking when oversized files are encountered.
It depends on the flags. Generally it will silently overwrite (once it starts downloading). There's an option to fully trust the resume data and not check the files at all (also deferring any disk issues until they are touched for the first time).
But generally, there isn't a good solution to different torrents sharing a file on disk. It could get quite expensive to maintain an index to check against.
@glassez commented on GitHub (Feb 11, 2022):
I think Yes. At least this will save application form checking the files again.
@glassez commented on GitHub (Feb 11, 2022):
I don't talk about such a case.
I talk about adding torrent and pointing it to the location with existing data which, according to the user, should match the torrent, but in fact do not match. It is not difficult to imagine such a case, even without taking into account possible malicious torrents. For example, I want to support some kind of distribution by having (as it seems to me) the corresponding file on my disk, but it is actually different (for example, this is a video file encoded a little differently).
@FimbulvetrOW commented on GitHub (Mar 2, 2023):
The simple example: I have Folder with old content. There is a torrent with new content. I want to use my existing data, rehash it, and download only what I'm missing and everything that changed, leaving things that didn't change intact - which ideally means I have to download a lot less than the entire new torrent.
This fails with this client because if there are files that are now smaller in the new torrent, they will not get truncated, garbage data at the end remains. Depending on the file, this can cause no problems at all, or cause catastrophic failure, or anything in between.
How is "truncate files on re-check torrent" not an option yet?
@hormoz1 commented on GitHub (May 11, 2023):
I agree that this behavior is unexpected and can cause problems when you are trying to recheck/update torrent files from another place, in which you are expecting the files to be 100% identical when the recheck says 100%, not to have a bunch of data at the end.
Qbittorrent should optimally warn about this and allow you to correct it.
@AetherMagee commented on GitHub (Oct 7, 2023):
Am I misunderstanding something? How is this not fixed yet? I understand that there are, in fact, use cases for such behaviour, but it clearly creates way more problems than it solves
@escape0707 commented on GitHub (Oct 7, 2023):
This is theoretically already fixed in the library side by the introduction of a new dedicated truncate function mentioned previously: https://github.com/arvidn/libtorrent/pull/6726
But I don't think this function can use the mismatch info found in the hashing phase to optimize out redundant recheck of file size.
I also don't know if qBT is currently using it or not, but a simple search of the function name in this repo should do the trick.
@USSWashington commented on GitHub (Mar 5, 2024):
@escape0707, any updates on whether qBit has implemented this feature?
@escape0707 commented on GitHub (Mar 5, 2024):
@NonVolatileHuman Sadly, I just did a search of this repo's code, and
truncate_filesfunction is not used anywhere. So no, they didn't. I do think this is an important issue, tho.@SmilingSpectre commented on GitHub (Dec 10, 2024):
Sorry for being late to the event, but I realized the problem just now, and cannot understand fully your point of view.
How I see the issue: