encoding issue in directory name with Cyrillic letter й #14935

Open
opened 2026-02-22 01:35:15 -05:00 by deekerman · 11 comments

Originally created by @kambala-decapitator on GitHub (Sep 2, 2023).

qBittorrent & operating system versions

qBittorrent: 4.5.5 x64
Operating system: Windows 10 Pro 22H2 (10.0.19045) x64
Qt: 5.15.10
libtorrent-rasterbar: 1.2.19

What is the problem?

I have a torrent that contains a directory named Мод + русификатор одним файлом. I already had all the files on disk, so I checked "skip hash check" when adding the torrent. However, I received a message that not all files were present on disk, so I ran a hash check and started the download, and also noted which files were absent, as that was strange. After the download finished I went to the directory to check the files and got quite a surprise: I ended up with 2 identically named directories! That should normally be impossible, so I went to PowerShell to check the directory contents there, and then it became clear (pasting as text doesn't make the issue clear):

image

The strange-looking directory name was created today (see the date) by qBt. I originally created this directory manually, as I'm the author of this torrent; I did it in macOS. The disk is exFAT if it matters.

Steps to reproduce

  1. prepare directory structure like in the attached torrent manually
  2. start downloading torrent
  3. observe that duplicate directory is created

Additional context

Looks like the issue is the letter й, because it can be encoded in 2 ways: as the single letter itself or as и followed by a combining character (https://en.wikipedia.org/wiki/Combining_character).
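As a minimal illustration of the two encodings described above (a sketch, not taken from qBittorrent or the attached torrent; the variable names are hypothetical), Python's standard `unicodedata` module can show that the two spellings of й look identical but are different strings with different UTF-8 bytes:

```python
import unicodedata

# Two spellings of the same visible letter (names are illustrative only).
precomposed = "\u0439"        # й as one code point
decomposed = "\u0438\u0306"   # и followed by a combining breve

# The strings render alike but do not compare equal.
print(precomposed == decomposed)             # False
print(len(precomposed), len(decomposed))     # 1 2

# The code points and UTF-8 bytes differ.
print([unicodedata.name(c) for c in precomposed])
print([unicodedata.name(c) for c in decomposed])
print(precomposed.encode("utf-8").hex())     # d0b9
print(decomposed.encode("utf-8").hex())      # d0b8cc86
```

A filesystem that compares names code point by code point would treat these as two distinct directory names, which is consistent with the duplicate directory reported here.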

Also noticed the same issue in another folder of the same torrent:

image

I never had such an issue on macOS with either qBt or Transmission.

I tried creating a directory with the letter й using Explorer and Total Commander, and both use the letter itself, not the combined version.

It's also strange that in the screenshot above the freshly created directory is listed at the very end instead of in alphabetical order.

My torrent: Median XL 2017 + Legacy MXL mods.zip

Looks related to #12506

Log(s) & preferences file(s)

No response


@glassez commented on GitHub (Sep 2, 2023):

> Originally I created this directory manually as I'm the author of this torrent, I did it in macOS. The disk is exFAT if it matters.

> Looks like the issue is the letter й because it can be forged in 2 ways: the letter itself or using https://en.wikipedia.org/wiki/Combining_character Unicode feature.

So what does qBittorrent have to do with it? It just creates files with exactly the same names as specified in the torrent file.
This is simply an incompatibility of the ways in which some characters are encoded between different operating systems. (If there are no problems somewhere, then the guys from Apple will definitely come up with something.)


@kambala-decapitator commented on GitHub (Sep 2, 2023):

the torrent was created by Transmission/2.94 as I see in the header. I can't tell you whether it's some specific Transmission behavior/bug or macOS or "guys from Apple" or even Windows issue or w/e else, but saying "we don't care about compatibility" doesn't sound like a healthy approach, especially since qBt on macOS doesn't have this issue.

I don't think it's a big deal to perform unicode normalization on file names (which kinda makes sense I'd say).


@glassez commented on GitHub (Sep 2, 2023):

> saying "we don't care about compatibility" doesn't sound like a healthy approach, especially since qBt on macOS doesn't have this issue.

qBt has no issue according to your report. What "compatibility" are you talking about? There is only an incompatibility between operating systems: they produce different characters when you type й on your keyboard. For qBt, Transmission etc. these are different file names.


@glassez commented on GitHub (Sep 2, 2023):

> I don't think it's a big deal to perform unicode normalization on file names (which kinda makes sense I'd say).

What kind of "unicode normalization" are you talking about? Do you mean some well-known method? Or do you want to offer your own algorithm?


@glassez commented on GitHub (Sep 2, 2023):

> There is only incompatibility between operating systems so they create different characters when you type й on your keyboard.

I really believe it is a "bug" to represent the Russian letter й with two characters, since that mechanism is intended for adding diacritics to letters, but й is an independent letter in Russian (not и with the diacritic mark ˘).


@kambala-decapitator commented on GitHub (Sep 2, 2023):

> > saying "we don't care about compatibility" doesn't sound like a healthy approach, especially since qBt on macOS doesn't have this issue.
>
> qBt has no issue according to your report. What "compatibility" do you talk about? There is only incompatibility between operating systems so they create different characters when you type й on your keyboard. For qBt, Transmission etc. they are different file names.

qBt behaves inconsistently on macOS and Windows. Why do you think it's not a bug? If the macOS version of qBt had created this duplicate folder with the combined й character, just as the Windows version did, the behavior would have been consistent (albeit unexpected). Maybe it's a Qt bug, I don't know.

> > I don't think it's a big deal to perform unicode normalization on file names (which kinda makes sense I'd say).
>
> What kind of "unicode normalization" are you talking about? Do you mean some well-known method? Or do you want to offer your own algorithm?

yes, there're standard algorithms. For example, NSString from macOS SDK can utilize it. https://unicode.org/reports/tr15/
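To make the reference to the standard normalization forms concrete, here is a small sketch using Python's `unicodedata.normalize`, which implements the forms defined in UAX #15. The helper `same_name` and the variable names are hypothetical and are not part of qBittorrent, Qt, or Transmission; this only shows what comparing names after normalization could look like:

```python
import unicodedata

def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Directory name from the report, written with a precomposed й (U+0439).
precomposed_name = "Мод + русификатор одним фа\u0439лом"
# NFD splits й into и (U+0438) + combining breve (U+0306).
decomposed_name = unicodedata.normalize("NFD", precomposed_name)

print(precomposed_name == decomposed_name)            # False
print(same_name(precomposed_name, decomposed_name))   # True
```

Whether an application should normalize names it reads from a .torrent file is a design decision that this thread does not settle; the sketch only shows that the comparison itself is a standard library call.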

> > There is only incompatibility between operating systems so they create different characters when you type й on your keyboard.
>
> I really believe it is "bug" to represent Russian letter й with two characters since such a way is intended to add diacritics to the letters but й is independent letter in Russian (not и with diacritic mark ˘).

it might be a bug in Transmission that it produces 2 characters instead of one. But the process itself is not a bug, it's a feature of Unicode, please see the link given above.


@glassez commented on GitHub (Sep 2, 2023):

> qBt behaves inconsistently on macOS and Windows. Why do you think it's not a bug? If macOS version of qBt had created me this duplicate folder with combined й character just like the Windows version has done, the behavior would have been consistent (albeit unexpected).

According to the OP, qBittorrent doesn't create two folders; it creates a single one, and the second one already exists. So why should I think it is a bug? Of course, I have not seen the contents of the .torrent file in question, but it is unlikely that qBittorrent replaces the й character contained there with a combination of и and ˘.
You don't have a duplicate folder on macOS because the names of the files on disk are exactly the same as the ones in the .torrent file. I suppose you could safely delete the folder with the mismatched name on Windows.


@glassez commented on GitHub (Sep 2, 2023):

> it might be a bug in Transmission that it produces 2 characters instead of one.

Does macOS use Unicode currently? If so, Transmission should just get filenames from OS as-is.


@luzpaz commented on GitHub (Sep 17, 2023):

@kambala-decapitator any follow-up ?


@kambala-decapitator commented on GitHub (Oct 7, 2023):

> I have not seen the contents of the .torrent file in question

it's provided in the OP

> > it might be a bug in Transmission that it produces 2 characters instead of one.
>
> Does macOS use Unicode currently? If so, Transmission should just get filenames from OS as-is.

yes, macOS uses Unicode


I ran some tests with qBt and Transmission on macOS and Windows. It looks like the main difference is how each app processes filesystem paths: qBt on both platforms and Transmission for Windows use Qt, but Transmission for macOS uses the native macOS SDK (the Foundation framework, I believe). So qBt on either platform and Transmission for Windows always create a torrent file with a single й, but the latest Transmission for macOS creates a torrent with the character decomposed into 2. Since even Transmission builds for different platforms are incompatible with each other, I'd rather view this as a Transmission bug.

This is how torrent created in Transmission for macOS looks in Transmission for Windows:
Screenshot 2023-09-20 081620

I also noticed that even terminal output of ls under macOS uses decomposed character:

> ls
й.txt*

> ls | xxd
00000000: d0b8 cc86 2e74 7874 0a                   .....txt.

maybe that's simply how NSString (or probably filesystem driver itself) stores unicode characters under the hood.
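The bytes in the `xxd` output above can be decoded to confirm which form `ls` printed. This is only a sketch of the check using Python's standard library; the hex string is copied from the dump, and the variable names are illustrative:

```python
import unicodedata

# Bytes copied from the xxd dump above, including the trailing newline (0a).
raw = bytes.fromhex("d0b8cc862e7478740a")
name = raw.decode("utf-8").rstrip("\n")

# List each code point to show that the first letter is stored decomposed.
for ch in name:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Composing to NFC yields a single й (U+0439) followed by ".txt".
composed = unicodedata.normalize("NFC", name)
print(len(name), len(composed))   # 6 5
```

The first two code points are U+0438 and U+0306, so the name shown by `ls` is in decomposed form.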

but anyway, it'd be great to have such compatibility between qBt for non-macOS and Transmission for macOS.

test torrent files for reference: torrents.zip


@luzpaz commented on GitHub (Aug 14, 2024):

@glassez 👆
