[Enhancement]: More Flexible Library Structure #1524

Open
opened 2026-02-20 02:02:23 -05:00 by deekerman · 18 comments

Originally created by @AshbyGeek on GitHub (Oct 11, 2023).

Describe the feature/enhancement

My library REALLY didn't scan well. I keep all my audiobooks as single files (almost all .m4b, though a couple of short stories are plain .mp3 files that I should probably convert to .m4b too). I also strongly prefer not to create directories that would hold only a single file.

This means that my directory structure is something like this:

  • Brandon Sanderson
    • Cosmere
      • Stormlight Archive
        • "Book 1 - The Way of Kings.m4b"
        • "Book 2 - Words of Radiance.m4b"
        • "Book 3 - Oathbringer.m4b"
        • etc
      • Mistborn
        • "Book 1 - The Final Empire.m4b"
        • "Book 2 - The Well of Ascension.m4b"
        • etc
      • "The Sunlit Man.m4b"
      • "Warbreaker.m4b"
    • Reckoners
      • "Book 1 - Steelheart.m4b"
      • "Book 2 - Firefight.m4b"
      • "Book 3 - Calamity.m4b"
    • "Snapshot.m4b"
    • "The Frugal Wizards Handbook for Surviving Medieval England.m4b"
  • [more authors]
  • [single books]

The scanner looked at this folder structure and decided that Brandon Sanderson has a single book. That book's 'files' tab lists all 41 of the Brandon Sanderson books (and short stories) in my collection.

Eventually I figured out that putting each .m4b file into its own folder sorted things out pretty well, but it would be REALLY nice not to need that.

The relevant function seems to be groupFileItemsIntoLibraryItemDirs in server/utils/scandir.js (https://github.com/advplyr/audiobookshelf/blob/753ae3d7dc18e821ddf3d49a49aa3a9015fb5947/server/utils/scandir.js#L133). Could that function be changed to recognize .m4b files and always return each one as an individual book? That should get most people with a library like mine up and running faster.
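
A minimal sketch of the proposed rule, assuming the scanner hands over file paths relative to the library root with "/" separators. groupIntoBookDirs is a hypothetical stand-in, not the real groupFileItemsIntoLibraryItemDirs, whose inputs and outputs differ. It also assumes every .m4b is a complete book, which a later comment in this thread points out is not always true.

```javascript
// Hypothetical sketch: group library-relative file paths into book items.
// Default rule: all files in the same directory belong to one book.
// Proposed exception: every .m4b file becomes its own book.
function groupIntoBookDirs(relPaths) {
  const groups = new Map(); // item key -> file paths in that item

  for (const relPath of relPaths) {
    const slash = relPath.lastIndexOf("/");
    const dir = slash === -1 ? "" : relPath.slice(0, slash);
    const isM4b = relPath.toLowerCase().endsWith(".m4b");

    // Root-level files and .m4b files stand alone; everything else
    // is keyed by its parent directory.
    const key = (dir === "" || isM4b) ? relPath : dir;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(relPath);
  }
  return groups;
}

const items = groupIntoBookDirs([
  "Brandon Sanderson/Snapshot.m4b",
  "Brandon Sanderson/Cosmere/Warbreaker.m4b",
  "Some Author/Some Book/Part 1.mp3",
  "Some Author/Some Book/Part 2.mp3",
]);
console.log([...items.keys()]);
```

Any non-.m4b file left in an author folder (a cover.jpg, say) would still be grouped by its folder, producing an item with no audio.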

Longer term, though, a generally more flexible system seems worthwhile, possibly including the ability to change which audio files belong to which book after the fact in the web UI, or a .config-type file in folders whose files are all meant to be part of the same book.

@nichwall commented on GitHub (Oct 11, 2023):

This can get complicated if metadata files (ABS-generated or from somewhere else) or cover art are stored in the same directory. For the most part, files whose names share the same beginning can probably be assumed to belong to the same item, but often it's just a cover.[ext] or metadata.[ext], with no way to tell which library item it belongs to. Some people also store multiple covers of the same book.
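
The shared-prefix idea could be sketched like this, assuming each book in the folder is a single audio file. The function name and the audio extension list are hypothetical. Sidecar files are attached to the audio file whose name (minus extension) they start with, preferring the longest match so that files for "Book 10" are not claimed by "Book 1"; generic names such as cover.jpg are reported as ambiguous instead of guessed.

```javascript
// Hypothetical sketch: attach sidecar files (covers, metadata) in one folder
// to single-file books by shared filename prefix.
const AUDIO_EXTS = new Set([".m4b", ".m4a", ".mp3"]); // assumed list

function splitExt(name) {
  const dot = name.lastIndexOf(".");
  return dot <= 0 ? [name, ""] : [name.slice(0, dot), name.slice(dot).toLowerCase()];
}

function matchSidecars(fileNames) {
  const books = new Map(); // audio stem -> { audio, sidecars }
  const ambiguous = [];

  for (const name of fileNames) {
    const [stem, ext] = splitExt(name);
    if (AUDIO_EXTS.has(ext)) books.set(stem, { audio: name, sidecars: [] });
  }
  for (const name of fileNames) {
    const [stem, ext] = splitExt(name);
    if (AUDIO_EXTS.has(ext)) continue;
    // Pick the longest audio stem that this file's name starts with.
    let best = null;
    for (const bookStem of books.keys()) {
      if (stem.startsWith(bookStem) && (best === null || bookStem.length > best.length)) {
        best = bookStem;
      }
    }
    if (best === null) ambiguous.push(name); // e.g. cover.jpg, metadata.json
    else books.get(best).sidecars.push(name);
  }
  return { books, ambiguous };
}

const result = matchSidecars([
  "Snapshot.m4b", "Snapshot.jpg", "Warbreaker.m4b", "Warbreaker.opf", "cover.jpg",
]);
```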

The documentation shows each library item in its own directory, but it could make this clearer.

Sort of relevant issue: https://github.com/advplyr/audiobookshelf/issues/528

@AshbyGeek commented on GitHub (Oct 11, 2023):

Ok, I get that. It could still be done, but it would essentially require two significantly different ways of scanning a directory, each backed by documentation and decision logic. That gets to be quite a bit of work and still leaves a rather inflexible scanning system.

I do like the suggestion in #528. I've seen systems like that; some worked well and some seemed unnecessarily finicky. A good implementation would address some of the scanner's underlying rigidity. The tool linked from that issue only allows a single pattern, though, which still limits the system's flexibility. If instead it worked more like a .gitignore, allowing multiple lines of regex-style patterns, that would be dang awesome.
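
A rough sketch of the multi-line pattern idea, assuming a hypothetical ignore file with one JavaScript regular expression per line and # for comments. This follows the regex-style suggestion above; it is not real .gitignore glob syntax.

```javascript
// Hypothetical sketch: a .gitignore-like file where each non-blank,
// non-comment line is a regular expression tested against relative paths.
function parsePatternFile(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "" && !line.startsWith("#"))
    .map((line) => new RegExp(line, "i")); // case-insensitive is an assumption
}

function matchesAny(patterns, relPath) {
  return patterns.some((re) => re.test(relPath));
}

const patterns = parsePatternFile(String.raw`
# skip sample clips and anything inside an "extras" folder
sample\.mp3$
(^|/)extras/
`);
```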

What about a .config-type system? You would add a few audiobookshelf.config files to your library directories to override the default scan settings for that folder and its subfolders. They could specify things like:

  • whether the files in the current directory all belong to a single book or are multiple individual books
  • whether the subdirectories of the current directory are expected to be authors, series, or book titles, or should be ignored
  • file/directory name or extension patterns that should be ignored or treated as metadata or cover art

Yeah, this takes a lot of inspiration from .editorconfig and .gitignore files.
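
The per-folder override could be resolved the way .editorconfig settings cascade: start from defaults, then apply each audiobookshelf.config from the library root down to the folder being scanned. The setting names (filesAre, subdirsAre) and their values are hypothetical, and the configs are passed in already parsed to keep the sketch short.

```javascript
// Hypothetical sketch: compute effective scan settings for a folder by
// layering per-folder configs from the library root down to that folder.
const DEFAULTS = { filesAre: "single-book", subdirsAre: "auto" };

// configsByDir maps a relative directory ("" = library root) to the parsed
// contents of that directory's audiobookshelf.config, if it has one.
function resolveSettings(configsByDir, relDir) {
  const parts = relDir === "" ? [] : relDir.split("/");
  let settings = { ...DEFAULTS, ...(configsByDir[""] || {}) };
  for (let i = 1; i <= parts.length; i++) {
    const dir = parts.slice(0, i).join("/");
    // Deeper folders override values inherited from their parents.
    settings = { ...settings, ...(configsByDir[dir] || {}) };
  }
  return settings;
}

const configs = {
  "Brandon Sanderson": { filesAre: "individual-books" },
  "Brandon Sanderson/Cosmere": { subdirsAre: "series" },
};
```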

P.S. Should I move these comments to that thread and discuss them there?

@AshbyGeek commented on GitHub (Oct 11, 2023):

Also, I agree that a quick touch-up of the documentation would help relieve the initial issue, which was my misunderstanding of the system.

@nichwall commented on GitHub (Oct 11, 2023):

I agree that more configuration would be nice, but I'm not sure how long it would take to get implemented. More options mean more edge cases, and I haven't put much thought into something like that (and I'm not sure advplyr or anyone else has either, given how many moving parts the project has).

> P.S. Should I move these comments to that thread and discuss them there?

This thread is probably fine. I think there's another issue with more discussion, but I couldn't find it, so I linked that one instead.

> Also, I agree that a quick touch-up of the documentation would help relieve the initial issue, which was my misunderstanding of the system.

Yeah. A lot of these little things need to be clarified for newer people, so thanks for working on that :)

@AshbyGeek commented on GitHub (Oct 11, 2023):

Let me know if you have suggestions for the exact wording. I'm not exactly a wordsmith.

@advplyr commented on GitHub (Oct 12, 2023):

Another thing to consider is that some audiobooks are comprised of multiple m4b files so we couldn't assume that an m4b file is an individual book.

@alderete commented on GitHub (Jan 14, 2024):

I've run into a similar issue with my own audiobook library. My organization comes from 15+ years of using iTunes to manage my audiobooks. With the "Keep iTunes Media folder organized" setting enabled, iTunes puts all books by the same author into a single folder. (Music gets grouped into albums, which become separate folders, but iTunes didn't do that for audiobooks.)

Multi-part books are "grouped" by the Title field (which I think is actually Album in the ID3 tags): files with exactly the same Title belong to the same book. Parts are differentiated and sequenced either by the Name field (Chapter in the ID3 tags?) or by disc and track numbers. (I forget some details here, but could research them if this is a serious avenue of investigation.)
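
Ordering the parts of one book as described (disc number, then track number) could look like this. The file, disc, and track fields are hypothetical and assumed to be read from the tags beforehand; falling back to a numeric-aware filename comparison when disc and track numbers don't decide the order is an added assumption.

```javascript
// Hypothetical sketch: order the parts of one book by disc number, then
// track number, then a numeric-aware filename comparison.
function compareParts(a, b) {
  const discDiff = (a.disc ?? 0) - (b.disc ?? 0); // missing numbers count as 0
  if (discDiff !== 0) return discDiff;
  const trackDiff = (a.track ?? 0) - (b.track ?? 0);
  if (trackDiff !== 0) return trackDiff;
  // numeric: true makes "Part 9" sort before "Part 10".
  return a.file.localeCompare(b.file, undefined, { numeric: true });
}

const parts = [
  { file: "Part 10.mp3", disc: 2, track: 1 },
  { file: "Part 2.mp3", disc: 1, track: 2 },
  { file: "Part 1.mp3", disc: 1, track: 1 },
].sort(compareParts);
```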

The thing is, most of my books are a single .m4b file, with chapter markers, and my metadata is fantastic. Books without DRM have been merged into a single file, and I've been diligent over the years in making sure the metadata is correct and consistent for authors, titles, series, and so on.

What I really want is a setting in AudiobookShelf "Ignore folder structure in favor of embedded metadata". I was hoping that disabling the "Folder structure" item in the library Scanner settings would do the trick, but apparently not. Sub-folder grouping into a book happens whether that's enabled or not. (That seems a bit counter-intuitive.)

The documentation makes it seem like, once you start down the road of sub-folders, you have to go all the way down that road, or you end up with a mess like mine. That leaves two options for folder organization:

  1. Fully structured (labor intensive) folders, sub-folders, sub-sub-folders, etc., or
  2. An undifferentiated mess of all audiobooks in one top folder.

Really would be nice if there was a middle option. Grouping by author in folders is both very natural, and matches how iTunes did it for more than a decade. (Not that iTunes was ever the perfect audiobooks library manager. On the other hand, haven't found anything better in the years since Apple moved to the Music app, either...)

@advplyr commented on GitHub (Jan 14, 2024):

> What I really want is a setting in AudiobookShelf "Ignore folder structure in favor of embedded metadata". I was hoping that disabling the "Folder structure" item in the library Scanner settings would do the trick, but apparently not. Sub-folder grouping into a book happens whether that's enabled or not. (That seems a bit counter-intuitive.)

That is the order of precedence for setting metadata on the book; it has nothing to do with how the books are grouped. In the book scanner guide, the first step is grouping files into books: https://www.audiobookshelf.org/guides/book-scanner
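
To illustrate the distinction: grouping decides which files make up a book first, and only afterwards are metadata sources merged field by field in precedence order. The source names and values below are placeholders, not the scanner's actual settings or defaults.

```javascript
// Hypothetical sketch: merge metadata for a book whose files are already grouped.
// Sources run from lowest to highest priority; a later source overrides a field
// only when it provides a non-empty value. Which files belong to the book is
// never changed here.
function mergeMetadata(sourcesLowToHigh) {
  const merged = {};
  for (const source of sourcesLowToHigh) {
    for (const [field, value] of Object.entries(source.values)) {
      if (value !== undefined && value !== null && value !== "") {
        merged[field] = value;
      }
    }
  }
  return merged;
}

// Placeholder source names, not real scanner setting identifiers.
const merged = mergeMetadata([
  { name: "folder names", values: { title: "Warbreaker", author: "Brandon Sanderson" } },
  { name: "audio tags", values: { title: "Warbreaker (Unabridged)", narrator: "" } },
  { name: "opf sidecar", values: { narrator: "Example Narrator" } },
]);
```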

I'm not sure I understand what your suggestion is to handle both single file books and multi-file books in the same directory structure.

@ElDubsNZ commented on GitHub (Jan 14, 2024):

> I'm not sure I understand what your suggestion is to handle both single file books and multi-file books in the same directory structure.

Metadata. It should be able to override the folder structure rules.

So if I've got the seven Harry Potter audiobooks in one directory, by default audiobookshelf should identify them as a multi-file book, as usual. But this should be overridden if ABS finds metadata that makes clear they're distinct books, such as the album name on mp3 files or the title in opf/nfo files.

This means ABS can work with the way we store our files, rather than us storing our files in a way that works for ABS. It just creates flexibility.
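
The opt-in override could be sketched as: keep the default one-folder-one-book grouping, and split the folder only when every audio file carries a title and the titles differ. The field names are hypothetical, and falling back to the default whenever any title is missing is an added assumption to keep partially tagged folders intact.

```javascript
// Hypothetical sketch: split one folder's audio files into books by title tag,
// but only when every file has a title; otherwise keep the folder as one book.
function splitFolderByTitle(folderFiles) {
  const allTitled = folderFiles.every(
    (f) => typeof f.title === "string" && f.title.trim() !== ""
  );
  if (!allTitled) return [folderFiles]; // default: one book per folder

  const byTitle = new Map(); // title -> files, in first-seen order
  for (const f of folderFiles) {
    const key = f.title.trim();
    if (!byTitle.has(key)) byTitle.set(key, []);
    byTitle.get(key).push(f);
  }
  return [...byTitle.values()];
}

const books = splitFolderByTitle([
  { file: "Book 1.mp3", title: "Harry Potter and the Philosopher's Stone" },
  { file: "Book 2 - Part 1.mp3", title: "Harry Potter and the Chamber of Secrets" },
  { file: "Book 2 - Part 2.mp3", title: "Harry Potter and the Chamber of Secrets" },
]);
```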

@alderete commented on GitHub (Jan 14, 2024):

> That is the order of precedence for setting metadata on the book, it doesn't have to do with how the books are grouped. In the book scanner guide the first step is to group files into books https://www.audiobookshelf.org/guides/book-scanner

I understand this. This is why I wrote that, once you start down the sub-folders route, you have to go all in on it. (Which I do not want to do.)

Question for you: Do you not consider the title of a book to be a piece of metadata? I certainly do. It's the most fundamental piece of metadata, but it's still metadata. I don't see the purpose of forcing the use of folder name to set the book title (and thus group/separate files into books).

> I'm not sure I understand what your suggestion is to handle both single file books and multi-file books in the same directory structure.

My suggestion is, ignore the directory structure. Don't group on it before you start processing. Use the embedded metadata to set the title, and group on that.

I get that it's not for everyone. I'd be shocked if folks who have already done all that work to create the elaborate folder structure would want to undo that work. I certainly don't want to repeat all the work I did to get my metadata correct!

But using all of these nested folders feels like a "lowest common denominator" solution that ignores the value of embedded metadata. Why would you want to separate the metadata about a title from the actual files? Embedding all of the details in folder names, in a folder structure that might look familiar to someone using a computer in the 90s (or 70s), seems like a step backwards. We have a great solution in separate, structured metadata fields, a significant improvement over semi-structured file and folder name strings. Why not use it? Or at least let folks who want to use it still be able to use AudiobookShelf?

But, in the end, it's your project. If you don't see the value, not going to fight about it.

@advplyr commented on GitHub (Jan 15, 2024):

It's not about seeing the value in it. It is about it being a viable option.

I've spent a lot of time now working with meta tags for audiobook audio files and I don't see how you could determine based on the meta tags whether an audio file represents an entire book or is a track.

The best we could do is take a guess, which would be fine as an additional opt-in setting, but I can't see a foolproof way to handle this. If you can think of an implementation that handles it and want to write some pseudocode or mock something up, I'll take a look.
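
A guess of the kind described, offered only as an opt-in, might weigh a few tag signals. The field names and thresholds below are invented for illustration, not derived from real tag data.

```javascript
// Hypothetical heuristic: guess whether one audio file is a complete book or
// one part of a multi-file book. Returns "book", "part", or "unknown".
function guessFileRole(tags) {
  // A "track N of M" tag with M > 1 suggests a multi-file book.
  if (Number.isInteger(tags.trackTotal) && tags.trackTotal > 1) return "part";
  // Several embedded chapters plus a long duration suggest a merged book.
  const hours = (tags.durationSeconds ?? 0) / 3600;
  if ((tags.chapterCount ?? 0) >= 3 && hours >= 2) return "book";
  return "unknown"; // leave the default folder grouping alone
}
```

It still fails the case raised earlier in the thread: a book split across several long .m4b files that each have chapters would be classified as several books.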

@advplyr commented on GitHub (Jan 15, 2024):

Your comment makes me think there is still a misunderstanding of how the scanner works. ABS doesn't ignore embedded metadata. Check the scanner guide I shared above and the documentation on audio file meta tags.

@ElDubsNZ commented on GitHub (Jan 21, 2024):

Here's an example:

If I've got these files:

Z:\Audiobooks\Harry Potter\Harry Potter and the Philosopher's Stone.m4b
Z:\Audiobooks\Harry Potter\Harry Potter and the Chamber of Secrets - Chapter 1.m4b
Z:\Audiobooks\Harry Potter\Harry Potter and the Chamber of Secrets - Chapter 2.m4b

The challenge you highlight is ABS being able to tell that two of these are parts of the same book, one is a separate book, and they're all in the same series. ABS would normally rely on the two books being in their own distinct folders.

That's where metadata should come in. If say we've got metadata like this:

Z:\Audiobooks\Harry Potter\Harry Potter and the Philosopher's Stone.opf
Z:\Audiobooks\Harry Potter\Harry Potter and the Chamber of Secrets - Chapter 1.opf
Z:\Audiobooks\Harry Potter\Harry Potter and the Chamber of Secrets - Chapter 2.opf

It could make this distinction with <dc:title>Harry Potter and the Chamber of Secrets</dc:title> and then add support for <meta property="track" refines="#title" scheme="onix:codelist5">1</meta> for chapter numbers.

Essentially, have ABS follow its default process of assuming they are all chapters of the same book if they're in the same directory, but if it finds metadata with different titles, separate them into different books. Then use the track number for chapter numbers.

This would require all chapters of the same book to have the same "title" in their metadata, so the entire feature could be made optional for those whose metadata doesn't reflect this.
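
Reading the two proposed values from an OPF sidecar could be sketched with regular expressions. This is deliberately simplistic (a real implementation should use an XML parser), and the track meta element follows the comment's proposal; the function name is hypothetical.

```javascript
// Simplified sketch: pull <dc:title> and the proposed "track" meta out of
// OPF text with regular expressions. Assumes simple, well-formed markup.
function readOpfTitleAndTrack(opfText) {
  const titleMatch = opfText.match(/<dc:title[^>]*>([^<]*)<\/dc:title>/i);
  const trackMatch = opfText.match(/<meta[^>]*property="track"[^>]*>\s*(\d+)\s*<\/meta>/i);
  return {
    title: titleMatch ? titleMatch[1].trim() : null,
    track: trackMatch ? Number(trackMatch[1]) : null,
  };
}

const opf = `<metadata>
  <dc:title id="title">Harry Potter and the Chamber of Secrets</dc:title>
  <meta property="track" refines="#title" scheme="onix:codelist5">1</meta>
</metadata>`;
const info = readOpfTitleAndTrack(opf);
```

Files whose OPF titles match would then form one book, ordered by track.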

@alderete commented on GitHub (Jan 22, 2024):

@advplyr On the subject of current scanner behavior, I’m far from confident I understand how the scanner works. Here's what I think I understand from the documentation. A library scan proceeds like this:

  1. File system scan
    1. Top-level directory files become individual books, one book per file
    2. Sub-directories are recursively scanned
      1. All files in each individual directory are grouped together into a single book
  2. Reconciliation
    1. Existing books in ABS are checked to verify their referenced files are still present
    2. Books found in the file system scan are checked to see if they are in ABS already
      1. If not already in ABS, create a new book
      2. Else, check if the book is updated
        1. If it is, proceed to metadata processing
        2. Else, skip, move to next book
  3. Metadata processing and attachment
    1. New and updated books are probed for metadata
    2. Book track references are assigned to the book, in order
    3. Secondary (non-audio) files are checked, attached to book
    4. Cover image checked, set
    5. Book metadata is scanned for (via prioritized sources), copied to ABS
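
Step 2 of this outline could be sketched as a comparison between what the database already knows and what the file system scan found. The data shape (each book keyed by its folder path, with a map of file path to modification time) is an assumption for illustration, not how ABS actually stores library items.

```javascript
// Hypothetical sketch of reconciliation: classify books as missing, new,
// updated, or unchanged. Each book: { key, files: { [path]: mtimeMs } }.
function filesChanged(oldFiles, newFiles) {
  const oldPaths = Object.keys(oldFiles);
  const newPaths = Object.keys(newFiles);
  if (oldPaths.length !== newPaths.length) return true;
  // A renamed file or a new modification time both count as a change.
  return newPaths.some((p) => oldFiles[p] !== newFiles[p]);
}

function reconcile(existingBooks, scannedBooks) {
  const scannedByKey = new Map(scannedBooks.map((b) => [b.key, b]));
  const existingByKey = new Map(existingBooks.map((b) => [b.key, b]));
  const result = { missing: [], created: [], updated: [], unchanged: [] };

  // Existing books that the scan no longer finds.
  for (const book of existingBooks) {
    if (!scannedByKey.has(book.key)) result.missing.push(book.key);
  }
  // Scanned books are new, updated (go on to metadata processing), or skipped.
  for (const book of scannedBooks) {
    const known = existingByKey.get(book.key);
    if (!known) result.created.push(book.key);
    else if (filesChanged(known.files, book.files)) result.updated.push(book.key);
    else result.unchanged.push(book.key);
  }
  return result;
}
```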

Writing this out from the docs, I’m certain I don’t have it correct. But I suspect the docs aren’t completely in sync with the code, either. (Is the scanner code relatively straightforward? I could read through and see if I can clarify further, with the goal of updating the book scanner doc topic.)

The fundamental aspect, though, is that:

  • The file system scan uses directories to group files together into books.
  • Once the grouping is set, while other metadata might be updated from other sources, the files for the book can't be changed.

This makes a lot of sense for numerous reasons:

  • Using the file system structure and file artifacts, it’s possible to describe details of a book collection that are hard to capture in ID3 tag metadata alone. For example, series is poorly supported in ID3 tags. (Or rather, there’s so much inconsistency across providers and apps that it’s nearly pointless.) Series and sub-series are especially well-suited to being captured in the “containment” abstraction of directories and sub-directories.
  • Using the parent folder to group all of the tracks for a book makes it simple to get all of the files for a book in one go. You don’t need to track state while you’re scanning through files. Every folder with files is a book. Every file in a book folder belongs to that book. Simple.

The downside is, the required file system structure is complex, and labor-intensive to create and maintain without automation.

Without in any way suggesting that this is better, here’s a pseudo-algorithm that might work more correctly for my library:

currentBook = newBook()
For each file in directory as item:
   if (item is directory)
      recurse into it
   else if (item is not audio track)
      currentBook.addFileAsSupplementary(item)
   else
      if(getTitle(item) != getTitle(currentBook))
         currentBook = newBook()
      currentBook.addFileAsTrack(item)

The idea is, you need to (a) track a bit of state, in the form of the current book, and (b) get the title metadata from each file, and check it to see if it’s changed, that is, you’ve moved to a new book.

Significantly, the algorithm doesn’t care about the file system arrangement. Flat files all in one folder, files in author folders only, or deeply nested files, it’s all the same. (Well, you can’t have randomly named or organized files, a book’s files do need to sort together in the file system traversal order.)
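
A runnable version of the pseudocode above, under stated assumptions: the directory listing is passed in as an already sorted tree of entries (so no file system access is needed), and each audio entry's title has been read from its tags beforehand. The entry shapes and the scanByTitle name are hypothetical. It also collects finished books in an array and starts each subfolder with no current book.

```javascript
// Sketch of the metadata-first pseudocode. Entry shapes (hypothetical):
//   { type: "dir", name, entries }  { type: "audio", name, title }  { type: "other", name }
function scanByTitle(entries, books = []) {
  let currentBook = null;
  let pending = []; // non-audio files seen before this folder's first track

  for (const item of entries) {
    if (item.type === "dir") {
      scanByTitle(item.entries, books); // each subfolder starts with no current book
    } else if (item.type !== "audio") {
      if (currentBook) currentBook.supplementary.push(item.name);
      else pending.push(item.name);
    } else {
      if (currentBook === null || currentBook.title !== item.title) {
        // Title changed: we've moved on to a new book.
        currentBook = { title: item.title, tracks: [], supplementary: pending };
        pending = [];
        books.push(currentBook);
      }
      currentBook.tracks.push(item.name);
    }
  }
  return books; // files still in `pending` (a folder with no audio) stay unattached
}

const result = scanByTitle([
  { type: "other", name: "folder.jpg" },
  { type: "audio", name: "Snapshot.m4b", title: "Snapshot" },
  { type: "audio", name: "Warbreaker.m4b", title: "Warbreaker" },
  { type: "other", name: "Warbreaker.opf" },
  {
    type: "dir",
    name: "Mistborn",
    entries: [
      { type: "audio", name: "Part 1.mp3", title: "The Final Empire" },
      { type: "audio", name: "Part 2.mp3", title: "The Final Empire" },
    ],
  },
]);
```

Attaching non-audio files to the current book mirrors the pseudocode, so a sidecar that sorts before its audio file (Warbreaker.jpg sorts before Warbreaker.m4b) would land on the previous book; matching sidecars by filename prefix would be more robust.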

This falls apart in collections where the title metadata is a mess. Tell me about it! You should have seen my iTunes library a decade ago.

And it requires you to edit the metadata of a book’s tracks to get correct behavior, which is harder than just editing file and folder names. There are a number of good tools for doing that, but to my knowledge none of them shows a simple list like you’d see in macOS Finder or Windows Explorer. (Or iTunes, back in the day.)

And it semi-requires you to merge multi-file audiobooks down into a minimal number of actual files, ideally one or two per book. And it eschews some of the deeper things that ABS is able to parse, like series. And, and, and!

These are significant trade-offs from using the existing scanner—but all technical systems have trade-offs, it’s a question of which matches your needs better.

For me, I’ve already done the work to make sure my file system is simple, book tracks are merged together with internal chapter markers, and the metadata is mostly correct. That makes my way better for me.

For folks who have already gone all-in on ABS, they’ve done the work to create the file system structure that works with ABS book scanner behavior. That makes the current way better for them.

In an ideal world, there’d be a preference to choose between the two. The question is, how much work would it be to create/revise a scanner implementation that supports both? Are there other ABS assumptions or decisions that would break with a metadata-first scanner? (I suspect there’s at least some issues around the supplementary files, ebooks, etc.)

@erazer-2000 commented on GitHub (Jan 2, 2025):

Hello, I have the same problem with sorting.
My folder structure is something like this:
author\series\book1.mp3, book2.mp3 ... bookN.mp3

The main inconvenience is that ABS treats a whole series of books as a single book. There are more than 5000 books in the library, so moving each file into its own folder is not an option.
Changing the metadata priority order does not help at all: neither folder structure, audio file meta tags, nor ebook metadata has any effect (the series and series-part tags are filled in; the series is created, but with only one book).
As a workaround, I suggest some way of marking files or folders. For example, if a file name starts or ends with the symbol № or #, treat it as a separate book.
Or, failing that, add a checkbox in the chapter editor that marks an entry as a separate book rather than a chapter.
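
The file-marker idea could look roughly like this in Python. The marker symbols come from the comment; the function names are hypothetical. The rule that unmarked files in the same folder stay together as one book is an assumption made for illustration. Only the file-name variant is shown, not folder marking.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Marker symbols suggested in the comment.
MARKERS = ("#", "№")


def is_marked_standalone(path: str) -> bool:
    """True if the file name, minus extension, starts or ends with a marker."""
    stem = PurePosixPath(path).stem.strip()
    return stem.startswith(MARKERS) or stem.endswith(MARKERS)


def split_folder(paths: Iterable[str]) -> List[List[str]]:
    """Split one folder's files into books (hypothetical rule).

    Each marked file becomes a book of its own; all unmarked files stay
    together as a single book, as the folder grouping described in this
    thread would treat them.
    """
    standalone: List[List[str]] = []
    grouped: List[str] = []
    for path in paths:
        if is_marked_standalone(path):
            standalone.append([path])
        else:
            grouped.append(path)
    return standalone + ([grouped] if grouped else [])
```

A rule like this needs no tag reading and no extra folders. It would let a 5000-book library be fixed by renaming files in place.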

@Pankekiks commented on GitHub (May 14, 2025):

> Describe the feature/enhancement
>
> My library REALLY didn't scan well. I keep all my audiobooks as single files (almost all .m4b, but there are a couple short stories that are just an mp3 file. Should probably convert them to .m4b too). I also greatly prefer not to create directories that are going to hold a single file.
>
> This means that my directory structure is something like this:
>
>   • Brandon Sanderson
>     • Cosmere
>       • Stormlight Archive
>         • "Book 1 - The Way of Kings.m4b"
>         • "Book 2 - Words of Radiance.m4b"
>         • "Book 3 - Oathbringer.m4b"
>         • etc
>       • Mistborn
>         • "Book 1 - The Final Empire.m4b"
>         • "Book 2 - The Well of Ascension.m4b"
>         • etc
>       • "The Sunlit Man.m4b"
>       • "Warbreaker.m4b"
>     • Rechoners
>       • "Book 1 - Steelhear.m4b"
>       • "Book 2 - Firefight.m4b"
>       • "Book 3 - Calamity.m4b"
>     • "Snapshot.m4b"
>     • "The Frugal Wizards Handbook for Surviving Medieval England.m4b"
>   • [more authors]
>   • [single books]
>
> The Scanner took a look at my folder structure for all of this and decided that Brandon Sanderson has a single book. Under the 'files' tab that book lists all 41 of Brandon Sanderson's books (and short stories) in my collection.
>
> Eventually I figured out that putting .m4b files into folders pretty well sorted things out, but it would be REALLY nice to not need that.
>
> The relevant function seems to be groupFileItemsIntoLibraryItemDirs in server/utils/scandir.js. Could that function be changed to be aware of .m4b files and return them as individual books always? That should get most people with a library like mine up and running faster.
>
> Long term though it seems like a generally more flexible system would be useful; and possibly also adding the ability to modify the relationship between books and audio files after the fact in the web-ui. Or perhaps with a .config type file in folders that are intended to contain multiple files intended to be part of the same book.

You have an awesome library! Is there a way you could share some of the audiobooks?

@Timo-1979 commented on GitHub (Jun 2, 2025):

I think it would be great to support single-file audiobooks in author and series folders. A folder could count as a series folder if it contains subfolders (other than disc folders) or, if it holds no multi-file book, if it contains a marker file such as ".series".

Via a naming convention, one cover (.jpeg/.png), one ebook (.epub/.pdf), and one metadata file (.json) could also be placed in this folder. If multiple covers or ebooks are wanted, then a subfolder must be created.

This structure would also be good for radio plays, where each episode is only about one CD long and there are many episodes (e.g. „Die drei Fragezeichen“, with more than 230 episodes).

The file structure could look like this:

audiobooks
   +-- Author 1
   |   +-- Book_A_01.mp3
   |   +-- Book_A_02.mp3
   |   +-- Book_A_02.epub
   |   +-- Book_A_02.json
   |       
   +-- Author 2
       +-- Book_01
       |   +-- chapter01.mp3
       |   +-- chapter02.mp3
       |    
       +-- Book_02
       |   +-- chapter01.mp3
       |   +-- chapter02.mp3
       |   +-- Book_02.epub
       |    
       +-- Book_03.mp3
       +-- Book_04.mp3
       +-- Book_04.epub
       |    
       +-- BookSeries_A
       |   +-- Series_Book_A_01
       |   |   +-- chapter01.mp3
       |   |   +-- chapter02.mp3
       |   | 
       |   +-- Book_A_02
       |   |   +-- chapter01.mp3
       |   |   +-- chapter02.mp3
       |   |   +-- Book_02.epub
       |   |    
       |   +-- Series_Book_A_03.mp3
       |   +-- Series_Book_A_04.mp3
       |   +-- Series_Book_A_04.epub
       |
       +-- BookSeries_B
           +-- .series
           +-- Series_Book_B_01.mp3
           +-- Series_Book_B_02.mp3
           +-- Series_Book_B_02.epub
           +-- Series_Book_B_02.json
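
Under this proposal, a folder marked with .series (like BookSeries_B) would turn each audio file into its own book, with sidecar files attached. The Python sketch below shows that pairing. It assumes that the naming convention means "same file name, different extension" and that series-folder detection has already happened. The extension lists are illustrative and the function name is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

# Illustrative extension lists, taken from the examples in the comment.
AUDIO_EXTENSIONS = {".mp3", ".m4b", ".m4a"}
SIDECAR_EXTENSIONS = {".jpeg", ".jpg", ".png", ".epub", ".pdf", ".json"}


def books_in_series_folder(names: Iterable[str]) -> Dict[str, List[str]]:
    """One book per audio file; sidecars with the same stem attach to it.

    Assumes the caller has already decided this is a series folder
    (e.g. it contains a ".series" marker). Returns {stem: [file names]}.
    Files that match no audio stem, like ".series" itself, are ignored.
    """
    ordered = sorted(names)
    books: Dict[str, List[str]] = {}
    for name in ordered:
        p = PurePosixPath(name)
        if p.suffix.lower() in AUDIO_EXTENSIONS:
            books.setdefault(p.stem, []).append(name)
    for name in ordered:
        p = PurePosixPath(name)
        if p.suffix.lower() in SIDECAR_EXTENSIONS and p.stem in books:
            books[p.stem].append(name)
    return books
```

For BookSeries_B this yields two books. Series_Book_B_02 carries its .epub and .json, and the .series marker itself is skipped.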

@kevincox commented on GitHub (Feb 8, 2026):

I think people are correct that metadata isn't always going to be enough to tell. But I think it would be great if these could be overridden in the interface for manual organization. So basically if you want the automatic organization to work then you should follow the recommended structure. However it would be nice to also be able to just override the directory-based grouping. I end up commonly doing the following operations and I would love if I could just do it in the UI rather than needing to move the source files around.

  1. Mark an ebook as the same book as an audiobook.
  2. Split a book into multiple. (Usually a series that got scanned as a single book).
  3. Join multiple files into one book. (Usually an audiobook that is multiple files).

I think the directory-based rules are a reasonable default, but it would be nice if they could just be overridden rather than needing to lay out the source files again (especially since the files may be used by multiple programs, not exclusively ABS, and the rules may be incompatible).
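
One way to model such overrides without moving source files is a per-file assignment table, applied on top of the directory-based result. The table is hypothetical; nothing in this thread indicates that ABS stores one. The Python sketch shows that all three operations above reduce to reassigning files to book keys: marking an ebook as the same book, splitting, and joining. The name apply_overrides and the key format are assumptions.

```python
from typing import Dict, List


def apply_overrides(
    scanned: Dict[str, List[str]], overrides: Dict[str, str]
) -> Dict[str, List[str]]:
    """Re-group scanned files using manual per-file book assignments.

    `scanned` maps a book key to its files (e.g. from directory rules);
    `overrides` maps a file path to the book key it should belong to.
    Files without an override keep their scanned book; books left with
    no files disappear from the result.
    """
    result: Dict[str, List[str]] = {}
    for book_key, files in scanned.items():
        for path in files:
            target = overrides.get(path, book_key)
            result.setdefault(target, []).append(path)
    return result
```

Because the overrides live outside the file tree, the files can stay laid out for other software. Rescans would re-apply the same table to the fresh directory-based grouping.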

Reference
starred/audiobookshelf#1524