[Bug]: metadata.json parser does not work for series sequence with whitespace #1571

Open
opened 2026-02-20 10:11:03 -05:00 by deekerman · 6 comments
Owner

Originally created by @wtanksleyjr on GitHub (Dec 1, 2023).

Describe the issue

[EDIT: my example is wrong, I'll post a comment showing why. I'm not editing this apart from this note.]

When scanner finds a new book with a metadata.json file, it doesn't parse the series field correctly if the series doesn't already exist. For example, suppose the file contains "series": [ "The Pact #1"].

If there's already a series named The Pact defined, this will add a new book with its ordering=1 (as expected). But if there's no series defined with that name, the loader will create a new series named The Pact #1 (so it parses the ordering as part of the series name).

Steps to reproduce the issue

  1. Put an audio file and a metadata.json file together as indicated above, using a series name not already present on your setup.
  2. Move the folder containing both into the path of the scanner.
  3. Edit the resulting book, and edit its Series field. Observe that the whole string is placed in the series name.
  4. Delete the book entry and clean up.
  5. Add the same series name to some other book (by hand).
  6. Repeat steps 1-2 again. Observe that this time, the book's series loads as expected.

Audiobookshelf version

2.6.0

How are you running audiobookshelf?

Docker

Originally created by @wtanksleyjr on GitHub (Dec 1, 2023). ### Describe the issue [EDIT: my example is wrong, I'll post a comment showing why. I'm not editing this apart from this note.] When scanner finds a new book with a metadata.json file, it doesn't parse the `series` field correctly if the series doesn't already exist. For example, suppose the file contains `"series": [ "The Pact #1"]`. If there's already a series named `The Pact` defined, this will add a new book with its ordering=1 (as expected). But if there's no series defined with that name, the loader will create a new series named `The Pact #1` (so it parses the ordering as part of the series name). ### Steps to reproduce the issue 1. Put an audio file and a metadata.json file together as indicated above, using a series name not already present on your setup. 2. Move the folder containing both into the path of the scanner. 3. Edit the resulting book, and edit its Series field. Observe that the whole string is placed in the series name. 4. Delete the book entry and clean up. 5. Add the same series name to some other book (by hand). 6. Repeat steps 1-2 again. Observe that this time, the book's series loads as expected. ### Audiobookshelf version 2.6.0 ### How are you running audiobookshelf? Docker
Author
Owner

@wtanksleyjr commented on GitHub (Dec 4, 2023):

When scanner finds a new book with a metadata.json file, it doesn't parse the series field correctly if the series doesn't already exist. For example, suppose the file contains "series": [ "The Pact #1"].

My mistake. In order to trigger this bug, the series number HAS to include extra text. For an actual example, "Corum #4, Dramatized Adaptation" will trigger this (that's by Moorcock BTW). The intent of this extra text is to allow a series to include multiple editions of each book, so you might have Corum #1 as narrator-only, while #4 might be available both narrated and Graphical Audio. Audible actually includes this text inside the series number, and ABS does support that when doing a Match.

So it turns out your parser's perfectly fine with "The Pact #1" and so on; it's only when you include more stuff in the series-order field that it gets hairy.

Sorry about that, I hope this helps.

@wtanksleyjr commented on GitHub (Dec 4, 2023): > When scanner finds a new book with a metadata.json file, it doesn't parse the series field correctly if the series doesn't already exist. For example, suppose the file contains "series": [ "The Pact #1"]. My mistake. In order to trigger this bug, the series number HAS to include extra text. For an actual example, "Corum #4, Dramatized Adaptation" will trigger this (that's by Moorcock BTW). The intent of this extra text is to allow a series to include multiple editions of each book, so you might have Corum #1 as narrator-only, while #4 might be available both narrated and Graphical Audio. Audible actually includes this text inside the series number, and ABS does support that when doing a Match. So it turns out your parser's perfectly fine with "The Pact #1" and so on; it's only when you include more stuff in the series-order field that it gets hairy. Sorry about that, I hope this helps.
Author
Owner

@advplyr commented on GitHub (Dec 4, 2023):

Just to clarify, in some series you are storing the sequence as "4, Dramatized Adaptation"?

I'm not sure yet if we should support this now that we have moved to sqlite because in order to sort the sequences properly we have to cast it to a number. Currently there is no restriction as to what you can put in there but I think this should be changed to require it to be a number I just haven't thought of a good way to roll that out without breaking things for users already using alphanumeric.

@advplyr commented on GitHub (Dec 4, 2023): Just to clarify, in some series you are storing the sequence as "4, Dramatized Adaptation"? I'm not sure yet if we should support this now that we have moved to sqlite because in order to sort the sequences properly we have to cast it to a number. Currently there is no restriction as to what you can put in there but I think this should be changed to require it to be a number I just haven't thought of a good way to roll that out without breaking things for users already using alphanumeric.
Author
Owner

@advplyr commented on GitHub (Dec 4, 2023):

The parser that is pulling out the sequence number doesn't work for Corum #4, Dramatized Adaptation specifically because of the spaces.
Corum #4,DramatizedAdaptation would work but as I mentioned above I don't think this should be done since the sequence may not be sorted correctly

@advplyr commented on GitHub (Dec 4, 2023): The parser that is pulling out the sequence number doesn't work for `Corum #4, Dramatized Adaptation` specifically because of the spaces. `Corum #4,DramatizedAdaptation` would work but as I mentioned above I don't think this should be done since the sequence may not be sorted correctly
Author
Owner

@wtanksleyjr commented on GitHub (Dec 4, 2023):

I kind of want to protest, I don't want to lose the information Audible
includes in that field. But ... I also don't totally mind. Even though I
had assumed that would be preserved, so I also stored some info in those
fields (for example I have podcast audiobooks I've used the alphabet to
number). I'll have to find some way to search for those and fix them by
hand.

However, let's ignore that for now ... the only thing needed here is
parsing the series name, and whatever solution works is what we'll have to
do.

On Mon, Dec 4, 2023 at 12:49 PM advplyr @.***> wrote:

The parser that is pulling out the sequence number doesn't work for Corum
#4, Dramatized Adaptation specifically because of the spaces.
Corum #4,DramatizedAdaptation would work but as I mentioned above I don't
think this should be done since the sequence may not be sorted correctly


Reply to this email directly, view it on GitHub
https://github.com/advplyr/audiobookshelf/issues/2380#issuecomment-1839452744,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AAJ7H6OZBSRN4N54T7TVP43YHYZNXAVCNFSM6AAAAABADXHP4KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMZZGQ2TENZUGQ
.
You are receiving this because you authored the thread.Message ID:
@.***>

@wtanksleyjr commented on GitHub (Dec 4, 2023): I kind of want to protest, I don't want to lose the information Audible includes in that field. But ... I also don't totally mind. Even though I had assumed that would be preserved, so I also stored some info in those fields (for example I have podcast audiobooks I've used the alphabet to number). I'll have to find some way to search for those and fix them by hand. However, let's ignore that for now ... the only thing needed here is parsing the series name, and whatever solution works is what we'll have to do. On Mon, Dec 4, 2023 at 12:49 PM advplyr ***@***.***> wrote: > The parser that is pulling out the sequence number doesn't work for Corum > #4, Dramatized Adaptation specifically because of the spaces. > Corum #4,DramatizedAdaptation would work but as I mentioned above I don't > think this should be done since the sequence may not be sorted correctly > > — > Reply to this email directly, view it on GitHub > <https://github.com/advplyr/audiobookshelf/issues/2380#issuecomment-1839452744>, > or unsubscribe > <https://github.com/notifications/unsubscribe-auth/AAJ7H6OZBSRN4N54T7TVP43YHYZNXAVCNFSM6AAAAABADXHP4KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMZZGQ2TENZUGQ> > . > You are receiving this because you authored the thread.Message ID: > ***@***.***> >
Author
Owner

@advplyr commented on GitHub (Dec 4, 2023):

This seems like it may be a mistake on audibles side. It doesn't make sense that would be part of the sequence. Here is the link to the API output for future reference https://api.audible.com/1.0/catalog/products/1648816800?response_groups=series,product_details

Do you have any more examples of audible doing that?

As a solution to this we should automatically split at the first space and don't include anything after. Same with user input

@advplyr commented on GitHub (Dec 4, 2023): This seems like it may be a mistake on audibles side. It doesn't make sense that would be part of the sequence. Here is the link to the API output for future reference https://api.audible.com/1.0/catalog/products/1648816800?response_groups=series,product_details Do you have any more examples of audible doing that? As a solution to this we should automatically split at the first space and don't include anything after. Same with user input
Author
Owner

@wtanksleyjr commented on GitHub (Dec 4, 2023):

It's present in almost every Graphical Audio book; as I mentioned the Match function returns the same results. The "Corum" series is the example I used. It's less common than plain integers (but don't forget decimal numbers), but still not rare.

One other comment: the order can be empty as well, and it turns out that the parser does the same thing in that case (it creates a new series starting in the same name but ending in " #").

@wtanksleyjr commented on GitHub (Dec 4, 2023): It's present in almost every Graphical Audio book; as I mentioned the Match function returns the same results. The "Corum" series is the example I used. It's less common than plain integers (but don't forget decimal numbers), but still not rare. One other comment: the order can be empty as well, and it turns out that the parser does the same thing in that case (it creates a new series starting in the same name but ending in " #").
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/audiobookshelf-advplyr#1571
No description provided.