Proper physical treatment of sound waves for improved spatial audio #2906

Open
opened 2026-02-20 22:14:42 -05:00 by deekerman · 7 comments
Owner

Originally created by @Krzmbrzl on GitHub (Oct 3, 2024).

Context

Spatial audio (also called positional audio) in Mumble uses a somewhat simplistic approach to simulating how an observer receives sound from different locations etc.

Description

In order to maximize the realism of positional audio, Mumble should properly take physical effects into account. These are:

  • Use of proper head-related transfer functions (HRTFs, https://en.wikipedia.org/wiki/Head-related_transfer_function). These simulate how sound waves are affected by traveling around and through the listener's body, and more specifically their head. This includes frequency-dependent filtering as well as time, phase, and volume shifts of the sound waves. In other words, this also incorporates interaural delay.
  • Properly account for Doppler shifts arising from the relative movement of the sound source and the sound receiver.
  • Take the virtual surroundings into account. This of course requires knowledge of some kind of 3D world map in order to be able to map from positions to physical surroundings. This would allow for effects such as
    • Reverb
    • Occlusion
    • Attenuation

HRTF could be achieved by making use of the OpenAL (Open Audio Library) ecosystem, of which there exists an LGPL-licensed implementation called OpenAL-Soft.
Potentially, OpenAL-Soft could also be used as a regular cross-platform audio library, which would make our own platform-specific backend implementations obsolete (and greatly reduce the maintenance effort). Whether or not this is a viable direction is not yet clear, though.

The speed of entities required for Doppler shifts can be obtained via "numeric differentiation", i.e. taking two position updates and checking how far the entity has moved in the given amount of time.

For the environmental effects, data about the physical surroundings of the entities is required. In order to obtain these, the plugin API could be extended to allow plugins to provide this kind of data.

Note that there also exists a Qt component for handling spatial audio (https://doc.qt.io/qt-6/qtspatialaudio-index.html) that might be a viable option for us to use. However, ideally we wouldn't introduce a Qt dependency at the audio-processing level.

Mumble component

Client

OS-specific?

No

Additional information

Online resources:

  • https://github.com/leomccormack/Spatial_Audio_Framework
  • https://en.wikipedia.org/wiki/Head-related_transfer_function
  • https://en.wikipedia.org/wiki/Sound_localization
  • https://ieeexplore.ieee.org/document/5661988
  • https://github.com/kcat/openal-soft
  • https://www.openal.org/
  • https://github.com/layeh/barnard (Mumble client for the terminal that makes use of OpenAL)
  • https://doc.qt.io/qt-6/qtspatialaudio-index.html
  • https://gist.github.com/Hiradur/388cb7f658fe117a1f4ccfd9a21adffa
  • https://web.archive.org/web/20070601013325/http://www.soundblaster.com/eax/abouteax/eax5ahd/eax5_2.asp

Related issues:

  • #6532
  • #2324
  • #5934
  • #1933
  • #3234


@Hiradur commented on GitHub (Oct 3, 2024):

> The speed of entities required for Doppler shifts can be obtained via "numeric differentiation", i.e. taking two position updates and then checking how much the entity has moved in the given amount of time.

Please note that Mumble would have to know the in-game unit in which the movement speed is measured. This could be a parameter that a game plugin could set. But even then I think it could cause some weird artifacts, e.g. if a player teleports from one end of a map to another (huge difference between positions in a single time step). Some upper limit to safeguard against this would make sense.

> For the environmental effects, data about the physical surroundings of the entities is required. In order to obtain these, the plugin API could be extended to allow plugins to provide this kind of data.

This would be one way to do it; Creative chose another for EAX Voice (https://web.archive.org/web/20070601013325/http://www.soundblaster.com/eax/abouteax/eax5ahd/eax5_2.asp): back when EAX was popular, it received the environmental data from the game through the sound card driver and applied it to the microphone input stream, so that the processed stream was available to any VoIP software.
I don't think that OpenAL Soft supports this at the moment, and it would only work for games using EFX or EAX provided by OpenAL, but it wouldn't require any work on Mumble's side.

Here are some examples of EAX Voice:
https://www.youtube.com/watch?v=30fTc5t5QNU
https://www.youtube.com/watch?v=wxIYNG4TQ7U


@Krzmbrzl commented on GitHub (Oct 3, 2024):

> Mumble would have to know the ingame unit in which the movement speed is measured.

I would argue that we already require the positional data to be in meters, and since the respective audio is realtime, it would make sense for the time to be measured in seconds as well.

In order to account for games with very fast movement (e.g. cars or even spaceships) the plugin could set a speed multiplier in order to keep the Doppler effect on a sane level.

> Some upper limit to safeguard against this would make sense.

Absolutely!

> Creative chose another for EAX Voice

Interesting approach. Never heard of it. It sounds very convenient though.


@davidebeatrici commented on GitHub (Oct 3, 2024):

I was aware of EAX, but not EAX Voice. That feature is/was cool!

I already had a technique like that in mind, but the issue (as usual) is supporting specific games. In theory we could gather data directly from the audio library if a known/documented one is used, but otherwise it's going to be hard unless somebody has already reverse engineered the internals.


@QmwJlHuSg9pa commented on GitHub (Oct 3, 2024):

Your best bet would probably be to speak to the maintainer of openal-soft directly; kcat has made strides in recent years towards integrating EAX support into the project.


@mirh commented on GitHub (Oct 4, 2024):

> This would be one way to do it, Creative chose another for EAX Voice

I mean, that's just a matter of the different "places" where the mic effects are implemented/offered. But on the game side there is no difference: some kind of "predisposition" is required either way.

And in this sense, while OpenAL integration could certainly smooth things out for the games using it, I'm somewhat worried that the others, with some (https://learn.microsoft.com/en-us/windows/win32/xaudio2/how-to--integrate-x3daudio-with-xaudio2) degree (https://learn.microsoft.com/en-us/windows/win32/api/xaudio2fx/nf-xaudio2fx-xaudio2createreverb#remarks) of generalization, may instead be penalized by going higher level (though OpenAL could still be super useful to implement HRTF and other spatial effects).

> but it wouldn't require any work on Mumble's side.

That sound card driver thing by Creative? Of course not; it works for everybody. But we don't control RTKVHD64 or AtihdWT6.
So either you find a way to implement this in an APO (I'm not even sure that is possible, given that it would still have to poke inside game processes), or OpenAL will have to expose this information to the rest of the system in some way.

Such a ~frontend conundrum would then stack with the one I was left with for the backend at https://github.com/kcat/openal-soft/issues/415#issuecomment-2308399677

> kcat has made strides in recent years towards integrating EAX support into the project.

EAX *has* been integrated, nearly 3 years ago in one big PR already.


@will-ca commented on GitHub (Nov 2, 2024):

> I would argue that we already require the positional data to be in meters and since the respective audio is realtime, it would make sense for the time to be measured in seconds as well.

Would it make sense for distance/position to be unitless, and instead allow specifying a speed-of-sound parameter? (Default 340 s^-1, equivalent to meters at STP.)

Though IDK if that'll affect wavelength-dependent effects.


@Krzmbrzl commented on GitHub (Nov 3, 2024):

> Would it make sense for distance/position to be unitless, and instead allow specifying a speed-of-sound parameter? (Default 340 s^-1, equivalent to meters at STP.)

Not sure what problem this would try to solve though 🤔
