The "delossyfiers" advent

David Brock · Jun 26, 2025

Just tried a few 128kbps. Seems to work pretty well.

ClarSum · Jun 26, 2025

forart.it said: ↑

...but MVSEP is (almost) the same of using gcolab or huggingface, just better interface.
Click to expand...

No disagreement here, simply alerting to another option as I know many people here use MVSEP with registered accounts so have access to the other file types. Also for those who are confused about what this is I thought the links might be helpful especially with the amount of user generated demos they have on the site.

forart.it said: ↑

...but they don't shares any (own/3rdparty) model, if I'm not wrong...
Click to expand...

For the Apollo Enhancers in the "Model Type" there is an option labelled Universal Super Resolution (by MVSEP Team) and for the AudioSR (Super Resolution) there's the ability to adjust the Cutoff (hz)... not sure if that answers that comment.

boomoperator · Jun 26, 2025

Legotron said: ↑

I tried it on 1901 wax cylinder recording...
Click to expand...

I tried some of this delossifying myself. I guess it won't all of a sudden create a crisp Hi-Fi sound out of your 1901 wax cylinder - frequencies gone means frequencies gone. But maybe the delossifyng process generates data for users to add frequencies to, while keeping it lossless?

forart.it · Jun 27, 2025

Balisani said: ↑

My point being, I'm not so sure an ai algorithm can restore audio spectrum that's been deleted altogether - whereas it's fairly easy to analyze a line (a p line) and double it, and render a seemingly higher resolution picture or video.
Click to expand...

Well, exactly as in video upscaling, neural networks does NOT "restore" anything but GUESSES missing frequencies of lossy encoding.

In other words they GENERATES/INVENT (with higher "fidelity" larger are trained) what has been trashed.

David Brock said: ↑

Just tried a few 128kbps. Seems to work pretty well.
Click to expand...

I honestly think it's still too early (as Detlef Kroll - the author of AudioDelossifier - claimed here, more extensive training would most likely lead to mutch better results), but the path is signed.

Anyway I do also believe that training should be performed on obsolete lossy audio encodings to be even more effective.

ClarSum said: ↑

For the Apollo Enhancers in the "Model Type" there is an option labelled Universal Super Resolution (by MVSEP Team) and for the AudioSR (Super Resolution) there's the ability to adjust the Cutoff (hz)... not sure if that answers that comment.
Click to expand...

I've found where the (Pytorch) models are stored thanks to the Apollo-Colab-Inference repo:
- the original one (aka "MP3 Enhancer") is here;
- the "Universal model for any lossy files by Lew" in his own repository.

I'm pushing Ryan Metcalfe to integrate them into Intel's OpenVINO™ AI Plugins for Audacity: fingers crossed !

boomoperator said: ↑

I guess it won't all of a sudden create a crisp Hi-Fi sound out of your 1901 wax cylinder - frequencies gone means frequencies gone.
Click to expand...

Well you could do this if you had enough similar recordings and their high-fidelity versions to train the neural network in order to “understand” the correlation between the two signals.

I'm interested in these approaches 'cause I'd like to train a neural network to generate a near-losless stereo signal as if it were recorded "directly from the PA mixer" by inferencing on multiple cameras/smartphones lossy audio tracks.

I've found this interesting study/project, that has a similar - but somewhat different - goal:
Audio Enhancement from Multiple Crowdsourced Recordings

boomoperator said: ↑

But maybe the delossifyng process generates data for users to add frequencies to, while keeping it lossless?
Click to expand...

Inference GENERATES a completely new signal from the lossy audio one by GUESSING the (probable) lossless source for.

The concept behind this machine/deep learning approach is pretty simple: if a neural network "understand", through a HUGE number of examples (= files), which mathematical/statistical correlation occurs between a lossless signal and its lossy version, then it will then be able to generate the opposite.

Last edited: Jun 27, 2025

forart.it · Jun 27, 2025

@ArticStorm I'm really sorry I missed the reply to your question!

ArticStorm said: ↑

i am wondering if we ever will be able to restore after stem separation the fuzzy tails, basically that part of the audio, which got lost due to interferences in the mix itself.
Click to expand...

It all depends on the dataset hugeness the machine learning is feeded with: more "examples" usually brings to better results, since the (mathematical/statistical) correlation between source and destination signals correlation is more clearly "understood" by the neural network.

I strongly recommend you and @ClarSum to check this very clear and simple explaination video by Leo Gibson on how (NAM, but it's basically the same for all) neural networks works:

Last but not least, I invite anyone to check - and, why not, contribute to - the (WIP) AI-based audio resources collection I've realized for the HyMPS project.

Last edited: Jun 27, 2025

ArticStorm · Jun 27, 2025

@forart.it problem is at some point a more huge dataset wont be enough, it will get saturated and a plateau is hit.

forart.it · Jun 28, 2025

ArticStorm said: ↑

@forart.it problem is at some point a more huge dataset wont be enough, it will get saturated and a plateau is hit.
Click to expand...

Maybe, but of course there are also different approaches: NAM, for example, uses a special training input file to "probe" the hardware and clone its sound...
...this could be applied for lossy audio "restoring", IMHO.

EDIT:
Ryan ported Apollo model so it will be (probably) included in the next OpenVINO™ AI Plugins for Audacity version.

Last edited: Jun 28, 2025

forart.it · Aug 13, 2025

[BREAKING] Python implementation released !

A2SB: Audio-to-Audio Schrodinger Bridges

Overview
Real-world audio is often degraded by factors such as recording devices, compression, and online transfer. For instance, certain recording devices and compression methods can result in a low sampling rate, while online transfers can lead to missing audio segments. These problems are usually ill-posed and solved with data-driven generative models.

While there are many existing works for speech enhancement at around 16-24kHz, high-res music restoration at high sampling rate (44.1kHz) is a more challenging task and less studied. Our proposed A2SB model can restore music that is lowpassed or has missing segments. Compared to existing approaches, A2SB has the following advantages:

A2SB has significantly better generation quality.

A2SB is trained on large scale permissively-licensed music datasets.

A2SB is an end-to-end model requiring no vocoder, making it easier to deploy.

A2SB can support multiple tasks (bandwidth extension and inpainting) with a single model.

A2SB can restore hour-long audio without boundary artifacts.

Click to expand...

Demopage: https://research.nvidia.com/labs/adlr/A2SB/
Git: https://github.com/NVIDIA/diffusion-audio-restoration#readme
HF: https://huggingface.co/nvidia/audio_to_audio_schrodinger_bridge

ErnieBert · Aug 13, 2025

Thimeo - Stereo Tool - i use in WinAMP

- Delossifier: Improves the sound of MPEG2/MP3 style lossy compressed files

Last edited: Aug 13, 2025

curtified · Aug 13, 2025

ErnieBert said: ↑

Thimeo - Stereo Tool - i use in WinAMP

- Delossifier: Improves the sound of MPEG2/MP3 style lossy compressed files
Click to expand...

I was about to post about Thimeo - Stereo Tool too! Ive tried a bunch of these delosifier apps, algos, and plugins. I will second this recommendation. Not only does it do that but it also has many other modules in it that can improve bad audio. It runs on all platforms, daws, even commandline.

U can find it on sister site. And their forum has updated betas for mac and windows that work with it.

I initially did a deep dive here because a lot of AI audio is trained on low bitrate music so it generates that as an output. Ive found Stereo Tool is the best for me in combination with other plugins and tools post processing.

curtified · Aug 13, 2025

forart.it said: ↑

[BREAKING] Python implementation released !

A2SB: Audio-to-Audio Schrodinger Bridges

Demopage: https://research.nvidia.com/labs/adlr/A2SB/
Git: https://github.com/NVIDIA/diffusion-audio-restoration#readme
HF: https://huggingface.co/nvidia/audio_to_audio_schrodinger_bridge
Click to expand...

I just "Vibe Coded" a gradio gui for inference with cursor and GPT5 model to run locally on my NVDIA 4090, Using that git link..

I still think overall stereo tool wins.

You are Cursor-Agent using GPT-5. Task: pull and run ONLY the inference parts of NVIDIA’s diffusion-audio-restoration on Windows 11 with an RTX 4090 + CUDA, then add a Gradio UI for local audio. Keep upstream code intact; put any new files under tools/gradio_app. Steps:
1) Clone and pin repo
- Create a clean workspace folder DiffusionAudioA2SB
- git clone https://github.com/NVIDIA/diffusion-audio-restoration.git .
- Do NOT edit existing tracked files. New code lives under tools/gradio_app and scripts/.
2) Create Windows CUDA Python env (conda preferred)
- Create conda env a2sb with Python 3.10
- Activate env
- Install PyTorch + CUDA 12.1 wheels compatible with 4090:
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
- Install required libs (from README “Requirements”):
pip install numpy scipy matplotlib jsonargparse librosa soundfile einops pytorch-lightning rotary-embedding-torch ssr-eval torchaudio
- Verify CUDA:
python - << "PY"
import torch; assert torch.cuda.is_available(); print("CUDA OK", torch.version.cuda, torch.cuda.get_device_name(0))
PY
3) Model checkpoints
- Note: NVIDIA’s A2SB page says “Code & Model Checkpoints (coming soon).” If official checkpoints are published later, add a small downloader script at scripts/fetch_checkpoints.py that pulls them to weights/a2sb/ and document expected filenames.
- Until official weights exist, structure the code to look for:
weights/a2sb/a2sb_44k_be_split0.ckpt
weights/a2sb/a2sb_44k_be_split1.ckpt
weights/a2sb/a2sb_inpaint_split0.ckpt
weights/a2sb/a2sb_inpaint_split1.ckpt
- The inference API should gracefully error if files are missing with a clear message showing where to place them.
4) Wire up inference-only entry points
- Do NOT include or run any training code.
- Use the repo’s existing inference utilities referenced in README:
- inference/A2SB_upsample_api.py for bandwidth extension of arbitrary length audio with automatic rolloff detection
- inference/A2SB_inpaint_dataset.py and related scripts for inpainting
- Create a thin Python adapter tools/gradio_app/a2sb_runner.py that:
- Loads CUDA device and checkpoints
- Exposes two functions:
restore_bandwidth(input_wav_path: str, n_steps: int, rolloff_hz: float|None, chunk_seconds: float, overlap_seconds: float) -> output_wav_path
inpaint_gaps(input_wav_path: str, gap_ms: int, every_sec: float, n_steps: int, chunk_seconds: float, overlap_seconds: float) -> output_wav_path
- Internally call the logic from inference/* files rather than duplicating models
- Auto-detect sample rate, convert to 44.1k if needed, preserve channels if supported by the repo. If stereo is not supported, downmix to mono with a warning.
5) Add a Gradio UI for local audio
- File: tools/gradio_app/app.py
- Features:
- Upload or path picker for local WAV/FLAC/MP3 (librosa handles decode; write temp WAV at 44.1k)
- Task toggle: “Bandwidth Extension” or “Inpainting”
- Shared sliders with sensible defaults aligned to README guidance:
- n_steps: int slider [25..400] default 200 (document that more steps may improve quality at cost of time)
- chunk_seconds: float slider [2.0..10.0] default 3.0 (used for long audio tiling)
- overlap_seconds: float slider [0.25..2.0] default 0.75 (for smoother stitching)
- Bandwidth Extension specific:
- rolloff_mode: dropdown [auto, custom] default auto
- rolloff_hz: float slider [2000..12000] default 4000 visible when custom
- Inpainting specific:
- gap_ms: int slider [100..2000] default 1000
- every_sec: float slider [2.0..10.0] default 5.0
- Buttons: “Process” and “Open Output Folder”
- Outputs: downloadable restored WAV + simple waveform preview
- Gradio version: 4.x
- All defaults set to conservative values recommended for quality; ensure GPU torch.no_grad() context and autocast where safe.
6) CLI wrappers
- Create scripts/run_be_api.bat:
- Example: python inference\A2SB_upsample_api.py -f "<in.wav>" -o "outputs\restored.wav" -n 200
- Create scripts/launch_gradio.bat to run the UI: python tools\gradio_app\app.py
- Windows-friendly paths; create outputs/ if missing.
7) Robustness and UX
- If checkpoints are missing, show a single clear dialog in Gradio with the expected filenames and the local weights directory to drop them into.
- Validate CUDA memory. If OOM, suggest reducing chunk_seconds or n_steps.
- Log timing and GPU name at start of each run.
8) README patch (new file docs/RUN_INFERENCE_WINDOWS.md)
Include:
- Env setup
- How to place checkpoints
- How to run scripts and the Gradio UI
- Troubleshooting: CUDA driver versions, libsndfile install on Windows, FFmpeg optional for MP3.
9) Execute now
- Run the commands to set up the env and launch the Gradio UI so I can test locally:
- conda create -y -n a2sb python=3.10
- conda activate a2sb
- pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
- pip install numpy scipy matplotlib jsonargparse librosa soundfile einops pytorch-lightning rotary-embedding-torch ssr-eval gradio==4.44.0
- python - << "PY"
import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available(), torch.cuda.get_device_name(0))
PY
- mkdir -p weights/a2sb outputs tools/gradio_app scripts docs
- Create the files specified above
- python tools/gradio_app/app.py
10) Deliverables
- tools/gradio_app/app.py and a2sb_runner.py
- scripts/run_be_api.bat and scripts/launch_gradio.bat
- docs/RUN_INFERENCE_WINDOWS.md
- No changes to existing repo files
- Print next steps and where to drop checkpoints once the UI starts
Now start.

Last edited: Aug 13, 2025

forart.it · Aug 14, 2025

ErnieBert said: ↑

Thimeo - Stereo Tool - i use in WinAMP
Click to expand...

There's a similar function in RX too, but deep learning/neural networks approach should be (mutch) more effective.

curtified said: ↑

I just "Vibe Coded" a gradio gui for inference with cursor and GPT5 model to run locally on my NVDIA 4090, Using that git link..
Click to expand...

...what about a jupiter notebook (colab) to let anyone test it ?

curtified said: ↑

I still think overall stereo tool wins.
Click to expand...

According to their tests it seems to produce better results than others (note: I've never heard about CQTDiff nor IBAR):

...anyway I would really like to see some "official" listening tests @ Hydrogenaudio...

Last edited: Aug 14, 2025

curtified · Aug 14, 2025

forart.it said: ↑

...what about a jupiter notebook (colab) to let anyone test it ?
Click to expand...

I can try to vibe code one for colab. I think colab has AI built in now so u can just give it the github to test im assuming?

curtified · Aug 14, 2025

curtified said: ↑

I can try to vibe code one for colab. I think colab has AI built in now so u can just give it the github to test im assuming?
Click to expand...

I did a verrrrry clunky prompt build using google colab prompt feature. You might have to prompt it some more to dial it in. Its late and i only spent a few minutes on it.

https://colab.research.google.com/drive/1HqISJHZ0UOmjNOqKcKk4ofWoCRUEWK85?usp=sharing

ceo54 · Aug 14, 2025

forart.it said: ↑

There's a similar function in RX too, but deep learning/neural networks approach should be (mutch) more effective.

...what about a jupiter notebook (colab) to let anyone test it ?

According to their tests it seems to produce better results than others (note: I've never heard about CQTDiff nor IBAR):

...anyway I would really like to see some "official" listening tests @ Hydrogenaudio...
Click to expand...

I ran it on Colab yesterday only to run out of RAM. I tried lower settings -n 30 that ended in the same way. I also found out that the package doesn't have a requirements.txt so I manullay downloaded every dependency painstakingly. And I also found out that it was trained on only 1 channel so the output will always be a mono file. I tried to get around the issue by splitting the left and right channel and processing them seperately only to run out of Ram yet again. I finally gave up without seeing the results.

I don't think there's anything out there that can beat the Sony's DSEE yet. But it's not available as a standalone or as a plugin so there's no way to use it.

ceo54 · Aug 14, 2025

curtified said: ↑

I did a verrrrry clunky prompt build using google colab prompt feature. You might have to prompt it some more to dial it in. Its late and i only spent a few minutes on it.

https://colab.research.google.com/drive/1HqISJHZ0UOmjNOqKcKk4ofWoCRUEWK85?usp=sharing

Click to expand...

"google colab prompt feature"

What is it, could you please expand on that ? And if you did successfully ran it, may you share the results ? Audio files or spectrograms ?

Thank you kindly.

ceo54 · Aug 14, 2025

forart.it said: ↑

There's a similar function in RX too, but deep learning/neural networks approach should be (mutch) more effective.

...what about a jupiter notebook (colab) to let anyone test it ?

According to their tests it seems to produce better results than others (note: I've never heard about CQTDiff nor IBAR):

...anyway I would really like to see some "official" listening tests @ Hydrogenaudio...
Click to expand...

What is that GroundTruth in your spectrogram comparison chart ? I did a Google search but it came out empty handed. Looks interesting enough to try it out.

xorome · Aug 14, 2025

Probably: "Ground truth" is the source's original spectrum. "Degraded" is the spectrum after encoding "Ground truth" to a lossy format. The other spectra are reconstructed from "Degraded" using AI, which are then compared to "Ground truth" to judge the reconstruction's accuracy.

forart.it · Aug 14, 2025

ceo54 said: ↑

I ran it on Colab yesterday only to run out of RAM. I tried lower settings -n 30 that ended in the same way. I also found out that the package doesn't have a requirements.txt so I manullay downloaded every dependency painstakingly.
Click to expand...

Yes, it's very bad implementation: it needs mutch more refinements (starting from requirements.txt).

ceo54 said: ↑

And I also found out that it was trained on only 1 channel so the output will always be a mono file.
Click to expand...

Stereo is supported too, but you have to change the config inside A2SB_lightning_module.py / A2SB_lightning_module_api.py (but don't ask me how).

ceo54 said: ↑

What is that GroundTruth in your spectrogram comparison chart ? I did a Google search but it came out empty handed. Looks interesting enough to try it out.
Click to expand...

Check the demo page: https://research.nvidia.com/labs/adlr/A2SB/

Legotron · Aug 14, 2025

@curtified I tried your prompt with Gemini, but all I got was error. I wish I could code or compile stuff.. If someone manage to make it executable for windows, please share it here, I´d like to test it with historical records. TIA

Similar Threads - delossyfiers advent	Forum	Date
Prismatic Spray - Bytebeat Adventure Synth - $249 - 05/12/22	Synthesizers	Dec 6, 2022
Ghosthack launches 2022 Advent Calendar with freebies, giveaways, deals & more	Software News	Dec 3, 2022
Advent calendar - 1 Dec to 24 Dec - Please post links here	Lounge	Nov 30, 2022
Advent Calendar - Free Samples and Software	Software News	Dec 3, 2021
The Tone Adventure Soundbank by Abc Sound Design for the Arturia - Mini-V VST	Presets, Patches	Apr 14, 2020

The "delossyfiers" advent

David Brock Rock Star

ClarSum Kapellmeister

boomoperator Rock Star

forart.it Kapellmeister

forart.it Kapellmeister

ArticStorm Moderator Staff Member

forart.it Kapellmeister

forart.it Kapellmeister

ErnieBert Noisemaker

curtified Audiosexual

curtified Audiosexual

forart.it Kapellmeister

curtified Audiosexual

curtified Audiosexual

Attached Files:

Screenshot 2025-08-14 at 1.31.33 AM.png

ceo54 Producer

ceo54 Producer

ceo54 Producer

xorome Audiosexual

forart.it Kapellmeister

Legotron Audiosexual

PROFESSIONAL AUDIO LOVERS

The "delossyfiers" advent

David Brock Rock Star

ClarSum Kapellmeister

boomoperator Rock Star

forart.it Kapellmeister

forart.it Kapellmeister

ArticStorm Moderator Staff Member

forart.it Kapellmeister

forart.it Kapellmeister

ErnieBert Noisemaker

curtified Audiosexual

curtified Audiosexual

forart.it Kapellmeister

curtified Audiosexual

curtified Audiosexual

Attached Files:

Screenshot 2025-08-14 at 1.31.33 AM.png

ceo54 Producer

ceo54 Producer

ceo54 Producer

xorome Audiosexual

forart.it Kapellmeister

Legotron Audiosexual

Useful Searches