The "delossyfiers" advent

Discussion in 'Ai for Music' started by forart.it, Apr 24, 2025.

  1. David Brock

    David Brock Rock Star

    Joined:
    Sep 15, 2020
    Messages:
    444
    Likes Received:
    454
    Location:
    Royston Vasey
    Just tried a few 128 kbps files. Seems to work pretty well.
     
  2. ClarSum

    ClarSum Kapellmeister

    Joined:
    Aug 24, 2023
    Messages:
    126
    Likes Received:
    61
    No disagreement here; I'm simply pointing out another option, since I know many people here use MVSEP with registered accounts and so have access to the other file types. Also, for those who are confused about what this is, I thought the links might be helpful, especially given the number of user-generated demos they have on the site.

    For the Apollo Enhancers, the "Model Type" menu has an option labelled Universal Super Resolution (by MVSEP Team), and for AudioSR (Super Resolution) there's the ability to adjust the Cutoff (hz)... not sure if that answers that comment.
     
  3. boomoperator

    boomoperator Rock Star

    Joined:
    Oct 16, 2013
    Messages:
    654
    Likes Received:
    368
    I tried some of this delossifying myself. I guess it won't all of a sudden create a crisp Hi-Fi sound out of your 1901 wax cylinder - frequencies gone means frequencies gone. But maybe the delossifying process generates data for users to add frequencies to, while keeping it lossless?
     
    • Interesting x 1
  4. forart.it

    forart.it Kapellmeister

    Joined:
    May 5, 2023
    Messages:
    148
    Likes Received:
    74
    Well, exactly as in video upscaling, neural networks do NOT "restore" anything: they GUESS the frequencies discarded by the lossy encoding.

    In other words, they GENERATE/INVENT what has been trashed (with higher "fidelity" the more extensively they are trained).

    I honestly think it's still too early (as Detlef Kroll, the author of AudioDelossifier, claimed here, more extensive training would most likely lead to much better results), but the path is set.

    Anyway, I also believe that training should be performed on obsolete lossy audio encodings too, to be even more effective.

    I've found where the (PyTorch) models are stored thanks to the Apollo-Colab-Inference repo:
    - the original one (aka "MP3 Enhancer") is here;
    - the "Universal model for any lossy files by Lew" in his own repository.

    I'm pushing Ryan Metcalfe to integrate them into Intel's OpenVINO™ AI Plugins for Audacity: fingers crossed!

    Well you could do this if you had enough similar recordings and their high-fidelity versions to train the neural network in order to “understand” the correlation between the two signals.

    I'm interested in these approaches because I'd like to train a neural network to generate a near-lossless stereo signal, as if it were recorded "directly from the PA mixer", by running inference on the lossy audio tracks from multiple cameras/smartphones.

    I've found this interesting study/project, that has a similar - but somewhat different - goal:
    Audio Enhancement from Multiple Crowdsourced Recordings

    Inference GENERATES a completely new signal from the lossy one by GUESSING its (probable) lossless source.

    The concept behind this machine/deep learning approach is pretty simple: if a neural network "understands", through a HUGE number of examples (= files), which mathematical/statistical correlation exists between a lossless signal and its lossy version, then it will be able to go in the opposite direction and generate a plausible lossless signal from a lossy one.
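
To make the "huge number of examples" idea concrete, here is a minimal sketch of how (degraded, lossless) training pairs could be built. It is only an illustration: `fake_lossy` and `make_pairs` are hypothetical names, and a hard spectral cutoff is a crude stand-in for a real codec (an actual pipeline would encode to MP3/AAC and decode back). None of this is taken from Apollo, AudioDelossifier, or any other project mentioned here.

```python
import numpy as np

def fake_lossy(x: np.ndarray, sr: int, cutoff_hz: float) -> np.ndarray:
    """Crude stand-in for a lossy codec: zero every FFT bin above cutoff_hz."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    spec[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spec, n=len(x))

def make_pairs(clean_clips, sr, cutoff_hz=8000.0):
    """Build (degraded input, lossless target) pairs for supervised training."""
    return [(fake_lossy(clip, sr, cutoff_hz), clip) for clip in clean_clips]

# One second of a 440 Hz tone plus a quieter 12 kHz tone.
sr = 44100
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 12000 * t)

pairs = make_pairs([clean], sr)
degraded, target = pairs[0]  # the network sees `degraded` and is trained to output `target`
```

The 12 kHz component exists only in `target`, so a model trained on many such pairs has to learn to guess it from what is left in `degraded`.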
     
    Last edited: Jun 27, 2025
  5. forart.it

    forart.it Kapellmeister

    @ArticStorm I'm really sorry I missed the reply to your question!
    It all depends on how large a dataset the model is fed: more "examples" usually lead to better results, since the (mathematical/statistical) correlation between source and destination signals is more clearly "understood" by the neural network.

    I strongly recommend that you and @ClarSum check this very clear and simple explanation video by Leo Gibson on how neural networks work (it covers NAM, but it's basically the same for all):


    Last but not least, I invite anyone to check - and, why not, contribute to - the (WIP) collection of AI-based audio resources I've put together for the HyMPS project.
     
    Last edited: Jun 27, 2025
  6. ArticStorm

    ArticStorm Moderator Staff Member

    Joined:
    Jun 7, 2011
    Messages:
    8,587
    Likes Received:
    4,511
    Location:
    AudioSexPro
    @forart.it the problem is that at some point an even bigger dataset won't be enough: the model saturates and hits a plateau.
     
  7. forart.it

    forart.it Kapellmeister

    Maybe, but of course there are also different approaches: NAM, for example, uses a special training input file to "probe" the hardware and clone its sound...
    ...this could be applied to lossy audio "restoring", IMHO.

    EDIT:
    Ryan ported the Apollo model, so it will (probably) be included in the next version of the OpenVINO™ AI Plugins for Audacity.
     
    Last edited: Jun 28, 2025
    • Interesting x 2
  8. forart.it

    forart.it Kapellmeister

    [BREAKING] Python implementation released!

    A2SB: Audio-to-Audio Schrodinger Bridges
    Demopage: https://research.nvidia.com/labs/adlr/A2SB/
    Git: https://github.com/NVIDIA/diffusion-audio-restoration#readme
    HF: https://huggingface.co/nvidia/audio_to_audio_schrodinger_bridge
     
    • Interesting x 2
    • Like x 1
  9. ErnieBert

    ErnieBert Noisemaker

    Joined:
    Feb 11, 2025
    Messages:
    28
    Likes Received:
    5
    Thimeo - Stereo Tool - I use it in Winamp

    - Delossifier: Improves the sound of MPEG2/MP3 style lossy compressed files
     
    Last edited: Aug 13, 2025
  10. curtified

    curtified Audiosexual

    Joined:
    Feb 3, 2015
    Messages:
    1,019
    Likes Received:
    565
    I was about to post about Thimeo - Stereo Tool too! I've tried a bunch of these delossifier apps, algos, and plugins, and I second this recommendation. Not only does it do that, but it also has many other modules that can improve bad audio. It runs on all platforms and DAWs, even from the command line.

    You can find it on the sister site. Their forum also has updated betas for Mac and Windows that work with it. :wink:

    I initially did a deep dive here because a lot of AI audio is trained on low-bitrate music, so it generates that as an output. I've found Stereo Tool works best for me, in combination with other plugins and tools for post-processing.
     
    • Like x 1
    • Useful x 1
  11. curtified

    curtified Audiosexual

    I just "vibe coded" a Gradio GUI for inference with Cursor and the GPT-5 model, to run locally on my NVIDIA 4090, using that Git link.

    I still think Stereo Tool wins overall.

    You are Cursor-Agent using GPT-5. Task: pull and run ONLY the inference parts of NVIDIA’s diffusion-audio-restoration on Windows 11 with an RTX 4090 + CUDA, then add a Gradio UI for local audio. Keep upstream code intact; put any new files under tools/gradio_app. Steps:
    1) Clone and pin repo
    - Create a clean workspace folder DiffusionAudioA2SB
    - git clone https://github.com/NVIDIA/diffusion-audio-restoration.git .
    - Do NOT edit existing tracked files. New code lives under tools/gradio_app and scripts/.
    2) Create Windows CUDA Python env (conda preferred)
    - Create conda env a2sb with Python 3.10
    - Activate env
    - Install PyTorch + CUDA 12.1 wheels compatible with 4090:
    pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
    - Install required libs (from README “Requirements”):
    pip install numpy scipy matplotlib jsonargparse librosa soundfile einops pytorch-lightning rotary-embedding-torch ssr-eval torchaudio
    - Verify CUDA:
    python - << "PY"
    import torch; assert torch.cuda.is_available(); print("CUDA OK", torch.version.cuda, torch.cuda.get_device_name(0))
    PY
    3) Model checkpoints
    - Note: NVIDIA’s A2SB page says “Code & Model Checkpoints (coming soon).” If official checkpoints are published later, add a small downloader script at scripts/fetch_checkpoints.py that pulls them to weights/a2sb/ and document expected filenames.
    - Until official weights exist, structure the code to look for:
    weights/a2sb/a2sb_44k_be_split0.ckpt
    weights/a2sb/a2sb_44k_be_split1.ckpt
    weights/a2sb/a2sb_inpaint_split0.ckpt
    weights/a2sb/a2sb_inpaint_split1.ckpt
    - The inference API should gracefully error if files are missing with a clear message showing where to place them.
    4) Wire up inference-only entry points
    - Do NOT include or run any training code.
    - Use the repo’s existing inference utilities referenced in README:
    - inference/A2SB_upsample_api.py for bandwidth extension of arbitrary length audio with automatic rolloff detection
    - inference/A2SB_inpaint_dataset.py and related scripts for inpainting
    - Create a thin Python adapter tools/gradio_app/a2sb_runner.py that:
    - Loads CUDA device and checkpoints
    - Exposes two functions:
    restore_bandwidth(input_wav_path: str, n_steps: int, rolloff_hz: float|None, chunk_seconds: float, overlap_seconds: float) -> output_wav_path
    inpaint_gaps(input_wav_path: str, gap_ms: int, every_sec: float, n_steps: int, chunk_seconds: float, overlap_seconds: float) -> output_wav_path
    - Internally call the logic from inference/* files rather than duplicating models
    - Auto-detect sample rate, convert to 44.1k if needed, preserve channels if supported by the repo. If stereo is not supported, downmix to mono with a warning.
    5) Add a Gradio UI for local audio
    - File: tools/gradio_app/app.py
    - Features:
    - Upload or path picker for local WAV/FLAC/MP3 (librosa handles decode; write temp WAV at 44.1k)
    - Task toggle: “Bandwidth Extension” or “Inpainting”
    - Shared sliders with sensible defaults aligned to README guidance:
    - n_steps: int slider [25..400] default 200 (document that more steps may improve quality at cost of time)
    - chunk_seconds: float slider [2.0..10.0] default 3.0 (used for long audio tiling)
    - overlap_seconds: float slider [0.25..2.0] default 0.75 (for smoother stitching)
    - Bandwidth Extension specific:
    - rolloff_mode: dropdown [auto, custom] default auto
    - rolloff_hz: float slider [2000..12000] default 4000 visible when custom
    - Inpainting specific:
    - gap_ms: int slider [100..2000] default 1000
    - every_sec: float slider [2.0..10.0] default 5.0
    - Buttons: “Process” and “Open Output Folder”
    - Outputs: downloadable restored WAV + simple waveform preview
    - Gradio version: 4.x
    - All defaults set to conservative values recommended for quality; ensure GPU torch.no_grad() context and autocast where safe.
    6) CLI wrappers
    - Create scripts/run_be_api.bat:
    - Example: python inference\A2SB_upsample_api.py -f "<in.wav>" -o "outputs\restored.wav" -n 200
    - Create scripts/launch_gradio.bat to run the UI: python tools\gradio_app\app.py
    - Windows-friendly paths; create outputs/ if missing.
    7) Robustness and UX
    - If checkpoints are missing, show a single clear dialog in Gradio with the expected filenames and the local weights directory to drop them into.
    - Validate CUDA memory. If OOM, suggest reducing chunk_seconds or n_steps.
    - Log timing and GPU name at start of each run.
    8) README patch (new file docs/RUN_INFERENCE_WINDOWS.md)
    Include:
    - Env setup
    - How to place checkpoints
    - How to run scripts and the Gradio UI
    - Troubleshooting: CUDA driver versions, libsndfile install on Windows, FFmpeg optional for MP3.
    9) Execute now
    - Run the commands to set up the env and launch the Gradio UI so I can test locally:
    - conda create -y -n a2sb python=3.10
    - conda activate a2sb
    - pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
    - pip install numpy scipy matplotlib jsonargparse librosa soundfile einops pytorch-lightning rotary-embedding-torch ssr-eval gradio==4.44.0
    - python - << "PY"
    import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available(), torch.cuda.get_device_name(0))
    PY
    - mkdir -p weights/a2sb outputs tools/gradio_app scripts docs
    - Create the files specified above
    - python tools/gradio_app/app.py
    10) Deliverables
    - tools/gradio_app/app.py and a2sb_runner.py
    - scripts/run_be_api.bat and scripts/launch_gradio.bat
    - docs/RUN_INFERENCE_WINDOWS.md
    - No changes to existing repo files
    - Print next steps and where to drop checkpoints once the UI starts
    Now start.
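
For anyone wondering what the chunk_seconds / overlap_seconds sliders in the prompt above amount to, here is a rough sketch of overlapped tiling with crossfaded stitching. It is an assumption about how such tiling could be done, not code from NVIDIA's repo: `process_in_chunks` is a hypothetical helper, and `process` stands in for whatever model call restores one chunk (it must return as many samples as it receives).

```python
import numpy as np

def process_in_chunks(x, sr, chunk_seconds, overlap_seconds, process):
    """Run `process` on overlapping chunks of a mono signal and crossfade the results."""
    chunk = int(round(chunk_seconds * sr))
    overlap = int(round(overlap_seconds * sr))
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must be non-negative and shorter than the chunk")
    hop = chunk - overlap
    out = np.zeros(len(x), dtype=np.float64)
    weight = np.zeros(len(x), dtype=np.float64)
    for start in range(0, len(x), hop):
        seg = x[start:start + chunk]
        y = np.asarray(process(seg), dtype=np.float64)
        w = np.ones(len(seg))
        ramp = min(overlap, len(seg))
        if ramp > 0:
            # Strictly positive linear ramp, so the summed weights never reach zero.
            fade = np.linspace(0.0, 1.0, ramp + 2)[1:-1]
            if start > 0:
                w[:ramp] = fade            # fade in over the previous chunk's tail
            if start + chunk < len(x):
                w[-ramp:] = fade[::-1]     # fade out into the next chunk
        out[start:start + len(seg)] += y * w
        weight[start:start + len(seg)] += w
        if start + chunk >= len(x):
            break
    # Dividing by the summed weights makes the crossfades sum to unity gain.
    return out / weight

# Identity "model": the stitched output should match the input.
sr = 1000
x = np.random.default_rng(0).standard_normal(10 * sr)
y = process_in_chunks(x, sr, 3.0, 0.75, lambda seg: seg)
```

Smaller chunks lower peak memory per call, which is why reducing chunk_seconds is a sensible first thing to try after an out-of-memory error.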
     
    Last edited: Aug 13, 2025
    • Funny x 1
    • Interesting x 1
  12. forart.it

    forart.it Kapellmeister

    There's a similar function in RX too, but a deep learning/neural network approach should be (much) more effective.

    ...what about a Jupyter notebook (Colab) to let anyone test it?

    According to their tests it seems to produce better results than others (note: I've never heard about CQTDiff nor IBAR):
    [Image: spectrogram comparison chart]


    ...anyway I would really like to see some "official" listening tests @ Hydrogenaudio...
     
    Last edited: Aug 14, 2025
  13. curtified

    curtified Audiosexual

    I can try to vibe code one for Colab. I think Colab has AI built in now, so I'm assuming you can just give it the GitHub link to test?
     
  14. curtified

    curtified Audiosexual


    Attached Files:

  15. ceo54

    ceo54 Producer

    Joined:
    Jan 28, 2019
    Messages:
    307
    Likes Received:
    94
    I ran it on Colab yesterday, only to run out of RAM. I tried a lower setting, -n 30, which ended the same way. I also found out that the package doesn't have a requirements.txt, so I painstakingly installed every dependency manually. And I found out that it was trained on only 1 channel, so the output will always be a mono file. I tried to get around the issue by splitting the left and right channels and processing them separately, only to run out of RAM yet again. I finally gave up without seeing the results.
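
The left/right workaround described above can be sketched like this. It assumes audio laid out as a (samples, channels) array, which is how the soundfile library returns it; `process_stereo_per_channel` is a hypothetical helper and `process_mono` stands in for the mono-only model call. Channels are handled one after the other, so this does not lower the memory needed for a single channel.

```python
import numpy as np

def process_stereo_per_channel(audio, process_mono):
    """Apply a mono-only function to each channel of a (samples, channels) array."""
    audio = np.asarray(audio)
    if audio.ndim == 1:
        # Already mono: nothing to split.
        return process_mono(audio)
    # Process the channels sequentially, then put them back side by side.
    channels = [process_mono(audio[:, ch]) for ch in range(audio.shape[1])]
    return np.stack(channels, axis=1)

# Dummy stereo signal and a placeholder "model" that just halves the level.
stereo = np.stack([np.ones(8), -np.ones(8)], axis=1)
restored = process_stereo_per_channel(stereo, lambda mono: 0.5 * mono)
```

One caveat: a generative model run independently on each channel may invent slightly different high-frequency content for left and right, which could affect the stereo image.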

    I don't think there's anything out there that can beat Sony's DSEE yet. But it's not available as a standalone app or as a plugin, so there's no way to use it.
     
  16. ceo54

    ceo54 Producer

    "google colab prompt feature"

    What is it? Could you please expand on that? And if you did successfully run it, could you share the results? Audio files or spectrograms?

    Thank you kindly.
     
  17. ceo54

    ceo54 Producer

    What is that GroundTruth in your spectrogram comparison chart? I did a Google search but came up empty-handed. It looks interesting enough to try out.
     
  18. xorome

    xorome Audiosexual

    Joined:
    Sep 28, 2021
    Messages:
    1,521
    Likes Received:
    1,144
    Probably: "Ground truth" is the source's original spectrum. "Degraded" is the spectrum after encoding "Ground truth" to a lossy format. The other spectra are reconstructed from "Degraded" using AI, which are then compared to "Ground truth" to judge the reconstruction's accuracy.
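
As an illustration of that kind of comparison, here is a small sketch that scores a signal against the ground truth using log-spectral distance, a commonly used metric for bandwidth extension. Whether the chart above relies on this exact metric is not stated in the thread, and `log_spectral_distance` with its default frame sizes is just an assumption for the example.

```python
import numpy as np

def log_spectral_distance(reference, estimate, n_fft=2048, hop=512, eps=1e-10):
    """Mean log-spectral distance (in dB) between two equal-length mono signals."""
    window = np.hanning(n_fft)

    def power_spec(x):
        # Windowed frames -> power spectrum, shape (n_frames, n_fft // 2 + 1).
        frames = [x[i:i + n_fft] * window for i in range(0, len(x) - n_fft + 1, hop)]
        return np.abs(np.fft.rfft(np.array(frames), axis=1)) ** 2

    ref, est = power_spec(reference), power_spec(estimate)
    diff = 10.0 * np.log10(ref + eps) - 10.0 * np.log10(est + eps)
    # RMS over frequency for each frame, then average over frames.
    return float(np.mean(np.sqrt(np.mean(diff ** 2, axis=1))))

sr = 44100
t = np.arange(sr) / sr
ground_truth = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 12000 * t)
degraded = np.sin(2 * np.pi * 440 * t)  # the 12 kHz content has been "trashed"

lsd_same = log_spectral_distance(ground_truth, ground_truth)
lsd_degraded = log_spectral_distance(ground_truth, degraded)
```

A reconstruction from "Degraded" would be scored the same way: the closer its distance to the ground truth gets to zero, the more accurate the guess.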
     
    • Agree x 2
    • Like x 1
  19. forart.it

    forart.it Kapellmeister

    Yes, it's a very bad implementation: it needs much more refinement (starting with a requirements.txt).

    Stereo is supported too, but you have to change the config inside A2SB_lightning_module.py / A2SB_lightning_module_api.py (but don't ask me how).

    Check the demo page: https://research.nvidia.com/labs/adlr/A2SB/
     
  20. Legotron

    Legotron Audiosexual

    Joined:
    Apr 24, 2017
    Messages:
    2,451
    Likes Received:
    2,379
    Location:
    Hyperborea
    @curtified I tried your prompt with Gemini, but all I got was an error. I wish I could code or compile stuff... If someone manages to build an executable for Windows, please share it here; I'd like to test it with historical records. TIA
     