UVR5 the Best AI stem separation algo?

Deuterium · Oct 11, 2023

very impressive

VocalEnthusiast · Oct 15, 2023

I have access to Ripple's model. The model it uses is hosted on SAMI ByteDance servers, but I reverse-engineered the iOS app to see how it makes the requests, and wrote a script to make requests for me to separate any audio file I want into two or four stems. It uses the Spongeband API server and an Amazon S3 proxy to upload audio files.

If anybody wants me to separate a track for them, drop me a message.

But so far I'm actually not at all impressed with the quality of their model. Kim's vocal models and MDX23C seem to be far superior to what I'm getting out of Ripple's model. Those models are also the best and most reliable that I've used.

Maybe SAMI ByteDance aren't using their best model for the Ripple iOS app, but surely it's in their interests to use the best model they have? Maybe they're using a much smaller model to keep processing times down, seeing as it's hosted on their servers. Their model only seems to take 5-10 seconds to do a two stem separation on a 200+ second audio file.

jarredou · Oct 15, 2023

HansPaetsch said: ↑

Hi,

can somebody explain to me, what is meant by the UVR-algorithms, which have the word "MGM" in it? Is it made for MGM (Metro-Goldwyn-Mayer) Recordings?

It's an old naming convention for VR arch models; it just means "Multi Genre Model".

jarredou · Oct 15, 2023

VocalEnthusiast said: ↑

I have access to Ripple's model. The model it uses is hosted on SAMI ByteDance servers, but I reverse-engineered the iOS app to see how it makes the requests, and wrote a script to make requests for me to separate any audio file I want into two or four stems. It uses the Spongeband API server and an Amazon S3 proxy to upload audio files.

If anybody wants me to separate a track for them, drop me a message.

But so far I'm actually not at all impressed with the quality of their model. Kim's vocal models and MDX23C seem to be far superior to what I'm getting out of Ripple's model. Those models are also the best and most reliable that I've used.

Maybe SAMI ByteDance aren't using their best model for the Ripple iOS app, but surely it's in their interests to use the best model they have? Maybe they're using a much smaller model to keep processing times down, seeing as it's hosted on their servers. Their model only seems to take 5-10 seconds to do a two stem separation on a 200+ second audio file.

Yeah, it's a more lightweight version to speed things up in Ripple, and it also uses no overlap (so it can potentially generate clicks every 8 seconds, at audio chunk boundaries).
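To illustrate why overlap matters at chunk boundaries, here is a minimal sketch of joining separately processed chunks with a linear crossfade. It is not Ripple's code; the function name `crossfade_join`, the `overlap` parameter, and the assumption that each chunk's first `overlap` samples cover the same audio as the previous chunk's last `overlap` samples are all made up for this example.

```python
def crossfade_join(chunks, overlap):
    """Join processed chunks into one signal.

    Assumption (hypothetical setup): the last `overlap` samples of each chunk
    cover the same stretch of audio as the first `overlap` samples of the next
    chunk, and every chunk is at least `overlap` samples long.
    With overlap == 0 the chunks are simply butted together, which is where a
    click can appear if the two sides do not line up.
    """
    if not chunks:
        return []
    out = list(chunks[0])
    for chunk in chunks[1:]:
        if overlap == 0:
            out.extend(chunk)  # hard cut at the boundary
            continue
        tail = out[-overlap:]   # end of what we have so far
        head = chunk[:overlap]  # start of the next chunk
        base = len(out) - overlap
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)  # fade-in weight for the new chunk
            out[base + i] = (1.0 - w) * tail[i] + w * head[i]
        out.extend(chunk[overlap:])
    return out


# Two chunks that disagree at the boundary (1.0 vs 0.0):
hard_cut = crossfade_join([[1.0] * 4, [0.0] * 4], overlap=0)
smoothed = crossfade_join([[1.0] * 4, [0.0] * 4], overlap=2)
```

With no overlap the signal jumps straight from 1.0 to 0.0 at the boundary; with an overlap of two samples the jump is spread over several smaller steps.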

Congrats on reverse-engineering the API requests! I know that some people from the Audio Separation Discord server have tried but failed to sort it out. It seems that their models/API are also used in the capcut.cn software; I don't know which version (light or full) is used there.

Last edited by a moderator: Oct 15, 2023

BlackHaze1986 · Oct 15, 2023

clone said: ↑

RipX is very impressive. Way better than Izotope Music Rebalance with what I have used them on. But it is rather slow processing.

Acon Digital just released a plugin called Remix which I have not read much about or tried at all. But I am very interested in trying it soon, because their general audio restoration plugins are very good. They describe it as "Real-Time" and low CPU usage. But these companies make lots of claims, as we all know.


Yes, we know, and maybe it's true if you use today's top-notch CPUs, but in my opinion such a claim is useless without the system specs that were used to test CPU usage. By the way, thanks @clone for the hint about the Remix plugin. Does anybody know how it compares to the stem separation used in Serato Sample 2.0? Is it better or on the same level? I might give it a try.

Lanz · Oct 15, 2023

Who can advise on removing dialog from video, for example movies and TV series? I've looked through a lot of topics and found nothing useful.

deton24 · Oct 16, 2023

Something that works well for instrumentals or vocals will also do a decent job on dialog in videos.
Look around:
https://docs.google.com/document/d/...bAxEh_OBv94ZdRG5c/edit#heading=h.rz0d5zk9ms4w

There are also dedicated models for separating audio tracks from video, like CDX23, available on mvsep.com.

BlackHaze1986 · Oct 16, 2023

Lanz said: ↑

Who can advise on removing dialog from video, for example movies and TV series? I've looked through a lot of topics and found nothing useful.

Just remove the soundtrack with a video converter and you're good to go. Or do you mean keep the music etc. and remove only the dialog?

Lanz · Oct 16, 2023

BlackHaze1986 said: ↑

Just remove the soundtrack with a video converter and you're good to go. Or do you mean keep the music etc. and remove only the dialog?

Yes, remove only the dialog. I understand that it can be done with 5.1, but what about stereo?

BlackHaze1986 · Oct 16, 2023

Lanz said: ↑

Yes, remove only the dialog. I understand that it can be done with 5.1, but what about stereo?

Separate the soundtrack from the video, let the AI of your choice remove the vocals, and then add the soundtrack back to the video. But I don't get why you would want a TV series or movie without dialog. Is there a special reason? The only reason that comes to mind would be a fan dub.
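The extract and re-add steps described above could be sketched with ffmpeg as follows. This is only one possible way to do it, not necessarily the tool the poster had in mind; the file names are hypothetical, and the separation step itself (whatever AI tool you choose) happens between the two commands.

```python
import subprocess


def extract_audio_cmd(video_in, wav_out):
    """Build an ffmpeg command that drops the video (-vn) and writes the
    audio track as 16-bit PCM WAV, ready for a separation tool."""
    return ["ffmpeg", "-i", video_in, "-vn", "-acodec", "pcm_s16le", wav_out]


def remux_cmd(video_in, audio_in, video_out):
    """Build an ffmpeg command that takes the picture from `video_in` and the
    sound from `audio_in`, copying the video stream without re-encoding."""
    return [
        "ffmpeg",
        "-i", video_in,   # input 0: original video
        "-i", audio_in,   # input 1: processed audio (dialog removed)
        "-map", "0:v:0",  # first video stream of input 0
        "-map", "1:a:0",  # first audio stream of input 1
        "-c:v", "copy",   # keep the video stream as-is
        "-c:a", "aac",    # encode the new audio track
        "-shortest",
        video_out,
    ]


# Hypothetical file names; uncomment to actually run ffmpeg.
# subprocess.run(extract_audio_cmd("episode.mp4", "episode.wav"), check=True)
# ... run the separation tool of your choice on episode.wav ...
# subprocess.run(remux_cmd("episode.mp4", "no_dialog.wav", "episode_no_dialog.mp4"), check=True)
```

Copying the video stream avoids a second lossy encode of the picture; only the audio is re-encoded.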

Last edited: Oct 16, 2023

Lanz · Oct 16, 2023

BlackHaze1986 said: ↑

Separate the soundtrack from the video, let the AI of your choice remove the vocals, and then add the soundtrack back to the video. But I don't get why you would want a TV series or movie without dialog. Is there a special reason? The only reason that comes to mind would be a fan dub.

Yes, it's close to fan dubbing: I'm trying to learn how to work with dialog and effects by using movies as examples, and I want the result to be good, with nothing left in the background. Obviously, I'm doing it for myself. Maybe there are some good options?

BlackHaze1986 · Oct 17, 2023

Lanz said: ↑

Yes, it's close to fan dubbing: I'm trying to learn how to work with dialog and effects by using movies as examples, and I want the result to be good, with nothing left in the background. Obviously, I'm doing it for myself. Maybe there are some good options?

Doing it the way I described above would be a good way to keep the music and FX and remove the dialog.

Dyslexicon · Nov 5, 2023

Is there anywhere to try the new Kimberley Jensen vocal model? (5390 Kimberley Jensen ISMIR 2023 Oct28 leaderboard post)

And is there a full-spectrum vocal model that beats htdemucs_ft at both accuracy and fidelity yet?
These MDX2023 models beat htdemucs_ft at accuracy (as in not confusing guitars with vocals), but there's still a bunch of spectral garbage in these newer models.

I wish stem engineers would start reviewing the results of their work by taking a look in Spek:
https://www.spek.cc/

Last edited: Nov 5, 2023

ElectroCity · Nov 5, 2023

clone said: ↑

RipX is very impressive. Way better than Izotope Music Rebalance with what I have used them on. But it is rather slow processing.

Acon Digital just released a plugin called Remix which I have not read much about or tried at all. But I am very interested in trying it soon, because their general audio restoration plugins are very good. They describe it as "Real-Time" and low CPU usage. But these companies make lots of claims, as we all know.


I tried Remix. It's absolute trash. Can't do shit. Only good for separating the kick.

Trurl · Nov 5, 2023

Apparently now MAL is the best tool for separation... although maybe only for Beatles songs... (not that we'll ever know)

clone · Nov 5, 2023

ElectroCity said: ↑

I tried Remix. It's absolute trash. Can't do shit. Only good for separating the kick.

That was sort of my expectation. Just from seeing how long RipX and Demucs options take to process these separations, I figured "realtime" anything wouldn't be even close to as good. Fuse Audio Labs already has one called DrumsSSX which is something like "Realtime", as it doesn't have any sort of "offline processing" time used either. One I like even more for that is BlueCat's MB-7; it basically works the same way with multi-band frequency separation and you get a channel strip for each band and can load your own FX plugins to each "virtual channel". I am a skeptic when I see claims of "first" whatever they are selling. Thanks though, I will not waste my time with this one.

Myfanwy · Nov 5, 2023

Virtual DJ is still as "realtime" as you can get, and it is very good compared to other algorithms. It starts analyzing as soon as you load a track, and after one or two seconds, you can mix with stems while VDJ is processing the whole track in the background.

Good quality low latency real time stem separation will never be possible, because it is impossible to "decide" in blocks of let's say 256 samples which part of the signal belongs to vocals, bass or drums.
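To put a number on the 256-sample block mentioned above, here is a small back-of-the-envelope sketch. The 44.1 kHz sample rate is an assumption (the post does not state one), and the helper names are made up for this example.

```python
def block_duration_ms(block_size, sample_rate):
    """Length of one processing block in milliseconds."""
    return 1000.0 * block_size / sample_rate


def lowest_full_cycle_hz(block_size, sample_rate):
    """Lowest frequency whose complete period still fits inside one block."""
    return sample_rate / block_size


SAMPLE_RATE = 44100  # assumed; not stated in the post
BLOCK = 256          # block size taken from the post

duration = block_duration_ms(BLOCK, SAMPLE_RATE)       # about 5.8 ms
lowest = lowest_full_cycle_hz(BLOCK, SAMPLE_RATE)      # about 172 Hz
```

Under these assumptions a single block lasts under 6 ms and cannot contain even one full cycle of anything below roughly 172 Hz, which covers most of the bass range. That is one way to see why a separator working on such short blocks has very little context to decide which source a sound belongs to.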

deton24 · Nov 17, 2023

Dyslexicon said: ↑

And is there a full-spectrum vocal model that beats htdemucs_ft at both accuracy and fidelity yet?
These MDX2023 models beat htdemucs_ft at accuracy (as in not confusing guitars with vocals), but there's still a bunch of spectral garbage in these newer models.

Maybe Demucs 2023 Vocals and VitLarge23 models on MVSEP.com, but I'm not sure whether they're fullband. Check them out.

They're also on ZFTurbo's GitHub.

Dyslexicon · Nov 18, 2023

The MVSEP 4-model ensemble beats htdemucs_ft, but it still leaves spectral garbage in the background. This can be removed by running a noise gate at around -65 dB or so.

Need an audiophile quality control officer to check the work of these programming gurus lol
(I'm able and willing )
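As a rough illustration of the noise-gate idea mentioned above, here is a minimal sketch of a frame-based gate. It assumes float samples in the range -1.0 to 1.0 and treats the -65 dB figure as dB relative to full scale; the frame size and function names are made up for this example, and a real gate would add attack and release smoothing rather than cutting frames hard.

```python
import math


def db_to_linear(db):
    """Convert a level in dB (relative to full scale) to linear amplitude."""
    return 10.0 ** (db / 20.0)


def noise_gate(samples, threshold_db=-65.0, frame=512):
    """Silence every frame whose RMS level is below the threshold.

    `samples` is a list of floats in [-1.0, 1.0]. Frames at or above the
    threshold pass through unchanged.
    """
    threshold = db_to_linear(threshold_db)
    out = []
    for start in range(0, len(samples), frame):
        block = samples[start:start + frame]
        rms = math.sqrt(sum(x * x for x in block) / len(block))
        if rms >= threshold:
            out.extend(block)
        else:
            out.extend([0.0] * len(block))
    return out


# Residue at amplitude 0.0001 (-80 dB) is muted; a 0.1 (-20 dB) signal passes.
residue = [0.0001] * 512
signal = [0.1] * 512
gated = noise_gate(residue + signal)
```

A threshold of -65 dB corresponds to a linear amplitude of about 0.00056, so the gate only touches frames that are already very quiet.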

RaspberriesOver · Nov 21, 2023

Try Demix Pro v5 - it does lead/backing vocal separation as well as standard vocal seps and the quality is great
https://www.audiosourcere.com/products/demix-pro-audio-separation-software
