UVR5 the Best AI stem separation algo?

Discussion in 'Software' started by curtified, Feb 27, 2023.

  1. Deuterium

    Deuterium Kapellmeister

    Joined:
    Oct 15, 2021
    Messages:
    117
    Likes Received:
    44
    very impressive
     
  2. VocalEnthusiast

    VocalEnthusiast Newbie

    Joined:
    Oct 15, 2023
    Messages:
    1
    Likes Received:
    2
    I have access to Ripple's model. The model it uses is hosted on SAMI ByteDance servers, but I reverse-engineered the iOS app to see how it makes the requests, and wrote a script to make requests for me to separate any audio file I want into two or four stems. It uses the Spongeband API server and an Amazon S3 proxy to upload audio files.

    If anybody wants me to separate a track for them, drop me a message.

    But so far I'm actually not at all impressed with the quality of their model. Kim's vocal models and MDX23C seem to be far superior to what I'm getting out of Ripple's model. Those models are also the best and most reliable that I've used.

    Maybe SAMI ByteDance aren't using their best model for the Ripple iOS app, but surely it's in their interest to use the best model they have? Maybe they're using a much smaller model to keep processing times down, seeing as it's hosted on their servers. Their model only seems to take 5-10 seconds to do a two-stem separation on a 200+ second audio file.
     
  3. jarredou

    jarredou Guest

    it's an old naming convention for VR arch models, it just means "Multi Genre Model".
     
  4. jarredou

    jarredou Guest

    Yeah, it's a more lightweight version to speed things up in Ripple, and it also uses no overlap (so it potentially generates clicks every 8 seconds, at audio chunk boundaries).
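
    To illustrate the chunk-boundary point, here is a minimal sketch of why overlap helps. It is not Ripple's or UVR's actual code: the function names, the linear crossfade, and the toy sample values are all assumptions chosen for illustration. Two chunks that disagree in level are joined once with a hard cut and once with a short overlapped crossfade.

```python
def crossfade_concat(chunks, overlap):
    """Join chunks; each chunk shares `overlap` samples with the previous one.

    With overlap == 0 the chunks are simply butted together (hard cut).
    Otherwise the shared region is blended with a linear crossfade.
    Illustrative helper only, not taken from any separation tool.
    """
    if not chunks:
        return []
    out = list(chunks[0])
    for chunk in chunks[1:]:
        if overlap == 0:
            out.extend(chunk)  # any level mismatch lands on a single sample step
            continue
        tail = out[-overlap:]
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)  # fade-in weight of the new chunk
            out[-overlap + i] = (1.0 - w) * tail[i] + w * chunk[i]
        out.extend(chunk[overlap:])
    return out


def max_jump(signal):
    """Largest difference between neighbouring samples (a crude click detector)."""
    return max(abs(b - a) for a, b in zip(signal, signal[1:]))


# Toy data: the "model" output sits at 0.0 in one chunk and 1.0 in the next.
hard = crossfade_concat([[0.0] * 4, [1.0] * 4], overlap=0)
smooth = crossfade_concat([[0.0] * 8, [1.0] * 8], overlap=4)

print("hard cut max jump:  ", max_jump(hard))
print("crossfade max jump: ", round(max_jump(smooth), 3))
```

    The hard cut puts the whole level difference on one sample step, which is what is heard as a click; the crossfade spreads it over the overlapped region.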

    Congrats on reverse-engineering the API requests! I know that some people from the Audio Separation Discord server have tried but failed to work it out. It seems that their models/API are also used in the capcut.cn software; I don't know which version (light or full) is used there:
    upload_2023-10-15_19-17-11.png
     
    Last edited by a moderator: Oct 15, 2023
  5. BlackHaze1986

    BlackHaze1986 Rock Star

    Joined:
    Jun 25, 2014
    Messages:
    746
    Likes Received:
    362
    Yes, we know, and maybe it's true if you use today's top-notch CPUs, but in my opinion such a claim is useless without stating the system specs that were used to test CPU usage. By the way, thanks @clone for the hint about the Remix plugin. Does anybody know how it compares to the stem separation used in Serato Sample 2.0? Is it better, or on the same level? I might give it a try.
     
  6. Lanz

    Lanz Newbie

    Joined:
    Oct 15, 2023
    Messages:
    3
    Likes Received:
    0
    Can anyone advise on removing dialog from video, for example movies or TV series? I've looked through a lot of topics and found nothing useful.
     
  7. deton24

    deton24 Member

    Joined:
    Jun 23, 2023
    Messages:
    9
    Likes Received:
    14
  8. BlackHaze1986

    BlackHaze1986 Rock Star

    Joined:
    Jun 25, 2014
    Messages:
    746
    Likes Received:
    362
    Just remove the soundtrack with a video converter and you're good to go. Or do you mean keep the music etc. and remove only the dialog?
     
  9. Lanz

    Lanz Newbie

    Joined:
    Oct 15, 2023
    Messages:
    3
    Likes Received:
    0
    Yes, remove only the dialog. I understand that it can be done with 5.1 audio, but what about stereo?
     
  10. BlackHaze1986

    BlackHaze1986 Rock Star

    Joined:
    Jun 25, 2014
    Messages:
    746
    Likes Received:
    362
    Separate the soundtrack from the video and let the AI of your choice remove the vocals; after that you can add the soundtrack back to the video. But I don't get why you would want a TV series or movie without dialog. Is there a special reason? The only reason that comes to mind would be a fan dub.
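
    The workflow above could be sketched roughly as follows, assuming ffmpeg is installed. The file names are hypothetical, and the separation step itself is left out because it depends on which tool is used. The sketch only builds and prints the two ffmpeg commands (extract the audio, then put the processed audio back under the untouched video stream); it does not run them.

```python
import shlex


def extract_audio_cmd(video_path, wav_path):
    """ffmpeg command that drops the video (-vn) and writes 16-bit PCM WAV."""
    return ["ffmpeg", "-i", video_path, "-vn", "-c:a", "pcm_s16le", wav_path]


def remux_cmd(video_path, audio_path, out_path):
    """ffmpeg command that takes video from input 0 and audio from input 1.

    The video stream is copied without re-encoding (-c:v copy).
    """
    return ["ffmpeg", "-i", video_path, "-i", audio_path,
            "-map", "0:v", "-map", "1:a", "-c:v", "copy", out_path]


# Hypothetical file names, for illustration only.
step1 = extract_audio_cmd("episode.mkv", "episode_audio.wav")
# ... run the separation tool of your choice on episode_audio.wav here,
# producing e.g. episode_no_dialog.wav ...
step3 = remux_cmd("episode.mkv", "episode_no_dialog.wav", "episode_no_dialog.mkv")

print(shlex.join(step1))
print(shlex.join(step3))
```

    To actually execute a command, it could be passed to `subprocess.run(step1, check=True)`.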
     
    Last edited: Oct 16, 2023
  11. Lanz

    Lanz Newbie

    Joined:
    Oct 15, 2023
    Messages:
    3
    Likes Received:
    0
    Yes, it's almost fan dubbing. I'm trying to learn how to work with dialog and effects using movies as examples, and I want the result to sound good without the original dialog in the background. Obviously, I'm doing it for myself. Maybe there are some good options?
     
  12. BlackHaze1986

    BlackHaze1986 Rock Star

    Joined:
    Jun 25, 2014
    Messages:
    746
    Likes Received:
    362
    Doing it the way I described above would be a good way to keep the music and FX and remove the dialog.
     
  13. Dyslexicon

    Dyslexicon Noisemaker

    Joined:
    Mar 19, 2023
    Messages:
    23
    Likes Received:
    4
    Is there anywhere to try the new Kimberley Jensen vocal model? (5390 Kimberley Jensen ISMIR 2023 Oct28 leaderboard post)

    And is there a full-spectrum vocal model that beats htdemucs_ft at both accuracy and fidelity yet?
    These MDX2023 models beat htdemucs_ft at accuracy (as in not confusing guitars with vocals), but there's still a bunch of spectral garbage in these newer models.

    I wish stem engineers would start reviewing the results of their work by taking a look in Spek:
    https://www.spek.cc/
     
    Last edited: Nov 5, 2023
  14. ElectroCity

    ElectroCity Ultrasonic

    Joined:
    Aug 16, 2023
    Messages:
    29
    Likes Received:
    38
    I tried Remix. It's absolute trash. It can't do shit; it's only good for separating the kick.
     
  15. Trurl

    Trurl Audiosexual

    Joined:
    Nov 17, 2019
    Messages:
    2,480
    Likes Received:
    1,464
    Apparently now MAL is the best tool for separation... although maybe only for Beatles songs... :dunno: (not that we'll ever know)
     
  16. clone

    clone Audiosexual

    Joined:
    Feb 5, 2021
    Messages:
    7,438
    Likes Received:
    3,280
    That was sort of what I expected. Just from seeing how long the RipX and Demucs options take to process these separations, I figured "realtime" anything wouldn't be even close to as good. Fuse Audio Labs already has one called DrumsSSX, which is something like "realtime", as it doesn't use any "offline processing" time either. One I like even more for that is BlueCat's MB-7; it basically works the same way, with multi-band frequency separation, and you get a channel strip for each band and can load your own FX plugins on each "virtual channel". I am a skeptic when I see claims of "first" whatever they are selling. Thanks though, I will not waste my time with this one.
     
  17. Myfanwy

    Myfanwy Platinum Record

    Joined:
    Sep 16, 2020
    Messages:
    401
    Likes Received:
    182
    Virtual DJ is still as "realtime" as you can get, and it is very good compared to other algorithms. It starts analyzing as soon as you load a track, and after one or two seconds, you can mix with stems while VDJ is processing the whole track in the background.

    Good-quality, low-latency real-time stem separation will never be possible, because it is impossible to "decide" in blocks of, let's say, 256 samples which part of the signal belongs to vocals, bass or drums.
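
    Some rough arithmetic makes the 256-sample figure concrete. This is only a sketch: the 44.1 kHz sample rate and the 4096-sample comparison size are assumptions, not numbers from the post. It computes how much time one block covers and how finely a spectrum of that block can resolve frequency.

```python
def block_duration_ms(block_size, sample_rate):
    """Length of one block of audio in milliseconds."""
    return 1000.0 * block_size / sample_rate


def bin_spacing_hz(block_size, sample_rate):
    """Spacing between frequency bins of a DFT taken over one block."""
    return sample_rate / block_size


SAMPLE_RATE = 44100  # assumed CD-quality rate

for size in (256, 4096):
    print(
        f"{size:5d} samples: "
        f"{block_duration_ms(size, SAMPLE_RATE):6.2f} ms of audio, "
        f"{bin_spacing_hz(size, SAMPLE_RATE):7.2f} Hz per frequency bin"
    )
```

    At 256 samples a block holds under 6 ms of audio and its frequency bins are about 172 Hz apart, so low bass notes that differ by only a few Hz fall into the same bin. Longer blocks resolve more but add latency.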
     
  18. deton24

    deton24 Member

    Joined:
    Jun 23, 2023
    Messages:
    9
    Likes Received:
    14
    Maybe Demucs 2023 Vocals and VitLarge23 models on MVSEP.com, but I'm not sure whether they're fullband. Check them out.

    They're also on ZFTurbo GitHub.
     
  19. Dyslexicon

    Dyslexicon Noisemaker

    Joined:
    Mar 19, 2023
    Messages:
    23
    Likes Received:
    4
    The MVSEP 4-model ensemble beats htdemucs_ft, but it still leaves spectral garbage in the background. This can be removed by running a noise gate at around -65 dB or so.
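
    For a sense of scale, here is a minimal sketch of what a -65 dB threshold means in linear amplitude, assuming samples normalized so that full scale is 1.0. The gate below is a deliberately crude per-sample version written for illustration; real noise gates follow the signal envelope and use attack and release times, and the function names here are made up.

```python
def db_to_amplitude(db):
    """Convert a level in dB (relative to full scale = 1.0) to linear amplitude."""
    return 10.0 ** (db / 20.0)


def hard_gate(samples, threshold_db):
    """Zero every sample whose magnitude is below the threshold.

    Simplified for illustration: no envelope follower, attack or release.
    """
    threshold = db_to_amplitude(threshold_db)
    return [s if abs(s) >= threshold else 0.0 for s in samples]


threshold = db_to_amplitude(-65.0)
gated = hard_gate([0.5, 0.0001, -0.0003, -0.2], -65.0)

print(f"-65 dB as linear amplitude: {threshold:.6f}")
print("gated samples:", gated)
```

    A threshold of -65 dB is roughly 0.00056 of full scale, so only very quiet residue is muted.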

    Need an audiophile quality control officer to check the work of these programming gurus lol
    (I'm able and willing :wink:)
     
  20. RaspberriesOver

    RaspberriesOver Newbie

    Joined:
    Nov 21, 2023
    Messages:
    2
    Likes Received:
    0