AI Cloning of Voices and Instruments

Discussion in 'Music' started by Ryck, Jun 3, 2023.

  1. Ryck

    Ryck Guest

    Hello friends, how are you after such a long time? I hope you're doing well.
    A few days ago I came across AI-generated covers. For example, I listened to McCartney singing Queen songs, among many others, and it blew my mind. So I started looking into how to make them.


    Eventually I found tutorials on YouTube on how to make them, and a Discord community where people lend a helping hand. So I started using a training Colab, and I was truly amazed. I cloned my voice with just 50 epochs, and it already sounds fairly decent. Of course, my voice was recorded here at home, so for the AI it's a clean reference: I didn't have to separate the vocals from the music. That's why I think it sounds "good" after just 50 epochs. The Colab process can be tedious because it throws some errors, and sometimes you have to start over.

    That's amazing! I'm glad you're exploring the possibilities of AI-generated instrument cloning. It's a technique that can be quite helpful, especially with instruments that sound out of tune or were recorded at lower quality. By training the AI on a good-sounding reference, you can get better results.

    It's interesting to hear that you've tried cloning a bass and a saxophone, and that they sound quite good even with just 50 epochs. Keep in mind that training for more epochs, such as 500, might improve the results further. However, Colab has limitations: sessions have a runtime limit and can disconnect, so you have to plan long training runs around that.
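    Because a Colab session can end before a long run finishes, it helps to save progress periodically and resume from the last save instead of starting over. The sketch below shows only that save/resume pattern in plain Python; the helper names (`save_checkpoint`, `load_checkpoint`, `train`) and the JSON file format are hypothetical, and real voice-cloning notebooks write their own model checkpoints, so treat it as an illustration under those assumptions.

```python
import json
import os
import tempfile

# Hypothetical helpers: real voice-cloning notebooks save their own model
# checkpoints; this only illustrates the save/resume pattern.


def save_checkpoint(path, epoch, state):
    """Write the state for a finished epoch; write-then-rename avoids a
    half-written file if the session dies mid-save."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"epoch": epoch, "state": state}, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return (next_epoch, state), or a fresh start if nothing was saved yet."""
    if not os.path.exists(path):
        return 0, {"steps": 0}
    with open(path) as f:
        data = json.load(f)
    return data["epoch"] + 1, data["state"]


def train(path, total_epochs, save_every=10, stop_after=None):
    """Run or resume training; `stop_after` simulates the session being cut
    off after that many epochs in the current run."""
    start, state = load_checkpoint(path)
    for done_this_session, epoch in enumerate(range(start, total_epochs)):
        if stop_after is not None and done_this_session >= stop_after:
            return state  # anything after the last save is lost
        state["steps"] += 1  # stand-in for one real training epoch
        if (epoch + 1) % save_every == 0 or epoch + 1 == total_epochs:
            save_checkpoint(path, epoch, state)
    return state


# A 50-epoch run that gets cut off after 25 epochs, then resumed.
ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
first = train(ckpt, total_epochs=50, save_every=10, stop_after=25)
resume_from, _ = load_checkpoint(ckpt)
second = train(ckpt, total_epochs=50, save_every=10)
print(first["steps"], resume_from, second["steps"])  # 25 20 50
```

    The epochs run after the last save are lost and redone on resume, so `save_every` trades disk writes against how much work a disconnect can cost.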

    As for polyphonic instruments like an acoustic guitar, it seems that the AI you're using can only handle monophonic sources (one note at a time) at the moment. It's possible that there are limitations or techniques you're not fully aware of yet. Exploring and experimenting further may help you understand the capabilities and limitations of the AI in this regard.
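    Singing-voice-conversion tools such as RVC and so-vits-svc condition their output on a pitch (F0) contour with a single value per frame, which is a likely reason they only work on monophonic sources. The toy autocorrelation estimator below is not what those tools use (they ship extractors such as crepe or harvest), and the sample rate, frame length and frequency range are arbitrary assumptions, but it shows the core limitation: whatever goes in, one frequency comes out.

```python
import math

SAMPLE_RATE = 8000  # assumption: a low rate keeps this pure-Python demo fast


def tone(freqs, n=2048, sr=SAMPLE_RATE):
    """Sum of equal-amplitude sine waves, one per frequency in `freqs`."""
    return [sum(math.sin(2 * math.pi * f * i / sr) for f in freqs) for i in range(n)]


def estimate_f0(samples, sr=SAMPLE_RATE, fmin=50.0, fmax=1000.0):
    """Return ONE pitch estimate in Hz: the lag (period, in samples) at which
    the frame best matches a shifted copy of itself."""
    min_lag = int(sr / fmax)
    max_lag = min(int(sr / fmin), len(samples) - 1)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sr / best_lag


single_note = estimate_f0(tone([200.0]))
chord = estimate_f0(tone([200.0, 300.0]))  # two notes, still one number
print(round(single_note), round(chord))
```

    With these settings the 200 Hz tone comes back as 200 Hz, while the 200 + 300 Hz chord comes back as 100 Hz, the period the two notes share, which is neither of the notes actually played.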

    Remember to keep learning and experimenting, and don't hesitate to reach out to the community or forums for guidance and insights.

    Anyway, I find this fascinating. I believe a new era is coming with AI that will revolutionize the entire music industry. I know some of you will like it, while others may dislike it. However, I also believe it's inevitable to move forward. But beyond that, I wanted to know about your experiences. Have you tried it? What do you think about it? Do you use it? Are you familiar with other methods of AI cloning?
     
  3. BEAT16

    BEAT16 Audiosexual

    Joined:
    May 24, 2012
    Messages:
    9,081
    Likes Received:
    6,995
    Thank you @Ryck for sharing your personal experience with AI here.

    I am waiting for a free AI VSTi/VST for my DAW. Then I will certainly give it a try.
    Maybe then even people with a bad voice will have a chance to sing on their own music.

    Surely, in the end, many people will be unemployed and will have to look for a hobby because of all the free time.
    Some will probably end up with cracked software, that is, here with us and on our sister site, because their income will be rationalized
    away by AI and other robots, leaving only a subsistence minimum.
     
  4. Ryck

    Ryck Guest

    Do you mean you're waiting for a VST that clones voices with AI? I didn't quite understand.

    Yes, I have been thinking about what you mentioned for a while now, and in fact, it's something everyone should consider. There is constant news on social media and the internet about AI, its advances, and how accessible it is to any of us.
    And in social media comments, people always talk about AI in the third person, as if blaming someone who intends to leave us all without jobs. But in reality, no one forces us to use AI; we do it willingly. From my point of view, AI is not something new. Technology has been replacing human labor for a long time.

    For example, before most of us had access to a computer, we would go to a studio to record, and if we wanted an instrument played, we would seek out a musician. Now, with PCs and virtual instruments, we have reduced the need for all that labor: the recording studio, bassist, drummer, and even singers. But nobody came to our houses and forced us to use PCs and DAWs. I believe that we are replacing one form of labor with another. And yes, there will come a time when our professions will be in the hands of AI. I think that by then, we will have some form of "universal income" so that everyone can choose what to consume from AI, and the money generated would return as "universal income" and so on. I'm not sure I'm explaining this clearly, but I believe we will reach something like that in the future.
     
  5. Maxim2018

    Maxim2018 Noisemaker

    Joined:
    Jun 12, 2017
    Messages:
    16
    Likes Received:
    3
    If it's no trouble, could you share links to the YouTube videos you watched and the tools you ended up using?
     
  6. Ryck

    Ryck Guest

    Sure, but the tutorials I watched and the Colab are in Spanish. Is that a problem?

    You can also set up your own voice cloning with help from OpenAI's ChatGPT. It guides you through everything you need to create a cloning environment, whether in the cloud or on a local server.

    Here's an example answer from OpenAI's ChatGPT:




    Certainly! Here's a step-by-step guide on how to set up voice cloning in Colab:

    1. Open your web browser and go to the Google Colab website (colab.research.google.com).
    2. Sign in to your Google account if prompted.
    3. Click on "New Notebook" to create a new Colab notebook.
    4. In the first cell of the notebook, clone the repository and install its requirements by running the following commands:
    python
    !git clone https://github.com/CorentinJ/Real-Time-Voice-Cloning.git
    %cd Real-Time-Voice-Cloning
    !pip install -r requirements.txt

    5. Create a new cell and import the necessary modules with the following code:
    python
    import os
    import sys
    from pathlib import Path

    import numpy as np
    import torch
    import IPython.display as ipd
    from IPython.utils import io

    6. Next, download the pretrained models by executing the following commands (if these file IDs don't work, check the repository README for the current model downloads):
    python
    !pip install gdown
    !gdown 1n_kLJ32sJ2BgZm4o91ONwj1dNL6PYSaT -O encoder.pt
    !gdown 1m_ZzK7vzqjmf_s5-6J-_CAmTp4HJiLcF -O vocoder.pt
    !gdown 1n2H8NJm8OwBvM1LJkhq5mYunN5pd6jFD -O synthesizer.pt

    7. Create a new cell and load the downloaded models with the following code:
    python
    from pathlib import Path

    from synthesizer.inference import Synthesizer
    from encoder import inference as encoder
    from vocoder import inference as vocoder

    # The repo's loaders are written to take pathlib.Path objects
    encoder.load_model(Path("encoder.pt"))
    synthesizer = Synthesizer(Path("synthesizer.pt"))
    vocoder.load_model(Path("vocoder.pt"))

    8. Now you can clone a voice using the following code:
    python
    import numpy as np
    import soundfile as sf
    import IPython.display as ipd

    # Path to a recording of the voice you want to clone (replace with your own)
    voice_sample_path = 'path/to/your/voice/sample.wav'

    # Convert the voice sample into a speaker embedding
    preprocessed_wav = encoder.preprocess_wav(voice_sample_path)
    embed = encoder.embed_utterance(preprocessed_wav)

    # Text for the cloned voice to speak
    text = "Hello, this is a test of my cloned voice."

    # Synthesize a mel spectrogram from text + embedding, then vocode it to audio
    specs = synthesizer.synthesize_spectrograms([text], [embed])
    generated_wav = vocoder.infer_waveform(specs[0])

    # Save the generated voice as a WAV file and play it back in the notebook
    output_path = 'path/to/save/generated/voice.wav'
    sf.write(output_path, generated_wav.astype(np.float32), synthesizer.sample_rate)
    ipd.Audio(generated_wav, rate=synthesizer.sample_rate)

    Remember to replace 'path/to/your/voice/sample.wav' with the actual path to your voice sample and 'path/to/save/generated/voice.wav' with the desired path to save the generated voice.

    That's it! You have now set up voice cloning in Colab. Feel free to experiment with different voice samples and texts to generate customized voices.
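    A quick way to check how close a cloned voice is to the reference is to compare speaker embeddings. In the guide, `encoder.embed_utterance` returns one embedding per recording, and that encoder scales its embeddings to unit length, so cosine similarity (higher means more similar) is the natural comparison. The sketch below uses made-up 4-number vectors in plain Python; with real embeddings you would pass the reference `embed` and an embedding computed from the generated audio.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors for illustration only; real encoder embeddings are much longer.
reference = [0.5, 0.5, 0.5, 0.5]
close_clone = [0.6, 0.4, 0.5, 0.5]
other_voice = [0.5, -0.5, 0.5, -0.5]

print(round(cosine_similarity(reference, close_clone), 3))
print(round(cosine_similarity(reference, other_voice), 3))
```

    Normalizing inside the function keeps it correct even for vectors that are not already unit length; for unit-length embeddings it reduces to a plain dot product.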
     
  7. Legotron

    Legotron Audiosexual

    Joined:
    Apr 24, 2017
    Messages:
    1,975
    Likes Received:
    1,876
    Location:
    Hyperborea
  8. Ryck

    Ryck Guest

  9. Legotron

    Legotron Audiosexual

    Joined:
    Apr 24, 2017
    Messages:
    1,975
    Likes Received:
    1,876
    Location:
    Hyperborea
    I haven't had time to dive deep into this yet; I still have to understand what all these methods are: So-vits, RVC, Diff-SVC, Fish...
    This is a pretty good Discord channel: https://discord.gg/jvA5c2xzSE
     
  10. Trurl

    Trurl Audiosexual

    Joined:
    Nov 17, 2019
    Messages:
    2,480
    Likes Received:
    1,459
    This sounds really really interesting but I fear I'm at an age where I don't want to learn any new tricks. Gotta think about it.
     
  11. Xupito

    Xupito Audiosexual

    Joined:
    Jan 21, 2012
    Messages:
    7,102
    Likes Received:
    3,930
    Location:
    Europe
    As usual these new AI-based technologies evolve by the week.

    DISCLAIMER: IF YOU'RE FROM SPAIN, PLEASE WATCH THIS AND PREPARE TO LAUGH LIKE THERE'S NO TOMORROW. I MEAN IT.
    IF YOU AREN'T, IT'S NOT WORTH IT.
    Just yesterday I watched a compilation from the funniest, naughtiest Spanish streamer. He set up a text-to-speech AI modeled on a famous (late) stand-up comedian to read his subscribers' chat comments while he ate. Priceless.
     
  12. Ryck

    Ryck Guest

    Like you, I'm at an age where I think five times before learning something new, because it's quite challenging for me. But believe me, it's worth it. Besides, AI is going to revolutionize absolutely everything as we know it, and it will be very useful for us to stay informed about the changes.
     
  13. Ryck

    Ryck Guest

  14. Ryck

    Ryck Guest

    Look guys, I want to show you the results, which, in my opinion, are very good. I obtained this model from a Discord channel. The first audio is my voice, and the second one is the cloned voice of Paul based on my voice.
     

    Attached Files:

    • yo.mp3
      File size:
      2.1 MB
      Views:
      70
    • Paul.mp3
      File size:
      852.8 KB
      Views:
      64
  15. Legotron

    Legotron Audiosexual

    Joined:
    Apr 24, 2017
    Messages:
    1,975
    Likes Received:
    1,876
    Location:
    Hyperborea
    Like you guys, I think age and spare time are my biggest obstacles to learning this stuff; also, I'm no coder. But what I would like to do is train my own model at home. I have an RTX 3050, which should be good enough for training, and electricity comes with the rent, so I could basically leave it training for a week. But there are so many things that derail this project along the way, mostly technical ones like epochs and overtraining.
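    One common answer to the "how many epochs before overtraining" question, assuming your training tool reports a validation loss (not every voice-cloning tool exposes one this way), is early stopping: keep the checkpoint from the epoch with the best validation loss, and stop once it hasn't improved for a set number of epochs (the "patience"). The function name and the loss values below are hypothetical, just to show the logic.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for per-epoch validation losses (1-based).

    Training stops once `patience` consecutive epochs fail to beat the best
    loss by more than `min_delta`; best_epoch is the checkpoint worth keeping.
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)


# Hypothetical validation losses: improving, then rising again (overtraining).
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.53, 0.56, 0.60]
print(early_stopping_epoch(losses, patience=3))  # (4, 7)
```

    A larger `patience` tolerates noisy losses at the cost of more wasted epochs; either way, the model to keep is the one saved at `best_epoch`, not the last one.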
     
  16. mk_96

    mk_96 Audiosexual

    Joined:
    Dec 31, 2020
    Messages:
    1,083
    Likes Received:
    755
    Location:
    Your heart
    I don't know what to think about this. On one hand, I don't see why anyone would want to do this voice-imitation thing in a serious context other than research.

    On the other hand... this masterpiece exists:
     
  17. RachProko

    RachProko Producer

    Joined:
    Sep 25, 2022
    Messages:
    263
    Likes Received:
    134
    The thing with AI is that it’s all very interesting and very cool that you can make Frank Sinatra, Bob Dylan or Aretha Franklin sing your own two-chord composition. And pretty soon your full orchestral arrangement of your two-chord song will also be created by AI!

    I just wonder sometimes…what does it mean for humanity? What does it mean for the human creative mind? What’s the point for humans to learn about music, to master an instrument if you can just press a few buttons and let AI do it for you?

    What’s the future of artists performing live? Who will be able to play all this AI-generated music live on stage?

    Yes, I know, AI is here and there’s no stopping it anymore! And everyone is so excited about all the new possibilities it will bring!

    But I wonder, will it really bring happiness to the (creative) human race? I sincerely doubt it.

    As some experts and even founders of AI have already warned, this may well be the prologue to the end of our civilization and even the end of humanity! And I'm afraid they may be right!

    Think about it. If AI will do it all for us and do it even better and faster, then what’s the point of our existence?
     
    Last edited: Jun 4, 2023
  18. thebert

    thebert Member

    Joined:
    Jan 15, 2014
    Messages:
    39
    Likes Received:
    8
    Wowee that's good. Did you use the tools you listed above or something different?
     
  19. Daisy69

    Daisy69 Platinum Record

    Joined:
    Oct 3, 2022
    Messages:
    543
    Likes Received:
    178
    It's a long way off before that happens. Right now it isn't even real AI.
    There are many animals that are better than us in many respects.
    Antelopes, for example, run much faster than us, but people still run, hold Olympics, and compete with each other, and couldn't care less that some creature is 4x faster.
    Don't worry ;)
     
  20. phumb-reh

    phumb-reh Guest

    People said this about samplers and things like autotune. "Who needs to learn to sing anymore?!" and so on.

    Eventually someone will find a way to use and abuse new tech and come up with something new and creative. So I'm not expecting immediate doomsday for us.
     
  21. Ryck

    Ryck Guest

    AI is not something new; it is the modern use we give to technology. AI is a technology.
    A calculator is a primitive AI, as it can do complex calculations.
    We have been using AI for a long time.
    If we talk about music, we have been using DAWs instead of going to a recording studio.
    You talk about pressing a button and the music comes out. In the '80s, synthesizers came along where just pressing a button triggered an arpeggio, a drum pattern, etc.
    In the field of factories, machines have long replaced the human hand.
    For example, where 1,000 people were once needed to pack 1,000 alfajores, a machine now packs 1,000 alfajores in just a few minutes, so those 1,000 people were left without work; actually, 1,000 people for each machine, so you can't imagine how many people have lost their jobs. But in reality, things are changing; we are changing from generation to generation. I also can't imagine how creative AI will be, but since human beings are very creative, they will surely find a way to be creative with AI. And it will be as it has always been: there will be people who use AI to make music with very little creativity, and others who use it to do more creative things.
    The point is that this has always happened for as long as I can remember.
     