How Do I Do This in Sound as it has been Done in Video?

Discussion in 'Working with Sound' started by jeffglobal, May 25, 2016.

  1. jeffglobal

    jeffglobal Producer

    May 4, 2016

    I would like to get my Machinima FUCs (Feline Urban Commandos) to have "realistic" tiny, female-like voices. I've tried everything from MorphVox to VSTs to phone apps.

    This is using "test" software CrazyTalk8:

    What I really want is to match the formant of a target with my speech for the dialogue I need to produce, analogous to how they show they can fake a world leader doing anything in REALTIME, live on YouTube... (You BET they can perfectly fake the voice; video is way harder because visual processing takes up a large part of our brain and our eyes are harder to fool than our ears.)

    The "size" of my voicebox and head seems to be the tell. I thought Melodyne could match the formant of a target track to process my voice, but, idk y, I just can't figure it out, or what it produces is not what I'm expecting. I don't "sound like a girl"; I sound like a man trying to sound like a girl...

    I don't want to have Sia as one of my FUCs (well, I would, but I mean, I'm not the government, so I just don't take things from ppl), but I can't think of anything to do other than get my nieces to be voice actors and then process to my heart's content. My nieces are like cats, and I've never been able to herd a cat... I think that is the problem with cats, or their strength...
    Last edited: May 25, 2016
  3. tulamide

    tulamide Audiosexual

    Feb 13, 2016
    You should look for pitch shifters that support elastique (2.8 or higher).
    For example, here I used a sample in Reaper with ReaPitch (set to elastique pro 2.8, preserve lower pitches, pitch shift +10 semitones, formant shift -2), an EQ (to cut a bit of the low end) and a reverb.

    That sample could be replaced just as well with mic input. The first half is the female voice, the second half the original male voice. Note: I only made a quick example (about 5 minutes overall). With proper tweaking of the various parameters you might get even better results.
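
    To get a feel for what those semitone settings do to the signal: in equal temperament each semitone multiplies frequency by 2^(1/12), so the values can be turned into plain ratios. A minimal Python sketch follows. The 120 Hz starting pitch is only an assumed, typical male speaking pitch, and reading the formant value as semitones is likewise an assumption, so treat the numbers as rough orientation and not as what ReaPitch does internally.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Equal-temperament frequency ratio for a shift of `semitones`."""
    return 2.0 ** (semitones / 12.0)


# Settings from the example above.
pitch_ratio = semitones_to_ratio(10)    # pitch shift +10 semitones
formant_ratio = semitones_to_ratio(-2)  # formant shift -2 (assumed to be semitones)

# Assumed starting point: a male speaking pitch of about 120 Hz.
source_f0_hz = 120.0
shifted_f0_hz = source_f0_hz * pitch_ratio

print(f"pitch ratio:   {pitch_ratio:.4f}")
print(f"formant ratio: {formant_ratio:.4f}")
print(f"{source_f0_hz:.0f} Hz -> {shifted_f0_hz:.1f} Hz")
```

    A +10 semitone shift nearly doubles the pitch (a full octave would be +12), which is why the formants usually need their own, separate correction to keep the result from sounding like a sped-up tape.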

    But also be aware that pitch shifting, especially with formant shifting, will always keep the characteristics of your voice. If you want to sound like someone else entirely, you might need something like Vocaloid or TTS combined with software that converts your voice input into vowels and consonants for those engines to work with. But I have no clue if such things even exist.
  4. Nimbuss

    Nimbuss Platinum Record

    Nov 21, 2015
    Oh my word.. the future of politics is here, so what you guys are saying is you can do this in real time?? Wow now all you need is a pretty face to lead a nation hehe
  5. jeffglobal

    jeffglobal Producer

    May 4, 2016
    @tulamide Ty, I'll check out your suggestions. I have no doubt it exists somewhere, I just wonder what we as plebs have access to now. Vocaloid sounds familiar; TTS I thought was the voices I use to turn PDFs into audiobooks, no? But yes, the whole purpose is to take a target formant from a real female voice and apply it to what I say. Even if I can get my nieces to record for a day, I was hoping I could use that to capture their formants for the rest of the stuff. I thought for certain Melodyne could do that. I have to play with that again... I'm a little too unfocused about stuff, but I'll retry from the beginning to do that. I remember for sure a Melodyne tutorial said you can use one track as the formant target for another track. I forgot why I stopped trying it. Could have been a shiny light got my attention for all I know.

    @Nimbuss NO, follow me. Any garbage actress or actor for that matter can be anyone. The most dangerous being an actual person.
    Say, the US wants to justify the invasion of the Ukraine with first use of nuclear weapons. Np getting Putin to say anything they want him to say, and with perfect fidelity (well, better than we can tell the difference).
    My concern is Trump must be squeaky clean if they can't dig anything real up on him, so this is gonna be used. Idk y, but the Owl worshipers (sic) need to tell ppl first what they're gonna do before they do it (like fictional vampires have to be "invited in"), and the youtube above I take as their warning.

    It's cutesy if it's like this:

    It's far more serious when the target is a person we know, like above, or the government wants to attack. I'm still looking for the youtube from Stanford something lab [Image Metrics] 7 years ago! [found it!!!!]

    that made a simulation of an actress so real that when she sat and watched it, she said, and I quote, "If I didn't know it wasn't me, I wouldn't be able to tell it wasn't me."

    Until the 1:30 mark when they revert back to the source (the real actress), her entire face is being simulated by the technology.
    That was 7 years ago, ok 8 now, mofo.

    Now they don't even need to follow the target around hoping for him/her to do something stupid and record it, fk it, they'll just fabricate it. Entiendes? No more need for "parallel construction" because the government did something illegal, they can just make perfect fabrications! Oh crap, anyone?

    You now need NO ONE. Even a saint can be made, for all to see and hear, to do heinous acts, with over-the-uncanny-valley perfection.
    Last edited: May 26, 2016
  6. tulamide

    tulamide Audiosexual

    Feb 13, 2016
    @jeffglobal Yes, TTS just means Text To Speech. My idea was to use what's already there. Some TTS software has really convincing voices (for example, Acapela Group). There is also speech recognition as part of HTML5 (an example exists online, and it is explained on the Digital Inspiration blog).

    If you can find a way to feed the TTS software continuously with the speech-to-text conversion, you're almost there.
    Or you find a programmer who combines a speech-to-text engine with a text-to-speech engine, while leaving out the text conversions and directly passing syllables (or whatever chunks are expected, like vowels or consonants) for faster reactions.

    Maybe somebody even programmed this already and you just have to find it. I mean, the technology is already there (realtime voice conversion, iTranslate, ...), you just need to find something open source.
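
    The chain described above (recognition feeding straight into synthesis, chunk by chunk) could be wired up roughly like this. It is only a sketch of the plumbing: `fake_recognize` and `fake_synthesize` are made-up stand-ins, not real engines, and the "audio" is represented by plain strings so the example runs without any speech software. A real setup would plug an actual recognizer and synthesizer into the same two slots.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical engine interfaces (assumptions for illustration only):
# a recognizer turns an input chunk into phoneme-like units,
# a synthesizer turns those units into output in the target voice.
Recognizer = Callable[[str], List[str]]
Synthesizer = Callable[[List[str]], str]


def convert_stream(
    chunks: Iterable[str],
    recognize: Recognizer,
    synthesize: Synthesizer,
) -> Iterator[str]:
    """Pass each input chunk through recognition, then directly into synthesis."""
    for chunk in chunks:
        units = recognize(chunk)
        if units:  # skip chunks where nothing was recognized (e.g. silence)
            yield synthesize(units)


# Stand-in engines so the sketch is runnable; they only manipulate strings.
def fake_recognize(chunk: str) -> List[str]:
    return chunk.split("-") if chunk else []


def fake_synthesize(units: List[str]) -> str:
    return "<target voice: " + " ".join(units) + ">"


result = list(convert_stream(["he-llo", "", "wor-ld"], fake_recognize, fake_synthesize))
print(result)
```

    Because `convert_stream` is a generator, each chunk is handed on as soon as it is recognized, which is the "faster reactions" part: nothing waits for a full sentence of text.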
  7. jeffglobal

    jeffglobal Producer

    May 4, 2016
    You're definitely selling me on making my nieces nauseated with ice cream so I can get them recorded... I looked at the Melodyne adv tut, and the target formant to apply to another track was not there. mofo. Maybe the guy said it in passing. All I know is, the day is almost over and all I did was learn Faded by Alan Walker. To me that was hard. I can almost sight read, so, yay. Now my fingers have to move unlike sausages.

    Well, I also experimented with this other vocal Kontakt library with some pro female (maybe Bohemian, idr) and it sounded really good actually. I made a 10-30 sec piece on how a man's life goes. I started with her singing a little, and then there's another male/female vocal ensemble (Storm Choir) that, if you combine two or three female phonemes together, sounds like they're saying "no." So I added a cinematic string accompaniment and there you have it. Sirens for men on ships to crash on the rocks.