Automatically place markers in a song (workarounds?)

MHEO · Jun 10, 2025

Looking for software that automatically analyzes a song and places markers (or creates regions/segments) at the start and end of sung passages. It doesn't exist. Easiest workarounds?

clone · Jun 10, 2025

zplane Decoda. https://products.zplane.de/products/decoda/

MHEO · Jun 10, 2025

Thanks, but it's not what I am looking for. I need software with a kind of "VAD" (Vocal Activity Detection) that keeps the parts with singing voice and removes the rest, not a voice isolator or a song structure analyzer. I know it does not exist; I just asked for the simplest workarounds. Of course I can already do it manually, adding markers and exporting regions. I need something faster, like an AI app that does it for me. ChatGPT wasn't exactly my friend: I would have to be an expert programmer to code in C++ and Python, and I am just a musician. The process should be like the usual automatic "trim/crop" ("remove pauses and keep the rest"), but aimed at "remove parts with no singing and keep the rest". Someone suggested playing with a gate and frequencies, but the result is a mess.

Last edited: Jun 10, 2025

PulseWave · Jun 10, 2025

You are correct that most existing Voice Activity Detection (VAD) tools are designed to detect human voice in general, spoken or sung, rather than to isolate singing specifically. VAD systems such as Silero VAD, Picovoice Cobra, or pyannote.audio are optimized for speech detection and can quickly trim audio down to the sections that contain voice, but they do not distinguish singing from speech [1, 3, 4, 7].

Workarounds and Practical Options
1. Using VAD as a First Step

You can use a high-quality VAD tool (e.g., Silero VAD, Picovoice Cobra, or pyannote.audio) to automatically remove silence and non-voice sections from your audio. This will keep all segments with any human voice (including singing), but it will also keep spoken parts if present [1, 3, 4, 7].

This approach is fast and does not require programming skills if you use available GUI tools or web-based solutions, but it won't distinguish between singing and speaking.
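For anyone willing to run a few lines of Python, here is a minimal sketch of this first step that works on a file from disk rather than a microphone. It assumes the `silero-vad` pip package is installed; `detect_voice` and `merge_close` are hypothetical helper names, and the 0.5 s gap is an arbitrary starting value. Which file formats load depends on the audio backend you have installed, and Silero is trained on speech, so treat its output on sung material as a rough first pass.

```python
def detect_voice(path):
    """Return [{'start': s, 'end': s}, ...] in seconds for an audio file."""
    # Imported lazily so merge_close() below works without the package.
    from silero_vad import load_silero_vad, read_audio, get_speech_timestamps

    model = load_silero_vad()
    wav = read_audio(path)  # loads the file as 16 kHz mono by default
    return get_speech_timestamps(wav, model, return_seconds=True)


def merge_close(segments, max_gap=0.5):
    """Join segments separated by less than max_gap seconds (e.g. breaths)."""
    merged = []
    for seg in segments:
        if merged and seg["start"] - merged[-1]["end"] <= max_gap:
            # Gap is short: extend the previous segment instead of starting a new one.
            merged[-1]["end"] = max(merged[-1]["end"], seg["end"])
        else:
            merged.append(dict(seg))  # copy so the input list is left untouched
    return merged
```

Typical use would be `merge_close(detect_voice("song.wav"))`, where `song.wav` is a placeholder path; the resulting start/end pairs are what you would turn into markers.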

2. Source Separation + VAD

Some musicians use source separation tools (like Spleeter or Demucs) to extract the vocal stem from a song. You could then run a VAD tool on the isolated vocal track to remove silent or non-vocal parts. This method still doesn’t distinguish between singing and talking, but it can help if your main concern is to get only the vocal presence, regardless of type.
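Once the vocal stem is isolated, even a plain level detector can find the sung passages, because the stem is close to silent wherever nobody sings. The sketch below is a simple RMS-threshold stand-in for a real VAD, using only the Python standard library; the function names, the 50 ms frame, the 0.02 threshold and the 0.4 s minimum silence are all assumptions to tune by ear, and the loader only handles 16-bit PCM WAV.

```python
import array
import math
import sys
import wave


def read_wav_mono(path):
    """Load a 16-bit PCM WAV as mono floats in [-1, 1] plus its sample rate."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels, rate = w.getnchannels(), w.getframerate()
        pcm = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    # Average the interleaved channels down to mono.
    mono = [sum(pcm[i:i + channels]) / (channels * 32768.0)
            for i in range(0, len(pcm), channels)]
    return mono, rate


def vocal_segments(samples, rate, frame_ms=50, threshold=0.02, min_silence=0.4):
    """Return (start, end) times in seconds where the frame RMS exceeds threshold.

    A quiet stretch shorter than min_silence does not split a segment.
    """
    frame = max(1, int(rate * frame_ms / 1000))
    segments, start, last_loud = [], None, None
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        t0, t1 = i / rate, min(i + frame, len(samples)) / rate
        if rms >= threshold:
            if start is None:
                start = t0
            last_loud = t1
        elif start is not None and t0 - last_loud >= min_silence:
            # Quiet for long enough: close the current segment.
            segments.append((start, last_loud))
            start = None
    if start is not None:
        segments.append((start, last_loud))
    return segments
```

Usage would be along the lines of `vocal_segments(*read_wav_mono("vocals.wav"))`, with `vocals.wav` standing in for the stem exported by your separation tool. Bleed from the instruments in the stem will raise the noise floor, so the threshold usually needs adjusting per song.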

3. No-Code or Low-Code Tools

There are some AI-powered DAWs and plugins that offer "auto-trim" or "remove silence" features, such as iZotope RX's "Voice Activity Detector" or Audacity's "Truncate Silence" function. Note that Truncate Silence works from a level threshold rather than from voice detection, so on a full mix it will not find the gaps between sung passages. These tools are easy to use but, again, do not specifically target singing voice.

4. Advanced (But Not Plug-and-Play) Solutions

Research in sung speech recognition (lyrics transcription) is ongoing, and some deep learning models are being developed to distinguish sung speech from spoken speech [9]. However, these are not yet available as user-friendly apps and typically require programming and machine learning expertise to deploy.

Why Gates and Frequency Tricks Don’t Work Well

Using gates and frequency filtering is unreliable for this task because the frequency range of singing overlaps heavily with both spoken voice and some instruments, leading to a "messy" result as you described.

Summary Table
Method                        | Keeps Singing Only   | Removes Spoken Parts | Easy/No Code | Fast/Automatic
Basic VAD (Silero, Cobra)     | No (keeps all voice) | No                   | Yes          | Yes
Source Separation + VAD       | No (keeps all voice) | No                   | Somewhat     | Somewhat
AI DAW Plugins (RX, Audacity) | No (keeps all voice) | No                   | Yes          | Yes
Custom ML Models              | Potentially          | Potentially          | No           | No

Conclusion
Currently, there is no out-of-the-box tool that automatically detects and keeps only the singing voice while removing everything else, without also keeping spoken voice. The most practical workaround is to use a VAD tool to quickly trim non-voice sections, which at least speeds up the manual process. For a more precise solution (singing-only), you would need a custom-trained AI model, which is not yet available as a simple app for musicians [9].

MHEO · Jun 10, 2025

Too complex. Furthermore, the online apps ask for microphone access and can't load WAV or MP3 files. I guess I'll keep doing it manually, thanks.

1_i_Pi · Jun 10, 2025

MHEO said: ↑

Looking for software that automatically analyzes a song and places markers (or creates regions/segments) at the start and end of sung passages. It doesn't exist. Easiest workarounds?

Work takes time. That's why you get paid. Even for the monotonous tedious tasks (I absolutely hate comping a lot of the time). This incessant need to "optimize" absolutely every part of every process is/will be the undoing. You may say "this is such a small thing who cares" but we're at the point where all that counts. They're already beginning to try to make composition obsolete, audio engineering is right around the corner.

It literally has to be an all or nothing approach if any of us want jobs in 5 years. Yeah, AI is great for a lot of things, but I truly believe this is unfortunately an all or nothing kinda thing.

MHEO · Jun 11, 2025

1_i_Pi said: ↑

Work takes time. That's why you get paid. Even for the monotonous tedious tasks (I absolutely hate comping a lot of the time). This incessant need to "optimize" absolutely every part of every process is/will be the undoing. You may say "this is such a small thing who cares" but we're at the point where all that counts. They're already beginning to try to make composition obsolete, audio engineering is right around the corner.

It literally has to be an all or nothing approach if any of us want jobs in 5 years. Yeah, AI is great for a lot of things, but I truly believe this is unfortunately an all or nothing kinda thing.

You are right, work takes time. Anyway, consider that I have to break up about two thousand songs, each of which contains about 50 or 60 sung phrases. That is why I asked if there was anything that could speed up the work. Peace.

P.S. Meanwhile I found a way, for those interested:
(1) Isolate the vocals via stem separation;
(2) Run Automarker on the isolated vocals (the markers follow the gaps of silence);
(3) Import the CSV markers and apply them to the original audio;
(4) Autotrim and export the marked regions as audio files.

This way you'll get more or less what you need.
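For anyone who would rather script steps (3) and (4) than click through them for two thousand songs, here is a rough sketch using only the Python standard library. It is not what the tools above do internally; it assumes the markers were saved as a CSV with `start` and `end` columns in seconds (a made-up layout, so adapt the column names to whatever your marker tool exports), and `export_regions` and the 50 ms padding are hypothetical choices.

```python
import csv
import wave


def export_regions(wav_path, csv_path, out_prefix, pad=0.05):
    """Cut each (start, end) row of csv_path out of wav_path into its own WAV."""
    written = []
    with wave.open(wav_path, "rb") as src, open(csv_path, newline="") as f:
        params = src.getparams()
        rate, total = src.getframerate(), src.getnframes()
        for n, row in enumerate(csv.DictReader(f), start=1):
            # Pad each region slightly so phrase onsets and tails are not clipped.
            first = max(0, int((float(row["start"]) - pad) * rate))
            last = min(total, int((float(row["end"]) + pad) * rate))
            if last <= first:
                continue  # skip empty or out-of-range markers
            src.setpos(first)
            frames = src.readframes(last - first)
            out_path = f"{out_prefix}_{n:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # same channels, bit depth and sample rate
                dst.writeframes(frames)
            written.append(out_path)
    return written
```

A call like `export_regions("song.wav", "song_markers.csv", "song_phrase")` would write `song_phrase_001.wav`, `song_phrase_002.wav` and so on, cut from the original mix rather than from the separated stem. The file names here are placeholders.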

Last edited: Jun 11, 2025
