Why does AI struggle with chord voicings in song analysis?

Discussion in 'Ai for Music' started by Sacculus, Apr 8, 2026 at 1:38 AM.

  1. Sacculus

    Sacculus Kapellmeister

    Joined:
    Oct 22, 2017
    Messages:
    230
    Likes Received:
    56
    I'm trying to work out Michael Jackson's P.Y.T. to find the exact Rhodes chords and voicings Greg Phillinganes plays, but ChatGPT, Gemini, Grok, Hooktheory, and YouTube piano tutorials all give different answers!?
    It's hard to know whom to trust if you don't have very good ears...
     
  3. Djord Emer

    Djord Emer Audiosexual

    Joined:
    Sep 12, 2021
    Messages:
    1,276
    Likes Received:
    1,079
    Location:
    Taured
    They're all still very bad at that, yeah. I'm not sure why, but I'm dead sure I wouldn't trust them for anything music theory related... maybe music history.
     
  4. macros mk2

    macros mk2 Audiosexual

    Joined:
    Sep 22, 2022
    Messages:
    617
    Likes Received:
    547
    Location:
    seattle
    An LLM probably thinks Beethoven is a Saint Bernard, then tells you you're special for asking.
     
  5. flier0244

    flier0244 Member

    Joined:
    Jan 25, 2026
    Messages:
    28
    Likes Received:
    11
    Can you try splitting stems, then loading into Melodyne?

    Or try splitting stems, then feeding them to an AI?
     
  6. Lois Lane

    Lois Lane Audiosexual

    Joined:
    Jan 16, 2019
    Messages:
    6,279
    Likes Received:
    6,263
    Location:
    Somewhere Over The Rainbow
    Perhaps AI LLMs struggle
    For theory voicing in a song
    Because...
    It is not actually down to
    Dry mathematical formulae, and
    Even though they cannibalize
    The physical notes of musicians
    They can never digest our souls.
     
    Last edited: Apr 8, 2026 at 1:43 PM
    • Love it! x 3
    • Like x 2
    • Creative x 1
  7. orbitbooster

    orbitbooster Audiosexual

    Joined:
    Jan 8, 2018
    Messages:
    1,405
    Likes Received:
    774
    Well, I'll be a bit off topic, but I don't much trust tools like Melodyne and similar. They get the basic line, but nuances are hard to capture, and I don't ask AI for help with that either.

    However, I must say I was astonished by a piece of software from Aurally Sound: Prism.
    I thought it was yet another transcription tool comparable to all the others, but the results I got with it were jaw-dropping.
    I played a pretty complex piano piece, and it got almost all of it right, apart from some sustain pedal passages where the notes got shuffled.
    If I can, I'll post the original and the reconstruction.

    This tool is only for solo instruments.
     
  8. PulseWave

    PulseWave Audiosexual

    Joined:
    May 4, 2025
    Messages:
    4,983
    Likes Received:
    2,923
    As an exception, here is the AI answer:
    What’s tripping you up is that PYT exists in more than one musical “version”: the released Thriller track, the earlier Michael/Greg demo, and then transcription guesses that simplify the keyboard part into generic chords. The demo is credited to Michael Jackson and Greg Phillinganes, while the released song was reworked by Quincy Jones and James Ingram, so the exact Rhodes voicings people share online often don’t match each other.

    Why the answers disagree
    Most tutorial sites reduce the part to the harmonic shell, not the actual voicing. For example, one common chord transcription gives the song as B7, Em7, G, A, which may describe the progression but not the exact Rhodes shape or inversion. Another analysis notes the song is harmonically more complex than average and centers around B Dorian / B minor regions, which also helps explain why a simple four-chord answer can feel incomplete.

    What the Rhodes part likely is
    The Rhodes comping in PYT is best understood as stacked voicings and voice-leading, not block chords copied from a chart. In practice, that usually means:

    • left hand holding root or rootless support,
    • right hand using 3rds, 7ths, 9ths, and sometimes sus tones,
    • smooth movement between shapes instead of big jumps.
    So when people say “it’s B7 to Em7,” they may be naming the harmony correctly while missing the actual keyboard color that makes Greg’s part sound right.

    How to get closer to the real voicing

    A reliable way to reconstruct it is:
    1. Identify the bass note at each hit.
    2. Listen for the guide tones first, especially the 3rd and 7th.
    3. Add color tones like 9ths or 11ths only after the shell works.
    4. Match the comping rhythm before worrying about note-for-note density.
    That approach usually gets closer to the original than copying a piano tutorial with simplified chord names. The reason is that Rhodes parts in records like this are often arranged as voiced textures, not “lead-sheet chords”.
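
The "guide tones first, color tones later" idea above can be sketched in a few lines of Python. This is only an illustration: the function name `shell_voicing`, the `GUIDE_TONES` table, and the sharps-only note spelling are assumptions made for the example, and the output is a generic rootless shell, not a transcription of what is on the record.

```python
# Pitch-class names, sharps only (a simplification: no enharmonic spelling).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone offsets of the 3rd and 7th above the root for a few chord qualities.
GUIDE_TONES = {
    "maj7": (4, 11),
    "7": (4, 10),
    "m7": (3, 10),
}


def shell_voicing(root, quality, add_ninth=False):
    """Return the guide tones (3rd, 7th) of a chord, optionally plus the 9th.

    The root is left out on purpose: it is assumed to be covered by the bass.
    """
    root_pc = NOTE_NAMES.index(root)
    third, seventh = GUIDE_TONES[quality]
    offsets = [third, seventh]
    if add_ninth:
        offsets.append(14)  # a major 9th is 14 semitones above the root
    return [NOTE_NAMES[(root_pc + offset) % 12] for offset in offsets]


print(shell_voicing("B", "7"))                    # ['D#', 'A']
print(shell_voicing("E", "m7", add_ninth=True))   # ['G', 'D', 'F#']
```

The shell only gives the notes to listen for first; inversion, octave, and rhythm still have to come from the recording.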

    Practical takeaway
    If your goal is the exact Greg Phillinganes feel, you probably need to treat the part as a transcription job, not a chord-chart job. Start from the progression B7 / Em7 / G / A as a harmonic skeleton, then build a voicing-by-voicing ear transcription over the recording, because the missing detail is in the inversion, spacing, and rhythmic placement rather than just the chord symbols.

    A useful next step is to isolate one short phrase of the song and transcribe only the Rhodes attack points; that will reveal the actual voicing much faster than trying to decode the whole track at once.
     
  9. ArticStorm

    ArticStorm Moderator Staff Member

    Joined:
    Jun 7, 2011
    Messages:
    9,080
    Likes Received:
    4,769
    Location:
    AudioSexPro
    From what I've noticed, there are tracks that don't really follow traditional music theory. It's the uncommon, surprising ideas that sound fresh to our ears. That's what still makes human creativity superior to these models.

    You can however use basics from the models and then figure out your own contribution to the idea ...
     
  10. orbitbooster

    orbitbooster Audiosexual

    Joined:
    Jan 8, 2018
    Messages:
    1,405
    Likes Received:
    774
    That's why most automatic tools fail to give the full transcription (but Prism came close).
    Those, BTW, are roughly the steps I follow to transcribe (not always in this order):
    1 - get the bass; from just the I and V you'll get the base scale (really!), whatever type it is;
    2 - guess if major or minor;
    3 - guess if a modal scale (not very important, you'll get it in step 4 and 5);
    4 - guess the melody (usually just monophonically);
    5 - guess basic triad chords playing along the bass (like Am, C, etc.);
    6 - guess the chord inversions if any (in most cases there are);
    7 - guess if there are 7th, 9th, 11th, sus, and so on;
    8 - arrange with the right voicing (arpeggio, etc.) and add harmony to the melody if any.

    For all these steps, unless you want to hit the keyboard hoping that chance is with you, you need at least some basics in music theory aka harmony theory.

    Edit: I forgot: sometimes songs modulate to other scales, which complicates things a little, but not that much.
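
As a rough aid for steps 1, 2 and 5, here is a small Python sketch that lists the basic triads to try once the tonic and the major/minor guess are settled. The function name `diatonic_triads` is made up for this example, it only knows the plain major and natural minor scales, and it spells everything with sharps, so flat keys will look odd.

```python
# Pitch-class names, sharps only (a simplification: no enharmonic spelling).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone steps from the tonic for the two scale types handled here.
SCALE_STEPS = {
    "major": [0, 2, 4, 5, 7, 9, 11],
    "minor": [0, 2, 3, 5, 7, 8, 10],  # natural minor
}


def diatonic_triads(tonic, mode):
    """List the seven triads built on each degree of the scale."""
    tonic_pc = NOTE_NAMES.index(tonic)
    scale = [(tonic_pc + step) % 12 for step in SCALE_STEPS[mode]]
    triads = []
    for degree in range(7):
        # Stack every other scale note: root, third, fifth.
        root, third, fifth = (scale[(degree + k) % 7] for k in (0, 2, 4))
        third_size = (third - root) % 12
        fifth_size = (fifth - root) % 12
        if fifth_size == 6:
            suffix = "dim"
        elif third_size == 3:
            suffix = "m"
        else:
            suffix = ""
        triads.append(NOTE_NAMES[root] + suffix)
    return triads


print(diatonic_triads("A", "minor"))
# ['Am', 'Bdim', 'C', 'Dm', 'Em', 'F', 'G']
```

It only narrows down the candidates for step 5; inversions, extensions and modulations (steps 6 to 8) still need ears.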
     
    • Like x 1
    • Agree x 1
  11. Crinklebumps

    Crinklebumps Audiosexual

    Joined:
    Nov 1, 2017
    Messages:
    1,266
    Likes Received:
    905
    Location:
    UK
    AI gives better responses to specific, detailed questions. If you can break down what you're hearing and put into words what you think is being played, it will get you closer; just keep hammering it with questions and telling it when it's wrong, and it will keep analysing. I'm talking about a chat AI, and in my experience they're very good at this kind of interaction. For example, AI can give you a great deal of insight into Greg Phillinganes' style and usual choices. It can't analyse audio but it doesn't need to; it already 'knows' the song structure of PYT and the exact chords Greg plays, e.g. (the following is Copilot AI):

    Here’s a clean, accurate, musician‑level breakdown of Greg Phillinganes’ style — the kind of insight an AI can give, because this is pattern‑based, not audio‑based.

    I’ll structure it the way a session player would explain another session player.


    Greg Phillinganes: Core Style Traits
    1. Ultra‑clean, ultra‑precise voicings
    Greg’s Rhodes and synth parts are surgically tidy.
    He avoids muddy clusters and prefers:

    • 3‑note shells (rootless when the bass is covered)
    • tight 4‑note voicings with a clear top voice
    • spread voicings with 10ths or 6ths in the left hand
    • upper‑structure triads for colour (especially over dominant chords)
    He plays like someone who knows the exact harmonic function of every note.

    2. Pop‑funk economy
    Greg never overplays.
    He’s the opposite of a gospel‑chops keyboardist.

    His parts are:
    • short
    • rhythmic
    • intentional
    • supportive
    He plays hooks, not “keyboard parts”.

    Listen to any MJ track he’s on — the Rhodes is basically a rhythmic instrument.

    3. The “Phillinganes snap” on Rhodes
    He has a very recognisable Rhodes attack:

    • slightly harder velocity
    • bright tine
    • tight envelope
    • almost clav‑like articulation
    He often accents the off‑beats with a little extra bite.

    This is why his Rhodes parts cut through dense mixes without being loud.

    4. Syncopation that locks with the drums
    Greg plays with the drummer, not over them.

    Typical traits:

    • anticipations on the “e” or “a”
    • ghosted left‑hand notes
    • right‑hand stabs that mirror the guitar
    • rhythmic displacement (he loves pushing a chord 1/16th early)
    On PYT specifically, the Rhodes is basically part of the percussion section.

    5. He outlines harmony with minimal notes
    Greg often implies full chords with:

    • 3‑note voicings
    • dyads
    • octave + colour tone
    • triads over alternate bass notes
    He trusts the bass player and the arrangement.

    This is why his parts feel so clean.

    6. Signature harmonic tendencies
    He gravitates toward:

    • major 9
    • minor 11
    • dominant 13
    • sus2 / sus4
    • add9
    • quartal voicings in passing
    • chromatic approach chords (very subtle)
    He rarely plays plain triads unless the arrangement demands it.

    7. Synth parts that are “keyboardist clean,” not “sound designer messy”
    Greg’s synth lines are:

    • monophonic
    • precise
    • hook‑driven
    • rhythmically tight
    • harmonically simple but placed perfectly
    He doesn’t noodle.
    He plays statements.

    How this applies to PYT
    Without analysing the audio directly, we can still say:

    • the Rhodes part is tight, syncopated, and minimal
    • the voicings are clean, upper‑mid register, no mud
    • he uses add9 and 6/9 colours
    • the left hand is light, mostly outlining roots or 10ths
    • the right hand plays stabs, not pads
    • the part is locked to the drums, not floating
    This is classic Greg.

    If you want, I can go deeper
    I can break down:

    • typical Greg Phillinganes voicing shapes
    • his left‑hand habits
    • his right‑hand rhythmic vocabulary
    • how he approaches MJ‑era Rhodes comping
    • what he tends to do on dominant chords
    • how he voices minor chords in pop‑funk
    Or I can give you a Greg‑style voicing dictionary you can use to reverse‑engineer PYT.

    Just tell me which direction you want to go.
     
    • Like x 1
    • Agree x 1
  12. orbitbooster

    orbitbooster Audiosexual

    Joined:
    Jan 8, 2018
    Messages:
    1,405
    Likes Received:
    774
    But that too requires knowledge of music theory (basic or advanced, it depends).
    I guess many want supper served ready instead.
     
    Last edited: Apr 8, 2026 at 12:21 PM
  13. Gre89

    Gre89 Member

    Joined:
    Mar 28, 2018
    Messages:
    38
    Likes Received:
    11
    Try to get the original studio stems. Maybe they are even on YT or somewhere. Use surgical EQs for further analysis of the track if needed.
     
  14. jazzzz

    jazzzz Platinum Record

    Joined:
    Mar 13, 2018
    Messages:
    745
    Likes Received:
    253
    Why don't you ask Greg?
     
  15. Sacculus

    Sacculus Kapellmeister

    Joined:
    Oct 22, 2017
    Messages:
    230
    Likes Received:
    56
    This is what Grok gave me, but it doesn't make sense... There are errors everywhere!!??

    1. Main voicings by section (recommended for piano)
    Here are the voicings that most closely match the original sound, adapted for piano (right hand + bass in the left hand or with the pedal). The notes are listed from bass to treble (left to right).
    Intro / Vamp (before the verse)
    Classic progression:
    Bm7 → E7 → Bm7 → F#7sus4 (or variations)
    Bm7: B (bass) – F# – A – D – E (open voicing: B-F#-A-D-E)
    (very common: left hand B-F#, right hand A-D-E)
    E7: E (bass) – G# – B – D – F# (E7(9))
    F#7sus4: F# – B – C# – E – A (highly suspended, creates tension)
    Verse – the most iconic
    Basic progression (Hooktheory + transcriptions):
    Bm → Gmaj7 → Bm (with variations)
    Typical Rhodes voicings by Greg Phillinganes (bright, percussive):
    Bm9 (frequently used instead of plain Bm):
    Left hand: B – F#
    Right hand: A – C# – D – E → full voicing: B – F# – A – C# – D – E (Bm9)
    Gmaj7 / A (very common slash chord):
    Bass: A
    Chords: G – B – D – F# (or G – B – D – F# – A for Gmaj9/A)
    C#m7 / G# (pre-chorus):
    Bass: G#
    Chords: B – D – E – G# (C#m7/G#)
    Bm7 / F#:
    Bass: F#
    Chords: A – B – D – E (very soft and open)
    These voicings are often played with the right hand fairly high (4th–5th octave) to leave room for Louis Johnson’s slap bass and the handclaps.
    Pre-Chorus
    Ascending, tense progression:
    Dmaj7/A → C#m7/G# → Bm7/F# → A(add9)
    Recommended voicings:
    Dmaj7/A: A (bass) – C# – D – F# – A – C# (very soft)
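
One way to spot-check an answer like this without good ears is to test whether the listed notes even belong to the chord name they are filed under. Below is a minimal Python sketch of that check. The helper `outside_tones` and the `CHORD_INTERVALS` table are made up for the example and cover only a few chord types. It says nothing about what Greg actually played; it only checks the answer against itself.

```python
# Pitch-class names, sharps only (a simplification: no enharmonic spelling).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone offsets above the root for the chord types checked here.
CHORD_INTERVALS = {
    "7": {0, 4, 7, 10},
    "maj7": {0, 4, 7, 11},
    "m7": {0, 3, 7, 10},
    "m9": {0, 2, 3, 7, 10},  # 2 is the 9th folded into one octave
}


def outside_tones(root, quality, notes):
    """Return the listed notes that are not tones of the named chord."""
    root_pc = NOTE_NAMES.index(root)
    allowed = {(root_pc + interval) % 12 for interval in CHORD_INTERVALS[quality]}
    return [note for note in notes if NOTE_NAMES.index(note) not in allowed]


# Voicings copied from the answer above (slash-chord bass note included).
print(outside_tones("C#", "m7", ["G#", "B", "D", "E", "G#"]))       # ['D']
print(outside_tones("B", "m9", ["B", "F#", "A", "C#", "D", "E"]))   # ['E']
print(outside_tones("D", "maj7", ["A", "C#", "D", "F#", "A", "C#"]))  # []
```

An outside tone is not automatically a mistake (the E over Bm9 is just an added 11th), but the D natural filed under C#m7/G# is not a chord tone and the voicing has no C# at all, so that line at least deserves suspicion.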
     
  16. clone

    clone Audiosexual

    Joined:
    Feb 5, 2021
    Messages:
    10,233
    Likes Received:
    4,419
    I wanted to see how GPT would answer your larger question. Would it blame the user, the song, or itself?

    Short answer: the model isn’t “hearing” what you’re hearing—it’s guessing from patterns, and this kind of question punishes guessing hard.

    Let’s use P.Y.T. (Pretty Young Thing) as the example. Those Greg Phillinganes Rhodes parts are exactly the kind of thing AI struggles with, for a few concrete reasons:

    First problem: voicing ≠ chord name
    Most systems are trained to identify chords (Cmaj7, F#min9, etc.), not how they’re voiced. But in real playing—especially Rhodes/funk/pop stuff—the voicing is everything:

    • spread vs closed
    • missing roots
    • extensions on top
    • inversions that don’t match the bass
    Two players can both be “playing Cmaj9” and sound completely different. AI tends to collapse that into one label and call it done.

    Second: dense, layered recordings
    Tracks like PYT aren’t a clean piano recording. You’ve got:

    • Rhodes
    • synth layers
    • bass implying different harmony
    • vocal harmonies adding extensions
    So what is “the chord”?
    AI (and even humans) can legitimately disagree because the harmony is distributed across instruments.

    Third: training data problem
    Most AI models learned theory from:

    • lead sheets
    • MIDI files
    • simplified transcriptions
    Not from:

    • isolated multitracks
    • pro session voicings
    • nuanced studio performances
    So when you ask for exact Greg Phillinganes voicings, you’re asking for something that barely exists in its training data.

    Fourth: no true audio grounding (in many cases)
    Unless the system is explicitly doing high-quality transcription from audio (and most aren’t doing that deeply), it’s not:

    • isolating the Rhodes
    • tracking exact note stacks
    • resolving inversions precisely
    It’s pattern-matching based on “this song is usually described as X harmony.”

    Fifth: even humans disagree on this stuff
    If you pull 5 legit transcriptions of a complex pop/R&B track, you’ll often get:

    • different chord labels
    • different voicings
    • different interpretations of extensions
    So when AI gives inconsistent answers, it’s not just failing—it’s reflecting a real ambiguity.

    Here’s the blunt truth:
    If you want exact voicings like what Phillinganes played, AI is currently the wrong tool.

    What actually works better:

    • slow the track down (Transcribe!, Logic Flex, etc.)
    • isolate midrange (EQ out bass/kick)
    • play along and lock the top notes first
    • then fill in inner voices
    Or go even more direct:

    • find live footage of Greg Phillinganes playing
    • or trusted transcribers who specialize in R&B/jazz voicings
    If you want, I can take a specific section of PYT (like verse or chorus) and give you a realistic Rhodes-style voicing breakdown instead of generic chord names. That’s where this actually becomes useful.


    ok

    Alright—let’s actually do this properly instead of hand-waving.

    We’ll break down a realistic Rhodes-style voicing approach for the main groove in P.Y.T. (Pretty Young Thing), based on how someone like Greg Phillinganes actually plays—not just chord labels.

    I’m going to focus on the core vamp (verse groove feel) because that’s where the signature voicing lives.

    What’s really going on (important mindset)
    Forget “what are the chords” for a second.
    This part is built on:

    • upper-structure triads
    • rootless voicings
    • tight top-note movement
    • bass defining the harmony more than the keyboard
    That’s why AI falls apart—it tries to label instead of hear structure.

    Core harmonic idea (simplified)
    You can think of the groove roughly orbiting:

    • B major / G# minor tonal space
    But the Rhodes is not playing full chords. It’s playing partial stacks that imply them.

    Rhodes voicing approach (this is the useful part)
    Shape 1 (home color)
    Right hand:

    • D# – F# – A#
    That’s basically a B major triad in 1st inversion (no root)

    Left hand:

    • either nothing OR light B (often implied by bass instead)
    Already this tells you something:
    He’s not hammering roots. He’s staying out of the bass.

    Movement (this is the signature sound)
    Now shift that shape down slightly while keeping it smooth:

    From:

    • D# – F# – A#
    To:

    • C# – F# – G#
    Now you’ve got:

    • a suspended / add9-ish flavor
    Notice:

    • top note drops (A# → G#)
    • middle stays (F# anchor)
    • bottom moves (D# → C#)
    That kind of minimal motion voice-leading is classic.
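
The motion described above can be checked by counting semitones per voice. The sketch below is a small Python illustration; `voice_motion` is a made-up helper, and because no octaves are given it assumes each voice moves to the nearest instance of its target note.

```python
# Pitch-class names, sharps only (a simplification: no enharmonic spelling).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def voice_motion(first, second):
    """Signed semitone motion of each voice, bottom to top.

    Octaves are unknown, so each voice is assumed to take the shortest
    path to its target note (negative = down, positive = up).
    """
    motions = []
    for start, end in zip(first, second):
        diff = (NOTE_NAMES.index(end) - NOTE_NAMES.index(start)) % 12
        if diff > 6:
            diff -= 12  # going down is shorter than going up
        motions.append(diff)
    return motions


shape_1 = ["D#", "F#", "A#"]
shape_2 = ["C#", "F#", "G#"]
print(voice_motion(shape_1, shape_2))  # [-2, 0, -2]
```

Both outer voices drop a whole step while the F# stays put, which matches the "top drops, middle stays, bottom moves" description.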

    Funk/Rhodes trick he’s using
    Instead of big chords, it’s:

    • 3-note clusters
    • often in the mid register
    • with rhythmic stabs, not sustained pads
    And very important:

    • slight timing looseness
    • not perfectly quantized (huge for feel)
    What AI usually gets wrong here
    AI will say something like:

    • “Bmaj7 → C#m7 → F#sus”
    That’s not wrong, but it’s useless if you’re trying to play it.

    Because the actual playable reality is:

    • small moving shapes
    • not full chords
    • bass + Rhodes = complete harmony
    If you want to actually play it
    Start here (don’t overcomplicate):

    Right hand loop:

    • D#–F#–A#
      → C#–F#–G#
      → back
    Keep it tight, rhythmic, and slightly detached.

    Then:

    • let your left hand barely touch roots (or skip entirely)
    • imagine the bass is doing the heavy lifting
    The honest takeaway
    You weren’t wrong to be frustrated.

    The question you saw is basically asking:

    “Why can’t AI reconstruct a studio musician’s nuanced voicing choices inside a dense mix?”

    That’s one of the hardest music tasks there is—even for trained humans.

    If you want, next step I’d suggest is:

    • we zoom into one exact timestamp (like 0:30–0:45)
    • I’ll map the voicings tighter, note-for-note, including octave placement
    That’s how you actually crack these tracks instead of chasing vague chord charts.
     
  17. orbitbooster

    orbitbooster Audiosexual

    Joined:
    Jan 8, 2018
    Messages:
    1,405
    Likes Received:
    774
    Nailed it!
    Nailed it!
    Nailed twice!

    BTW there is another solution: get the sheet music if available. But that again calls for staff reading, which is part of music theory.

    One last point, something I find uncomfortable: our personal limits. There are pieces that, no matter our knowledge or skill, are out of reach within a lifespan.
     
  18. Lois Lane

    Lois Lane Audiosexual

    Joined:
    Jan 16, 2019
    Messages:
    6,279
    Likes Received:
    6,263
    Location:
    Somewhere Over The Rainbow
    Absolutely!! I await a future lifetime when I shall learn the saxophone!!! :guru:
     