Singing along to only the music…

May 9, 2025, 11:14 AM

Back in March, I made a Journal entry about the then-impending liquidation of the Hudson’s Bay Company, including the company’s flagship location at Yonge and Queen Streets in Toronto, which many people know as the store from Today’s Special.  As part of that entry, I ran the following slideshow, set to a song from the Today’s Special episode “Sleep”, where I covered the lyrics:

Since then, people have asked me how I managed to do that, since it’s the music without the original vocals, and TVOntario never released clean recordings of the music from the various songs.  As it turns out, I used an AI tool called Vocal Remover to remove the vocals, and it does a pretty good job at it.  It’s not perfect, but it does well enough to work.  It does its best work on songs that only have one singer, and it tends to get confused on songs where there are multiple voices doing different things.

Let me illustrate with some examples.  Here’s “I’m So Happy”, a song from the Today’s Special episode “Songs” that was sung by Muffy, as aired:

Then this is what the Vocal Remover tool put out:

This is the gold standard right here, where the tool put out a completely clean version of the music.  Muffy’s voice is completely gone, and all you have is the music.

Then the tool also outputs the vocals that it removed:

This shows a few imperfections in the tool’s functionality, because while it’s largely a clean rip of Muffy’s voice, you can hear a few bits of instrumentation here and there.  I consider that acceptable, because I view the vocal track largely as a byproduct of the creation of the music track, and to be honest, I’d rather have a few bits of music here and there in the vocal track than to have a few bits of vocals here and there in the music track.

Here’s another example of the tool at work using only one vocalist, using “Just Singing a Song” from the same episode, sung by Jeff.  Here’s the original, as aired:

Here’s the music:

And here’s the vocal track:

Just like before, the music is clean, while the vocal track has a few wisps of musical accompaniment here and there.  Again, acceptable, because I view the vocal track as a byproduct of the music with this treatment.  And by the way, if that song sounds familiar, I sang it in this space a few years ago, and I would love to know how you think I did compared to the original.

The tool then tends to have more difficulty when it’s working with multiple singers, particularly with different vocal ranges.  Here’s “New Shoes” from “Shoes”, which was sung by Sam and Jeff:

And here’s the music:

In this instance, we have two male voices, with Sam’s voice being lower than Jeff’s.  I noticed that when the two of them are singing together, the tool tends to leave a faint trace of Sam’s voice on the track.  Additionally, you can hear Sam laugh at the twenty-second mark, during what is otherwise a solo by Jeff.  Presumably the tool didn’t recognize the different voice in what is otherwise a long stretch of just Jeff, and left it in.

That seems to be a common occurrence with multiple singers, where the lower of the two tends to leave ghosting on the track.  Here’s “I Love My Running Shoes”, which Jeff and Jodie performed, also in “Shoes”:

And here is the music:

If you listen carefully, you can pick up on little bits of both Jeff’s voice and Jodie’s voice on the music track, but more so Jeff’s than Jodie’s.  When one of them is singing by themselves, the tool can separate that cleanly, as we saw in the songs with only one singer.  But then when Jeff and Jodie are singing together, you can hear Jeff here and there.  That’s a trend that I noticed with the tool, that if it’s going to miss someone, it’s usually going to miss the lower voice.

I had initially thought that it was just a bias towards female voices that was causing it to miss Jeff and Sam, but it also did it to Jodie in “I Can Be Silly With You” from “Smiles”, which Jodie and Muffy performed:

And the music:

Here, Muffy, having the higher voice, is removed from the track completely, while the tool leaves traces of Jodie’s voice here and there, including a bit of laughter.  That suggests to me that it’s not so much a gender bias when it misses voices, but rather, it’s a pitch bias: when there are multiple vocalists, it tends to miss the lower of the voices.

Here’s “Something About a Hat” from “Hats”, which features Jeff, Jodie, and Sam singing about hats:

And the music:

While the tool did a very good job stripping out the vocals, particularly in the parts where they’re talking at the end and not singing, which is where I would have expected it to fail, you can still hear a few traces of Sam’s voice on the track.  However, you really have to listen carefully to hear Sam’s voice, so that’s not the worst thing in the world, especially compared to other songs.

Now compare that version to the reprise that they did later in the episode, which was performed by TXL:

Here’s TXL’s version with the vocals removed:

I admit that TXL’s version is more sterile overall, as it features TXL in voiceover while the cast is shown trying on different kinds of hats, with no interaction between the two.  But the tool did its job flawlessly, creating clean music.

Now, so far, all of this uses songs where the vocals are performed more or less straight, with no major special effects.  I suspect that this is what the tool was designed for, to simply separate vocals from music and output both.  When I tried the tool on “The Magic Bird” from “Tears”, which was mostly performed by Jodie, the results were a little bit different than I had hoped.  Here’s the original:

Note that Jodie’s voice has a little bit of reverb on it, which fits with the presentation, where the song is a voiceover, telling a story that is being acted out onscreen.

Here’s the music:

With the reverb in there, the tool caught Jodie’s voice, but it missed the reverb.  So while Jodie’s actual voice is gone, the reverb remains, leaving a ghosting of sorts.  Jeff’s part, at the end, has no reverb on it, and is removed nearly perfectly.

However, where it really falls apart is with the more complex songs, where different cast members are singing different things.  The best example of this is the ending number in “Songs”, where Jeff, Jodie, Sam, and Muffy all sing their various songs together, and the result as aired is pretty solid:

Here’s the music:

The beginning, where Sam has a solo of his own song, is the only part that is clean.  Once you get multiple people in the mix, all singing different things, it all really starts to fall apart, as the tool really doesn’t know what to remove, and tends to move around a bit in how successful it is in removing each voice.  Unsurprisingly, Sam, having the lowest voice, tends to be missed the most, and appears in the music track the most often.  I was a bit surprised, though, that the voice that it was most successful at removing, notwithstanding Sam’s solo, was Jeff’s, as I would have expected it to remove Muffy’s voice most successfully, with hers being the highest.  Jeff is audible on the track, but you really have to listen carefully for him, while the others can easily be picked out.

Then the vocal track, as expected, mirrors the music track exactly:

While you can definitely follow along with all of the different characters’ parts, especially with the music removed, notice that Sam’s voice cuts in and out a bit as the tool misses his voice when separating the music from the vocals, and includes it with the music rather than the vocals.  I suspect that this song is where the tool meets its match, but at the same time, no AI tool is perfect, as my various other discussions about artificial intelligence have borne out.  In other words, there is always room for improvement, and I imagine that like the others, this tool will improve as time goes on.

Meanwhile, this particular tool is one that I’ve had a lot of fun with.  After all, this is Clive and the Cowboys, which is one of my favorite musical groups.  In typical form, I first discovered this tool at 2 AM on a Sunday night, and I really got into it, which led to Elyse’s yelling up to me to stop.  Then there was another occasion where I brought the music into the bathroom with me while I took a shower.  See, “The Rainmakers”, the opera from “A Visit to the Opera”, being about eight and a half minutes long, is a good length for singing in the shower, and I’ve sung it a cappella plenty of times.  So this time, I brought my phone in with me, set it down nearby, and queued up “The Rainmakers”.  So I’m singing while I’m showering, and then the door opens.  It’s Elyse, who turned off my music and then left (she doesn’t like hearing my singing).  Okay, then.  After Elyse left, I reached out, turned the music back on, and resumed.  After all, it can’t rain as long as there’s still all of that junk in Thunder’s trumpet, and I hadn’t gotten that far in the song yet.  So, ya know, gotta finish and all because it can’t rain without Thunder.  So a couple of minutes later, the door opened again, and in came Elyse.  This time, not only did she turn my music off, but she also took my phone with her so that I couldn’t turn it back on.  Clearly, someone had no appreciation for a classic song.  But I wasn’t deterred.  I was like, the joke’s on you, because I know the whole thing by heart, and then finished it a cappella.  And there was rain all throughout the land.

That song is also one where Vocal Remover helped me out.  There was one part of the song towards the end, where Lightning finally does his dance after Thunder is able to roar, where I was never quite able to understand the lyrics.  Once I was able to strip away the music, I was able to make this part out, so now I can sing “The Rainmakers” flawlessly, in its entirety.  There are still a few other songs where I’m missing lyrics, but finally getting “The Rainmakers” right was amazing.

Right now, if you don’t mind, I’m going to go sing some of my favorite songs with the music.
