#1
Senior Member
Join Date: Mar 2006
Location: perth
Posts: 887
THE PHASE VOCODER
I dedicate this to Einstein's mum. I've never written any educational material before, so forgive me if it's a little poorly explained in places.

DESCRIPTION
-----------
A phase vocoder rips the sound apart into its individual harmonics, which are sine waves, the tiniest element the ear can perceive as a separate frequency. Then it restores the sound as it was, and once it has been converted back to the time domain (the usual domain for sound, amplitude over time) it can be played like any other sound.

Your ear can tell frequencies maybe 2 Hz apart, no less than this, and that's only near the bottom of the spectrum. This means the "samples" you take from the audio signal when ripping it apart into individual harmonics should be spaced in frequency about as finely as your ear can tell frequencies apart. If you take enough of them, the gaps between don't matter to your ear. If you don't take enough, you will hear an empty sound, as if it has had noise reduction applied to it. It sounds "jingly". That means if there is someone out there with very sensitive ears, the phase vocoder will sound more jingly to him than it does to you, hypothetically.

The phase vocoder has varied purposes. Once the sound has been ripped apart into sine waves, and before you go back to the time domain, there are a few things you could do to it, but it's highly experimental and there's not much out there for the phase vocoder yet, because only recently have computers had enough power to do this in real time.

Possible purposes for a phase vocoder->
* Pitch transposition without altering time scale.
* Time scaling without altering pitch.
* Noise reduction
* Harmonic chorusing
* Speech recognition (it's where the word vocoder comes from, "voice coder")

QUALITY
-------
The quality you get out of the phase vocoder ranges from completely without fault to poppy and crackly if you stuff it up, and it can also sound whooshy and blurry if it's not set right.
On Wikipedia today they say the phase vocoder blurs the sound and makes it sound crap. This isn't true for the model I'm explaining. It doesn't suffer from this problem; it sounds a little ringy when you pitch up, but the transients are preserved perfectly. The whoosh you can get out of a poor phase vocoder can actually be used in techno as an effect. I'm quite sure that on an expensive piece of equipment the pitch scaler can be perfect, it is possible.

METHOD
------
This method uses the Fourier transform to get the phase and amplitude of each sine wave, then uses sine wave oscillators to restore the sound. The Fourier transform takes one segment of the sound at a time (say 512 samples) and produces a batch of amplitudes and phases, one pair for each sine wave it found in that segment. These are pumped into the sine wave oscillators on the output of the phase vocoder. Each oscillator must have correct readings at each sound segment or you won't get the exact same output as input.

HOW THE FOURIER TRANSFORM WORKS
-------------------------------
The transform works by taking a window of the sound, and for each sine wave that fits inside it you can get a phase and amplitude reading. Only sine waves that fit a whole number of cycles into the window can be extracted from it successfully. If the window size is 512, only cycle lengths 512 (512/1), 256 (512/2), about 170.7 (512/3), 128 (512/4) and so on fit a whole number of times, so these are the only sine waves you can get out of it, but funnily enough these are the only sine waves you care about.

Take a sine wave of the cycle length you want the amplitude and phase for, and multiply it with the signal, once starting at 0 degrees phase and once at 90 degrees phase (a sine and a cosine wave), both at unit amplitude. You may be thinking something funny now, that it works using ring modulation! And it does! Then sum up the products across the segment, giving one total for the sine and one for the cosine.
Then treat the sine total as an x component and the cosine total as a y component of a 2D vector. The length of that vector gives the amplitude of the sine wave (divide it by half the window length to get it in the same units as the signal), and the angle of the vector is its phase, going from 0 to 2*PI.

TWEAKING THE TRANSFORM
----------------------
Now you know it's that easy to get fragments of sound, but you still have to use the transform right for it to work properly. You must overlap the time segments you look at. This is because the transform only looks at a window of the sound, with pops either side of it. If the segment you look at (say 1024 samples) does not overlap the next one, that is, if it is not longer than your segment interval (say 512 samples), then you will get pops and crackle in your output because the phase readings will be wrong.

PRECOMPUTATION
--------------
If you're using the FFT (I only explained the ordinary transform) you could find all the amplitudes and phases in real time. If you're using the transform I explained, precompute all the sine wave amplitudes and phases into a harmonic wave file. Then you can skip this step when playing back the file and only render the oscillators, which is all you have to do.

OUTPUTTING BACK TO THE TIME DOMAIN
----------------------------------
Make a sine wave oscillator (optimize with a lookup table) for each harmonic. Every segment, give each oscillator the amplitude reading and phase reading for its harmonic, and you should have the exact sound playing out of the sine wave oscillators!

The only problem is that, from small errors in the transform, you will hear slight pops and crackle in the sound. To remedy this (and so the pitch transposition works), slide the amplitude from the last oscillator to the new oscillator, that is, ramp each oscillator's amplitude from the previous segment's reading to the new one rather than jumping, and give it a slight tremor in frequency so its phase runs smoothly from the old phase to the new phase. And you've basically got it now.
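The analysis and output steps described above can be sketched in a few lines of Python. This is only a rough illustration for a single window, so it needs no overlap or phase fix, and it uses the slow multiply-and-sum transform rather than an FFT. The function names `analyse_harmonic` and `resynthesise` and the test signal are made up for the example.

```python
import math

def analyse_harmonic(segment, cycles):
    """Amplitude and phase of the sine wave that fits `cycles` whole cycles in the segment."""
    n = len(segment)
    step = 2 * math.pi * cycles / n
    # Multiply the signal by a unit sine and a unit cosine, then sum the products.
    x = sum(s * math.sin(step * i) for i, s in enumerate(segment))
    y = sum(s * math.cos(step * i) for i, s in enumerate(segment))
    # Vector length (scaled by half the window length) is the amplitude,
    # vector angle is the phase in the range 0 to 2*PI.
    amplitude = 2 * math.hypot(x, y) / n
    phase = math.atan2(y, x) % (2 * math.pi)
    return amplitude, phase

def resynthesise(readings, n):
    """Play one sine oscillator per harmonic and sum them into n output samples."""
    out = [0.0] * n
    for cycles, (amplitude, phase) in readings.items():
        step = 2 * math.pi * cycles / n
        for i in range(n):
            out[i] += amplitude * math.sin(step * i + phase)
    return out

# Hypothetical test signal: two harmonics that fit whole cycles in the window.
n = 256
segment = [0.5 * math.sin(2 * math.pi * 3 * i / n + 0.3)
           + 0.25 * math.sin(2 * math.pi * 10 * i / n + 2.0)
           for i in range(n)]

# Analyse every harmonic between DC and the Nyquist frequency, then restore.
readings = {k: analyse_harmonic(segment, k) for k in range(1, n // 2)}
restored = resynthesise(readings, n)
max_error = max(abs(a - b) for a, b in zip(segment, restored))
```

Because both test harmonics fit whole cycles in the window, `restored` should match `segment` to within rounding error. A real signal will not be that tidy, which is where the overlapping windows and the phase fix come in.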
PITCH TRANSPOSITION AND TIME SCALING
------------------------------------
Only pitch and time scaling are explained in this document. The rest of the uses for this device are up to you if you want to experiment with it.

Changing the pitch is simple. Just alter the oscillators' frequencies up or down and you will get instantaneous pitch scaling on output while keeping the time scale the same. The "phase fix" will keep the phase from popping and the pitch will change. But note, the interval size per oscillator must be large enough for the phase fix to have room to pop the phase back and alter the pitch properly, or the frequencies will stick and not change, and it'll sound more like a phaser than vibrato. You can use a smaller segment size for your higher frequencies than for your lower frequencies, and this will improve the transients, so you should do this.

Changing the time scale can be implemented by simply pairing the algorithm with a resample: pitch up and then resample down and you will get a slower sound, pitch down and then resample up and you will get a quicker sound.
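A minimal sketch of one oscillator rendering one interval with an amplitude slide, a frequency tremor and a pitch ratio. It assumes the common phase-accumulating form, where the oscillator keeps its own running phase and the tremor is worked out from the difference between two consecutive phase readings. That may not be exactly the phase fix described above. The names `wrap` and `render_interval` and all the numbers are invented for the example.

```python
import math

def wrap(angle):
    """Wrap an angle into the range -PI to PI."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def render_interval(a0, a1, p0, p1, cycles, window, hop, ratio, start_phase):
    """Render `hop` samples of one oscillator between two readings.

    a0, p0: amplitude and phase read at the start of the interval
    a1, p1: amplitude and phase read at the start of the next interval
    cycles: whole cycles of this harmonic per analysis window
    ratio:  pitch ratio (1.0 leaves the pitch alone, 2.0 is an octave up)
    Returns the samples and the oscillator phase to carry into the next interval.
    """
    nominal = 2 * math.pi * cycles / window            # radians per sample
    # Tremor: the small frequency offset that makes the phase run from p0 to p1.
    tremor = wrap(p1 - p0 - nominal * hop) / hop
    step = ratio * (nominal + tremor)
    out = []
    phase = start_phase
    for i in range(hop):
        amp = a0 + (a1 - a0) * i / hop                 # slide the amplitude
        out.append(amp * math.sin(phase))
        phase += step
    return out, phase

# Unshifted: starting at the old phase reading, the oscillator arrives at the new one.
samples, end_phase = render_interval(0.2, 0.4, 0.5, 0.9, cycles=8, window=1024,
                                     hop=256, ratio=1.0, start_phase=0.5)

# An octave up: same readings, every oscillator step doubled.
shifted, shifted_end = render_interval(0.2, 0.4, 0.5, 0.9, cycles=8, window=1024,
                                       hop=256, ratio=2.0, start_phase=0.5)
```

To run a whole sound through it, call this once per harmonic per interval and feed the returned phase back in as `start_phase`, so the oscillator never jumps. Time scaling would then be this pitch change followed by an ordinary resample, which is not shown here.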
#2
Senior Member
Join Date: Sep 2005
Location: Jönköping, Sweden
Posts: 546
Interesting!
I couldn't parse this, though: "slide the amplitude from the last oscillator to the new oscillator"
#3
New Member
Join Date: Oct 2009
Location: uk
Posts: 3
Nice description. I can clear up so many of my doubts with the help of your article.
Please keep sharing such articles in the future. Thank you for this piece of information.
#4
Senior Member
Join Date: Mar 2006
Location: perth
Posts: 887