How to get raw sample frequencies
anto80

Hello all,

I'm trying to get all the frequencies of a sound sample (22 kHz mono).
That is to say, if I break the sample into pieces of, say, 0.01 s, I want the sound frequency of each piece.
(This has nothing to do with al_get_sample_frequency, which returns the 22 kHz sample rate.)

So to do this, I was looking at void *al_get_sample_data(const ALLEGRO_SAMPLE *spl)

ALLEGRO_SAMPLE* pSample = /* ...the loaded wav file*/ ;
int nDataLength = al_get_sample_length(pSample);
int16_t* ptr = (int16_t*)al_get_sample_data(pSample);

In my case, nDataLength is 47936 (my sample is 1.780 sec long).
However, if I look closely at the memory ptr points to, the int16_t values are as follows:

ptr[ 0] =  3
ptr[ 1] =  0
ptr[ 2] = -1
ptr[ 3] =  1
ptr[ 4] =  0
ptr[ 5] =  0
ptr[ 6] =  0
ptr[ 7] =  2
ptr[ 8] =  0
ptr[ 9] =  1
ptr[10] = -1
ptr[11] =  0
ptr[12] =  2
ptr[13] =  2
ptr[14] =  1
ptr[15] =  1
ptr[16] = -1
ptr[17] = -2
ptr[18] =  0
ptr[19] = -1
ptr[20] =  0
... ptr[10000] = -201
... ptr[20000] = -346
... ptr[30000] = -226
... ptr[40000] = -91
... ptr[47935] = 34

...These do not seem to be the raw frequencies of the sample pieces.
So what are these values? (Especially the negative ones?)
Is there a way to get the frequencies of small parts of the sample?

al_get_sample_data documentation is somewhat cryptic here... It says "Return a pointer to the raw sample data."

Thank you

Arthur Kalliokoski

I believe you want to use Fast Fourier Transform on the sample data.

anto80

It seems that FFT is the right way to go, but is it possible to do an FFT with Allegro 5 or C++?

Another question: why is nDataLength 47936 when my sample is 1.780 sec long? 1.780 * 22050 = 39249 (not 47936). What does ptr[] contain?
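As a rough check on the arithmetic above, here is a minimal sketch in plain C, assuming al_get_sample_length counts sample frames (one value per channel per time step) and that the playback rate comes from al_get_sample_frequency. The helper names sample_duration_seconds and sample_buffer_bytes are hypothetical, not part of Allegro.

```c
#include <stddef.h>

/* Duration in seconds of a buffer holding `frames` sample frames
 * played back at `rate_hz` frames per second. */
double sample_duration_seconds(unsigned int frames, unsigned int rate_hz)
{
    if (rate_hz == 0)
        return 0.0;
    return (double)frames / (double)rate_hz;
}

/* Size in bytes of the raw buffer: frames * channels * bytes per value
 * (2 bytes per value for int16_t data). */
size_t sample_buffer_bytes(unsigned int frames, unsigned int channels,
                           unsigned int bytes_per_value)
{
    return (size_t)frames * channels * bytes_per_value;
}
```

Under those assumptions, 47936 frames at 22050 Hz come to about 2.17 seconds rather than 1.780, so it may be worth printing what al_get_sample_frequency, al_get_sample_channels and al_get_sample_depth actually report for the loaded file before trusting the 22050 figure.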

Thanks

Arthur Kalliokoski
anto80 said:

Why is nDataLength 47936 when my sample is 1.780 sec long?

I don't know.

The ptr[] evidently contains int16_t values, which are signed short ints.

I was fiddling with a guitar tuner last summer, but I've reinstalled the OS to a bigger hard drive and haven't reinstalled Allegro. Give me an hour and I'll see if I can give you an example program that allows you to input sounds through the microphone and see how the "main frequency" and the sample distributions change (also known as timbre).

[EDIT] Well, the last announcement for a WIP in Allegro Development tried to give me allegro-5.0.11 something, which I already had from 2014, so I tried allegro-5.1.8. I can't get this stuff to compile, haven't used it so long I'm a n00b again too.

I think this zip file has the source for some sort of FFT thing, IIRC it tried to print the nearest note of the predominant frequency and how much it was off by. It uses OpenGL, sorry. You may be able to get some relevant bits out of it.

As for how FFT works, I have some totally wild guesses.

<total n00bishness follows>
The FFT thing works by making a sine wave of some given frequency and comparing it to the samples to see how well it follows the curve. A good fit results in a large value, so the "bin" for that frequency gets the high value. It then tries higher and higher frequencies to see how well they fit in turn and stores those values in their respective bins. And for low values too, obviously.

However, since the generated sine wave starts from a zero point, and the sample's waveform starts from a random point, it would most likely guess wrong unless the sample could be forced to start from the zero point of a waveform. This is about as difficult in itself, so what the FFT does is make a "window" to copy the sample to, and tapers the sample values toward both ends so it "looks" like it starts from a zero point. OTOH, this "window" necessarily averages quite a few sample points to fit one bin, so eventually you have to compromise between accuracy and speed.

Also, I got stuck for a while on the fact that the display of the "bins" seemed to produce a useless S shaped curve, but what I didn't realize was that it was symmetrical and I only needed to examine the very first few "bins" at the beginning to get something reasonable. At least, that's my memory of what I thought last summer.

Note: edited the above again

anto80

Thank you for your time and trouble...

(Well, I can't compile 5.1 either; I'm sticking to 5.0.10 for now...)

If I understand your example correctly, it seems that you calculate volume (the magnitudes[i] variable) using root mean square (the square root of the average of v[i]*v[i]).
It seems to be like in this other thread https://www.allegro.cc/forums/thread/610890 (see SiegeLord's message).

However, I don't see the relation between the array of instantaneous volumes and the array of frequencies I'm looking for...

Arthur Kalliokoski

The vertical bouncing green bars on his graph each represent a small range of frequencies, the "bins" I speak of above. It's like the histogram thing you might see on an audio player.

anto80

Thanks again.

Yes I know what FFT is.
(Transforming "Time domain" overlapping cosine-like waveforms into a set of "Frequency domain" values.)

But, to do this, I think I must first know what I get from al_get_sample_data.
If I can't figure out what it is (the length of the array does not seem to match...), I won't be able to do any math.

Kitty Cat
anto80 said:

But, to do this, I think I must first know what I get from al_get_sample_data.

Most likely it's the amplitude. Essentially it relates to the electrical signal being sent to the speaker, but more basically, it determines the physical position the speaker should have at that point in time; 0 is centered ("at rest"), negative values pull in, and positive values push out. By constantly oscillating in and out, the speaker compresses the air and creates sound waves.

Why the array is longer than the sound length, I can't say. Are you sure the sample is exactly 1.780 seconds long, and not 1.780 seconds of sound followed by a bit of silence?

Arthur Kalliokoski
Kitty Cat said:

it determines the physical position the speaker should have at that point in time; 0 is centered ("at rest"), negative values pull in, and positive values push out. By constantly oscillating in and out, the speaker compresses the air and creates sound waves.

This ^^

If you were to grab those signed 16 bit ints and draw the values on the screen in the x direction with increasing index values (scaling appropriately so they'll fit) then you'd see an oscilloscope-like pattern on the screen. This would be the time domain mentioned in the video.
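The scaling step for such an oscilloscope view could look roughly like this. It is a sketch in plain C with a hypothetical helper name, sample_to_screen_y, and it assumes the data really is signed 16-bit; drawing the line between consecutive points would be left to something like al_draw_line.

```c
#include <stdint.h>

/* Map a signed 16-bit sample to a vertical pixel position on a screen
 * that is `screen_height` pixels tall. The largest positive sample lands
 * on the top row (y = 0) and the most negative one on the bottom row. */
float sample_to_screen_y(int16_t value, int screen_height)
{
    /* Normalize to [0, 1]: -32768 -> 0.0, 32767 -> 1.0 */
    float t = ((float)value + 32768.0f) / 65535.0f;

    /* Screen y grows downward, so flip the axis. */
    return (1.0f - t) * (float)(screen_height - 1);
}
```

With the sample index (suitably scaled) as x and this value as y, a sample of 0 ends up near the vertical middle of the display.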

anto80

Thank you for the further explanation!

Now I still need to figure out how to FFT the small chunks of my waveform into sets of frequency "bins", with each chunk giving one set of bins (an amplitude value for each frequency window).

I have solved my problem with the following code, thanks to Arthur's example.
I'm leaving all the commented-out code in so you can see the differences from the original code.

At the end I get the frequency magnitudes in magnitudes[i].
The frequencies are drawn on screen (1024x800).
The only thing I don't know is the frequency scale, that is to say, which frequency window magnitudes[i] corresponds to... but I'll try to find out.
Thanks again.

#include <stdio.h>
#include <allegro5/allegro.h>
#include <allegro5/allegro_audio.h>
#include <allegro5/allegro_primitives.h>
#include <allegro5/allegro_acodec.h>
#include "kiss_fft.h"
#include "kiss_fftr.h"

#define ASSERT(a) assert(a)
#define MIN(a,b) (a<b)?a:b
#define MAX(a,b) (a>b)?a:b

/* Comment out the following line to use 16-bit audio */
#define WANT_8_BIT_DEPTH

#define SCRW 1024
#define SCRH 800

#define BUFFSIZE (SCRW*8)

#define CHUNK_SIZE 512

const ALLEGRO_AUDIO_DEPTH audio_depth = ALLEGRO_AUDIO_DEPTH_UINT8;
//typedef uint8_t* audio_buffer_t;
//const uint8_t sample_center = 128;
const short min_sample_val = -32768; //const int8_t min_sample_val = 0x80;
const short max_sample_val = 32767;//const int8_t max_sample_val = 0x7f;
const int sample_range = 0xFFFF; //const int sample_range = 0xff;
//const int sample_size = 1;

const unsigned int samples_per_fragment = BUFFSIZE;

const unsigned int frequency = 44100;
const unsigned int max_seconds_to_record = 60 * 5;

const unsigned int playback_fragment_count = 4;
const unsigned int playback_samples_per_fragment = 4096;

double cardfreq = 44100;

float unbiased[BUFFSIZE];
float array[BUFFSIZE];
float magnitudes[BUFFSIZE];
//float buf[BUFFSIZE]; //used to rebuild time from frequency

kiss_fft_cpx* copycpx(float *mat, int nframe)
{
    int i;
    kiss_fft_cpx *mat2;
    mat2=(kiss_fft_cpx*)KISS_FFT_MALLOC(sizeof(kiss_fft_cpx)*nframe);
    kiss_fft_scalar zero;
    memset(&zero,0,sizeof(zero) );
    for(i=0; i<nframe ; i++)
    {
        mat2[i].r = mat[i];
        mat2[i].i = zero;
    }
    return mat2;
}

int main(int argc, const char **argv)
{
    //ALLEGRO_AUDIO_RECORDER *r;
    //ALLEGRO_AUDIO_STREAM *s;

    ALLEGRO_EVENT_QUEUE *q;
    ALLEGRO_DISPLAY *d;
    float maxmag;
    float ultimatemag = 0.0;
    float sum;

    int prev = 0;

    al_init();

    if (!al_init_primitives_addon())
    {
        fprintf(stderr,"Unable to initialize primitives addon\n");
    }

    if (!al_install_keyboard())
    {
        fprintf(stderr,"Unable to install keyboard\n");
    }

    if (!al_install_audio())
    {
        fprintf(stderr,"Unable to initialize audio addon\n");
    }

    if (!al_init_acodec_addon())
    {
        fprintf(stderr,"Unable to initialize acodec addon\n");
    }

    /* Note: increasing the number of channels will break this demo. Other
     * settings can be changed by modifying the constants at the top of the
     * file.
     */
    // r = al_create_audio_recorder(1000, samples_per_fragment, frequency,
    //    audio_depth, ALLEGRO_CHANNEL_CONF_1);
    //if (!r)
    //{
    //    fprintf(stderr,"Unable to create audio recorder\n");
    //}

    // s = al_create_audio_stream(playback_fragment_count,
    //    playback_samples_per_fragment, frequency, audio_depth,
    //    ALLEGRO_CHANNEL_CONF_1);
    //if (!s)
    //{
    //    fprintf(stderr,"Unable to create audio stream\n");
    //}

    al_reserve_samples(0);
    //al_set_audio_stream_playing(s, false);
    //al_attach_audio_stream_to_mixer(s, al_get_default_mixer());

    ALLEGRO_SAMPLE* pSample = al_load_sample("E:\\devt\\mywork\\FFT\\FFT\\test.wav");
    int nSampleLength = al_get_sample_length(pSample);
    int16_t* ptr = (int16_t*)al_get_sample_data(pSample);

    ALLEGRO_SAMPLE_INSTANCE* pSampleInstance = al_create_sample_instance(pSample);
    ASSERT(pSampleInstance != NULL);

    bool bRet = al_attach_sample_instance_to_mixer(pSampleInstance, al_get_default_mixer());

    q = al_create_event_queue();

    /* Note: the following two options are referring to pixel samples, and have
     * nothing to do with audio samples. */
    al_set_new_display_flags(ALLEGRO_WINDOWED|ALLEGRO_RESIZABLE);

    al_set_new_display_option(ALLEGRO_SAMPLE_BUFFERS, 1, ALLEGRO_SUGGEST);
    al_set_new_display_option(ALLEGRO_SAMPLES, 8, ALLEGRO_SUGGEST);

    d = al_create_display(SCRW, SCRH);

    //al_register_event_source(q, al_get_audio_recorder_event_source(r));
    //al_register_event_source(q, al_get_audio_stream_event_source(s));
    al_register_event_source(q, al_get_display_event_source(d));
    al_register_event_source(q, al_get_keyboard_event_source());

    //create necessary fft buffers
    kiss_fft_cpx out_cpx[BUFFSIZE],*cpx_buf;
    kiss_fftr_cfg fft = kiss_fftr_alloc(BUFFSIZE*2 ,0 ,0,0);
    //kiss_fftr_cfg ifft = kiss_fftr_alloc(BUFFSIZE*2,isinverse,0,0);

    //al_start_audio_recorder(r);

    //while (true)
    //{
    //    ALLEGRO_EVENT e;

    //    al_wait_for_event(q, &e);

    //if (e.type == ALLEGRO_EVENT_AUDIO_RECORDER_FRAGMENT)
    //{

    int nChunkCursor;

    for (nChunkCursor=0; nChunkCursor < nSampleLength; nChunkCursor+=CHUNK_SIZE)
    {

        //ALLEGRO_AUDIO_RECORDER_EVENT *re = al_get_audio_recorder_event(&e);
        //audio_buffer_t input = (audio_buffer_t) re->buffer;
        int sample_count = MIN(CHUNK_SIZE, nSampleLength-nChunkCursor); //re->samples;
        //const int R = sample_count / BUFFSIZE;
        int i;

        al_clear_to_color(al_map_rgb(0,0,0));

        //draw the original waveform (time domain)
        //for (i = 1; i < SCRW; ++i)
        //{
        //    int j, c = 0;

        //    /* Take the average of R samples so it fits on the screen */
        //    for (j = i * R; j < i * R + R && j < sample_count; ++j)
        //    {
        //        c += ptr[nChunkCursor+j]; // input[j] - sample_center;
        //    }
        //    c /= R;

        //    /* Draws a line from the previous sample point to the next */
        //    al_draw_line(i - 1,
        //        SCRH/4 + ( ( (prev - min_sample_val) / (float) sample_range) * SCRH/2 - 256),
        //        i,
        //        SCRH/4 + ( ( (c - min_sample_val) / (float) sample_range) * SCRH/2 - 256),
        //        al_map_rgb(255,255,255), 1.2);

        //    prev = c;
        //}

        //copy to another buffer and remove any bias
        sum = 0.0;
        for(i=0;i<BUFFSIZE;i++)
        {
            unbiased[i] = ptr[nChunkCursor+i]; // input[i];
            sum += unbiased[i];
        }

        sum /= (float)BUFFSIZE;

        for(i=0;i<BUFFSIZE;i++)
        {
            unbiased[i] -= sum;
        }

        maxmag = 0.0;
        //filter out "low input" like noise gate
        for(i=0;i<BUFFSIZE;i++)
        {
            if(maxmag < unbiased[i])
                maxmag = unbiased[i];
        }
        if(maxmag < 2.0)
            goto noise_gate;

        //copy to another buffer while clamping to a triangular "window"
        //Wild Ass Guess: Pinching off the ends to zero simulates having the sinusoids
        //begin and end at the beginning and end of the buffer to avoid distorting the fft
        //while still keeping the buffer size at a power of two.
        for(i=0;i<BUFFSIZE/2;i++)
        {
            float ttmp = (float)i/(BUFFSIZE/2);
            array[i] = unbiased[i] * ttmp;
            array[BUFFSIZE - i - 1] *= ttmp; //mirror it on the second half
        }

        cpx_buf = copycpx(array,BUFFSIZE);
        kiss_fftr(fft,(kiss_fft_scalar*)cpx_buf, out_cpx);

        //kiss_fftri(ifft,out_cpx,(kiss_fft_scalar*)out ); //the inverse, which we're not using

        //convert the complex numbers to scalar magnitude via Pythagoras
        maxmag = 0.0;
        for(i=1;i<BUFFSIZE/4;i++)
        {
            magnitudes[i] = 0.005 * (-(sqrt(out_cpx[i].i*out_cpx[i].i) + (out_cpx[i].r * out_cpx[i].r))); //*(float)1.0/i );
            if(magnitudes[i] < maxmag)
                maxmag = magnitudes[i];
        }

        maxmag = -200.0/maxmag;

        for(i=0;i<BUFFSIZE/4;i++)
        {
            magnitudes[i] *= maxmag;
        }

        //draw the frequency domain
        /* Changed this to only include the first 1/4 of the bins because of the max expected frequency
         * compared to the Nyquist frequency
        for(i=1;i<BUFFSIZE/2 - 1;i++)
        {
            al_draw_line( (i - 1) * 2,
                SCRH/2 + magnitudes[i-1],
                i * 2,
                SCRH/2 + magnitudes[i],
                al_map_rgb(255,0,0), 1.2);
        }
        */

        for(i=1;i<SCRW;i++)
        {
            al_draw_line( (i - 1) * 8,
                SCRH/2 + magnitudes[i-1],
                i * 8,
                SCRH/2 + magnitudes[i],
                al_map_rgb(255,0,0), 1.2);
        }

noise_gate:
        al_flip_display();
        //printf("%f\n",maxmag);
        if(ultimatemag < maxmag)
            ultimatemag = maxmag;

        //}

        //else if (e.type == ALLEGRO_EVENT_DISPLAY_CLOSE)
        //{
        //    break;
        //}
        //else if (e.type == ALLEGRO_EVENT_KEY_CHAR)
        //{
        //    if (e.keyboard.unichar == 27)
        //    {
        //        /* pressed ESC */
        //        break;
        //    }
        //}
        //}

    }

    /* clean up */
    // al_destroy_audio_recorder(r);
    // al_destroy_audio_stream(s);
    kiss_fft_cleanup();
    free(fft);

    al_stop_sample_instance(pSampleInstance);
    al_destroy_sample_instance(pSampleInstance);
    al_destroy_sample(pSample);
    printf("biggest mag was %f\n",ultimatemag);
    return 0;
}