Hey all!
After reading through documentation and the source for Allegro I have come to the conclusion that I am still a newbie, and I need some help figuring out how I can use Allegro's sound capabilities.
I am trying to create a program that can take two sound samples and compare them to determine if they are a close enough match. So far I have discovered that the SAMPLE structure plays a crucial role, but I am not completely sure how it works. I see that the sound data is stored in memory and accessed through pointers, but I was looking at the example posted by spellcaster and noticed that the pointer is basically incremented by the length of the buffer. If that is the case, what happens if part of the buffer is not filled and the pointer is still incremented by the full size of the buffer? Won't it leave some empty space with silence? Furthermore, I see that the pointer is assigned to an unsigned char pointer. Does this mean that I can access any byte of the data by just checking the content of that address?
I am thoroughly confused as to how to work with the buffer and/or memory so that I can take certain parts of it and compare them to another set of data. Can someone enlighten me as to how the raw data is stored in Allegro?
Thanks,
Dan
To record sound data you do this:
1. Create a SAMPLE that will be large enough to hold the amount of sound data you want to record. Do this with create_sample().
2. Call start_sound_input() to find out how much data is recorded at a time. You won't get the entire sample filled at once, but instead buffer by buffer, so you have to piece it together yourself.
3. Make a pointer that points to the sample's data member.
4. In a loop repeatedly call read_sound_input() giving it the pointer from #3 as input argument. Each time the call to read_sound_input() was successful, increment the pointer by the amount start_sound_input() returned. Continue this until all the sample data has been filled. Make sure you don't go past the sample's data buffer size, otherwise your program will crash.
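The steps above can be sketched roughly as follows. This is only an illustration of the "advance the pointer after each successful read, and stop before the end" logic, not Allegro code: `fill_buffer`, `read_chunk_fn` and `fake_read_chunk` are made-up names, and the callback merely stands in for read_sound_input() under the assumption that it either delivers one whole chunk or nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for read_sound_input(): copies one chunk into dst
 * and returns nonzero when a chunk was available, zero otherwise. */
typedef int (*read_chunk_fn)(void *dst);

/* Fills dst (capacity bytes) chunk by chunk and returns the number of bytes
 * written. Stops before a chunk would run past the end of the buffer. */
size_t fill_buffer(unsigned char *dst, size_t capacity, size_t chunk,
                   read_chunk_fn read_chunk, int max_polls)
{
    size_t ofs = 0;
    assert(chunk > 0);
    while (max_polls-- > 0 && ofs + chunk <= capacity) {
        if (read_chunk(dst + ofs))
            ofs += chunk;   /* advance only after a successful read */
    }
    return ofs;
}

/* Fake source for demonstration: delivers 4 bytes on every second poll. */
static int fake_calls = 0;

int fake_read_chunk(void *dst)
{
    if (++fake_calls % 2 == 0) {
        memset(dst, 0xAB, 4);
        return 1;
    }
    return 0;   /* nothing ready yet */
}
```

With a 10-byte buffer and 4-byte chunks, this stops after 8 bytes because a third chunk would not fit. In a real recording loop you would also rest between polls, and the chunk size would be whatever start_sound_input() returned.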
I am trying to create a program that can take two sound samples and compare them to determine if they are a close enough match.
You will want to do this in the frequency domain. The sample's data is stored in the time domain, so you transform it to the frequency domain with a Fourier transform. You will probably want to use an algorithm such as the Fast Fourier Transform (FFT) for this. Once you've done that for both samples, you can compare them by a number of means. You can calculate the Euclidean distance between the two spectra, or you can search for the location of peaks or something like that. If the difference between the two frequency footprints is small enough, then you can say the samples match.
Depends on what it is you need to compare. If you're interested in the samples' pitch, then auto-correlating each one and comparing the fundamental pitches may be a better method. For speech recognition, multiple band-pass filters (the analyzer part of a vocoder) may give good results. Otherwise, I'd suggest FFT (with a proper window function).
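For the pitch case, a bare-bones autocorrelation might look like the sketch below. It is only a sketch under a few assumptions: the signal is already centred around zero and stored as doubles, the caller supplies a sensible lag range, and `estimate_period` is a name invented here. It does none of the refinements a real pitch detector would need.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the lag in [min_lag, max_lag] with the largest autocorrelation,
 * i.e. an estimate of the signal's period in samples. */
size_t estimate_period(const double *x, size_t n, size_t min_lag, size_t max_lag)
{
    size_t best_lag = min_lag;
    double best = -1e300;
    assert(min_lag >= 1 && min_lag <= max_lag && max_lag < n);
    for (size_t lag = min_lag; lag <= max_lag; lag++) {
        double sum = 0.0;
        for (size_t t = 0; t + lag < n; t++)
            sum += x[t] * x[t + lag];
        sum /= (double)(n - lag);   /* normalize by the number of terms */
        if (sum > best) {
            best = sum;
            best_lag = lag;
        }
    }
    return best_lag;
}
```

The fundamental frequency is then the sample rate divided by the returned period, so a period of 50 samples at 44100 Hz would correspond to 882 Hz.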
Ok, so now I am wondering, because I have a snippet of code that displays the contents of the memory: how do the values correlate to the sounds? Anyway, look at the code and see if I am correct in my method of checking the data:
    #include <allegro.h>
    #include <stdio.h>
    #include <string.h>
    #include <iostream.h>

    void init();
    void deinit();

    int main()
    {
        int audio, working_format, bitcap,stereocap,ratecap;
        int ofs = 0;
        int len;
        int interval;
        SAMPLE *sample;
        int foo;
        int size = 1024*1024;
        unsigned char *buf, *tempbuf, *start;
        init();

        //Let's find out what we are capable of
        bitcap = get_sound_input_cap_bits();
        if (bitcap == 0)
            allegro_message("No Audio input capabilty?");
        stereocap = get_sound_input_cap_stereo();
        if (stereocap == 0)
            allegro_message("No Stereo recording");
        ratecap = get_sound_input_cap_rate(bitcap, 0);
        working_format = get_sound_input_cap_parm(ratecap, bitcap, 0);
        if (working_format == 0)
            allegro_message("We Are not a go!"); //
        else if (working_format == 1)
            allegro_message("We Are a go but can't record and playback at the same time!");
        else if (working_format == 2)
            allegro_message("We Are a go!");

        //Setting up the sample and the pointers
        sample = create_sample(bitcap, 0, ratecap, size);
        buf=(unsigned char*)(sample->data);
        start = buf;

        //Let's start recording
        allegro_message("Start Recording");
        len = start_sound_input(ratecap, 16, 0);

        //Find the interval to loop according to Spellcaster
        interval = 1000 / ((len / 2) / ratecap);
        /* use 9/10 of the max intervall */
        interval *= 9;
        interval /= 10;
        printf("length of buffer in bytes = %i, max interval = %i\n", len, interval);

        //Start the looping to transfer the data into the sample...again this is taken from spellcaster
        while (!keypressed() && ofs < size)
        {
            foo = read_sound_input(buf);
            if (foo > 0)
            {
                ofs += len;
                buf += len;
            }
            rest(interval);
        }
        if (keypressed())
        {
            foo = read_sound_input(buf);
            if (foo > 0)
            {
                ofs += len;
                buf += len;
            }
        }

        //Stop the sound recording
        stop_sound_input();

        //lets put a pointer to the first memory location and output the contents
        tempbuf = start;
        while(tempbuf < start + 100)
        {
            printf("in %p is %u\n", tempbuf,*tempbuf); //changed to %p and %u from %i
            tempbuf++;
        }

        //Finally lets play it
        play_sample(sample, 255, 128, 1000,0);

        while (!key[KEY_ESC]) {
            /* put your code here */
        }

        deinit();
        return 0;
    }
    END_OF_MAIN()

    void init()
    {
        allegro_init();
        install_keyboard();
        install_sound(DIGI_AUTODETECT,MIDI_AUTODETECT, NULL);
        install_sound_input(DIGI_AUTODETECT,MIDI_NONE);
    }

    void deinit()
    {
        clear_keybuf();
    }
Just to explain what I think I am looking at: read_sound_input(buf) takes the buffer, all 88200 bytes in this case, and puts each byte into a memory location. Therefore, each piece of sound data is represented by a numerical value from 0 to 255, since that is the range of values that can be stored in a byte. It also appears to be a linked list, so that the data is stored in a linear progression.
Now I am curious as to how to interpret this, because it looks like the interval allows the buffer to be emptied into the sample every 900 ms, which means that we are storing roughly 98,000 bytes each second. But when I look at the data that comes out of memory, it looks random and doesn't seem to correlate to the sound. For example, if I unplug the microphone I get data like the following:
    length of buffer in bytes = 88200, max interval = 900
    in 022A0020 is 0
    in 022A0021 is 128
    in 022A0022 is 0
    in 022A0023 is 128
    in 022A0024 is 0
    in 022A0025 is 128
    in 022A0026 is 0
    in 022A0027 is 128
    in 022A0028 is 0
    in 022A0029 is 128
    in 022A002A is 0
    in 022A002B is 128
    in 022A002C is 0
    in 022A002D is 128
    in 022A002E is 255
    in 022A002F is 127
    in 022A0030 is 0
    in 022A0031 is 128
    in 022A0032 is 255
    in 022A0033 is 127
    in 022A0034 is 0
    in 022A0035 is 128
    in 022A0036 is 255
    in 022A0037 is 127
    in 022A0038 is 0
    in 022A0039 is 128
    in 022A003A is 255
    in 022A003B is 127
    in 022A003C is 0
    in 022A003D is 128
    in 022A003E is 255
    in 022A003F is 127
    in 022A0040 is 0
    in 022A0041 is 128
    in 022A0042 is 255
    in 022A0043 is 127
    in 022A0044 is 0
    in 022A0045 is 128
    in 022A0046 is 255
    in 022A0047 is 127
    in 022A0048 is 0
    in 022A0049 is 128
    in 022A004A is 255
    in 022A004B is 127
    in 022A004C is 0
    in 022A004D is 128
    in 022A004E is 255
    in 022A004F is 127
    in 022A0050 is 0
    in 022A0051 is 128
    in 022A0052 is 255
    in 022A0053 is 127
    in 022A0054 is 0
    in 022A0055 is 128
    in 022A0056 is 255
    in 022A0057 is 127
    in 022A0058 is 0
    in 022A0059 is 128
    in 022A005A is 255
    in 022A005B is 127
    in 022A005C is 0
    in 022A005D is 128
    in 022A005E is 14
    in 022A005F is 128
    in 022A0060 is 24
    in 022A0061 is 128
    in 022A0062 is 30
    in 022A0063 is 128
    in 022A0064 is 18
    in 022A0065 is 128
    in 022A0066 is 10
    in 022A0067 is 128
    in 022A0068 is 35
    in 022A0069 is 128
    in 022A006A is 31
    in 022A006B is 128
//edit: I noticed that I get the same values when I run it multiple times. I wonder if this means anything?
This makes me wonder if each byte of the sample memory is actually a member of a group of bytes that form a piece of sound data instead of just one byte forming a piece of sound data. Additionally, I know my sound card is probably junk, but I would think that the values that were to be stored would be closer to 0 since the microphone was unplugged.
how do the values correlate to the sounds
Depends on whether you record in 8 or 16 bits and stereo or mono. Your code snippet is wrong. You should read the manual to find the meaning of the values returned by the get_sound_input_cap_xxx() functions. When you call create_sample(), pass it the parameters you want, provided you know your hardware supports them (by looking at the previously mentioned values). Then pass the same parameters to start_sound_input().
For example, if you want to record 4 seconds of sound in 8-bit mono at 22 kHz, you use the get_sound_input_cap_xxx() functions to see if that is supported on your hardware, and if it is, pass 8, 0, 22050 and 4*22050 to create_sample(). Then pass 22050, 8 and 0 to start_sound_input(). When you read sound input, you will then get 22050 8-bit values (bytes) for each second of recorded sound. I'm not sure whether Allegro uses signed or unsigned format. You should check the documentation or the source; it has to be documented somewhere. If you record in 16-bit, you will get 44100 bytes for each second, that is, 22050 16-bit values. And if you record in stereo, you will get twice as many values. Again I'm not sure, but I think in stereo Allegro interleaves the left and right channels. That is, one value for left, one for right, one for left, one for right and so on.
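The arithmetic above can be written down as a small helper. This is just a sketch; `recorded_bytes` is a made-up name, not part of Allegro, and it assumes the only sample sizes are 8 and 16 bits.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes produced by `seconds` of audio at `rate` samples per second.
 * bits is 8 or 16; stereo is 0 (mono) or nonzero (two interleaved channels). */
size_t recorded_bytes(int rate, int bits, int stereo, int seconds)
{
    assert(bits == 8 || bits == 16);
    assert(rate > 0 && seconds >= 0);
    size_t bytes_per_value = (size_t)bits / 8;
    size_t channels = stereo ? 2 : 1;
    return (size_t)rate * bytes_per_value * channels * (size_t)seconds;
}
```

For instance, recorded_bytes(22050, 8, 0, 4) gives 88200 bytes for the 4-second example, and one second of 16-bit mono at 44100 Hz also comes to 88200 bytes, which would match the buffer length printed earlier if the card was recording at 44100 Hz.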
You should read the manual to find the meaning of the values returned by the get_sound_input_cap_xxx() functions.
Well I went through that part of the code and cleaned it up so that it should be correct, but I know that the setup shouldn't have been a problem because it was playing the samples back just fine. However, I discovered that I cannot record at 8 bits. When I tried get_sound_input_cap_rate(8, stereocap); it was returning a zero, so I am assuming my sound card on my laptop is crap.
Regardless, this is the updated part of the code that checks my hardware, but I am absolutely sure that it is correct. Then on top of this I am using the ratecap, bitcap, and stereocap as parameters in create_sample
    //Let's find out what we are capable of
    bitcap = get_sound_input_cap_bits();
    if (bitcap == 0)
        allegro_message("No Audio input capabilty?");
    stereocap = get_sound_input_cap_stereo();
    if (stereocap == 0)
        allegro_message("No Stereo recording");
    ratecap = get_sound_input_cap_rate(bitcap, stereocap);
    working_format = get_sound_input_cap_parm(ratecap, bitcap, stereocap);
    if (working_format == 0)
        allegro_message("We Are not a go!"); //
    else if (working_format == 1)
        allegro_message("We Are a go but can't record and playback at the same time!");
    else if (working_format == 2)
        allegro_message("We Are a go!");

    printf("We are capable of recording at %i bits, with %i stereo, at a rate of %i \n" , bitcap, stereocap, ratecap);
So I don't see how this can be wrong, however, I think I am going to put zero back in to simplify the data.
Then on top of this I am using the ratecap, bitcap, and stereocap as parameters in create_sample
According to the manual, if your hardware supports both 8-bit and 16-bit, bitcap will be 24 (that is, 8 + 16). Don't use the value of bitcap directly; use either 8 or 16, depending on what you're interested in.
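One way to turn that capability value into a usable bit depth is sketched below. `choose_bits` is a hypothetical helper, and it assumes the value has the form described above: 0 for no input, 8, 16, or 24 for both.

```c
#include <assert.h>

/* Picks a sample size from a capability value of the form 0, 8, 16 or 24.
 * Returns the preferred size when supported, otherwise the other
 * supported size, or 0 when nothing is supported. */
int choose_bits(int bitcap, int preferred)
{
    assert(preferred == 8 || preferred == 16);
    if (bitcap & preferred)
        return preferred;
    if (bitcap & 16)
        return 16;
    if (bitcap & 8)
        return 8;
    return 0;
}
```

The result (not bitcap itself) is what you would then hand to get_sound_input_cap_rate(), create_sample() and start_sound_input().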
GOT IT!!!...sorta
I should have realized this. A char is only a byte long, so it doesn't fit the data correctly for 16-bit sound, which I am using. Therefore, I used unsigned short as the type of the variable that stores and outputs the sound data, but I was getting around 32768 as the output. Now 2^15 equals 32768, so this means that the 16th bit is the sign bit.
Nice theory right? When I tried to use signed short I got numbers around -32768, so it sounds like it is including the 16th bit in the value of the number. However, I have to go to work, so I'll post my results later, but I think I am on the right path to understanding how the data is stored. Thanks miran for pounding the simple concepts into my head.
Now I just need to find a way to compare the data. Oh, and the best part is that when I increment the pointer it automatically skips the next byte.
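For what it's worth, the numbers in the earlier dump fit one possible reading: 16-bit unsigned samples stored low byte first, with silence sitting at 32768 rather than 0. That is an assumption based on the values posted in this thread, not something confirmed against the Allegro docs, and the function names below are invented for illustration. Under that assumption, decoding would look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combines two bytes (low byte first) into one unsigned 16-bit sample. */
uint16_t sample_from_bytes(unsigned char lo, unsigned char hi)
{
    return (uint16_t)(lo | (hi << 8));
}

/* Re-centres an unsigned sample so that silence (32768) becomes 0. */
int to_signed(uint16_t u)
{
    return (int)u - 32768;
}

/* Decodes n_bytes of raw data into signed samples; returns the sample count. */
size_t decode_samples(const unsigned char *raw, size_t n_bytes, int *out)
{
    size_t count = n_bytes / 2;
    for (size_t i = 0; i < count; i++)
        out[i] = to_signed(sample_from_bytes(raw[2 * i], raw[2 * i + 1]));
    return count;
}
```

With this reading, the byte pairs (0, 128), (255, 127), (14, 128) and (24, 128) from the dump decode to 0, -1, 14 and 24, which is about what you would expect from an unplugged microphone with a little noise. It would also explain why reading the same bits as a signed short shows values near -32768.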