Sound recording basics
clarkd

Hey all!

After reading through documentation and the source for Allegro I have come to the conclusion that I am still a newbie, and I need some help figuring out how I can use Allegro's sound capabilities.

I am trying to create a program that can take two sound samples and compare them to determine if they are a close enough match. For the most part I have discovered that SAMPLE plays a crucial role, but I am not completely sure how it works. I see that the sound data is stored in memory and accessed through pointers, but I was looking at the example posted by spellcaster and noticed that the pointer is basically incremented by the length of the buffer. If that is the case, what happens if part of the buffer is not filled and the pointer is still incremented by the full size of the buffer? Won't it leave some empty space with silence? Furthermore, I see that the pointer is assigned to an unsigned char pointer. Does this mean that I can access any byte of the data by just checking the content of that address?

I am thoroughly confused as to how to work with the buffer and/or memory so that I can take certain parts of it and compare them to another set of data. Can someone enlighten me as to how the raw data is stored in Allegro?

Thanks,

Dan

miran

To record sound data you do this:

1. Create a SAMPLE that will be large enough to hold the amount of sound data you want to record. Do this with create_sample().

2. Call start_sound_input() to find out how much data is recorded at a time. You won't get the entire sample filled at once, but instead buffer by buffer, so you have to piece it together yourself.

3. Make a pointer that points to the sample's data member.

4. In a loop repeatedly call read_sound_input() giving it the pointer from #3 as input argument. Each time the call to read_sound_input() was successful, increment the pointer by the amount start_sound_input() returned. Continue this until all the sample data has been filled. Make sure you don't go past the sample's data buffer size, otherwise your program will crash.
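
The steps above can be sketched without Allegro itself. This is only an illustration under stated assumptions: read_chunk_fn is a hypothetical stand-in for read_sound_input(), demo_read is a fake reader invented for the example, and chunk_len plays the role of the value returned by start_sound_input(). The point it shows is the bounds check: the offset only advances after a successful read, and the loop stops before a chunk would run past the end of the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for read_sound_input(): returns nonzero and writes
   one chunk to dst when data is ready, zero otherwise. */
typedef int (*read_chunk_fn)(void *dst);

/* Fake reader for the example: delivers a 4-byte chunk on every second poll. */
static int demo_read(void *dst)
{
    static int polls = 0;
    if (++polls % 2 == 0) {
        memset(dst, 0x80, 4);
        return 1;
    }
    return 0;
}

/* Fills dst chunk by chunk and returns the number of bytes written.
   Stops before a chunk would run past capacity, or after max_polls polls. */
static size_t fill_buffer(unsigned char *dst, size_t capacity,
                          size_t chunk_len, read_chunk_fn read_chunk,
                          int max_polls)
{
    size_t ofs = 0;
    int i;
    for (i = 0; i < max_polls && chunk_len > 0 && ofs + chunk_len <= capacity; i++) {
        if (read_chunk(dst + ofs))
            ofs += chunk_len;   /* advance only after a successful read */
    }
    return ofs;
}
```

In a real program the loop would presumably also rest() between polls, as in spellcaster's example, rather than polling as fast as possible.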

Quote:

I am trying to create a program that can take two sound samples and compare them to determine if they are a close enough match.

You will want to do this in the frequency domain. The sample's data is stored in the time domain. You transform it to the frequency domain with a Fourier transform, and you will probably want to use an algorithm such as the Fast Fourier Transform (FFT) for this. Once you've done that for both samples, you can compare them by a number of means. You can calculate the Euclidean distance, or you can search for the location of peaks or something like that. If the difference between the two frequency footprints is small enough, then you can say the samples match.
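
As a rough illustration of the final comparison step only, here is a sketch of a Euclidean distance test between two magnitude spectra. It assumes the spectra have already been computed (the FFT itself is not shown) and have the same number of bins. The function names and the max_dist threshold are made up for the example, and a useful threshold would have to be found by experiment.

```c
#include <stddef.h>

/* Sum of squared differences between two magnitude spectra of n bins. */
static double spectrum_distance_sq(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < n; i++) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* Nonzero when the Euclidean distance is at most max_dist (assumed >= 0).
   Comparing squared values avoids the need for sqrt(). */
static int spectra_match(const double *a, const double *b, size_t n,
                         double max_dist)
{
    return spectrum_distance_sq(a, b, n) <= max_dist * max_dist;
}
```

Normalising both spectra first (for example by total energy) would probably be needed in practice so that a louder recording of the same sound still matches.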

Tobias Dammers

Depends on what it is you need to compare. If you're interested in the samples' pitch, then auto-correlating each one and comparing the fundamental pitches may be a better method. For speech recognition, multiple band-pass filters (the analyzer part of a vocoder) may give good results. Otherwise, I'd suggest FFT (with a proper window function).
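
For the pitch case, a bare-bones autocorrelation could look like the sketch below. It is a simplified illustration, not a tuned pitch detector: estimate_period is a made-up name, the input is assumed to be already centred on zero, and min_lag and max_lag are search limits the caller has to choose. The fundamental frequency would then be the sample rate divided by the returned period.

```c
#include <stddef.h>

/* Returns the lag in [min_lag, max_lag] with the highest autocorrelation,
   which serves as an estimate of the signal's period in samples.
   x must hold n samples centred on zero. */
static size_t estimate_period(const double *x, size_t n,
                              size_t min_lag, size_t max_lag)
{
    size_t best_lag = min_lag, lag, i;
    double best = 0.0;
    int have_best = 0;

    for (lag = min_lag; lag <= max_lag && lag < n; lag++) {
        double sum = 0.0;
        for (i = 0; i + lag < n; i++)
            sum += x[i] * x[i + lag];   /* correlate signal with shifted copy */
        if (!have_best || sum > best) {
            best = sum;
            best_lag = lag;
            have_best = 1;
        }
    }
    return best_lag;
}
```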

clarkd

Ok, so now I am wondering, because I have a snippet of code that displays the contents of the memory: how do the values correlate to the sounds? Anyway, look at the code and see if I am correct in my method of checking the data.

#include <allegro.h>
#include <stdio.h>
#include <string.h>

void init();
void deinit();

int main()
{
    int audio, working_format, bitcap, stereocap, ratecap;
    int ofs = 0;
    int len;
    int interval;
    SAMPLE *sample;
    int foo;
    int size = 1024*1024;
    unsigned char *buf, *tempbuf, *start;
    init();

    //Let's find out what we are capable of
    bitcap = get_sound_input_cap_bits();
    if (bitcap == 0)
        allegro_message("No Audio input capabilty?");
    stereocap = get_sound_input_cap_stereo();
    if (stereocap == 0)
        allegro_message("No Stereo recording");
    ratecap = get_sound_input_cap_rate(bitcap, 0);
    working_format = get_sound_input_cap_parm(ratecap, bitcap, 0);
    if (working_format == 0)
        allegro_message("We Are not a go!");
    else if (working_format == 1)
        allegro_message("We Are a go but can't record and playback at the same time!");
    else if (working_format == 2)
        allegro_message("We Are a go!");

    //Setting up the sample and the pointers
    sample = create_sample(bitcap, 0, ratecap, size);
    buf = (unsigned char*)(sample->data);
    start = buf;

    //Let's start recording
    allegro_message("Start Recording");
    len = start_sound_input(ratecap, 16, 0);

    //Find the interval to loop according to Spellcaster
    //(multiply first so the integer division cannot truncate to zero)
    interval = (1000 * (len / 2)) / ratecap;
    /* use 9/10 of the max interval */
    interval *= 9;
    interval /= 10;
    printf("length of buffer in bytes = %i, max interval = %i\n", len, interval);

    //Start the looping to transfer the data into the sample...again this is taken from spellcaster
    while (!keypressed() && ofs < size)
    {
        foo = read_sound_input(buf);
        if (foo > 0)
        {
            ofs += len;
            buf += len;
        }
        rest(interval);
    }
    if (keypressed())
    {
        foo = read_sound_input(buf);
        if (foo > 0)
        {
            ofs += len;
            buf += len;
        }
    }

    //Stop the sound recording
    stop_sound_input();

    //lets put a pointer to the first memory location and output the contents
    tempbuf = start;
    while (tempbuf < start + 100)
    {
        printf("in %p is %u\n", tempbuf, *tempbuf); //changed to %p and %u from %i
        tempbuf++;
    }

    //Finally lets play it
    play_sample(sample, 255, 128, 1000, 0);

    while (!key[KEY_ESC]) {
        /* put your code here */
    }

    deinit();
    return 0;
}
END_OF_MAIN()

void init()
{
    allegro_init();
    install_keyboard();
    install_sound(DIGI_AUTODETECT, MIDI_AUTODETECT, NULL);
    install_sound_input(DIGI_AUTODETECT, MIDI_NONE);
}

void deinit()
{
    clear_keybuf();
}

Just to explain what I think I am looking at: read_sound_input(buf) takes the buffer, all 88200 bytes in this case, and puts each byte into a memory location. Therefore, each piece of sound data is represented by a numerical value from 0 to 255, since that is the range of values that can be stored in a byte. Also, it appears to be a linked list, so the data is stored in a linear progression.

Now I am curious as to how to interpret this, because it looks like the interval allows the buffer to be emptied into the sample every 900 ms. That means we are storing roughly 98,000 bytes each second, but if I look at the data that streams out of the memory, it looks random and doesn't seem to correlate to the sound. For example, if I take out the microphone I get data like the following:

length of buffer in bytes = 88200, max interval = 900
in 022A0020 is 0
in 022A0021 is 128
in 022A0022 is 0
in 022A0023 is 128
in 022A0024 is 0
in 022A0025 is 128
in 022A0026 is 0
in 022A0027 is 128
in 022A0028 is 0
in 022A0029 is 128
in 022A002A is 0
in 022A002B is 128
in 022A002C is 0
in 022A002D is 128
in 022A002E is 255
in 022A002F is 127
in 022A0030 is 0
in 022A0031 is 128
in 022A0032 is 255
in 022A0033 is 127
in 022A0034 is 0
in 022A0035 is 128
in 022A0036 is 255
in 022A0037 is 127
in 022A0038 is 0
in 022A0039 is 128
in 022A003A is 255
in 022A003B is 127
in 022A003C is 0
in 022A003D is 128
in 022A003E is 255
in 022A003F is 127
in 022A0040 is 0
in 022A0041 is 128
in 022A0042 is 255
in 022A0043 is 127
in 022A0044 is 0
in 022A0045 is 128
in 022A0046 is 255
in 022A0047 is 127
in 022A0048 is 0
in 022A0049 is 128
in 022A004A is 255
in 022A004B is 127
in 022A004C is 0
in 022A004D is 128
in 022A004E is 255
in 022A004F is 127
in 022A0050 is 0
in 022A0051 is 128
in 022A0052 is 255
in 022A0053 is 127
in 022A0054 is 0
in 022A0055 is 128
in 022A0056 is 255
in 022A0057 is 127
in 022A0058 is 0
in 022A0059 is 128
in 022A005A is 255
in 022A005B is 127
in 022A005C is 0
in 022A005D is 128
in 022A005E is 14
in 022A005F is 128
in 022A0060 is 24
in 022A0061 is 128
in 022A0062 is 30
in 022A0063 is 128
in 022A0064 is 18
in 022A0065 is 128
in 022A0066 is 10
in 022A0067 is 128
in 022A0068 is 35
in 022A0069 is 128
in 022A006A is 31
in 022A006B is 128

//edit: I noticed that I get the same values when I run it multiple times. I wonder if this means anything?

This makes me wonder if each byte of the sample memory is actually a member of a group of bytes that form a piece of sound data instead of just one byte forming a piece of sound data. Additionally, I know my sound card is probably junk, but I would think that the values that were to be stored would be closer to 0 since the microphone was unplugged.
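
One way to test that suspicion is to read the dump above two bytes at a time. The sketch below assumes the recording is unsigned 16-bit with the low byte first (little-endian). That is an assumption which happens to fit the dump and should be checked against the Allegro documentation. The helper names are invented for the example.

```c
#include <stddef.h>

/* Combine two bytes, low byte first, into one unsigned 16-bit value. */
static unsigned int decode_u16le(unsigned char lo, unsigned char hi)
{
    return (unsigned int)lo | ((unsigned int)hi << 8);
}

/* Shift an unsigned 16-bit sample so that the midpoint (32768) becomes 0. */
static int centre_sample(unsigned int u)
{
    return (int)u - 32768;
}

/* Decode a byte buffer into centred samples; returns the number of samples
   written. A trailing odd byte is ignored. */
static size_t decode_buffer(const unsigned char *bytes, size_t nbytes,
                            int *out, size_t max_out)
{
    size_t i, n = 0;
    for (i = 0; i + 1 < nbytes && n < max_out; i += 2)
        out[n++] = centre_sample(decode_u16le(bytes[i], bytes[i + 1]));
    return n;
}
```

Under that assumption the pair 0, 128 decodes to 32768 (0 after centring), 255, 127 decodes to 32767 (-1 after centring), and 14, 128 decodes to 32782 (14 after centring). Read that way, the values with the microphone unplugged do sit close to zero.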

miran
Quote:

how do the values correlate to the sounds

Depends on whether you record in 8 or 16 bits and stereo or mono. Your code snippet is wrong. You should read the manual to find the meaning of the values returned by the get_sound_input_cap_xxx() functions. When you call create_sample(), you should pass it the parameters you want, provided you know your hardware supports them (by looking at the previously mentioned values). Then pass the same parameters to start_sound_input().

For example, if you want to record 4 seconds of sound in 8-bit mono at 22 kHz, you use the get_sound_input_cap_xxx() functions to see if that is supported on your hardware and, if it is, pass 8, 0, 22050 and 4*22050 to the create_sample() function. Then pass 22050, 8 and 0 to start_sound_input(). Then when you read sound input, you will get 22050 8-bit values (bytes) for each second of recorded sound. I'm not sure whether Allegro uses signed or unsigned format; you should check the documentation or source, it has to be documented somewhere. If you record in 16-bit, then you will get 44100 bytes for each second, that is 22050 16-bit values. And if you record in stereo, you will get twice as many values. Again I'm not sure, but I think in stereo Allegro uses a scheme that interleaves the left and right channels. That is one value for left, one for right, one for left, one for right and so on.
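
The arithmetic in that example can be written down as two small helpers. These are illustrative names, not Allegro functions, and the interleaved layout (left value first) is the assumption stated above rather than something confirmed here.

```c
#include <stddef.h>

/* Bytes produced per second of recording for a given format. */
static long bytes_per_second(long rate, int bits, int stereo)
{
    return rate * (bits / 8) * (stereo ? 2 : 1);
}

/* Index of one value in an interleaved buffer of sample values
   (channel 0 = left, 1 = right). For mono the frame is the index. */
static size_t interleaved_index(size_t frame, int channel, int stereo)
{
    return stereo ? frame * 2 + (size_t)channel : frame;
}
```

For instance, bytes_per_second(22050, 8, 0) gives 22050 and bytes_per_second(22050, 16, 0) gives 44100, matching the figures in the post.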

clarkd
Quote:

You should read the manual to find the meaning of the values returned by the get_sound_input_cap_xxx() functions.

Well I went through that part of the code and cleaned it up so that it should be correct, but I know that the setup shouldn't have been a problem because it was playing the samples back just fine. However, I discovered that I cannot record at 8 bits. When I tried get_sound_input_cap_rate(8, stereocap); it was returning a zero, so I am assuming my sound card on my laptop is crap.

Regardless, this is the updated part of the code that checks my hardware, but I am absolutely sure that it is correct. Then on top of this I am using the ratecap, bitcap, and stereocap as parameters in create_sample

//Let's find out what we are capable of
bitcap = get_sound_input_cap_bits();
if (bitcap == 0)
    allegro_message("No Audio input capabilty?");
stereocap = get_sound_input_cap_stereo();
if (stereocap == 0)
    allegro_message("No Stereo recording");
ratecap = get_sound_input_cap_rate(bitcap, stereocap);
working_format = get_sound_input_cap_parm(ratecap, bitcap, stereocap);
if (working_format == 0)
    allegro_message("We Are not a go!");
else if (working_format == 1)
    allegro_message("We Are a go but can't record and playback at the same time!");
else if (working_format == 2)
    allegro_message("We Are a go!");

printf("We are capable of recording at %i bits, with %i stereo, at a rate of %i \n", bitcap, stereocap, ratecap);

So I don't see how this can be wrong. However, I think I am going to put zero back in to simplify the data.

miran
Quote:

Then on top of this I am using the ratecap, bitcap, and stereocap as parameters in create_sample

According to the manual, if your hardware supports both 8-bit and 16-bit, bitcap will be 24 (that is, 8 | 16). Don't use the value of bitcap; use either 8 or 16, depending on what you're interested in.
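
A small sketch of how the bitfield might be turned into a single usable width. choose_bits is a made-up helper, and preferring 16-bit over 8-bit is just one possible policy. The result would then be passed to both create_sample() and start_sound_input().

```c
/* Pick a single sample width from the capability bitfield:
   prefer 16-bit, fall back to 8-bit, return 0 if input is unsupported. */
static int choose_bits(int bitcap)
{
    if (bitcap & 16)
        return 16;
    if (bitcap & 8)
        return 8;
    return 0;
}
```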

clarkd

GOT IT!!! ...sorta

I should have realized this. A char is only a byte long, so it doesn't fit the data correctly for 16-bit sound, which I am using. Therefore, I used unsigned short as the variable type for storing and outputting the sound data, but I was getting around 32768 as the output. Now 2^15 equals 32768, so this means that the 16th bit is the sign bit.

Nice theory right? When I tried to use signed short I got numbers around -32768, so it sounds like it is including the 16th bit in the value of the number. However, I have to go to work, so I'll post my results later, but I think I am on the right path to understanding how the data is stored. Thanks miran for pounding the simple concepts into my head.

Now I just need to find a way to compare the data. Oh, and the best part is that when I increment the pointer it automatically skips the next byte.

Thread #588370. Printed from Allegro.cc