Creating a waveform graph

raynebc

Member #11,908

May 2010

Hi there. I'm working with a program that uses the ALOGG library. What I'm looking to accomplish is to build a waveform graph of the loaded OGG file so that it can be displayed to the user.

Being a beginner with Allegro and ALOGG, I'm not sure where to start, and was hoping somebody would impart their wisdom. What is usually the most suitable type of graph, a time/amplitude graph? I'm going for something along the lines of what you'd see in software such as Audacity. I realize that to build this graph, I'd have to process the entire file, but how can I use Allegro's or ALOGG's functionality to do that? Would the following be the general concept of what I'd need to perform?

counter = number of audio samples in the input file
width = number of pixels used for the graph's X axis
blocksize = counter/width (the number of samples represented per pixel)

for each block
    record the root mean square of the samples in the block
    record the peak (max) value of the samples in the block
    render the block's peak value on the Y axis in a dark color
    render the block's RMS value on the Y axis in a lighter color, on top of the peak
end loop

If that's basically the right method, I just need to know how to obtain the amplitude (or otherwise the most appropriate) value for each sample, and the rest should fall into place. Also, what's the best way to avoid processing the RMS and peaks for the audio file more than once? For example, in Audacity, once it draws the waveform, zooming in and out is quick and the waveform renders seemingly immediately without having to reprocess everything. Thanks in advance for whatever help that can be provided.

Tobias Dammers

Member #2,604

August 2002

First of all, you need an uncompressed representation of your signal one way or another. Floats are the easiest to work with, and with today's CPUs, there is no reason to use 32-bit integers over 32-bit floats. So, first of all, unpack the Vorbis data (I'm assuming it is in fact Ogg/Vorbis, but Ogg/FLAC would work similarly) into an array of floats. If there is a lot of data, it's probably best to do this in chunks.
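A minimal sketch of the conversion step, assuming the decoder hands back signed 16-bit PCM (that is an assumption, as some decoders can produce floats directly). The function name pcm16_to_float is made up for illustration and is not part of Allegro or ALOGG:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert n signed 16-bit PCM samples to floats
   in the range [-1.0, 1.0). Dividing by 32768 maps INT16_MIN to
   exactly -1.0 and INT16_MAX to just under 1.0. */
void pcm16_to_float(const int16_t *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i)
        out[i] = (float)in[i] / 32768.0f;
}
```

To work in chunks, call it once per decoded buffer and append the result to the float array (or process each chunk straight away and discard it).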
Then find the samples that fall into the region represented by one pixel in the horizontal direction; which ones these are depends on the scroll position, zoom, and sample rate. Find the largest and smallest amplitude in this timeframe, and draw a vertical line accordingly. To make sure the graph doesn't show any gaps, it's best to include the first sample from the next region in the calculation. Oh, and just in case you didn't know already: The value of each uncompressed sample point is the amplitude (well, technically, it's a numerical representation).
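The per-pixel search could look roughly like this. It is only a sketch: the name column_min_max and its parameters are made up for illustration, and it assumes mono float samples already sitting in memory. start and block would come from the scroll position and zoom level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: find the smallest and largest sample in
   [start, start + block], i.e. the block for one pixel column plus the
   first sample of the next block, clamped to the end of the array. */
void column_min_max(const float *samples, size_t total,
                    size_t start, size_t block,
                    float *min_out, float *max_out)
{
    assert(samples != NULL && start < total && block > 0);
    size_t end = start + block + 1;   /* one past the extra sample */
    if (end > total)
        end = total;
    float lo = samples[start], hi = samples[start];
    for (size_t i = start + 1; i < end; ++i) {
        if (samples[i] < lo) lo = samples[i];
        if (samples[i] > hi) hi = samples[i];
    }
    *min_out = lo;
    *max_out = hi;
}
```

For each column, draw the vertical line from the minimum to the maximum, scaled to the graph height. Because neighbouring columns share one sample, their lines always touch or overlap.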

RMS means "root mean square", and that's exactly how you calculate it: Take the square root of the average (a.k.a. mean) of all samples, squared individually, that is:

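#include <math.h>

/* Root mean square of num_samples values; returns 0.0 for an empty block. */
double root_mean_square(const double *sample, int num_samples) {
  if (num_samples <= 0)
    return 0.0;  /* avoid dividing by zero */
  double accum = 0.0;
  for (int i = 0; i < num_samples; ++i)
    accum += sample[i] * sample[i];
  return sqrt(accum / (double)num_samples);
}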

Also, this may prove valuable reading. You don't have to understand it all to benefit from it.

---
Me make music: Triofobie
---
"We need Tobias and his awesome trombone, too." - Johan Halmén

raynebc

Member #11,908

May 2010

Thanks a bunch for your help.

I see that the ALOGG library has a function to return decoded OGG data, but only while the audio is being played. What's the easiest way to decode the data without having to play it? Regarding the caching of this processed data, do I just need to recalculate and render the waveform graph each time there is a zoom or seek operation, or is there a best practice such as keeping all the uncompressed samples in memory?

Tobias Dammers

Member #2,604

August 2002

raynebc said:

What's the easiest way to decode the data without having to play it?

Both the Ogg container format and the Vorbis codec are open formats, and sample implementations can be found on the web. There are also libraries that you can use (libogg, libvorbis, and the higher-level libvorbisfile) which will do exactly this for you. Ogg Vorbis is maintained by the Xiph.Org Foundation, which is a good starting point for information.

Quote:

Regarding the caching of this processed data, do I just need to recalculate and render the waveform graph each time there is a zoom or seek operation, or is there a best practice such as keeping all the uncompressed samples in memory?

No general rule. Depends on your needs, the size of the waveform, and the system you're going to run this on. As a general guideline, keep in mind that one second of uncompressed stereo PCM at 44.1 kHz and 16 bits takes up 176,400 bytes (about 172 KiB).
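The arithmetic behind that figure, as a small sketch (the function name pcm_bytes_per_second is made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes needed for one second of uncompressed PCM.
   Assumes the sample width is a whole number of bytes. */
size_t pcm_bytes_per_second(unsigned sample_rate, unsigned channels,
                            unsigned bits_per_sample)
{
    assert(bits_per_sample % 8 == 0);
    return (size_t)sample_rate * channels * (bits_per_sample / 8);
}
```

With 44100 Hz, 2 channels and 16 bits this gives 176,400 bytes per second, so a five-minute track needs 52,920,000 bytes (roughly 50 MiB). Storing the same audio as 32-bit floats doubles that.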
