NAME
PDL::Audio - Some PDL functions intended for audio processing.
SYNOPSIS
use PDL;
use PDL::Audio;
DESCRIPTION
Oh well ;) Not much "introductory documentation" has been written yet :(
NOTATION
Brackets around parameters indicate that the respective parameter is
optional and will be replaced with some default value when absent (or
"undef", which might be different in other packages).
The sampling frequency and duration are by default (see individual
descriptions) given in cycles/sample (or samples in case of a duration).
That means if you want to specify a duration of two seconds, you have to
multiply by the sampling frequency in HZ, and if you want to specify a
frequency of 440 Hz, you have to divide by the sampling frequency:
# Syntax: gen_oscil duration*, frequency/
$signal = gen_oscil 2*HZ, 440/HZ;
# with a sampling frequency of 44100 Hertz:
$signal = gen_oscil 2*44100, 440/44100;
To help you, the required unit is given as a type suffix in the
parameter name. A "/" means that you have to divide by the sampling
frequency (to convert from Hertz) and a suffix of "*" indicates that a
multiplication is required.
Most parameters named "size", "duration" (or marked with "*") can be
replaced by a piddle, which is then used to give length and from
(mono/stereo).
HEADER ATTRIBUTES
The following header attributes are stored and evaluated by most
functions. PDL::Audio provides mutator methods for all them (e.g.
print "samplerate is ", $pdl->rate;
$pdl->comment("set the comment to this string");
rate
The sampling rate in hz.
filetype
The filetype (wav, au etc..). Must be one of:
FILE_NEXT FILE_AIFC FILE_RIFF FILE_BICSF FILE_NIST FILE_INRS FILE_ESPS
FILE_SVX FILE_VOC FILE_SNDT FILE_RAW FILE_SMP FILE_SD2 FILE_AVR
FILE_IRCAM FILE_SD1 FILE_SPPACK FILE_MUS10 FILE_HCOM FILE_PSION
FILE_MAUD FILE_IEEE FILE_DESKMATE FILE_DESKMATE_2500 FILE_MATLAB
FILE_ADC FILE_SOUNDEDIT FILE_SOUNDEDIT_16 FILE_DVSM FILE_MIDI
FILE_ESIGNAL FILE_SOUNDFONT FILE_GRAVIS FILE_COMDISCO FILE_GOLDWAVE
FILE_SRFS FILE_MIDI_SAMPLE_DUMP FILE_DIAMONDWARE FILE_REALAUDIO
FILE_ADF FILE_SBSTUDIOII FILE_DELUSION FILE_FARANDOLE FILE_SAMPLE_DUMP
FILE_ULTRATRACKER FILE_YAMAHA_SY85 FILE_YAMAHA_TX16 FILE_DIGIPLAYER
FILE_COVOX FILE_SPL FILE_AVI FILE_OMF FILE_QUICKTIME FILE_ASF
FILE_YAMAHA_SY99 FILE_KURZWEIL_2000 FILE_AIFF FILE_AU
path
The filename (or file specification) used to load or save a file.
format
Specifies the type the underlying file format uses. The samples will
always be in short or long signed format.
Must be one of
FORMAT_NO_SND FORMAT_16_LINEAR FORMAT_8_MULAW FORMAT_8_LINEAR
FORMAT_32_FLOAT FORMAT_32_LINEAR FORMAT_8_ALAW FORMAT_8_UNSIGNED
FORMAT_24_LINEAR FORMAT_64_DOUBLE FORMAT_16_LINEAR_LITTLE_ENDIAN
FORMAT_32_LINEAR_LITTLE_ENDIAN FORMAT_32_FLOAT_LITTLE_ENDIAN
FORMAT_64_DOUBLE_LITTLE_ENDIAN FORMAT_16_UNSIGNED
FORMAT_16_UNSIGNED_LITTLE_ENDIAN FORMAT_24_LINEAR_LITTLE_ENDIAN
FORMAT_32_VAX_FLOAT FORMAT_12_LINEAR FORMAT_12_LINEAR_LITTLE_ENDIAN
FORMAT_12_UNSIGNED FORMAT_12_UNSIGNED_LITTLE_ENDIAN COMPATIBLE_FORMAT
PDL::Audio conviniently defines the following aliases for the
following constants, that are already correct for the host
byteorder:
FORMAT_ULAW_BYTE FORMAT_ALAW_BYTE FORMAT_LINEAR_BYTE
FORMAT_LINEAR_SHORT FORMAT_LINEAR_USHORT FORMAT_LINEAR_LONG
FORMAT_LINEAR_FLOAT FORMAT_LINEAR_DOUBLE
comment
The file comment (if any).
device
The device to output audio. One of:
DEV_DEFAULT DEV_READ_WRITE DEV_ADAT_IN DEV_AES_IN DEV_LINE_OUT
DEV_LINE_IN DEV_MICROPHONE DEV_SPEAKERS DEV_DIGITAL_IN DEV_DIGITAL_OUT
DEV_DAC_OUT DEV_ADAT_OUT DEV_AES_OUT DEV_DAC_FILTER DEV_MIXER
DEV_LINE1 DEV_LINE2 DEV_LINE3 DEV_AUX_INPUT DEV_CD_IN DEV_AUX_OUTPUT
DEV_SPDIF_IN DEV_SPDIF_OUT
EXPORTED CONSTANTS
In addition to the exported constants described above (and later in the
function descriptions), this module also exports the mathematical
constants M_PI and M_2PI, so watch out for clashes!
FUNCTIONS
sound_format_name format_code
Return the human-readable name of the file format with code
"format_code".
sound_type_name type_code
Return the human-readable name of the sample type with code "type_code".
describe_audio piddle
Describe the audio stream contained in piddle and return it as a string.
A fresh piddle might return:
mono sound with 27411 samples
Whereas a freshly loaded soundfile might yield:
stereo sound with 27411 samples, original name "kongas.wav", type 2 (RIFF),
rate 11025/s (duration 2.49s), format 7 (8-bit unsigned)
raudio path, [option-hash], option => value, ...
Reads audio data into the piddle. Options can be anything, most useful
values are "filetype", "rate", "channels" and "format".
# read any file
$pdl = raudio "file.wav";
# read a file. if it is a raw file preset values
$pdl = raudio "file.raw", filetype => FILE_RAW, rate => 44100, channels => 2;
waudio pdl, [option-hash], option => value, ...
Writes a pdl as a file. The path is taken from the header (or the
options), e.g.:
# write a file, using the header of another piddle
$pdl->waudio($orig_file->gethdr);
# write pdl as au file, take rate from the header
$pdl->waudio(path => "piddle.au", filetype => FILE_AU, format => FORMAT_16_LINEAR;
cut_leading_silence pdl, level
Cuts the leading silence (i.e. all samples with absolute value < level)
and returns the resulting part.
cut_trailing_silence pdl, level
Cuts the trailing silence.
cut_silence pdl, level
Calls "cut_leading_silence" and "cut_trailing_silence" and returns the
result.
playaudio pdl, [option-hash], option => value ...
Play the piddle as an audio file. Options can be supplied either through
the option hash (a hash-reference), through the pdl header or the
options:
# play a piddle that has a valid header (e.g. from raudio)
$pdl->playaudio;
# play it with a different samplerate
$pdl->playaudio(rate => 22050);
ulaw2linear
Signature: (byte u(n); short [o] s(n))
conversion from (m)u-law into signed, linear, 16 bit samples (rather
slow)
linear2ulaw
Signature: (short s(n); byte [o] u(n))
conversion from signed, linear, 16 bit samples into (m)u-law (rather
slow)
alaw2linear
Signature: (byte u(n); short [o] s(n))
conversion from A-law into signed, linear, 16 bit samples (rather slow)
linear2alaw
Signature: (short s(n); byte [o] u(n))
conversion from signed, linear, 16 bit samples into A-law (rather slow)
gen_oscil duration*, freq/, phase-mod, [fm-mod/]
gen_sawtooth duration*, freq/, phase-mod, [fm-mod/]
gen_square duration*, freq/, phase-mod, duty, [fm-mod/]
gen_triangle duration*, freq/, phase-mod, [fm-mod/]
gen_pulse_train duration*, freq/, phase-mod, [fm-mod/]
gen_rand duration*, freq/
gen_rand_1f duration*
All of these functions generate appropriate waveforms with frequency
"freq" (cycles/sample) and phase "phase" (0..1).
The "duration" might be either a piddle (which gives the form of the
output) or the number of samples to generate.
The output samples are between -1 and +1 (i.e. "-1 <= s <= +1").
The "duty" parameter of the square generator influences the duty cycle
of the signal. Zero means 50%-50%, 0.5 means 75% on, 25% off, -0.8 means
10% on, 90% off etc... Of course, the "duty" parameter might also be a
vector of size "duration".
gen_env duration*, xvals, yvals, [base]
Generates an interpolated envelope between the points given by xvals and
yvals. When base == 1 (the default) then the values will be linearly
interpolated, otherwise they follow an exponential curve that is bend
inwards (base < 1) or outwards (base > 1).
# generate a linear envelope with attack in the first 10%
gen_env 5000, [0 1 2 9 10], [0 1 0.6 0.6 0];
gen_adsr duration*, sustain-level, attack-time, decay-time, sustain-time, release-time
Simple ADSR envelope generator. The "sustain-level" is the amplitude (0
to 1) of the sustain level. The other for parameters give the relative
interval times, in any unit you like, only their relative ratios are
important. Any of these times might be zero, in which case the
corresponding part is omitted from the envelope.
gen_asymmetric_fm duration*, freq/, phase, [r , [ratio]]
"gen_asymmetric_fm" provides a way around the symmetric spectra normally
produced by FM. See Palamin and Palamin, "A Method of Generating and
Controlling Asymmetrical Spectra" JAES vol 36, no 9, Sept 88, p671-685.
gen_sum_of_cosines duration*, freq/, phase, ncosines, [fm_mod/]
Generates a sum of "n" cosines "(1 + 2(cos(x) + cos(2x) + ... cos(nx)) =
sin((n+.5)x) / sin(x/2))". Other arguments are similar to to
"gen_oscil".
gen_sine_summation duration*, freq/, phase, [nsines, [a, [b_ratio, [fm_mod/]]]]
"gen_sine_summation" provides a kind of additive synthesis. See
J.A.Moorer, "Signal Processing Aspects of Computer Music" and "The
Synthesis of Complex Audio Spectra by means of Discrete Summation
Formulae" (Stan-M-5). The basic idea is very similar to that used in
gen_sum_of_cosines generator.
The default value for "nsines" is 1 (but zero is a valid value), for "a"
is 0.5 and for "b_ratio" is 1.
(btw, either my formula is broken or the output indeed does not lie
between -1 and +1, but rather -5 .. +5).
gen_from_table duration*, frequency/, table, [phase], [fm_mod/]
"gen_from_table" generates a waveform by repeating a waveform given in
"table", linearly interpolating between successive points of the
"waveform".
partials2waveshape size*, partials, amplitudes, [phase], [fm_mod/]
Take a (perl or pdl) list of (integer) "partials" and a list of
"amplitudes" and generate a single wave shape that results by adding
these partial sines.
This could (and should) be used by the "gen_from_table" generator.
gen_from_partials duration*, frequency/, partials, amplitudes, [phase], [fm_mod/]
Take a (perl list or pdl) list of (possibly noninteger) "partials" and a
list of "amplitudes" and generate the waveform resulting by summing up
all these partial sines.
filter_one_zero
Signature: (in(n); [o] out(n); double a0; double a1)
apply a one zero filter, y(n) = a0 x(n) + a1 x(n-1)
filter_one_pole
Signature: (in(n); [o] out(n); double a0; double b1)
apply a one pole filter, y(n) = a0 x(n) - b1 y(n-1)
filter_two_zero
Signature: (in(n); [o] out(n); double a0; double a1; double a2)
apply a two zero filter, y(n) = a0 x(n) + a1 x(n-1) + a2 x(n-2)
filter_two_pole
Signature: (in(n); [o] out(n); double a0; double b1; double b2)
apply a two pole filter, y(n) = a0 x(n) - b1 y(n-1) - b2 y(n-2)
filter_formant
Signature: (in(n); [o] out(n); double radius; double frequency; double gain)
apply a formant filter, y(n) = x(n) - r*x(n-2) +
2*r*cos(2*pi*frequency)*y(n-1) - r*r*y(n-2). A good value for "gain" is
1.
filter_ppolar pdl, radius/, frequency/
apply a two pole filter (given in polar form). The filter has two poles,
one at (radius,frequency), the other at (radius,-frequency). Radius is
between 0 and 1 (but less than 1), and frequency is between 0 and 0.5.
This is the standard resonator form with poles specified by the polar
coordinates of one pole.
filter_zpolar pdl, radius/, frequency/
apply a two zero filter (given in polar form). See "filter_ppolar".
partials2polynomial partials, [kind]
"partials2polynomial" takes a list of harmonic amplitudes and returns a
list of Chebychev polynomial coefficients. The argument "kind"
determines which kind of Chebychev polynomial we are interested in, 1st
kind or 2nd kind. (default is 1).
ring_modulate in1, in2
ring modulates in1 with in2 (this is just a multiply).
amplitude_modulate am_carrier, in1, in2
amplitude modulates am_carrier and in2 with in1 (this calculates in1 *
(am_carrier + in2)).
filter_sir
Signature: (x(n); a(an); b(bn); [o]y(n))
Generic (short delay) impulse response filter. "x" is the input signal
(which is supposed to be zero for negative indices). "a" contains the
input (x) coefficients (a0, a1, .. an), whereas "b" contains the output
(y) coefficients (b0, b1, ... bn), i.e.:
y(n) = a0 x(n) - b1 y(n-1) + a1 x(n-1) - b2 y(n-2) + a2 x(n-2) - b3 ...
This can be used to generate fir and iir filters of any length, or even
more complicated constructs.
"b0" (then first element of "b") is being ignored currently AND SHOULD
BE SPECIFIED AS ONE FOR FUTURE COMPATIBILITY
filter_lir
Signature: (x(n); int a_x(an); a_y(an); int b_x(bn); b_y(bn); [o]y(n))
Generic (long delay) impulse response filter. The difference to
"filter_sir" is that the filter coefficients need not be consecutive,
but instead their indices are given by the "a_x" and "b_x" (integer)
vectors, while the corresponding coefficients are in "a_y" and "b_y".
(All "a_x" must be >= 0, while all the "b_x" must be >= 1, as you should
expect).
See "filter_sir" for more info.
filter_fir input, xcoeffs
Apply a fir (finite impulse response) filter to "input". This is the
same as calling:
filter_sir input, xcoeffs, pdl()
filter_iir input, ycoeffs
Apply a iir (infinite impulse response) filter to "input". This is just
another way of saying:
filter_sir input, pdl(1), ycoeffs
That is, the first member of "ycoeffs" is being ignored AND SHOULD BE
SPECIFIED AS ONE FOR FUTURE COMPATIBILITY!
filter_comb input, delay*, scaler
Apply a comb filter to the piddle "input". This is implemented using a
delay line of length "delay" (which must be 1 or larger and can be
non-integer) and a feedback scaler.
y(n) = x(n-size-1) + scaler * y(n-size)
cf. "filter_notch" and
http://www.harmony-central.com/Effects/Articles/Reverb/comb.html
filter_notch input, delay*, scaler
Apply a comb filter to the piddle "input". This is implemented using a
delay line of length "delay" (which must be 1 or larger and can be
non-integer) and a feedforward scaler.
y(n) = x(n-size-1) * scaler + y(n-size)
As a rule of thumb, the decay time of the feedback part is
"7*delay/(1-scaler)" samples, so to get a decay of Dur seconds, "scaler
<= 1-7*delay/(Dur*Srate)". The peak gain is "1/(1-(abs scaler))". The
peaks (or valleys in notch's case) are evenly spaced at "srate/delay".
The height (or depth) thereof is determined by scaler -- the closer to
1.0, the more pronounced. See Julius Smith's "An Introduction to Digital
Filter Theory" in Strawn "Digital Audio Signal Processing", or Smith's
"Music Applications of Digital Waveguides"
filter_allpass input, delay*, scaler-feedback, scaler-feedforward
"filter_allpass" or "moving average comb" is just like "filter_comb" but
with an added feedforward term. If "scaler-feedback == 0", we get a
moving average comb filter. If both scaler terms == 0, we get a pure
delay line.
y(n) = feedforward*x(n-1) + x(n-size-1) + feedback*y(n-size)
cf. http://www.harmony-central.com/Effects/Articles/Reverb/allpass.html
design_remez_fir filter_size, bands(2,b), desired_gain(b), type, [weight(b)]
Calculates the optimal (in the Chebyshev/minimax sense) FIR filter
impulse response given a set of band edges, the desired reponse on those
bands, and the weight given to the error in those bands, using the
Parks-McClellan exchange algorithm.
The first argument sets the filter size: "design_remez_fir" returns as
many coefficients as specified by this parameter.
"bands" is a vector of band edge pairs (start - end), which specify the
start and end of the bands in the filter specification. These must be
non-overlapping and sorted in increasing order. Only values between 0 (0
Hz) and 0.5 (the Nyquist frequency) are allowed.
"des" specifies the desired gain in these bands.
"weight" can be used to give each band a different weight. If absent, a
vector of ones is used.
"type" is any of the exported constants "BANDPASS", "DIFFERENTIATOR" or
"HILBERT" and can be used to select various design types (use "BANDPASS"
until this is documented ;)
filter_src input, srate, [width], [sr-mod]
Generic sampling rate conversion, implemented by convoluting "input"
with a sinc function of size "width" (default when unspecified or zero:
5).
"srate" determines the input rate / output rate ratio, i.e. values > 1
speed up, values < 1 slow down. Values < 0 are allowed and reverse the
signal.
If "sr_mod" is omitted, the size of the output piddle is calculcated as
"length(input)/abs(srate)", e.g. it provides the full stretched or
shrinked input signal.
If "sr_mod" is specified it must be as large as the desired output, i.e.
it's size determines the output size. Each value in "sr_mod" is added to
"srate" at the given point in "time", so it can be used to "modulate"
the sampling rate change.
# create a sound effect in the style of "Forbidden Planet"
$osc = 0.3 * gen_oscil $osc, 30 / $pdl->rate;
$output = filter_src($input, 1, 0, $osc);
filter_contrast_enhance input, enhancement
Contrast-enhancement phase-modulates a sound file. It's like audio MSG.
The actual algorithm is (applied to the normalised sound)
"sin(input*pi/2 + (enhancement*sin(input*2*pi)))". The result is to
brighten the sound, helping it cut through a huge mix.
filter_granulate input, expansion, [option-hash], option => value...
"filter_granulate" "granulates" the sound file file. It is the poor
man's way to change the speed at which things happen in a recorded sound
without changing the pitches. It works by slicing the input file into
short pieces, then overlapping these slices to lengthen (or shorten) the
result; this process is sometimes known as granular synthesis, and is
similar to the "freeze" function. The duration of each slice is "length"
-- the longer, the more like reverb the effect. The portion of the
length (on a scale from 0 to 1.0) spent on each ramp (up or down) is
"ramp". This can control the smoothness of the result of the overlaps.
The more-or-less average time between successive segments is "hop". The
accuracy at which we handle this hopping is set by the float "jitter" --
if "jitter" is very small, you may get an annoying tremolo. The overall
amplitude scaler on each segment is "scaler" -- this is used to try to
to avoid overflows as we add all these zillions of segments together.
"expansion" determines the input hop in relation to the output hop; an
expansion-amount of 2.0 should more or less double the length of the
original, whereas an expansion-amount of 1.0 should return something
close to the original speed.
The defaults for the arguments/options are:
expansion 1.0
length(*) 0.15
scaler 0.6
hop(*) 0.05
ramp 0.4
jitter(*) 0.5
maxsize infinity
The parameters/options marked with (*) actually depend on the sampling
rate, and are always multiplied by the "rate" attribute of the piddle
internally. If the piddle lacks that attribute, 44100 is assumed. NOTE:
This is different to most other filters, but should be ok since
"filter_granulate" only makes sense for audiofiles.
audiomix pos1, data1, pos2, data2, ...
Generate a mix of all given piddles. The resulting piddle will contain
the sum of all data-piddles at their respective positions, so some
scaling will be necessary before or after the mixing operation (e.g.
scale2short).
# mix the sound gong1 at position 0, the sound bass5 at position 22100
# and gong2 at position 44100. The resulting piddle will be large enough
# to accomodate all the sounds:
$mix = audiomix 0, $gong1, 44100, $gong2, 22100, $gong2
filter_center piddle
Normalize the piddle so that it is centered around "y = 0" and has
maximal amplitude of 1.
scale2short piddle
This method takes a sound in any format (preferably float or double) and
scales it to fit into a signed short value, suitable for playback using
"playudio" or similar functions.
gen_fft_window size*, type, [$beta]
Creates and returns a specific fft window. The "type" is any of the
following. These are (case-insensitive) strings, so you might need to
quote them.
RECTANGULAR just ones (the identity window)
HANNING 0.50 - 0.50 * cos (0 .. 2pi)
HAMMING 0.54 - 0.46 * cos (0 .. 2pi)
WELCH 1 - (-1 .. 1) ** 2
PARZEN the triangle window
BARTLETT the symmetric triangle window
BLACKMAN2 blackman-harris window of order 2
BLACKMAN3 blackman-harris window of order 3
BLACKMAN4 blackman-harris window of order 4
EXPONENTIAL the exponential window
KAISER the kaiser/bessel window (using the parameter C)
CAUCHY the cauchy window (using the parameter )
POISSON the poisson window (exponential using parameter C)
RIEMANN the riemann window (sinc)
GAUSSIAN the gaussian window of order C)
TUKEY the tukey window (C specifies how much of the window
consists of ones).
COSCOS the cosine-squared window (a partition of unity)
SINC same as RIEMANN
HANN same as HANNING (his name was Hann, not Hanning)
LIST this "type" is special in that it returns a list of all types
cplx(2,n) = rfft real(n)
Do a (complex fft) of "real" (extended to complex so that the imaginary
part is zero), and return the complex fft result. This function tries to
use PDL::FFTW (which is faster for large vectors) when available, and
falls back to PDL::FFT, which is likely to return different phase signs
(due to different kernel functions), so beware! In fact, since "rfft"
has to shuffle the data when using PDL::FFT, the fallback is always
slower.
When using PDL::FFTW, a wisdom file ~/.pdl_wisdom is used and updated,
if possible.
real(n) = irfft cplx(2,n)
The inverse transformation (see "rfft"). "irfft rfft $pdl == $pdl"
always holds.
spectrum data, [norm], [window], [beta]
Returns the spectrum of a given pdl. If "norm" is absent (or "undef"),
it returns the magnitude of the fft of "data". When "norm" == 1 (or "eq
'NORM'", case-insensitive), it returns the magnitude, normalized to be
between zero and one. If "norm" == 0 (or "eq 'dB'", case-insensitive),
then it returns the magnitude in dB.
"data" is multiplied with "window" (if not "undef") before calculating
the fft, and usually contains a window created with "gen_fft_window"
(using "beta"). If "window" is a string, it is handed over to
"gen_fft_window" (together with the beta parameter) to create a window
of suitable size.
This function could be slightly faster.
concat pdl, pdl...
This is not really an audio-related function. It simply takes all
piddles and concats them into a larger one. At the moment it only
supports single-dimensional piddles and is implemented quite slowly
using perl and data-copying, but that might change...
filter_convolve
Signature: (input(n); kernel(m); int fftsize(); [o]output(n))
info not available
rshift
Signature: (x(n); int shift(); c(); [oca]y(n))
Shift vector elements without wrap and fill the free space with a
constant. Flows data back & forth, for values that overlap.
Positive values shift right, negative values shift left.
polynomial
Signature: (coeffs(n); x(m); [o]out(m))
evaluate the polynomial with coefficients "coeffs" at the position(s)
"x". "coeffs[0]" is the constant term.
linear_interpolate
Signature: (x(); fx(n); fy(n); [o]y())
Look up the ordinate "x" in the function given by "fx" and "fy" and
return a linearly interpolated value (somewhat optimized for many
lookups).
"fx" specifies the ordinates (x-coordinates) of the function and most be
sorted in increasing order. "fy" are the y-coordinates of the function
in these points.
bessi0
Signature: (a(); [o]b())
calculate the (approximate) modified bessel function of the first kind
fast_sin
Signature: (r(n); [o]s(n))
fast sine function (inaccurate table lookup with ~12 bits precision)
AUTHOR
Marc Lehmann . The ideas were mostly taken from
common lisp music (CLM), by Bill Schottstaedt "bil@ccrma.stanford.edu".
I also borrowed many explanations (and references) from the clm docs and
some code from clm.c. Highly inspiring!
SEE ALSO
perl(1), PDL.