Sevilla Speech Signal Analysis

Speech Signal Analysis

Hans D. Sevilla

back to CE150

The speech signal has been studied for many years, for a variety of reasons and applications. Some studies break the speech signal down into its smallest units, called phonemes. To describe the signal in terms of its general characteristics, however, traditional vocoders (voice coders) classify the input speech as either voiced or unvoiced. A voiced segment is characterized by high energy content and periodicity, while an unvoiced segment resembles random noise and has no periodicity. Some parts of speech are a mixture of the two; these are called transition regions. In the voice recognition part of our project, it is necessary to remove the noise, which appears as near-zero segments of the signal. Removing these near-zero segments of the input speech signal requires MATLAB functions such as WavChop and the Butterworth filter, which are discussed later in the report.
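The idea of discarding near-zero segments can be illustrated with a short-time energy test. The following is only a minimal sketch in Python (the project itself uses MATLAB), and it is not the WavChop function mentioned above: the names frame_energies and trim_near_zero, the 80-sample frame length, and the 1% relative threshold are all assumptions chosen for illustration.

```python
import math


def frame_energies(signal, frame_len):
    """Mean squared amplitude of each non-overlapping, full-length frame."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def trim_near_zero(signal, frame_len=80, rel_threshold=0.01):
    """Drop leading and trailing frames whose energy is below
    rel_threshold times the peak frame energy (a hypothetical rule)."""
    energies = frame_energies(signal, frame_len)
    if not energies:
        return list(signal)  # shorter than one frame: nothing to decide on
    limit = rel_threshold * max(energies)
    active = [i for i, e in enumerate(energies) if e > limit]
    if not active:
        return []  # every frame is near zero
    first, last = active[0], active[-1]
    return list(signal[first * frame_len:(last + 1) * frame_len])


# Synthetic example: a 200 Hz tone sampled at 8 kHz, padded with near-zero noise.
tone = [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(240)]
quiet = [0.001 * (-1) ** n for n in range(160)]
trimmed = trim_near_zero(quiet + tone + quiet)
print(len(trimmed))  # 240: only the tone frames remain
```

A relative threshold is used here so that the sketch does not depend on the recording level; a real recording would probably also need the filtering step described later in the report.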

In speech coding schemes, a frequency domain representation of the speech signal is necessary. The short-time Fourier transform is very useful for this purpose, especially for examining features of the speech signal that are not obvious in the time domain representation. Writing s(n) for the input speech signal, the short-time Fourier transform is defined by:

$$S_k(e^{j\omega}) = \sum_{n=-\infty}^{\infty} s(n)\, w(k - n)\, e^{-j\omega n}$$

where w(k - n) is a real window sequence used to isolate the portion of the input signal that will be analyzed at a particular index, k. In the analysis of speech signals, the shape and length of the window affect the frequency representation of speech, and several windowing techniques have been formulated for this reason.
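To make the role of the window index k concrete, the sketch below evaluates the short-time Fourier transform directly from its usual sum form, S_k(e^{jw}) = sum over n of s(n) w(k - n) e^{-jwn}, at a single frequency. It is written in Python rather than MATLAB, the function names hamming and stft_at are hypothetical, and the choice of a Hamming window of 64 samples is an assumption made only for the example.

```python
import cmath
import math


def hamming(m, length):
    """Hamming window value at offset m; zero outside 0..length-1.
    Requires length >= 2."""
    if m < 0 or m >= length:
        return 0.0
    return 0.54 - 0.46 * math.cos(2.0 * math.pi * m / (length - 1))


def stft_at(signal, k, omega, length):
    """Evaluate S_k(e^{j omega}) = sum_n s(n) w(k - n) e^{-j omega n}."""
    total = 0j
    # w(k - n) is nonzero only for k - length + 1 <= n <= k.
    first = max(0, k - length + 1)
    last = min(k, len(signal) - 1)
    for n in range(first, last + 1):
        total += signal[n] * hamming(k - n, length) * cmath.exp(-1j * omega * n)
    return total


# A cosine at omega0 = pi/4 analyzed with a 64-sample window ending at k = 127.
omega0 = math.pi / 4.0
signal = [math.cos(omega0 * n) for n in range(256)]
on_peak = abs(stft_at(signal, 127, omega0, 64))
off_peak = abs(stft_at(signal, 127, 3.0 * omega0, 64))
print(on_peak > off_peak)  # True: energy concentrates at the tone frequency
```

Changing k slides the window along the signal, which is how the transform tracks spectral content that varies over time.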

Windowing determines the portion of the speech signal that is to be analyzed by zeroing out the signal outside the region of interest. Common window types include the Rectangular, Bartlett, Hamming, Hanning, Blackman, and Kaiser windows. The Rectangular and Hamming windows are the most commonly used because their properties lie at opposite extremes. Their formulae are shown below:

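Rectangular:

$$w(n) = \begin{cases} 1, & 0 \le n \le N - 1 \\ 0, & \text{otherwise} \end{cases}$$

Hamming:

$$w(n) = \begin{cases} 0.54 - 0.46\cos\left(\dfrac{2\pi n}{N - 1}\right), & 0 \le n \le N - 1 \\ 0, & \text{otherwise} \end{cases}$$

where N is the window length in samples.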

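As a consequence of these shapes, the Rectangular window has greater frequency resolution (a narrower main lobe) but very high, undesirable spectral leakage: its largest sidelobe is only about 13 dB below the main lobe. The Hamming window, on the other hand, has lower frequency resolution (a main lobe roughly twice as wide) but very low leakage, with sidelobes roughly 40 dB below the main lobe. Rectangular windows are therefore not recommended for speech spectral analysis, and the windowing technique used in our project is the Hamming window.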
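The trade-off between resolution and leakage can be checked numerically. The sketch below, in Python rather than MATLAB, samples the magnitude response of each window on a dense frequency grid, walks down the main lobe to its first minimum, and reports the largest remaining sidelobe in dB relative to the main-lobe peak. The function names, the 32-sample window length, and the 2048-point grid are assumptions made for illustration.

```python
import cmath
import math


def rectangular(length):
    """Rectangular window: 1 for every sample in the frame."""
    return [1.0] * length


def hamming(length):
    """Symmetric Hamming window, 0.54 - 0.46 cos(2 pi n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def magnitude_response(window, num_points):
    """|W(e^{j omega})| sampled at num_points frequencies in [0, pi)."""
    mags = []
    for i in range(num_points):
        omega = math.pi * i / num_points
        value = sum(w * cmath.exp(-1j * omega * n) for n, w in enumerate(window))
        mags.append(abs(value))
    return mags


def lobe_summary(window, num_points=2048):
    """Return (grid index of the first minimum, peak sidelobe level in dB)."""
    mags = magnitude_response(window, num_points)
    # Walk down the main lobe until the magnitude starts rising again.
    i = 1
    while i < len(mags) - 1 and mags[i + 1] < mags[i]:
        i += 1
    peak_sidelobe = max(mags[i:])
    return i, 20.0 * math.log10(peak_sidelobe / mags[0])


for name, window in (("rectangular", rectangular(32)), ("hamming", hamming(32))):
    edge, level = lobe_summary(window)
    print(name, "main lobe edge at grid index", edge,
          "peak sidelobe %.1f dB" % level)
```

The main-lobe edge index is a proxy for frequency resolution (smaller is sharper), while the sidelobe level is a proxy for leakage (more negative is better).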

After a speech segment has been windowed with the Hamming window, which is also available as a function in MATLAB, the windowed segment can be passed to the fast Fourier transform (FFT). Once the FFT has been applied, speech signal analysis can proceed with Linear Predictive Coding or other methods such as Pitch Prediction and the Threshold Method.
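The window-then-transform step can be sketched as follows. This is an illustrative Python version, not the project's MATLAB code: the recursive radix-2 fft and the helper hamming_spectrum are hypothetical names, and the frame length is assumed to be a power of two.

```python
import cmath
import math


def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def hamming_spectrum(frame):
    """Magnitude spectrum (bins 0..N/2) of a Hamming-windowed frame."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(c) for c in fft(windowed)[: n // 2 + 1]]


# A 1000 Hz tone sampled at 8 kHz, analyzed in one 256-sample frame.
sample_rate = 8000.0
frame = [math.sin(2.0 * math.pi * 1000.0 * n / sample_rate) for n in range(256)]
spectrum = hamming_spectrum(frame)
peak_bin = spectrum.index(max(spectrum))
print(peak_bin * sample_rate / 256)  # 1000.0
```

The resulting magnitude spectrum is the kind of input that later analysis stages would consume; a full system would repeat this over successive, usually overlapping, frames.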
