Digital Signal Processing

By Steven W. Smith, Ph.D.

- 1: The Breadth and Depth of DSP
- 2: Statistics, Probability and Noise
- 3: ADC and DAC
- 4: DSP Software
- 5: Linear Systems
- 6: Convolution
- 7: Properties of Convolution
- 8: The Discrete Fourier Transform
- 9: Applications of the DFT
- 10: Fourier Transform Properties
- 11: Fourier Transform Pairs
- 12: The Fast Fourier Transform
- 13: Continuous Signal Processing
- 14: Introduction to Digital Filters
- 15: Moving Average Filters
- 16: Windowed-Sinc Filters
- 17: Custom Filters
- 18: FFT Convolution
- 19: Recursive Filters
- 20: Chebyshev Filters
- 21: Filter Comparison
- 22: Audio Processing
- 23: Image Formation & Display
- 24: Linear Image Processing
- 25: Special Imaging Techniques
- 26: Neural Networks (and more!)
- 27: Data Compression
- 28: Digital Signal Processors
- 29: Getting Started with DSPs
- 30: Complex Numbers
- 31: The Complex Fourier Transform
- 32: The Laplace Transform
- 33: The z-Transform
- 34: Explaining Benford's Law

Why Does it Work?

The weights required to make a neural network carry out a particular task are
found by a learning algorithm, together with examples of how the system
*should* operate. For instance, the examples in the sonar problem would be a
database of several hundred (or more) of the 1000 sample segments. Some of
the example segments would correspond to submarines, others to whales, others
to random noise, etc. The learning algorithm uses these examples to calculate
a set of weights appropriate for the task at hand. The term *learning* is widely
used in the neural network field to describe this process; however, a better
description might be: *determining an optimized set of weights based on the
statistics of the examples*. Regardless of what the method is called, the resulting
weights are virtually impossible for humans to understand. Patterns may be
observable in some rare cases, but generally they appear to be random numbers.
A neural network using these weights can be observed to have the proper
input/output relationship, but *why* these particular weights work is quite
baffling. This mystic quality of neural networks has caused many scientists and
engineers to shy away from them. Remember all those science fiction movies
of renegade computers taking over the earth?
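The process described above can be sketched with a toy single-node "network." Everything below is invented for illustration: the data, the labels (a stand-in for "submarine" vs. "noise"), the 8-sample segment length (the book's sonar example uses 1000), and the gradient-descent learning rule.

```python
import numpy as np

# Hypothetical training set: each row is a short input segment.
rng = np.random.default_rng(0)
examples = rng.standard_normal((200, 8))       # 200 example segments
targets = (examples[:, 0] > 0).astype(float)   # made-up labels: 1 = "submarine"

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "Learning": gradient descent adjusts the weights until the node's output
# matches the example targets -- i.e., it determines an optimized set of
# weights based on the statistics of the examples.
weights = np.zeros(8)
for _ in range(500):
    out = sigmoid(examples @ weights)
    grad = examples.T @ (out - targets) / len(targets)
    weights -= 0.5 * grad

accuracy = np.mean((sigmoid(examples @ weights) > 0.5) == (targets > 0.5))
```

Printing `weights` afterward illustrates the text's point: the node classifies the examples correctly, yet the individual weight values look like random numbers to a human reader.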

In spite of this, it is common to hear neural network advocates make statements
such as: "neural networks are well understood." To explore this claim, we will
first show that it is possible to pick neural network weights through traditional
DSP methods. Next, we will demonstrate that the learning algorithms provide
*better* solutions than the traditional techniques. While this doesn't explain *why*
a particular set of weights works, it does provide confidence in the method.

In the most sophisticated view, the neural network is a method of labeling the
various regions in *parameter space*. For example, consider the sonar system
neural network with 1000 inputs and a single output. With proper weight
selection, the output will be near *one* if the input signal is an echo from a
submarine, and near *zero* if the input is only noise. This forms a parameter
hyperspace of 1000 dimensions. The neural network is a method of assigning
a value to each location in this hyperspace. That is, the 1000 input values
define a *location* in the hyperspace, while the output of the neural network
provides the *value* at that location. A look-up table could perform this task
perfectly, having an output value stored for each possible input address. The
difference is that the neural network *calculates* the value at each location
(address), rather than the impossibly large task of *storing* each value. In fact,
neural network architectures are often evaluated by how well they separate the
hyperspace for a given number of weights.
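A rough sketch of this viewpoint follows; the weights are arbitrary random values, not a trained sonar network, and serve only to show a network *calculating* the value at a location that a look-up table could never hope to *store*.

```python
import numpy as np

# A tiny stand-in for the sonar network: a 1000-dimensional parameter
# space, one output value per location.  Weights are arbitrary.
n_inputs, n_hidden = 1000, 2
rng = np.random.default_rng(1)
w_hidden = rng.standard_normal((n_hidden, n_inputs))
w_output = rng.standard_normal(n_hidden)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def network_value(location):
    """Calculate the value at one location in the 1000-D hyperspace."""
    return sigmoid(w_output @ sigmoid(w_hidden @ location))

# A look-up table would need one stored value per possible input address.
# Even at a coarse 8 bits per input, that is 256**1000 entries -- hopeless.
table_entries = 256 ** n_inputs                        # astronomically large
value = network_value(rng.standard_normal(n_inputs))   # computed on demand
```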

This approach also provides a clue to the number of nodes required in the
hidden layer. A parameter space of *N* dimensions requires *N* numbers to specify
a location. Identifying a *region* in the hyperspace requires 2*N* values (i.e., a
minimum and maximum value along each axis defines a hyperspace rectangular
solid). For instance, these simple calculations would indicate that a neural
network with 1000 inputs needs 2000 weights to identify one region of the
hyperspace from another. In a fully interconnected network, this would require
two hidden nodes. The number of regions needed depends on the particular
problem, but can be expected to be far less than the number of dimensions in the
parameter space. While this is only a crude approximation, it generally explains
why most neural networks can operate with a hidden layer of 2% to 30% the
size of the input layer.
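The arithmetic behind this estimate, spelled out:

```python
# Identifying one region of an N-dimensional parameter space takes about
# 2N values (a minimum and maximum along each axis).  With N inputs
# feeding every hidden node, 2N weights corresponds to two fully
# interconnected hidden nodes.
n_inputs = 1000
weights_per_region = 2 * n_inputs                          # 2000 weights
hidden_nodes_per_region = weights_per_region // n_inputs   # 2 hidden nodes
```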

A completely different way of understanding neural networks uses the DSP
concept of *correlation*. As discussed in Chapter 7, correlation is the optimal
way of detecting if a known pattern is contained within a signal. It is carried out
by multiplying the signal with the pattern being looked for, and adding the
products. The higher the sum, the more the signal resembles the pattern. Now,
examine Fig. 26-5 and think of each hidden node as looking for a specific
pattern in the input data. That is, each of the hidden nodes *correlates* the input
data with the set of weights associated with that hidden node. If the pattern is
present, the sum passed to the sigmoid will be large, otherwise it will be small.
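A minimal sketch of correlation as pattern detection; the pattern and both test signals below are made up for illustration:

```python
import numpy as np

# The pattern being looked for -- in a hidden node, these would be the
# weights associated with that node.
pattern = np.array([1.0, -1.0, 1.0, -1.0, 1.0])

# One signal containing the pattern (slightly distorted), one without.
signal_with_pattern = pattern + np.array([0.03, -0.02, 0.01, 0.04, -0.01])
noise_only = np.array([0.2, 0.1, -0.3, 0.2, 0.1])

# Correlation: multiply the signal by the pattern and add the products.
# The higher the sum, the more the signal resembles the pattern.
match = np.dot(signal_with_pattern, pattern)
no_match = np.dot(noise_only, pattern)
```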

The action of the sigmoid is quite interesting in this viewpoint. Look back at
Fig. 26-1d and notice that the probability curve separating two bell shaped
distributions resembles a sigmoid. If we were manually designing a neural
network, we could make the output of each hidden node be the *fractional
probability* that a specific pattern is present in the input data. The output layer
repeats this operation, making the entire three-layer structure a correlation of
correlations, a network that looks for *patterns of patterns*.
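A hand-designed miniature of this "patterns of patterns" idea; the two hidden-node patterns, the output weights, and the test inputs are all invented for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Each hidden node's weights are the pattern it looks for; the sigmoid
# squashes the correlation sum into a 0-to-1 value that can be read as
# a fractional probability that the pattern is present.
patterns = np.array([[1.0, 1.0, -1.0, -1.0],     # hidden node 1's pattern
                     [1.0, -1.0, 1.0, -1.0]])    # hidden node 2's pattern
output_weights = np.array([1.0, -1.0])           # a "pattern of patterns"

def three_layer(x):
    hidden = sigmoid(patterns @ x)            # correlate input with patterns
    return sigmoid(output_weights @ hidden)   # correlate the correlations

strong_match = three_layer(np.array([2.0, 2.0, -2.0, -2.0]))  # node 1's pattern
weak_match = three_layer(np.array([2.0, -2.0, 2.0, -2.0]))    # node 2's pattern
```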

Conventional DSP is based on two techniques, *convolution* and *Fourier
analysis*. It is reassuring that neural networks can carry out both these
operations, plus *much more*. Imagine an *N* sample signal being filtered to
produce another *N* sample signal. According to the output side view of
convolution, each sample in the output signal is a weighted sum of samples
from the input. Now, imagine a two-layer neural network with *N* nodes in each
layer. The value produced by each output layer node is also a weighted sum of
the input values. If each output layer node uses the same weights as all the
other output nodes, the network will implement linear convolution. Likewise,
the DFT can be calculated with a two layer neural network with *N* nodes in each
layer. Each output layer node finds the amplitude of one frequency component.
This is done by making the weights of each output layer node the same as the
sinusoid being looked for. The resulting network correlates the input signal
with each of the basis function sinusoids, thus calculating the DFT. Of course,
a two-layer neural network is much less powerful than the standard three-layer
architecture. Since even this restricted structure can duplicate convolution and
the DFT, the full three-layer network can carry out *nonlinear* as well as
*linear* processing.
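Both claims can be checked numerically. The sketch below builds the two weight matrices described in the text and compares them against NumPy's own convolution and FFT; the signal length, the 3-point kernel, and the test signal are arbitrary choices.

```python
import numpy as np

# A two-layer (input + output) network is just a matrix of weights.
N = 16
rng = np.random.default_rng(2)
x = rng.standard_normal(N)

# Convolution: every output node uses the same weights, shifted to sit
# over "its" input samples, so each output is a weighted sum of nearby
# inputs (zero padding at the edges).
kernel = np.array([0.25, 0.5, 0.25])
W_conv = np.zeros((N, N))
for i in range(N):
    for j, kv in enumerate(kernel):
        if 0 <= i - j + 1 < N:
            W_conv[i, i - j + 1] = kv
net_conv = W_conv @ x
ref_conv = np.convolve(x, kernel, mode="same")

# DFT: the weights of output node k are the k-th basis sinusoid, so each
# output node correlates the input with one frequency.
k = np.arange(N)[:, None]
n = np.arange(N)[None, :]
W_cos = np.cos(2 * np.pi * k * n / N)    # real-part weights
W_sin = -np.sin(2 * np.pi * k * n / N)   # imaginary-part weights
net_dft = W_cos @ x + 1j * (W_sin @ x)
ref_dft = np.fft.fft(x)
```

Both weight choices reproduce the reference results exactly: `net_conv` matches `np.convolve` and `net_dft` matches `np.fft.fft`.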

Suppose that one of these conventional DSP strategies is used to design the
weights of a neural network. Can it be claimed that the network is *optimal*?
Traditional DSP algorithms are usually based on assumptions about the
characteristics of the input signal. For instance, Wiener filtering is optimal for
maximizing the signal-to-noise ratio *assuming* the signal and noise spectra are
both known; correlation is optimal for detecting targets *assuming* the noise is
white; deconvolution counteracts an undesired convolution *assuming* the
deconvolution kernel is the inverse of the original convolution kernel, etc. The
problem is, scientists and engineers seldom have a perfect knowledge of the
input signals that will be encountered. While the underlying mathematics may
be elegant, the overall performance is limited by how well the data are
understood.

For instance, imagine testing a traditional DSP algorithm with actual input signals. Next, repeat the test with the algorithm changed slightly, say, by increasing one of the parameters by one percent. If the second test result is better than the first, the original algorithm is not optimized for the task at hand. Nearly all conventional DSP algorithms can be significantly improved by a trial-and-error evaluation of small changes to the algorithm's parameters and procedures. This is the strategy of the neural network.
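This thought experiment can be sketched directly. The "traditional algorithm" below is a hypothetical 3-point smoothing filter with an adjustable center weight, scored against a known clean signal; all of the specifics are stand-ins.

```python
import numpy as np

# Hypothetical test setup: a known clean signal plus noise.
rng = np.random.default_rng(3)
clean = np.sin(np.linspace(0, 4 * np.pi, 200))
noisy = clean + 0.3 * rng.standard_normal(200)

def error(center_weight):
    """Mean squared error of a 3-point smoothing filter on the test data."""
    kernel = np.array([1.0, center_weight, 1.0])
    kernel /= kernel.sum()
    smoothed = np.convolve(noisy, kernel, mode="same")
    return np.mean((smoothed - clean) ** 2)

baseline = error(1.0)
perturbed = error(1.01)   # one parameter increased by one percent

# If the perturbed version scores better, the original setting was not
# optimal for this data -- exactly the trial-and-error refinement that a
# learning algorithm automates.
improved = perturbed < baseline
```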