Chapter 26: Neural Networks (and more!)

Why Does it Work?

The weights required to make a neural network carry out a particular task are found by a learning algorithm, together with examples of how the system should operate. For instance, the examples in the sonar problem would be a database of several hundred (or more) of the 1000-sample segments. Some of the example segments would correspond to submarines, others to whales, others to random noise, etc. The learning algorithm uses these examples to calculate a set of weights appropriate for the task at hand. The term learning is widely used in the neural network field to describe this process; however, a better description might be: determining an optimized set of weights based on the statistics of the examples. Regardless of what the method is called, the resulting weights are virtually impossible for humans to understand. Patterns may be observable in some rare cases, but generally they appear to be random numbers. A neural network using these weights can be observed to have the proper input/output relationship, but why these particular weights work is quite baffling. This mystic quality of neural networks has caused many scientists and engineers to shy away from them. Remember all those science fiction movies of renegade computers taking over the earth?

In spite of this, it is common to hear neural network advocates make statements such as: "neural networks are well understood." To explore this claim, we will first show that it is possible to pick neural network weights through traditional DSP methods. Next, we will demonstrate that the learning algorithms provide better solutions than the traditional techniques. While this doesn't explain why a particular set of weights works, it does provide confidence in the method.

In the most sophisticated view, the neural network is a method of labeling the various regions in parameter space. For example, consider the sonar system neural network with 1000 inputs and a single output. With proper weight selection, the output will be near one if the input signal is an echo from a submarine, and near zero if the input is only noise. This forms a parameter hyperspace of 1000 dimensions. The neural network is a method of assigning a value to each location in this hyperspace. That is, the 1000 input values define a location in the hyperspace, while the output of the neural network provides the value at that location. A look-up table could perform this task perfectly, having an output value stored for each possible input address. The difference is that the neural network calculates the value at each location (address), rather than the impossibly large task of storing each value. In fact, neural network architectures are often evaluated by how well they separate the hyperspace for a given number of weights.
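To get a feel for why the look-up table is "impossibly large," the short sketch below compares the two storage requirements. It assumes each of the 1000 input samples is quantized to 8 bits and that the network has 10 hidden nodes; both numbers are illustrative assumptions, not values given in the text, and bias weights are ignored.

```python
import math

def lookup_table_digits(num_inputs: int, bits_per_input: int) -> int:
    """Number of decimal digits in the count of distinct input addresses."""
    # The table needs 2**(num_inputs * bits_per_input) entries. Count the
    # digits of that number with logarithms instead of building it.
    return math.floor(num_inputs * bits_per_input * math.log10(2)) + 1

def network_weight_count(num_inputs: int, num_hidden: int) -> int:
    """Weights in a fully interconnected network with one output node."""
    return num_inputs * num_hidden + num_hidden

# Assumed values: 8-bit samples and 10 hidden nodes (hypothetical).
digits = lookup_table_digits(1000, 8)
weights = network_weight_count(1000, 10)

print(f"look-up table entries: a number with {digits} digits")
print(f"neural network weights: {weights}")
```

Under these assumptions the table would need a number of entries that is thousands of digits long, while the network stores only about ten thousand weights.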

This approach also provides a clue to the number of nodes required in the hidden layer. A parameter space of N dimensions requires N numbers to specify a location. Identifying a region in the hyperspace requires 2N values (i.e., a minimum and maximum value along each axis defines a hyperspace rectangular solid). For instance, these simple calculations would indicate that a neural network with 1000 inputs needs 2000 weights to identify one region of the hyperspace from another. In a fully interconnected network, this would require two hidden nodes. The number of regions needed depends on the particular problem, but can be expected to be far less than the number of dimensions in the parameter space. While this is only a crude approximation, it generally explains why most neural networks can operate with a hidden layer of 2% to 30% of the size of the input layer.
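The arithmetic in this estimate can be written out as a small sketch. The function name and the region counts of 10 and 150 are assumptions chosen for illustration; the text only gives the single-region case.

```python
def hidden_nodes_for_regions(num_inputs: int, num_regions: int) -> int:
    """Crude estimate of hidden nodes, following the 2N-values-per-region rule."""
    # Each region needs a minimum and a maximum along every axis: 2N values.
    values_needed = 2 * num_inputs * num_regions
    # In a fully interconnected network, each hidden node supplies N weights.
    weights_per_hidden_node = num_inputs
    return values_needed // weights_per_hidden_node

# The case worked in the text: 1000 inputs, one region.
print(hidden_nodes_for_regions(1000, 1))

# Hypothetical region counts, to show how the hidden layer scales.
for regions in (10, 150):
    nodes = hidden_nodes_for_regions(1000, regions)
    percent = 100 * nodes / 1000
    print(f"{regions} regions: {nodes} hidden nodes ({percent:.0f}% of input layer)")
```

With these assumed region counts, the estimate lands at the two ends of the 2% to 30% range quoted above.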

A completely different way of understanding neural networks uses the DSP concept of correlation. As discussed in Chapter 7, correlation is the optimal way of detecting if a known pattern is contained within a signal. It is carried out by multiplying the signal with the pattern being looked for, and adding the products. The higher the sum, the more the signal resembles the pattern. Now, examine Fig. 26-5 and think of each hidden node as looking for a specific pattern in the input data. That is, each of the hidden nodes correlates the input data with the set of weights associated with that hidden node. If the pattern is present, the sum passed to the sigmoid will be large, otherwise it will be small.
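As a minimal sketch of this viewpoint, the code below treats one hidden node as a correlator followed by a sigmoid. The 32-sample sinusoidal pattern, the test signals, and the bias value are all assumptions made for the example; the bias simply shifts the sigmoid so that a correlation of zero maps to an output near zero.

```python
import math

def correlate(signal, pattern):
    """Multiply the signal by the pattern, sample by sample, and add the products."""
    return sum(s * p for s, p in zip(signal, pattern))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_node(inputs, weights, bias=0.0):
    """One hidden node: correlate the inputs with the weights, then apply the sigmoid."""
    return sigmoid(correlate(inputs, weights) + bias)

N = 32
# The weights are the pattern being looked for (hypothetical: 3 cycles of a sine).
pattern = [math.sin(2 * math.pi * 3 * i / N) for i in range(N)]

matching = list(pattern)                                        # pattern present
unrelated = [math.sin(2 * math.pi * 7 * i / N) for i in range(N)]  # different pattern

bias = -8.0  # assumed threshold, half the correlation of a perfect match

print(correlate(matching, pattern), hidden_node(matching, pattern, bias))
print(correlate(unrelated, pattern), hidden_node(unrelated, pattern, bias))
```

The matching signal produces a large sum and an output near one; the unrelated signal produces a sum near zero and an output near zero.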

The action of the sigmoid is quite interesting in this viewpoint. Look back at Fig. 26-1d and notice that the probability curve separating two bell shaped distributions resembles a sigmoid. If we were manually designing a neural network, we could make the output of each hidden node be the fractional probability that a specific pattern is present in the input data. The output layer repeats this operation, making the entire three-layer structure a correlation of correlations, a network that looks for patterns of patterns.
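The resemblance can be made exact in one special case. The sketch below assumes two bell-shaped (Gaussian) distributions with equal standard deviations and equal prior probabilities; these conditions are assumptions for the example and may not match the distributions in the figure. Under them, the probability that a measurement belongs to the target group is precisely a sigmoid of the measured value.

```python
import math

def gaussian_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def prob_target(x, mean_nontarget, mean_target, sd):
    """Probability that x came from the target distribution (equal priors assumed)."""
    t = gaussian_pdf(x, mean_target, sd)
    n = gaussian_pdf(x, mean_nontarget, sd)
    return t / (t + n)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_form(x, mean_nontarget, mean_target, sd):
    """The same probability, written as a shifted and scaled sigmoid."""
    slope = (mean_target - mean_nontarget) / sd ** 2
    midpoint = (mean_nontarget + mean_target) / 2
    return sigmoid(slope * (x - midpoint))

# Hypothetical distributions: nontargets centered at 1, targets at 3, sd of 1.
for x in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(x, prob_target(x, 1.0, 3.0, 1.0), sigmoid_form(x, 1.0, 3.0, 1.0))
```

The two columns agree to within rounding error. When the widths differ, the curve is no longer exactly a sigmoid.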

Conventional DSP is based on two techniques, convolution and Fourier analysis. It is reassuring that neural networks can carry out both these operations, plus much more. Imagine an N-sample signal being filtered to produce another N-sample signal. According to the output side view of convolution, each sample in the output signal is a weighted sum of samples from the input. Now, imagine a two-layer neural network with N nodes in each layer. The value produced by each output layer node is also a weighted sum of the input values. If each output layer node uses the same weights as all the other output nodes, the network will implement linear convolution. Likewise, the DFT can be calculated with a two-layer neural network with N nodes in each layer. Each output layer node finds the amplitude of one frequency component. This is done by making the weights of each output layer node the same as the sinusoid being looked for. The resulting network correlates the input signal with each of the basis function sinusoids, thus calculating the DFT. Of course, a two-layer neural network is much less powerful than the standard three-layer architecture. This means neural networks can carry out nonlinear as well as linear processing.
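Both constructions can be sketched with one function that computes a layer of weighted sums with no sigmoid. The helper names, the two-point kernel, and the 8-sample test signal are assumptions for illustration. The DFT weights follow the usual real-DFT convention of cosine rows and negated sine rows for frequencies 0 through N/2, which gives N+2 output values rather than exactly N (two of the sine outputs are always zero).

```python
import math

def two_layer_network(inputs, weights):
    """Each output node is a weighted sum of all the inputs (no sigmoid)."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def convolution_weights(kernel, n):
    """Every output node uses the same kernel, flipped and shifted to its position."""
    return [[kernel[i - j] if 0 <= i - j < len(kernel) else 0.0
             for j in range(n)]
            for i in range(n)]

def dft_weights(n):
    """Each output node's weights are one basis function sinusoid."""
    freqs = range(n // 2 + 1)
    cos_rows = [[math.cos(2 * math.pi * k * i / n) for i in range(n)] for k in freqs]
    sin_rows = [[-math.sin(2 * math.pi * k * i / n) for i in range(n)] for k in freqs]
    return cos_rows, sin_rows

# Convolution: a first-difference kernel applied to a ramp (hypothetical data).
ramp = [1.0, 2.0, 3.0, 4.0]
conv_out = two_layer_network(ramp, convolution_weights([1.0, -1.0], len(ramp)))
print(conv_out)

# DFT: an 8-sample cosine with one cycle should show up only in the
# cosine output node for frequency 1.
n = 8
signal = [math.cos(2 * math.pi * i / n) for i in range(n)]
cos_rows, sin_rows = dft_weights(n)
real_part = two_layer_network(signal, cos_rows)
imag_part = two_layer_network(signal, sin_rows)
print([round(v, 6) for v in real_part])
print([round(v, 6) for v in imag_part])
```

Only the first N samples of the linear convolution are produced here, since the output layer has the same number of nodes as the input layer.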

Suppose that one of these conventional DSP strategies is used to design the weights of a neural network. Can it be claimed that the network is optimal? Traditional DSP algorithms are usually based on assumptions about the characteristics of the input signal. For instance, Wiener filtering is optimal for maximizing the signal-to-noise ratio assuming the signal and noise spectra are both known; correlation is optimal for detecting targets assuming the noise is white; deconvolution counteracts an undesired convolution assuming the deconvolution kernel is the inverse of the original convolution kernel, etc. The problem is, scientists and engineers seldom have a perfect knowledge of the input signals that will be encountered. While the underlying mathematics may be elegant, the overall performance is limited by how well the data are understood.

For instance, imagine testing a traditional DSP algorithm with actual input signals. Next, repeat the test with the algorithm changed slightly, say, by increasing one of the parameters by one percent. If the second test result is better than the first, the original algorithm is not optimized for the task at hand. Nearly all conventional DSP algorithms can be significantly improved by a trial-and-error evaluation of small changes to the algorithm's parameters and procedures. This is the strategy of the neural network.
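The test described above can be sketched as a loop: change a parameter by one percent, keep the change if the measured error drops, and stop when neither direction helps. The one-parameter "algorithm" (output equals the parameter times the input) and the example data are hypothetical, and this is only an illustration of the trial-and-error idea, not the training procedure used for actual neural networks.

```python
def error(param, examples):
    """Sum of squared errors of a one-parameter algorithm: output = param * x."""
    return sum((param * x - target) ** 2 for x, target in examples)

def trial_and_error(param, examples, step=0.01, max_rounds=1000):
    """Nudge the parameter up or down by one percent while the error keeps falling."""
    best = error(param, examples)
    for _ in range(max_rounds):
        improved = False
        for candidate in (param * (1 + step), param * (1 - step)):
            e = error(candidate, examples)
            if e < best:
                param, best = candidate, e
                improved = True
                break
        if not improved:
            break  # neither small change helps, so stop
    return param, best

# Hypothetical (input, desired output) pairs, roughly following output = 2 * x.
examples = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

start = 1.0
tuned, tuned_error = trial_and_error(start, examples)
print(f"starting parameter {start}: error {error(start, examples):.3f}")
print(f"tuned parameter {tuned:.4f}: error {tuned_error:.3f}")
```

The starting value plays the role of a parameter chosen from theory; the loop shows how example data alone can pull it toward a better setting.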


The Scientist and Engineer's Guide to
Digital Signal Processing
By Steven W. Smith, Ph.D.
