A single-layer, fully connected feed-forward neural network with m input nodes and n output nodes can express any linear transformation that has an m by n matrix representation. Clearly, if the activation function of the neurons is linear in its inputs, then no matter how many layers a feed-forward network has, the result is always just a linear transformation of the inputs (affine, if the neurons have bias terms). With a non-linear activation function, though, we are no longer limited to linear transformations: given enough hidden units, such a network can approximate essentially any continuous function of the inputs on a bounded region to arbitrary accuracy. Two of the classic activation functions are the hard threshold function, which is 0 below some threshold and 1 above it, and the logistic function, which is close to 0 for low values and close to 1 for high values, and rises quickly but smoothly from one to the other around some threshold value.
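To make the "linear layers collapse" point concrete, here is a minimal sketch in plain Python. Everything in it is my own illustration: the layer sizes, the random weights, and the helper names `matvec`, `matmul`, `hard_threshold` and `logistic` are assumptions, not part of any particular library. It also uses the column-vector convention, so the weights are stored as n rows by m columns, the transpose of the m by n layout mentioned above.

```python
import math
import random

def matvec(W, x):
    """Apply a weight matrix (a list of rows) to an input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def hard_threshold(z, theta=0.0):
    """0 below the threshold theta, 1 above it."""
    return 1.0 if z > theta else 0.0

def logistic(z, theta=0.0, steepness=1.0):
    """Smooth rise from 0 to 1 centred on theta (steepness is an assumed knob)."""
    return 1.0 / (1.0 + math.exp(-steepness * (z - theta)))

random.seed(0)
m, h, n = 4, 5, 3  # input, hidden and output sizes, chosen arbitrarily
W1 = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(h)]  # h x m
W2 = [[random.uniform(-1, 1) for _ in range(h)] for _ in range(n)]  # n x h
x = [random.uniform(-1, 1) for _ in range(m)]

# Two layers with a linear (identity) activation...
two_layer = matvec(W2, matvec(W1, x))

# ...give the same output as one layer whose weights are the product W2 W1.
W = matmul(W2, W1)  # n x m
one_layer = matvec(W, x)

max_diff = max(abs(a - b) for a, b in zip(two_layer, one_layer))
print(max_diff < 1e-12)
```

If you put `logistic` or `hard_threshold` between the two layers, the collapse into a single matrix no longer works, which is exactly where the extra expressive power comes from.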
Now consider for a moment that the discrete wavelet transform is a linear transform, and so can be represented as a neural network with linear activations. Also remember that hard or soft thresholding of the output coefficients is simultaneously a means of compression and of noise reduction. In fact, I believe there is some evidence that something very much like this happens when our brains process visual data.
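As a rough illustration of the wavelet-as-network idea, the sketch below writes the orthonormal Haar transform for length-4 signals as a weight matrix, thresholds the coefficients, and reconstructs with the transpose. The choice of the Haar wavelet, the toy signal, the cutoff of 0.5 and all the function names are my own assumptions. Note that wavelet hard thresholding keeps a coefficient unchanged if it is large and zeroes it otherwise, and soft thresholding also shrinks the survivors towards zero; neither outputs the 0 or 1 of the hard threshold activation function.

```python
import math

s = 1 / math.sqrt(2)

# Orthonormal Haar matrix for length-4 signals: each row is one basis
# function, i.e. the weights feeding one output neuron of a linear layer.
H = [
    [0.5, 0.5, 0.5, 0.5],    # overall average
    [0.5, 0.5, -0.5, -0.5],  # coarse detail: first half vs second half
    [s, -s, 0.0, 0.0],       # fine detail in the first half
    [0.0, 0.0, s, -s],       # fine detail in the second half
]

def matvec(W, x):
    """Apply a weight matrix (a list of rows) to an input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def transpose(W):
    return [list(col) for col in zip(*W)]

def hard_shrink(coeffs, t):
    """Keep coefficients with magnitude above t, zero the rest."""
    return [c if abs(c) > t else 0.0 for c in coeffs]

def soft_shrink(coeffs, t):
    """Zero small coefficients and pull the remaining ones towards zero by t."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

# A piecewise-constant signal with a little noise added by hand.
signal = [4.0, 4.1, 1.0, 0.9]

coeffs = matvec(H, signal)             # forward transform: one linear layer
kept = hard_shrink(coeffs, 0.5)        # thresholding: the non-linear step
denoised = matvec(transpose(H), kept)  # inverse transform: H is orthonormal

nonzero = sum(1 for c in kept if c != 0.0)
print(nonzero, [round(v, 3) for v in denoised])
```

Only the coefficients that survive the threshold need to be stored, which is the compression, and the small coefficients that get zeroed are mostly the noise, which is the noise reduction.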
The question I want to ask is this: does it make sense to think about multi-layer neural networks in terms of a redundant dictionary of functions, and can we use that view to design network update methods? Or does the reverse hold, so that neural network update techniques could tell us something about the best m-term approximation problem for redundant dictionaries of wavelets?