Thursday, February 4, 2010

a best k term approximation problem

Here is an interesting little question.

Over all dictionaries of M vectors chosen from an N dimensional vector space, what is the minimum value of the maximum relative error of the best k term approximation, taken over all vectors in the N dimensional space?

The term "dictionary" in this usage comes from signal processing (or at least that is where I got it from). A "dictionary" here is just any set of vectors in terms of which we can express other vectors. When the vectors are linearly independent we call such a collection a "basis"; a dictionary is the same thing except we remove the restriction that the vectors be linearly independent.

For a particular dictionary of vectors, the best k term approximation of a vector X is the vector, constructible as a linear combination of k vectors chosen from our dictionary, whose difference from X has the smallest norm. For instance, take the plane with the usual basis {(0,1), (1,0)} as our dictionary: the best 1 term approximation to the vector X = (x, y) is (x, 0) if |x| > |y| and (0, y) otherwise. Obviously no other dictionary with only 2 vectors in it will do a better job making 1 term approximations. Also obviously the best 2 (or more) term approximation of any vector is just that vector itself, since our dictionary vectors span the plane.
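As a concrete sketch, the best k term approximation can be found by brute force: try every k-subset of the dictionary, project X onto its span, and keep the closest projection. (The function name `best_k_term` is my own; this is an illustration, not an efficient algorithm.)

```python
from itertools import combinations

import numpy as np

def best_k_term(x, dictionary, k):
    """Brute-force best k term approximation: try every k-subset of the
    dictionary, project x onto its span, keep the closest projection."""
    best, best_err = None, np.inf
    for idx in combinations(range(len(dictionary)), k):
        A = np.array([dictionary[i] for i in idx]).T  # columns = chosen vectors
        # least-squares projection of x onto the span of the chosen vectors
        coef, *_ = np.linalg.lstsq(A, x, rcond=None)
        y = A @ coef
        err = np.linalg.norm(x - y)
        if err < best_err:
            best, best_err = y, err
    return best, best_err

# Standard basis of the plane; the best 1 term approximation of (3, 1)
# keeps the larger coordinate, giving (3, 0) with error 1.
D = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
y, err = best_k_term(np.array([3.0, 1.0]), D, k=1)
print(y, err)  # → [3. 0.] 1.0
```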

In fact, if you take a moment to reflect you will quickly see that the answer to the general question when k >= N is 0. This is clear since if k >= N then M >= k >= N, so we can choose a basis of our N dimensional space from the dictionary, in which case the best k term approximation of a vector X is just X itself. Slightly more interesting, but with just as obvious a solution, is the case M < N. If we have fewer than N vectors to choose from then it doesn't matter which particular vectors we put in our dictionary, so long as they are all linearly independent of one another. They span an M dimensional subspace of the N dimensional space, and any M dimensional subspace is "as good" as any other. (Choosing linearly independent vectors actually only matters if we also care about the average relative error of the best k term approximation, not just its maximum.) Since we can't span the space with our M vectors, any vector orthogonal to our M dimensional subspace gives a relative error of 1. (The relative error is |X-Y|/|X|, where X is the vector to be approximated and Y is the approximating vector.)
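A minimal illustration of the M < N case and the relative error definition (the specific vectors here are just for illustration):

```python
import numpy as np

def relative_error(x, y):
    """Relative error |x - y| / |x| of approximating x by y."""
    return np.linalg.norm(x - y) / np.linalg.norm(x)

# M = 2 linearly independent vectors in N = 3 dimensions: their span is a
# plane, so for any vector orthogonal to that plane the best approximation
# within the span is the zero vector, and the relative error is exactly 1.
d1 = np.array([1.0, 0.0, 0.0])
d2 = np.array([0.0, 1.0, 0.0])
x = np.array([0.0, 0.0, 5.0])   # orthogonal to span{d1, d2}
best_in_span = np.zeros(3)      # projection of x onto the plane
print(relative_error(x, best_in_span))  # → 1.0
```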

Let's return to the plane and examine the first interesting case: N = 2, M = 3, k = 1. A first thought is to choose three vectors so that they form an equilateral triangle. With this choice no vector is ever more than 30 degrees away from the line of its best approximating dictionary vector: adding in -1 times each of the three vectors gives 6 equally distributed vectors, 360/6 = 60 degrees apart, so we can be at most 30 degrees off from one of them. This gives a maximum relative error of sin(30°) = 1/2. The observation that we should distribute the vectors as equally as possible solves the problem for odd M, but such solutions do not carry over to even M, since then half of our vectors would just be -1 times the others, leaving us with a very obviously non-optimal dictionary.
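We can check the sin(30°) = 1/2 claim numerically by sweeping test vectors around the circle (a sampling sketch, not a proof):

```python
import numpy as np

# Three unit vectors forming an equilateral triangle (N = 2, M = 3, k = 1).
angles = np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
D = np.stack([np.cos(angles), np.sin(angles)], axis=1)

def rel_err_1term(x, D):
    """Best 1 term relative error: project x onto each dictionary line
    (the coefficient may be negative, so +d and -d come for free) and
    keep the smallest residual."""
    errs = [np.linalg.norm(x - (x @ d) * d) for d in D]
    return min(errs) / np.linalg.norm(x)

# Sweep unit test vectors around the circle; the worst case approaches
# sin(30 degrees) = 1/2 at vectors 30 degrees from the nearest line.
thetas = np.linspace(0, 2 * np.pi, 20000, endpoint=False)
worst = max(rel_err_1term(np.array([np.cos(t), np.sin(t)]), D) for t in thetas)
print(worst)  # ≈ 0.5
```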

A slight modification of our procedure remedies this problem. We simply equidistribute the vectors not through the angles 0 to 2*Pi but from 0 to Pi, treating the first vector as occupying both the position at 0 and the position at Pi. This works for both odd and even M, and it is not hard to see that it is in fact the optimal solution. Since for k >= 2 and N = 2 we can always match any vector exactly, this is a complete solution of the problem in 2 dimensions.
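This construction puts the M lines Pi/M apart, so the worst test vector sits Pi/(2M) from the nearest line, giving a maximum relative error of sin(Pi/(2M)) (which reduces to 1/2 for M = 3). A quick numerical check of that prediction:

```python
import numpy as np

def dictionary_2d(M):
    """M unit vectors equidistributed over [0, pi): angles j*pi/M."""
    angles = np.arange(M) * np.pi / M
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)

def worst_1term_error(D, samples=100000):
    """Sampled maximum relative error of the best 1 term approximation."""
    thetas = np.linspace(0, np.pi, samples, endpoint=False)
    X = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)  # unit test vectors
    proj = X @ D.T  # projection coefficient of each x onto each line
    # residual of projecting each x onto each dictionary line
    res = np.linalg.norm(X[:, None, :] - proj[:, :, None] * D[None, :, :], axis=2)
    return res.min(axis=1).max()  # best line per vector, worst vector overall

for M in (2, 3, 4, 5):
    # predicted optimum: lines pi/M apart, worst-case error sin(pi/(2M))
    print(M, worst_1term_error(dictionary_2d(M)), np.sin(np.pi / (2 * M)))
```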

One might expect there to be similarly unique, and hopefully even simple and structured, solutions in higher dimensions. I cannot pretend to know what the solution is like, but I am certain that in 3 and more dimensions, whatever the solution is, it isn't simple. For the 2D case, finding the most equidistributed set of M vectors was not hard: there is a unique (up to rotation) such configuration for every M. For 3 dimensions and higher this is no longer the case. Though if we have, say, 6 vectors, a good bet for the optimal solution is to put them at the centers of half the faces of a dodecahedron (one from each opposite pair). Very probably the platonic solids each correspond to a provably optimal and unique (again up to rotation) solution of the problem for specific numbers of vectors M.

Clearly, though, these solutions do not provide the general solution. Since solving the general case exactly would certainly be at best very difficult and at worst practically impossible, here we move into the realm of heuristics for a bit. The most obvious first approximation to picking equidistributed vectors in some vector space is simply to choose vectors off the unit sphere in N dimensions with a completely uniform probability density.
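A standard way to sample uniformly from the sphere is to normalize Gaussian samples, since the standard Gaussian distribution is rotationally invariant (a sketch; the function name is my own):

```python
import numpy as np

def random_dictionary(M, N, seed=0):
    """M vectors drawn uniformly from the unit sphere in N dimensions:
    normalizing standard Gaussian samples gives a rotationally invariant,
    hence uniform, distribution on the sphere."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((M, N))
    return G / np.linalg.norm(G, axis=1, keepdims=True)

D = random_dictionary(M=50, N=3)
print(D.shape)  # → (50, 3), every row a unit vector
```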

If M >> N then this is probably a good enough approximation, and a dictionary of random vectors picked from N dimensional space will give us close to the minimum possible value of the maximum error. Just how much larger than N M would have to be for this to be sufficiently good is certainly not clear.

What if we were to choose a first vector at random, and then choose each subsequent vector by picking the vector which has the smallest maximum magnitude dot product with the vectors already in our dictionary, ties being broken randomly? Clearly this would first build an orthogonal basis of the space. Once a complete orthogonal basis has been built, we would begin picking vectors from a complementary basis, and so on down the line. Such a construction is tempting until you realize that this prescription doesn't give the right answer even in the simple 2D case.
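A sketch of this greedy scheme over a dense candidate pool makes the 2D failure visible: the second pick is (near) orthogonal to the first, but the third lands 45 degrees from both, with |dot| ≈ 0.707, whereas the optimal 3-vector dictionary in 2D has pairwise |dot| = cos(60°) = 0.5. (The function name and the pool construction are mine.)

```python
import numpy as np

def greedy_dictionary(pool, M, seed=0):
    """Greedy scheme: start from a random pool vector, then repeatedly add
    the pool vector whose largest |dot product| with the vectors chosen so
    far is smallest."""
    rng = np.random.default_rng(seed)
    chosen = [pool[rng.integers(len(pool))]]
    while len(chosen) < M:
        C = np.array(chosen)
        coherence = np.abs(pool @ C.T).max(axis=1)  # worst overlap per candidate
        chosen.append(pool[np.argmin(coherence)])
    return np.array(chosen)

# Dense pool of unit vectors covering the lines of the plane.
angles = np.linspace(0, np.pi, 1000, endpoint=False)
pool = np.stack([np.cos(angles), np.sin(angles)], axis=1)
D = greedy_dictionary(pool, M=3)
print(np.round(np.abs(D @ D.T), 3))  # off-diagonal: one ~0, two ~0.707
```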

On the other hand, a probabilistic version of the same algorithm is perhaps worth considering. Take the same algorithm, but modified so that every time we have chosen M vectors we randomly delete half of them, then again choose deterministically, again randomly remove vectors, and continue this process for some large number of steps.
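The delete-and-regrow loop can be sketched like so (again over a dense 2D pool; function names and the round count are my own choices, and nothing here is a claim that the iteration converges to the optimum):

```python
import numpy as np

def greedy_fill(chosen, pool, M):
    """Grow `chosen` to M vectors: each step adds the pool vector whose
    largest |dot product| with the current set is smallest."""
    while len(chosen) < M:
        C = np.array(chosen)
        coherence = np.abs(pool @ C.T).max(axis=1)
        chosen.append(pool[np.argmin(coherence)])
    return chosen

def randomized_refine(pool, M, rounds=20, seed=0):
    """Probabilistic variant: grow greedily to M vectors, randomly delete
    half of them, regrow greedily, and repeat for a number of rounds."""
    rng = np.random.default_rng(seed)
    chosen = greedy_fill([pool[rng.integers(len(pool))]], pool, M)
    for _ in range(rounds):
        keep = rng.choice(M, size=M - M // 2, replace=False)  # delete half
        chosen = greedy_fill([chosen[i] for i in keep], pool, M)
    return np.array(chosen)

angles = np.linspace(0, np.pi, 500, endpoint=False)
pool = np.stack([np.cos(angles), np.sin(angles)], axis=1)
D = randomized_refine(pool, M=4)
print(np.round(np.abs(D @ D.T), 3))  # pairwise overlaps of the refined dictionary
```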
