This is the mail archive of the gsl-discuss@sourceware.cygnus.com mailing list for the GSL project.



Re: multidimensional optimization


Mark Galassi wrote:
>> I would like to know if there are any design documents
>> about multidimensional optimization implementation in GSL
>> and if the mailing list might be a correct place to
>> discuss about this design.
> 
> I don't think there have been any specific discussions yet, so the
> only constraint would be to make the API fit smoothly with the rest of
> the library.
> 
> Please feel free to post ideas, first abstract and then concrete, to
> this list!

I guess that the solution used by one-dimensional optimization for representing
functions can easily be extended to multidimensional functions.

We have (in the CVS):
struct gsl_function_struct 
{
  double (* function) (double x, void * params);
  void * params;
};

So, we need at least something like:
struct gsl_vec_function_struct 
{
  double (* function) (const gsl_vector * x, void * params);
  void * params;
};
Maybe we could avoid using gsl_vector, but then what would be the point of
having vectors?
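
To make this concrete, here is a minimal sketch of how such a struct could be
filled and called. The typedef and the sum-of-squares objective are purely
illustrative, not an existing GSL API:

#include <stddef.h>
#include <gsl/gsl_vector.h>

/* refers to the struct gsl_vec_function_struct proposed above */
typedef struct gsl_vec_function_struct gsl_vec_function;

/* illustrative objective: f(x) = sum_i x_i^2, params unused */
static double
sum_of_squares (const gsl_vector * x, void * params)
{
  size_t i;
  double sum = 0.0;
  (void) params;
  for (i = 0; i < x->size; i++)
    {
      double xi = gsl_vector_get (x, i);
      sum += xi * xi;
    }
  return sum;
}

/* usage:
     gsl_vec_function F = { &sum_of_squares, NULL };
     double fx = F.function (x, F.params);           */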

For neural networks, however, we should focus on optimization techniques that
use the gradient, so we need another function pointer in the struct, for instance:
void (* gradient) (const gsl_vector * x, gsl_vector * gradient, void * params);

Having gsl_vector * and not gsl_vector ** is deliberate. For many functions,
requiring the caller to preallocate the gradient vector is a good idea, because
reallocating it at each iteration is a waste of time that is not negligible
compared to the time needed to compute the gradient itself.
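
For instance (again purely illustrative, continuing the hypothetical names
above), a gradient callback under this convention just fills the vector that
the caller allocated once:

/* gradient of the sum-of-squares objective: d/dx_i (sum_j x_j^2) = 2 x_i */
static void
sum_of_squares_gradient (const gsl_vector * x, gsl_vector * gradient,
                         void * params)
{
  size_t i;
  (void) params;
  for (i = 0; i < x->size; i++)
    gsl_vector_set (gradient, i, 2.0 * gsl_vector_get (x, i));
}

/* caller side: allocate once, reuse on every iteration
     gsl_vector * g = gsl_vector_alloc (n);
     ... loop: F.gradient (x, g, F.params); ...
     gsl_vector_free (g);                               */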

The annoying thing with many neural network models is that computing the error
made by the network and its gradient together (at the same evaluation point) is
far more efficient than computing them independently: for neural networks,
computing the gradient implies computing the error. Moreover, once the
intermediate values needed for the error are known, computing the gradient may
even be faster than computing the error itself (for small networks). So I think
we need to add the following function to the struct:
double (* gradient_function) (const gsl_vector * x, gsl_vector * gradient, void * params);
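
As an illustration of the kind of sharing I mean (a toy objective, not neural
network code), the combined callback can reuse an intermediate value for both
outputs:

#include <math.h>

/* f(x) = sum_i exp(x_i^2); the exp() value is computed once and used both
   for the function value and for the partial derivative 2 x_i exp(x_i^2) */
static double
toy_f_and_gradient (const gsl_vector * x, gsl_vector * gradient, void * params)
{
  size_t i;
  double f = 0.0;
  (void) params;
  for (i = 0; i < x->size; i++)
    {
      double xi = gsl_vector_get (x, i);
      double e = exp (xi * xi);   /* shared intermediate value */
      f += e;
      gsl_vector_set (gradient, i, 2.0 * xi * e);
    }
  return f;
}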

For gradient-based optimization, we will need to perform one-dimensional
optimization along a search direction, which is very easy thanks to the params
member of gsl_function_struct. I therefore think that implementing a toy
version of a simple conjugate gradient method should be quite easy.
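
Here is a sketch of what I mean, with illustrative names only: the objective
restricted to the line x + t*d is itself a gsl_function (through its params
member), so any one-dimensional minimizer can do the line search:

#include <gsl/gsl_math.h>  /* for gsl_function */

/* hypothetical bundle of everything the line search needs */
typedef struct
{
  gsl_vec_function * F;    /* multidimensional objective (see above)   */
  const gsl_vector * x;    /* current point                            */
  const gsl_vector * d;    /* search direction                         */
  gsl_vector * xt;         /* preallocated workspace for x + t*d       */
} line_params;

/* one-dimensional restriction: g(t) = f(x + t*d) */
static double
line_function (double t, void * params)
{
  line_params * p = (line_params *) params;
  size_t i;
  for (i = 0; i < p->x->size; i++)
    gsl_vector_set (p->xt, i,
                    gsl_vector_get (p->x, i)
                    + t * gsl_vector_get (p->d, i));
  return p->F->function (p->xt, p->F->params);
}

/* usage:
     line_params lp = { &F, x, d, xt };
     gsl_function line = { &line_function, &lp };
     ... hand `line` to a one-dimensional minimizer ...  */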

Is there something wrong in this design?

Fabrice Rossi
