This is the mail archive of the gsl-discuss@sources.redhat.com mailing list for the GSL project.



RE: High-dimensional Minimization without analytical derivatives


On Wed, 2004-09-01 at 17:17, Anatoliy Belaygorod wrote:
> Thank you for replying. 
> It is funny that you are telling me this about Bayesian methods, though. The fact is that I am as hard-core a Bayesian as you can find :). In fact, my advisor is one of the biggest names in the Bayesian field.
Ha ha. Just laughing because you said it is funny. Perhaps next time
you ask a question you should send a CV of yourself and of your
advisor... oops, almost forgot, also the CVs of your co-workers. Then
people won't tickle you with answers in the field you are all
"hard-core" or "big" in.

I agree with Jason's statement, which was:
"Maximizing a likelihood function with 17 parameters and
 80 observations is likely to get you odd answers, e.g., many
 of your estimates will be on the boundary of the parameter space."

> The maximization routine I am writing is needed precisely to implement the M-H algorithm. The simplest form of it is random-walk MH. However, to get the variance for my proposal density, I need to find the MLE and evaluate the negative inverse Hessian at the MLE.
> Another algorithm I will be writing involves something called MH tailoring (my advisor developed it), which also requires a maximization routine within each iteration for each MH block. In that case, though, the dimensionality of the maximization problem will be smaller, depending on the blocking.
> So, my question still stands:
> What is the most efficient way (I am talking <computational speed>, not necessarily precision) to maximize a high-dimensional (anywhere between 4 and 20 dimensions) function using GSL <without> specifying analytic derivatives? Am I better off using the Nelder-Mead simplex algorithm, which does not require a gradient at all, or should I write a "my_df" function containing a loop that evaluates the gradient numerically one direction at a time (an N-dimensional gradient is nothing but a collection of N one-dimensional derivatives, so I should be able to loop from 1 to N and use gsl_diff_central) and use gsl_multimin_fdfminimizer_vector_bfgs?
> My co-author uses Gauss, and he tells me that Gauss has a driver that uses BFGS while calculating numerical derivatives automatically in the background. It seems like I should be able to replicate this driver with GSL (if I do what I just described above), but is there anything I am missing? I would hate to see the Gauss driver run faster than my C code because I missed some important point or trick. Clearly, if I provided analytical derivatives, BFGS would be faster than the Nelder-Mead simplex. But if BFGS has access only to numerical derivatives... which one is better in most cases?
> I think this question must have come up for many people looking to solve a high-dimensional min/max problem, because even if analytic derivatives are available in theory, it is simply not feasible to code them all up in, say, a 100- or 1000-dimensional problem. So one either has to loop through numerical derivatives or live without the gradient altogether.
> But which one is better?
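
For what it's worth, the gsl_diff_central loop described above does work. Here is a rough, untested sketch of such a "my_df"; my_f stands in for whatever objective is actually being minimized (minus the log-likelihood, say), and the helper struct is made up for the example. Keep in mind that gsl_diff_central makes several calls to the objective per coordinate, so each gradient costs a small multiple of N function evaluations.

#include <gsl/gsl_math.h>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_diff.h>
#include <gsl/gsl_multimin.h>

/* The real objective (e.g. minus the log-likelihood) lives elsewhere. */
double my_f (const gsl_vector *x, void *params);

/* Helper for gsl_diff_central: the objective viewed as a function of
   coordinate i alone, with all other coordinates held fixed. */
struct diff_params { gsl_vector *x; size_t i; void *params; };

static double
f_along_coord (double xi, void *p)
{
  struct diff_params *dp = (struct diff_params *) p;
  double saved = gsl_vector_get (dp->x, dp->i);
  double fval;

  gsl_vector_set (dp->x, dp->i, xi);
  fval = my_f (dp->x, dp->params);
  gsl_vector_set (dp->x, dp->i, saved);   /* restore the point */
  return fval;
}

/* "my_df": one gsl_diff_central call per coordinate. */
void
my_df (const gsl_vector *x, void *params, gsl_vector *g)
{
  size_t i, n = x->size;
  gsl_vector *xc = gsl_vector_alloc (n);
  struct diff_params dp;
  gsl_function F;

  gsl_vector_memcpy (xc, x);
  dp.x = xc;
  dp.params = params;
  F.function = &f_along_coord;
  F.params = &dp;

  for (i = 0; i < n; i++)
    {
      double deriv, abserr;
      dp.i = i;
      gsl_diff_central (&F, gsl_vector_get (x, i), &deriv, &abserr);
      gsl_vector_set (g, i, deriv);
    }

  gsl_vector_free (xc);
}

/* "my_fdf": the fdf minimizers ask for both at once. */
void
my_fdf (const gsl_vector *x, void *params, double *f, gsl_vector *g)
{
  *f = my_f (x, params);
  my_df (x, params, g);
}

A gsl_multimin_function_fdf pointing at my_f, my_df and my_fdf can then be handed to gsl_multimin_fdfminimizer_alloc with gsl_multimin_fdfminimizer_vector_bfgs in the usual way. Whether that ends up faster than the simplex depends mostly on how expensive one evaluation of the likelihood is, so timing both on the real problem is the only reliable answer.
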
> 
> ________________________________
> 
> From: Jason Stover [mailto:jstover@sdf.lonestar.org]
> Sent: Wed 9/1/2004 9:42 AM
> To: Anatoliy Belaygorod
> Cc: gsl-discuss@sources.redhat.com
> Subject: Re: High-dimensional Minimization without analytical derivatives
> 
> 
> Maximizing a likelihood function with 17 parameters and
> 80 observations is likely to get you odd answers, e.g., many
> of your estimates will be on the boundary of the parameter space.
> 
> Another possible approach is a Bayesian one: define some reasonable
> prior distributions for your parameters, then run a simulation to find
> the regions where your posterior density function is highest. A common
> simulation algorithm here is the Metropolis-Hastings algorithm, about
> which there is a lot of literature. One popular book is "Markov Chain
> Monte Carlo in Practice" by W. R. Gilks, S. Richardson,
> D. J. Spiegelhalter. Coding such a simulation isn't difficult.
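
To illustrate how little code the basic sampler takes, here is a rough, untested sketch of a single random-walk Metropolis-Hastings update using GSL's random number routines. log_posterior is a placeholder for whatever log posterior density is being sampled, and prop_sd holds proposal standard deviations that would have to be tuned by hand (for example from the negative inverse Hessian at the mode):

#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_vector.h>

/* Log of the (unnormalized) posterior density -- placeholder. */
double log_posterior (const gsl_vector *theta, void *data);

/* One random-walk M-H update: propose theta + N(0, prop_sd_i) in each
   coordinate and accept with probability min(1, ratio of posteriors).
   *log_post holds the current log posterior and is kept up to date.
   Returns 1 if the proposal was accepted, 0 otherwise. */
int
rw_mh_step (gsl_rng *r, gsl_vector *theta, const gsl_vector *prop_sd,
            void *data, double *log_post)
{
  size_t i, n = theta->size;
  gsl_vector *cand = gsl_vector_alloc (n);
  double log_post_cand, log_ratio;
  int accept;

  for (i = 0; i < n; i++)
    gsl_vector_set (cand, i,
                    gsl_vector_get (theta, i)
                    + gsl_ran_gaussian (r, gsl_vector_get (prop_sd, i)));

  log_post_cand = log_posterior (cand, data);
  log_ratio = log_post_cand - *log_post;   /* proposal is symmetric */

  accept = (log (gsl_rng_uniform_pos (r)) < log_ratio);
  if (accept)
    {
      gsl_vector_memcpy (theta, cand);
      *log_post = log_post_cand;
    }

  gsl_vector_free (cand);
  return accept;
}

The gsl_rng comes from gsl_rng_alloc (gsl_rng_default) after gsl_rng_env_setup (). Calling rw_mh_step in a loop, storing theta after each pass and watching the acceptance rate (adjusting prop_sd if it is too high or too low) gives the basic sampler; Gilks et al. discuss the tuning issues at length.
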
> 
> With MCMC simulations, you can also specify the dimension of the
> parameter vector as a parameter itself. Peter Green wrote a paper
> about this:
> 
> Reversible jump Markov chain Monte Carlo computation and Bayesian
> model determination, Biometrika, 82, 711-732 (1995)
> 
> -Jason
> 
> On Mon, Aug 30, 2004 at 05:11:11PM -0500, Anatoliy Belaygorod wrote:
> > Hello,
> > I need to find a (local) maximum of my likelihood function with 80 data points over a 17-dimensional parameter space. I want to use gsl_multimin_fdfminimizer_vector_bfgs (or some other gradient-based algorithm), but I would really hate to specify 17 (or maybe many more, if we change the model) analytic derivatives.
> > Can you please tell me if I have better options? Can I use the one-dimensional numerical derivative gsl_diff_central instead of analytic ones when I write the "my_df" function for BFGS? How would this approach (if it is feasible at all) compare to the Nelder-Mead simplex algorithm provided in my version of GSL (1.4)? Is there a better option that would combine numerical gradient evaluation with the BFGS method?
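
For comparison, the derivative-free route in GSL 1.4 is gsl_multimin_fminimizer_nmsimplex. A rough, untested driver would look something like the following; my_f is again a placeholder for the real objective, and the initial step size and stopping tolerance are guesses that would need adjusting:

#include <stdio.h>
#include <gsl/gsl_errno.h>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_multimin.h>

/* The objective to minimize -- placeholder. */
double my_f (const gsl_vector *x, void *params);

/* Minimize my_f starting from x; on return x holds the best point found. */
int
minimize_simplex (gsl_vector *x, void *params)
{
  const size_t n = x->size;
  gsl_multimin_function func;
  gsl_multimin_fminimizer *s;
  gsl_vector *step;
  size_t iter = 0;
  int status;

  func.f = &my_f;
  func.n = n;
  func.params = params;

  step = gsl_vector_alloc (n);
  gsl_vector_set_all (step, 0.1);   /* initial simplex size: a guess */

  s = gsl_multimin_fminimizer_alloc (gsl_multimin_fminimizer_nmsimplex, n);
  gsl_multimin_fminimizer_set (s, &func, x, step);

  do
    {
      iter++;
      status = gsl_multimin_fminimizer_iterate (s);
      if (status)
        break;
      status = gsl_multimin_test_size (gsl_multimin_fminimizer_size (s), 1e-4);
    }
  while (status == GSL_CONTINUE && iter < 10000);

  gsl_vector_memcpy (x, s->x);
  printf ("simplex: f = %g after %u iterations\n", s->fval, (unsigned) iter);

  gsl_multimin_fminimizer_free (s);
  gsl_vector_free (step);
  return status;
}

The simplex needs no gradient at all, but it tends to need more function evaluations as the dimension grows, so which of the two approaches is faster in practice depends on the problem and is worth timing.
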
> > 
> > I really appreciate any help or advice you can give me.
> > Sincerely,
> > Anatoliy
> > 
> >
> 
> --
> jstover@sdf.lonestar.org
> SDF Public Access UNIX System - http://sdf.lonestar.org
> 
> 
> 

