This is the mail archive of the gsl-discuss@sourceware.org mailing list for the GSL project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Possible inconsistency in gsl_multifit_covar

From: Giulio Bottazzi <giulio dot bottazzi at libero dot it>
To: Brian Gough <bjg at network-theory dot co dot uk>
Cc: gsl-discuss at sources dot redhat dot com
Date: Tue, 23 Aug 2005 19:25:10 +0200
Subject: Re: Possible inconsistency in gsl_multifit_covar
References: <20050821010321.56bc297e.giulio.bottazzi@libero.it><17163.13001.143071.36081@network-theory.co.uk>

On Tue, 23 Aug 2005 15:29:29 +0100
Brian Gough <bjg@network-theory.co.uk> wrote:

> Hello,
> 
> The multifit functions are a translation of the corresponding MINPACK
> routines, which defined 'covar' as in GSL.  In MINPACK the definition
> of J includes a factor of sigma 
> 
>   J_ij = dF_i/dx_j
>   F_i = (y_i-y(x))/sigma_i
> 
> Does this account for the difference above?
> 
> -- 

Hello,
as far as I understand, given that I do not know MINPACK, I would say
yes and no. The point is subtle and has to do with the double naure of
the least-quare procedure: a generic tools for "fitting things" and a
maximum likelihood estimator of a well defined class of statistical
models. I hope you'll forgive me if I try to elaborate the issue a
little bit below.

>From a statistical point of view, you can assume that your data follow
the model

(I)   y_i ~ f(x_i,B) + \w_i * \epsilon_i

where \epsilon_i ~ N(0,1) (i.i.d. normally distributed with mean zero
and variance 1) or instead the model

(II)  y_i ~ f(x_i,B) + \w_i * \epsilon_i

where \epsilon_i ~ N(0,\sigma) (i.i.d. normally distributed with mean
zero and unknon variance). Both prescriptions allow for the
specification of weights w_i, but differ on the number of the unknowns.
In particular, the variance of the distribution \sigma is estimated from
data in Model II while it is assumed known in Model I.

Now in GSL the model assumption for the different linear routines
(f(x_i,B) = x_i . B) are:

function:            assumption:

gsl_fit_linear         Model II

gsl_fit_wlinear        Model I

gsl_multifit_linear    Model II

gsl_multifit_wlinear   Model I

which is consistent (even if, AFAIK, undocumented): if weights are
provided, than the variance is assumed know, if they are not, it is
estimated.

The possible inconsistency is with the non-linear function

gsl_multifit_covar   Model I

now weights are NOT explicitly required, but nonetheless the variance is
assumed known. For comparison, consider the analogous function in NAG C
library, "e04ycc", which instead assume Model II.

Probably it would help to explicitly mention the formula used to compute
variance-covariance matrix in ALL the routines, so that the average
dumb user (like me), by comparing the different formulas, can immediatly
understand were differences can possibly arise. What do you think?

Apologies for the lenght of the message.

	Giulio.

Attachment: pgp00000.pgp
Description: PGP signature

References:
- Possible inconsistency in gsl_multifit_covar
  - From: Giulio Bottazzi
- Re: Possible inconsistency in gsl_multifit_covar
  - From: Brian Gough

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]