![]() | ![]() | ||||||||||||
[bug-gsl] bugs in gsl_cdf_hypergeometric_P/gsl_cdf_hypergeometric_Q (2008-02-11)
Hi, I believe there is a bug in the functions gsl_cdf_hypergeometric_P and gsl_cdf_hypergeometric_Q in gsl-1.10 when computing the midpoint. This can result in severely wrong probabilities (as a result of computing the non-optimal tail), including negative probabilities. A demonstration program is included below. Explanation: The relevant statement is double midpoint = (int) (t * n1 / (n1 + n2)); The integer multiplication t * n1 may overflow. With certain input parameters it does; this causes the wrong tail to be computed, leading to accumulation of rounding errors, and erroneous results including negative probabilities. Running the example program yields midpoint should be: 3899.52, but it is: 9 midpoint should be: 3899.52, but it is: 9 hyper [ > 3511 1.000000006 ] [ <= 3511 -6.46390208e-09 ] The midpoint is then compared with the variable k which has value 3511, leading to a choice of the wrong tail to compute. (the first two lines are from a patched version of cdf/hypergeometric.c that prints out the midpoint). I believe that the correct computation of midpoint should be: double midpoint = (1.0 * t * n1) / (n1 + n2); A (trivial) diff to this effect is attached. On a side note, is there any knowledge/documentation available on the range of parameters for which these functions are supposed to give good/reasonable precision? regards, Stijn Example program (run on a system where integers are 4 bytes): ---------------------------------------- #include <stdio.h> #include <stdlib.h> #include <math.h> #include <gsl/gsl_cdf.h> int main ( int argc , char* argv[] ) { int k = 3511 ; int n1 = 76200 ; int bg = 13250474 ; int t = 678090 ; double f = gsl_cdf_hypergeometric_Q(k, n1, bg-n1, t) ; double g = gsl_cdf_hypergeometric_P(k, n1, bg-n1, t) ; fprintf(stdout, "hyper [ > %d %.10g ] [ <= %d %.10g ]\n", k, f, k, g) ; return 0 ; } ---------------------------------------- -- Stijn van Dongen >8< -o) O< forename pronunciation: [Stan] Wellcome Trust Sanger Institute, /\\ Tel: +44-(0)1223-494785 Hinxton, Cambridge, CB10 1SA, UK _\_/ http://www.sanger.ac.uk/Users/svd/ -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 131c131 < double midpoint = (int) (t * n1 / (n1 + n2)); --- > double midpoint = (1.0 * t * n1) / (n1 + n2); 148a149 > 170c171 < double midpoint = (int) (t * n1 / (n1 + n2)); --- > double midpoint = (1.0 * t * n1) / (n1 + n2);
| |||||||||||||
![]() | ![]() |







