![]() | ![]() | ||||||||||||
[bug-gsl] Improving gsl_blas_dsyrk routine (2008-04-12)
Dear GSL developers: I benifits a lot recently by using your powerful routines. Thus I became interested in the detail of your routines. When I look through the BLAS source code, I found gsl_blas_dsyrk routine can be significantly improved by a change of looping order. As I'm just a Ph.D. student of Statistical Genetics, I'm not sure my finding is generally true. I just wish to help the lovely GSL! Please let me know the result if possible. The version number of GSL: 1.8 The hardware and operating system: T7100 CPU, Windows XP The compiler used, including version number and compilation options: Dev-C++/MinGW , full optimization A description of the bug behavior: I use a real dataset with 6238 rows by 617 columns of double precision. The gsl_blas_dsyrk routine has 1/5 speed than my routine in some cases. Rows Columns Transpose Mine/sec GSL_BLAS/sec 6238 617 No 26.8 25.2 6238 617 Yes 3.4 15.9 617 6238 No 2.4 2.4 617 6238 Yes 54 256.5 A short program which exercises the bug: My gsl_blas_dsyrk routine analog is listed below. It is a function of my C++ matrix template class. Hope this will not trouble you. template <class type> void matrix<type>::square(matrix& M, bool T){ register uint r, c, k; register type s; if(!T){ resize(M.rn, M.rn); for(r=0; r<M.rn; r++) for(c=r; c<M.rn; c++){ s=type(0); for(k=0; k<M.cn; k++) s+=M(r, k)*M(c, k); (*this)(c, r)=(*this)(r, c)=s; } } else{ resize(M.cn, M.cn); // The follow code changes looping order, just like DGEMM does for(k=0; k<M.rn; k++) for(r=0; r<M.cn; r++){ s=M(k, r); for(c=r; c<M.cn; c++) (*this)(r, c)+=s*M(k, c); } for(r=0; r<M.cn; r++) for(c=r+1; c<M.cn; c++) (*this)(c, r)=(*this)(r, c); } } And your corresponding codes (similar with the Fortan code) are: if (uplo == CblasUpper && trans == CblasNoTrans) { for (i = 0; i < N; i++) { for (j = i; j < N; j++) { BASE temp = 0.0; for (k = 0; k < K; k++) { temp += A[i * lda + k] * A[j * lda + k]; } C[i * ldc + j] += alpha * temp; } } } else if (uplo == CblasUpper && trans == CblasTrans) { // The following code may be improved for (i = 0; i < N; i++) { for (j = i; j < N; j++) { BASE temp = 0.0; for (k = 0; k < K; k++) { temp += A[k * lda + i] * A[k * lda + j]; } C[i * ldc + j] += alpha * temp; } } 王一 2008-04-12
| |||||||||||||
![]() | ![]() |







