LSOLP(3)                    VERSION 1.0                    January 2002

PACKAGE LSOLP

LSOLP - Solves a linear system of equations MAT*x = b

call LSOLP(lmat,lindex,l,nproc,lsym,ia1,info,
           MAT,x,b,index,rlin,ilin,lmatbk,tids,iconv,ierr)

integer, parameter :: ia2=7, nilin=50, nrlin=10
integer, intent(in) :: ia1,lmat,lindex,l,nproc
integer, intent(out) :: iconv,ierr
integer, intent(in) :: lmatbk(nproc),tids(nproc)
integer, intent(inout) :: index(lindex),info(ia1,ia2)
integer, intent(inout) :: ilin(nilin)
double precision, intent(inout) :: MAT(lmat),x(l)
double precision, intent(inout) :: b(l),rlin(nrlin)
logical, intent(in) :: lsym


PURPOSE

LSOLP is a routine that solves a linear system MAT*x = b with a large and sparse l x l real matrix MAT and a right hand side b of length l. Different solution methods from the family of generalized CG methods are available and can be selected by the user. To accelerate the iteration and to improve stability, the matrix MAT is normalized by a diagonal matrix and/or preconditioned from the left hand side and/or the right hand side. MAT is handed over in a sparse matrix storage scheme described in the manual pages <matrix>.

PARAMETERS

If a parameter has the attribute 'local', its value may vary across processors. If a parameter has the attribute 'global', it must have the same value on all processors. The label *PAR means that the parameter only has to be declared, but does not have to be set (initialized), on NON-PARALLEL computers. The label **PAR means that the parameter only has to be declared, but does not have to be set, if nproc is set (initialized) to 1 (on NON-PARALLEL computers nproc is internally set to 1 by LSOLP).

lmat integer, scalar, input, local
Length of the (local) matrix array MAT. IMPORTANT: if (optim >= 10000) then lmat must be enlarged by a factor > 1.
lindex integer, scalar, input, local
Length of the integer array index. IMPORTANT: if (optim >= 10000) then lindex must have the same size as lmat.
l integer, scalar, input, global
Maximal number of unknowns on a processor (it must hold: l = max{lmatbk(i),i=1,..,nproc}).
nproc integer, scalar, input, global <<< *PAR
Number of processors (processes).
lsym logical, scalar, input, global
is .true., if the matrix MAT is symmetric.
ia1 integer, scalar, input, local
First dimension of matrix information array info. IMPORTANT: if (optim >= 10000) then ia1 must be set >= l.
info integer, array: info(ia1,ia2), input, local
Information array for the matrix MAT (ia2 {= 7} is a constant).
MAT double precision, array: MAT(lmat), in/out, local
Matrix array.
x double precision, array: x(l), in/out, local
Solution vector.
b double precision, array: b(l), in/out, local
Right hand side.
index integer, array: index(lindex), input, local
Array of indices allowing indirect access to the matrix array MAT.
rlin double precision, array: rlin(nrlin), input, global
Information vector for double precision values (nrlin {=10} is a constant).

(1) = epslin, input, global
epslin is the relative stopping factor for all iterative methods (corresponds to THRESHOLD in manual page LINSOL).
     lcond1:             ||(preconditioned) smoothed residual||_2
                         <= epslin * ||(preconditioned) r.h.s.||_2
     lcond2 (noprec=1):  ||original smoothed residual||_max
                         <= epslin * ||original r.h.s.||_max
     lcond2 (noprec=0):  ||(preconditioned) smoothed residual||_max
                         <= epslin * ||(preconditioned) r.h.s.||_max
If (lcond1 .and. lcond2) then the iteration is stopped. If (lcond1 .and. .not. lcond2) then a new stopping criterion is computed.
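A minimal sketch of the combined stopping test for the noprec=0 case; the function name and the vector names rs (smoothed residual) and b (right hand side) are illustrative, not part of the LSOLP interface:
     ! Illustrative sketch of the combined stopping test (noprec=0).
     logical function stop_test(l, rs, b, epslin)
       integer, intent(in) :: l
       double precision, intent(in) :: rs(l), b(l), epslin
       logical :: lcond1, lcond2
       lcond1 = sqrt(sum(rs**2)) <= epslin * sqrt(sum(b**2))  ! 2-norm test
       lcond2 = maxval(abs(rs)) <= epslin * maxval(abs(b))    ! max-norm test
       stop_test = lcond1 .and. lcond2
     end function stop_test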
(2) = threshold1, input, global
threshold1 is the first threshold value within the overall Gauss algorithm; below this value all data are removed when the matrix MAT is being copied into the preconditioning matrix (corresponds to GAUSS_THRES1 in manual page LINSOL).
(3) = threshold2, input, global
threshold2 is the second threshold value within the overall Gauss algorithm; below this value all data are removed when the buffered matrix is stored back during the factorization process (corresponds to GAUSS_THRES2 in manual page LINSOL). If threshold2 = 1.0 is set, data are only stored back to the factorized matrix if they are also elements of the original preconditioning matrix (pattern storing), i.e. the memory request of the preconditioned matrix is as high as the memory request of the original preconditioning matrix. If msprec = {63,66,69} is set, full LU factorization is performed in the buffer; if msprec = {72,75,78} is set, the factorization is only performed on the pattern of the original preconditioning matrix <<< IMPLEMENTED ONLY FOR 1 PROCESSOR!
(4) = gaussfac, input, global
gaussfac specifies the fill-in factor for the preconditioning by GAUSS. A positive value means that the fill-in factor is determined automatically; if that automatically determined value is too large for an allocation of the preconditioning array, the variable gaussfac is used for the allocation. If gaussfac is less than zero, then ABS(gaussfac) is immediately used for the allocation of the preconditioning arrays. For example, a value of -2.0 means that the preconditioning matrix may be enlarged by a factor of 2.0 through fill-in (corresponds to GAUSS_FACTOR in manual page LINSOL).
(5) = memfac, input, global
memfac specifies the threshold value for the packing of full storage patterns. A value of 0.66 means that all vector terms of type {2,7,8} are packed and stored as vector terms of type {3,9,11} if less than 66 % of the values of the vector terms are nonzero (corresponds to MEMORY_FACTOR in manual page LINSOL) <<< NOT YET IMPLEMENTED!
(6) = ldrop, input, global
ldrop is a threshold value that reduces the factorization time by omitting update terms under the following conditions: only indices of the elimination coefficients with corresponding absolute elements greater than ldrop are used to update the factorization matrix (this holds for msprec = {72,75,78}); only rows of the actual factorization window with elements of and below the pivot column greater than ldrop are updated (this holds for msprec = {63,66,69,72,75,78}). ldrop corresponds to LDROP_FACTOR in manual page LINSOL.
(9) = internal use, global
Global size of preconditioning matrix L, if nproc>1.
(10) = internal use, global
Global size of preconditioning matrix U, if nproc>1.
ilin integer, array: ilin(nilin), in/out, global
Information vector (nilin {=50} is a constant).

(1) = ms, input, global
Method selection:
     ms =   1 : PRES20
     ms =   2 : BICO
     ms =   3 : Bi-CGSTAB
     ms =   4 : AT-PRES
     ms =   5 : CGS
     ms =   6 : QMR-Simulator
     ms =   7 : Bi-CGSTAB2
     ms =   9 : CG-AT for non-symmetric matrices
     ms =  15 : GMERR5
     ms =  16 : GMERR20
     ms = 100 : Classical CG for symmetric matrices
     ms = 101 : GMERRS for symmetric matrices
     ms = 200 : Polyalgorithm PRES20 -> Bi-CGSTAB2 -> AT-PRES
     ms = 201 : Polyalgorithm PRES20 <-> Bi-CGSTAB2 <-> AT-PRES
     ms = 202 : Polyalgorithm PRES20 -> Bi-CGSTAB2 -> GMERR5
                -> AT-PRES
     ms = 203 : Polyalgorithm PRES20 -> Bi-CGSTAB2 -> GMERR5
                <-> AT-PRES
     ms = 204 : Polyalgorithm PRES20 -> Bi-CGSTAB2 -> GMERR20
                -> AT-PRES
     ms = 210 : Polyalgorithm GMERR5 -> AT-PRES
     ms = 211 : Polyalgorithm GMERR5 <-> AT-PRES
     ms = 212 : Polyalgorithm GMERR20 -> AT-PRES
     ms = 220 : Polyalgorithm GMERRS -> CG for symmetric matrices
     ms = 221 : Polyalgorithm GMERRS <-> CG for symmetric matrices
     ms = 999 : {TEST-INTERFACE}
(2) = itmax, input, global
Maximal number of iteration steps (corresponds to ITERATION_MAXIMUM in manual page LINSOL).
(3) = nmsg, in/out, global <<< *PAR
Message counter counting the number of communication units.
(4) = nrhs, input, global
Number of right hand sides. A value of -1 means that the right hand side is computed automatically (corresponds to NUMBER_RHS in manual page LINSOL).
(5) = nvt, input, local
Number of used vector terms.
(6) = isprec, in/out, global
indicates whether the matrix MAT is already normalized and/or preconditioned and/or reduced by the pre-forward elimination routine (corresponds to MATRIX_NORMALIZED in manual page LINSOL). The variable is automatically increased if the matrix has been normalized/preconditioned/reduced in a previous call of LSOLP.
     isprec =  000 : matrix is not normalized and not
                     preconditioned,
     isprec =  001 : matrix is normalized and not preconditioned,
     isprec =  010 : matrix is not normalized and preconditioned,
     isprec =  011 : matrix is normalized and preconditioned,
     isprec =  1yz : matrix is additionally reduced by a
                     pre-forward elimination step,
     isprec = 1xyz : matrix is additionally bandwidth-optimized
                     (x,y,z elem {0,1}),
     isprec = 2000 : use of the permutation vector created in a
                     former bandwidth optimization process; can
                     be used if the structure of the matrix does
                     not change!
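The digits can be inspected with integer arithmetic; a small illustrative sketch (the variable names are ours, not part of LSOLP):
     ! Illustrative decoding of the isprec digits wxyz (see above).
     integer :: isprec, w, x, y, z
     isprec = ilin(6)
     z = mod(isprec, 10)      ! 1 = normalized
     y = mod(isprec/10, 10)   ! 1 = preconditioned
     x = mod(isprec/100, 10)  ! 1 = reduced by pre-forward elimination
     w = isprec/1000          ! 1 = bandwidth-optimized (2000 special)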
(7) = ibuf_mb, input, global
ibuf_mb specifies the buffer size in MB for temporary buffers (corresponds to BUFFER_SIZE in manual page LINSOL).
(8) = msprec, input, global
selects the method of preconditioning (corresponds to PRECONDITIONING_METHOD in manual page LINSOL); works in connection with the iterative methods PRES20, BICO and Bi-CGSTAB2 if the matrix is not symmetric, and in connection with the classical CG method if the matrix is symmetric (the Cholesky algorithm currently works only on 1 processor). (The preconditioning matrices are stored as one-dimensional arrays in prec and accessed via the information array iprec.)
     msprec =  0 : no preconditioning,
     msprec = 63 : preconditioning from the middle by the (I)LU
                   algorithm [(Incomplete) Gauss algorithm] or
                   (I)L[T]L [(Incomplete) Cholesky algorithm];
                   this option implies optim = 111000,
     msprec = 66 : preconditioning from the right by the (I)LU
                   algorithm [(Incomplete) Gauss algorithm] or
                   (I)L[T]L [(Incomplete) Cholesky algorithm];
                   this option implies optim = 111000,
     msprec = 69 : preconditioning from the left by the (I)LU
                   algorithm [(Incomplete) Gauss algorithm] or
                   (I)L[T]L [(Incomplete) Cholesky algorithm];
                   this option implies optim = 111000,
     msprec = 72 : preconditioning from the middle by the (I)LU
                   algorithm on indexed data; this option implies
                   optim = 111000,
     msprec = 75 : preconditioning from the right by the (I)LU
                   algorithm on indexed data; this option implies
                   optim = 111000,
     msprec = 78 : preconditioning from the left by the (I)LU
                   algorithm on indexed data; this option implies
                   optim = 111000.
(9) = noprec, input, global
     noprec = 0 : normalized system is considered for checking
                  the stopping criterion,
     noprec = 1 : original system is considered for checking
                  the stopping criterion.
(noprec corresponds to CHECK_SYSTEM in manual page LINSOL.)
(10) = nmvm, output, global
Number of computed matrix-vector multiplications.
(11) = internal use, output, global
returns the number of steps used by the last called method.
(12) = lout, input, local
Unit number of the standard output file (default is 6) (corresponds to OUTPUT_DEVICE in manual page LINSOL).
(13) = idoku, input, global
Output control (corresponds to VERBOSE_LEVEL in manual page LINSOL).
     idoku = 0    : print only error messages,
     idoku = 1    : print further information, but no output
                    during the iteration,
     idoku = i>1  : output after every i iteration steps on the
                    processor with logical process(or) number 1,
     idoku = -i<0 : same as for idoku>0, but with output on all
                    processors.
(14) = isave, input, global
     isave = 0 : the arrays prec, iprec and perm are not saved
                 beyond the routine LSOLP, i.e. the
                 preconditioning matrix, the preconditioning
                 index and information arrays, the permutation
                 vector created by bandwidth optimization and
                 any matrix entries shifted to the right hand
                 side by the pre-forward elimination routine
                 will not be available in the next call of LSOLP,
     isave = 1 : all above mentioned arrays (prec, iprec and
                 perm) are saved beyond the routine LSOLP,
     isave = 2 : only the permutation vector perm is saved
                 beyond the routine LSOLP; you can set
                 ilin(6)=2000 before the next call of LSOLP if
                 the structure of your matrix does not change!
(15) = msnorm, input, global
selects the method of normalization. Note: "A" stands for a usually stored matrix (the matrix MAT is stored as a one-dimensional array and accessed via the array info) (corresponds to NORMALIZATION_METHOD in manual page LINSOL).
     msnorm = 0  : no normalization,
     msnorm = 1  : normalization by row sum
                   (prec(i) = sign(A_ii)/sum(abs(A_ij),j=1,l)),
     msnorm = 2  : normalization by main diagonal elements
                   (prec(i) = 1/A_ii),
     msnorm = 3  : normalization by square sum
                   (prec(i) = 1/sqrt(sum(A_ij**2,j=1,l))),
     msnorm = 4  : Frobenius normalization
                   (prec(i) = A_ii/sum(A_ij**2,j=1,l)),
     msnorm = 11,12,13,14 : same as 1,2,3,4, but with the square
                   root of the preconditioning matrix applied
                   from the left and from the right
                   (automatically selected if the matrix MAT is
                   symmetric).
(16) = startx, input, global
     startx <> 4711 : iteration starts from the zero vector,
     startx =  4711 : an initial guess is given in x.
(startx corresponds to INITGUESS in manual page LINSOL.)
(17) = myproc, input, local <<< **PAR
Logical process(or) number.
(18) = smooth, input, global
     smooth = 0  : iteration returns the non-smoothed solution,
     smooth <> 0 : iteration returns the smoothed solution.
(smooth corresponds to SMOOTHED in manual page LINSOL.)
(19) = misc, input, global
     misc = 00000 : standard LSOLP,
     misc = 00001 : printing and checking - as far as possible -
                    of vector terms after all optimization steps
                    (see optim=ilin(20)) (you can set ilin(41));
                    Note: optim values {1|2|3}11000 can modify
                    the matrix,
     misc = 00002 : printing and checking - as far as possible -
                    of vector terms adapted to parallel
                    processing after all optimization steps
                    (see optim=ilin(20)) and normalization and
                    cache optimization (you can set ilin(41));
                    Note: optim values {1|2|3}11100 can modify
                    the matrix,
     misc = 00003 : misc = 00001 plus misc = 00002,
     misc = 00010 : check whether recursions emerge (only of use
                    on vector computers) <not yet implemented>,
     misc = 00100 : storing of the matrix (matrices) MAT in
                    LINSOL format after all optimization steps
                    (you can set ilin(43)); Note: optim values
                    {1|2|3}11000 can modify the matrix,
     misc = 00200 : storing of the matrix (matrices) MAT in
                    binary LINSOL format after all optimization
                    steps (you can set ilin(43)); Note: optim
                    values {1|2|3}11000 can modify the matrix,
     misc = 00300 : storing of the normalized matrix (matrices)
                    MAT adapted to parallel processing in LINSOL
                    format (you can set ilin(43)); Note: optim
                    values {1|2|3}11000 can modify the matrix,
     misc = 00400 : storing of the normalized matrix (matrices)
                    MAT adapted to parallel processing in binary
                    LINSOL format (you can set ilin(43)); Note:
                    optim values {1|2|3}11000 can modify the
                    matrix,
     misc = 01000 : output of the error,
     misc = 10000 : print residuals (and errors, if the output
                    of errors is enabled) in the file
                    plot_<date>_<time> (you can set ilin(44)).
(misc corresponds to MISC in manual page LINSOL.)
(20) = optim, input, global
     optim = 0000000 : no optimization and no statistics,
     optim = 0000001 : time statistics,
     optim = 0000010 : printing of statistics regarding the data
                       structures,
     optim = 0000100 : safe cache reuse, i.e. the arrays MAT and
                       INDEX do not change,
     optim = 0000200 : unsafe cache reuse, i.e. the arrays MAT
                       and INDEX do change,
     optim = 0001000 : remove zero entries in the matrix MAT,
     optim = 0010000 : restore the matrix MAT completely in
                       storage pattern 'packed row',
     optim = 0100000 : search single entries in rows and store
                       them on the main diagonal by permuting
                       the matrix MAT rowwise; if the matrix is
                       not symmetric, then the matrix size is
                       reduced by shifting entries of known
                       solution components to the right hand
                       side,
     optim = 0200000 : bandwidth optimizer (BWO),
     optim = 0300000 : search single entries in rows (see above)
                       + BWO,
     optim = 1000000 : optimization of data structures; the
                       matrix will be restored in the storage
                       patterns 'full diagonal', 'packed
                       diagonal' and 'starry sky'
                       <<< NOT YET IMPLEMENTED!
(optim corresponds to OPTIMIZE in manual page LINSOL.)
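The optim digits act as decimal flag positions and can be combined by addition, as the value optim = 111000 implied by msprec >= 63 and the combination misc = 00003 suggest; a small illustrative sketch:
     ! Illustrative combination of optim options (see list above):
     ! remove zeros (0001000) + packed-row restore (0010000)
     ! + single-entry search (0100000) = 111000.
     ilin(20) = 1000 + 10000 + 100000   ! optim = 111000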
(21) = runacc, input, global
     runacc = 0 : version with minimum storage requirements is
                  used,
     runacc = 1 : runtime is accelerated, but additional storage
                  is allocated.
(runacc corresponds to RUNTIME_ACCELERATION in manual page LINSOL.)
(22) = bwoalg, input, global
selects the bandwidth optimization algorithm (corresponds to BANDW_ALG in manual page LINSOL).
     bwoalg = 1 : Gibbs-Poole-Stockmeyer,
     bwoalg = 2 : fast single pass algorithm,
     bwoalg = 3 : reverse Cuthill-McKee,
     bwoalg = 4 : reverse Cuthill-McKee (no iteration).
(30) = internal use, output, global
Global size of the matrix MAT.
(31) = internal use, output, global
Global size of the index array index.
(32) = internal use, output, global
Global size of the first dimension of the information array info.
(33) = internal use, output, local
Local size of the matrix MAT.
(34) = internal use, output, local
Local size of the index array index.
(35) = internal use, output, local
Local size of the first dimension of the information array info.
(36) = internal use, output, global
Global dimension of the linear system.
(37) = internal use, output, global
Size of the preconditioning matrix L, if nproc==1.
(38) = internal use, output, global
Size of the preconditioning matrix U, if nproc==1.
(39) = internal use (iertag), output, local
(41) = vtunit, input, global
I/O unit for the printing of vector terms (should be greater than 20 and less than 100; if not set, printing goes to standard output).
(43) = matout, input, global
I/O unit for the storing of the matrix MAT (should be greater than 20 and less than 100; if not set, unit 23 is used).
(44) = resout, input, global
I/O unit for the printing of the residuals (and errors) (should be greater than 20 and less than 100; if not set, unit 24 is used).
(45) = preout, input, global
I/O unit for the printing of the matrix coefficients shifted to the right hand side by the pre-forward elimination step (should be greater than 20 and less than 100; if not set, unit 25 is used).
(49) = internal use, input, global
(50) = internal use, output, global
Returns -1 if the stand-alone version is used with optim=10 and nproc=1 or infmt=1; returns -2 after the call of LSPCHK in routine LSOLP.
lmatbk integer, array: lmatbk(nproc), input, global <<< **PAR
lmatbk(i) returns the number of unknowns on process(or) i.
tids integer, array: tids(nproc), input, global <<< **PAR
tids defines the mapping of the logical process(or) numbers to the physical process IDs (see the section on parallel computers).
iconv integer, output, global
iconv indicates the behaviour of LSOLP.
     iconv =  1 : convergence,
     iconv =  2 : iteration reached the maximal number of
                  matrix-vector multiplications (declared in
                  ilin(2)),
     iconv =  3 : divergence,
     iconv =  4 : matrix is not suitable for LSOLP, e.g. a
                  nonregular matrix,
     iconv = 99 : fatal error.
ierr integer, output, local
ierr is an error number with information about the error.
     ierr = ijk : i = 0,..,9 indicates the level of routines on
                  which the error occurred,
                  j = 0 means: wrong parameters,
                  j = 1 means: error during normalization,
                  j = 2 means: error during the iteration
                  process,
                  j = 3 means: error during communication,
                  k is a general error counter with 0<k<100.
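The parts of ierr can be split off with integer arithmetic; a small illustrative sketch, assuming (since 0<k<100) the layout ierr = i*1000 + j*100 + k (the subroutine name is ours, not part of LSOLP):
     ! Illustrative decoding of ierr, assuming ierr = i*1000+j*100+k.
     subroutine decode_ierr(ierr, ilev, jtype, kcount)
       integer, intent(in)  :: ierr
       integer, intent(out) :: ilev, jtype, kcount
       ilev   = ierr/1000           ! i: routine level 0..9
       jtype  = mod(ierr/100, 10)   ! j: 0=parameters, 1=normalization,
                                    !    2=iteration, 3=communication
       kcount = mod(ierr, 100)      ! k: general error counter, 0<k<100
     end subroutine decode_ierr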

EXAMPLE

see EXAM01 and EXAM01S
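
In addition, the following is a minimal, purely illustrative calling sketch for one process(or); all parameter values are examples only, and the setup of MAT, info and index is merely indicated, since the sparse storage scheme is described in the manual pages <matrix>:

     ! Illustrative single-process(or) sketch; values are examples only.
     program lsolp_sketch
       implicit none
       integer, parameter :: ia2=7, nilin=50, nrlin=10
       integer, parameter :: l=1000, lmat=20000, lindex=20000
       integer, parameter :: nproc=1, ia1=l
       integer :: info(ia1,ia2), index(lindex), ilin(nilin)
       integer :: lmatbk(nproc), tids(nproc), iconv, ierr
       double precision :: MAT(lmat), x(l), b(l), rlin(nrlin)
       logical :: lsym
       ! ... fill MAT, info, index with the sparse matrix and set
       ! ... ilin(5) (nvt) as described in the manual pages <matrix>;
       ! ... fill b with the right hand side.
       lsym = .false.
       lmatbk(1) = l          ! all unknowns on one process(or)
       ilin = 0               ! example settings follow
       rlin = 0.0d0
       ilin(1)  = 200         ! ms: polyalgorithm PRES20 ->
                              !     Bi-CGSTAB2 -> AT-PRES
       ilin(2)  = 1000        ! itmax
       ilin(12) = 6           ! lout: standard output
       ilin(15) = 1           ! msnorm: normalization by row sum
       ilin(18) = 1           ! smooth: return the smoothed solution
       rlin(1)  = 1.0d-6      ! epslin: relative stopping factor
       ! tids need not be set for nproc = 1 (label **PAR).
       call LSOLP(lmat,lindex,l,nproc,lsym,ia1,info, &
                  MAT,x,b,index,rlin,ilin,lmatbk,tids,iconv,ierr)
       if (iconv /= 1) print *, 'LSOLP: iconv =', iconv, ' ierr =', ierr
     end program lsolp_sketch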

METHOD

Normalization

At the beginning the matrix MAT is normalized from the left by a diagonal matrix prec(1..l) if MAT is not yet normalized (ilin(6)=isprec = 0) and a normalization method is chosen (ilin(15)=msnorm <> 0). If MAT is symmetric or a symmetric normalization is selected, it is additionally normalized from the right to preserve symmetry. Different methods of normalization can be chosen by the user. We recommend the normalization by row sum (ilin(15) = 1), because in most cases it is the best method with regard to the matrix-vector multiplications performed.
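
As an illustration of the recommended row-sum method, the following sketch forms the diagonal prec(i) = sign(A_ii)/sum(abs(A_ij)) for a dense l x l matrix A; this is only a model of the arithmetic, since LSOLP itself works on the sparse storage scheme of the manual pages <matrix>:

     ! Model of row-sum normalization (msnorm = 1) on a dense matrix.
     subroutine rowsum_normalize(l, A, prec)
       integer, intent(in) :: l
       double precision, intent(inout) :: A(l,l)
       double precision, intent(out) :: prec(l)
       integer :: i
       do i = 1, l
          prec(i) = sign(1.0d0, A(i,i)) / sum(abs(A(i,:)))
          A(i,:) = prec(i) * A(i,:)     ! normalization from the left
       end do
     end subroutine rowsum_normalize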

Preconditioning

You can choose the (Incomplete) LU algorithm (Gauss factorization) or, for symmetric matrices, the (Incomplete) Cholesky algorithm in six variants as preconditioner. The Cholesky algorithm is used automatically if the matrix is symmetric, i.e. if lsym = .true. is set. Note that the Cholesky algorithm only works for symmetric positive definite (SPD) matrices. The first three variants are preconditioning from the left, from the middle and from the right (ilin(8) = {63|66|69}) with access to the data by storage pattern "full row" within the factorization process. The further three variants are preconditioning from the left, from the middle and from the right (ilin(8) = {72|75|78}) with access to the data by storage pattern "packed row" within the factorization process.

Three drop parameters control the factorization process. The first one (rlin(2)=threshold1) drops small elements when the matrix MAT is copied into the preconditioning matrix; the second one (rlin(3)=threshold2) drops small elements when factorized data are stored back to the preconditioning matrix; the third one (rlin(6)=ldrop) saves computation time within the factorization process by omitting update terms if one of the coefficients of the update term is smaller than rlin(6)=ldrop. Additionally an incomplete factorization on the pattern of MAT is possible (rlin(3)=threshold2 = 1.0). If all drop parameters are zero, the complete LU (L[T]L) factorization is performed; then the iteration usually converges after one (!) iteration step. If one of the drop parameters is greater than zero, the Incomplete LU (L[T]L) factorization is performed. The larger one of the drop parameters is, the worse the convergence of the preconditioned iteration process becomes.

By the variable rlin(4)=gaussfac you can specify the fill-in factor for the matrices L and U (L[T] and L) in comparison to MAT; a positive value means that the fill-in factor is determined automatically. If gaussfac is less than zero, then ABS(gaussfac) is immediately used for the allocation of the preconditioning arrays. If you choose e.g. rlin(4)=gaussfac = 5.0, a safe fill-in factor is determined automatically; if the allocation of the preconditioning arrays for the automatically determined fill-in factor fails, 5.0 is chosen as fill-in factor, which means that at most 5 times the number of elements of the matrix MAT can be stored in the matrices L and U (L[T] and L). If the chosen fill-in factor is too small, the factorization will stop; from the chosen fill-in factor and the number of factorized rows you can estimate how large a fill-in factor must be to allow the complete storing of the matrices L and U (L[T] and L). E.g. if the chosen fill-in factor is 10, the dimension of the matrix is 10000 and the factorization has been stopped in row 4000, a fill-in factor greater than 10*10000/4000 = 25 should work. If the factorization works and you have set optim=10, the minimum working fill-in factor is printed on one processor under "LSOLPP statistics"; on more than one processor it is called "Gauss factor" and printed after the factorization.
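
The estimate at the end of the paragraph as a small sketch (the function name is ours, not part of LSOLP):

     ! Illustrative estimate of a working fill-in factor: if a
     ! factorization with fill-in factor gaussfac stopped in row
     ! nrow_stop of an n x n matrix, a factor greater than the
     ! returned value should work (e.g. 10*10000/4000 = 25).
     double precision function estimate_gaussfac(gaussfac, n, nrow_stop)
       double precision, intent(in) :: gaussfac
       integer, intent(in) :: n, nrow_stop
       estimate_gaussfac = gaussfac * dble(n) / dble(nrow_stop)
     end function estimate_gaussfac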

Repeated Use of LSOLP

If you want to use the program package LSOLP by calling the routine LSOLP repeatedly, the following must be considered: when you call the routine LSOLP the first time, the matrix MAT is usually not normalized, i.e. ilin(6)=isprec = 0 must be chosen. On return from the routine LSOLP the matrix MAT is normalized, and ilin(6)=isprec = 1 is automatically set if you have set ilin(15)=msnorm <> 0; additionally ilin(6) = ilin(6) + 10 is automatically set if you have set ilin(8)=msprec >= 63. If you want to call the routine LSOLP with the same matrix MAT repeatedly, you have to set ilin(14)=isave = 1; only then are the normalization vector and the preconditioning matrix saved until the routine LSOLP is called again.
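
Schematically (the values are illustrative; the argument list is as in the synopsis):

     ! Illustrative repeated use with a saved preconditioner.
     ilin(6)  = 0      ! isprec: matrix not yet normalized
     ilin(14) = 1      ! isave: keep prec, iprec and perm
     ilin(15) = 1      ! msnorm: normalization by row sum
     call LSOLP(lmat,lindex,l,nproc,lsym,ia1,info, &
                MAT,x,b,index,rlin,ilin,lmatbk,tids,iconv,ierr)
     ! On return ilin(6) has been increased automatically (to 1, or
     ! to 11 if msprec >= 63 was set); a second solve with the same
     ! matrix and a new right hand side b reuses the saved arrays.
     call LSOLP(lmat,lindex,l,nproc,lsym,ia1,info, &
                MAT,x,b,index,rlin,ilin,lmatbk,tids,iconv,ierr)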

Smoothing (smo) [8]

The residual of a generalized CG method can oscillate strongly. Therefore both the residual and the approximated solution are smoothed in every iteration step to ensure that the Euclidean norm of the residual does not increase. By setting the parameter ilin(18) to zero it is possible to obtain the original solution instead of the smoothed one, but the iteration process is always controlled by the smoothed residual.
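
One smoothing step in the sense of [8,9] can be sketched as follows; this is the standard minimal-residual smoothing recurrence, and LSOLP's internal implementation may differ in detail:

     ! Sketch of one minimal-residual smoothing step [8,9]: x,r are
     ! the current iterate and residual, s,t the smoothed pair.
     subroutine smooth_step(n, x, r, s, t)
       integer, intent(in) :: n
       double precision, intent(in) :: x(n), r(n)
       double precision, intent(inout) :: s(n), t(n)
       double precision :: d(n), eta
       d = r - t
       if (sum(d*d) > 0.0d0) then
          eta = -sum(t*d) / sum(d*d)   ! minimizes ||t + eta*d||_2
          s = s + eta * (x - s)        ! smoothed solution
          t = t + eta * d              ! smoothed residual
       end if
     end subroutine smooth_step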

Nonsymmetric Methods

For nonsymmetric problems, LSOLP provides seven different solution algorithms. These algorithms are implemented both with smoothing and without smoothing; thus the seven implemented algorithms yield twelve methods. In the sequel the most popular and "well-known" ten basic methods, one error-minimizing method and the polyalgorithm are described.
1. PRES20 [1] (Pseudo-RESidual 20), ilin(1)=1, ilin(18)<>0
For sufficient diagonal dominance of the matrix this is a quickly converging method. If the norm of the residual decreases only in the first iteration steps, then choose one of the other methods. The residuals decrease monotonically.
2a. BICO [1] smoothed, ilin(1)=2, ilin(18)<>0
(BiConjugate Gradients)
BICO is more robust than PRES20 and converges quickly even if the matrix is not diagonally dominant, but BICO uses two matrix-vector multiplications (MVM) by both the matrix MAT and its transposed matrix MAT^T per iteration step. The residuals decrease monotonically.
2b. BCG [2] (BiConjugate Gradients), ilin(1)=2, ilin(18)=0
BCG is more robust than PRES20 and converges quickly even if the matrix is not diagonally dominant, but BCG uses two matrix-vector multiplications (MVM) by both the matrix MAT and its transposed matrix MAT^T per iteration step. The residuals oscillate heavily. Therefore, use BICO or QMR instead.
3a. Bi-CGSTAB smoothed [3,8] , ilin(1)=3, ilin(18)<>0
The Bi-Conjugate Gradient STABilized (Bi-CGSTAB) method is a modification of CGS that should be more stable. The residuals decrease monotonically.
3b. Bi-CGSTAB [3] , ilin(1)=3, ilin(18)=0
This is a modification of CGS that should be more stable.
4a. AT-PRES = CGNR [4] , ilin(1)=4, ilin(18)<>0
The A_Transposed-Pseudo_RESidual (AT-PRES) method is equivalent to the CG Normal equations Residual-minimizing (CGNR) method and should only be used if one of the methods BICO, CGS or Bi-CGSTAB does not converge. It may converge slowly, but it is very robust. The residuals decrease monotonically.
4b. CGNE [5], ilin(1)=4, ilin(18)=0
(CG Normal equation Error-minimizing)
CGNE should only be used if one of the methods BICO, CGS or Bi-CGSTAB does not converge. It may converge slowly but it is very robust. The errors decrease monotonically.
5a. CGS smoothed, ilin(1)=5, ilin(18)<>0
(Conjugate Gradient Squared)
The smoothed CGS [6,8] is as robust as BICO. An advantage of CGS compared to BICO is that CGS only uses matrix-vector multiplications by MAT and not by the transposed matrix MAT^T. Therefore it can run somewhat better on parallel computers. The residuals decrease monotonically.
5b. CGS [6], ilin(1)=5, ilin(18)=0
(Conjugate Gradient Squared)
CGS is as robust as BICO. An advantage of CGS compared to BICO is that CGS only uses matrix-vector multiplications by MAT and not by the transposed matrix MAT^T. Therefore it can run somewhat better on parallel computers.
6. QMR simplified [7,9], ilin(1)=6, ilin(18)<>0
The Quasi-Minimal Residual (QMR) method behaves similarly to BICO. In this implementation the look-ahead process for avoiding breakdowns is omitted.
7a. Bi-CGSTAB2 smoothed, ilin(1)=7, ilin(18)<>0
The 2-dimensional optimizing Bi-Conjugate Gradient STABilized (Bi-CGSTAB2) method is a modification of Bi-CGSTAB that should be more stable. The residuals decrease monotonically.
7b. Bi-CGSTAB2, ilin(1)=7, ilin(18)=0
The residuals do not decrease monotonically during the iteration process of the non-smoothed variant of Bi-CGSTAB2.
9. CG-AT (CG with matrix A*A^T), ilin(1)=9
LSOLP solves the linear system (MAT)(MAT^T)x=b; you only have to hand over the matrix MAT to LSOLP. The matrix MAT need not be symmetric; the matrix (MAT)(MAT^T) is symmetric in any case, so LSOLP chooses the CG method as solution process.
15. GMERR5 (General Minimal ERRor 5), ilin(1)=15
GMERR5 is the generalization to arbitrary matrices of Fridman's algorithm with a restart after every 5th iteration step.
16. GMERR20 (General Minimal ERRor 20), ilin(1)=16
GMERR20 is the generalization to arbitrary matrices of Fridman's algorithm with a restart after every 20th iteration step.
>=200. Polyalgorithms [1]
The polyalgorithms try to exploit the advantages of the methods PRES20, Bi-CGSTAB2, GMERR5 (GMERR20) and AT-PRES, or of CG and GMERRS for symmetric matrices. The idea is explained here for polyalgorithm 200. It starts with the most efficient - but least robust - method PRES20. If PRES20 stagnates, it switches to Bi-CGSTAB2. If Bi-CGSTAB2 does not converge fast enough, the polyalgorithm changes to the "emergency exit" AT-PRES. The polyalgorithm should be used if no detailed knowledge of an optimal iterative method exists. In 95% of all cases the polyalgorithm is as fast as the best method in the set {PRES20, Bi-CGSTAB2, AT-PRES}.
If you use the above mentioned methods, the matrix must not be stored in the symmetric storage scheme.

Symmetric matrices

The matrix MAT has to be given in the symmetric storage scheme. This reduces both the amount of workspace and the CPU time needed. You again get two different methods by choosing smoothing or no smoothing. For symmetric matrices both CG methods are faster and more robust than the above mentioned methods.
100a. CG smoothed = GMRES, ilin(1)=100, ilin(18)<>0
The smoothed Conjugate Gradient (CG) method is mathematically equivalent to the GMRES method.
100b. Classical CG, ilin(1)=100, ilin(18)=0
101. GMERRS, ilin(1)=101
GMERRS (General Minimal ERRor for Symmetric matrices) does not have to be restarted, because the residuals lying more than 2 iteration steps back from the current iteration step must be zero. GMERRS should be competitive with GMRES for symmetric matrices.
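
Following these recommendations, a default method selection could look as follows (an illustrative sketch, not a rule of the package):

     ! Illustrative default choice of ms = ilin(1) (see above):
     ! polyalgorithms when no detailed knowledge exists.
     if (lsym) then
        ilin(1) = 220     ! GMERRS -> CG for symmetric matrices
     else
        ilin(1) = 200     ! PRES20 -> Bi-CGSTAB2 -> AT-PRES
     end if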

REFERENCES

[1]
W. Schoenauer, Scientific Computing on Vector Computers, North-Holland, 1987, Amsterdam, New York, Oxford, Tokyo.
[2]
R. Fletcher, Conjugate Gradient Methods for Indefinite Systems, Proc. Dundee Biennial Conf. on Num. Anal., 1975, ed. G.A. Watson, pp. 73-89, Springer-Verlag.
[3]
H.A. van der Vorst, BI-CGSTAB: A Fast and Smoothly Converging Variant of BI-CG for the Solution of Nonsymmetric Linear Systems, SIAM J. Sci. Stat. Comp., 1992, Vol. 13, No. 2, pp. 631-644.
[4]
W. Schoenauer, M. Schlichte, R. Weiss, Wrong Ways and Promising Ways towards Robust and Efficient Iterative Linear Solvers, Advances in Computer Methods for Partial Differential Equations VI, Publ. IMACS, 1987, pp. 7-14.
[5]
E.J. Craig, The N-Step Iteration Procedures, Math. Phys., 1955, Vol. 34, pp. 64-73.
[6]
P. Sonneveld, CGS, A fast Lanczos-type Solver for nonsymmetric linear Systems, SIAM J. Sci. Stat. Comp., 1989, Vol. 10, No.1, pp. 36-52.
[7]
R.W. Freund and N.M. Nachtigal, QMR: a Quasi-Minimal Residual Method for Non-Hermitian Linear Systems, Numer. Math., 1991, Vol. 60, pp. 315-339.
[8]
R. Weiss, Properties of Generalized Conjugate Gradient Methods, Num. Lin. Alg. with Appl., 1994, Vol. 1, No. 1, pp. 45-63.
[9]
L. Zhou and H.F. Walker, Residual Smoothing Techniques for Iterative Methods, SIAM J. Sci. Comput., 1994, Vol. 15, No. 2, pp. 297-312.

PROGRAMMERS / CONSULTING

Original Program by H. Haefner, L. Grosz, C. Roll, P. Sternecker, R. Weiss 1989-2001

Consulting by

     Numerikforschung fuer Supercomputer
     Rechenzentrum der Universitaet Karlsruhe
     Am Zirkel 2
     D-76128 Karlsruhe
     Germany
     Tel.   : (+721) 608/4869
     Fax    : (+721) 32550
     e-mail : haefner@rz.uni-karlsruhe.de