LINSOL 3 "VERSION 1.0" "January 2002"

PACKAGE LINSOL

LINSOL - Solves a linear system of equations MAT*x = b


PURPOSE

LINSOL is a program package that solves a linear system MAT*x = b with a large and sparse l x l real matrix MAT and a right hand side b of length l. Different solution methods of the generalized CG-method are available and can be selected by the user. To accelerate the iteration and to improve stability, the matrix MAT is normalized by a diagonal matrix and/or preconditioned from the left hand side and/or the right hand side.

The matrix MAT, the right hand side and an initial guess - if available - are read from two or three files. The matrix can be read in two formats: the well-known Harwell-Boeing (H/B) file format and the LINSOL file format, which is optimally suited to the storage patterns available in the program package LINSOL. A file in the LINSOL file format can be either formatted (ASCII mode) or unformatted (binary mode) { --> INPUT_FORMAT, MATRIX_INPUT_FILENAME}. The LINSOL file format is explained in the manual page <linsol_file_format>.

An arbitrary number of right hand sides is stored in an additional (second) file. This file can only be omitted if you set the environment variable NUMBER_RHS = -1 { --> NUMBER_RHS, RHS_INPUT_FILENAME}. Setting NUMBER_RHS = -1 means that a solution (x_ex) is chosen automatically by LINSOL and a right hand side (b = MAT*x_ex) is then computed by LINSOL. If you want to use initial guesses, you must declare a third file containing the initial guesses { --> INITGUESS, INITGUESS_INPUT_FILENAME}. This file is read by the program package LINSOL if you set the environment variable INITGUESS = 1. The number of initial guesses stored in the (third) file may be less than the number of right hand sides stored in the (second) file. The solution(s) of the linear equation(s) is (are) stored in another file { --> SOLUTIONS_FILENAME}. The file format for the right hand side(s), initial guess(es) and solution(s) is explained in the manual page <vectors_file_format>.
If you set MATRIX_OUTPUT_FORMAT = {1|2} and run the program package LINSOL on more than one processor, the matrix is stored locally on each processor { --> MATRIX_OUTPUT_FORMAT, MATRIX_OUTPUT_FILENAME}; in that case the filename of the written matrix automatically gets the suffix _myid, with myid being the four-digit processor number. You can then read the distributed matrix back with LINSOL on the same number of processors by setting the environment variable MATRIX_DISTRIBUTED = 1 { --> MATRIX_DISTRIBUTED}. The suffixes are appended internally to the filename of the matrix { --> MATRIX_INPUT_FILENAME}.
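The NUMBER_RHS = -1 mode described above can be pictured with a small sketch (illustrative Python, not part of LINSOL; matvec and the toy matrix are invented for the example):

```python
# Hypothetical sketch of what NUMBER_RHS = -1 implies: LINSOL chooses a
# solution vector x_ex itself and derives the right hand side b = MAT*x_ex,
# so the computed solution can later be compared with the known x_ex.

def matvec(mat, x):
    """Dense matrix-vector product (stand-in for LINSOL's sparse MVM)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in mat]

MAT = [[4.0, 1.0, 0.0],
       [1.0, 4.0, 1.0],
       [0.0, 1.0, 4.0]]
x_ex = [1.0, 1.0, 1.0]          # implicitly chosen solution
b = matvec(MAT, x_ex)           # right hand side computed by the package
print(b)                        # -> [5.0, 6.0, 5.0]
```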

STARTING LINSOL

The program package LINSOL is controlled by parameters that are stored in (UNIX) environment variables and are gathered in a parameter-file. The environment variables you need to declare are described in the section ENVIRONMENT VARIABLES. Entering

linnox <parameter-file>

starts the LINSOL executable (job) interactively on one processor. The shell script linnox that starts the job accepts the following parameters:

      -p <procs>       :    job is run on <procs> processors,
      -n <nodes>       :    job is run on <nodes> nodes,
      -d               :    debug option - usually not available
                            to users.
!!! When starting the job as a batch job, the following parameters usually must be declared !!!
      -b               :    the job is run as a batch job,
      -c <jobclass>    :    the job is submitted to the queue
                            <jobclass>,
      -T <time_limit>  :    time limit in minutes for the batch job,
      -M <mem_limit>   :    memory limit in MBytes for the batch job.

For example, if you want to start a batch job on 4 processors in the queue <parallel> with a time limit of 100 minutes and a memory limit of 500 MBytes, enter

linnox -b -p 4 -c parallel -T 100 -M 500 <parameter-file>

Example parameter-files can be found under

http://www.rz.uni-karlsruhe.de/~linsol/examples.

The parameter-files exam01_param, gre512_param and oilgen_param can be used as templates.


ENVIRONMENT VARIABLES

INPUT_FORMAT
     INPUT_FORMAT = 0	 :    matrix is	read in	unformatted
			      (binary) LINSOL file format,
     INPUT_FORMAT = 1	 :    matrix is	read in	formatted LINSOL
			      file format,
     INPUT_FORMAT = 2	 :    matrix is	read in	Harwell/Boeing
			      file format.
MATRIX_DISTRIBUTED
     MATRIX_DISTRIBUTED = 0    :    matrix is read from a single file,
     MATRIX_DISTRIBUTED = 1    :    each processor reads the file that
                                    has its own processor number as
                                    suffix of the filename of the matrix.
MATRIX_INPUT_FILENAME
Specify the filename of the matrix to be read with full directory path.
MATRIX_OUTPUT_FORMAT
     MATRIX_OUTPUT_FORMAT = 0	 :    matrix will not be stored	in a
				      file,
     MATRIX_OUTPUT_FORMAT = 1	 :    matrix is	stored in the formatted
				       LINSOL file format,
     MATRIX_OUTPUT_FORMAT = 2	 :    matrix is	stored in the
				      unformatted (binary) LINSOL file
				      format.
MATRIX_OUTPUT_FILENAME
Specify the filename of the matrix to be written with full directory path.
NUMBER_RHS
     NUMBER_RHS > 0     :    NUMBER_RHS linear equations will be
                             solved with NUMBER_RHS right hand
                             sides read from file,
     NUMBER_RHS = 0     :    the right hand side is taken from
                             the same file as the matrix
                             (works only with matrices stored in
                              H/B format) <<< not yet available!!!,
     NUMBER_RHS = -1    :    the right hand side is calculated
                             automatically from an implicitly
                             chosen solution vector.
RHS_INPUT_FILENAME
Specify the filename of the right hand side(s) with full directory path; needs to be specified only if NUMBER_RHS > 0.
INITGUESS
     INITGUESS = 0    :    no initial guess(es) is (are) used,
     INITGUESS = 1    :    initial guess(es) is (are) used; they
                           will be read from file.
INITGUESS_INPUT_FILENAME
Specify the filename of the initial guess(es) with full directory path; needs to be specified only if INITGUESS = 1.
SOLUTIONS_FILENAME
Specify the filename of the solution vector(s) with full directory path.
METHOD
Selects the iterative method.
     METHOD = 1	     :	  PRES20
     METHOD = 2	     :	  BICO
     METHOD = 3	     :	  Bi-CGSTAB
     METHOD = 4	     :	  AT-PRES
     METHOD = 5	     :	  CGS
     METHOD = 6	     :	  QMR-Simulator
     METHOD = 7	     :	  Bi-CGSTAB2
     METHOD = 9	     :	  CG-AT	for non-symmetric matrices
     METHOD = 15     :	  GMERR5
     METHOD = 16     :	  GMERR20
     METHOD = 100    :	  Classical CG for symmetric matrices.
     METHOD = 101    :	  GMERRS for symmetric matrices.
     METHOD = 200    :	  Polyalgorithm	PRES20 -> Bi-CGSTAB2 ->	AT-PRES
     METHOD = 201    :	  Polyalgorithm PRES20 <-> Bi-CGSTAB2 <-> AT-PRES
     METHOD = 202    :	  Polyalgorithm
			  PRES20 -> Bi-CGSTAB2 -> GMERR5 -> AT-PRES
     METHOD = 203    :	  Polyalgorithm
			  PRES20 -> Bi-CGSTAB2 -> GMERR5 <-> AT-PRES
     METHOD = 204    :	  Polyalgorithm
			  PRES20 -> Bi-CGSTAB2 -> GMERR20 -> AT-PRES
     METHOD = 210    :	  Polyalgorithm	GMERR5 -> AT-PRES
     METHOD = 211    :	  Polyalgorithm	GMERR5 <-> AT-PRES
     METHOD = 212    :	  Polyalgorithm	GMERR20	-> AT-PRES
     METHOD = 220    :	  Polyalgorithm	GMERRS -> CG
			  for symmetric	matrices
     METHOD = 221    :	  Polyalgorithm	GMERRS <-> CG
			  for symmetric	matrices
     METHOD = 999    :	  {TEST-INTERFACE}.
NORMALIZATION_METHOD (NM)
selects the method of normalization. Note: "A" stands for a regularly stored matrix.
     NM	= 0  : no normalization,
     NM	= 1  : normalization by	row sum
		   (prec(i) = sign A_ii/sum abs(A_ij),j=1,l),
     NM	= 2  : normalization by	main diagonal
		   elements (prec(i) = 1/A_ii),
     NM	= 3  : normalization by	square sum
		   (prec(i) = 1/sqrt(sum A_ij**2,j=1,l)),
     NM = 4  : Frobenius normalization
               (prec(i) = A_ii/sum A_ij**2,j=1,l),
     NM = 11-14 : same as NM = 1-4, but with the square root of
                  the normalization matrix applied from the left
                  and from the right (automatically selected if
                  the matrix MAT is symmetric).
PRECONDITIONING_METHOD (PM)
selects the method of preconditioning; works in connection with the iterative methods PRES20, BICO and Bi-CGSTAB2 if the matrix is not symmetric, and in connection with the classical CG method if the matrix is symmetric (the Cholesky algorithm currently works only on one processor).
     PM	= 0	:    no	preconditioning,
     PM = 0     :    no preconditioning,
     PM = 63    :    preconditioning from the middle by the
                     (I)LU algorithm [(Incomplete) Gauss algorithm]
                     or
                     (I)L[T]L [(Incomplete) Cholesky algorithm];
                     this option implies OPTIMIZE = 111000,
     PM = 66    :    preconditioning from the right by the
                     (I)LU algorithm [(Incomplete) Gauss algorithm]
                     or
                     (I)L[T]L [(Incomplete) Cholesky algorithm];
                     this option implies OPTIMIZE = 111000,
     PM = 69    :    preconditioning from the left by the
                     (I)LU algorithm [(Incomplete) Gauss algorithm]
                     or
                     (I)L[T]L [(Incomplete) Cholesky algorithm];
                     this option implies OPTIMIZE = 111000,
     PM = 72    :    preconditioning from the middle by the
                     (I)LU algorithm on indexed data;
                     this option implies OPTIMIZE = 111000,
     PM = 75    :    preconditioning from the right by the
                     (I)LU algorithm on indexed data;
                     this option implies OPTIMIZE = 111000,
     PM = 78    :    preconditioning from the left by the
                     (I)LU algorithm on indexed data;
                     this option implies OPTIMIZE = 111000.
GAUSS_FACTOR
specifies the fill-in factor for the preconditioning by GAUSS. A positive value means that the fill-in factor is determined automatically; if the automatically determined value is too large for the allocation of the preconditioning arrays, the value of GAUSS_FACTOR is used for the allocation instead. If GAUSS_FACTOR is less than zero, ABS(GAUSS_FACTOR) is used immediately for the allocation of the preconditioning arrays. For example, a value of 3.0 means that the combined size of the matrices U and L can be at most 3.0 times the size of the LINSOL matrix MAT.
GAUSS_THRES1
is the first threshold value for the factorization process (ILU algorithm); from GAUSS_THRES1 a threshold value is computed such that about GAUSS_THRES1*100 percent of all entries - those smaller than the computed threshold - are removed when the matrix MAT is copied into the preconditioning matrix.
GAUSS_THRES2
is the second threshold value for the factorization process (ILU algorithm); all entries below this value are removed when the matrices U and L (L[T] and L), which are computed in the factorization window, are stored back to the preconditioning matrix A(P).
LDROP_FACTOR
is the third threshold value; it reduces the factorization time by omitting update terms within the factorization process under the following conditions: only indices of the elimination coefficients whose corresponding absolute elements are greater than LDROP_FACTOR are used to update the factorization matrix (this holds for PRECONDITIONING_METHOD = {72|75|78}); only rows of the actual factorization window with elements in and below the pivot column greater than LDROP_FACTOR are updated (this holds for PRECONDITIONING_METHOD = {63|66|69|72|75|78}).
MATRIX_NORMALIZED (MN)
indicates whether the matrix MAT is already normalized. (Comment: the variable is automatically increased if the matrix has been normalized/preconditioned/reduced/bandwidth-optimized in a previous run of LINSOL with a different right hand side; see NUMBER_RHS and the manual page <lsolp>.)
     MN	= 0000	   :	matrix is not normalized,
     MN	= 0001	   :	matrix is normalized.
CHECK_SYSTEM (CS)
     CS	= 0    :    normalized system is considered
		    for	checking of stopping criterion,
     CS	= 1    :    original system is considered
		    for	checking of stopping criterion.
SMOOTHED
     SMOOTHED =	0     :	   iteration returns the non-smoothed
			   solution,
     SMOOTHED <> 0    :	   iteration returns the smoothed
			   solution.
ITERATION_MAXIMUM
Maximum number of iteration steps.
OUTPUT_DEVICE (OD)
Specify if output messages are printed to stdout or to files:
     OD	= 6    :    print to stdout,
     OD	= 7    :    print to files with	filenames
		    stdout_{0001...0016} (e.g. for 16 procs).
VERBOSE_LEVEL (VL)
Output control
     VL = 0       :    print only error messages,
     VL = 1       :    print further information, but no
                       output during the iteration,
     VL = i > 1   :    output after every i iteration steps
                       on the processor with logical
                       process(or) number 1,
     VL = -i < 0  :    same as for VL = i > 1, but with
                       output on all processors.
MISC
     MISC = 00000    :	  Standard LINSOL
     MISC = 00001    :	  Printing and checking	- as far
			  as possible -	of vector terms	after all
			  optimization steps (see OPTIMIZE)
			  (in file with	name linout_{0001...0016}
			   e.g.	for 16 procs),
     MISC = 00002    :	  Printing and checking	- as far
			  as possible -	of vector terms	adapted
			  to parallel processing after all
			  optimization steps (see OPTIMIZE)
			  and normalization and	cache optimization
			  (in file with	name linout_{0001...0016}
			   e.g.	for 16 procs),
     MISC = 00003    :	  MISC = 00001 plus MISC = 00002,
     MISC = 00010    :	  Check, if recursions emerge
			  (only	of use on vector computers)
			  <not yet implemented>,
     MISC = 00100    :	  Storing of original matrix
			  (matrices) MAT in LINSOL-format
			  (you must set	MATRIX_OUTPUT_FORMAT=1);
			  Note:	OPTIMIZE values {1|2|3}11000
				can modify the matrix,
     MISC = 00200    :	  Storing of original matrix
			  (matrices) MAT in binary LINSOL-format
			  (you must set	MATRIX_OUTPUT_FORMAT=2);
			  Note:	OPTIMIZE values {1|2|3}11000
				can modify the matrix,
     MISC = 00300    :	  Storing of normalized	matrix
			  (matrices) MAT adapted
			  to parallel processing in LINSOL-format
			  (you must set	MATRIX_OUTPUT_FORMAT=1);
			  Note:	OPTIMIZE values {1|2|3}11000
				can modify the matrix,
     MISC = 00400    :	  Storing of normalized	matrix
			  (matrices) MAT adapted
			  to parallel processing in binary
			  LINSOL-format	(you must set
			  MATRIX_OUTPUT_FORMAT=2);
			  Note:	OPTIMIZE values {1|2|3}11000
				can modify the matrix,
     MISC = 01000    :	  Output of error,
     MISC = 10000    :	  Print	residuals (and errors, if
			  the output of	errors is enabled)
			  in file plot_<date>_<time>.
OPTIMIZE (OPT)
     OPT = 0000000    :	   no optimization and no statistics,
     OPT = 0000001    :	   time	statistics,
     OPT = 0000010    :	   printing of statistics regarding the
			   data	structures,
     OPT = 0000100    :	   safe	cache reuse, i.e. arrays
			   MAT and INDEX do not	change,
     OPT = 0000200    :	   unsafe cache	reuse, i.e. arrays
			   MAT and INDEX do change,
     OPT = 0001000    :	   remove zero entries in matrix MAT,
     OPT = 0010000    :	   restore matrix MAT completely in
			   storage pattern 'packed row',
     OPT = 0100000    :	   search single entries in rows and
			   store them on the main diagonal by
			   permuting the matrix	MAT rowwise;
			   if the matrix is not	symmetric, then
			   the matrix size is reduced by
			   shifting entries of known solution
			   components to the right hand	side,
     OPT = 0200000    :	   bandwidth optimizer (BWO),
     OPT = 0300000    :	   search single entries in rows
			   (see	above) + BWO,
     OPT = 1000000    :	   optimization	of data	structures;
			   matrix will be restored in storage
			   patterns 'full diagonal',
			   'packed diagonal' and 'starry sky'
			   <not	yet implemented>.
BANDW_ALG (BA)
selects the bandwidth optimization algorithm.
     BA	= 1    :    Gibbs-Poole-Stockmeyer,
     BA	= 2    :    fast single	pass algorithm,
     BA	= 3    :    reverse Cuthill/McKee,
     BA	= 4    :    reverse Cuthill/McKee (no iteration).
RUNTIME_ACCELERATION (RA)
     RA	= 0    :    version with minimum storage requirements
		    is used,
     RA	= 1    :    runtime is accelerated on vector computers,
		    but	additional storage is allocated.
THRESHOLD
is the relative stopping factor (epslin) for all iterative methods. The stopping criteria are:
     LC1 :    ||(preconditioned) smoothed residual||2
              <= epslin * ||(preconditioned) r.h.s.||2;
     LC2 (noprec = 1) :
              ||original smoothed residual||max
              <= epslin * ||original r.h.s.||max,
     LC2 (noprec = 0) :
              ||(preconditioned) smoothed residual||max
              <= epslin * ||(preconditioned) r.h.s.||max.
If (LC1 .and. LC2) the iteration is stopped. If (LC1 .and. .not. LC2) a new stopping criterion is computed.
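As an illustration of this two-part test, here is a hedged Python sketch (the names norm2, normmax and stop_iteration are invented for the example; this is not LINSOL source):

```python
# Sketch of the two-part stopping test described above: LC1 uses the
# Euclidean norm, LC2 the maximum norm; the iteration stops only when
# both hold.

def norm2(v):
    return sum(x * x for x in v) ** 0.5

def normmax(v):
    return max(abs(x) for x in v)

def stop_iteration(residual, rhs, epslin):
    lc1 = norm2(residual) <= epslin * norm2(rhs)
    lc2 = normmax(residual) <= epslin * normmax(rhs)
    return lc1 and lc2      # LC1 alone would trigger a new criterion

print(stop_iteration([1e-7, -2e-7], [1.0, 2.0], 1e-6))   # -> True
```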
TEMP_DIR
Specify a temporary directory for scratch files.
BUFFER_SIZE
Size of the buffer in MByte that is used in data transfer operations.
MATRIX_FACTOR
Specify by which factor the matrix size to be read in from file is enlarged. The parameter is only relevant on parallel computers.
INDEX_FACTOR
Specify by which factor the size of the index array to be read in from file is enlarged. The parameter is only relevant on parallel computers.
INFO_FACTOR
Specify by which factor the size of the info array to be read in from file is enlarged. The parameter is only relevant on parallel computers.

EXAMPLE

see EXAM01 and EXAM01S

METHOD

Normalization

At the beginning the matrix MAT is normalized from the left by a diagonal matrix N, if MAT is not yet normalized (MATRIX_NORMALIZED = {0|10}) and a normalization method is chosen (NORMALIZATION_METHOD > 0). If MAT is symmetric or a symmetric normalization is selected, the matrix is additionally normalized from the right to preserve symmetry. Different methods of normalization can be chosen by the user. We recommend the normalization by row sum (NORMALIZATION_METHOD = 1), because in most cases it is the best method with regard to the performed matrix-vector multiplications. If you want to store the matrix (MATRIX_OUTPUT_FORMAT > 0) and you choose MISC = {100|200}, the original matrix will be stored; if you choose MISC = {300|400}, the normalized matrix will be stored.
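A toy dense sketch may make the row-sum formula prec(i) = sign A_ii/sum abs(A_ij) concrete (illustrative Python; LINSOL's actual implementation works on sparse storage patterns):

```python
# Illustrative sketch of row-sum normalization (NORMALIZATION_METHOD = 1):
# prec(i) = sign(A_ii) / sum_j |A_ij|, applied as a diagonal matrix from
# the left to both the matrix and the right hand side.

def row_sum_normalize(mat, rhs):
    prec = []
    for i, row in enumerate(mat):
        s = sum(abs(a) for a in row)
        sign = 1.0 if row[i] >= 0.0 else -1.0
        prec.append(sign / s)
    nmat = [[p * a for a in row] for p, row in zip(prec, mat)]
    nrhs = [p * b for p, b in zip(prec, rhs)]
    return nmat, nrhs

MAT = [[2.0, -1.0], [1.0, 3.0]]
b = [1.0, 4.0]
nmat, nrhs = row_sum_normalize(MAT, b)
print(nmat)   # rows scaled to unit absolute row sum, e.g. [0.25, 0.75]
```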

Preconditioning

As preconditioner you can choose the (Incomplete) LU algorithm (Gauss factorization) or, for symmetric matrices, the (Incomplete) Cholesky algorithm, each in three variants. The Cholesky algorithm is used automatically if the matrix is symmetric, i.e. if lsym = .true. is set. Note that the Cholesky algorithm only works for symmetric positive definite (SPD) matrices. The three variants are preconditioning from the left, from the middle and from the right { --> PRECONDITIONING_METHOD}. Three drop parameters control the factorization process. The first one, GAUSS_THRES1, drops small elements when the matrix MAT is copied into the preconditioning matrix; the second one, GAUSS_THRES2, drops small elements when factorized data are stored back to the preconditioning matrix; the third one, LDROP_FACTOR, saves computation time within the factorization process by omitting update terms if one of the coefficients of the update term is smaller than LDROP_FACTOR. Additionally, an incomplete factorization on the pattern of MAT is possible (GAUSS_THRES2 = 1.0). If all drop parameters are zero, the complete LU (L[T]L) factorization is performed; the iteration then usually converges after one (!) iteration step. If one of the drop parameters is greater than zero, an Incomplete LU (L[T]L) factorization is performed. The larger one of the drop parameters is, the worse the convergence of the preconditioned iteration process becomes. With the variable GAUSS_FACTOR you specify the fill-in factor for the matrices L and U (L[T] and L) in comparison to MAT; a positive value means that the fill-in factor is determined automatically. If the automatically determined value is too large for the allocation of the preconditioning arrays, the value of GAUSS_FACTOR is used for the allocation instead. If GAUSS_FACTOR is less than zero, ABS(GAUSS_FACTOR) is used immediately for the allocation of the preconditioning arrays. If you choose e.g.
GAUSS_FACTOR = 5.0, a safe fill-in factor is determined automatically. If the allocation of the preconditioning arrays for the automatically determined fill-in factor fails, 5.0 is chosen as fill-in factor; this means that at most 5 times as many elements can be stored in the matrices L and U (L[T] and L) as in the matrix MAT. If the chosen fill-in factor is too small, the factorization will stop; from the chosen fill-in factor and the number of factorized rows you can estimate how large a fill-in factor must be to allow the complete storing of the matrices L and U (L[T] and L). E.g. if the chosen fill-in factor is 10, the dimension of the matrix is 10000 and the factorization has stopped in row 4000, a fill-in factor greater than 10*10000/4000 = 25 should work. If the factorization works and you have set OPTIMIZE = 10, the minimum working fill-in factor is printed on one processor under "LSOLPP statistics"; on more than one processor it is called "Gauss factor" and printed after the factorization.
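The estimate at the end of this paragraph can be written as a one-line helper (illustrative only; estimate_gauss_factor is not a LINSOL routine):

```python
# Worked version of the fill-in estimate from the text: if the
# factorization with fill-in factor 10 stops in row 4000 of a
# 10000 x 10000 matrix, a factor of at least 10 * 10000 / 4000 = 25
# should allow it to complete.

def estimate_gauss_factor(chosen_factor, dimension, stopped_row):
    """Estimate a sufficient fill-in factor after a failed factorization."""
    return chosen_factor * dimension / stopped_row

print(estimate_gauss_factor(10, 10000, 4000))   # -> 25.0
```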

Smoothing [8]

The residual of a generalized CG-method can oscillate strongly. Therefore both the residual and the approximated solution are smoothed in every iteration step to ensure that the Euclidean norm of the residual does not increase. By setting the environment variable SMOOTHED = 0 it is possible to obtain the original solution instead of the smoothed one, but the iteration process is always controlled by the smoothed residual.
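A minimal sketch of such smoothing, assuming the common minimal-residual smoothing update in the spirit of [8,9] (the function smooth is invented for illustration; LINSOL's actual update may differ):

```python
# The smoothed residual s is a convex combination of the previous smoothed
# residual s_prev and the new residual r, with eta chosen to minimize
# ||s||_2, so the smoothed residual norm can never increase.

def smooth(s_prev, r):
    d = [ri - si for si, ri in zip(s_prev, r)]
    dd = sum(x * x for x in d)
    if dd == 0.0:
        return list(s_prev)
    eta = -sum(si * di for si, di in zip(s_prev, d)) / dd   # 1-D minimizer
    eta = min(1.0, max(0.0, eta))                           # keep convex
    return [si + eta * di for si, di in zip(s_prev, d)]

s = smooth([2.0, 0.0], [0.0, 2.0])
print(s)    # -> [1.0, 1.0]; norm shrinks from 2.0 to sqrt(2)
```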

Nonsymmetric Methods

For nonsymmetric problems, LINSOL provides seven different solution algorithms. These algorithms are implemented both with smoothing and without smoothing; thus the seven implemented algorithms yield twelve methods. In the following, the ten most popular and "well-known" basic methods, one error-minimizing method and one polyalgorithm representative for all others are described.

1. PRES20 [1] (Pseudo-RESidual 20), METHOD = 1, SMOOTHED <> 0
For sufficient diagonal dominance of the matrix this is a quickly converging method. If the norm of the residual decreases only in the first iteration steps, then choose one of the other methods. The residuals decrease monotonically.
2a. BICO [1] smoothed, METHOD = 2, SMOOTHED <> 0
(BiConjugate Gradients)
BICO is more robust than PRES20 and converges quickly even if the matrix is not diagonally dominant, but BICO uses two matrix-vector multiplications (MVM) by both the matrix MAT and its transposed matrix MAT^T per iteration step. The residuals decrease monotonically.
2b. BCG [2] (BiConjugate Gradients), METHOD=2, SMOOTHED=0
BCG is more robust than PRES20 and converges quickly even if the matrix is not diagonally dominant, but BCG uses two matrix-vector multiplications (MVM) per iteration step, by both the matrix MAT and its transposed matrix MAT^T. The residuals oscillate heavily. Therefore, use BICO or QMR instead.
3a. Bi-CGSTAB smoothed [3,8] , METHOD = 3, SMOOTHED <> 0
The Bi-Conjugate Gradient STABilized (Bi-CGSTAB) method is a modification of CGS that should be more stable. The residuals decrease monotonically.
3b. Bi-CGSTAB [3] , METHOD = 3, SMOOTHED = 0
This is a modification of CGS that should be more stable.
4a. AT-PRES = CGNR [4] , METHOD = 4, SMOOTHED <> 0
The A_Transposed-Pseudo_RESidual (AT-PRES) method is equivalent to the CG Normal equations Residual-minimizing (CGNR) method and should only be used if none of the methods BICO, CGS or Bi-CGSTAB converges. It may converge slowly but it is very robust. The residuals decrease monotonically.
4b. CGNE [5], METHOD = 4, SMOOTHED = 0
(CG Normal equation Error-minimizing)
CGNE should only be used if one of the methods BICO, CGS or Bi-CGSTAB does not converge. It may converge slowly but it is very robust. The errors decrease monotonically.
5a. CGS smoothed, METHOD = 5, SMOOTHED <> 0
(Conjugate Gradient Squared)
The smoothed CGS [6,8] is as robust as BICO. An advantage of CGS compared to BICO is that CGS only uses matrix-vector multiplications by MAT and not by the transposed matrix MAT^T. Therefore it could run a little better (-: It's getting better all the time :-> on parallel computers. The residuals decrease monotonically.
5b. CGS [6], METHOD = 5, SMOOTHED = 0
(Conjugate Gradient Squared)
CGS is as robust as BICO. An advantage of CGS compared to BICO is that CGS only uses matrix-vector multiplications by MAT and not by the transposed matrix MAT^T. Therefore it could run a little better (-: It's getting better all the time :-> on parallel computers.
6. QMR simplified [7,9], METHOD = 6, SMOOTHED <> 0
The Quasi-Minimal Residual (QMR) method behaves similarly to BICO. In this implementation the look-ahead process for avoiding break-downs is omitted.
7a. Bi-CGSTAB2 smoothed, METHOD = 7, SMOOTHED <> 0
The 2-dimensional optimizing Bi-Conjugate Gradient STABilized (Bi-CGSTAB2) method is a modification of Bi-CGSTAB that should be more stable. The residuals decrease monotonically.
7b. Bi-CGSTAB2, METHOD = 7, SMOOTHED = 0
The residuals do not decrease monotonically during the iteration process of the non-smoothed variant of Bi-CGSTAB2.
9. CG-AT (CG with matrix A*A^T), METHOD = 9
LINSOL solves the linear system (MAT)(MAT^T)x = b; you only have to hand over the matrix MAT to LINSOL. The matrix MAT need not be symmetric. The matrix (MAT)(MAT^T) is symmetric in any case; thus LINSOL chooses the CG method as solution process.
>=200. Polyalgorithms [1]
The polyalgorithms try to exploit the advantages of the methods PRES20, Bi-CGSTAB2, GMERR5 (GMERR20) and AT-PRES, and of CG and GMERRS for symmetric matrices. The idea is explained exemplarily for polyalgorithm 200. It starts with the most efficient - but least robust - method PRES20. If PRES20 "falls asleep", it switches to Bi-CGSTAB2. If Bi-CGSTAB2 does not converge fast enough, the polyalgorithm changes to the "emergency exit" AT-PRES. The polyalgorithm should be used if no detailed knowledge of an optimal iterative method exists. In 95% of all cases the polyalgorithm is as fast as the best method of the set {PRES20, Bi-CGSTAB2, AT-PRES}.
15. GMERR5 (General Minimal ERRor 5), METHOD = 15
GMERR5 is the generalization for arbitrary matrices of Fridman's algorithm with a restart after every 5th iteration step.
16. GMERR20 (General Minimal ERRor 20), METHOD = 16
GMERR20 is the generalization for arbitrary matrices of Fridman's algorithm with a restart after every 20th iteration step.
If you use the above-mentioned methods, the matrix must not be stored in the symmetric storage scheme.
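The switching idea of polyalgorithm 200 described above can be sketched schematically (the callables and return convention are invented; this is not LINSOL's control logic):

```python
# Schematic sketch of polyalgorithm 200: start with PRES20, fall back to
# Bi-CGSTAB2 if it stagnates, and use AT-PRES as the "emergency exit".
# Each solver is a callable returning (converged, result).

def polyalgorithm_200(solvers, system):
    for name, solve in solvers:            # ordered: fast -> robust
        converged, result = solve(system)
        if converged:
            return name, result
    return "failed", None

# Toy stand-ins: PRES20 stagnates, Bi-CGSTAB2 succeeds.
chain = [("PRES20",     lambda sys_: (False, None)),
         ("Bi-CGSTAB2", lambda sys_: (True, "x")),
         ("AT-PRES",    lambda sys_: (True, "x"))]
print(polyalgorithm_200(chain, None))   # -> ('Bi-CGSTAB2', 'x')
```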

Symmetric matrices

The matrix MAT has to be given in the symmetric storage scheme. This reduces both the amount of workspace and the CPU time needed. Again, choosing smoothing or no smoothing yields two different methods. For symmetric matrices both CG methods are faster and more robust than the above-mentioned methods.

100a. CG smoothed = GMRES, METHOD = 100, SMOOTHED <> 0
The smoothed Conjugate Gradient (CG) method is mathematically equivalent to the GMRES method.
100b. Classical CG, METHOD = 100, SMOOTHED = 0
101. GMERRS, METHOD = 101
GMERRS (General Minimal ERRor for Symmetric matrices) does not have to be restarted, because residuals lying more than 2 iteration steps behind the actual iteration step must be zero. GMERRS should be competitive with GMRES for symmetric matrices.

REFERENCES

[1]
W. Schoenauer, Scientific Computing on Vector Computers, North-Holland, 1987, Amsterdam, New York, Oxford, Tokyo.
[2]
R. Fletcher, Conjugate Gradient Methods for Indefinite Systems, Proc. Dundee Biennial Conf. on Num. Anal., 1975, ed. G.A. Watson, pp. 73-89, Springer-Verlag.
[3]
H.A. van der Vorst, BI-CGSTAB: A Fast and Smoothly Converging Variant of BI-CG for the Solution of Nonsymmetric Linear Systems, SIAM J. Sci. Stat. Comp., 1992, Vol. 13, No. 2, pp. 631-644.
[4]
W. Schoenauer, M. Schlichte, R. Weiss, Wrong Ways and Promising Ways towards Robust and Efficient Iterative Linear Solvers, Advances in Computer Methods for Partial Differential Equations VI, Publ. IMACS, 1987, pp. 7-14.
[5]
E.J. Craig, The N-Step Iteration Procedures, Math. Phys., 1955, Vol. 34, pp. 64--73.
[6]
P. Sonneveld, CGS, A fast Lanczos-type Solver for nonsymmetric linear Systems, SIAM J. Sci. Stat. Comp., 1989, Vol. 10, No.1, pp. 36-52.
[7]
R.W. Freund and N.M. Nachtigal, QMR: a Quasi-Minimal Residual Method for Non-Hermitian Linear Systems, Numer. Math., 1991, Vol. 60, pp. 315-339.
[8]
R. Weiss, Properties of Generalized Conjugate Gradient Methods, Num. Lin. Alg. with Appl., 1994, Vol. 1, No. 1, pp. 45--63.
[9]
L. Zhou and H.F. Walker, Residual Smoothing Techniques for Iterative Methods, SIAM J. Sci. Comput., 1994, Vol. 15, No. 2, pp. 297--312.

PROGRAMMERS / CONSULTING

Original Program by H. Haefner, L. Grosz, C. Roll, P. Sternecker, R. Weiss 1989-2001

Consulting by

     Numerikforschung fuer Supercomputer
     Rechenzentrum der Universitaet Karlsruhe
     Am	Zirkel 2
     D-76128 Karlsruhe
     Germany
     Tel.      :   (+721) 608/4869
     Fax.      :   (+721) 32550
     e-mail    :   haefner@rz.uni-karlsruhe.de