linnox <parameter-file>
starts the executable LINSOL (job) interactively on 1 processor. The shell script linnox has the following options (when the job is started as a batch job, these options usually must be set):

  -p <procs>      : run the job on <procs> processors
  -n <nodes>      : run the job on <nodes> nodes
  -d              : debug option; usually not available to the users
  -b              : run the job as a batch job
  -c <jobclass>   : submit the job to the queue <jobclass>
  -T <time_limit> : time limit in minutes for the batch job
  -M <mem_limit>  : memory limit in MBytes for the batch job
For example, to start a batch job on 4 processors in the queue <parallel> with a time limit of 100 minutes and a memory limit of 500 MBytes, enter
linnox -b -p 4 -c parallel -T 100 -M 500 <parameter-file>
Example parameter files can be found at
http://www.rz.uni-karlsruhe.de/~linsol/examples
The parameter files exam01_param, gre512_param and oilgen_param can be used as templates.
INPUT_FORMAT
  INPUT_FORMAT = 0 : matrix is read in unformatted (binary) LINSOL file format,
  INPUT_FORMAT = 1 : matrix is read in formatted LINSOL file format,
  INPUT_FORMAT = 2 : matrix is read in Harwell/Boeing file format.

MATRIX_DISTRIBUTED
  MATRIX_DISTRIBUTED = 0 : matrix is read from a single file,
  MATRIX_DISTRIBUTED = 1 : each processor reads the file with its own processor number as suffix of the filename of the matrix.

MATRIX_INPUT_FILENAME
  Specify the filename of the matrix to be read with full directory path.

MATRIX_OUTPUT_FORMAT
  MATRIX_OUTPUT_FORMAT = 0 : matrix will not be stored in a file,
  MATRIX_OUTPUT_FORMAT = 1 : matrix is stored in the formatted LINSOL file format,
  MATRIX_OUTPUT_FORMAT = 2 : matrix is stored in the unformatted (binary) LINSOL file format.

MATRIX_OUTPUT_FILENAME
  Specify the filename of the matrix to be written with full directory path.

NUMBER_RHS
  NUMBER_RHS > 0 : NUMBER_RHS linear systems are solved with NUMBER_RHS right hand sides that are read from file,
  NUMBER_RHS = 0 : the right hand side is taken from the same file as the matrix (works only with matrices stored in H/B format) <<< not yet available!!!,
  NUMBER_RHS = -1 : the right hand side is computed automatically from an implicitly chosen solution vector.

RHS_INPUT_FILENAME
  Specify the filename of the right hand side(s) with full directory path; must be specified only if NUMBER_RHS > 0.

INITGUESS
  INITGUESS = 0 : no initial guess(es) is (are) used,
  INITGUESS = 1 : initial guess(es) is (are) used; they will be read from file.

INITGUESS_INPUT_FILENAME
  Specify the filename of the initial guess(es) with full directory path; must be specified only if INITGUESS = 1.

SOLUTIONS_FILENAME
  Specify the filename of the solution vector(s) with full directory path.

METHOD
  Selection of the iterative method.
  METHOD = 1 : PRES20
  METHOD = 2 : BICO
  METHOD = 3 : Bi-CGSTAB
  METHOD = 4 : AT-PRES
  METHOD = 5 : CGS
  METHOD = 6 : QMR-Simulator
  METHOD = 7 : Bi-CGSTAB2
  METHOD = 9 : CG-AT for non-symmetric matrices
  METHOD = 15 : GMERR5
  METHOD = 16 : GMERR20
  METHOD = 100 : Classical CG for symmetric matrices
  METHOD = 101 : GMERRS for symmetric matrices
  METHOD = 200 : Polyalgorithm PRES20 -> Bi-CGSTAB2 -> AT-PRES
  METHOD = 201 : Polyalgorithm PRES20 <-> Bi-CGSTAB2 <-> AT-PRES
  METHOD = 202 : Polyalgorithm PRES20 -> Bi-CGSTAB2 -> GMERR5 -> AT-PRES
  METHOD = 203 : Polyalgorithm PRES20 -> Bi-CGSTAB2 -> GMERR5 <-> AT-PRES
  METHOD = 204 : Polyalgorithm PRES20 -> Bi-CGSTAB2 -> GMERR20 -> AT-PRES
  METHOD = 210 : Polyalgorithm GMERR5 -> AT-PRES
  METHOD = 211 : Polyalgorithm GMERR5 <-> AT-PRES
  METHOD = 212 : Polyalgorithm GMERR20 -> AT-PRES
  METHOD = 220 : Polyalgorithm GMERRS -> CG for symmetric matrices
  METHOD = 221 : Polyalgorithm GMERRS <-> CG for symmetric matrices
  METHOD = 999 : {TEST-INTERFACE}

NORMALIZATION_METHOD (NM)
  Selects the method of normalization. Note: "A" stands for a regularly stored matrix.
  NM = 0 : no normalization,
  NM = 1 : normalization by row sum (prec(i) = sign A_ii/sum abs(A_ij), j=1,l),
  NM = 2 : normalization by main diagonal elements (prec(i) = 1/A_ii),
  NM = 3 : normalization by square sum (prec(i) = 1/sqrt(sum A_ij**2, j=1,l)),
  NM = 4 : Frobenius normalization (prec(i) = A_ii/sum A_ij**2, j=1,l),
  NM = 11-14 : same as NM = 1-4, but with the square root of the normalization matrix applied from the left and from the right (automatically selected if the matrix MAT is symmetric).

PRECONDITIONING_METHOD (PM)
  Selects the method of preconditioning; works in connection with the iterative methods PRES20, BICO and Bi-CGSTAB2 if the matrix is not symmetric, and in connection with the classical CG method if the matrix is symmetric (the Cholesky algorithm works up to now only on 1 processor).
  PM = 0 : no preconditioning,
  PM = 63 : preconditioning from the middle by the (I)LU algorithm [(Incomplete) Gauss algorithm] or (I)L[T]L [(Incomplete) Cholesky algorithm]; implies OPTIMIZE = 111000,
  PM = 66 : preconditioning from the right by the (I)LU algorithm [(Incomplete) Gauss algorithm] or (I)L[T]L [(Incomplete) Cholesky algorithm]; implies OPTIMIZE = 111000,
  PM = 69 : preconditioning from the left by the (I)LU algorithm [(Incomplete) Gauss algorithm] or (I)L[T]L [(Incomplete) Cholesky algorithm]; implies OPTIMIZE = 111000,
  PM = 72 : preconditioning from the middle by the (I)LU algorithm on indexed data; implies OPTIMIZE = 111000,
  PM = 75 : preconditioning from the right by the (I)LU algorithm on indexed data; implies OPTIMIZE = 111000,
  PM = 78 : preconditioning from the left by the (I)LU algorithm on indexed data; implies OPTIMIZE = 111000.

GAUSS_FACTOR
  Specifies the fill-in factor for the preconditioning by GAUSS. A positive value means that the fill-in factor is determined automatically; if that value is too large for an allocation of the preconditioning array, the variable GAUSS_FACTOR is used for the allocation. If GAUSS_FACTOR is less than zero, then ABS(GAUSS_FACTOR) is immediately used for the allocation of the preconditioning arrays. For example, a value of 3.0 means that the sum of the sizes of the matrices U and L may be at most 3.0 times the size of the LINSOL matrix MAT.

GAUSS_THRES1
  First threshold value for the factorization process (ILU algorithm); from GAUSS_THRES1 a threshold value is computed so that about GAUSS_THRES1*100 percent of all entries smaller than the computed threshold value are removed when the matrix MAT is copied into the preconditioning matrix.

GAUSS_THRES2
  Second threshold value for the factorization process (ILU algorithm); below this value all data are removed when the matrices U and L (L[T] and L), computed in the factorization window, are stored back to the preconditioning matrix A(P).

LDROP_FACTOR
  Third threshold value; reduces the factorization time by omitting update terms within the factorization process under the following conditions: only indices of the elimination coefficients whose corresponding absolute elements are greater than LDROP_FACTOR are used to update the factorization matrix (holds for PRECONDITIONING_METHOD = {72|75|78}); only rows of the actual factorization window with elements in and below the pivot column greater than LDROP_FACTOR are updated (holds for PRECONDITIONING_METHOD = {63|66|69|72|75|78}).
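GAUSS_THRES1 thus acts as a drop fraction. The following minimal sketch (an illustration only, not LINSOL code; the function name is hypothetical) shows one way such a fraction-based drop threshold can be derived, assuming the fraction is interpreted as a quantile of the absolute matrix entries:

```python
def drop_threshold(entries, fraction):
    """Return a threshold so that about fraction*100 percent of the
    entries (by absolute value) fall below it and would be dropped."""
    mags = sorted(abs(e) for e in entries)
    # index of the smallest entry that survives the drop
    k = int(fraction * len(mags))
    return mags[min(k, len(mags) - 1)]

# entries of a sparse matrix row; with fraction 0.5 about half are dropped
row = [4.0, -0.001, 2.5, 0.03, -1.2, 0.0007]
thres = drop_threshold(row, 0.5)
dropped = [e for e in row if abs(e) < thres]
kept = [e for e in row if abs(e) >= thres]
```

Dropping the small entries keeps the incomplete factors U and L sparse at the price of a less accurate preconditioner.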
MATRIX_NORMALIZED (MN)
  Indicates whether the matrix MAT is already normalized. (Comment: the variable is automatically increased if the matrix has been normalized/preconditioned/reduced/bandwidth-optimized in a previous run of LINSOL with a different right hand side; see NUMBER_RHS and manual page <lsolp>.)
  MN = 0000 : matrix is not normalized,
  MN = 0001 : matrix is normalized.

CHECK_SYSTEM (CS)
  CS = 0 : normalized system is considered for checking of the stopping criterion,
  CS = 1 : original system is considered for checking of the stopping criterion.

SMOOTHED
  SMOOTHED = 0 : iteration returns the non-smoothed solution,
  SMOOTHED <> 0 : iteration returns the smoothed solution.

ITERATION_MAXIMUM
  Maximum number of iteration steps.

OUTPUT_DEVICE (OD)
  Specify whether output messages are printed to stdout or to files:
  OD = 6 : print to stdout,
  OD = 7 : print to files with filenames stdout_{0001...0016} (e.g. for 16 procs).

VERBOSE_LEVEL (VL)
  Output control.
  VL = 0 : print only error messages,
  VL = 1 : print further information, but no output during the iteration,
  VL = i>1 : output after every i iteration steps on the processor with logical process(or) number 1,
  VL = -i<0 : same as for i>0, but with output on all processors.

MISC
  MISC = 00000 : standard LINSOL,
  MISC = 00001 : printing and checking - as far as possible - of vector terms after all optimization steps (see OPTIMIZE) (in files named linout_{0001...0016}, e.g. for 16 procs),
  MISC = 00002 : printing and checking - as far as possible - of vector terms adapted to parallel processing after all optimization steps (see OPTIMIZE) and normalization and cache optimization (in files named linout_{0001...0016}, e.g. for 16 procs),
  MISC = 00003 : MISC = 00001 plus MISC = 00002,
  MISC = 00010 : check whether recursions emerge (only of use on vector computers) <not yet implemented>,
  MISC = 00100 : storing of the original matrix (matrices) MAT in LINSOL format (you must set MATRIX_OUTPUT_FORMAT=1); note: OPTIMIZE values {1|2|3}11000 can modify the matrix,
  MISC = 00200 : storing of the original matrix (matrices) MAT in binary LINSOL format (you must set MATRIX_OUTPUT_FORMAT=2); note: OPTIMIZE values {1|2|3}11000 can modify the matrix,
  MISC = 00300 : storing of the normalized matrix (matrices) MAT adapted to parallel processing in LINSOL format (you must set MATRIX_OUTPUT_FORMAT=1); note: OPTIMIZE values {1|2|3}11000 can modify the matrix,
  MISC = 00400 : storing of the normalized matrix (matrices) MAT adapted to parallel processing in binary LINSOL format (you must set MATRIX_OUTPUT_FORMAT=2); note: OPTIMIZE values {1|2|3}11000 can modify the matrix,
  MISC = 01000 : output of the error,
  MISC = 10000 : print residuals (and errors, if the output of errors is enabled) in file plot_<date>_<time>.

OPTIMIZE (OPT)
  OPT = 0000000 : no optimization and no statistics,
  OPT = 0000001 : time statistics,
  OPT = 0000010 : printing of statistics regarding the data structures,
  OPT = 0000100 : safe cache reuse, i.e. arrays MAT and INDEX do not change,
  OPT = 0000200 : unsafe cache reuse, i.e. arrays MAT and INDEX do change,
  OPT = 0001000 : remove zero entries in matrix MAT,
  OPT = 0010000 : restore matrix MAT completely in storage pattern 'packed row',
  OPT = 0100000 : search single entries in rows and store them on the main diagonal by permuting the matrix MAT rowwise; if the matrix is not symmetric, the matrix size is reduced by shifting entries of known solution components to the right hand side,
  OPT = 0200000 : bandwidth optimizer (BWO),
  OPT = 0300000 : search single entries in rows (see above) + BWO,
  OPT = 1000000 : optimization of data structures; matrix will be restored in storage patterns 'full diagonal', 'packed diagonal' and 'starry sky' <not yet implemented>.

BANDW_ALG (BA)
  Selects the bandwidth optimization algorithm.
  BA = 1 : Gibbs-Poole-Stockmeyer,
  BA = 2 : fast single pass algorithm,
  BA = 3 : reverse Cuthill/McKee,
  BA = 4 : reverse Cuthill/McKee (no iteration).

RUNTIME_ACCELERATION (RA)
  RA = 0 : version with minimum storage requirements is used,
  RA = 1 : runtime is accelerated on vector computers, but additional storage is allocated.

THRESHOLD
  Relative stopping factor (epslin) for all iterative methods. The stopping criteria are:
  LC1 : ||(preconditioned) smoothed residuum||2 <= epslin * ||(preconditioned) r.h.s.||2;
  noprec = 1 : LC2 : ||original smoothed residuum||max <= epslin * ||original r.h.s.||max,
  noprec = 0 : LC2 : ||(preconditioned) smoothed residuum||max <= epslin * ||(preconditioned) r.h.s.||max.
  If (LC1 .and. LC2) the iteration is stopped. If (LC1 .and. .not. LC2) a new stopping criterion is computed.

TEMP_DIR
  Specify a temporary directory for scratch files.

BUFFER_SIZE
  Size of the buffer in MByte that is used in data transfer operations.

MATRIX_FACTOR
  Specify by which factor the matrix size read in from file is enlarged. The parameter is only relevant on parallel computers.

INDEX_FACTOR
  Specify by which factor the size of the index array read in from file is enlarged. The parameter is only relevant on parallel computers.

INFO_FACTOR
  Specify by which factor the size of the info array read in from file is enlarged. The parameter is only relevant on parallel computers.
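Taken together, a minimal parameter file might look like the following sketch. The KEYWORD = value line format and the concrete values and paths are assumptions for illustration only; use the example files exam01_param, gre512_param and oilgen_param as the authoritative templates.

```
INPUT_FORMAT           = 2
MATRIX_DISTRIBUTED     = 0
MATRIX_INPUT_FILENAME  = /home/user/matrices/gre512.rua
MATRIX_OUTPUT_FORMAT   = 0
NUMBER_RHS             = -1
INITGUESS              = 0
SOLUTIONS_FILENAME     = /home/user/results/solution
METHOD                 = 200
NORMALIZATION_METHOD   = 2
PRECONDITIONING_METHOD = 0
ITERATION_MAXIMUM      = 1000
THRESHOLD              = 1.0E-6
OUTPUT_DEVICE          = 6
VERBOSE_LEVEL          = 1
```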
For sufficient diagonal dominance of the matrix this is a quickly converging method. If the norm of the residual decreases only in the first iteration steps, choose one of the other methods. The residuals decrease monotonically.

2a. BICO [1] smoothed, METHOD = 2, SMOOTHED <> 0 (BiConjugate Gradients)
BICO is more robust than PRES20 and converges quickly even if the matrix is not diagonally dominant, but BICO uses two matrix-vector multiplications (MVM), by both the matrix MAT and its transposed matrix MAT^T, per iteration step. The residuals decrease monotonically.

2b. BCG [2] (BiConjugate Gradients), METHOD = 2, SMOOTHED = 0
BCG is more robust than PRES20 and converges quickly even if the matrix is not diagonally dominant, but BCG uses two matrix-vector multiplications (MVM), by both the matrix MAT and its transposed matrix MAT^T, per iteration step. The residuals oscillate heavily; therefore, use BICO or QMR instead.

3a. Bi-CGSTAB smoothed [3,8], METHOD = 3, SMOOTHED <> 0
The Bi-Conjugate Gradient STABilized (Bi-CGSTAB) method is a modification of CGS that should be more stable. The residuals decrease monotonically.

3b. Bi-CGSTAB [3], METHOD = 3, SMOOTHED = 0
This is a modification of CGS that should be more stable.

4a. AT-PRES = CGNR [4], METHOD = 4, SMOOTHED <> 0
The A_Transposed-Pseudo_RESidual (AT-PRES) method is equivalent to the CG Normal equations Residual-minimizing (CGNR) method and should only be used if none of the methods BICO, CGS or Bi-CGSTAB converges. It may converge slowly, but it is very robust. The residuals decrease monotonically.

4b. CGNE [5], METHOD = 4, SMOOTHED = 0 (CG Normal equation Error-minimizing)
CGNE should only be used if none of the methods BICO, CGS or Bi-CGSTAB converges. It may converge slowly, but it is very robust. The errors decrease monotonically.

5a. CGS smoothed, METHOD = 5, SMOOTHED <> 0 (Conjugate Gradient Squared)
The smoothed CGS [6,8] is as robust as BICO. An advantage of CGS compared to BICO is that CGS only uses matrix-vector multiplications by MAT and not by the transposed matrix MAT^T. Therefore it may run somewhat better on parallel computers. The residuals decrease monotonically.

5b. CGS [6], METHOD = 5, SMOOTHED = 0 (Conjugate Gradient Squared)
CGS is as robust as BICO. An advantage of CGS compared to BICO is that CGS only uses matrix-vector multiplications by MAT and not by the transposed matrix MAT^T. Therefore it may run somewhat better on parallel computers.

6. QMR simplified [7,9], METHOD = 6, SMOOTHED <> 0
The Quasi-Minimal Residual (QMR) method behaves similarly to BICO. In this implementation the look-ahead process for avoiding break-downs is omitted.

7a. Bi-CGSTAB2 smoothed, METHOD = 7, SMOOTHED <> 0
The 2-dimensional optimizing Bi-Conjugate Gradient STABilized (Bi-CGSTAB2) method is a modification of Bi-CGSTAB that should be more stable. The residuals decrease monotonically.

7b. Bi-CGSTAB2, METHOD = 7, SMOOTHED = 0
The residuals do not decrease monotonically during the iteration process of the non-smoothed variant of Bi-CGSTAB2.

9. CG-AT (CG with matrix A*A^T), METHOD = 9
LINSOL solves the linear system (MAT)(MAT^T)x = b; you only have to hand over the matrix MAT to LINSOL. The matrix MAT need not be symmetric. The matrix (MAT)(MAT^T) is then symmetric; thus LINSOL chooses the CG method as solution process.

>=200. Polyalgorithms [1]
The polyalgorithms try to exploit the advantages of the methods PRES20, Bi-CGSTAB2, GMERR5 (GMERR20) and AT-PRES, and of CG and GMERRS for symmetric matrices respectively. As an example, the idea is explained for Polyalgorithm 200: it starts with the most efficient, but least robust, method PRES20. If PRES20 falls asleep, it switches to Bi-CGSTAB2. If Bi-CGSTAB2 does not converge fast enough, the polyalgorithm changes to the "emergency exit" AT-PRES. The polyalgorithm should be used if no detailed knowledge of an optimal iterative method exists. In 95% of all cases the polyalgorithm is as fast as the best method in the set {PRES20, Bi-CGSTAB2, AT-PRES}.

15. GMERR5 (General Minimal ERRor 5), METHOD = 15
GMERR5 is the generalization to arbitrary matrices of Fridman's algorithm, with a restart after every 5th iteration step.

16. GMERR20 (General Minimal ERRor 20), METHOD = 16
GMERR20 is the generalization to arbitrary matrices of Fridman's algorithm, with a restart after every 20th iteration step.

If you use the above mentioned methods, the matrix must not be stored in the symmetric storage scheme.

100a. Classical CG smoothed, METHOD = 100, SMOOTHED <> 0
The smoothed Conjugate Gradient (CG) method is mathematically equivalent to the GMRES method.

100b. Classical CG, METHOD = 100, SMOOTHED = 0

101. GMERRS, METHOD = 101
GMERRS (General Minimal ERRor for Symmetric matrices) does not have to be restarted, because the residuals more than 2 iteration steps below the actual iteration step must be zero. GMERRS should be competitive with GMRES for symmetric matrices.
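To make the classical CG method (METHOD = 100) and the relative stopping factor THRESHOLD concrete, here is a minimal pure-Python sketch for a small dense symmetric positive definite system. It is illustrative only and not LINSOL code; LINSOL works on its own sparse storage patterns and normalized/preconditioned systems.

```python
def cg(A, b, epslin=1e-10, itmax=100):
    """Classical conjugate gradients for a dense SPD matrix A (list of rows).
    Stops when ||r||_2 <= epslin * ||b||_2, in the spirit of THRESHOLD."""
    n = len(b)
    mv = lambda M, v: [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    x = [0.0] * n
    r = b[:]                      # residual for initial guess x = 0
    p = r[:]
    bnorm = dot(b, b) ** 0.5
    for _ in range(itmax):
        if dot(r, r) ** 0.5 <= epslin * bnorm:
            break
        Ap = mv(A, p)
        alpha = dot(r, r) / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r_new = [ri - alpha * api for ri, api in zip(r, Ap)]
        beta = dot(r_new, r_new) / dot(r, r)
        r = r_new
        p = [ri + beta * pi for ri, pi in zip(r, p)]
    return x

# 2x2 SPD system: [[4,1],[1,3]] x = [1,2]  ->  x = (1/11, 7/11)
x = cg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

For an n-by-n system, exact arithmetic guarantees convergence in at most n steps; in practice the relative stopping criterion terminates the iteration much earlier for well-conditioned matrices.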
References

[1] W. Schoenauer, Scientific Computing on Vector Computers, North-Holland, Amsterdam, New York, Oxford, Tokyo, 1987.
[2] R. Fletcher, Conjugate Gradient Methods for Indefinite Systems, Proc. Dundee Biennial Conf. on Num. Anal. (ed. G.A. Watson), Springer-Verlag, 1975, pp. 73-89.
[3] H.A. van der Vorst, Bi-CGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems, SIAM J. Sci. Stat. Comp., Vol. 13, No. 2, 1992, pp. 631-644.
[4] W. Schoenauer, M. Schlichte, R. Weiss, Wrong Ways and Promising Ways towards Robust and Efficient Iterative Linear Solvers, Advances in Computer Methods for Partial Differential Equations VI, Publ. IMACS, 1987, pp. 7-14.
[5] E.J. Craig, The N-Step Iteration Procedures, Math. Phys., Vol. 34, 1955, pp. 64-73.
[6] P. Sonneveld, CGS, a Fast Lanczos-type Solver for Nonsymmetric Linear Systems, SIAM J. Sci. Stat. Comp., Vol. 10, No. 1, 1989, pp. 36-52.
[7] R.W. Freund and N.M. Nachtigal, QMR: a Quasi-Minimal Residual Method for Non-Hermitian Linear Systems, Numer. Math., Vol. 60, 1991, pp. 315-339.
[8] R. Weiss, Properties of Generalized Conjugate Gradient Methods, Num. Lin. Alg. with Appl., Vol. 1, No. 1, 1994, pp. 45-63.
[9] L. Zhou and H.F. Walker, Residual Smoothing Techniques for Iterative Methods, SIAM J. Sci. Comput., Vol. 15, No. 2, 1994, pp. 297-312.
Consulting by
Numerikforschung fuer Supercomputer
Rechenzentrum der Universitaet Karlsruhe
Am Zirkel 2
D-76128 Karlsruhe
Germany
Tel.   : (+721) 608/4869
Fax.   : (+721) 32550
e-mail : haefner@rz.uni-karlsruhe.de