linnox <parameter-file>
starts the executable LINSOL (job) interactively on 1 processor. The shellscript linnox starting the job has the following parameters:
-p <procs> : job is run on <procs> processors,
-n <nodes> : job is run on <nodes> nodes,
-d : debug option - is usually not available
to the users,
!!!Starting the job as a batch job the following parameters
usually must be declared!!!
-b : the job is run as batch job,
-c <jobclass> : the job is submitted to the queue
<jobclass>,
-T <time_limit> : time limit in minutes for the batch job,
-M <mem_limit> : memory limit in MBytes for the batch job.
E.g. if you want to start a batch job on 4 processors in the queue <parallel> with a time limit of 100 minutes and a memory limit of 500 MBytes, you have to enter
linnox -b -p 4 -c parallel -T 100 -M 500 <parameter-file>
Examples for parameter-files can be found under
http::http://www.rz.uni-karlsruhe.de/~linsol/examples.
The parameter-files exam01_param, gre512_param and oilgen_param can be used as template.
INPUT_FORMAT = 0 : matrix is read in unformatted
(binary) LINSOL file format,
INPUT_FORMAT = 1 : matrix is read in formatted LINSOL
file format,
INPUT_FORMAT = 2 : matrix is read in Harwell/Boeing
file format.
MATRIX_DISTRIBUTED
MATRIX_DISTRIBUTED = 0 : matrix is read from a single file,
MATRIX_DISTRIBUTED = 1 : each processor reads the file with
the own processor number as suffix
of the filename of the matrix.
MATRIX_INPUT_FILENAME
Specify the filename of the matrix to be read with full directory path.MATRIX_OUTPUT_FORMAT
MATRIX_OUTPUT_FORMAT = 0 : matrix will not be stored in a
file,
MATRIX_OUTPUT_FORMAT = 1 : matrix is stored in the formatted
LINSOL file format,
MATRIX_OUTPUT_FORMAT = 2 : matrix is stored in the
unformatted (binary) LINSOL file
format.
MATRIX_OUTPUT_FILENAME
Specify the filename of the matrix to be written with full directory path.NUMBER_RHS
NUMBER_RHS > 0 : means that NUMBER_RHS linear equations
will be calculated with NUMBER_RHS
right hand sides by reading them from
file
NUMBER_RHS = 0 : means that the right hand side is taken
from the same file as the matrix
(works only with matrices stored in
H/B format) <<< not yet available!!!,
NUMBER_RHS = -1 : means that the right hand side is
calculated automatically by an implicit
choice of a solution vector.
RHS_INPUT_FILENAME
Specify the filename of the right hand side(s) with full directory path; must be specified only, if NUMBER_RHS > 0.INITGUESS
INITGUESS = 0 : no initial guess(es) is (are) used,
INITGUESS = 1 : initial guess(es) is (are) used; they
will be read from file
INITGUESS_INPUT_FILENAME
Specify the filename of the initial guess(es) with full directory path; must be specified only, if INITGUESS = 1.SOLUTIONS_FILENAME
Specify the filename of the solution vector(s) with full directory path.METHOD
Selection of iterative method.
METHOD = 1 : PRES20
METHOD = 2 : BICO
METHOD = 3 : Bi-CGSTAB
METHOD = 4 : AT-PRES
METHOD = 5 : CGS
METHOD = 6 : QMR-Simulator
METHOD = 7 : Bi-CGSTAB2
METHOD = 9 : CG-AT for non-symmetric matrices
METHOD = 15 : GMERR5
METHOD = 16 : GMERR20
METHOD = 100 : Classical CG for symmetric matrices.
METHOD = 101 : GMERRS for symmetric matrices.
METHOD = 200 : Polyalgorithm PRES20 -> Bi-CGSTAB2 -> AT-PRES
METHOD = 201 : Polyalg. PRES20 <-> Bi-CGSTAB2 <-> AT-PRES
METHOD = 202 : Polyalgorithm
PRES20 -> Bi-CGSTAB2 -> GMERR5 -> AT-PRES
METHOD = 203 : Polyalgorithm
PRES20 -> Bi-CGSTAB2 -> GMERR5 <-> AT-PRES
METHOD = 204 : Polyalgorithm
PRES20 -> Bi-CGSTAB2 -> GMERR20 -> AT-PRES
METHOD = 210 : Polyalgorithm GMERR5 -> AT-PRES
METHOD = 211 : Polyalgorithm GMERR5 <-> AT-PRES
METHOD = 212 : Polyalgorithm GMERR20 -> AT-PRES
METHOD = 220 : Polyalgorithm GMERRS -> CG
for symmetric matrices
METHOD = 221 : Polyalgorithm GMERRS <-> CG
for symmetric matrices
METHOD = 999 : {TEST-INTERFACE}.
NORMALIZATION_METHOD (NM)
selects the method of normalization. Note: "A" stands
for a regular stored matrix.
NM = 0 : no normalization,
NM = 1 : normalization by row sum
(prec(i) = sign A_ii/sum abs(A_ij),j=1,l),
NM = 2 : normalization by main diagonal
elements (prec(i) = 1/A_ii),
NM = 3 : normalization by square sum
(prec(i) = 1/sqrt(sum A_ij**2,j=1,l)),
NM = 4 : Froboenius normalization
(prec(i) = A_ii/sum A_ij**2,j=1,l),
NM = 11 : same as 1-4, but with square root of
NM = 12 : normalization matrix from left
NM = 13 : and right (automatically selected if
NM = 14 : the matrix MAT is symmetric).
PRECONDITIONING_METHOD (PM)
selects the method of preconditioning; works in connection with the
iterative methods PRES20, BICO and Bi-CGSTAB2, if the matrix is not
symmetric; works in connection with the classical CG method, if the
matrix is symmetric (the Cholesky algorithm works up-to-now only on
1 processor).
PM = 0 : no preconditioning,
PM = 63 : preconditioning from the middle by
(I)LU algorithm [(Incomplete) Gauss algorithm)]
or
(I)L[T]L [(Incomplete) Cholesky algorithm)],
this option implies OPTIMIZE = 111000,
PM = 66 : preconditioning from the right by
(I)LU algorithm [(Incomplete) Gauss algorithm)]
or
(I)L[T]L [(Incomplete) Cholesky algorithm)],
this option implies OPTIMIZE = 111000,
PM = 69 : preconditioning from the left by
(I)LU algorithm [(Incomplete) Gauss algorithm)]
or
(I)L[T]L [(Incomplete) Cholesky algorithm)],
this option implies OPTIMIZE = 111000,
PM = 72 : preconditioning from the middle by
(I)LU algorithm on indexed data,
this option implies optim = 111000,
PM = 75 : preconditioning from the right by
(I)LU algorithm on indexed data,
this option implies optim = 111000,
PM = 78 : preconditioning from the left by
(I)LU algorithm on indexed data,
this option implies optim = 111000.
GAUSS_FACTOR
specifies the fill-in factor for the preconditioning by GAUSS. A positive value means that the fill-in factor is determined automatically. If this value is too large for an allocation of the preconditioning array, the variable GAUSS_FACTOR is used for the allocation. If GAUSS_FACTOR is less than zero, then ABS(GAUSS_FACTOR) is immediately used for the allocation of the preconditioning arrays. Exemplarily a value of 3.0 means that the sum of the sizes of the matrices U and L can be maximum 3.0 times greater than the size of the LINSOL matrix MAT.GAUSS_THRES1
is the first threshold value for the factorization process (ILU algorithm); from $GAUSS_THRES1 a threshold value is computed so that about $GAUSS_THRES1*100 percent of all entries being smaller than the computed threshold value are removed when the matrix MAT is being copied into the preconditioning matrix.GAUSS_THRES2
is the second threshold value for the factorization process (ILU algorithm); below this value all data are removed when storing back the matrices U and L (L[T] and L), that are computed in the factorization window, to the preconditioning matrix A(P).LDROP_FACTOR
is the third threshold value that reduces the factorization time by omitting update terms within the factorization process under the following conditions. Only indices of the elimination coefficients with corresponding absolute elements greater than LDROP_FACTOR are used to update the factorization matrix (this holds for PRECONDITIONING_METHOD = {72|75|78}); only rows of the actual factorization window with elements of and below the pivot column greater than LDROP_FACTOR are updated (this holds for PRECONDITIONING_METHOD = {63|66|69|72|75|78}).MATRIX_NORMALIZED (MN)
indicates, if the matrix MAT is already normalized.
(Comment: The variable is automatically increased, if the matrix has been
normalized/preconditioned/reduced/bandwidth-optimized in a previous run
of LINSOL with a different right hand side (see NUMBER_RHS and manual
page <lsolp>).
MN = 0000 : matrix is not normalized,
MN = 0001 : matrix is normalized.
CHECK_SYSTEM (CS)
CS = 0 : normalized system is considered
for checking of stopping criterion,
CS = 1 : original system is considered
for checking of stopping criterion.
SMOOTHED
SMOOTHED = 0 : iteration returns the non-smoothed
solution,
SMOOTHED <> 0 : iteration returns the smoothed
solution.
ITERATION_MAXIMUM
Maximum number of iteration steps.OUTPUT_DEVICE (OD)
Specify if output messages are printed to stdout or to files:
OD = 6 : print to stdout,
OD = 7 : print to files with filenames
stdout_{0001...0016} (e.g. for 16 procs).
VERBOSE_LEVEL (VL)
Output control
VL = 0 : print only error messages,
VL = 1 : print further informations, but no
output during iteration,
VL = i>1 : output always after i iteration-
steps on the processor with
logical process(or) number 1,
VL = -i<0 : it holds the same as for idoku>0
with the difference of output
on all processors.
MISC
MISC = 00000 : Standard LINSOL
MISC = 00001 : Printing and checking - as far
as possible - of vector terms after all
optimization steps (see OPTIMIZE)
(in file with name linout_{0001...0016}
e.g. for 16 procs),
MISC = 00002 : Printing and checking - as far
as possible - of vector terms adapted
to parallel processing after all
optimization steps (see OPTIMIZE)
and normalization and cache optimization
(in file with name linout_{0001...0016}
e.g. for 16 procs),
MISC = 00003 : MISC = 00001 plus MISC = 00002,
MISC = 00010 : Check, if recursions emerge
(only of use on vector computers)
<not yet implemented>,
MISC = 00100 : Storing of original matrix
(matrices) MAT in LINSOL-format
(you must set MATRIX_OUTPUT_FORMAT=1);
Note: optim values {1|2|3}11000
can modify the matrix,
MISC = 00200 : Storing of original matrix
(matrices) MAT in binary LINSOL-format
(you must set MATRIX_OUTPUT_FORMAT=2);
Note: optim values {1|2|3}11000
can modify the matrix,
MISC = 00300 : Storing of normalized matrix
(matrices) MAT adapted
to parallel processing in LINSOL-format
(you must set MATRIX_OUTPUT_FORMAT=1);
Note: optim values {1|2|3}11000
can modify the matrix,
MISC = 00400 : Storing of normalized matrix
(matrices) MAT adapted
to parallel processing in binary
LINSOL-format (you must set
MATRIX_OUTPUT_FORMAT=2);
Note: optim values {1|2|3}11000
can modify the matrix,
MISC = 01000 : Output of error,
MISC = 10000 : Print residuals (and errors, if
the output of errors is enabled)
in file plot_<date>_<time>.
OPTIMIZE (OPT)
OPT = 0000000 : no optimization and no statistics,
OPT = 0000001 : time statistics,
OPT = 0000010 : printing of statistics regarding the
data structures,
OPT = 0000100 : safe cache reuse, i.e. arrays
MAT and INDEX do not change,
OPT = 0000200 : unsafe cache reuse, i.e. arrays
MAT and INDEX do change,
OPT = 0001000 : remove zero entries in matrix MAT,
OPT = 0010000 : restore matrix MAT completely in
storage pattern 'packed row',
OPT = 0100000 : search single entries in rows and
store them on the main diagonal by
permuting the matrix MAT rowwise;
if the matrix is not symmetric, then
the matrix size is reduced by
shifting entries of known solution
components to the right hand side,
OPT = 0200000 : bandwidth optimizer (BWO),
OPT = 0300000 : search single entries in rows
(see above) + BWO,
OPT = 1000000 : optimization of data strucures;
matrix will be restored in storage
patterns 'full diagonal',
'packed diagonal' and 'starry sky'
<not yet implemented>.
BANDW_ALG (BA)
selects the bandwidth optimization algorithm.
BA = 1 : Gibbs-Poole-Stockmeyer,
BA = 2 : fast single pass algorithm,
BA = 3 : reverse Cuthill/McKee,
BA = 4 : reverse Cuthill/McKee (no iteration).
RUNTIME_ACCELERATION (RA)
RA = 0 : version with minimum storage requirements
is used,
RA = 1 : runtime is accelerated on vectorcomputers,
but additional storage is allocated.
THRESHOLD
is the relative stopping factor for all iterative methods.
LC1 : (||(preconditioned) smoothed residuum||2
<= epslin * ||(preconditioned) r.h.s.||2);
noprec = 1 :
LC2 : (||original smoothed residuum||max
<= epslin * ||original r.h.s.||max),
noprec = 0 :
LC2 : (||(preconditioned) smoothed residuum||max
<= epslin * ||(preconditioned) r.h.s.||max).
If (LC1 .and. LC2) then the iteration is stopped.
If (LC1 .and. .not. LC2) then a new stopping criterion is computed.
TEMP_DIR
Specify a temporary directory for scratch files.BUFFER_SIZE
Size of the buffer in MByte that is used in data transfer operations.MATRIX_FACTOR
Specify by which factor the matrix size to be read in from file is enlarged. The parameter is only relevant on parallel computers.INDEX_FACTOR
Specify by which factor the size of the index array to be read in from file is enlarged. The parameter is only relevant on parallel computers.INFO_FACTOR
Specify by which factor the size of the index array to be read in from file is enlarged. The parameter is only relevant on parallel computers.
For sufficient diagonal dominance of the matrix this is a quickly converging method. If the norm of the residual decreases only in the first iteration steps, then choose one of the other methods. The residuals decrease monotonically.2a. BICO [1] smoothed, METHOD = 2, SMOOTHED <> 0
(BiConjugate Gradients)
BICO is more robust than PRES20 and converges quickly even if the matrix is not diagonally dominant, but BICO uses two matrix-vector multiplications (MVM) by both the matrix MAT and its transposed matrix MAT^T per iteration step. The residuals decrease monotonically.2b. BCG [2] (BiConjugate Gradients), METHOD=2, SMOOTHED=0
BCG is more robust than PRES20 and converges quickly even if the matrix is not diagonally dominant, but BICO uses two matrix-vector multiplications (MVM) by both the matrix MAT and its transposed matrix MAT^T per iteration step. The residuals oscillate heavily. Therefore, use BICO or QMR.3a. Bi-CGSTAB smoothed [3,8] , METHOD = 3, SMOOTHED <> 0
The Bi-Conjugate Gradient STABilized (Bi-CGSTAB) method is a modification of CGS that should be more stable. The residuals decrease monotonically.3b. Bi-CGSTAB [3] , METHOD = 3, SMOOTHED = 0
This is a modification of CGS that should be more stable.4a. AT-PRES = CGNR [4] , METHOD = 4, SMOOTHED <> 0
The A_Transposed-Pseudo_RESidual (AT-PRES) method is aequivalent to the CG Normal equations Residual-minimizing (CGNR) method and should only be used if one of the methods BICO, CGS or Bi-CGSTAB does not converge. It may converge slowly but it is very robust. The residuals decrease monotonically.4b. CGNE [5], METHOD = 4, SMOOTHED = 0
(CG Normal equation Error-minimizing)
CGNE should only be used if one of the methods BICO, CGS or Bi-CGSTAB does not converge. It may converge slowly but it is very robust. The errors decrease monotonically.5a. CGS smoothed, METHOD = 5, SMOOTHED <> 0
(Conjugate Gradient Squared)
The smoothed CGS [6,8] is as robust as BICO. An advantage of CGS compared to BICO is that CGS only uses matrix-vector multiplications by MAT and not by the transposed matrix MAT^T. Therefore it could run a little better (-: It's getting better all the time :-> on parallel computers. The residuals decrease monotonically.5b. CGS [6], METHOD = 5, SMOOTHED = 0
(Conjugate Gradient Squared)
CGS is as robust as BICO. An advantage of CGS compared to BICO is that CGS only uses matrix-vector multiplications by MAT and not by the transposed matrix MAT^T. Therefore it could run a little better (-: It's getting better all the time :-> on parallel computers.6. QMR simplified [7,9], METHOD = 6, SMOOTHED <> 0
The Quasi-Minimal Residual (QMR) method behaves similar as BICO. In this implementation the look-ahead process for avoiding break-downs is omitted.7a. Bi-CGSTAB2 smoothed, METHOD = 7, SMOOTHED <> 0
The 2-dimensional optimizing Bi-Conjugate Gradient STABilized (Bi-CGSTAB2) method is a modification of Bi-CGSTAB that should be more stable. The residuals decrease monotonically.7b. Bi-CGSTAB2, METHOD = 7, SMOOTHED = 0
The residuals do not decrease monotonically during the iteration process of the non-smoothed variant of Bi-CGSTAB2.9. CG-AT (CG with matrix A*A^T), METHOD = 9
LINSOL computes the linear system (MAT)(MAT^T)x=b; you have only to hand over the matrix MAT to LINSOL. The matrix MAT must not be symmetric. The matrix (MAT)(MAT^T) then is symmetric; thus LINSOL chooses the CG method as solution process.>=200. Polyalgorithms [1]
The Polyalgorithms try to exploit the advantages of the methods PRES20, Bi-CGSTAB2, GMERR5 (GMERR20), AT-PRES and CG, GMERRS for symmetric matrices respectively. Exemplarily the idea will be expIained for the Polyalgorithm 200. It starts with the most efficient - but least robust - method PRES20. If PRES20 falls asleep, it switches to Bi-CGSTAB2. If Bi-CGSTAB2 does not converge fast enough, the Polyalgorithm changes to the "emergency exit" AT-PRES. The Polyalgorithm should be used if no detailed knowledge of an optimal iterative method exists. In 95% of all cases the Polyalgorithm is as fast as the best method in the set {PRES20,Bi-CGSTAB2,AT-PRES}.15. GMERR5 (General Minimal ERRor 5), METHOD = 15
GMERR5 is the generalization for arbitrary matrices of Fridman's algorithm with a restart after every 5th iteration step.16. GMERR20 (General Minimal ERRor 20), METHOD = 16
GMERR20 is the generalization for arbitrary matrices of Fridman's algorithm with a restart after every 20th iteration step.If you use the above mentioned methods, the matrix must not be stored in the symmetric storage scheme.
The smoothed Conjugate Gradient (CG) method is mathematically aequivalent to the GMRES method.100b. Classical CG, METHOD = 100, SMOOTHED = 0
101. GMERRS, METHOD = 101
GMERRS (General Minimal ERRor for Symmetric matrices) does not have to be restarted because the residuals being more than 2 iteration steps down from the actual iteration step must be zero. GMERRS should be competitive with GMRES for symmetric matrices.
W. Schoenauer, Scientific Computing on Vector Computers, North-Holland, 1987, Amsterdam, New York, Oxford, Tokyo.[2]
R. Fletcher, Conjugate Gradient Methods for Indefinite Systems, Proc. Dundee Biennial Conf. on Num. Anal., 1975, ed. G.A. Watson, pp. 73-89, Springer-Verlag.[3]
H.A. van der Vorst, BI-CGSTAB: A Fast and Smoothly Converging Variant of BI-CG for the Solution of Nonsymmetric Linear Systems, SIAM J. Sci. Stat. Comp., 1992, Vol. 13, No. 2, pp. 631-644.[4]
W. Schoenauer, M. Schlichte, R. Weiss, Wrong Ways and Promising Ways towards Robust and Efficient Iterative Linear Solvers, Advances in Computer Methods for Partial Differential Equations VI, Publ. IMACS, 1987, pp. 7-14.[5]
E.J. Craig, The N-Step Iteration Procedures, Math. Phys., 1955, Vol. 34, pp. 64--73.[6]
P. Sonneveld, CGS, A fast Lanczos-type Solver for nonsymmetric linear Systems, SIAM J. Sci. Stat. Comp., 1989, Vol. 10, No.1, pp. 36-52.[7]
R.W. Freund and N.M. Nachtigal, QMR: a Quasi-Minimal Residual Method for Non-Hermitian Linear Systems, Numer. Math., 1991, Vol. 60, pp. 315--339.,[8]
R. Weiss, Properties of Generalized Conjugate Gradient Methods, Num. Lin. Alg. with Appl., 1994, Vol. 1, No. 1, pp. 45--63.[9]
L. Zhou and H.F. Walker, Residual Smoothing Techniques for Iterative Methods, SIAM J. Sci. Comput., 1994, Vol. 15, No. 2, pp. 297--312.
Consulting by
Numerikforschung fuer Supercomputer
Rechenzentrum der Universitaet Karlsruhe
Am Zirkel 2
D-76128 Karlsruhe
Germany
Tel. : (+721) 608/4869
Fax. : (+721) 32550
e-mail : haefner@rz.uni-karlsruhe.de