call LSOLP(lmat,lindex,l,nproc,lsym,ia1,info,
MAT,x,b,index,rlin,ilin,lmatbk,tids,iconv,ierr)
integer, parameter :: ia2=7, nilin=50, nrlin=10
integer, intent(in) :: ia1,lmat,lindex,l,nproc
integer, intent(out) :: iconv,ierr
integer, intent(in) :: lmatbk(nproc),tids(nproc)
integer, intent(inout) :: index(lindex),info(ia1,ia2)
integer, intent(inout) :: ilin(nilin)
double precision, intent(inout) :: MAT(lmat),x(l)
double precision, intent(inout) :: b(l),rlin(nrlin)
logical, intent(in) :: lsym
lmat     integer, scalar, input, local
         Length of the (local) matrix array MAT. IMPORTANT: if (optim >= 10000), lmat must be enlarged by a factor > 1.

lindex   integer, scalar, input, local
         Length of the integer array index. IMPORTANT: if (optim >= 10000), lindex must have the same size as lmat.

l        integer, scalar, input, global
         Maximal number of unknowns on a processor (it must hold: l = max{lmatbk(i), i=1,..,nproc}).

nproc    integer, scalar, input, global   <<< *PAR
         Number of processors (processes).

lsym     logical, scalar, input, global
         Is .true. if the matrix MAT is symmetric.

ia1      integer, scalar, input, local
         First dimension of the matrix information array info. IMPORTANT: if (optim >= 10000), ia1 must be set >= l.

info     integer, array: info(ia1,ia2), input, local
         Information array for the matrix MAT (ia2 {= 7} is a constant).

MAT      double precision, array: MAT(lmat), in/out, local
         Matrix array.

x        double precision, array: x(l), in/out, local
         Solution vector.

b        double precision, array: b(l), in/out, local
         Right hand side.

index    integer, array: index(lindex), input, local
         Array of indices allowing indirect access to the matrix array MAT.

rlin     double precision, array: rlin(nrlin), input, global
         Information vector for double precision values (nrlin {= 10} is a constant).

         (1) = epslin, input, global
             epslin is the relative stopping factor for all iterative methods (corresponds to THRESHOLD in manual page LINSOL). The stopping test uses:
             lcond1 = (||(preconditioned) smoothed residuum||_2 less or equal than epslin * ||(preconditioned) r.h.s.||_2);
             noprec=1: lcond2 = (||original smoothed residuum||_max less or equal than epslin * ||original r.h.s.||_max);
             noprec=0: lcond2 = (||(preconditioned) smoothed residuum||_max less or equal than epslin * ||(preconditioned) r.h.s.||_max).
             If (lcond1 .and. lcond2), the iteration is stopped. If (lcond1 .and. .not. lcond2), a new stopping criterion is computed.
         (2) = threshold1, input, global
             threshold1 is the first threshold value within the overall Gauss algorithm; below this value all data are removed when the matrix MAT is copied into the preconditioning matrix (corresponds to GAUSS_THRES1 in manual page LINSOL).
         (3) = threshold2, input, global
             threshold2 is the second threshold value within the overall Gauss algorithm; below this value all data are removed when the buffered matrix is stored back during the factorization process (corresponds to GAUSS_THRES2 in manual page LINSOL). If threshold2 = 1.0 is set, data are only stored back to the factorized matrix if they are also elements of the original preconditioning matrix (pattern storing), i.e. the memory request of the preconditioned matrix is as high as that of the original preconditioning matrix. If msprec = {63,66,69} is set, full LU factorization is performed in the buffer; >>> if msprec = {72,75,78} is set, the factorization is only performed on the pattern of the original preconditioning matrix <<< IMPLEMENTED ONLY FOR 1 PROCESSOR!
         (4) = gaussfac, input, global
             gaussfac specifies the fill-in factor for the preconditioning by GAUSS. A positive value means that the fill-in factor is determined automatically; if this value is too large for an allocation of the preconditioning array, the variable gaussfac is used for the allocation. If gaussfac is less than zero, ABS(gaussfac) is immediately used for the allocation of the preconditioning arrays. For example, a value of -2.0 means that the preconditioning matrix can be enlarged by a factor of 2.0 by fill-in (corresponds to GAUSS_FACTOR in manual page LINSOL).
         (5) = memfac, input, global
             memfac specifies the threshold value for packing of full storage patterns. A value of 0.66 means that all vector terms of type {2,7,8} are packed and stored as vector terms of type {3,9,11} if less than 66 % of the values of the vector terms are nonzero (corresponds to MEMORY_FACTOR in manual page LINSOL) <<< NOT YET IMPLEMENTED!
         (6) = ldrop, input, global
             ldrop is a threshold value that reduces the factorization time by omitting update terms under the following conditions: only indices of the elimination coefficients with corresponding absolute elements greater than ldrop are used to update the factorization matrix (this holds for msprec = {72,75,78}); only rows of the actual factorization window with elements of and below the pivot column greater than ldrop are updated (this holds for msprec = {63,66,69,72,75,78}). ldrop corresponds to LDROP_FACTOR in manual page LINSOL.
         (9) = internal use, global
             Global size of preconditioning matrix L, if nproc > 1.
         (10) = internal use, global
             Global size of preconditioning matrix U, if nproc > 1.
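The two-part stopping test driven by epslin can be sketched as follows. This is an illustrative Python model of the lcond1/lcond2 criterion described above, not LINSOL code; the function name and the use of plain lists are assumptions made for clarity.

```python
import math

def stopping_test(res_smoothed, rhs, epslin):
    """Illustrative model of the lcond1/lcond2 stopping test.

    res_smoothed : smoothed residual vector (preconditioned or original,
                   depending on noprec); rhs : corresponding right hand side.
    """
    norm2 = lambda v: math.sqrt(sum(x * x for x in v))  # ||.||_2
    normmax = lambda v: max(abs(x) for x in v)          # ||.||_max
    lcond1 = norm2(res_smoothed) <= epslin * norm2(rhs)
    lcond2 = normmax(res_smoothed) <= epslin * normmax(rhs)
    # The iteration stops only when both conditions hold; if only lcond1
    # holds, a new stopping criterion is computed instead.
    return lcond1 and lcond2

# A residual that is small in both norms relative to the r.h.s. passes:
print(stopping_test([1e-8, -2e-8], [1.0, 2.0], 1e-6))   # True
```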
ilin     integer, array: ilin(nilin), in/out, global
         Information vector (nilin {= 50} is a constant).

         (1) = ms, input, global
             Method selection:
             ms = 1   : PRES20
             ms = 2   : BICO
             ms = 3   : Bi-CGSTAB
             ms = 4   : AT-PRES
             ms = 5   : CGS
             ms = 6   : QMR-Simulator
             ms = 7   : Bi-CGSTAB2
             ms = 9   : CG-AT for non-symmetric matrices
             ms = 15  : GMERR5
             ms = 16  : GMERR20
             ms = 100 : Classical CG for symmetric matrices
             ms = 101 : GMERRS for symmetric matrices
             ms = 200 : Polyalgorithm PRES20 -> Bi-CGSTAB2 -> AT-PRES
             ms = 201 : Polyalgorithm PRES20 <-> Bi-CGSTAB2 <-> AT-PRES
             ms = 202 : Polyalgorithm PRES20 -> Bi-CGSTAB2 -> GMERR5 -> AT-PRES
             ms = 203 : Polyalgorithm PRES20 -> Bi-CGSTAB2 -> GMERR5 <-> AT-PRES
             ms = 204 : Polyalgorithm PRES20 -> Bi-CGSTAB2 -> GMERR20 -> AT-PRES
             ms = 210 : Polyalgorithm GMERR5 -> AT-PRES
             ms = 211 : Polyalgorithm GMERR5 <-> AT-PRES
             ms = 212 : Polyalgorithm GMERR20 -> AT-PRES
             ms = 220 : Polyalgorithm GMERRS -> CG for symmetric matrices
             ms = 221 : Polyalgorithm GMERRS <-> CG for symmetric matrices
             ms = 999 : {TEST-INTERFACE}
         (2) = itmax, input, global
             Maximal number of iteration steps (corresponds to ITERATION_MAXIMUM in manual page LINSOL).
         (3) = nmsg, in/out, global   <<< *PAR
             Message counter counting the number of communication units.
         (4) = nrhs, input, global
             Number of right hand sides. A value of -1 means that the right hand side is computed automatically (corresponds to NUMBER_RHS in manual page LINSOL).
         (5) = nvt, input, local
             Number of used vector terms.
         (6) = isprec, in/out, global
             Indicates whether the matrix MAT is already normalized and/or preconditioned and/or reduced by the pre-forward elimination routine (corresponds to MATRIX_NORMALIZED in manual page LINSOL). The variable is automatically increased if the matrix has been normalized/preconditioned/reduced in a previous call of LSOLP.
             isprec = 000  : matrix is not normalized and not preconditioned,
             isprec = 001  : matrix is normalized and not preconditioned,
             isprec = 010  : matrix is not normalized and preconditioned,
             isprec = 011  : matrix is normalized and preconditioned,
             isprec = 1yz  : matrix is additionally reduced by a pre-forward elimination step,
             isprec = 1xyz : matrix is additionally bandwidth-optimized (x,y,z elem {0,1}),
             isprec = 2000 : use of the permutation vector created in a former bandwidth optimization process; can be used if the structure of the matrix does not change!
         (7) = ibuf_mb, input, global
             ibuf_mb specifies the buffer size in MB for temporary buffers (corresponds to BUFFER_SIZE in manual page LINSOL).
         (8) = msprec, input, global
             Selects the method of preconditioning (corresponds to PRECONDITIONING_METHOD in manual page LINSOL); works in connection with the iterative methods PRES20, BICO and Bi-CGSTAB2 if the matrix is not symmetric, and in connection with the classical CG method if the matrix is symmetric (the Cholesky algorithm works up to now only on 1 processor). (The preconditioning matrices are stored as one-dimensional arrays in prec and accessed via the information array iprec.)
             msprec = 0  : no preconditioning,
             msprec = 63 : preconditioning from the middle by (I)LU algorithm [(Incomplete) Gauss algorithm] or (I)L[T]L [(Incomplete) Cholesky algorithm]; this option implies optim = 111000,
             msprec = 66 : preconditioning from the right by (I)LU algorithm [(Incomplete) Gauss algorithm] or (I)L[T]L [(Incomplete) Cholesky algorithm]; this option implies optim = 111000,
             msprec = 69 : preconditioning from the left by (I)LU algorithm [(Incomplete) Gauss algorithm] or (I)L[T]L [(Incomplete) Cholesky algorithm]; this option implies optim = 111000,
             msprec = 72 : preconditioning from the middle by (I)LU algorithm on indexed data; this option implies optim = 111000,
             msprec = 75 : preconditioning from the right by (I)LU algorithm on indexed data; this option implies optim = 111000,
             msprec = 78 : preconditioning from the left by (I)LU algorithm on indexed data; this option implies optim = 111000.
         (9) = noprec, input, global
             noprec = 0 : normalized system is considered for checking of the stopping criterion,
             noprec = 1 : original system is considered for checking of the stopping criterion.
             (noprec corresponds to CHECK_SYSTEM in manual page LINSOL.)
         (10) = nmvm, output, global
             Number of computed matrix-vector multiplications.
         (11) = internal use, output, global
             Returns the number of steps used by the last called method.
         (12) = lout, input, local
             Unit number of the standard output file (default is 6) (corresponds to OUTPUT_DEVICE in manual page LINSOL).
         (13) = idoku, input, global
             Output control (corresponds to VERBOSE_LEVEL in manual page LINSOL).
             idoku = 0    : print only error messages,
             idoku = 1    : print further information, but no output during iteration,
             idoku = i>1  : output after every i iteration steps on the processor with logical process(or) number 1,
             idoku = -i<0 : same as for idoku>0, but with output on all processors.
         (14) = isave, input, global
             isave = 0 : the arrays prec, iprec and perm are not saved beyond the routine LSOLP, i.e. the preconditioning matrix, the preconditioning index and information arrays, the permutation vector created by bandwidth optimization, and any matrix entries shifted to the right hand side by the pre-forward elimination routine will not be available in the next call of LSOLP,
             isave = 1 : all above-mentioned arrays (prec, iprec and perm) are saved beyond the routine LSOLP,
             isave = 2 : only the permutation vector perm is saved beyond the routine LSOLP; you can set ilin(6)=2000 before calling LSOLP the next time if the structure of your matrix does not change!
         (15) = msnorm, input, global
             Selects the method of normalization. Note: "A" stands for a usually stored matrix (matrix MAT is stored as a one-dimensional array and accessed via array info) (corresponds to NORMALIZATION_METHOD in manual page LINSOL).
             msnorm = 0 : no normalization,
             msnorm = 1 : normalization by row sum (prec(i) = sign A_ii / sum abs(A_ij), j=1,l),
             msnorm = 2 : normalization by main diagonal elements (prec(i) = 1/A_ii),
             msnorm = 3 : normalization by square sum (prec(i) = 1/sqrt(sum A_ij**2, j=1,l)),
             msnorm = 4 : Frobenius normalization (prec(i) = A_ii / sum A_ij**2, j=1,l),
             msnorm = 11-14 : same as 1-4, but with the square root of the preconditioning matrix applied from left and right (automatically selected if the matrix MAT is symmetric).
         (16) = startx, input, global
             startx <> 4711 : iteration starts from the zero vector,
             startx = 4711  : an initial guess is given in x.
             (startx corresponds to INITGUESS in manual page LINSOL.)
         (17) = myproc, input, local   <<< **PAR
             Logical process(or) number.
         (18) = smooth, input, global
             smooth = 0  : iteration returns the non-smoothed solution,
             smooth <> 0 : iteration returns the smoothed solution.
             (smooth corresponds to SMOOTHED in manual page LINSOL.)
         (19) = misc, input, global
             misc = 00000 : Standard LSOLP,
             misc = 00001 : printing and checking - as far as possible - of vector terms after all optimization steps (see optim = ilin(20)) (you can set ilin(41)); Note: optim values {1|2|3}11000 can modify the matrix,
             misc = 00002 : printing and checking - as far as possible - of vector terms adapted to parallel processing after all optimization steps (see optim = ilin(20)) and normalization and cache optimization (you can set ilin(41)); Note: optim values {1|2|3}11100 can modify the matrix,
             misc = 00003 : misc = 00001 plus misc = 00002,
             misc = 00010 : check if recursions emerge (only of use on vector computers) <not yet implemented>,
             misc = 00100 : storing of matrix (matrices) MAT in LINSOL format after all optimization steps (you can set ilin(43)); Note: optim values {1|2|3}11000 can modify the matrix,
             misc = 00200 : storing of matrix (matrices) MAT in binary LINSOL format after all optimization steps (you can set ilin(43)); Note: optim values {1|2|3}11000 can modify the matrix,
             misc = 00300 : storing of normalized matrix (matrices) MAT adapted to parallel processing in LINSOL format (you can set ilin(43)); Note: optim values {1|2|3}11000 can modify the matrix,
             misc = 00400 : storing of normalized matrix (matrices) MAT adapted to parallel processing in binary LINSOL format (you can set ilin(43)); Note: optim values {1|2|3}11000 can modify the matrix,
             misc = 01000 : output of error,
             misc = 10000 : print residuals (and errors, if the output of errors is enabled) in file plot_<date>_<time> (you can set ilin(44)).
             (misc corresponds to MISC in manual page LINSOL.)
         (20) = optim, input, global
             optim = 0000000 : no optimization and no statistics,
             optim = 0000001 : time statistics,
             optim = 0000010 : printing of statistics regarding the data structures,
             optim = 0000100 : safe cache reuse, i.e. arrays MAT and INDEX do not change,
             optim = 0000200 : unsafe cache reuse, i.e. arrays MAT and INDEX do change,
             optim = 0001000 : remove zero entries in matrix MAT,
             optim = 0010000 : restore matrix MAT completely in storage pattern 'packed row',
             optim = 0100000 : search single entries in rows and store them on the main diagonal by permuting the matrix MAT rowwise; if the matrix is not symmetric, the matrix size is reduced by shifting entries of known solution components to the right hand side,
             optim = 0200000 : bandwidth optimizer (BWO),
             optim = 0300000 : search single entries in rows (see above) + BWO,
             optim = 1000000 : optimization of data structures; matrix will be restored in storage patterns 'full diagonal', 'packed diagonal' and 'starry sky' <<< NOT YET IMPLEMENTED!
             (optim corresponds to OPTIMIZE in manual page LINSOL.)
         (21) = runacc, input, global
             runacc = 0 : version with minimum storage requirements is used,
             runacc = 1 : runtime is accelerated, but additional storage is allocated.
             (runacc corresponds to RUNTIME_ACCELERATION in manual page LINSOL.)
         (22) = bwoalg, input, global
             Selects the bandwidth optimization algorithm (corresponds to BANDW_ALG in manual page LINSOL).
             bwoalg = 1 : Gibbs-Poole-Stockmeyer,
             bwoalg = 2 : fast single pass algorithm,
             bwoalg = 3 : reverse Cuthill/McKee,
             bwoalg = 4 : reverse Cuthill/McKee (no iteration).
         (30) = internal use, output, global
             Global size of matrix MAT.
         (31) = internal use, output, global
             Global size of index array index.
         (32) = internal use, output, global
             Global size of first dimension of information array info.
         (33) = internal use, output, local
             Local size of matrix MAT.
         (34) = internal use, output, local
             Local size of index array index.
         (35) = internal use, output, local
             Local size of first dimension of information array info.
         (36) = internal use, output, global
             Global dimension of the linear system.
         (37) = internal use, output, global
             Size of preconditioning matrix L, if nproc == 1.
         (38) = internal use, output, global
             Size of preconditioning matrix U, if nproc == 1.
         (39) = internal use (iertag), output, local
         (41) = vtunit, input, global
             I/O unit for printing of vector terms (should be greater than 20 and less than 100 - if not set, printing goes to standard output).
         (43) = matout, input, global
             I/O unit for storing of matrix MAT (should be greater than 20 and less than 100 - if not set, printing goes to unit 23).
         (44) = resout, input, global
             I/O unit for printing of the residuals (and errors) (should be greater than 20 and less than 100 - if not set, printing goes to unit 24).
         (45) = preout, input, global
             I/O unit for printing of the matrix coefficients shifted to the right hand side by the pre-forward elimination step (should be greater than 20 and less than 100 - if not set, printing goes to unit 25).
         (49) = internal use, input, global
         (50) = internal use, output, global
             Is -1 if the stand-alone version is used with optim=10 and nproc=1 or infmt=1; is -2 after the call of LSPCHK in routine LSOLP.
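The normalization formulas for msnorm = 1..3 can be illustrated on a small dense matrix. This is a hedged sketch of the formulas quoted above; the function name is hypothetical, the "sign A_ii / row sum" reading of msnorm = 1 is an assumption, and MAT's actual packed storage via info/index is ignored for clarity.

```python
import math

def norm_factors(A, msnorm):
    """Row scaling factors prec(i) for msnorm = 1, 2, 3 on a dense toy matrix A."""
    n = len(A)
    if msnorm == 1:    # row sum: prec(i) = sign(A_ii) / sum_j |A_ij|
        return [math.copysign(1.0, A[i][i]) / sum(abs(a) for a in A[i])
                for i in range(n)]
    if msnorm == 2:    # main diagonal: prec(i) = 1 / A_ii
        return [1.0 / A[i][i] for i in range(n)]
    if msnorm == 3:    # square sum: prec(i) = 1 / sqrt(sum_j A_ij**2)
        return [1.0 / math.sqrt(sum(a * a for a in A[i])) for i in range(n)]
    raise ValueError("only msnorm = 1, 2, 3 are sketched here")

A = [[4.0, 1.0], [2.0, 5.0]]
print(norm_factors(A, 2))   # [0.25, 0.2]
```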
lmatbk   integer, array: lmatbk(nproc), input, global   <<< **PAR
         lmatbk(i) returns the number of unknowns on process(or) i.

tids     integer, array: tids(nproc), input, global   <<< **PAR
         tids defines the mapping of the logical process(or) numbers to the physical process IDs (see the section on parallel computers).

iconv    integer, output, global
         iconv indicates the behaviour of LSOLP.
         iconv = 1  : convergence,
         iconv = 2  : iteration reached the maximal number of matrix-vector multiplications (declared in ilin(2)),
         iconv = 3  : divergence,
         iconv = 4  : matrix is not suitable for LSOLP, e.g. a nonregular matrix,
         iconv = 99 : fatal error.

ierr     integer, output, local
         ierr is an error number with information about the error. ierr = ijk:
         i = 0,..,9 indicates the level of routines on which the error occurs,
         j = 0 means: wrong parameters,
         j = 1 means: error during normalization,
         j = 2 means: error during iteration process,
         j = 3 means: error during communication,
         k is a general error counter with 0 < k < 100.
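The composite error code ierr = ijk can be split into its three fields as sketched below. This is an illustration of the encoding described above, not LINSOL code; the helper name is hypothetical.

```python
def decode_ierr(ierr):
    """Split ierr = ijk into (routine level i, error type j, counter k)."""
    k = ierr % 100          # general error counter, 0 < k < 100
    j = (ierr // 100) % 10  # 0: wrong parameters, 1: normalization,
                            # 2: iteration process, 3: communication
    i = ierr // 1000        # level of routines on which the error occurs
    return i, j, k

print(decode_ierr(2305))   # (2, 3, 5): level 2, communication error, counter 5
```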
For sufficient diagonal dominance of the matrix this is a quickly converging method. If the norm of the residual decreases only in the first iteration steps, choose one of the other methods. The residuals decrease monotonically.

2a. BICO [1] smoothed, ilin(1)=2, ilin(18)<>0
(BiConjugate Gradients)
BICO is more robust than PRES20 and converges quickly even if the matrix is not diagonally dominant, but BICO uses two matrix-vector multiplications (MVM) per iteration step, one by the matrix MAT and one by its transposed matrix MAT^T. The residuals decrease monotonically.

2b. BCG [2] (BiConjugate Gradients), ilin(1)=2, ilin(18)=0
BCG is more robust than PRES20 and converges quickly even if the matrix is not diagonally dominant, but BCG uses two matrix-vector multiplications (MVM) per iteration step, one by the matrix MAT and one by its transposed matrix MAT^T. The residuals oscillate heavily; therefore, use BICO or QMR.

3a. Bi-CGSTAB smoothed [3,8], ilin(1)=3, ilin(18)<>0
The Bi-Conjugate Gradient STABilized (Bi-CGSTAB) method is a modification of CGS that should be more stable. The residuals decrease monotonically.

3b. Bi-CGSTAB [3], ilin(1)=3, ilin(18)=0
This is a modification of CGS that should be more stable.

4a. AT-PRES = CGNR [4], ilin(1)=4, ilin(18)<>0
The A_Transposed-Pseudo_RESidual (AT-PRES) method is equivalent to the CG Normal equations Residual-minimizing (CGNR) method and should only be used if none of the methods BICO, CGS or Bi-CGSTAB converges. It may converge slowly, but it is very robust. The residuals decrease monotonically.

4b. CGNE [5], ilin(1)=4, ilin(18)=0
(CG Normal equation Error-minimizing)
CGNE should only be used if none of the methods BICO, CGS or Bi-CGSTAB converges. It may converge slowly, but it is very robust. The errors decrease monotonically.

5a. CGS smoothed, ilin(1)=5, ilin(18)<>0
(Conjugate Gradient Squared)
The smoothed CGS [6,8] is as robust as BICO. An advantage of CGS compared to BICO is that CGS only uses matrix-vector multiplications by MAT and not by the transposed matrix MAT^T; therefore it may run somewhat better on parallel computers. The residuals decrease monotonically.

5b. CGS [6], ilin(1)=5, ilin(18)=0
(Conjugate Gradient Squared)
CGS is as robust as BICO. An advantage of CGS compared to BICO is that CGS only uses matrix-vector multiplications by MAT and not by the transposed matrix MAT^T; therefore it may run somewhat better on parallel computers.

6. QMR simplified [7,9], ilin(1)=6, ilin(18)<>0
The Quasi-Minimal Residual (QMR) method behaves similarly to BICO. In this implementation the look-ahead process for avoiding breakdowns is omitted.

7a. Bi-CGSTAB2 smoothed, ilin(1)=7, ilin(18)<>0
The 2-dimensional optimizing Bi-Conjugate Gradient STABilized (Bi-CGSTAB2) method is a modification of Bi-CGSTAB that should be more stable. The residuals decrease monotonically.

7b. Bi-CGSTAB2, ilin(1)=7, ilin(18)=0
The residuals do not decrease monotonically during the iteration process of the non-smoothed variant of Bi-CGSTAB2.

9. CG-AT (CG with matrix A*A^T), ilin(1)=9
LSOLP computes the linear system (MAT)(MAT^T)x=b; you only have to hand over the matrix MAT to LSOLP. The matrix MAT need not be symmetric; the matrix (MAT)(MAT^T) then is symmetric, thus LSOLP chooses the CG method as solution process.

>=200. Polyalgorithms [1]
The polyalgorithms try to exploit the advantages of the methods PRES20, Bi-CGSTAB2, GMERR5 (GMERR20) and AT-PRES, or of CG and GMERRS for symmetric matrices, respectively. As an example, the idea is explained for polyalgorithm 200: it starts with the most efficient - but least robust - method, PRES20. If PRES20 stagnates, it switches to Bi-CGSTAB2. If Bi-CGSTAB2 does not converge fast enough, the polyalgorithm changes to the "emergency exit" AT-PRES. The polyalgorithm should be used if no detailed knowledge of an optimal iterative method exists. In 95% of all cases the polyalgorithm is as fast as the best method in the set {PRES20, Bi-CGSTAB2, AT-PRES}.

15. GMERR5 (General Minimal ERRor 5), ilin(1)=15
GMERR5 is the generalization to arbitrary matrices of Fridman's algorithm with a restart after every 5th iteration step.

16. GMERR20 (General Minimal ERRor 20), ilin(1)=16
GMERR20 is the generalization to arbitrary matrices of Fridman's algorithm with a restart after every 20th iteration step. If you use the above-mentioned methods, the matrix must not be stored in the symmetric storage scheme.

100a. Classical CG smoothed, ilin(1)=100, ilin(18)<>0
The smoothed Conjugate Gradient (CG) method is mathematically equivalent to the GMRES method.

100b. Classical CG, ilin(1)=100, ilin(18)=0

101. GMERRS, ilin(1)=101
GMERRS (General Minimal ERRor for Symmetric matrices) does not have to be restarted because the residuals lying more than 2 iteration steps below the actual iteration step must be zero. GMERRS should be competitive with GMRES for symmetric matrices.
[1] W. Schoenauer, Scientific Computing on Vector Computers, North-Holland, Amsterdam, New York, Oxford, Tokyo, 1987.
[2] R. Fletcher, Conjugate Gradient Methods for Indefinite Systems, Proc. Dundee Biennial Conf. on Num. Anal. 1975, ed. G.A. Watson, pp. 73-89, Springer-Verlag.
[3] H.A. van der Vorst, BI-CGSTAB: A Fast and Smoothly Converging Variant of BI-CG for the Solution of Nonsymmetric Linear Systems, SIAM J. Sci. Stat. Comp., Vol. 13, No. 2, pp. 631-644, 1992.
[4] W. Schoenauer, M. Schlichte, R. Weiss, Wrong Ways and Promising Ways towards Robust and Efficient Iterative Linear Solvers, Advances in Computer Methods for Partial Differential Equations VI, Publ. IMACS, pp. 7-14, 1987.
[5] E.J. Craig, The N-Step Iteration Procedures, J. Math. Phys., Vol. 34, pp. 64-73, 1955.
[6] P. Sonneveld, CGS, a Fast Lanczos-type Solver for Nonsymmetric Linear Systems, SIAM J. Sci. Stat. Comp., Vol. 10, No. 1, pp. 36-52, 1989.
[7] R.W. Freund and N.M. Nachtigal, QMR: a Quasi-Minimal Residual Method for Non-Hermitian Linear Systems, Numer. Math., Vol. 60, pp. 315-339, 1991.
[8] R. Weiss, Properties of Generalized Conjugate Gradient Methods, Num. Lin. Alg. with Appl., Vol. 1, No. 1, pp. 45-63, 1994.
[9] L. Zhou and H.F. Walker, Residual Smoothing Techniques for Iterative Methods, SIAM J. Sci. Comput., Vol. 15, No. 2, pp. 297-312, 1994.
Consulting by:
Numerikforschung fuer Supercomputer
Rechenzentrum der Universitaet Karlsruhe
Am Zirkel 2
D-76128 Karlsruhe
Germany
Tel.   : (+721) 608/4869
Fax    : (+721) 32550
e-mail : haefner@rz.uni-karlsruhe.de