Constraint solvers

Constraint solvers are used to find an approximate solution to the constraint equations

\[\mathbf{C}\left(\mathbf{U},\mathbf{R}\right)=\mathbf{0}\]

expressing joints and contact conditions, together with the local dynamics equations

\[\mathbf{U}=\mathbf{B}+\mathbf{WR}\]

The merit function, used as one of the termination conditions by all solvers, and the algorithms of the Solfec-1.0 solvers themselves are described in the sections below.

Merit function

As discussed on the basics page, at every time step an implicit equation \(\mathbf{C}\left(\mathbf{R}\right)=\mathbf{0}\) is solved. This solution is approximate. In order to express its accuracy as a scalar value, we formulate \(\mathbf{C}\left(\mathbf{R}\right)\) in terms of velocity (see the non-smooth velocity equation formulation) and use

\[g\left(\mathbf{R}\right)=\sum_{\alpha}\left\langle \mathbf{W}_{\alpha\alpha}^{-1}\mathbf{C}_{\alpha}\left(\mathbf{R}\right), \mathbf{C}_{\alpha}\left(\mathbf{R}\right)\right\rangle /\sum_{\alpha}\left\langle \mathbf{W}_{\alpha\alpha}^{-1}\mathbf{B}_{\alpha}, \mathbf{B}_{\alpha}\right\rangle\]

in order to approximately measure the relative amount of “energy” due to an inexact satisfaction of the constraints. The denominator corresponds to the kinetic energy of the relative free motion, hence \(g\left(\mathbf{R}\right)\) is the ratio of the “spurious energy” to the nominal “energy available at the constraints”. Since inverting \(\mathbf{W}\) would be impractical, or impossible when it is singular, we use only the diagonal blocks, which are always positive definite. To recapitulate, in short

\[g\left(\mathbf{R}\right)\simeq\frac{\mbox{"spurious energy due to inaccurate solution"}}{\mbox{"free energy available at the constraints"}}\]

This merit function is used as one of the stopping criteria for the solvers described below.
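As an illustration, a minimal NumPy sketch of this merit function might read as follows (the data layout and function name are hypothetical; the actual implementation operates on Solfec's internal constraint structures):

    import numpy as np

    def merit_function(W_diag, C, B):
        """Hedged sketch of the merit function g(R).

        W_diag : list of 3x3 diagonal blocks W_aa (positive definite)
        C      : list of 3-vectors C_a(R), velocity-based constraint violations
        B      : list of 3-vectors B_a, free local velocities
        """
        # numerator: "spurious energy" of the inexact solution
        num = sum(np.dot(np.linalg.solve(Waa, Ca), Ca) for Waa, Ca in zip(W_diag, C))
        # denominator: "free energy available at the constraints"
        den = sum(np.dot(np.linalg.solve(Waa, Ba), Ba) for Waa, Ba in zip(W_diag, B))
        return num / den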

Gauss–Seidel solver

The equations of local dynamics read

\[\mathbf{U}_{\alpha}=\mathbf{B}_{\alpha}+\sum_{\beta}\mathbf{W}_{\alpha\beta}\mathbf{R}_{\beta}\]

where \(\mathbf{U}_{\alpha}\) are relative velocities and \(\mathbf{R}_{\alpha}\) are reactions at constraint points. \(\mathbf{U}_{\alpha}\), \(\mathbf{R}_{\alpha}\), \(\mathbf{B}_{\alpha}\) are 3–vectors, while \(\mathbf{W}_{\alpha\beta}\) are \(3\times3\) matrix blocks. Each constraint equation can be formulated as

\[\mathbf{C}_{\alpha}\left(\mathbf{U}_{\alpha},\mathbf{R}_{\alpha}\right)=\mathbf{0}\]

or in other words

(91)\[\mathbf{C}_{\alpha}\left(\mathbf{B}_{\alpha}+\sum_{\beta}\mathbf{W}_{\alpha\beta}\mathbf{R}_{\beta},\mathbf{R}_{\alpha}\right)=\mathbf{0}\]

A series of 3–component non–linear equations (91), one for each constraint, can be approximately solved using the serial Gauss–Seidel scheme below

SERIAL_GS \(\left(Constraints,\epsilon,\gamma\right)\)
1 \(\,\,\) do
2 \(\,\,\,\,\,\,\) for each \(\alpha\in Constraints\) do
3 \(\,\,\,\,\,\,\,\,\,\,\) \(\mathbf{S}_{\alpha}=\mathbf{R}_{\alpha}\)
4 \(\,\,\,\,\,\,\,\,\,\,\) find \(\mathbf{R}_{\alpha}\) such that \(\mathbf{C}_{\alpha}\left(\mathbf{B}_{\alpha}+\sum_{\beta}\mathbf{W}_{\alpha\beta}\mathbf{R}_{\beta},\mathbf{R}_{\alpha}\right)=\mathbf{0}\)
5 \(\,\,\,\,\,\,\,\,\,\,\,\,\,\,\) assuming \(\mathbf{R}_{\beta}=\mbox{constant}\) for \(\beta\ne\alpha\)
6 \(\,\,\) while \(\left\Vert \mathbf{S}-\mathbf{R}\right\Vert /\left\Vert \mathbf{R}\right\Vert >\epsilon\) and \(g\left(\mathbf{R}\right)>\gamma\)
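For illustration, a hedged Python sketch of this iteration is given below; solve_block is a hypothetical, problem-dependent routine that solves the 3-component diagonal constraint equation for a single constraint:

    import numpy as np

    def serial_gs(R, B, W, solve_block, merit, epsilon, gamma):
        """Sketch of SERIAL_GS. R and B map a constraint index to a 3-vector;
        W maps index pairs (a, b) to 3x3 blocks (only nonzero blocks stored);
        solve_block(a, B_loc) returns R_a solving C_a(B_loc + W_aa R_a, R_a) = 0."""
        while True:
            S = {a: R[a].copy() for a in R}                 # previous reactions
            for a in R:
                # most recent off-diagonal contributions (Gauss-Seidel)
                B_loc = B[a] + sum((W[a, b] @ R[b] for b in R
                                    if b != a and (a, b) in W), np.zeros(3))
                R[a] = solve_block(a, B_loc)                # diagonal block problem
            dR = np.sqrt(sum(np.sum((S[a] - R[a]) ** 2) for a in R))
            nR = np.sqrt(sum(np.sum(R[a] ** 2) for a in R))
            if dR <= epsilon * nR or merit(R) <= gamma:     # stopping criteria
                return R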

Algorithm SERIAL_GS is quite simple: the diagonal block problems are solved until the reaction change becomes small enough. The Gauss–Seidel paradigm corresponds to the fact that the most recent off–diagonal reactions are used when solving each diagonal problem. Because of that, a perfectly parallel implementation is not possible: reactions are updated in a sequence. We can nevertheless relax the need for sequential processing. Perhaps the most scalable Gauss–Seidel approach to date was devised by Adams [1]. Although it was originally used as a multigrid smoother, the core idea can just as well be applied in our context. Each processor owns a subset of (internal) constraints \(Q_{i}\), where \(i=1,2,...,n\) are the processor indices. The local velocity update can therefore be rewritten as

\[\mathbf{U}_{\alpha}=\mathbf{B}_{\alpha}+\sum_{\beta\in Q_{i}}\mathbf{W}_{\alpha\beta}\mathbf{R}_{\beta}+\sum_{\beta\notin Q_{i}}\mathbf{W}_{\alpha\beta}\mathbf{R}_{\beta}\]

Some of the \(\mathbf{W}_{\alpha\beta}\) blocks and reactions \(\mathbf{R}_{\beta}\) correspond to the (external) constraints stored on other processors (\(\beta\notin Q_{i}\)). Let us denote the set of corresponding reaction indices by \(P_{i}\). That is

\[P_{i}=\left\{ \beta:\exists\mathbf{W}_{\alpha\beta}\ne\mathbf{0}\mbox{ and }\alpha\in Q_{i}\mbox{ and }\beta\notin Q_{i}\right\}\]

For each \(\beta\in P_{i}\) we know the index \(cpu\left(\beta\right)\) of the processor storing the constraint \(\beta\). For processor \(i\) we can then define the set of adjacent processors as follows

\[adj\left(i\right)=\left\{ cpu\left(\beta\right):\beta\in P_{i}\right\}\]

When updating reactions, a processor needs to communicate only with its adjacent processors. We are going to optimize the pattern of this communication by coloring the processors: each processor is assigned a color such that no two adjacent processors have the same color. A simple coloring method is summarized in Algorithm COLOR below

COLOR \(\left(\right)\)
1 \(\,\,\) for \(i=1,...,n\) do \(color\left[i\right]=0\)
2 \(\,\,\) for \(i=1,...,n\) do
3 \(\,\,\,\,\,\,\) do
4 \(\,\,\,\,\,\,\,\,\,\,\) \(color\left[i\right]=color\left[i\right]+1\)
5 \(\,\,\,\,\,\,\) while for any \(j\in adj\left(i\right)\) there holds \(color\left[i\right]=color\left[j\right]\)
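A hedged Python sketch of this greedy coloring follows (processor indices are 0-based here, and adj maps each processor to its set of neighbors):

    def color_processors(adj, n):
        """Greedy sketch of Algorithm COLOR; 0 means 'not yet colored'."""
        color = [0] * n
        for i in range(n):
            c = 1
            # increment until no adjacent processor holds the same color
            while any(color[j] == c for j in adj[i]):
                c += 1
            color[i] = c
        return color

For example, a chain of four processors, adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}, receives the two colors 1, 2, 1, 2.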

We try to assign as few colors as possible. We then split the index sets \(Q_{i}\) as follows

\[Top_{i}=\left\{ \alpha:\forall\mathbf{W}_{\alpha\beta}:\beta\in P_{i}\wedge color\left[cpu\left(\beta\right)\right]<color\left[i\right]\right\}\]
\[Bottom_{i}=\left\{ \alpha:\forall\mathbf{W}_{\alpha\beta}:\beta\in P_{i}\wedge color\left[cpu\left(\beta\right)\right]>color\left[i\right]\right\}\]
\[Middle_{i}=\left\{ \alpha:\exists\mathbf{W}_{\alpha\beta}:\beta\in P_{i}\wedge\alpha\notin Top_{i}\cup Bottom_{i}\right\}\]
\[Inner_{i}=Q_{i}\setminus\left\{ Top_{i}\cup Bottom_{i}\cup Middle_{i}\right\}\]

The top constraints require communication only with processors of lower colors. The bottom constraints require communication only with processors of higher colors. The middle constraints require communication with both. The inner constraints require no communication. The inner reactions are further split into two sets

\[Inner_{i}=Inner1_{i}\cup Inner2_{i}\]

so that

(92)\[\left|Bottom_{i}\right|+\left|Inner2_{i}\right|=\left|Top_{i}\right|+\left|Inner1_{i}\right|\]
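As a sketch, this split can be realized by solving the balance condition for \(\left|Inner1_{i}\right|\): since \(\left|Inner2_{i}\right|=\left|Inner_{i}\right|-\left|Inner1_{i}\right|\), condition (92) gives \(\left|Inner1_{i}\right|=\left(\left|Inner_{i}\right|+\left|Bottom_{i}\right|-\left|Top_{i}\right|\right)/2\), clamped to a valid range when exact balance is unattainable:

    def split_inner(inner, top_size, bottom_size):
        """Split the list 'inner' into Inner1, Inner2 so that (92) holds
        as closely as integer set sizes permit."""
        n1 = (len(inner) + bottom_size - top_size) // 2
        n1 = max(0, min(len(inner), n1))   # clamp when sets are very unbalanced
        return inner[:n1], inner[n1:]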

The parallel Gauss–Seidel scheme is summarized in Algorithm PARALLEL_GS below. The presented version is simplified in that alternating forward and backward runs (in terms of constraint ordering) are not accounted for.

SWEEP \(\left(Set\right)\)
1 \(\,\,\) for each \(\alpha\in Set\) do
2 \(\,\,\,\,\,\,\) find \(\mathbf{R}_{\alpha}\) such that \(\mathbf{C}_{\alpha}\left(\mathbf{B}_{\alpha}+\sum_{\beta}\mathbf{W}_{\alpha\beta}\mathbf{R}_{\beta},\mathbf{R}_{\alpha}\right)=\mathbf{0}\)
3 \(\,\,\,\,\,\,\,\,\,\,\) assuming \(\mathbf{R}_{\beta}=\mbox{constant}\) for \(\beta\ne\alpha\)

LOOP \(\left(Set\right)\)
1 \(\,\,\) descending sort of \(\alpha\in Set\) based on \(\max\left(color\left[cpu\left(\beta\right)\right]\right)\) where \(\exists\mathbf{W}_{\alpha\beta}\)
2 \(\,\,\) for each ordered \(\alpha\in Set\) do
3 \(\,\,\,\,\,\,\) for each \(\beta\) such that \(\exists\mathbf{W}_{\alpha\beta}\) and \(color\left[cpu\left(\alpha\right)\right]<color\left[cpu\left(\beta\right)\right]\) do
4 \(\,\,\,\,\,\,\,\,\,\,\) if not received \(\left(\mathbf{R}_{\beta}\right)\) then receive \(\left(\mathbf{R}_{\beta}\right)\)
5 \(\,\,\,\,\,\,\) find \(\mathbf{R}_{\alpha}\) such that \(\mathbf{C}_{\alpha}\left(\mathbf{B}_{\alpha}+\sum_{\beta}\mathbf{W}_{\alpha\beta}\mathbf{R}_{\beta},\mathbf{R}_{\alpha}\right)=\mathbf{0}\)
6 \(\,\,\,\,\,\,\,\,\,\,\) assuming \(\mathbf{R}_{\beta}=\mbox{constant}\) for \(\beta\ne\alpha\)
7 \(\,\,\,\,\,\,\) send \(\left(\mathbf{R}_{\alpha}\right)\)
8 \(\,\,\) receive all remaining \(\mathbf{R}_{\beta}\)

PARALLEL_GS \(\left(\epsilon,\gamma\right)\)
1 \(\,\,\) COLOR \(\left(\right)\)
2 \(\,\,\) do
3 \(\,\,\,\,\,\,\) \(\mathbf{S}=\mathbf{R}\)
4 \(\,\,\,\,\,\,\) SWEEP \(\left(Top_{i}\right)\)
5 \(\,\,\,\,\,\,\) send \(\left(Top_{i}\right)\)
6 \(\,\,\,\,\,\,\) SWEEP \(\left(Inner2_{i}\right)\)
7 \(\,\,\,\,\,\,\) receive \(\left(Top_{i}\right)\)
8 \(\,\,\,\,\,\,\) LOOP \(\left(Middle_{i}\right)\)
9 \(\,\,\,\,\,\,\) SWEEP \(\left(Bottom_{i}\right)\)
10 \(\,\,\,\,\,\) send \(\left(Bottom_{i}\right)\)
11 \(\,\,\,\,\,\) SWEEP \(\left(Inner1_{i}\right)\)
12 \(\,\,\,\,\,\) receive \(\left(Bottom_{i}\right)\)
13 \(\,\) while \(\left\Vert \mathbf{S}-\mathbf{R}\right\Vert /\left\Vert \mathbf{R}\right\Vert >\epsilon\) and \(g\left(\mathbf{R}\right)>\gamma\)

In PARALLEL_GS we first process the \(Top_{i}\) set: a single sweep over the corresponding diagonal block problems is performed in line 4. We then send the computed top reactions to the processors with lower colors. We try to overlap communication and computation, hence we sweep over the \(Inner2_{i}\) set (line 6) while sending. We then receive the top reactions. It should be noted that all communication is asynchronous: we only wait to receive reactions immediately necessary for computations. In line 8 we enter the loop processing the \(Middle_{i}\) set. This is the computational bottleneck. Middle constraints communicate with processors of both higher and lower colors and hence need to be processed in a sequence. The sequential processing is still relaxed by the processor coloring. In the LOOP algorithm we first sort the constraints according to the descending order of the maximal colors of their adjacent processors (line 1). We then maintain this ordering while processing the constraints. As the top reactions have already been sent, some of the constraints from the middle set will have their external reactions from higher colors fully updated. These are processed first in line 5 of LOOP and then sent to lower and higher (by color) processors in line 7. This way some processors with lower colors will have the higher–color off–diagonal reactions of their middle set constraints fully updated, and they will proceed next, and so on. At the end (line 8), we receive all remaining reactions sent in line 7 of LOOP.

Coming back to PARALLEL_GS: after the bottleneck of the LOOP, in lines 9–12 we process the \(Bottom_{i}\) and \(Inner1_{i}\) sets in the same way as the \(Top_{i}\) and \(Inner2_{i}\) sets. The condition (92) attempts to balance the amount of computation needed to hide the communication (e.g. the larger the \(Top_{i}\) set is, the larger the \(Inner2_{i}\) set becomes). It should be noted that the convergence criterion in line 13 is global across all processors.

Several variants of the parallel Gauss–Seidel algorithm are listed in the Solvers section of the User Manual. Algorithm PARALLEL_GS corresponds to the FULL variant. We might like to relax the bottleneck of LOOP in line 8 of PARALLEL_GS by replacing it with

8.1 \(\,\,\) SWEEP \(\left(Middle_{i}\right)\)
8.2 \(\,\,\) send \(\left(Middle_{i}\right)\)
8.3 \(\,\,\) receive \(\left(Middle_{i}\right)\)

so that the middle constraints are processed in an inconsistent manner: the off–processor information corresponds to the previous iteration (just as in the Jacobi method). Usually the \(Middle_{i}\) sets are small, hence this inconsistency need not lead to divergence (especially for deformable kinematics, where constraint interactions are weak and \(\mathbf{W}\) is diagonally dominant). This is the MIDDLE_JACOBI variant of the algorithm. The last variant introduces a rather gross inconsistency: something usually called “a processor Gauss–Seidel method”. Let us define the set

\[All_{i}=Top_{i}\cup Bottom_{i}\cup Middle_{i}\cup Inner_{i}\]

In this case, lines 4–12 of PARALLEL_GS need to be replaced with

4 \(\,\,\) SWEEP \(\left(All_{i}\right)\)
5 \(\,\,\) send \(\left(All_{i}\right)\)
6 \(\,\,\) receive \(\left(All_{i}\right)\)

Although this kind of approach does work as a multigrid smoother, it has little use in our context. Nevertheless, we include it for illustration's sake and name it BOUNDARY_JACOBI.

Projected Newton solver

Using the non–smooth velocity equation formulation let us rewrite the frictional contact problem as

\[\mathbf{C}\left(\mathbf{R}\right)=\mathbf{F}\left(\mathbf{R}\right)+\mathbf{m}\left(\mathbf{R}-\mathbf{F}\left(\mathbf{R}\right)\right)=\mathbf{0}\mbox{ and }\mathbf{R}\in K\]

where \(K\) is the direct sum of the friction cones at all contact points. Since \(\mathbf{C}\left(\mathbf{R}\right)\) is not smooth, to compute \(\nabla\mathbf{C}\) we generalize the approach from [2] and [3], where only the self–dual case (friction coefficient equal to 1) was considered, and use a smoothed \(\nabla_{\omega}\mathbf{C}\) with \(\omega>0\) (we skip the details here). Our idea is to employ the following projected quasi–Newton step

(93)\[\mathbf{R}^{k+1}=\mbox{proj}_{K}\left[\mathbf{R}^{k}-\mathbf{A}^{-1}\mathbf{C}\left(\mathbf{R}^{k}\right)\right]\]

so that, as required, the iterates remain within the friction cone and where

\[\mathbf{A}\simeq\nabla_{\omega}\mathbf{C}\]

is an easy to invert approximation of \(\nabla_{\omega}\mathbf{C}\). Since in many practical situations \(\nabla_{\omega}\mathbf{C}\) is singular, we cannot employ it directly. Instead, we use two variants of \(\mathbf{A}\simeq\nabla_{\omega}\mathbf{C}\). The first one reads

\[\mathbf{A}_{1}=\nabla_{\omega}\mathbf{C}+\delta\mathbf{I},\mbox{ combined with GMRES.}\]

where \(\delta\ge0\). This is related to numerical integration of an artificial ODE

\[\frac{d\mathbf{R}}{dt}=\mathbf{C}\left(\mathbf{R}\right)\]

to a steady state (taking one step of implicit Euler; in the literature this is called pseudo–transient continuation).
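To make this connection explicit (up to the sign convention of \(\mathbf{C}\)), take one implicit Euler step of size \(h\) for \(d\mathbf{R}/dt=-\mathbf{C}\left(\mathbf{R}\right)\) and linearize \(\mathbf{C}\) about \(\mathbf{R}^{k}\):

\[\frac{\triangle\mathbf{R}}{h}=-\mathbf{C}^{k}-\nabla_{\omega}\mathbf{C}\,\triangle\mathbf{R}\quad\Longrightarrow\quad\left(\nabla_{\omega}\mathbf{C}+\frac{1}{h}\mathbf{I}\right)\triangle\mathbf{R}=-\mathbf{C}^{k}\]

so that \(\delta\) plays the role of an inverse pseudo time step \(1/h\): a large \(\delta\) corresponds to short, cautious pseudo time steps.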

The second variant reads

\[\mathbf{A}_{2}=\mbox{diag}_{3\times3}\left[\nabla_{\omega}\mathbf{C}\right],\mbox{ combined with direct inversion.}\]

and it is combined with a heuristic stabilization technique

\[\triangle\mathbf{R}^{k+1}=\left(1-\theta\right)\triangle\mathbf{R}^{k}-\theta\left(\mathbf{A}^{k}\right)^{-1}\mathbf{C}^{k}\]

where

\[\theta\in\left[0,1\right]\mbox{.}\]

We then have two variants of the projected quasi–Newton step:

  1. PQN1:

\[\mathbf{R}^{k+1}=\mbox{proj}_{K}\left[\mathbf{R}^{k}-\left(\nabla_{\omega}\mathbf{C}^{k}+\delta\mathbf{I}\right)_{\mbox{GMRES}\left(\epsilon\left\Vert \mathbf{C}^{k}\right\Vert ,m\right)}^{-1}\mathbf{C}^{k}\right]\]

where GMRES is preconditioned with \(\left[\mbox{diag}_{3\times3}\left(\nabla_{\omega}\mathbf{C}_{\alpha\alpha}^{k}+\delta\mathbf{I}\right)\right]^{-1}\), and \(\delta\), \(\epsilon\) and \(m\) need to be suitably selected. The linear problem should be solved only roughly; usually \(\epsilon=0.25\) and \(m=10\) (iterations bound) work well. For ill–conditioned problems, an overly accurate solution of the linear sub–problem results in a poor convergence rate. The diagonal regularization \(\delta\) needs to be adjusted “by hand”. The automatic update formulas found in the literature work only for well–conditioned cases and hence are not very useful for us. For ill–conditioned problems one should pick the \(\delta\) that delivers the best overall convergence behavior. Large values slow down convergence but stabilize it; small values may destabilize convergence for ill–conditioned problems; \(\delta\) (typically \(\ll1\)) should be tuned together with \(\epsilon\) and \(m\) (e.g. find a suitably small \(\delta\) first, then tweak \(\epsilon\)). Since rigorous analysis of these parameters is still missing, please experiment before settling on specific values for a specific problem. Use linver = ‘GMRES’ in NEWTON_SOLVER to enable this variant (this is also the default).

  2. PQN2:

\[\mathbf{R}^{k+1}=\mbox{proj}_{K}\left[\mathbf{R}^{k}+\left(1-\theta\right)\triangle\mathbf{R}^{k}-\theta\left(\mbox{diag}_{3\times3}\left[\nabla_{\omega}\mathbf{C}^{k}\right]\right)^{-1}\mathbf{C}^{k}\right]\]

where \(\theta\in\left[0,1\right]\) and the diagonal \(3\times3\) blocks of \(\nabla_{\omega}\mathbf{C}^{k}\) are directly inverted. This simple scheme is interesting because it converges for a sufficiently small \(\theta\), while it is essentially a nonlinear Jacobi–type method. Use linver = ‘DIAG’ in NEWTON_SOLVER to enable this variant.
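For orientation, a hedged Solfec input fragment selecting between the two variants might look as follows; linver and its values come from this section, while the remaining parameter names are assumptions to be verified against the User Manual:

    # hypothetical Solfec-1.0 input fragment: parameter names other than
    # 'linver' are assumptions -- consult the User Manual for the actual
    # NEWTON_SOLVER signature
    ns = NEWTON_SOLVER (meritval = 1E-8, maxiter = 100, linver = 'GMRES')  # PQN1 (default)
    # ns = NEWTON_SOLVER (meritval = 1E-8, maxiter = 100, linver = 'DIAG') # PQN2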

Both variants are summarized as algorithms below.

PQN1 \(\left(\mathbf{R},\gamma,n,\omega,\delta,m,\epsilon\right)\)
1 \(\,\,\) \(\triangle\mathbf{R}^{0}=\mathbf{0}, k=0\)
2 \(\,\,\) do
3 \(\,\,\,\,\,\,\) \(\mathbf{U}^{k}=\mathbf{W}\mathbf{R}^{k}+\mathbf{B}\)
4 \(\,\,\,\,\,\,\) Compute \(\mathbf{C}^{k}\) and \(\mathbf{A}^{k}=\nabla_{\omega}\mathbf{C}^{k}+\delta\mathbf{I}\) using smoothing \(\omega\)
5 \(\,\,\,\,\,\,\) \(\triangle\mathbf{R}^{k+1}=-\left(\mathbf{A}^{k}\right)_{\mbox{GMRES}\left(\epsilon\left\Vert \mathbf{C}^{k}\right\Vert ,m\right)}^{-1}\mathbf{C}^{k}\)
6 \(\,\,\,\,\,\,\) \(\mathbf{R}^{k+1}=\mbox{proj}_{K}\left[\mathbf{R}^{k}+\triangle\mathbf{R}^{k+1}\right]\)
7 \(\,\,\,\,\,\,\) \(k=k+1\)
8 \(\,\,\) while \(g\left(\mathbf{R}^{k}\right)\ge\gamma\) and \(k<n\)

PQN2 \(\left(\mathbf{R},\theta,\gamma,n,\omega\right)\)
1 \(\,\,\) \(\triangle\mathbf{R}^{0}=\mathbf{0}, k=0\)
2 \(\,\,\) do
3 \(\,\,\,\,\,\,\) \(\mathbf{U}^{k}=\mathbf{W}\mathbf{R}^{k}+\mathbf{B}\)
4 \(\,\,\,\,\,\,\) Compute \(\mathbf{C}^{k}\) and \(\mathbf{A}^{k}=\mbox{diag}_{3\times3}\left[\nabla_{\omega}\mathbf{C}^{k}\right]\) using smoothing \(\omega\)
5 \(\,\,\,\,\,\,\) \(\triangle\mathbf{R}^{k+1}=\left(1-\theta\right)\triangle\mathbf{R}^{k}-\theta\left(\mathbf{A}^{k}\right)^{-1}\mathbf{C}^{k}\)
6 \(\,\,\,\,\,\,\) \(\mathbf{R}^{k+1}=\mbox{proj}_{K}\left[\mathbf{R}^{k}+\triangle\mathbf{R}^{k+1}\right]\)
7 \(\,\,\,\,\,\,\) \(k=k+1\)
8 \(\,\,\) while \(g\left(\mathbf{R}^{k}\right)\ge\gamma\) and \(k<n\)
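The projection \(\mbox{proj}_{K}\) in line 6 of both algorithms acts block-wise: each contact reaction is projected onto its Coulomb friction cone. A hedged NumPy sketch of the per-contact projection, using the standard closed form for second–order cones (the tangential-first component ordering is an assumption), reads:

    import numpy as np

    def project_friction_cone(R, mu):
        """Project R = (Rt1, Rt2, Rn) onto the cone {||Rt|| <= mu * Rn}."""
        Rt, Rn = R[:2], R[2]
        r = float(np.linalg.norm(Rt))
        if r <= mu * Rn:                  # already inside the cone
            return R.copy()
        if mu * r <= -Rn:                 # inside the polar cone: project to apex
            return np.zeros(3)
        n = (mu * r + Rn) / (mu * mu + 1.0)   # foot point on the cone surface
        return np.array([mu * n * Rt[0] / r, mu * n * Rt[1] / r, n])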

Penalty solver

The penalty solver is quite straightforward. On each processor we split the constraints into \(Contacts_{i}\) and \(Others_{i}\), hence separating contact constraints from bilateral ones. We then update the contacts using the spring–dashpot model and calculate the reactions of the bilateral constraints using the Gauss–Seidel solver (with fixed accuracy \(\mbox{epsilon = 1E-4}\) and \(\mbox{maxiter = 1000}\)). We use the Gauss–Seidel approach for non–contacts because in this case it is quite fast, while it avoids issues related to penalization of bilateral constraints.

PENALTY_SOLVER \(\left(\right)\)
1 \(\,\,\) for all \(\alpha\) in \(Contacts_{i}\) do
2 \(\,\,\,\,\,\,\) SPRING_DASHPOT_CONTACT \(\left(h,gap_{\alpha},spring_{\alpha},dashpot_{\alpha},friction_{\alpha},cohesion_{\alpha},cohesive_{\alpha}\right)\)
3 \(\,\,\) send \(\left(Contacts_{i}\right)\)
4 \(\,\,\) receive \(\left(Contacts_{i}\right)\)
5 \(\,\,\) PARALLEL_GS \(\left(Others_{i}\right)\)

Algorithm PENALTY_SOLVER summarizes the method. First, all contact forces are calculated using the SPRING_DASHPOT_CONTACT algorithm. In lines 3 and 4, contact forces at the domain boundaries are sent to and received from the neighboring processors. Finally, the parallel Gauss–Seidel algorithm is executed to calculate the reactions of the bilateral constraints. In serial mode, lines 3 and 4 are skipped, and SERIAL_GS is used instead of the parallel algorithm.
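For illustration only, a minimal Python sketch of a spring–dashpot contact force law is given below; the actual SPRING_DASHPOT_CONTACT routine additionally handles the cohesion parameters and the cohesive state flag, and its force law may differ in detail:

    import numpy as np

    def spring_dashpot_contact(gap, spring, dashpot, friction, U):
        """Hedged sketch of a penalty contact force.

        gap      : signed gap (negative when bodies penetrate)
        spring   : penalty stiffness; dashpot : penalty viscosity
        friction : Coulomb friction coefficient
        U        : relative velocity 3-vector (Ut1, Ut2, Un)
        """
        Rn = max(0.0, -spring * gap - dashpot * U[2])  # repulsive normal reaction
        Rt = -dashpot * U[:2]                # viscous tangential trial force
        limit = friction * Rn                # Coulomb limit
        norm = float(np.linalg.norm(Rt))
        if norm > limit:                     # project onto the friction disk
            Rt = Rt * (limit / max(norm, 1e-30))
        return np.array([Rt[0], Rt[1], Rn])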

Implementation

The Gauss–Seidel solver is implemented in bgs.c and bgs.h. The projected Newton solver is implemented in nts.c and nts.h. The penalty solver is implemented in pes.c and pes.h.

[1] Mark F. Adams, A distributed memory unstructured Gauss–Seidel algorithm for multigrid smoothers, in Supercomputing '01: Proceedings of the 2001 ACM/IEEE Conference on Supercomputing, pages 4–4, New York, USA, 2001.

[2] Masao Fukushima, Zhi-Quan Luo, and Paul Tseng, Smoothing functions for second-order-cone complementarity problems, SIAM Journal on Optimization, 12(2):436–460, 2002.

[3] Shunsuke Hayashi, Nobuo Yamashita, and Masao Fukushima, A combined smoothing and regularization method for monotone second-order cone complementarity problems, SIAM Journal on Optimization, 15(2):593–615, 2005.