Accelerating isogeometric boundary element analysis for three-dimensional elastostatics problems through black-box fast multipole method with proper generalized decomposition

SUMMARY The isogeometric approach to computational engineering analysis makes use of Non-Uniform Rational B-splines (NURBS) to discretise both the geometry and the analysis field variables, giving a higher fidelity geometric description and leading to improved convergence properties of the solution over conventional piecewise polynomial descriptions. Because of its boundary-only modelling, with no requirement for a volumetric NURBS geometric definition, the boundary element method is an ideal choice for isogeometric analysis of solids in 3-D. An isogeometric boundary element analysis (IGABEM) algorithm is presented for the solution of such problems in elasticity, and is accelerated using the black-box Fast Multipole Method (bbFMM). The bbFMM scheme is of O(n) complexity, giving a general kernel-independent separation that can be easily integrated into existing, conventional IGABEM codes with little modification. In the bbFMM scheme, an important process of obtaining a low rank approximation of M2L operators has hitherto been based on Singular Value Decomposition (SVD), which can be very time consuming for large 3-D problems, and this motivates the present work. We introduce the Proper Generalized Decomposition (PGD) method as an alternative approach, and this is demonstrated to enhance efficiency in comparison with schemes that rely on the SVD. In the worst case a factor of approximately 2 performance gain is achieved. Numerical examples show the performance gains that are achievable in comparison to standard IGABEM solutions, and demonstrate that solution accuracy is not affected. The results illustrate the potential of this numerical technique for solving arbitrarily large-scale elastostatics problems.


INTRODUCTION
In the traditional structural design process, mesh generation is an essential task in applying numerical stress analysis tools to geometric models built within a Computer Aided Design (CAD) environment. This can be a very time-consuming process that dominates the whole design lifecycle. To alleviate this burden, the idea of Isogeometric Analysis (IGA) [1] has received much attention in recent years as it offers precise and efficient geometric modelling, refinement without re-meshing the CAD model and control over the smoothness of the basis functions. This provides a uniform representation for the design and analysis models, significantly reducing the overall analysis time including the creation and refinement of the analysis model. Rapidly, the concept has expanded from elasticity [1,2] to various applications such as fluid dynamics [3,4], electromagnetics [5], vibration [6] and acoustics [7-9]. Early work on IGA was initially based on NURBS [1,10], and was then extended to T-splines [11] and PHT-splines [12-14].
The Boundary Element Method (BEM) [15] is a boundary discretisation technique for solving partial differential equations, which takes advantage of boundary integral equations to decrease the dimensionality of the problem. The fact that both CAD and BEM require only a boundary representation suggests that there is much more scope for further development in linking BEM to CAD geometries than the Finite Element Method (FEM). BEM analysis of 3-D solids proceeds directly from a boundary representation, the most common geometric representation in CAD. Conversely, an isogeometric finite element analysis of a 3-D solid would require the user first to generate a model comprised of NURBS volumes, which is far from straightforward. This natural connection motivated many researchers to consider some early attempts to include CAD representations in a BEM framework. Arnold [16,17] proposed a spline collocation method which was used to develop convergence estimates for BEM in two and three dimensions. A BEM formulation based on cubic splines [18,19] was developed to solve groundwater flow problems and the Laplace equation. Turco [20] presented an approach by which the elasticity problem could be solved by B-spline elements. Rivas [21] was the first to put forward a combination of BEM with non-uniform rational B-splines (NURBS) in the context of the method of moments.
Importantly, IGA offers a considerable potential advantage over the conventional BEM in that it can circumvent the meshing procedure and eliminate geometry representation error. The idea of a true integration between CAD and analysis was explored in various areas including elastostatics [22-24], shape optimisation [25,26], acoustics [7,8,27] and fracture mechanics [28]. However, a well-known drawback of both BEM and IGABEM remains, i.e. the dense (and, for the collocation form of the BEM, nonsymmetric) matrix for which O(n²) operations are needed to compute the matrix coefficients and another O(n³) operations to solve the system using direct solvers (n being the number of degrees of freedom (DOF)). We note that symmetric Galerkin formulations [29] are more amenable to mathematical analysis, and optimal preconditioner strategies (i.e. operator preconditioning [30]) are available. This presents great challenges in computational complexity for 3-D problems if they have a large number of DOF. Hence, for dealing with large scale engineering problems, matrix compression techniques should be used to reduce the complexity to O(n log n) or, better still, O(n). Some popular techniques have been successfully applied in order to accelerate BEM schemes, including the Fast Multipole Method (FMM) [31-33], the Adaptive Cross Approximation (ACA) method [34-36] and the pre-corrected Fast Fourier Transformation (pFFT) method [37-39]. Similar to most acceleration schemes, these methods approximate the far-field kernel interactions through efficient hierarchical data structures, which allow for matrix-vector multiplication with almost linear complexity. The IGABEM or BEM can be accelerated with the FMM, ACA or pFFT to solve large scale problems on a personal computer. From another point of view, these methods perform the matrix compression either in a purely algebraic manner or based on an analytical expansion. In order to approximate the kernel numerically [9,40-42], the black-box FMM (bbFMM) [43] was developed by using Chebyshev interpolation to provide a more general, kernel-independent implementation of the matrix compression with a fast O(n) algorithm. The current paper is based on this technique.
The current work contains two key elements of novelty:

1. In both the conventional and black-box FMM variants, the construction of the M2L operators is the most numerically intensive stage. Some methods have been proposed to ameliorate the cost of this stage, and have typically been constructed through a low rank operation calculated by wavelet decomposition [44,45], Singular Value Decomposition (SVD) [41,46], QR decomposition [47], the "skeletonization" technique [48,49], QR factorization with Adaptive Cross Approximation (ACA) [50,51] and a subsampling strategy [52]. As a canonical method to extract the optimal approximation basis, the computational cost of SVD is O(mn²) for an m × n matrix, and is a significant part of the whole bbFMM calculation. The Proper Generalized Decomposition (PGD) method [53,54] provides a quasi-optimal basis based on an iterative procedure with a low computational effort (O(2mn)), rather than directly solving eigenfunctions as in SVD. In this paper, we introduce PGD into the M2L operators, which alleviates the expense of pre-computing the low rank decomposition and of computing the matrix-vector products, while maintaining a desired accuracy. Although some of the alternative acceleration techniques listed above have been shown to outperform SVD, our comparison in this paper is limited to SVD and PGD since our research focuses on problems of elastostatics, to which the above techniques have not yet been applied.

2. From the implementation perspective, our paper provides all the required details for solving 3-D elastostatics problems based on bbFMM and IGABEM, which have not been reported before. Special attention is given to illustrating the unique features of this kind of problem. The PGD acceleration of bbFMM, which is the main novelty in this paper, can be applied in either a conventional BEM or an IGABEM context, but IGABEM is chosen for its well-known benefits in the accuracy of its geometric description and its good convergence properties. Thus, with a view to practical application in industry, an accelerated IGABEM is more appealing and promising than the conventional BEM.
The paper is organised as follows: a brief introduction is given to the NURBS-based IGABEM discretisation procedure and the corresponding linear system of equations; the bbFMM algorithm combined with PGD is then described and, finally, numerical examples are given to verify the scheme and to compare the time and memory savings with the conventional IGABEM for the solution of 3-D elastostatics problems.

Boundary integral equation
For a 3-D linear elastic problem, the structure occupies a continuous physical domain, Ω ⊂ R³, with boundary Γ. The boundary integral equation (BIE), in the absence of body forces, can be written as follows:

$$ c_{ij}(\mathbf{s})\,u_j(\mathbf{s}) + \fint_{\Gamma} T_{ij}(\mathbf{s},\mathbf{x})\,u_j(\mathbf{x})\,\mathrm{d}\Gamma(\mathbf{x}) = \int_{\Gamma} U_{ij}(\mathbf{s},\mathbf{x})\,t_j(\mathbf{x})\,\mathrm{d}\Gamma(\mathbf{x}) \tag{1} $$

where s ∈ Γ denotes the source point and x ∈ Γ the field point, u ∈ R³ the displacement field, $c_{ij}$ the jump term, $\bar u_i$ and $\bar t_i$ the prescribed displacements and tractions, $\Gamma_{\bar u_i}$ and $\Gamma_{\bar t_i}$ the parts of Γ over which displacements and tractions are prescribed in each specific direction with $\Gamma_{\bar u_i} \cup \Gamma_{\bar t_i} = \Gamma$ and $\Gamma_{\bar u_i} \cap \Gamma_{\bar t_i} = \emptyset$, i and j indices running from 1 to 3 in three dimensions to denote the x-, y- and z-directions, and $\fint$ denotes an integral taken in the Cauchy Principal Value (CPV) sense. The integrals in Eq. (1) contain a weak singularity and a strong singularity, and their evaluation will be discussed in the following sections. The displacement and traction fundamental solutions are given as:

$$ U_{ij}(\mathbf{s},\mathbf{x}) = \frac{1}{16\pi\mu(1-\nu)r}\Big[(3-4\nu)\,\delta_{ij} + r_{,i}\,r_{,j}\Big] $$

$$ T_{ij}(\mathbf{s},\mathbf{x}) = \frac{-1}{8\pi(1-\nu)r^{2}}\left\{\frac{\partial r}{\partial n}\Big[(1-2\nu)\,\delta_{ij} + 3\,r_{,i}\,r_{,j}\Big] - (1-2\nu)\big(r_{,i}\,n_j - r_{,j}\,n_i\big)\right\} $$

where r = r(s, x) = |x − s| is the distance between the source point and the field point, $n_i$ the ith component of the unit outward normal vector n, $r_{,i} = \partial r / \partial x_i$, µ the shear modulus and ν the Poisson's ratio.
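As a concrete reference for the two fundamental solutions above, the following is a minimal NumPy sketch (not taken from the authors' code); the function names and the material parameters μ, ν are illustrative defaults:

```python
import numpy as np

def kelvin_U(s, x, mu=1.0, nu=0.3):
    """Displacement fundamental solution U_ij (Kelvin solution), 3-D."""
    d = np.asarray(x, float) - np.asarray(s, float)
    r = np.linalg.norm(d)
    dr = d / r                                     # r_,i = dr/dx_i
    c = 1.0 / (16.0 * np.pi * mu * (1.0 - nu) * r)
    return c * ((3.0 - 4.0 * nu) * np.eye(3) + np.outer(dr, dr))

def kelvin_T(s, x, n, nu=0.3):
    """Traction fundamental solution T_ij; n is the unit outward normal at x."""
    d = np.asarray(x, float) - np.asarray(s, float)
    r = np.linalg.norm(d)
    dr = d / r
    n = np.asarray(n, float)
    drdn = dr @ n                                  # dr/dn
    c = -1.0 / (8.0 * np.pi * (1.0 - nu) * r**2)
    return c * (drdn * ((1.0 - 2.0 * nu) * np.eye(3) + 3.0 * np.outer(dr, dr))
                - (1.0 - 2.0 * nu) * (np.outer(dr, n) - np.outer(n, dr)))
```

Note the weak 1/r singularity of U and the strong 1/r² singularity of T referred to in the text; U is also symmetric in i, j.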

B-splines and NURBS basis
This section provides a brief overview of B-splines and Non-Uniform Rational B-splines (NURBS). These are standard bases for geometry representation, and the interested reader is directed to [10] for a full description. B-splines are geometric entities that are constructed from a group of piecewise polynomials which are defined by a knot vector:

$$ \Xi = \{\xi_1, \xi_2, \ldots, \xi_{n+p+1}\} $$

which is a set of non-decreasing real numbers in the parametric space. Here, a denotes the knot index, p the curve degree, and n the number of basis functions or control points. Each real number $\xi_a$ is called a knot. The number of knots in a valid knot vector is always n + p + 1, and the half-open interval $[\xi_a, \xi_{a+1})$ is called a knot span. The basis functions $N_{a,p}$ may be defined using the Cox-de Boor recursion formula [55,56]; starting with p = 0:

$$ N_{a,0}(\xi) = \begin{cases} 1 & \text{if } \xi_a \le \xi < \xi_{a+1} \\ 0 & \text{otherwise} \end{cases} $$

and for p = 1, 2, 3, ...:

$$ N_{a,p}(\xi) = \frac{\xi - \xi_a}{\xi_{a+p} - \xi_a}\,N_{a,p-1}(\xi) + \frac{\xi_{a+p+1} - \xi}{\xi_{a+p+1} - \xi_{a+1}}\,N_{a+1,p-1}(\xi) $$

The B-spline geometry is found through a mapping from the parametric space, ξ, to physical space (x, y, z) through a linear combination of the basis functions $N_{a,p}$, and the corresponding coefficients may be considered to be the geometric coordinates of a set of control points located in physical space. The B-spline curve can thus be expressed as:

$$ \mathbf{x}(\xi) = \sum_{a=1}^{n} N_{a,p}(\xi)\,\mathbf{P}_a $$

where x(ξ) is the location of the physical curve, ξ the spatial coordinate in parametric space, $N_{a,p}$ the B-spline basis function of order p and $\mathbf{P}_a$ the control point related to each basis function.
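The Cox-de Boor recursion translates directly into code. The following Python sketch (with the usual 0/0 = 0 convention) is illustrative, not part of the authors' implementation:

```python
def bspline_basis(a, p, xi, knots):
    """Cox-de Boor recursion for the B-spline basis function N_{a,p}(xi).
    `knots` is a non-decreasing knot vector; 0/0 terms are taken as zero."""
    if p == 0:
        return 1.0 if knots[a] <= xi < knots[a + 1] else 0.0
    val = 0.0
    den = knots[a + p] - knots[a]
    if den > 0.0:
        val += (xi - knots[a]) / den * bspline_basis(a, p - 1, xi, knots)
    den = knots[a + p + 1] - knots[a + 1]
    if den > 0.0:
        val += (knots[a + p + 1] - xi) / den * bspline_basis(a + 1, p - 1, xi, knots)
    return val
```

For any ξ inside the knot vector the n basis functions are non-negative and sum to one (partition of unity), a property relied upon throughout IGA.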
Multivariate basis functions are defined by tensor products of univariate basis functions in each parametric direction, given by:

$$ N_{\mathbf a,\mathbf p}(\boldsymbol\xi) = \prod_{i=1}^{d_p} N_{a_i,p_i}(\xi_i) $$

where i denotes the direction index and $d_p$ the number of parametric dimensions. NURBS are developed from B-splines and can offer significant advantages due to their ability to represent a wide variety of geometric entities. Importantly for engineering applications, they have the ability to describe circular arcs and other conic sections exactly, whereas the traditional piecewise polynomial description cannot. The expression defining a NURBS curve is similar to that for B-splines,

$$ \mathbf{x}(\xi) = \sum_{a=1}^{n_a} R_{a,p}(\xi)\,\mathbf{P}_a $$

where $\mathbf{P}_a$ is the set of control point coordinates, and $R_{a,p}$ the NURBS basis function which is given by:

$$ R_{a,p}(\xi) = \frac{w_a\,N_{a,p}(\xi)}{\sum_{b=1}^{n_a} w_b\,N_{b,p}(\xi)} $$

where a denotes the control point index, p the curve order, $n_a$ the number of control points, $N_{a,p}$ the B-spline basis function and $w_a$ a weight associated with each basis function or control point. The weight can influence the distance between the associated control point and the NURBS geometry; a larger weight $w_a$ will cause the curve to be drawn closer to the control point $\mathbf{P}_a$. If all of the weights are equal to 1, the NURBS curve degenerates into a B-spline curve.
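The exact representation of conic sections can be demonstrated with the classic quarter-circle construction (degree 2, three control points, middle weight 1/√2). The sketch below is illustrative and repeats a compact Cox-de Boor helper so that it is self-contained:

```python
import math

def bspline_basis(a, p, xi, knots):
    # Cox-de Boor recursion (repeated here so the snippet is self-contained)
    if p == 0:
        return 1.0 if knots[a] <= xi < knots[a + 1] else 0.0
    val = 0.0
    den = knots[a + p] - knots[a]
    if den > 0.0:
        val += (xi - knots[a]) / den * bspline_basis(a, p - 1, xi, knots)
    den = knots[a + p + 1] - knots[a + 1]
    if den > 0.0:
        val += (knots[a + p + 1] - xi) / den * bspline_basis(a + 1, p - 1, xi, knots)
    return val

def nurbs_basis(p, xi, knots, weights):
    """Rational basis R_{a,p}(xi) = w_a N_{a,p}(xi) / sum_b w_b N_{b,p}(xi)."""
    N = [bspline_basis(a, p, xi, knots) for a in range(len(weights))]
    W = sum(w * Na for w, Na in zip(weights, N))
    return [w * Na / W for w, Na in zip(weights, N)]

# Quarter of the unit circle: any parameter value maps onto the circle exactly
knots = [0, 0, 0, 1, 1, 1]
ctrl = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
wts = [1.0, 1.0 / math.sqrt(2.0), 1.0]
R = nurbs_basis(2, 0.5, knots, wts)
x = sum(r * c[0] for r, c in zip(R, ctrl))
y = sum(r * c[1] for r, c in zip(R, ctrl))
```

With all weights set to 1 the same code returns an ordinary (non-circular) B-spline curve, illustrating the degeneration noted above.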

IGABEM implementation
The implementation of IGABEM is similar to the conventional BEM with the concept of isoparametric elements. We use NURBS to discretise both the solution variables (traction and displacement) and the geometry, using the same shape functions. For practical application, if the geometry is sufficiently simple (without conic sections such as circular arcs), one might use B-spline shape functions instead to enhance performance. In the general case, though, a NURBS expansion is used and typically transformed into Bézier elements [57], so that the displacement and traction fields around the boundary are expressed:

$$ u_i(\boldsymbol\xi) = \sum_{a} R_a(\boldsymbol\xi)\,\tilde u^{\,i}_{a} \tag{14} $$

$$ t_i(\boldsymbol\xi) = \sum_{a} R_a(\boldsymbol\xi)\,\tilde t^{\,i}_{a} \tag{15} $$

where $\tilde u_a$ and $\tilde t_a$ are the nodal displacement and traction parameters associated with the control point with index a, and $\boldsymbol\xi = (\xi_u, \xi_v)$ are the intrinsic coordinates (i.e. in parametric space) of the field point in a specific patch or element. It should be noted that $\tilde u_a$ and $\tilde t_a$ should not be interpreted as the displacements and tractions at control points. Indeed, the control points may lie outside the geometry. They are simply coefficients from which the displacements and tractions can be recovered using Eq. (14) and Eq. (15). We use a continuous IGABEM formulation, so control points are shared between adjacent elements, and the degrees of freedom $\tilde u_a$ and $\tilde t_a$ apply to all elements to which the control point with index a belongs. With this NURBS expansion, the BIE (Eq. (1)) can be written in a discretised form:

$$ c_{ij}(\boldsymbol\zeta_c)\sum_{a} R_{e_0 a}(\boldsymbol\zeta_c)\,\tilde u^{\,j}_{e_0 a} + \sum_{e}\sum_{a}\tilde u^{\,j}_{ea}\fint_{\Gamma_e} T_{ij}\,R_{ea}\,J_e\,\mathrm d\Gamma = \sum_{e}\sum_{a}\tilde t^{\,j}_{ea}\int_{\Gamma_e} U_{ij}\,R_{ea}\,J_e\,\mathrm d\Gamma \tag{16} $$

We use this BIE in a collocation scheme, so that $\boldsymbol\zeta_c = (\zeta_u, \zeta_v)$ indicates the intrinsic coordinate of the collocation point, c the collocation point index, $e_0$ the element in which the collocation point is located, and $a_0$ the local index of the collocation point in element $e_0$. $\boldsymbol\xi$ denotes the intrinsic coordinates of the field point in the parent element, e the element index, a the local index of the node in element e, $R_{ea}$ the shape function, $J_e$ the Jacobian and $\Gamma_e$ the portion of boundary Γ represented by element e.
There are singular cases in which the collocation point lies in $\Gamma_e$; the evaluation of these integrals is discussed in sections 2.5 and 2.6. With the exception of these singular cases, the above integrals are regular and can be directly evaluated by Gauss-Legendre quadrature. By considering Eq. (16) at a sufficient number of collocation points, a system of equations can be assembled into matrix form:

$$ \mathbf H\,\tilde{\mathbf u} = \mathbf G\,\tilde{\mathbf t} \tag{17} $$

where matrix H is a square matrix containing a combination of the integrals of the T kernel and the jump terms, G a rectangular matrix of U kernel integrals, $\tilde{\mathbf u}$ contains the nodal displacement coefficients and $\tilde{\mathbf t}$ the nodal traction coefficients. Both $\tilde{\mathbf u}$ and $\tilde{\mathbf t}$ include unknown values and the values prescribed by boundary conditions. Since the NURBS basis functions lack the Kronecker delta property, prescription of non-uniform boundary conditions introduces an added complexity. In this case, the traction and displacement coefficients describing these conditions can be calculated by collocating at a sufficient number of points on the physical surface. Application of the boundary conditions in the usual BEM fashion then yields the final form of the linear system:

$$ \mathbf A\,\boldsymbol\lambda = \mathbf f \tag{18} $$

where matrix A contains the entries of kernel coefficients associated with the unknown displacements and tractions, λ includes all the unknown displacement and traction coefficients and f is the column vector of all the known values. This linear system can now be solved using any solver capable of dealing with a dense, non-symmetric matrix.
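The "usual BEM fashion" of applying boundary conditions is a column exchange between H and G. The following per-DOF sketch is illustrative (`apply_bcs` and the boolean mask are hypothetical names; a full elasticity code would carry three DOF per control point and the NURBS-specific coefficient fitting described above):

```python
import numpy as np

def apply_bcs(H, G, known_u, u_bar, t_bar):
    """Exchange columns of H u = G t to form A lam = f.
    known_u[j] True  -> u_j is prescribed there (the unknown is t_j),
    known_u[j] False -> t_j is prescribed there (the unknown is u_j)."""
    A = np.empty_like(H)
    f = np.zeros(H.shape[0])
    for j in range(H.shape[1]):
        if known_u[j]:
            A[:, j] = -G[:, j]           # unknown traction keeps the -G column
            f -= H[:, j] * u_bar[j]      # known displacement moves to the RHS
        else:
            A[:, j] = H[:, j]            # unknown displacement keeps the H column
            f += G[:, j] * t_bar[j]      # known traction moves to the RHS
    return A, f
```

By construction, A λ − f reproduces the residual H u − G t of the original system, so no information is lost in the exchange.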
In addition, in this paper the Greville abscissae [58] are adopted to provide a set of fixed collocation points, since the control points may lie off the physical boundary Γ and therefore cannot generally be chosen for collocation in IGABEM. A polar transformation [59] is used to evaluate the weakly singular integrals, and the regularized BIE form [60] is adopted to preclude the requirement to evaluate strongly singular integrals. It should further be noted that the computation of the NURBS basis functions incurs somewhat more computational cost than conventional polynomial shape functions. This is an important consideration because, as will be shown, quadrature dominates the run time. The balance between the additional cost of quadrature in IGABEM and the reduced cost from the reduction in required model size will be an interesting research topic.

FAST MULTIPOLE EXPANSION OF IGABEM
It has been seen that the BEM offers significant advantages in its boundary-only description, giving it a natural advantage over volumetric analysis methods in exploiting the benefits of the isogeometric concept. Attention is now turned to alleviating the computational burden for large problems. The basic idea of the Fast Multipole Method (FMM) is to apply an iterative solver (e.g. GMRES) to solve Eq. (18) and to use multipole expansions to accelerate the matrix-vector multiplication (Aλ) in each iteration step without forming the explicit matrix A.
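The matrix-free character of this idea can be illustrated with SciPy's GMRES: the solver only needs a matvec callback, which in a bbFMM code would combine the direct near-field product with the multipole-accelerated far-field product. A dense stand-in matvec is used here purely for illustration; no part of this reflects the authors' implementation:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

rng = np.random.default_rng(0)
n = 200
A_dense = np.eye(n) + 0.01 * rng.standard_normal((n, n))  # stand-in system
f = rng.standard_normal(n)

def matvec(lam):
    # In a bbFMM code this callback would evaluate the near-field and singular
    # terms directly and the far-field terms via the M2M/M2L/L2L passes,
    # without ever assembling A explicitly.
    return A_dense @ lam

A_op = LinearOperator((n, n), matvec=matvec)
lam, info = gmres(A_op, f)   # info == 0 signals convergence
```

The block-diagonal preconditioner mentioned later in the Algorithm section would be passed to `gmres` via its `M` argument in the same matrix-free way.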

Integral redefinition
Before executing the multipole expansion process, a key step in the FMM is to find a separation of variables, i.e. to separate the effect of the source and field points in the expression of the kernel functions. Eq. (18) may be rewritten:

$$ \sum_{J} \mathbf A_{IJ}\,\lambda_J = f_I \tag{19} $$

where $\mathbf A_{IJ}$ is the submatrix generated from the kernel T or U, λ = {λ_J}, f = {f_I}, I the index of collocation points and J the index of control points. Further, we can reformulate Eq. (19) as:

$$ a^{u}_i(\mathbf s) + a^{t}_i(\mathbf s) = b^{\bar u}_i(\mathbf s) + b^{\bar t}_i(\mathbf s) \tag{20} $$

where the left side includes the integrals of the unknown displacement and traction, which can be defined respectively as:

$$ a^{u}_i(\mathbf s) = c_{ij}(\mathbf s)\,u_j(\mathbf s) + \fint_{\Gamma_u} T_{ij}(\mathbf s,\mathbf x)\,u_j(\mathbf x)\,\mathrm d\Gamma(\mathbf x) \tag{21} $$

$$ a^{t}_i(\mathbf s) = -\int_{\Gamma_t} U_{ij}(\mathbf s,\mathbf x)\,t_j(\mathbf x)\,\mathrm d\Gamma(\mathbf x) \tag{22} $$

where $\Gamma_u$ and $\Gamma_t$ denote the portions of boundary Γ over which the displacements and tractions are unknown (the jump term appearing in Eq. (21) when s ∈ $\Gamma_u$).
The right hand side includes the integrals of the kernels multiplied by the known displacement and traction, and can similarly be defined:

$$ b^{\bar u}_i(\mathbf s) = -c_{ij}(\mathbf s)\,\bar u_j(\mathbf s) - \fint_{\Gamma_{\bar u}} T_{ij}(\mathbf s,\mathbf x)\,\bar u_j(\mathbf x)\,\mathrm d\Gamma(\mathbf x) \tag{24} $$

with

$$ b^{\bar t}_i(\mathbf s) = \int_{\Gamma_{\bar t}} U_{ij}(\mathbf s,\mathbf x)\,\bar t_j(\mathbf x)\,\mathrm d\Gamma(\mathbf x) \tag{25} $$

where $\Gamma_{\bar u}$ and $\Gamma_{\bar t}$ denote the portions of boundary Γ over which displacement and traction boundary conditions are prescribed (the jump term appearing in Eq. (24) when s ∈ $\Gamma_{\bar u}$).
For any source point s, the singularity is cancelled by expressing the displacement at s either as u(s) in Eq. (21) or as ū(s) in Eq. (24), depending on whether the source point has a displacement or traction boundary condition. Further, each of Eqs. (21), (22), (24) and (25) can be divided into three parts: the far-field, near-field and singular components. For example, Eq. (21) can be separated as:

$$ a^{u}_i(\mathbf s) = a^{u,\mathrm{far}}_i(\mathbf s) + a^{u,\mathrm{near}}_i(\mathbf s) + a^{u,\mathrm{sing}}_i(\mathbf s) $$

Only the far-field part can be treated by the fast multipole expansion; the near-field and singular parts will be evaluated directly. Each of the four integrals above contains products of kernels and the displacements or tractions at the field point. This relation between the source point and field point can be expressed in a discretised form:

$$ a^{\mathrm{far}}(\mathbf s_k) = \sum_{l=1}^{n_g} K(\mathbf s_k,\mathbf x_l)\,\sigma(\mathbf x_l) $$

where $k = 1, \ldots, n_c$ is the index of the source points (collocation points), $l = 1, \ldots, n_g$ the index of the field points (Gauss points), $K_t$ the equivalent kernel of traction, $K_u$ the equivalent kernel of displacement, $\sigma_u$ the equivalent displacement at each field point and $\sigma_t$ the equivalent traction. For brevity, we use K or σ to denote either of them as required.
In order to develop the FMM in a separated form, the equivalent displacement kernel $K_u$ has the same form as U, and the equivalent traction kernel $K_t$ should have the same form as T but with the effect of the unit outward normal vector n at the field point removed, leaving the new traction kernel written as:

$$ T_{ij}(\mathbf s,\mathbf x) = K^{t}_{ijk}(\mathbf s,\mathbf x)\,n_k(\mathbf x) $$

with

$$ K^{t}_{ijk}(\mathbf s,\mathbf x) = \frac{-1}{8\pi(1-\nu)r^{2}}\Big[(1-2\nu)\big(\delta_{ij}\,r_{,k} - r_{,i}\,\delta_{jk} + r_{,j}\,\delta_{ik}\big) + 3\,r_{,i}\,r_{,j}\,r_{,k}\Big] $$

where i, j, k denote indices from 1 to 3 representing the x-, y- and z-directions. Expressions for $\sigma_u$ and $\sigma_t$ are given as follows:

$$ \sigma^{u}_{jk}(\mathbf x_{eg}) = w_{eg}\,J_e(\boldsymbol\xi_{eg})\,n^{k}_{eg}\sum_{a} R_{ea}(\boldsymbol\xi_{eg})\,\tilde u^{\,j}_{ea}, \qquad \sigma^{t}_{j}(\mathbf x_{eg}) = w_{eg}\,J_e(\boldsymbol\xi_{eg})\sum_{a} R_{ea}(\boldsymbol\xi_{eg})\,\tilde t^{\,j}_{ea} $$

where $w_{eg}$ is the weight associated with each field point, and e, a, $e_0$, $a_0$ have the same meaning as in Eq. (16). The term $n_{eg}$ is the transpose of the unit outward normal vector at the field point, and is included here following its removal from the traction kernel.

Decomposition of the kernel function
After rewriting the integrals, the source points and field points should be separated in the kernel. We seek to approximate the kernel K in the following form:

$$ K(\mathbf s,\mathbf x) \approx \sum_{m}\sum_{n} R(\bar{\mathbf s}_m,\mathbf s)\,K(\bar{\mathbf s}_m,\bar{\mathbf x}_n)\,R(\bar{\mathbf x}_n,\mathbf x) $$

where in three dimensions

$$ R(\bar{\mathbf s}_m,\mathbf s) = S\big(\bar s^{\,1}_m, s^{1}\big)\,S\big(\bar s^{\,2}_m, s^{2}\big)\,S\big(\bar s^{\,3}_m, s^{3}\big) $$

with a directly corresponding expression for $R(\bar{\mathbf x}_n,\mathbf x)$. The above expression can be formed with any suitable polynomial interpolation functions. Chebyshev polynomials are used in the current work, which is based on [43] and reproduced here. So m is the index of the Chebyshev nodes associated with source points and n the Chebyshev node index for field points. On the reference interval [−1, 1], the Chebyshev nodes are located at:

$$ \bar s_m = \cos\!\left(\frac{(2m-1)\pi}{2n}\right), \qquad m = 1,\ldots,n $$

The mapping function $S(\bar s_m, s)$ has the following form:

$$ S(\bar s_m, s) = \frac{1}{n} + \frac{2}{n}\sum_{l=1}^{n-1} T_l(\bar s_m)\,T_l(s) $$

where

$$ T_l(s) = \cos\big(l\arccos(s)\big) $$

with the same form used for $T_l(\bar s_m)$ and $S(\bar x_n, x)$.
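A 1-D analogue makes the separation concrete: the kernel is sampled only at Chebyshev nodes, and source and field dependence enter through separate interpolation weights. The kernel 1/|s − x| stands in for the elastostatic kernels, and all names here are illustrative:

```python
import numpy as np

def cheb_nodes(n, a, b):
    """n Chebyshev nodes mapped from [-1, 1] onto the interval [a, b]."""
    m = np.arange(1, n + 1)
    return 0.5 * (a + b) + 0.5 * (b - a) * np.cos((2 * m - 1) * np.pi / (2 * n))

def interp_weights(nodes, y, a, b):
    """S(y_m, y) = 1/n + (2/n) sum_{l=1}^{n-1} T_l(y_m) T_l(y), scaled to [a, b]."""
    n = len(nodes)
    tm = np.arccos((2 * nodes - (a + b)) / (b - a))
    ty = np.arccos((2 * y - (a + b)) / (b - a))
    w = np.full(n, 1.0 / n)
    for l in range(1, n):
        w += (2.0 / n) * np.cos(l * tm) * np.cos(l * ty)
    return w

def kernel(s, x):
    return 1.0 / np.abs(x - s)       # 1-D stand-in for the 1/r type kernels

n = 8
s_nodes = cheb_nodes(n, 0.0, 1.0)    # source cell [0, 1]
x_nodes = cheb_nodes(n, 3.0, 4.0)    # well-separated field cell [3, 4]
Kmn = kernel(s_nodes[:, None], x_nodes[None, :])

# Separated evaluation: K(s, x) ~ sum_m sum_n S(s_m, s) K(s_m, x_n) S(x_n, x)
s_pt, x_pt = 0.3, 3.7
approx = interp_weights(s_nodes, s_pt, 0.0, 1.0) @ Kmn \
         @ interp_weights(x_nodes, x_pt, 3.0, 4.0)
exact = kernel(s_pt, x_pt)
```

The matrix `Kmn` is precisely the per-pair M2L operator discussed in the next section; in 3-D it couples n³ nodes per cell rather than n.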

Far field approximation
In order to fully separate the source points and field points, for a 2-D problem a quadtree structure should be built and, for 3-D, an octree structure. Fig. 1 shows a quadtree example. A square (2-D) or cube (3-D) is called a cell, and the cells exist at different levels. The level $l_0$ cell should cover the whole problem domain. A higher level cell at level $l_q$ is derived from the subdivision of its parent cell at level $l_{q-1}$. Cells are divided into increasingly higher levels until a suitable level $l_{max}$ is reached, when the number of source and field points contained in the cell is smaller than a prespecified number; this highest level cell is called a leaf. Large differences in size of adjacent elements should be avoided, not only to limit the near-field area but also to maintain computational stability.

Figure 1. A quadtree based bbFMM scheme
The source points and field points are allocated to their corresponding leaves. Two cells are in each other's far field if they are nonadjacent but their parent cells are adjacent; two cells are in each other's near field if they are adjacent. Similar to the conventional FMM, the process of far field approximation through bbFMM can be described by the following step-by-step procedure:

Step 1. Multipole expansion (moment). In Fig. 1, the orange point denotes an arbitrary field point (Gauss point) existing in a leaf at level $l_3$. More generally, for any field point in a leaf $J_{max}$ at level $l_{max}$, the weights in Eq. (34) associated with the Chebyshev nodes $\bar{\mathbf x}^{J_{max}}_n$ can be aggregated into moments:

$$ M^{J_{\max}}_{n} = \sum_{l} R\big(\bar{\mathbf x}^{J_{\max}}_{n}, \mathbf x_l\big)\,\sigma(\mathbf x_l) $$

where $M^{J_{max}}$ denotes the vector formed over all the Chebyshev nodes in the leaf $J_{max}$, $\mathbf x_l$ the field points in the leaf and l the index of the field point.
Step 2. Moment-to-moment translation (M2M). The M2M process (represented by the blue arrows in Fig. 1) is used to transfer the moments of a child cell $J_{q+1}$ at level $l_{q+1}$ into its parent cell $J_q$ at level $l_q$:

$$ M^{J_q}_{m} = \sum_{n} R\big(\bar{\mathbf x}^{J_q}_{m}, \bar{\mathbf x}^{J_{q+1}}_{n}\big)\,M^{J_{q+1}}_{n} $$

where n is the index of the Chebyshev nodes in the child cell $J_{q+1}$. The same transfer matrix can be formed at each level, stopping at level $l_2$. The transfer matrices can be stacked as $\mathbf M^{J_q,\ldots,J_{max}}$.

Step 3. Moment-to-local translation (M2L). In order to avoid the singularity, the lowest level to be considered for the FMM approximation of the far-field interactions will be level $l_2$. For the same reason, the cells should not be adjacent, i.e. there should be no common vertex shared by two cells. Thus, a list will be formed for each pair $(I_q, J_q)$ of source and field cells. In the list, only the pairs lying in nonadjacent cells (the interaction list) will be considered by the bbFMM scheme, and the others will be evaluated directly. Now, the coefficient matrix associated with the two sets of Chebyshev nodes in the interaction list can be written as:

$$ \mathbf B^{I_q,J_q} = \Big[K\big(\bar{\mathbf s}^{I_q}_{m}, \bar{\mathbf x}^{J_q}_{n}\big)\Big] $$

where $I_q$ and $J_q$ denote the different cells of the two sets of Chebyshev nodes located at level $l_q$. The M2L stage is the most numerically intensive. Each cell containing a source point will interact with up to 6³ − 3³ = 189 other cells at the same level, each of which is described by a multiplicity of Chebyshev nodes. There will therefore be a serious detrimental effect on performance if no action is taken to accelerate the computation of these operators. We note that the M2L computation is also the most expensive in classical FMM schemes, though in bbFMM we can expect the additional cost of computing Chebyshev interpolations instead of a few terms in a Taylor series in the multipole expansions.
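The figure 6³ − 3³ = 189 can be checked by enumerating the interaction list of an interior cell with integer coordinates; this sketch ignores boundary truncation (boundary cells have shorter lists) and uses illustrative names:

```python
from itertools import product

def interaction_list(cell, dim=3):
    """Interaction list of an interior cell with integer coordinates `cell`:
    children of the parent's near neighbours that are not themselves near
    neighbours of the cell (adjacency includes sharing only a vertex)."""
    parent = tuple(c // 2 for c in cell)
    near = {tuple(c + o for c, o in zip(cell, off))
            for off in product((-1, 0, 1), repeat=dim)}   # incl. the cell itself
    inter = []
    for off in product((-1, 0, 1), repeat=dim):           # parent's neighbours
        p = tuple(c + o for c, o in zip(parent, off))
        for child in product((0, 1), repeat=dim):         # their children
            cand = tuple(2 * pc + cc for pc, cc in zip(p, child))
            if cand not in near:
                inter.append(cand)
    return inter
```

The parent neighbourhood spans 6 child cells per axis and the cell's own neighbourhood spans 3, giving 6³ − 3³ = 189 in an octree (and 6² − 3² = 27 in a quadtree).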
Step 4. Local-to-local translation (L2L). By reversing the process of M2M, the transfer from a parent cell to its children, applied from the lowest level up to the highest, can be written as:

$$ L^{I_{q+1}}_{m} = \sum_{n} R\big(\bar{\mathbf s}^{I_{q+1}}_{m}, \bar{\mathbf s}^{I_q}_{n}\big)\,L^{I_q}_{n} $$

where cell $I_q$ is the parent of cell $I_{q+1}$ and m the index of the Chebyshev nodes in the child cell.
Step 5. Local expansion (local). A source point located in a leaf $I_{max}$ of level $l_{max}$ can be treated following the same procedure as the moment:

$$ f^{\mathrm{far}}(\mathbf s_k) = \sum_{m} w_m\,L^{I_{\max}}_{m}, \qquad w_m = R\big(\bar{\mathbf s}^{I_{\max}}_{m}, \mathbf s_k\big) $$

where w is a vector spanned by all the Chebyshev nodes in the leaf $I_{max}$ and k the index of the source point.
Step 6. Assembly. For a specific source point, the effect of all field points can now be written by chaining the operators above; here, q can identify any level, depending on the interaction list. A summation can then be written to evaluate all the far-field contributions:

$$ f^{\mathrm{far}}(\mathbf s_k) = \sum_{(I_q,J_q)} \mathbf W^{I_{\max},\ldots,I_q}\,\mathbf B^{I_q,J_q}\,\mathbf M^{J_q,\ldots,J_{\max}} \tag{46} $$

The far-field part on the right hand side of Eq. (20) follows the same procedure as above. However, in order to execute this last step efficiently, some optimization of the M2L operation is needed to reduce the time required to compute the matrix-vector products. A low rank approximation based on the Proper Generalized Decomposition (PGD) for matrix compression will be presented in the next section.

Optimization of the M2L operation by PGD
The PGD approach is based on the natural observation that data can often be represented by a combination of the information in each of its coordinate directions. In other words, the kernel function K may be written in a separated form:

$$ K(\phi_1,\ldots,\phi_6) \approx \sum_{i=1}^{n} F^{1}_{i}(\phi_1)\circ F^{2}_{i}(\phi_2)\circ\cdots\circ F^{6}_{i}(\phi_6) \tag{47} $$

where the operator ∘ denotes a Hadamard product, and $\phi_1,\ldots,\phi_6$ are the three source and three field coordinates. The PGD approximation is shown above as a summation of n terms comprising functional products. Instead of attempting to approximate all the matrices describing the kernel behaviour between Chebyshev points in two cells in the interaction list at some level in the bbFMM, an important development here is that we seek to use PGD to approximate the behaviour of the kernel over the entire domain of far-field interactions. Recalling the summation process (Eq. (46)), for each source cell there are at most 189 far-field cells which will interact with a leaf cell. Hence, executing the summation process will incur memory and time costs, especially for 3-D problems. Writing the kernel function as above has some remarkable advantages: 1) acceleration of the matrix-vector products in Eq. (46), and 2) a reduction in the memory required to store the kernel information.
For the acceleration of the M2L process, if we seek a low rank approximation $\hat K(\phi_1,\ldots,\phi_6)$ to the kernel function $K(\phi_1,\ldots,\phi_6)$, we may consider this as the solution of a simple governing equation:

$$ \hat K - K = 0 $$

We might use a weighted residual method to seek this solution, and the corresponding weighted residual form will be

$$ \int_{\Omega} K^{*}\,\big(\hat K - K\big)\,\mathrm d\Omega = 0 \tag{50} $$

where $K^{*}$ is an arbitrary weight function.
An initial guess can always be made to generate the first approximation, and can be written as:

$$ \hat K_1 = \prod_{d=1}^{6} F^{d}_{1}(\phi_d) $$

Here, the initial guess could use some simple functions or the modes of a similar problem previously solved. We further note that $K_t$ includes 27 coefficients, which can be written as a 3 × 3 × 3 or a 3 × 9 matrix as required, and a 3 × 3 matrix is used to approximate the displacement kernel $K_u$ for a 3-D problem.
In order to reach a sufficiently accurate low-rank approximation, an enrichment process will be carried out as follows:

$$ \hat K_n = \hat K_{n-1} + \prod_{d=1}^{6} F^{d}_{n}(\phi_d) $$

where the functions $F^{d}_{n}(\phi_d)$ are the only unknowns. In order to solve this non-linear problem, an alternating direction scheme is used. At any iteration step p, the approximated function is:

$$ \hat K^{(p)}_{n} = \hat K_{n-1} + F^{k,p}_{n}(\phi_k)\prod_{d\neq k} F^{d}_{n}(\phi_d) \tag{53} $$

where the $F^{d}_{n}(\phi_d)$, d ≠ k, can be taken from the previous iteration, e.g. $F^{d}_{1}(\phi_d)$, and it remains to find $F^{k,p}_{n}(\phi_k)$. The weight function $K^{*}$ is given the form:

$$ K^{*} = K^{*}_{k}(\phi_k)\prod_{d\neq k} F^{d}_{n}(\phi_d) \tag{54} $$

Substituting Eq. (53) and (54) into Eq. (50) yields a problem in the $\phi_k$-direction alone. The $\phi_k$-direction integrals are then evaluated over $\Omega_k$ to yield the coefficient integrals (Eq. (57), (58) and (59)), and the corresponding strong form then gives $F^{k,p}_{n}(\phi_k)$ directly. By increasing p, we can continue to alternate between progressively improving forms of $F^{d}_{n}(\phi_d)$ until convergence is achieved. A suitable stopping criterion for the iteration process is given as:

$$ \frac{\big\|F^{k,p}_{n} - F^{k,p-1}_{n}\big\|}{\big\|F^{k,p}_{n}\big\|} < \tilde\varepsilon^{(p)} $$

where ‖•‖ is the L2-norm and $\tilde\varepsilon^{(p)}$ is a suitable threshold value. After some enrichment steps, the whole approximation process will be stopped once the following criterion is satisfied:

$$ \frac{\big\|\prod_{d} F^{d}_{n}\big\|}{\big\|\prod_{d} F^{d}_{1}\big\|} < \varepsilon^{(n)} \tag{62} $$

with an appropriate threshold value $\varepsilon^{(n)}$. Now, returning to Eq. (46), all the matrices $\mathbf B^{I_q,J_q}$ can be grouped as a single matrix:

$$ \mathbf B = \big[\mathbf B^{I_q,J_q}\big] \tag{63} $$

Substituting the related Chebyshev nodes into Eq. (47) and grouping the first three and last three terms gives:

$$ \mathbf B^{I_q,J_q} \approx \sum_{i=1}^{n} \mathbf F^{s}_{i}\big(\mathbf F^{x}_{i}\big)^{\top}, \qquad \mathbf F^{s}_{i} = F^{1}_{i}\circ F^{2}_{i}\circ F^{3}_{i}, \quad \mathbf F^{x}_{i} = F^{4}_{i}\circ F^{5}_{i}\circ F^{6}_{i} \tag{64} $$

Using Eq. (64), Eq. (63) can be written in a separated form of small size; the kernel function has thus been represented compactly.
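A matrix analogue of the enrichment loop (two directions rather than six, with per-mode normalization in place of the unbounded α coefficients discussed in the numerical results) can be sketched as follows; `pgd_modes` and its tolerances are illustrative, not the authors' implementation:

```python
import numpy as np

def pgd_modes(K, n_modes=8, max_iters=50, local_tol=1e-10):
    """Greedy rank-one enrichment with alternating directions: a two-direction
    matrix analogue of the six-direction PGD, K ~ sum_n outer(F1_n, F2_n)."""
    R = K.astype(float).copy()           # residual after previous modes
    F1, F2 = [], []
    for _ in range(n_modes):
        f2 = np.ones(K.shape[1])         # simple initial guess
        f1 = R @ f2 / (f2 @ f2)
        for _ in range(max_iters):       # alternating direction iterations
            f1 = R @ f2 / (f2 @ f2)      # best f1 for the current f2
            f2_new = R.T @ f1 / (f1 @ f1)  # best f2 for the current f1
            done = np.linalg.norm(f2_new - f2) <= local_tol * np.linalg.norm(f2_new)
            f2 = f2_new
            if done:
                break
        F1.append(f1)
        F2.append(f2)
        R = R - np.outer(f1, f2)         # enrichment: deflate the new mode
    return F1, F2
```

Each alternating update is the least-squares-optimal factor for the current residual, which is the matrix counterpart of the weighted residual step above; a fixed mode budget stands in for the ε^(n) criterion.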

Algorithm
The full implementation of bbFMM with IGABEM is shown in Fig. 2. It may be noticed that all steps except for the iterative part can be precomputed, and the iterative part is calculated by GMRES until convergence. Where we refer to a preconditioning matrix, we use a block diagonal preconditioner formed by directly evaluating the singular integrals, though other preconditioners are available. In addition, the implementation of bbFMM with the conventional BEM follows the above steps, the only differences being in the shape functions used and in the integral interval ([0, 1] → [−1, 1]).

Kernel approximation by Chebyshev polynomial
Before presenting the data compression, it is valuable to consider the influence of the number of Chebyshev nodes. It is clear that the accuracy of interpolation will be improved by increasing the number of Chebyshev nodes, but this comes at some computational cost. The relative error can be defined as:

$$ E_c = \frac{\|K_c - K_d\|}{\|K_d\|} $$

where $K_c$ denotes the kernel interpolated by Chebyshev nodes, $K_d$ the kernel evaluated directly and ‖•‖ the L2-norm (similarly hereinafter). The test cases are depicted in Fig. 3; case (a) shows two cells which would lie in an interaction list, and are amenable to FMM acceleration, while case (b) shows two adjacent cells. The bounding box (cell 0) can be expressed as [0, 10] × [0, 10] × [0, 10], and in both cases we consider a source point at coordinates (1, 1, 1) and a field point located at (9, 9, 9). The relative errors in the traction and displacement kernels, approximated by using different numbers of Chebyshev nodes, are illustrated in Fig. 4. It can be seen that with an increasing number of nodes the accuracy is consistently improved; once 5 nodes are used in each direction, the accuracy becomes acceptable for the nonadjacent case, but interpolating across two adjacent cells leads to relatively large errors even though the source and field points are unchanged.
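The contrast between the nonadjacent and adjacent cases can be reproduced in a 1-D setting with the stand-in kernel 1/|x − s|; the cell coordinates, point grids and helper names below are illustrative:

```python
import numpy as np

def cheb_nodes(n, a, b):
    m = np.arange(1, n + 1)
    return 0.5 * (a + b) + 0.5 * (b - a) * np.cos((2 * m - 1) * np.pi / (2 * n))

def interp_weights(nodes, y, a, b):
    n = len(nodes)
    tm = np.arccos((2 * nodes - (a + b)) / (b - a))
    ty = np.arccos((2 * y - (a + b)) / (b - a))
    w = np.full(n, 1.0 / n)
    for l in range(1, n):
        w += (2.0 / n) * np.cos(l * tm) * np.cos(l * ty)
    return w

def max_rel_err(src, fld, n=5):
    """Worst relative interpolation error of 1/|x - s| over a grid of points."""
    kernel = lambda s, x: 1.0 / np.abs(x - s)
    sn, xn = cheb_nodes(n, *src), cheb_nodes(n, *fld)
    Kmn = kernel(sn[:, None], xn[None, :])
    errs = []
    for s in np.linspace(src[0] + 0.05, src[1] - 0.05, 7):
        for x in np.linspace(fld[0] + 0.05, fld[1] - 0.05, 7):
            approx = interp_weights(sn, s, *src) @ Kmn @ interp_weights(xn, x, *fld)
            errs.append(abs(approx - kernel(s, x)) / kernel(s, x))
    return max(errs)

err_far = max_rel_err((0.0, 1.0), (3.0, 4.0))   # cells in an interaction list
err_near = max_rel_err((0.0, 1.0), (1.0, 2.0))  # adjacent cells
```

For well-separated cells the kernel is analytic over both intervals and 5 nodes already give small errors; for adjacent cells the singularity sits on the shared boundary and the interpolation error remains large, mirroring the 3-D behaviour in Fig. 4.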

Data compression by PGD
While the number of Chebyshev nodes used determines the accuracy of the scheme, the number of PGD modes (i.e. enrichment steps) determines the amount of compression that can be achieved during the M2L process. We introduce a relative error defined as:

$$ E_p = \frac{\|\hat K - K\|}{\|K\|} $$

where $\hat K$ is the approximated kernel, which can be calculated by PGD or SVD. For convenience, we only compare a single pair of cells appearing in the interaction list, as shown in Fig. 3a. Fig. 5a shows the relative error $E_p$ obtained for the traction kernel using different numbers of PGD modes and Chebyshev nodes, denoted n. It can be seen that an increase in the number of Chebyshev nodes brings a requirement for more modes in order to satisfy the stopping criterion (Eq. (62)). Beyond 5 Chebyshev nodes, the number of modes required becomes almost fixed for the same relative error. Fig. 5b shows the SVD computation to have the same tendency as the PGD result. By comparing these two figures, it is evident that the PGD solutions are not able to reach a lower relative error than the SVD solutions; this is mainly because the coefficients α in the PGD enrichment process become larger after several iterations, eventually exceeding the range of magnitudes representable in double precision (i.e. from about 2.22507 × 10⁻³⁰⁸ to 1.79769 × 10³⁰⁸). However, both PGD and SVD yield a reduction in the size of the data structure, and in a separated form. Fig. 6a and Fig. 6b show the corresponding analysis for the $K_u$ kernel. The behaviour is similar to that for the $K_t$ kernel.
There are two variables which control the final number of modes generated by the PGD algorithm. The first, ε(n), can be treated as a global threshold value and decides the precision of the final approximate result. The second, ε̃(P), is a local threshold value which influences both the computational speed and the final number of modes. Fig. 7 shows the relationship between the local threshold value ε̃(P) and the final number of modes required to generate the PGD. It can be seen that, in each enrichment step, decreasing the local threshold value results in a smaller data structure; in other words, this offers a higher compression ratio, but at the expense of additional computational effort.
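The interplay of the two thresholds can be sketched with a minimal, un-normalised variant of greedy rank-one enrichment (the paper's formulation differs in its normalisation of the α coefficients); here `eps_global` plays the role of ε(n) and `eps_local` that of ε̃(P), and the rank-2 test matrix is purely illustrative.

```python
import math

def frob(M):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(x * x for row in M for x in row))

def pgd_modes(K, eps_global=1e-8, eps_local=1e-10, max_modes=50):
    """Greedy rank-one enrichment K ~ sum_m a_m b_m^T.
    eps_global: stop enriching when the residual norm is small (role of eps(n));
    eps_local : stop the fixed-point sweep inside each step (role of eps~(P))."""
    nrow, ncol = len(K), len(K[0])
    R = [row[:] for row in K]                     # residual matrix
    norm0 = frob(K)
    modes = []
    while frob(R) > eps_global * norm0 and len(modes) < max_modes:
        a = [1.0] * nrow                          # initial guess for the new mode
        for _ in range(200):                      # alternating fixed-point sweep
            aa = sum(x * x for x in a)
            b = [sum(R[i][j] * a[i] for i in range(nrow)) / aa for j in range(ncol)]
            bb = sum(x * x for x in b)
            a_new = [sum(R[i][j] * b[j] for j in range(ncol)) / bb for i in range(nrow)]
            delta = max(abs(x - y) for x, y in zip(a, a_new))
            a = a_new
            if delta < eps_local:
                break
        for i in range(nrow):                     # deflate the residual
            for j in range(ncol):
                R[i][j] -= a[i] * b[j]
        modes.append((a, b))
    return modes

# Illustrative rank-2 "kernel matrix"; a few modes reproduce it to tolerance.
u, v = [1.0, 2.0, 3.0, 4.0], [1.0, 0.5, 0.25]
w, z = [1.0, -1.0, 1.0, -1.0], [0.5, 1.0, -0.5]
K = [[u[i] * v[j] + 0.2 * w[i] * z[j] for j in range(3)] for i in range(4)]
modes = pgd_modes(K)
approx = [[sum(a[i] * b[j] for a, b in modes) for j in range(3)] for i in range(4)]
err = frob([[K[i][j] - approx[i][j] for j in range(3)] for i in range(4)]) / frob(K)
```

Tightening `eps_local` makes each mode closer to the optimal rank-one update, so fewer modes are needed overall, at the cost of more fixed-point iterations per enrichment step.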
The PGD algorithm presented in section 3.4 proceeds in a very different way to the SVD compression algorithms in the literature. While SVD has been shown to provide a low-rank approximation to a matrix of kernel functions relating Chebyshev nodes in one cell to those in another cell at the same level, the PGD algorithm approximates the kernel across the problem domain (though it applies only in the regions of far-field interaction). In spite of this fundamental difference, the computational complexity of PGD and SVD may be compared directly by a test in which we limit our PGD to the scope of the SVD algorithms, i.e. approximating the kernel behaviour between the Chebyshev nodes contained in two cells. We present in Fig. 8 the run time required by both schemes to complete this operation. As the number of Chebyshev nodes is increased, the size of the data structure grows rapidly: since the Chebyshev interpolation is built in 3-D, the size of the kernel matrix being approximated is related to the cube of the number of Chebyshev nodes in each direction, and this is shown as "degrees of freedom" on the horizontal axis of Fig. 8.
It can be seen that, once around 100 DOF is reached, the PGD exhibits O(n^2) complexity while the SVD exhibits O(n^3) for the computation of the M2L process. This significant improvement, found here in approximating the kernel only between a single pair of cells, will be magnified further by using PGD to approximate the kernel over the whole domain of far-field interactions. It will be shown in section 4.3 that this new approach accelerates the solution of problems across the whole range of problem sizes tested. From the discussion above, the advantages of using PGD compared with SVD can be summarised as: 1) the run time is clearly decreased; 2) PGD offers a flexible choice for the number of modes generated: a loose compression is obtained quickly, while a tight compression further accelerates the final solving process; 3) without precompressing [43], any suitable quadrature rule can be directly employed in the coefficient integrals (Eqs. (57), (58) and (59)) contained in the enrichment process. This further enhances the efficiency of computing matrix-vector products.
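The third advantage, faster matrix-vector products, follows directly from the separated form: once K ≈ Σ_m a_m b_m^T is available, applying K to a vector never requires the dense matrix. A minimal sketch (the two-mode 2 × 3 matrix is purely illustrative):

```python
def lowrank_matvec(modes, v):
    """Apply K ~ sum_m a_m b_m^T to a vector without forming K:
    K v ~ sum_m a_m (b_m . v), at O(r (nrow + ncol)) cost instead of
    the O(nrow * ncol) cost of a dense product."""
    out = [0.0] * len(modes[0][0])
    for a, b in modes:
        s = sum(bj * vj for bj, vj in zip(b, v))   # scalar b_m . v
        for i, ai in enumerate(a):
            out[i] += ai * s
    return out

# Two modes of a 2 x 3 "kernel": result is identical to the dense product.
modes = [([1.0, 2.0], [3.0, 4.0, 5.0]), ([0.5, -1.0], [1.0, 0.0, 2.0])]
v = [1.0, 1.0, 1.0]
kv = lowrank_matvec(modes, v)
```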

Quarter cylinder example
The first example studied is a single quarter cylinder, shown in Fig. 9, under an internal pressure for which the analytical solution gives a maximum displacement of 0.341215 m. The right, left and bottom surfaces have prescribed displacement constraints in the direction normal to each surface. The height is 1 m, the inner radius 0.4 m and the outer radius 1 m. The control points and geometry are displayed in the figure. The octree structure is presented in Figs. 10 and 11. In Figs. 10a and 11a, since we adopt a 10 × 10 Gaussian quadrature scheme in each patch, the octree structure is generated depending on the locations of the source and field points, so some parts remain uncovered by the leaves. As the patch size decreases, the leaves come to contain all the patches fully (Figs. 10b and 11b). Note that some cells appear as rectangles or other shapes not normally seen in octree subdivisions; this optical effect simply results from the fact that we draw only the cells that contain source and field points. Figs. 12a and 12b show that both a coarse and a fine model are able to approximate the analytical deformation magnitude well.
In order to compare fully the efficiency of the bbFMM IGABEM against the conventional IGABEM, a knot insertion algorithm [10] is introduced to increase the DOF. During the computation, Gaussian elimination was adopted for solving the final matrix generated by conventional IGABEM, while a GMRES iterative solver, with a residual tolerance of 10^−5, was used for the bbFMM IGABEM approach. For both schemes the same Gaussian quadrature rule was used for the integrals (n_g = 10). The interpolation in the bbFMM approach was constructed using 5 Chebyshev nodes in each direction. All the codes were executed on a 2.80 GHz quad-core processor with 8 parallel threads available, but in order to provide a true comparison of the algorithms, execution was restricted to a single core with a single thread.
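The refinement by knot insertion leaves the geometry exactly unchanged while adding control points (and hence DOF). The sketch below shows single-knot insertion for a B-spline curve (Boehm's algorithm), checked against de Boor evaluation; the degree-2 curve and knot values are an illustrative example, not the cylinder model of the paper.

```python
def find_span(knots, p, n, u):
    """Largest k with knots[k] <= u < knots[k+1] (clamped at the right end)."""
    if u >= knots[n + 1]:
        return n
    k = p
    while not (knots[k] <= u < knots[k + 1]):
        k += 1
    return k

def de_boor(p, knots, ctrl, u):
    """Evaluate a B-spline curve by the de Boor recursion."""
    n = len(ctrl) - 1
    k = find_span(knots, p, n, u)
    d = [list(ctrl[j]) for j in range(k - p, k + 1)]
    for r in range(1, p + 1):
        for j in range(p, r - 1, -1):
            i = k - p + j
            alpha = (u - knots[i]) / (knots[i + p - r + 1] - knots[i])
            d[j] = [(1 - alpha) * x + alpha * y for x, y in zip(d[j - 1], d[j])]
    return d[p]

def insert_knot(p, knots, ctrl, u):
    """Boehm single-knot insertion: one new control point, geometry unchanged."""
    n = len(ctrl) - 1
    k = find_span(knots, p, n, u)
    new_knots = knots[:k + 1] + [u] + knots[k + 1:]
    new_ctrl = []
    for i in range(n + 2):
        if i <= k - p:
            new_ctrl.append(list(ctrl[i]))
        elif i <= k:
            alpha = (u - knots[i]) / (knots[i + p] - knots[i])
            new_ctrl.append([(1 - alpha) * x + alpha * y
                             for x, y in zip(ctrl[i - 1], ctrl[i])])
        else:
            new_ctrl.append(list(ctrl[i - 1]))
    return new_knots, new_ctrl

# Illustrative clamped quadratic curve: insert u = 0.3 and verify the
# curve is unchanged at sample parameters.
p = 2
knots = [0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0]
ctrl = [[0.0, 0.0], [1.0, 2.0], [3.0, 2.0], [4.0, 0.0]]
new_knots, new_ctrl = insert_knot(p, knots, ctrl, 0.3)
same = all(abs(c0 - c1) < 1e-12
           for u in [0.1, 0.3, 0.4, 0.6, 0.9]
           for c0, c1 in zip(de_boor(p, knots, ctrl, u),
                             de_boor(p, new_knots, new_ctrl, u)))
```

Repeating each inserted knot p times, as in the h-refinement study below, splits the surface into individual Bézier elements while still reproducing the original CAD geometry exactly.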
The runtime cost is shown in Fig. 13. Compared with the conventional IGABEM, the time cost of the bbFMM IGABEM for solving a large problem is clearly decreased. The O(n^2) behaviour for conventional IGABEM and O(n) for bbFMM IGABEM are obtained. This suggests that the O(n^3) complexity of solving the conventional IGABEM linear system by Gaussian elimination does not play a dominant role in the overall computation; rather, the matrix assembly dominates. Furthermore, the bbFMM algorithm combined with PGD outperforms the use of SVD. When PGD is used in preference to SVD, performance gains are seen across the range of model sizes tested, and in the worst case this gain is approximately a factor of 2. It is interesting to note the improvement for small problem sizes, effectively offsetting the overheads involved in using FMM for these problems. Although the time cost for the M2L operation is almost fixed, depending only on the depth of the octree structure, use of PGD provides significant improvements in calculating the matrix-vector products. Furthermore, the pre-computation time is saved. Fig. 14 shows the maximum memory requirement for the two methods. Although the bbFMM incurs a lower memory cost than conventional IGABEM, both methods exhibit an O(n) memory requirement. While readers might expect the memory requirement to be O(n^2) if we are storing a matrix, the O(n) behaviour arises because the main source of memory consumption is that, in order to improve computational efficiency, we store the basis function values, quadrature points and normal vectors. For the range of model sizes tested this is seen to dominate over the memory required to store the system matrix terms. Meanwhile, an analysis of the relative error compared with the conventional IGABEM, shown in Tab. I, shows that the accuracy is almost unaffected by the use of this PGD-accelerated bbFMM algorithm. In this table the relative error is computed as E_r = ||ũ_ea − ũ_ea^ex|| / ||ũ_ea^ex||, where ũ_ea and ũ_ea^ex are the coefficients that allow us to recover the approximate and exact displacements, respectively, in a NURBS basis.

Torque arm example
A typical mechanical part in engineering, the torque arm, is selected as the second case study. The geometry is shown in Fig. 15, in which, for clarity, the control points have been omitted. The torque arm is subjected to a pressure of 100 MPa prescribed on the surface of the circular hole on the right side, and the component is fixed at the hole on the left. The Young's modulus and Poisson's ratio are 200 GPa and 0.3 respectively. The octree structure is generated from an initial cube of side 0.5 m.
The model consists of 4794 DOF and 400 Bézier elements, and requires run times for the conventional IGABEM and the bbFMM IGABEM of 2453 s and 24 s respectively. Fig. 16 shows contours of the deformation results. From the studies presented, it is evident that by using the bbFMM scheme, the IGABEM analysis can be considerably accelerated. Furthermore, the PGD algorithm has addressed the computational bottleneck in data compression that is seen with bbFMM schemes that rely upon SVD, making the M2L operators and matrix-vector products much faster to compute than before.

PARAMETRIC STUDY
In order to evaluate the influence of the parameters used in this paper, a parametric study is performed using the quarter cylinder geometry of Sec. 4.3 with the same boundary conditions. The study includes four main parameters: computational complexity under knot insertion (h), NURBS degree (p), number of Chebyshev nodes (n_c) and precision of the PGD approximation (ε(n)).
The first study concerns the scaling of the elements of the run time under an h-refinement scheme. In an IGABEM context this is done by knot insertion; this was shown in Sec. 4.3, but here we give some extra details. The knot vector used for the initial geometry (Mesh 0) is Ξ = {0, 0, 0, 1, 1, 1} for each dimension in parametric space. We then insert knots h times with p repetitions to arrive at Ξ = {0, 0, 0, ξ_1, ξ_1, . . ., ξ_h, ξ_h, 1, 1, 1}, so that each surface is converted into (h + 1) × (h + 1) individual Bézier elements for calculation. This can be seen as an isogeometric form of h-refinement and the details are shown in Table II, where n_e is the number of Bézier elements, E_r the relative error computed using Eq. (68), t_f the time cost of the far-field integration during the whole calculation process, t_d the time cost of the direct integration (which includes both the near-field and singular parts), t_p the PGD computation time during the pre-computing process, t_g the time cost of the GMRES solver excluding matrix-vector products, t the total run time and n_g the number of GMRES iterations. From the table it is evident that the costs of the far-field and direct parts both exhibit linear complexity; the time cost of the PGD computation is O(1), as the source and field points occupy a relatively stable set of octree cells; and the time cost of GMRES mainly comprises two parts, the least-squares problem (O(n_g^2)) and the Arnoldi iteration (O(n_dof × n_g)), which together also exhibit almost linear computational complexity with little increase in the number of iterations.
The second study relates to degree elevation (the isogeometric form of p-refinement). The surface degree in each dimension is increased from 2 to 7, with the knot vector expressed as Ξ = {0, . . ., 0, 1, . . ., 1}, in which each of the values 0 and 1 is repeated p + 1 times. Table III shows the comparisons between different degrees, and it can be seen that the relative error increases for large p, which also costs additional time during the local expansion, so surface degrees p = 2, 3, 4 are recommended depending on the complexity of the geometry.
The third study investigates the influence of the number of Chebyshev nodes, n_c (Table IV). It can be seen that use of an insufficient number of Chebyshev nodes leads to a degradation in accuracy in the final result, although it incurs a reduced time cost in the far-field, direct and PGD parts; the convergence speed also degrades. With more Chebyshev nodes, the computation achieves good convergence properties but costs more time, so n_c = 5, 6 is recommended.
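The GMRES cost breakdown quoted above, an Arnoldi iteration of O(n_dof × n_g) plus a small least-squares solve of O(n_g^2), can be seen in a minimal matrix-free implementation. This is an illustrative sketch only: the 3 × 3 test system stands in for the bbFMM far-field operator, and a production solver would use Givens rotations rather than the normal equations used here for brevity.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve_dense(A, rhs):
    """Gaussian elimination with partial pivoting (for the small LS system)."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gmres(matvec, b, tol=1e-10, max_iter=60):
    """Unrestarted, matrix-free GMRES. The Arnoldi loop costs O(n_dof * n_g);
    the small Hessenberg least-squares problem costs O(n_g^2)."""
    beta = math.sqrt(dot(b, b))
    V = [[bi / beta for bi in b]]              # orthonormal Krylov basis
    H, y = [], []
    for j in range(max_iter):
        w = matvec(V[j])
        h = []
        for v in V:                            # modified Gram-Schmidt
            hij = dot(w, v)
            h.append(hij)
            w = [wk - hij * vk for wk, vk in zip(w, v)]
        hlast = math.sqrt(dot(w, w))
        h.append(hlast)
        H.append(h)
        m = len(H)                             # solve min || beta e1 - Hbar y ||
        Hbar = [[H[c][r] if r < len(H[c]) else 0.0 for c in range(m)]
                for r in range(m + 1)]
        N = [[sum(Hbar[r][i] * Hbar[r][k] for r in range(m + 1)) for k in range(m)]
             for i in range(m)]
        y = solve_dense(N, [beta * Hbar[0][i] for i in range(m)])
        res = math.sqrt(sum(((beta if r == 0 else 0.0)
                             - sum(Hbar[r][c] * y[c] for c in range(m))) ** 2
                            for r in range(m + 1)))
        if res < tol * beta or hlast < 1e-14:  # converged or happy breakdown
            break
        V.append([wk / hlast for wk in w])
    return [sum(y[k] * V[k][i] for k in range(len(y))) for i in range(len(b))]

# Illustrative 3x3 system; in the paper the matvec is supplied by the
# bbFMM far-field pass plus the directly integrated near-field terms.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = gmres(lambda v: [dot(row, v) for row in A], b)
```

Because only `matvec` touches the system, the O(n) cost of the accelerated far-field product carries over directly to each GMRES iteration.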
The last study relates to the influence of the precision of the PGD (Tab. V). A low level of compression makes the far-field computation fast for each iteration but gives poor precision and convergence properties, while a high level of compression causes more time to be spent in each matrix-vector calculation. The results suggest that a tolerance ε(n) = 10^−5 ∼ 10^−6 is suitable.

CONCLUSION
Because of its boundary-only modelling, with no requirement for a volumetric NURBS geometric definition, the boundary element method is an ideal choice for isogeometric analysis of solids in 3-D. In this paper, we have presented a black-box fast multipole isogeometric boundary element method for the analysis of elasticity in 3-D solid components. The algorithm is combined with a model reduction approach based on proper generalized decomposition to accelerate the computation of the M2L operators and matrix-vector products. When PGD is used in preference to SVD, performance gains are seen across the range of model sizes tested, and in the worst case this gain is approximately a factor of 2. Together, these strategies provide a promising computational approach for the fast analysis of large-scale elastostatics problems: the computational resources required are decreased without significant loss of accuracy. As the method can easily be integrated into conventional IGABEM codes, it is a good candidate for future incorporation into commercial software for wide dissemination to the industrial community.

Figure 2. Algorithm of the bbFMM solver

Figure 3. Geometry of the test cases

Figure 4. Relative error with the number of Chebyshev nodes

Figure 5. Relative error of the K_t kernel with the number of modes

Figure 6. Relative error of the K_u kernel with the number of modes
Figure 7. Final number of modes against the local threshold value
Figure 8. Run time of the PGD and SVD compression schemes

Figure 9. Quarter cylinder geometry with NURBS control points

Figure 10. Octree structure of the quarter cylinder model (front view)

Figure 12. Deformation solution with undeformed patches of the quarter cylinder model

Figure 13. Time comparison
Figure 14. Memory comparison

Figure 16. Deformation solution with undeformed patches of the torque arm model

Table I. Relative error of the quarter cylinder problem, bbFMM vs. conventional IGABEM

Table II. Parameter study: h-refinement

Table III. Parameter study: p-refinement