
Enhancing Interpolation Capabilities and Training Convergence of DeepONets for Approximating Parametric Linear PDEs
Solving parametric partial differential equations (PDEs) accurately and rapidly is crucial in many modern applications, such as real-time prediction [1] and optimal control [2]. A promising alternative to classical methods for this problem is the use of deep learning techniques. In particular, one of the most relevant approaches employs neural operators, which are grounded in the Universal Approximation Theorem (UAT) for operators [3]. Among the existing neural operators, one of the most popular is DeepONet [3]. Although DeepONet was originally conceived for learning operators, it can also approximate parametric PDEs [4]. This method expresses the solution as a separated (or low-rank) representation, u(x; p) ≈ Σ_k br_k(p) · tr_k(x), i.e., the dot product of a set of coefficients with a set of basis functions. The coefficients depend on the parameters p of the PDE and are generated by a neural network called the branch, br(p), while the basis functions depend on the spatial coordinates x and are computed by another neural network called the trunk, tr(x) (see the first sketch below). The main strength of DeepONets is that, once these networks are trained, solving the parametric PDE for a specific parameter value becomes computationally inexpensive, as it only requires a forward evaluation of the model. However, although the UAT guarantees a small approximation error, optimization and generalization errors often penalize the method's performance.

We propose an alternative to the conventional separated representation by adding an extra linear layer α. The weights of α are computed by least squares, while the remaining weights of the neural networks are obtained by gradient descent, resulting in a hybrid gradient-descent/least-squares (GD-LS) optimization (second sketch below). This addition aims to improve the model's training convergence and overall performance [5]. Moreover, we define a loss function with two components: a physical part and a derivative part. The physical part enforces the fulfillment of the equation, while the derivative part penalizes the mismatch in the derivatives of the solution with respect to the parameters of the equation (third sketch below). This extra term, inspired by Hermite interpolation [6], aims to enhance the interpolation capabilities of the model across the parameter space. We train the model with data generated by an automatically differentiable version of OpenFOAM, so that we can obtain not only the solutions but also the required parameter derivatives. In particular, we solve the convection-diffusion equation.
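To make the branch-trunk construction concrete, the following is a minimal PyTorch sketch of the separated representation described above. The network widths, depths, activations, and the number of basis functions are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    def __init__(self, p_dim=1, x_dim=1, width=64, n_basis=32):
        super().__init__()
        # Branch br(p): maps the PDE parameters p to K coefficients.
        self.branch = nn.Sequential(
            nn.Linear(p_dim, width), nn.Tanh(),
            nn.Linear(width, n_basis),
        )
        # Trunk tr(x): maps the spatial coordinates x to K basis functions.
        self.trunk = nn.Sequential(
            nn.Linear(x_dim, width), nn.Tanh(),
            nn.Linear(width, n_basis),
        )

    def forward(self, p, x):
        # Separated representation: u(x; p) ≈ sum_k br_k(p) * tr_k(x).
        return (self.branch(p) * self.trunk(x)).sum(dim=-1, keepdim=True)
```

After training, evaluating the solution for a new parameter value reduces to a single forward pass, e.g. u = model(p_new, x_grid), which is the computational advantage highlighted above.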
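A hedged sketch of one hybrid GD-LS step follows, under the assumption that α acts linearly on the elementwise product of the branch and trunk outputs, so that fitting α is a linear least-squares problem. The abstract does not specify the exact placement of α, so this is an illustration, not the authors' implementation.

```python
import torch

def gd_ls_step(model, alpha, optimizer, p, x, u_ref):
    # Last-layer features: one column per basis function (elementwise product
    # of branch coefficients and trunk basis values).
    feats = model.branch(p) * model.trunk(x)          # shape (N, K)

    # 1) Least-squares update of alpha, kept out of the autograd graph:
    #    solve feats @ alpha ≈ u_ref for alpha.
    with torch.no_grad():
        alpha[:] = torch.linalg.lstsq(feats, u_ref).solution

    # 2) Gradient-descent update of the branch/trunk weights, alpha frozen.
    optimizer.zero_grad()
    loss = ((feats @ alpha - u_ref) ** 2).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Here alpha can be a plain (K, 1) tensor (e.g. created with torch.zeros), so gradients never flow through it and only the branch and trunk weights are updated by the optimizer.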
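Finally, a sketch of the two-component loss. The abstract does not state whether the physical part is a PDE residual or a data misfit; a data misfit against the OpenFOAM reference is used here for simplicity, and the names u_ref, du_dp_ref, and the weight lam are assumptions introduced for this example.

```python
import torch

def two_part_loss(model, p, x, u_ref, du_dp_ref, lam=1.0):
    # Track gradients with respect to the PDE parameters p.
    p = p.clone().requires_grad_(True)
    u = model(p, x)

    # Physical part: misfit of the predicted solution (a data-misfit
    # stand-in for the fulfillment of the equation).
    loss_phys = ((u - u_ref) ** 2).mean()

    # Derivative (Hermite-inspired) part: du/dp from automatic
    # differentiation, matched against the solver's reference derivatives.
    # Summing u is valid here because each sample's output depends only on
    # that sample's parameters.
    du_dp = torch.autograd.grad(u.sum(), p, create_graph=True)[0]
    loss_deriv = ((du_dp - du_dp_ref) ** 2).mean()

    return loss_phys + lam * loss_deriv
```

The create_graph=True flag keeps the derivative term differentiable, so the whole loss can be backpropagated through both components during training.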