Parallel Overview
This document describes how PDEs are solved in parallel.
In general, the mesh is partitioned and each part is assigned to a different
MPI process. Each element is owned by exactly one process. During
initialization, the mesh constructor on each process determines which other
processes have elements that share a face (an edge in 2D) with local elements.
It counts how many faces and elements are shared (a single element could have
multiple faces on the parallel boundary), and assigns local numbers to both the
elements and the degrees of freedom on the elements. The non-local elements
are given numbers greater than numEl, the number of local elements.
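A minimal sketch of this numbering scheme (the helper name and the 1-based numbering are assumptions for illustration, not the actual mesh API):

```python
def assign_element_numbers(num_local, num_nonlocal):
    # Owned elements get local numbers 1..num_local; non-local (ghost)
    # elements shared across the parallel boundary get numbers starting
    # at num_local + 1, i.e. greater than numEl.
    owned = list(range(1, num_local + 1))
    ghost = list(range(num_local + 1, num_local + num_nonlocal + 1))
    return owned, ghost

owned, ghost = assign_element_numbers(100, 3)
# owned spans 1..100; the 3 ghost elements are numbered 101, 102, 103
```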
The degrees of freedom are renumbered such that the newly assigned dof number
plus the dof_offset for the current process equals the global dof number,
which is defined by the local dof number assigned by the process that owns
the element. As a result, the locally stored dof numbers for elements owned
by processes with lower ranks will be negative.
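The arithmetic behind this can be sketched directly (the offset value here is hypothetical):

```python
def local_dof(global_dof, dof_offset):
    # The locally stored dof number satisfies:
    #   local_dof + dof_offset == global_dof
    return global_dof - dof_offset

dof_offset = 1000  # hypothetical dof_offset for the current process

# A dof this process owns (global number 1500) gets a positive local number:
owned = local_dof(1500, dof_offset)

# A dof owned by a lower-rank process has a global number below the offset,
# so its locally stored number comes out negative:
remote = local_dof(700, dof_offset)
```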
As part of counting the number of shared faces and elements, three arrays are
formed: bndries_local, bndries_remote, and shared_interfaces, which describe
the shared faces from the local side, from the remote side, and as a unified
view of the interface, respectively. This allows the shared faces to be
treated either as a special kind of boundary condition or as a proper
interface, similar to the interior interfaces.
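One plausible shape for these entries, sketched as Python dataclasses (the field names and layout are assumptions for illustration; the real types live in the mesh code):

```python
from dataclasses import dataclass

@dataclass
class Boundary:
    # One shared face seen from a single side, like a boundary face.
    element: int  # element number on that side
    face: int     # local face number within the element

@dataclass
class Interface:
    # One shared face seen as a proper interface between two elements.
    element_l: int  # local element
    element_r: int  # non-local element (number greater than numEl)
    face_l: int
    face_r: int

# bndries_local would hold Boundary entries from the local side,
# bndries_remote the same faces from the remote side, and
# shared_interfaces the unified Interface view of each shared face.
b = Boundary(element=7, face=2)
iface = Interface(element_l=7, element_r=101, face_l=2, face_r=0)
```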
There are two modes of parallel operation: one for explicit time marching and one for Newton's method.
Explicit Time Marching
In this mode, each process sends the solution values at the shared faces to the neighboring processes. Each process then evaluates the residual using the received values and updates the solution.
The function exchangeFaceData is designed to perform the sending and
receiving of data. Non-blocking communications are used, and the function
does not wait for the communication to finish before returning. The
MPI_Requests for the sends and receives are stored in the appropriate fields
of the mesh. It is the responsibility of each physics module to call
exchangeFaceData and to wait for the communication to finish before using
the data. Because the receives could complete in any order, it is
recommended to use MPI_Waitany to wait for the first receive to complete,
do as many computations as possible on the received data, and then call
MPI_Waitany again for the next receive.
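MPI specifics aside, the overlap idea behind the MPI_Waitany loop can be sketched in plain Python, using concurrent.futures.as_completed as a stand-in for MPI_Waitany (the receive and processing functions here are hypothetical placeholders, not the solver's API):

```python
import concurrent.futures as cf
import time

def recv_from(peer):
    # Stand-in for a non-blocking MPI receive of face data from `peer`.
    time.sleep(0.01 * peer)
    return peer, [peer] * 4  # pretend these are received face values

def process(data):
    # Do as much work as possible with one peer's data right away.
    return sum(data)

results = {}
with cf.ThreadPoolExecutor() as pool:
    pending = [pool.submit(recv_from, p) for p in (1, 2, 3)]
    # Analogue of calling MPI_Waitany in a loop: handle whichever
    # receive finishes first, overlapping computation with the
    # communication that is still in flight.
    for fut in cf.as_completed(pending):
        peer, data = fut.result()
        results[peer] = process(data)
```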
Newton's Method
For Newton's method, each process sends the solution values for all the elements on the shared interface at the beginning of a Jacobian calculation. Each process is then responsible for perturbing the solution values of both the local and the non-local elements. The benefit of this approach is that parallel communication is required only once per Jacobian calculation, rather than once per residual evaluation as in the explicit time marching mode.
The function exchangeElementData copies the data from the shared elements
into the send buffer, sends it, and posts the corresponding receives.
It does not wait for the communications to finish before returning; this
allows the communication to overlap with computation. The function is
called by Newton's method after a new solution is calculated, so the
physics module does not have to do it, but the physics module does have
to wait for the receives to finish before using the data. As with the
explicit time marching mode, use of MPI_Waitany is recommended.
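To see why one exchange per Jacobian suffices: a finite-difference Jacobian perturbs one solution entry at a time, and as long as the non-local (ghost) values are already present locally, every column can be formed without further communication. A toy serial sketch (the function names are illustrative, not the actual API):

```python
def fd_jacobian(residual, u, eps=1e-6):
    # Forward-difference Jacobian: perturb each entry of u in turn,
    # ghost entries included, and difference the residual. No entry
    # requires fetching remote data once u holds the exchanged values.
    r0 = residual(u)
    n, m = len(r0), len(u)
    J = [[0.0] * m for _ in range(n)]
    for j in range(m):
        up = list(u)
        up[j] += eps  # perturb entry j (local or ghost alike)
        rj = residual(up)
        for i in range(n):
            J[i][j] = (rj[i] - r0[i]) / eps
    return J

def residual(u):
    # Toy linear residual R(u) = A u with A = [[2, 1], [0, 3]],
    # so the finite-difference Jacobian should recover A.
    return [2.0 * u[0] + 1.0 * u[1], 3.0 * u[1]]

J = fd_jacobian(residual, [1.0, 2.0])
```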