Advances in Task-Based Parallel Programming for Distributed Memory Architectures
- Location: ITC/2446, ITC, Lägerhyddsvägen 2, Uppsala
- Doctoral student: Zafari, Afshin
- Organiser: Avdelningen för beräkningsvetenskap
- Contact person: Zafari, Afshin
It is widely accepted that scientific applications, particularly large-scale simulations, require parallel programming. Various programming models have been introduced to simplify parallel programming while still allowing an application to use the full computational capacity of the hardware. In task-based programming, all variables in the program are viewed abstractly as data, and parallelism is obtained by partitioning the data. A task is a collection of operations performed on input data to generate output data. In distributed memory environments, the data is distributed over the computational nodes (or processes) and is communicated when a task needs remote data.
This thesis discusses advanced techniques in distributed task-based parallel programming, implemented in the DuctTeip software library. DuctTeip uses MPI (Message Passing Interface) for asynchronous inter-process communication and Pthreads for shared-memory parallelization within each process. The data dependencies that determine which subsets of tasks can be executed in parallel are extracted from information about how each task accesses its data (as input or as output). A versioning system is used internally to represent the task-data dependencies efficiently. A hierarchical partitioning of tasks and data allows the size of computational tasks and the size of communicated data to be optimized independently. A data listener technique is used to manage communication efficiently.
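To make the idea of version-based dependency tracking concrete, the following is a minimal single-process sketch. It is not DuctTeip's implementation or API: the names `Handle`, `Task`, `submit`, and `run_in_waves` are hypothetical, and the counting scheme (each completed access bumps a per-data version, and each task records the version it must wait for) is an assumed, simplified variant of the general technique.

```python
from dataclasses import dataclass, field


@dataclass
class Handle:
    """A piece of data with version counters (hypothetical names)."""
    name: str
    version: int = 0           # accesses completed so far
    scheduled: int = 0         # accesses registered so far
    after_last_write: int = 0  # version reached once the latest registered write is done


@dataclass
class Task:
    name: str
    reads: list
    writes: list               # assumption: a handle is in reads OR writes, never both
    required: dict = field(default_factory=dict)


def submit(task):
    """Record, per handle, the version the task must wait for."""
    for h in task.reads:
        # A read only waits for the most recent registered write.
        task.required[h.name] = h.after_last_write
        h.scheduled += 1
    for h in task.writes:
        # A write waits for every access registered before it.
        task.required[h.name] = h.scheduled
        h.scheduled += 1
        h.after_last_write = h.scheduled


def is_ready(task):
    return all(h.version >= task.required[h.name]
               for h in task.reads + task.writes)


def complete(task):
    # Each finished access advances the version of the data it touched.
    for h in task.reads + task.writes:
        h.version += 1


def run_in_waves(tasks):
    """Group tasks into waves; tasks in one wave could run in parallel."""
    for t in tasks:
        submit(t)
    pending, waves = list(tasks), []
    while pending:
        ready = [t for t in pending if is_ready(t)]
        if not ready:
            raise RuntimeError("no runnable task: inconsistent dependencies")
        for t in ready:
            complete(t)
        pending = [t for t in pending if t not in ready]
        waves.append([t.name for t in ready])
    return waves


A, B, C = Handle("A"), Handle("B"), Handle("C")
tasks = [
    Task("T1", reads=[], writes=[A]),
    Task("T2", reads=[A], writes=[B]),
    Task("T3", reads=[A], writes=[C]),
    Task("T4", reads=[B, C], writes=[A]),
]
waves = run_in_waves(tasks)
print(waves)  # [['T1'], ['T2', 'T3'], ['T4']]
```

In this sketch the dependencies are never stored as an explicit graph. Each task carries only one integer per accessed data item, which is what makes a versioning representation compact.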
DuctTeip provides algorithm-independent dynamic load balancing. Dynamically redistributing tasks from busy processes to idle processes can shorten the overall execution time. A random search method with a high probability of success is employed for locating idle and busy nodes.
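The intuition behind a random search can be illustrated with a small simulation. This is a sketch under stated assumptions, not the method as implemented in DuctTeip: the function names, the `idle_threshold` parameter, and the use of a shared list of loads in place of MPI probe messages are all hypothetical. If a fraction p of the processes is idle and a busy process probes k others independently at random, at least one probe finds an idle process with probability 1 - (1 - p)^k.

```python
import random


def success_probability(idle_fraction, probes):
    """Chance that at least one of `probes` independent random probes hits an idle process."""
    return 1.0 - (1.0 - idle_fraction) ** probes


def find_idle_partner(my_rank, loads, idle_threshold, max_probes, rng):
    """Probe up to `max_probes` distinct random ranks; return an idle one or None.

    `loads[r]` stands in for the reply to a probe message sent to rank r
    (an assumption made so the sketch runs in a single process).
    """
    others = [r for r in range(len(loads)) if r != my_rank]
    for r in rng.sample(others, min(max_probes, len(others))):
        if loads[r] <= idle_threshold:
            return r
    return None


loads = [12, 0, 9, 1, 0, 11, 0, 10]   # queued tasks per process (made-up numbers)
rng = random.Random(0)
partner = find_idle_partner(my_rank=0, loads=loads, idle_threshold=1,
                            max_probes=4, rng=rng)
print(partner, success_probability(0.5, 4))
```

With half of the processes idle, four probes already succeed with probability 0.9375 in the independent-probe model, which is why a few random probes can replace a global view of the load.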
The advantage of the abstract view of tasks and data is exploited in a unified programming interface, which provides a standard for task-based frameworks to decouple framework development from application development. The interface can be used for collaboration between different frameworks in running an application program efficiently on different hardware.
To evaluate the DuctTeip programming model, applications such as Cholesky factorization, a time-dependent PDE solver for the shallow water equations, and the fast multipole method have been implemented using DuctTeip. Experiments show that DuctTeip provides both scalability and performance. Comparisons with similar frameworks such as StarPU, OmpSs, and PaRSEC show competitive results.