Design, analysis and performance evaluation of parallel algorithms for solving triangular linear systems on multicore platforms
RAIRO. Operations Research, Tome 55 (2021) no. 2, pp. 545-559

In this paper, we focus on the schedulings of 2-steps graph with constant task cost obtained when parallelizing algorithm solving a triangular linear system. We present three scheduling approaches having the same least theoretical execution time. The first is designed through solving a 0-1 integer problem by Mixed Integer Programming (MIP), the second is based on the Critical Path Algorithm (CPA) and the third is a particular Column-Oriented Scheduling (COS). The MIP approach experiments were carried out and confirmed that the makespan values of the MIP scheduling coincide with those of the corresponding lower bound already reached. Experimental results of the last two approaches detailing both makespans and efficiencies are presented and show that their practical performances differ though they are theoretically identical. We compare also these results to those of the appropriate procedure into so-called PLASMA library (Parallel Linear Algebra for Scalable Multi-core Architectures).

DOI : 10.1051/ro/2021009
Classification : 90C10, 68M20, 65F05
Keywords: 0-1 integer problem, task scheduling, parallel algorithm, PLASMA library, triangular linear system
@article{RO_2021__55_2_545_0,
     author = {Belmabrouk, Mounira and Marrakchi, Mounir},
     title = {Design, analysis and performance evaluation of parallel algorithms for solving triangular linear systems on multicore platforms},
     journal = {RAIRO. Operations Research},
     pages = {545--559},
     year = {2021},
     publisher = {EDP-Sciences},
     volume = {55},
     number = {2},
     doi = {10.1051/ro/2021009},
     mrnumber = {4238786},
     zbl = {1472.90069},
     language = {en},
     url = {https://www.numdam.org/articles/10.1051/ro/2021009/}
}
TY  - JOUR
AU  - Belmabrouk, Mounira
AU  - Marrakchi, Mounir
TI  - Design, analysis and performance evaluation of parallel algorithms for solving triangular linear systems on multicore platforms
JO  - RAIRO. Operations Research
PY  - 2021
SP  - 545
EP  - 559
VL  - 55
IS  - 2
PB  - EDP-Sciences
UR  - https://www.numdam.org/articles/10.1051/ro/2021009/
DO  - 10.1051/ro/2021009
LA  - en
ID  - RO_2021__55_2_545_0
ER  - 
%0 Journal Article
%A Belmabrouk, Mounira
%A Marrakchi, Mounir
%T Design, analysis and performance evaluation of parallel algorithms for solving triangular linear systems on multicore platforms
%J RAIRO. Operations Research
%D 2021
%P 545-559
%V 55
%N 2
%I EDP-Sciences
%U https://www.numdam.org/articles/10.1051/ro/2021009/
%R 10.1051/ro/2021009
%G en
%F RO_2021__55_2_545_0
Belmabrouk, Mounira; Marrakchi, Mounir. Design, analysis and performance evaluation of parallel algorithms for solving triangular linear systems on multicore platforms. RAIRO. Operations Research, Tome 55 (2021) no. 2, pp. 545-559. doi: 10.1051/ro/2021009

A. Abdelfattah, H. Anzt, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, I. Yamazaki and A. Yarkhan, Linear algebra software for large-scale accelerated multicore computing, Acta Numer. 25 (2016) 1–160. | MR | Zbl | DOI

E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak, J. Langou, H. Ltaief, P. Luszczek and S. Tomov, Numerical linear algebra on emerging architectures: the plasma and magma projects. In: Vol. 180 ofJournal of Physics: Conference Series. IOP Publishing, Bristol, UK (2009) 012037.

M. Belmabrouk and M. Marrakchi, Optimal parallel scheduling for resolution a triangular system with availability constraints. In: 2015 IEEE/ACS 12th International Conference of Computer Systems and Applications (AICCSA), IEEE, Piscataway, NJ, USA (2015) 1–7.

M. Belmabrouk and M. Marrakchi, Comparison of parallel scheduling for triangular system resolution on multi-core processors. In: 2017 4th International Conference on Control, Decision and Information Technologies (CoDIT). IEEE, Piscataway, NJ, USA (2017) 0651–0656. | DOI

A. Buttari, J. Langou, J. Kurzak and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Comput. 35 (2009) 38–53. | MR | DOI

A. Charara, D. Keyes and H. Ltaief, A framework for dense triangular matrix kernels on various manycore architectures. Concurrency Comput. Pract. Experience 29 (2017).

E. G. Coffman and P. J. Denning, Operating Systems Theory. Prentice-Hall Englewood Cliffs, NJ, USA (1973).

J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, P. Wu, I. Yamazaki, A. Yarkhan, M. Abalenkovs, N. Bagherpour and S. Hammarling, Plasma: Parallel linear algebra software for multicore using openmp. ACM Trans. Math. Softw. (TOMS) 45 (2019) 1–35. | MR | Zbl | DOI

C. A. Floudas and X. Lin, Mixed integer linear programming in process scheduling: modeling, algorithms, and applications. Ann. Oper. Res. 139 (2005) 131–162. | MR | Zbl | DOI

J. González-Domínguez, M. J. Martín, G. L. Taboada and J. Touriño, Dense triangular solvers on multicore clusters using upc. Proc. Comput. Sci. 4 (2011) 231–240. | DOI

Grid’5000, [online] https://www.grid5000.fr/mediawiki/index.php/Grid5000:Home (2007).

R. Iakymchuk, D. Defour, S. Collange and S. Graillat, Reproducible triangular solvers for high-performance computing. In: 2015 12th International Conference on Information Technology-New Generations. IEEE, Piscataway, NJ, USA (2015) 353–358. | DOI

IBM ILOG CPLEX Optimization Studio CPLEX Users Manual (1999).

IBM Knowlege Center, Solution of triangular system of equations with a single right-hand side. [online] https://www.ibm.com/support/knowledgecenter/.

X. Jin, T. Yang and X. Tang, A comparison of cache blocking methods for fast execution of ensemble-based score computation. In: Proceedings of the 39th International ACM SIGIR Conference On Research and Development in Information Retrieval (2016) 629–638.

C. C. Kjelgaard Mikkelsen, A. B. Schwarz andL. Karlsson, Parallel robust solution of triangular linear systems. Concurrency Comput. Pract. Experience 31 (2019) e5064. | DOI

M. Marrakchi, Optimal parallel scheduling for the 2-steps graph with constant task cost. Parallel Comput. 18 (1992) 169–176. | Zbl | DOI

P. D. Michailidis and K. G. Margaritis, Parallel direct methods for solving the system of linear equations with pipelining on a multicore using openmp. J. Comput. Appl. Math. 236 (2011) 326–341. | MR | Zbl | DOI

N. M. Missirlis and F. Tjaferis, Parallel matrix factorizations on a shared memory mimd computer. In: International Conference on Supercomputing. Vol. 297 of: Lecture Notes in Computer Science. Springer, Berlin-Heidelberg (1987) 926–938.

OpenMP, The OpenMP API specification for parallel programming. [online] http://openmp.org (1997).

PLASMA, [online] http://icl.cs.utk.edu/projectsfiles/plasma/html/htmlbrowsing/dtrsm.c.html (2009).

H. Shioda, K. Konishi and S. Shin, Optimal task scheduling algorithm for parallel processing. In: Proceedings of the 2011 2nd International Congress on Computer Applications and Computational Science. Vol. 145 of: Advances in Intelligent and Soft Computing. Springer, Berlin-Heidelberg (2012) 79–87.

C. F. Van Loan and G. H. Golub, Matrix Computations. Johns Hopkins University Press, Baltimore, MD, USA (1983). | MR | Zbl

T. Wicky, E. Solomonik and T. Hoefler, Communication-avoiding parallel algorithms for solving triangular systems of linear equations. In: 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, Piscataway, NJ, USA (2017) 678–687. | DOI

A. Yarkhan, J. Kurzak, P. Luszczek and J. Dongarra, Porting the plasma numerical library to the openmp standard. Int. J. Parallel Program. 45 (2017) 612–633. | DOI

Cité par Sources :

This paper is an extension of a previous work presented at Codit’17 [4].