Abstract (in English)
Many complicated parallel computing applications are composed of several stages. As the program proceeds from one stage to another, it may require a different distribution of data over different processor sets. Examples of such applications are the alternating direction implicit (ADI) method [29] and the multidimensional Fast Fourier Transform [40], [44], [45]. Data redistribution is therefore required whenever the current distribution of data over the processors is unsuitable for a certain execution phase [68]. Since the redistribution must be completed at runtime, an efficient algorithm must be adopted.

The interest in runtime redistribution was initially motivated by High Performance Fortran (HPF), a parallel programming language [15], [39]. Programs developed in HPF use the ALIGN, DISTRIBUTE and REDISTRIBUTE directives to specify data redistributions [38]. Array distribution in High Performance Fortran comes in three forms: Block, Cyclic, and Block-Cyclic [30], [31], [63]. The block-cyclic array redistribution is the most regular of the three [17]. A block-cyclic distribution with block size x is referred to as cyclic(x); each block contains x successive elements. In this dissertation, we consider the problem of redistributing data from cyclic(r) on a P-processor grid to cyclic(s) on a Q-processor grid.
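To make the cyclic(x) mapping concrete, the following minimal Python sketch (not code from the dissertation; the function names and the example array length are hypothetical) computes the owner of each global element under a cyclic(x) distribution, using the standard HPF mapping, and tallies the communication matrix induced by a cyclic(r)-on-P to cyclic(s)-on-Q redistribution.

```python
from collections import Counter

def owner(i, x, nprocs):
    """Processor that owns global element i under a cyclic(x) distribution."""
    return (i // x) % nprocs

def communication_matrix(N, r, P, s, Q):
    """Count how many elements each source processor must send to each
    destination processor when going from cyclic(r) on P to cyclic(s) on Q."""
    counts = Counter()
    for i in range(N):
        src = owner(i, r, P)   # owner before the redistribution
        dst = owner(i, s, Q)   # owner after the redistribution
        counts[(src, dst)] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical example: 48 elements, cyclic(3) on 4 processors -> cyclic(4) on 3 processors.
    for (src, dst), n in sorted(communication_matrix(48, 3, 4, 4, 3).items()):
        print(f"P{src} -> Q{dst}: {n} elements")
```

The nonzero entries of this matrix are exactly the messages that a redistribution algorithm has to schedule.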
Two algorithms were developed for the redistribution of messages on a multiprocessor grid: the RCI algorithm is applied to redistribute messages of fixed length, while the RPIPE algorithm is applied to redistribute messages of variable length. In either case, the main goal is to reduce the total redistribution time.

The rest of the dissertation is organized as follows. Chapter 2 presents the main criteria for evaluating the performance of a parallel algorithm and brings to the fore the differences between static and dynamic parallel algorithms. Chapter 3 defines the block-cyclic redistribution; the propositions found in this chapter are the basis of the algorithms for dynamically passing messages between various processor sets. Furthermore, the most important algorithms found in the literature are presented and compared. These algorithms are divided into two categories: Efficient Communication Scheduling Algorithms and Non-Efficient Communication Scheduling Algorithms.

Chapter 4 presents the two algorithms in detail. The RCI (Row-Column Interchange) algorithm for exchanging fixed-size messages on a grid of processors is based on a series of column transformations of an array representing the processors' communication grid, which are proven to lead to a contention-free communication scheme. Two methods of message transfer were developed: the direct method, where messages from each processor are transferred directly to the destination, and the indirect method, which uses relay processors to complete the transmission. As the message size increases, the indirect method performs better. The basic idea behind the RPIPE method is to decompose the redistribution problem into a set of pipeline operations. Each pipeline includes a specified number of tasks which are responsible for the exchange of messages between carefully selected processor pairs. The main properties of the tasks in each pipeline operation are:
1. Each pipeline task handles the transmissions of processor pairs that have a certain transmission cost; a pipeline operation cannot include two or more tasks that handle message transmissions of the same cost.
2. All tasks are scheduled in such a way that all sending processors can initiate send requests to multiple destinations at time intervals that are infinitesimally small, thus reducing the time that processors remain idle.
3. All tasks are scheduled in such a way that receiving processors get one message at a time, thus avoiding congestion.
The redistribution of data is composed of three stages: (a) the creation of the pipeline tasks, (b) the message preparation, and (c) the sending of the messages. These stages are described in Chapter 4.

Chapter 5 describes the development of MPL routines for performing the message passing. The routines were written in MODSIM and were implemented on a simulated IBM SP2 system. A variety of simulation results for different redistribution parameters is also presented. Chapter 6 concludes the dissertation and suggests directions for future research.
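As a generic illustration of the scheduling goal stated in property 3 above, and not of the RCI or RPIPE algorithms themselves, the sketch below builds a step-by-step schedule in which every receiving processor is the target of at most one message per step. It assumes, for simplicity, that the source and destination grids have the same number of processors, so each step is a rotation and hence a permutation; the function name is hypothetical.

```python
def contention_free_schedule(nprocs):
    """Return a list of steps; each step is a list of (sender, receiver) pairs
    in which every sender appears once and every receiver appears once."""
    schedule = []
    for step in range(nprocs):
        # Rotate the destination index by `step`: a permutation, hence no
        # receiver is hit by more than one message in the same step.
        schedule.append([(p, (p + step) % nprocs) for p in range(nprocs)])
    return schedule

if __name__ == "__main__":
    for k, step in enumerate(contention_free_schedule(4)):
        print(f"step {k}: {step}")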