Cray T3E™ Fortran Optimization Guide - 004-2518-002
The Parallel Virtual Machine (PVM) message-passing library passes messages between PEs to distribute data and to perform other functions necessary for running programs. (The network version of PVM, which enables message passing between computer systems, is not described in this publication.) For background information on PVM, see Section 1.1.1.
The differences between PVM on a CRAY T3D system and PVM on a CRAY T3E system are few. The major difference is that the channels feature is not implemented on the CRAY T3E system. Optimizations that worked on the CRAY T3D system should, however, still work on the CRAY T3E system.
This chapter describes the following methods of speeding up your PVM program:
Saving extra transfers by setting the size of a message properly (see Section 2.1).
Allocating the most efficient send buffers, depending on the nature of your message (see Section 2.2).
Realizing the performance advantage of 32-bit data (see Section 2.3).
Using routines that are optimized for sending and receiving stride-1 data (see Section 2.4).
Making quick improvements by mixing optimized send and receive routines (see Section 2.5).
Avoiding performance pitfalls when initializing and packing data (see Section 2.6).
Accomplishing work while you wait for messages (see Section 2.7).
Minimizing wait time by avoiding barriers (see Section 2.8).
Using broadcast rather than multicast when sending data to multiple PEs (see Section 2.9).
Minimizing synchronization time and maximizing work time when receiving data (see Section 2.10).
Using the reduction functions to execute an operation on multiple PEs (see Section 2.11).
Distributing data from one PE to multiple PEs and gathering data from multiple PEs to a single PE (see Section 2.12).
Setting the size of a message properly can save you extra transfers and, consequently, message-passing overhead. You can control the size of a message by setting the PVM_DATA_MAX environment variable. The default size for the first message sent is 4,096 bytes, or 512 64-bit words, which should be large enough for most messages. If the data in a message is larger than the value of PVM_DATA_MAX, however, the data will be divided up into parts, and the parts will be sent in separate messages until all of it has been delivered.
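The arithmetic behind that splitting can be sketched in a few lines of Fortran. This is only an illustration of the description above, not PVM's internal logic: the function name NPARTS is hypothetical, and the sketch assumes that each part carries at most PVM_DATA_MAX bytes.

```fortran
PROGRAM MESSAGE_PARTS
  IMPLICIT NONE
  INTEGER, PARAMETER :: DATA_MAX = 4096   ! default PVM_DATA_MAX, in bytes
  INTEGER, PARAMETER :: WORD_BYTES = 8    ! one 64-bit word

  ! 512 words fill the default size exactly; one more word needs a second part.
  PRINT *, NPARTS(512 * WORD_BYTES, DATA_MAX)     ! 1 part
  PRINT *, NPARTS(513 * WORD_BYTES, DATA_MAX)     ! 2 parts
  PRINT *, NPARTS(10000 * WORD_BYTES, DATA_MAX)   ! 20 parts

CONTAINS

  ! Hypothetical helper: number of parts needed for NBYTES of data,
  ! assuming each part holds at most MAXBYTES (ceiling division).
  INTEGER FUNCTION NPARTS(NBYTES, MAXBYTES)
    INTEGER, INTENT(IN) :: NBYTES, MAXBYTES
    NPARTS = (NBYTES + MAXBYTES - 1) / MAXBYTES
  END FUNCTION NPARTS

END PROGRAM MESSAGE_PARTS
```

A message that fits within the limit needs a single part; anything larger incurs the extra transfers that this section describes.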
To find the current value of PVM_DATA_MAX within your program, use the PVMFGETOPT(3) routine, as follows. The variable MAX will hold the maximum message size, in bytes.
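      CALL PVMFGETOPT(PVM_DATA_MAX, MAX)
To change the value of PVM_DATA_MAX, set the environment variable before you run your program, as in the following example:
      % setenv PVM_DATA_MAX 8192
      % ./a.out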
Increasing the value of PVM_DATA_MAX is not always the best solution. If your program has one or two large transfers but many smaller ones, you may not want to increase the size of all messages, and doing so may not help your overall performance. A larger PVM_DATA_MAX takes memory away from the application, and a large message is not always transferred quickly, especially when it is broadcast to multiple PEs.
Breaking the large messages up into smaller messages may be faster in some cases. Whether this proves to be faster in your program depends upon the application. You may have to time the program to find out. For information on timing your code, see Section 1.3.
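One way to experiment with smaller messages is to send a large array in fixed-size pieces. The following sketch computes only the piece boundaries; the pack-and-send step is left as a comment because it depends on your program. The array size N and the name CHUNK_WORDS are illustrative assumptions, not values taken from this guide.

```fortran
PROGRAM CHUNKED_SEND
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 2000           ! total 64-bit words to send (illustrative)
  INTEGER, PARAMETER :: CHUNK_WORDS = 512  ! words per message: 4,096 bytes / 8
  INTEGER :: FIRST, LAST, NMSG

  NMSG = 0
  DO FIRST = 1, N, CHUNK_WORDS
     ! The final piece may be shorter than CHUNK_WORDS.
     LAST = MIN(FIRST + CHUNK_WORDS - 1, N)
     ! Pack and send elements FIRST through LAST here, using the
     ! PVM pack and send routines that your program already calls.
     NMSG = NMSG + 1
     PRINT *, 'message', NMSG, ': words', FIRST, 'to', LAST
  END DO
END PROGRAM CHUNKED_SEND
```

With these values the loop produces four messages, the last covering words 1537 to 2000. Time both the chunked version and the single large transfer before you settle on one.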
PVM does not handle large amounts of data in the same way as small amounts. For large transfers (greater than the value of PVM_DATA_MAX), the message contains the first chunk of data and the address of the data block on the sending PE. After the receiving PE unpacks the message, it uses remote loads to get the remainder of the data.
Often, remote stores used for short messages can occur at the same time as computation on the receiving PE. But with large messages, remote loads require the receiving PE to wait until the loads complete. If the same data is being sent to several PEs, those PEs may all try to do remote loads at the same time, creating a slowdown as they share the limited memory bandwidth.