2.5 A QUANTITATIVE LOOK AT PARALLEL COMPUTATION

The two main reasons for implementing a parallel program are to obtain better performance and to solve larger problems. Performance can be both modeled and measured, so in this section we take another look at parallel computations by giving some simple analytical models that illustrate some of the factors that influence the performance of a parallel program. Consider a computation consisting of three parts: a setup section, a computation section, and a finalization section. The total running time of this program on one PE is then given as the sum of the times for the three parts:

$$T_{\text{total}}(1) = T_{\text{setup}} + T_{\text{compute}}(1) + T_{\text{finalization}} \tag{2.1}$$

What happens when we run this computation on a parallel computer with multiple PEs? Suppose that the setup and finalization sections cannot be carried out concurrently with any other activities, but that the computation section could be divided into tasks that would run independently on as many PEs as are available, with the same total number of computation steps as in the original computation. The time for the full computation on P PEs can therefore be given by

$$T_{\text{total}}(P) = T_{\text{setup}} + \frac{T_{\text{compute}}(1)}{P} + T_{\text{finalization}} \tag{2.2}$$

Of course, Eq. 2.2 describes a very idealized situation. However, the idea that computations have a serial part (for which additional PEs are useless) and a parallelizable part (for which more PEs decrease the running time) is realistic. Thus, this simple model captures an important relationship.
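As a rough illustration of this idealized model, the following Python sketch evaluates the running time for a few PE counts. The function name and the sample timings (2 s setup, 96 s computation, 2 s finalization) are invented for the example and do not come from the text.

```python
def total_time(t_setup: float, t_compute: float, t_finalization: float, p: int) -> float:
    """Idealized running time on p PEs.

    Only the computation section is divided among the PEs; the setup and
    finalization sections are assumed to run serially, as in the model above.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    return t_setup + t_compute / p + t_finalization


if __name__ == "__main__":
    # Hypothetical timings in seconds, chosen only for illustration.
    for p in (1, 2, 4, 8, 16):
        print(p, total_time(2.0, 96.0, 2.0, p))
```

Even in this idealized setting, the 4 s of serial work stays constant while the parallelizable part shrinks, so each doubling of PEs buys less than the previous one.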

An important measure of how much additional PEs help is the relative speedup S, which describes how much faster a problem runs in a way that normalizes away the actual running time:

$$S(P) = \frac{T_{\text{total}}(1)}{T_{\text{total}}(P)} \tag{2.3}$$

A related measure is the efficiency E, which is the speedup normalized by the number of PEs:

$$E(P) = \frac{S(P)}{P} \tag{2.4}$$

$$E(P) = \frac{T_{\text{total}}(1)}{P \, T_{\text{total}}(P)} \tag{2.5}$$
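Both measures are easy to compute from measured running times. The sketch below assumes the times on one PE and on P PEs are already known; the function and parameter names are chosen here for illustration only.

```python
def speedup(t1: float, tp: float) -> float:
    """Relative speedup: running time on one PE divided by running time on P PEs."""
    if t1 <= 0 or tp <= 0:
        raise ValueError("running times must be positive")
    return t1 / tp


def efficiency(t1: float, tp: float, p: int) -> float:
    """Efficiency: the speedup normalized by the number of PEs."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return speedup(t1, tp) / p


if __name__ == "__main__":
    # Hypothetical measurements: 100 s on one PE, 28 s on four PEs.
    print(speedup(100.0, 28.0))
    print(efficiency(100.0, 28.0, 4))
```

An efficiency of 1.0 corresponds to every PE being fully useful; values below 1.0 indicate how much of the added hardware is effectively lost to serial work or overhead.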

Ideally, we would want the speedup to be equal to P, the number of PEs. This is sometimes called perfect linear speedup. Unfortunately, this is an ideal that can rarely be achieved because times for setup and finalization are not improved by adding more PEs, limiting the speedup. The terms that cannot be run concurrently are called the serial terms. Their running times represent some fraction of the total, called the serial fraction, denoted γ:

$$\gamma = \frac{T_{\text{setup}} + T_{\text{finalization}}}{T_{\text{total}}(1)} \tag{2.6}$$

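The fraction of time spent in the parallelizable part of the program is then (1 − γ), where γ is the serial fraction. We can thus rewrite the expression for total computation time with P PEs as

$$T_{\text{total}}(P) = \gamma \, T_{\text{total}}(1) + \frac{(1 - \gamma) \, T_{\text{total}}(1)}{P} \tag{2.7}$$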

Now, rewriting S in terms of the new expression for $T_{\text{total}}(P)$, we obtain the famous Amdahl's law:

$$S(P) = \frac{T_{\text{total}}(1)}{\left(\gamma + \frac{1 - \gamma}{P}\right) T_{\text{total}}(1)} \tag{2.8}$$

$$S(P) = \frac{1}{\gamma + \frac{1 - \gamma}{P}} \tag{2.9}$$
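Amdahl's law, S(P) = 1 / (γ + (1 − γ)/P) with γ the serial fraction, is simple enough to evaluate directly. The following is a minimal sketch; the function name and the sample serial fraction of 0.1 are assumptions made for the example.

```python
def amdahl_speedup(gamma: float, p: int) -> float:
    """Speedup predicted by Amdahl's law for serial fraction gamma on p PEs."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie between 0 and 1")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (gamma + (1.0 - gamma) / p)


if __name__ == "__main__":
    # Illustrative serial fraction; real programs must measure or estimate it.
    gamma = 0.1
    for p in (1, 10, 100, 1000):
        print(p, round(amdahl_speedup(gamma, p), 2))
```

With γ = 0 the function returns P (perfect linear speedup), and with γ = 1 it returns 1 regardless of P, which matches the two extremes of the model.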

Thus, in an ideal parallel algorithm with no overhead in the parallel part, the speedup should follow Eq. 2.9. What happens to the speedup if we take our ideal parallel algorithm and use a very large number of processors? Taking the limit as P goes to infinity in our expression for S yields

$$S = \frac{1}{\gamma} \tag{2.10}$$
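As a worked illustration of this limit, take a serial fraction of γ = 0.1 (a value chosen here only for the example) and evaluate S(P) = 1 / (γ + (1 − γ)/P) for growing P:

```latex
\begin{align*}
S(10)   &= \frac{1}{0.1 + 0.9/10}   = \frac{1}{0.19}   \approx 5.26 \\
S(100)  &= \frac{1}{0.1 + 0.9/100}  = \frac{1}{0.109}  \approx 9.17 \\
S(1000) &= \frac{1}{0.1 + 0.9/1000} = \frac{1}{0.1009} \approx 9.91 \\
\lim_{P \to \infty} S(P) &= \frac{1}{0.1} = 10
\end{align*}
```

Going from 100 to 1000 processors raises the speedup by less than one, because the remaining running time is almost entirely the serial part.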

Eq. 2.10 thus gives an upper bound on the speedup obtainable in an algorithm whose serial part represents a fraction γ of the total computation.

These concepts are vital to the parallel algorithm designer. In designing a parallel algorithm, it is important to understand the value of the serial fraction so that realistic expectations can be set for performance. It may not make sense to implement a complex, arbitrarily scalable parallel algorithm if 10% or more of the algorithm is serial, and 10% is fairly common.

Of course, Amdahl's law is based on assumptions that may or may not be true in practice. In real life, a number of factors may make the actual running time longer than this formula implies. For example, creating additional parallel tasks may increase overhead and the chances of contention for shared resources. On the other hand, if the original serial computation is limited by resources other than the availability of CPU cycles, the actual performance could be much better than Amdahl's law would predict. For example, a large parallel machine may allow bigger problems to be held in memory, thus reducing virtual memory paging, or multiple processors each with its own cache may allow much more of the problem to remain in the cache.

Amdahl's law also rests on the assumption that for any given input, the parallel and serial implementations perform exactly the same number of computational steps. If the serial algorithm being used in the formula is not the best possible algorithm for the problem, then a clever parallel algorithm that structures the computation differently can reduce the total number of computational steps.

It has also been observed [Gus88] that the exercise underlying Amdahl's law, namely running exactly the same problem with varying numbers of processors, is artificial in some circumstances. If, say, the parallel application were a weather simulation, then when new processors were added, one would most likely increase the problem size by adding more details to the model while keeping the total execution time constant. If this is the case, then Amdahl's law, or fixed-size speedup, gives a pessimistic view of the benefits of additional processors.

To see this, we can reformulate the equation to give the speedup in terms of performance on a P-processor system. Earlier, in Eq. 2.2, we obtained the execution time for P processors, $T_{\text{total}}(P)$, from the execution time of the serial terms and the execution time of the parallelizable part when executed on one processor. Here, we do the opposite and obtain $T_{\text{total}}(1)$ from the serial and parallel terms when executed on P processors:

$$T_{\text{total}}(1) = T_{\text{setup}} + P \, T_{\text{compute}}(P) + T_{\text{finalization}} \tag{2.11}$$
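This reversed view can be sketched in Python. The function below reconstructs the one-processor time from times measured on P processors. It assumes that the serial sections take the same time on any number of processors and that the parallel section would take P times as long on a single processor. The function name and sample numbers are hypothetical.

```python
def scaled_speedup(t_setup: float, t_compute_p: float, t_finalization: float, p: int) -> float:
    """Speedup when the one-processor time is derived from times on p processors."""
    if p < 1:
        raise ValueError("p must be at least 1")
    # Time actually observed on p processors.
    t_total_p = t_setup + t_compute_p + t_finalization
    if t_total_p <= 0:
        raise ValueError("total running time must be positive")
    # Time the same (scaled-up) problem would need on one processor.
    t_total_1 = t_setup + p * t_compute_p + t_finalization
    return t_total_1 / t_total_p


if __name__ == "__main__":
    # Hypothetical run on 10 processors: 10 s of serial work, 90 s of parallel work.
    print(scaled_speedup(5.0, 90.0, 5.0, 10))
```

For these sample numbers the speedup is about 9.1, whereas Amdahl's law with a serial fraction of 0.1 on 10 processors gives about 5.26. The two serial fractions are not directly comparable, since one is measured on 10 processors and the other on one, but the contrast shows why the fixed-size view can look pessimistic.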
