Compiling data-parallel programs for efficient execution on shared-memory multiprocessors



Bibliographic Details
Author: Chatterjee, Siddhartha (author)
Format: Book
Language: English
Published: Pittsburgh, Pa.: School of Computer Science, 1991
Subjects:
Summary: Abstract: Data parallelism is well suited, on algorithmic, architectural, and linguistic grounds, to serve as a basis for portable parallel programming. However, the fine-grained parallelism characteristic of data-parallel programs makes the efficient implementation of such languages on MIMD machines a challenging task due to the high overheads these machines incur at small grain sizes. We claim that compile-time analysis can be used to reduce these overheads, thereby allowing data-parallel code to run efficiently on MIMD machines. This dissertation reports on the design, implementation, and evaluation of an optimizing compiler for an applicative nested data-parallel language called VCODE.
The target machine is the Encore Multimax, a coherent-cache shared-memory multiprocessor. The source language allows nested aggregate data types, and provides a variety of aggregate operations including elementwise forms, scans, reductions, and permutations. Such features greatly expand the range of applications that can be cast into a data-parallel model. We present a small set of powerful compile-time techniques that reduce the overheads on MIMD machines in several ways: by increasing the grain size of the output program, by reducing synchronization and storage requirements, and by improving locality of reference.
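The aggregate operations named above (elementwise forms, scans, reductions, and permutations over nested aggregates) can be illustrated with a minimal Python sketch. This is not VCODE, and all function names here are illustrative, not taken from the dissertation:

```python
from itertools import accumulate

def elementwise(f, xs, ys):
    # Elementwise form: apply f to corresponding elements of two vectors.
    return [f(x, y) for x, y in zip(xs, ys)]

def plus_scan(xs):
    # Exclusive prefix sum: result[i] is the sum of xs[0..i-1].
    return list(accumulate([0] + xs[:-1]))

def plus_reduce(xs):
    # Reduction: collapse a vector to a single value.
    return sum(xs)

def permute(xs, idx):
    # Permutation: move xs[i] to position idx[i].
    out = [None] * len(xs)
    for i, j in enumerate(idx):
        out[j] = xs[i]
    return out

# Nesting: a nested aggregate is a vector of vectors, and the same
# operations can be applied independently to each inner segment.
segments = [[1, 2, 3], [4, 5]]
per_segment_sums = [plus_reduce(s) for s in segments]
```

Each operation expresses parallelism over every element at once; on a MIMD machine, running each such operation as its own parallel loop is exactly the fine-grained style whose overheads the dissertation's compile-time techniques target.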
The two key ideas behind these optimizations are the symbolic analysis of loop structures, and the hierarchical clustering of the program graph, first by loop structure, and then by loop traversal patterns. This localizes synchronization and work distribution actions to well-defined points in the output code. Loop traversal patterns are then used to identify parallel loops and to eliminate unnecessary intermediate storage. The most significant aspect of the analysis techniques is that they are symbolic in nature and work in the presence of control constructs such as conditionals and recursion. A compiler has been implemented based on these ideas and has been used to compile a large number of benchmarks.
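The payoff of clustering loops by traversal pattern — larger grain size and no unnecessary intermediate storage — can be sketched in Python. This is a hand-written illustration of loop fusion in general, not the dissertation's actual algorithm:

```python
# Unfused: each aggregate operation compiles to its own loop, and each
# loop materializes an intermediate vector before the next one starts.
def unfused(xs):
    tmp1 = [x * 2 for x in xs]    # loop 1: elementwise, writes tmp1
    tmp2 = [t + 1 for t in tmp1]  # loop 2: elementwise, writes tmp2
    return sum(tmp2)              # loop 3: reduction

# Fused: because all three loops traverse xs the same way, they can be
# clustered into one loop with a larger grain and no intermediates.
def fused(xs):
    acc = 0
    for x in xs:
        acc += x * 2 + 1
    return acc
```

The fused form does the same work per element but synchronizes once per loop rather than once per operation, which is the kind of overhead reduction the abstract claims for MIMD targets.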
Description: Also submitted as a dissertation: Carnegie Mellon Univ., Pittsburgh, Pa., 1991
Description: XVIII, 175 pp.; diagrams

