Explicit parallel block Cholesky algorithms on the CRAY APP
Main author: | Nool, Margreet |
---|---|
Format: | Book |
Language: | English |
Published: | Amsterdam 1994 |
Series: | Centrum voor Wiskunde en Informatica <Amsterdam> / Afdeling Numerieke Wiskunde: Report NM 1994,25 |
Subjects: | Parallel processing (Electronic computers) |
Summary: | Abstract: "In this paper we consider the CRAY APP, the Attached Parallel Processor of the Cray S-MP, which consists of seven buses with each bus supporting up to 12 processing elements. Processing elements on different buses can communicate simultaneously with the shared main memory, but processing elements sharing the same bus can not, since only one processing element per bus can access memory at a given time. Applications with a high level of data reuse, or, with a high compute intensity, and applications being highly parallel are very suitable to run on the APP. An example of such an algorithm is matrix-matrix multiplication. We illustrate how the data traffic's restriction influences the performance and we discuss the scalability of the CRAY APP. Furthermore, two different algorithms for Cholesky factorization are discussed: a block left-looking algorithm and a block right-looking algorithm. The maximum achievable speed on the CRAY APP is mainly determined by the performance of the matrix-matrix multiplication. Parallelism is applied explicitly over the blocks, which makes it possible to concatenate different block operations in cache. The results obtained on CWI's APP (a machine having twenty-eight processing elements) indicate how block algorithms can be parallelized on machines with hundreds or thousands of processors." |
Description: | 22 pp. |
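
The summary contrasts a block left-looking and a block right-looking Cholesky algorithm. This record does not reproduce the report's CRAY APP code, so the following is only a minimal serial sketch in plain Python of the two textbook variants. The function names (`factor_panel`, `cholesky_left_looking`, `cholesky_right_looking`, `lower`) and the block-size parameter `nb` are assumptions made for illustration; the explicit parallelism over blocks and the cache handling described in the summary are not modelled.

```python
import math

def factor_panel(a, lo, hi):
    """Factor block column lo..hi-1 in place (lower triangle only).

    Assumes contributions of all columns < lo were already subtracted.
    """
    n = len(a)
    for j in range(lo, hi):
        a[j][j] = math.sqrt(a[j][j] - sum(a[j][k] ** 2 for k in range(lo, j)))
        for i in range(j + 1, n):
            s = sum(a[i][k] * a[j][k] for k in range(lo, j))
            a[i][j] = (a[i][j] - s) / a[j][j]

def lower(a):
    """Return the lower triangle of a, with zeros above the diagonal."""
    n = len(a)
    return [[a[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]

def cholesky_right_looking(a, nb):
    """Right-looking: factor a panel, then update everything to its right."""
    a = [row[:] for row in a]  # work on a copy
    n = len(a)
    for lo in range(0, n, nb):
        hi = min(lo + nb, n)
        factor_panel(a, lo, hi)
        # Trailing update: this is the matrix-matrix multiplication step.
        for i in range(hi, n):
            for j in range(hi, i + 1):
                a[i][j] -= sum(a[i][k] * a[j][k] for k in range(lo, hi))
    return lower(a)

def cholesky_left_looking(a, nb):
    """Left-looking: update the current panel from all columns to its left,
    then factor it."""
    a = [row[:] for row in a]  # work on a copy
    n = len(a)
    for lo in range(0, n, nb):
        hi = min(lo + nb, n)
        # Deferred update, again a matrix-matrix multiplication.
        for i in range(lo, n):
            for j in range(lo, min(i + 1, hi)):
                a[i][j] -= sum(a[i][k] * a[j][k] for k in range(lo))
        factor_panel(a, lo, hi)
    return lower(a)
```

Both functions return the same lower-triangular factor; they differ only in when the matrix-matrix update is applied, which is the operation the summary identifies as determining the achievable speed.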
Internal format
MARC
LEADER | 00000nam a2200000 cb4500 | ||
---|---|---|---|
001 | BV010186888 | ||
003 | DE-604 | ||
005 | 00000000000000.0 | ||
007 | t | ||
008 | 950517s1994 |||| 00||| engod | ||
035 | |a (OCoLC)32904763 | ||
035 | |a (DE-599)BVBBV010186888 | ||
040 | |a DE-604 |b ger |e rakddb | ||
041 | 0 | |a eng | |
049 | |a DE-91G | ||
100 | 1 | |a Nool, Margreet |e Verfasser |4 aut | |
245 | 1 | 0 | |a Explicit parallel block Cholesky algorithms on the CRAY APP |c M. Nool |
264 | 1 | |a Amsterdam |c 1994 | |
300 | |a 22 S. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 1 | |a Centrum voor Wiskunde en Informatica <Amsterdam> / Afdeling Numerieke Wiskunde: Report NM |v 1994,25 | |
520 | 3 | |a Abstract: "In this paper we consider the CRAY APP, the Attached Parallel Processor of the Cray S-MP, which consists of seven buses with each bus supporting up to 12 processing elements. Processing elements on different buses can communicate simultaneously with the shared main memory, but processing elements sharing the same bus can not, since only one processing element per bus can access memory at a given time. Applications with a high level of data reuse, or, with a high compute intensity, and applications being highly parallel are very suitable to run on the APP. An example of such an algorithm is matrix-matrix multiplication. We illustrate how the data traffic's restriction influences the performance and we discuss the scalability of the CRAY APP. Furthermore, two different algorithms for Cholesky factorization are discussed: a block left-looking algorithm and a block right-looking algorithm. The maximum achievable speed on the CRAY APP is mainly determined by the performance of the matrix-matrix multiplication. Parallelism is applied explicitly over the blocks, which makes it possible to concatenate different block operations in cache. The results obtained on CWI's APP (a machine having twenty- eight processing elements) indicate how block algorithms can be parallelized on machines with hundreds or thousands of processors." | |
650 | 4 | |a Parallel processing (Electronic computers) | |
810 | 2 | |a Afdeling Numerieke Wiskunde: Report NM |t Centrum voor Wiskunde en Informatica <Amsterdam> |v 1994,25 |w (DE-604)BV010177152 |9 1994,25 | |
999 | |a oai:aleph.bib-bvb.de:BVB01-006767596 |
Record in the search index
_version_ | 1804124587784404992 |
---|---|
any_adam_object | |
author | Nool, Margreet |
author_facet | Nool, Margreet |
author_role | aut |
author_sort | Nool, Margreet |
author_variant | m n mn |
building | Verbundindex |
bvnumber | BV010186888 |
ctrlnum | (OCoLC)32904763 (DE-599)BVBBV010186888 |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02330nam a2200289 cb4500</leader><controlfield tag="001">BV010186888</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">00000000000000.0</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">950517s1994 |||| 00||| engod</controlfield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)32904763</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV010186888</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakddb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-91G</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Nool, Margreet</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Explicit parallel block Cholesky algorithms on the CRAY APP</subfield><subfield code="c">M. Nool</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Amsterdam</subfield><subfield code="c">1994</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">22 S.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">Centrum voor Wiskunde en Informatica <Amsterdam> / Afdeling Numerieke Wiskunde: Report NM</subfield><subfield code="v">1994,25</subfield></datafield><datafield tag="520" ind1="3" ind2=" "><subfield code="a">Abstract: "In this paper we consider the CRAY APP, the Attached Parallel Processor of the Cray S-MP, which consists of seven buses with each bus supporting up to 12 processing elements. Processing elements on different buses can communicate simultaneously with the shared main memory, but processing elements sharing the same bus can not, since only one processing element per bus can access memory at a given time. Applications with a high level of data reuse, or, with a high compute intensity, and applications being highly parallel are very suitable to run on the APP. An example of such an algorithm is matrix-matrix multiplication. We illustrate how the data traffic's restriction influences the performance and we discuss the scalability of the CRAY APP. Furthermore, two different algorithms for Cholesky factorization are discussed: a block left-looking algorithm and a block right-looking algorithm. The maximum achievable speed on the CRAY APP is mainly determined by the performance of the matrix-matrix multiplication. Parallelism is applied explicitly over the blocks, which makes it possible to concatenate different block operations in cache. The results obtained on CWI's APP (a machine having twenty- eight processing elements) indicate how block algorithms can be parallelized on machines with hundreds or thousands of processors."</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Parallel processing (Electronic computers)</subfield></datafield><datafield tag="810" ind1="2" ind2=" "><subfield code="a">Afdeling Numerieke Wiskunde: Report NM</subfield><subfield code="t">Centrum voor Wiskunde en Informatica <Amsterdam></subfield><subfield code="v">1994,25</subfield><subfield code="w">(DE-604)BV010177152</subfield><subfield code="9">1994,25</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-006767596</subfield></datafield></record></collection> |
id | DE-604.BV010186888 |
illustrated | Not Illustrated |
indexdate | 2024-07-09T17:48:02Z |
institution | BVB |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-006767596 |
oclc_num | 32904763 |
open_access_boolean | |
owner | DE-91G DE-BY-TUM |
owner_facet | DE-91G DE-BY-TUM |
physical | 22 S. |
publishDate | 1994 |
publishDateSearch | 1994 |
publishDateSort | 1994 |
record_format | marc |
series2 | Centrum voor Wiskunde en Informatica <Amsterdam> / Afdeling Numerieke Wiskunde: Report NM |
spelling | Nool, Margreet Verfasser aut Explicit parallel block Cholesky algorithms on the CRAY APP M. Nool Amsterdam 1994 22 S. txt rdacontent n rdamedia nc rdacarrier Centrum voor Wiskunde en Informatica <Amsterdam> / Afdeling Numerieke Wiskunde: Report NM 1994,25 Abstract: "In this paper we consider the CRAY APP, the Attached Parallel Processor of the Cray S-MP, which consists of seven buses with each bus supporting up to 12 processing elements. Processing elements on different buses can communicate simultaneously with the shared main memory, but processing elements sharing the same bus can not, since only one processing element per bus can access memory at a given time. Applications with a high level of data reuse, or, with a high compute intensity, and applications being highly parallel are very suitable to run on the APP. An example of such an algorithm is matrix-matrix multiplication. We illustrate how the data traffic's restriction influences the performance and we discuss the scalability of the CRAY APP. Furthermore, two different algorithms for Cholesky factorization are discussed: a block left-looking algorithm and a block right-looking algorithm. The maximum achievable speed on the CRAY APP is mainly determined by the performance of the matrix-matrix multiplication. Parallelism is applied explicitly over the blocks, which makes it possible to concatenate different block operations in cache. The results obtained on CWI's APP (a machine having twenty- eight processing elements) indicate how block algorithms can be parallelized on machines with hundreds or thousands of processors." Parallel processing (Electronic computers) Afdeling Numerieke Wiskunde: Report NM Centrum voor Wiskunde en Informatica <Amsterdam> 1994,25 (DE-604)BV010177152 1994,25 |
spellingShingle | Nool, Margreet Explicit parallel block Cholesky algorithms on the CRAY APP Parallel processing (Electronic computers) |
title | Explicit parallel block Cholesky algorithms on the CRAY APP |
title_auth | Explicit parallel block Cholesky algorithms on the CRAY APP |
title_exact_search | Explicit parallel block Cholesky algorithms on the CRAY APP |
title_full | Explicit parallel block Cholesky algorithms on the CRAY APP M. Nool |
title_fullStr | Explicit parallel block Cholesky algorithms on the CRAY APP M. Nool |
title_full_unstemmed | Explicit parallel block Cholesky algorithms on the CRAY APP M. Nool |
title_short | Explicit parallel block Cholesky algorithms on the CRAY APP |
title_sort | explicit parallel block cholesky algorithms on the cray app |
topic | Parallel processing (Electronic computers) |
topic_facet | Parallel processing (Electronic computers) |
volume_link | (DE-604)BV010177152 |
work_keys_str_mv | AT noolmargreet explicitparallelblockcholeskyalgorithmsonthecrayapp |