A quantitative performance evaluation of SCI memory hierarchies
1st author: | Hexsel, Roberto A. |
---|---|
Format: | Thesis, Book |
Language: | English |
Published: | Edinburgh : University of Edinburgh, Dept. of Computer Science, [1994] |
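Subjects: | Computer storage devices -- Evaluation; Memory hierarchy (Computer science); Multiprocessors -- Evaluation |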
Summary: | Abstract: "The Scalable Coherent Interface (SCI) is an IEEE standard that defines a hardware platform for scalable shared-memory multiprocessors. SCI consists of three parts. The first is a set of physical interfaces that defines board sizes, wiring and network clock rates. The second is a communication protocol based on unidirectional point-to-point links. The third defines a cache coherence protocol based on a full directory that is distributed amongst the cache and memory modules. The cache controllers keep track of the copies of a given datum by maintaining them in a doubly linked list. SCI can scale up to 65520 nodes. This dissertation contains a quantitative performance evaluation of an SCI-connected multiprocessor that assesses both the communication and cache coherence subsystems. The simulator is driven by reference streams generated as a by-product of the execution of 'real' programs. The workload consists of three programs from the SPLASH suite and three parallel loops. The simplest topology supported by SCI is the ring. It was found that, for the hardware and software simulated, the largest efficient ring size is between eight and sixteen nodes and that raw network bandwidth seen by processing elements is limited to about 80 Mbytes/s. This is because the network saturates when link traffic reaches 600-7000 Mbytes/s. These levels of link traffic occur only for two poorly designed programs. The other four programs generate low traffic, and their execution speed is limited by neither the interconnect nor the cache coherence protocol. An analytical model of the multiprocessor is used to assess the cost of some frequently occurring cache coherence protocol operations. In order to build large systems, networks more sophisticated than rings must be used. The performance of SCI meshes and cubes is evaluated for systems of up to 64 nodes.
As with rings, processor throughput is limited by link traffic for the same two poorly designed programs. Cubes are 10-15% faster than meshes for programs that generate high levels of network traffic. Otherwise, the differences are negligible. No significant relationship between cache size and network dimensionality was found." |
Description: | viii, 148 p. ill. 21 cm |
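The abstract describes the coherence directory briefly: copies of a datum are tracked in a doubly linked list spread across the caches. The following is a minimal, single-threaded sketch of that data structure. Its names (`SharingList`, `CacheEntry`, `add_reader`, `roll_out`, `purge`) are hypothetical and not taken from the thesis or the IEEE standard.

It assumes, as SCI is commonly described, that:
* memory keeps only a head pointer;
* a new reader is prepended as the new head;
* an evicting cache unlinks itself ("rolls out");
* only the head may invalidate the other copies before writing.

The real protocol carries out each step as network transactions and must resolve concurrent updates, none of which is modelled here.

```python
# Hypothetical, single-threaded model of an SCI-style distributed sharing list.
# Assumptions (not from this record): new readers are prepended at the head,
# evicting caches unlink themselves ("roll out"), and only the head may purge
# the other copies before a write. Real SCI performs each step as network
# transactions and resolves races; none of that is modelled here.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CacheEntry:
    node: int                   # node whose cache holds a copy of the line
    fwd: Optional[int] = None   # next sharer, towards the tail
    back: Optional[int] = None  # previous sharer, towards the head (None = head)


@dataclass
class SharingList:
    head: Optional[int] = None  # the only per-line state kept at the memory directory
    entries: dict[int, CacheEntry] = field(default_factory=dict)  # per-cache pointers

    def add_reader(self, node: int) -> None:
        """Prepend a new sharer; it becomes the head of the list."""
        if self.head is not None:
            self.entries[self.head].back = node
        self.entries[node] = CacheEntry(node, fwd=self.head)
        self.head = node

    def roll_out(self, node: int) -> None:
        """Unlink a sharer that evicts its copy by patching both neighbours."""
        entry = self.entries.pop(node)
        if entry.back is None:  # the leaving node was the head
            self.head = entry.fwd
        else:
            self.entries[entry.back].fwd = entry.fwd
        if entry.fwd is not None:
            self.entries[entry.fwd].back = entry.back

    def purge(self, writer: int) -> list[int]:
        """The head walks the list invalidating every other copy; returns them in order."""
        if self.head != writer:
            raise ValueError("writer must become the head before purging")
        invalidated = []
        nxt = self.entries[writer].fwd
        while nxt is not None:
            invalidated.append(nxt)
            nxt = self.entries.pop(nxt).fwd
        self.entries[writer].fwd = None
        return invalidated

    def sharers(self) -> list[int]:
        """Sharers from head to tail."""
        out, cur = [], self.head
        while cur is not None:
            out.append(cur)
            cur = self.entries[cur].fwd
        return out


if __name__ == "__main__":
    line = SharingList()
    for n in (3, 7, 5):
        line.add_reader(n)
    print(line.sharers())  # [5, 7, 3]
    line.roll_out(7)
    print(line.sharers())  # [5, 3]
    print(line.purge(5))   # [3]
    print(line.sharers())  # [5]
```

Because memory stores only a head pointer, directory state per line stays constant however many caches share it. This is what lets a full-directory scheme scale to large node counts. The trade-off is that invalidation proceeds sequentially, one list element at a time.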
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV035044980 | ||
003 | DE-604 | ||
005 | 00000000000000.0 | ||
007 | t | ||
008 | 080909s1994 a||| m||| 00||| eng d | ||
035 | |a (OCoLC)36679662 | ||
035 | |a (DE-599)BVBBV035044980 | ||
040 | |a DE-604 |b ger |e rakwb | ||
041 | 0 | |a eng | |
049 | |a DE-91G | ||
088 | |a CST-112-94 | ||
100 | 1 | |a Hexsel, Roberto A. |e Verfasser |4 aut | |
245 | 1 | 0 | |a A quantitative performance evaluation of SCI memory hierarchies |c Roberto A. Hexsel |
264 | 1 | |a Edinburgh |b University of Edinburgh, Dept. of Computer Science |c [1994] | |
300 | |a viii, 148 p. |b ill. |c 21 cm | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
502 | |a Thesis (Ph. D.)--University of Edinburgh, 1994 | ||
520 | 3 | |a Abstract: "The Scalable Coherent Interface (SCI) is an IEEE standard that defines a hardware platform for scalable shared-memory multiprocessors. SCI consists of three parts. The first is a set of physical interfaces that defines board sizes, wiring and network clock rates. The second is a communication protocol based on unidirectional point-to-point links. The third defines a cache coherence protocol based on a full directory that is distributed amongst the cache and memory modules. The cache controllers keep track of the copies of a given datum by maintaining them in a doubly linked list. SCI can scale up to 65520 nodes. This dissertation contains a quantitative performance evaluation of an SCI-connected multiprocessor that assesses both the communication and cache coherence subsystems. The simulator is driven by reference streams generated as a by-product of the execution of 'real' programs. The workload consists of three programs from the SPLASH suite and three parallel loops. | |
520 | 3 | |a The simplest topology supported by SCI is the ring. It was found that, for the hardware and software simulated, the largest efficient ring size is between eight and sixteen nodes and that raw network bandwidth seen by processing elements is limited to about 80 Mbytes/s. This is because the network saturates when link traffic reaches 600-7000 Mbytes/s. These levels of link traffic occur only for two poorly designed programs. The other four programs generate low traffic, and their execution speed is limited by neither the interconnect nor the cache coherence protocol. An analytical model of the multiprocessor is used to assess the cost of some frequently occurring cache coherence protocol operations. In order to build large systems, networks more sophisticated than rings must be used. The performance of SCI meshes and cubes is evaluated for systems of up to 64 nodes. As with rings, processor throughput is limited by link traffic for the same two poorly designed programs. | |
520 | 3 | |a Cubes are 10-15% faster than meshes for programs that generate high levels of network traffic. Otherwise, the differences are negligible. No significant relationship between cache size and network dimensionality was found." | |
650 | 4 | |a Computer storage devices |x Evaluation | |
650 | 4 | |a Memory hierarchy (Computer science) | |
650 | 4 | |a Multiprocessors |x Evaluation | |
655 | 7 | |0 (DE-588)4113937-9 |a Hochschulschrift |2 gnd-content | |
999 | |a oai:aleph.bib-bvb.de:BVB01-016713763 |
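The 520 fields attribute the limit on efficient ring size to link saturation. A back-of-envelope model can show why per-node throughput falls as a unidirectional ring grows. This is not the analytical model used in the dissertation. It is a hypothetical sketch that assumes:
* uniform traffic;
* request and response packets of equal size;
* a response that continues round the ring back to the requester;
* no echoes, flow control or idle symbols.

The link rate and packet size in the demo are illustrative round numbers only.

```python
# Back-of-envelope link-load model for a unidirectional ring of n nodes.
# Hypothetical assumptions (not the dissertation's analytical model):
#   * every node issues `rate` transactions/s to uniformly chosen other nodes;
#   * a request travels d hops to its target and the response continues the
#     remaining n - d hops round the ring back to the requester;
#   * request and response packets are both `packet_bytes` long;
#   * echoes, flow control and idle symbols are ignored.


def mean_request_hops(n_nodes: int) -> float:
    """Mean hops to a uniformly chosen other node: (1 + ... + (n-1)) / (n-1) = n/2."""
    return sum(range(1, n_nodes)) / (n_nodes - 1)


def per_link_load(n_nodes: int, rate: float, packet_bytes: float) -> float:
    """Bytes/s on every link. Request (d hops) plus response (n - d hops) cross
    each of the n links exactly once, so every transaction adds packet_bytes to
    every link, and n nodes each issue `rate` transactions/s."""
    return n_nodes * rate * packet_bytes


def max_rate_per_node(n_nodes: int, link_bytes_per_s: float, packet_bytes: float) -> float:
    """Largest per-node transaction rate before links saturate under this model."""
    return link_bytes_per_s / (n_nodes * packet_bytes)


if __name__ == "__main__":
    LINK = 1e9     # illustrative link rate in bytes/s (hypothetical)
    PACKET = 80.0  # illustrative packet size in bytes (hypothetical)
    for n in (4, 8, 16, 32, 64):
        r = max_rate_per_node(n, LINK, PACKET)
        print(f"{n:3d} nodes: mean request hops {mean_request_hops(n):5.1f}, "
              f"max {r / 1e6:.3f} M transactions/s per node")
```

Under these assumptions, aggregate ring throughput (`n_nodes * rate`) is fixed at `link_bytes_per_s / packet_bytes`. Adding nodes therefore only divides a constant budget among more processors. This matches, qualitatively, the observation that rings stop scaling beyond a modest size.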
Record in search index
_version_ | 1804137981442785280 |
---|---|
adam_txt | |
any_adam_object | |
any_adam_object_boolean | |
author | Hexsel, Roberto A. |
author_facet | Hexsel, Roberto A. |
author_role | aut |
author_sort | Hexsel, Roberto A. |
author_variant | r a h ra rah |
building | Verbundindex |
bvnumber | BV035044980 |
ctrlnum | (OCoLC)36679662 (DE-599)BVBBV035044980 |
format | Thesis Book |
genre | (DE-588)4113937-9 Hochschulschrift gnd-content |
genre_facet | Hochschulschrift |
id | DE-604.BV035044980 |
illustrated | Illustrated |
index_date | 2024-07-02T21:54:26Z |
indexdate | 2024-07-09T21:20:56Z |
institution | BVB |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-016713763 |
oclc_num | 36679662 |
open_access_boolean | |
owner | DE-91G DE-BY-TUM |
owner_facet | DE-91G DE-BY-TUM |
physical | viii, 148 p. ill. 21 cm |
publishDate | 1994 |
publishDateSearch | 1994 |
publishDateSort | 1994 |
publisher | University of Edinburgh, Dept. of Computer Science |
record_format | marc |
spellingShingle | Hexsel, Roberto A. A quantitative performance evaluation of SCI memory hierarchies Computer storage devices Evaluation Memory hierarchy (Computer science) Multiprocessors Evaluation |
subject_GND | (DE-588)4113937-9 |
title | A quantitative performance evaluation of SCI memory hierarchies |
title_auth | A quantitative performance evaluation of SCI memory hierarchies |
title_exact_search | A quantitative performance evaluation of SCI memory hierarchies |
title_exact_search_txtP | A quantitative performance evaluation of SCI memory hierarchies |
title_full | A quantitative performance evaluation of SCI memory hierarchies Roberto A. Hexsel |
title_fullStr | A quantitative performance evaluation of SCI memory hierarchies Roberto A. Hexsel |
title_full_unstemmed | A quantitative performance evaluation of SCI memory hierarchies Roberto A. Hexsel |
title_short | A quantitative performance evaluation of SCI memory hierarchies |
title_sort | a quantitative performance evaluation of sci memory hierarchies |
topic | Computer storage devices Evaluation Memory hierarchy (Computer science) Multiprocessors Evaluation |
topic_facet | Computer storage devices Evaluation Memory hierarchy (Computer science) Multiprocessors Evaluation Hochschulschrift |
work_keys_str_mv | AT hexselrobertoa aquantitativeperformanceevaluationofscimemoryhierarchies |