Shared multilevel caches for scalable multiprocessors
Main author: | Goosen, Hendrik A. |
---|---|
Format: | Book |
Language: | English |
Published: | Stanford, Calif., 1991 |
Series: | Stanford University / Computer Science Department: Report STAN CS 1393 |
Subjects: | Cache memory; Multiprocessors |
Summary: | Abstract: "The most difficult problem in realizing a large-scale shared-memory multiprocessor is the design of a cost-effective memory system that has high bandwidth and low latency. The performance of a memory system design based on multilevel shared caches is investigated in this dissertation. By localizing a large fraction of the traffic required to maintain coherence between per-processor caches, a shared cache hierarchy reduces the bandwidth requirements of the global interconnect, and reduces the latency for access to shared data. Experimental measurements from simulations of three large applications confirm that shared caches offer better scalability and improved performance. Specifically, a system using shared caches yields up to a 65% reduction in global traffic compared to a system using per-processor caches. The sensitivity of the performance to various cache design parameters is explored. Programs can also be restructured to exploit the advantages offered by shared cache hierarchies. A program restructuring technique based on the locality in three-dimensional space of a physical system simulation, a direct particle simulation, is described. Simulation and execution time measurements are presented to show the performance impact of the restructuring of the particle simulator. The restructuring of the program reduced the global traffic by an order of magnitude, and the shared cache scheme further reduced the global traffic by a factor of five." |
Description: | Also issued as a dissertation: Stanford, Calif., Univ. |
Description: | 76 pages |
Internal format
MARC
LEADER | 00000nam a2200000 cb4500 | ||
---|---|---|---|
001 | BV008992845 | ||
003 | DE-604 | ||
005 | 00000000000000.0 | ||
007 | t | ||
008 | 940206s1991 m||| 00||| eng d | ||
035 | |a (OCoLC)25497107 | ||
035 | |a (DE-599)BVBBV008992845 | ||
040 | |a DE-604 |b ger |e rakddb | ||
041 | 0 | |a eng | |
049 | |a DE-29T | ||
100 | 1 | |a Goosen, Hendrik A. |e Verfasser |4 aut | |
245 | 1 | 0 | |a Shared multilevel caches for scalable multiprocessors |c by Hendrik A. Goosen |
264 | 1 | |a Stanford, Calif. |c 1991 | |
300 | |a 76 S. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 1 | |a Stanford University / Computer Science Department: Report STAN CS |v 1393 | |
500 | |a Zugl.: Stanford, Calif., Univ., Diss. | ||
520 | 3 | |a Abstract: "The most difficult problem in realizing a large-scale shared-memory multiprocessor is the design of a cost-effective memory system that has high bandwidth and low latency. The performance of a memory system design based on multilevel shared caches is investigated in this dissertation. By localizing a large fraction of the traffic required to maintain coherence between per-processor caches, a shared cache hierarchy reduces the bandwidth requirements of the global interconnect, and reduces the latency for access to shared data. Experimental measurements from simulations of three large applications confirm that shared caches offer better scalability and improved performance | |
520 | 3 | |a Specifically, a system using shared caches yields up to a 65% reduction in global traffic compared to a system using per-processor caches. The sensitivity of the performance to various cache design parameters is explored. Programs can also be restructured to exploit the advantages offered by shared cache hierarchies. A program restructuring technique based on the locality in three-dimensional space of a physical system simulation, a direct particle simulation, is described. Simulation and execution time measurements are presented to show the performance impact of the restructuring of the particle simulator | |
520 | 3 | |a The restructuring of the program reduced the global traffic by an order of magnitude, and the shared cache scheme further reduced the global traffic by a factor of five. | |
650 | 4 | |a Cache memory | |
650 | 4 | |a Multiprocessors | |
650 | 0 | 7 | |a Mehrprozessorsystem |0 (DE-588)4038397-0 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Pufferspeicher |0 (DE-588)4176324-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Skalierung |0 (DE-588)4055202-0 |2 gnd |9 rswk-swf |
655 | 7 | |0 (DE-588)4113937-9 |a Hochschulschrift |2 gnd-content | |
689 | 0 | 0 | |a Pufferspeicher |0 (DE-588)4176324-5 |D s |
689 | 0 | 1 | |a Skalierung |0 (DE-588)4055202-0 |D s |
689 | 0 | 2 | |a Mehrprozessorsystem |0 (DE-588)4038397-0 |D s |
689 | 0 | |5 DE-604 | |
810 | 2 | |a Computer Science Department: Report STAN CS |t Stanford University |v 1393 |w (DE-604)BV008928280 |9 1393 | |
999 | |a oai:aleph.bib-bvb.de:BVB01-005941763 |
Record in the search index
_version_ | 1804123335366279168 |
---|---|
any_adam_object | |
author | Goosen, Hendrik A. |
author_facet | Goosen, Hendrik A. |
author_role | aut |
author_sort | Goosen, Hendrik A. |
author_variant | h a g ha hag |
building | Verbundindex |
bvnumber | BV008992845 |
ctrlnum | (OCoLC)25497107 (DE-599)BVBBV008992845 |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02947nam a2200433 cb4500</leader><controlfield tag="001">BV008992845</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">00000000000000.0</controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">940206s1991 m||| 00||| eng d</controlfield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)25497107</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV008992845</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakddb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-29T</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Goosen, Hendrik A.</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Shared multilevel caches for scalable multiprocessors</subfield><subfield code="c">by Hendrik A. 
Goosen</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Stanford, Calif.</subfield><subfield code="c">1991</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">76 S.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">Stanford University / Computer Science Department: Report STAN CS</subfield><subfield code="v">1393</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Zugl.: Stanford, Calif., Univ., Diss.</subfield></datafield><datafield tag="520" ind1="3" ind2=" "><subfield code="a">Abstract: "The most difficult problem in realizing a large-scale shared-memory multiprocessor is the design of a cost-effective memory system that has high bandwidth and low latency. The performance of a memory system design based on multilevel shared caches is investigated in this dissertation. By localizing a large fraction of the traffic required to maintain coherence between per-processor caches, a shared cache hierarchy reduces the bandwidth requirements of the global interconnect, and reduces the latency for access to shared data. Experimental measurements from simulations of three large applications confirm that shared caches offer better scalability and improved performance</subfield></datafield><datafield tag="520" ind1="3" ind2=" "><subfield code="a">Specifically, a system using shared caches yields up to a 65% reduction in global traffic compared to a system using per-processor caches. The sensitivity of the performance to various cache design parameters is explored. 
Programs can also be restructured to exploit the advantages offered by shared cache hierarchies. A program restructuring technique based on the locality in three-dimensional space of a physical system simulation, a direct particle simulation, is described. Simulation and execution time measurements are presented to show the performance impact of the restructuring of the particle simulator</subfield></datafield><datafield tag="520" ind1="3" ind2=" "><subfield code="a">The restructuring of the program reduced the global traffic by an order of magnitude, and the shared cache scheme further reduced the global traffic by a factor of five.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Cache memory</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Multiprocessors</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Mehrprozessorsystem</subfield><subfield code="0">(DE-588)4038397-0</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Pufferspeicher</subfield><subfield code="0">(DE-588)4176324-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Skalierung</subfield><subfield code="0">(DE-588)4055202-0</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="655" ind1=" " ind2="7"><subfield code="0">(DE-588)4113937-9</subfield><subfield code="a">Hochschulschrift</subfield><subfield code="2">gnd-content</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Pufferspeicher</subfield><subfield code="0">(DE-588)4176324-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Skalierung</subfield><subfield code="0">(DE-588)4055202-0</subfield><subfield 
code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Mehrprozessorsystem</subfield><subfield code="0">(DE-588)4038397-0</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="810" ind1="2" ind2=" "><subfield code="a">Computer Science Department: Report STAN CS</subfield><subfield code="t">Stanford University</subfield><subfield code="v">1393</subfield><subfield code="w">(DE-604)BV008928280</subfield><subfield code="9">1393</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-005941763</subfield></datafield></record></collection> |
genre | (DE-588)4113937-9 Hochschulschrift gnd-content |
genre_facet | Hochschulschrift |
id | DE-604.BV008992845 |
illustrated | Not Illustrated |
indexdate | 2024-07-09T17:28:08Z |
institution | BVB |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-005941763 |
oclc_num | 25497107 |
open_access_boolean | |
owner | DE-29T |
owner_facet | DE-29T |
physical | 76 S. |
publishDate | 1991 |
publishDateSearch | 1991 |
publishDateSort | 1991 |
record_format | marc |
series2 | Stanford University / Computer Science Department: Report STAN CS |
spelling | Goosen, Hendrik A. Verfasser aut Shared multilevel caches for scalable multiprocessors by Hendrik A. Goosen Stanford, Calif. 1991 76 S. txt rdacontent n rdamedia nc rdacarrier Stanford University / Computer Science Department: Report STAN CS 1393 Zugl.: Stanford, Calif., Univ., Diss. Abstract: "The most difficult problem in realizing a large-scale shared-memory multiprocessor is the design of a cost-effective memory system that has high bandwidth and low latency. The performance of a memory system design based on multilevel shared caches is investigated in this dissertation. By localizing a large fraction of the traffic required to maintain coherence between per-processor caches, a shared cache hierarchy reduces the bandwidth requirements of the global interconnect, and reduces the latency for access to shared data. Experimental measurements from simulations of three large applications confirm that shared caches offer better scalability and improved performance Specifically, a system using shared caches yields up to a 65% reduction in global traffic compared to a system using per-processor caches. The sensitivity of the performance to various cache design parameters is explored. Programs can also be restructured to exploit the advantages offered by shared cache hierarchies. A program restructuring technique based on the locality in three-dimensional space of a physical system simulation, a direct particle simulation, is described. Simulation and execution time measurements are presented to show the performance impact of the restructuring of the particle simulator The restructuring of the program reduced the global traffic by an order of magnitude, and the shared cache scheme further reduced the global traffic by a factor of five. 
Cache memory Multiprocessors Mehrprozessorsystem (DE-588)4038397-0 gnd rswk-swf Pufferspeicher (DE-588)4176324-5 gnd rswk-swf Skalierung (DE-588)4055202-0 gnd rswk-swf (DE-588)4113937-9 Hochschulschrift gnd-content Pufferspeicher (DE-588)4176324-5 s Skalierung (DE-588)4055202-0 s Mehrprozessorsystem (DE-588)4038397-0 s DE-604 Computer Science Department: Report STAN CS Stanford University 1393 (DE-604)BV008928280 1393 |
spellingShingle | Goosen, Hendrik A. Shared multilevel caches for scalable multiprocessors Cache memory Multiprocessors Mehrprozessorsystem (DE-588)4038397-0 gnd Pufferspeicher (DE-588)4176324-5 gnd Skalierung (DE-588)4055202-0 gnd |
subject_GND | (DE-588)4038397-0 (DE-588)4176324-5 (DE-588)4055202-0 (DE-588)4113937-9 |
title | Shared multilevel caches for scalable multiprocessors |
title_auth | Shared multilevel caches for scalable multiprocessors |
title_exact_search | Shared multilevel caches for scalable multiprocessors |
title_full | Shared multilevel caches for scalable multiprocessors by Hendrik A. Goosen |
title_fullStr | Shared multilevel caches for scalable multiprocessors by Hendrik A. Goosen |
title_full_unstemmed | Shared multilevel caches for scalable multiprocessors by Hendrik A. Goosen |
title_short | Shared multilevel caches for scalable multiprocessors |
title_sort | shared multilevel caches for scalable multiprocessors |
topic | Cache memory Multiprocessors Mehrprozessorsystem (DE-588)4038397-0 gnd Pufferspeicher (DE-588)4176324-5 gnd Skalierung (DE-588)4055202-0 gnd |
topic_facet | Cache memory Multiprocessors Mehrprozessorsystem Pufferspeicher Skalierung Hochschulschrift |
volume_link | (DE-604)BV008928280 |
work_keys_str_mv | AT goosenhendrika sharedmultilevelcachesforscalablemultiprocessors |