Introduction to high performance computing for scientists and engineers
Main authors: | Hager, Georg; Wellein, Gerhard |
---|---|
Format: | Book |
Language: | English |
Published: |
Boca Raton [u.a.]
CRC Press
2011
|
Series: | Chapman & Hall/CRC computational science series
7 |
Subjects: | |
Online access: | Table of contents |
Description: | XXV, 330 pages: illustrations, diagrams |
ISBN: | 9781439811924 143981192X |
Internal format
MARC
LEADER | 00000nam a2200000 cb4500 | ||
---|---|---|---|
001 | BV036621218 | ||
003 | DE-604 | ||
005 | 20170519 | ||
007 | t | ||
008 | 100818s2011 ad|| |||| 00||| eng d | ||
015 | |a GBB046716 |2 dnb | ||
020 | |a 9781439811924 |c hbk |9 978-1-4398-1192-4 | ||
020 | |a 143981192X |9 1-4398-1192-X | ||
035 | |a (OCoLC)699709647 | ||
035 | |a (DE-599)BVBBV036621218 | ||
040 | |a DE-604 |b ger |e rakwb | ||
041 | 0 | |a eng | |
049 | |a DE-91G |a DE-20 |a DE-29T |a DE-83 |a DE-92 |a DE-703 |a DE-355 |a DE-526 |a DE-706 |a DE-863 |a DE-634 |a DE-19 | ||
084 | |a ST 150 |0 (DE-625)143594: |2 rvk | ||
084 | |a ST 151 |0 (DE-625)143595: |2 rvk | ||
084 | |a ST 620 |0 (DE-625)143684: |2 rvk | ||
084 | |a DAT 200f |2 stub | ||
084 | |a DAT 516f |2 stub | ||
100 | 1 | |a Hager, Georg |d 1970- |e Verfasser |0 (DE-588)13194522X |4 aut | |
245 | 1 | 0 | |a Introduction to high performance computing for scientists and engineers |c Georg Hager ; Gerhard Wellein |
264 | 1 | |a Boca Raton [u.a.] |b CRC Press |c 2011 | |
300 | |a XXV, 330 S. |b Ill., graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 1 | |a Chapman & Hall/CRC computational science series |v 7 | |
650 | 4 | |a High performance computing | |
650 | 4 | |a Science / Data processing | |
650 | 4 | |a Engineering / Data processing | |
650 | 4 | |a Datenverarbeitung | |
650 | 4 | |a Ingenieurwissenschaften | |
650 | 4 | |a Naturwissenschaft | |
650 | 0 | 7 | |a Naturwissenschaften |0 (DE-588)4041421-8 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Hochleistungsrechnen |0 (DE-588)4532701-4 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Ingenieurwissenschaften |0 (DE-588)4137304-2 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Hochleistungsrechnen |0 (DE-588)4532701-4 |D s |
689 | 0 | 1 | |a Naturwissenschaften |0 (DE-588)4041421-8 |D s |
689 | 0 | 2 | |a Ingenieurwissenschaften |0 (DE-588)4137304-2 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Wellein, Gerhard |d 1970- |e Verfasser |0 (DE-588)120334836 |4 aut | |
830 | 0 | |a Chapman & Hall/CRC computational science series |v 7 |w (DE-604)BV039102824 |9 7 | |
856 | 4 | 2 | |m HBZ Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020541244&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-020541244 |
Record in the search index
DE-BY-863_location | 1340 |
---|---|
DE-BY-FWS_call_number | 1340/ST 150 H144 |
DE-BY-FWS_katkey | 428652 |
DE-BY-FWS_media_number | 083101189576 |
_version_ | 1828435635215532033 |
adam_text | Title: Introduction to high performance computing for scientists and engineers
Author: Hager, Georg
Year: 2011
Contents
Foreword xiii
Preface xv
About the authors xxi
List of acronyms and abbreviations xxiii
1 Modern processors 1
1.1 Stored-program computer architecture................ 1
1.2 General-purpose cache-based microprocessor architecture ..... 2
1.2.1 Performance metrics and benchmarks............ 3
1.2.2 Transistors galore: Moore's Law............... 7
1.2.3 Pipelining........................... 9
1.2.4 Superscalarity......................... 13
1.2.5 SIMD............................. 14
1.3 Memory hierarchies ......................... 15
1.3.1 Cache............................. 15
1.3.2 Cache mapping........................ 18
1.3.3 Prefetch............................ 20
1.4 Multicore processors......................... 23
1.5 Multithreaded processors....................... 26
1.6 Vector processors........................... 28
1.6.1 Design principles....................... 29
1.6.2 Maximum performance estimates.............. 31
1.6.3 Programming for vector architectures............ 32
2 Basic optimization techniques for serial code 37
2.1 Scalar profiling............................ 37
2.1.1 Function- and line-based runtime profiling.......... 38
2.1.2 Hardware performance counters............... 41
2.1.3 Manual instrumentation ................... 45
2.2 Common sense optimizations .................... 45
2.2.1 Do less work!......................... 45
2.2.2 Avoid expensive operations!................. 46
2.2.3 Shrink the working set!.................... 47
2.3 Simple measures, large impact.................... 47
2.3.1 Elimination of common subexpressions........... 47
2.3.2 Avoiding branches...................... 48
2.3.3 Using SIMD instruction sets................. 49
2.4 The role of compilers......................... 51
2.4.1 General optimization options................. 52
2.4.2 Inlining............................ 52
2.4.3 Aliasing............................ 53
2.4.4 Computational accuracy................... 54
2.4.5 Register optimizations.................... 55
2.4.6 Using compiler logs ..................... 55
2.5 C++ optimizations .......................... 56
2.5.1 Temporaries.......................... 56
2.5.2 Dynamic memory management ............... 59
2.5.3 Loop kernels and iterators.................. 60
3 Data access optimization 63
3.1 Balance analysis and lightspeed estimates.............. 63
3.1.1 Bandwidth-based performance modeling .......... 63
3.1.2 The STREAM benchmarks.................. 67
3.2 Storage order............................. 69
3.3 Case study: The Jacobi algorithm .................. 71
3.4 Case study: Dense matrix transpose ................. 74
3.5 Algorithm classification and access optimizations.......... 79
3.5.1 O(N)/O(N).......................... 79
3.5.2 O(N^2)/O(N^2) ........................ 79
3.5.3 O(N^3)/O(N^2) ........................ 84
3.6 Case study: Sparse matrix-vector multiply.............. 86
3.6.1 Sparse matrix storage schemes................ 86
3.6.2 Optimizing JDS sparse MVM................ 89
4 Parallel computers 95
4.1 Taxonomy of parallel computing paradigms............. 96
4.2 Shared-memory computers...................... 97
4.2.1 Cache coherence....................... 97
4.2.2 UMA............................. 99
4.2.3 ccNUMA........................... 100
4.3 Distributed-memory computers ................... 102
4.4 Hierarchical (hybrid) systems .................... 103
4.5 Networks............................... 104
4.5.1 Basic performance characteristics of networks........ 104
4.5.2 Buses............................. 109
4.5.3 Switched and fat-tree networks................ 110
4.5.4 Mesh networks........................ 112
4.5.5 Hybrids............................ 113
5 Basics of parallelization 115
5.1 Why parallelize? ........................... 115
5.2 Parallelism .............................. 116
5.2.1 Data parallelism ....................... 116
5.2.2 Functional parallelism.................... 119
5.3 Parallel scalability .......................... 120
5.3.1 Factors that limit parallel execution............. 120
5.3.2 Scalability metrics...................... 122
5.3.3 Simple scalability laws.................... 123
5.3.4 Parallel efficiency....................... 125
5.3.5 Serial performance versus strong scalability......... 126
5.3.6 Refined performance models................. 128
5.3.7 Choosing the right scaling baseline ............. 130
5.3.8 Case study: Can slower processors compute faster?..... 131
5.3.9 Load imbalance........................ 137
6 Shared-memory parallel programming with OpenMP 143
6.1 Short introduction to OpenMP.................... 143
6.1.1 Parallel execution....................... 144
6.1.2 Data scoping......................... 146
6.1.3 OpenMP worksharing for loops............... 147
6.1.4 Synchronization ....................... 149
6.1.5 Reductions.......................... 150
6.1.6 Loop scheduling....................... 151
6.1.7 Tasking............................ 153
6.1.8 Miscellaneous ........................ 154
6.2 Case study: OpenMP-parallel Jacobi algorithm ........... 156
6.3 Advanced OpenMP: Wavefront parallelization ........... 158
7 Efficient OpenMP programming 165
7.1 Profiling OpenMP programs..................... 165
7.2 Performance pitfalls ......................... 166
7.2.1 Ameliorating the impact of OpenMP worksharing constructs 168
7.2.2 Determining OpenMP overhead for short loops....... 175
7.2.3 Serialization ......................... 177
7.2.4 False sharing......................... 179
7.3 Case study: Parallel sparse matrix-vector multiply ......... 181
8 Locality optimizations on ccNUMA architectures 185
8.1 Locality of access on ccNUMA ................... 185
8.1.1 Page placement by first touch ................ 186
8.1.2 Access locality by other means................ 190
8.2 Case study: ccNUMA optimization of sparse MVM ........ 190
8.3 Placement pitfalls........................... 192
8.3.1 NUMA-unfriendly OpenMP scheduling........... 192
8.3.2 File system cache....................... 194
8.4 ccNUMA issues with C++...................... 197
8.4.1 Arrays of objects....................... 197
8.4.2 Standard Template Library.................. 199
9 Distributed-memory parallel programming with MPI 203
9.1 Message passing ........................... 203
9.2 A short introduction to MPI ..................... 205
9.2.1 A simple example ...................... 205
9.2.2 Messages and point-to-point communication ........ 207
9.2.3 Collective communication.................. 213
9.2.4 Nonblocking point-to-point communication......... 216
9.2.5 Virtual topologies....................... 220
9.3 Example: MPI parallelization of a Jacobi solver........... 224
9.3.1 MPI implementation..................... 224
9.3.2 Performance properties.................... 230
10 Efficient MPI programming 235
10.1 MPI performance tools........................ 235
10.2 Communication parameters ..................... 239
10.3 Synchronization, serialization, contention.............. 240
10.3.1 Implicit serialization and synchronization.......... 240
10.3.2 Contention.......................... 243
10.4 Reducing communication overhead ................. 244
10.4.1 Optimal domain decomposition............... 244
10.4.2 Aggregating messages.................... 248
10.4.3 Nonblocking vs. asynchronous communication....... 250
10.4.4 Collective communication.................. 253
10.5 Understanding intranode point-to-point communication....... 253
11 Hybrid parallelization with MPI and OpenMP 263
11.1 Basic MPI/OpenMP programming models ............. 264
11.1.1 Vector mode implementation................. 264
11.1.2 Task mode implementation.................. 265
11.1.3 Case study: Hybrid Jacobi solver............... 267
11.2 MPI taxonomy of thread interoperability .............. 268
11.3 Hybrid decomposition and mapping................. 270
11.4 Potential benefits and drawbacks of hybrid programming...... 273
A Topology and affinity in multicore environments 277
A.1 Topology ............................... 279
A.2 Thread and process placement.................... 280
A.2.1 External affinity control ................... 280
A.2.2 Affinity under program control................ 283
A.3 Page placement beyond first touch.................. 284
B Solutions to the problems 287
Bibliography 309
Index 323
|
any_adam_object | 1 |
author | Hager, Georg 1970- Wellein, Gerhard 1970- |
author_GND | (DE-588)13194522X (DE-588)120334836 |
author_facet | Hager, Georg 1970- Wellein, Gerhard 1970- |
author_role | aut aut |
author_sort | Hager, Georg 1970- |
author_variant | g h gh g w gw |
building | Verbundindex |
bvnumber | BV036621218 |
classification_rvk | ST 150 ST 151 ST 620 |
classification_tum | DAT 200f DAT 516f |
ctrlnum | (OCoLC)699709647 (DE-599)BVBBV036621218 |
discipline | Informatik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02291nam a2200541 cb4500</leader><controlfield tag="001">BV036621218</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20170519 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">100818s2011 ad|| |||| 00||| eng d</controlfield><datafield tag="015" ind1=" " ind2=" "><subfield code="a">GBB046716</subfield><subfield code="2">dnb</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9781439811924</subfield><subfield code="c">hbk</subfield><subfield code="9">978-1-4398-1192-4</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">143981192X</subfield><subfield code="9">1-4398-1192-X</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)699709647</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV036621218</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-91G</subfield><subfield code="a">DE-20</subfield><subfield code="a">DE-29T</subfield><subfield code="a">DE-83</subfield><subfield code="a">DE-92</subfield><subfield code="a">DE-703</subfield><subfield code="a">DE-355</subfield><subfield code="a">DE-526</subfield><subfield code="a">DE-706</subfield><subfield code="a">DE-863</subfield><subfield code="a">DE-634</subfield><subfield code="a">DE-19</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 150</subfield><subfield code="0">(DE-625)143594:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 
151</subfield><subfield code="0">(DE-625)143595:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 620</subfield><subfield code="0">(DE-625)143684:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">DAT 200f</subfield><subfield code="2">stub</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">DAT 516f</subfield><subfield code="2">stub</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Hager, Georg</subfield><subfield code="d">1970-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)13194522X</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Introduction to high performance computing for scientists and engineers</subfield><subfield code="c">Georg Hager ; Gerhard Wellein</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Boca Raton [u.a.]</subfield><subfield code="b">CRC Press</subfield><subfield code="c">2011</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XXV, 330 S.</subfield><subfield code="b">Ill., graph. 
Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">Chapman & Hall/CRC computational science series</subfield><subfield code="v">7</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">High performance computing</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Science / Data processing</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Engineering / Data processing</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Datenverarbeitung</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Ingenieurwissenschaften</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Naturwissenschaft</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Naturwissenschaften</subfield><subfield code="0">(DE-588)4041421-8</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Hochleistungsrechnen</subfield><subfield code="0">(DE-588)4532701-4</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Ingenieurwissenschaften</subfield><subfield code="0">(DE-588)4137304-2</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Hochleistungsrechnen</subfield><subfield code="0">(DE-588)4532701-4</subfield><subfield 
code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Naturwissenschaften</subfield><subfield code="0">(DE-588)4041421-8</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Ingenieurwissenschaften</subfield><subfield code="0">(DE-588)4137304-2</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Wellein, Gerhard</subfield><subfield code="d">1970-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)120334836</subfield><subfield code="4">aut</subfield></datafield><datafield tag="830" ind1=" " ind2="0"><subfield code="a">Chapman & Hall/CRC computational science series</subfield><subfield code="v">7</subfield><subfield code="w">(DE-604)BV039102824</subfield><subfield code="9">7</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HBZ Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020541244&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-020541244</subfield></datafield></record></collection> |
id | DE-604.BV036621218 |
illustrated | Illustrated |
indexdate | 2025-04-04T04:01:44Z |
institution | BVB |
isbn | 9781439811924 143981192X |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-020541244 |
oclc_num | 699709647 |
open_access_boolean | |
owner | DE-91G DE-BY-TUM DE-20 DE-29T DE-83 DE-92 DE-703 DE-355 DE-BY-UBR DE-526 DE-706 DE-863 DE-BY-FWS DE-634 DE-19 DE-BY-UBM |
owner_facet | DE-91G DE-BY-TUM DE-20 DE-29T DE-83 DE-92 DE-703 DE-355 DE-BY-UBR DE-526 DE-706 DE-863 DE-BY-FWS DE-634 DE-19 DE-BY-UBM |
physical | XXV, 330 S. Ill., graph. Darst. |
publishDate | 2011 |
publishDateSearch | 2011 |
publishDateSort | 2011 |
publisher | CRC Press |
record_format | marc |
series | Chapman & Hall/CRC computational science series |
series2 | Chapman & Hall/CRC computational science series |
spellingShingle | Hager, Georg 1970- Wellein, Gerhard 1970- Introduction to high performance computing for scientists and engineers Chapman & Hall/CRC computational science series High performance computing Science / Data processing Engineering / Data processing Datenverarbeitung Ingenieurwissenschaften Naturwissenschaft Naturwissenschaften (DE-588)4041421-8 gnd Hochleistungsrechnen (DE-588)4532701-4 gnd Ingenieurwissenschaften (DE-588)4137304-2 gnd |
subject_GND | (DE-588)4041421-8 (DE-588)4532701-4 (DE-588)4137304-2 |
title | Introduction to high performance computing for scientists and engineers |
title_auth | Introduction to high performance computing for scientists and engineers |
title_exact_search | Introduction to high performance computing for scientists and engineers |
title_full | Introduction to high performance computing for scientists and engineers Georg Hager ; Gerhard Wellein |
title_fullStr | Introduction to high performance computing for scientists and engineers Georg Hager ; Gerhard Wellein |
title_full_unstemmed | Introduction to high performance computing for scientists and engineers Georg Hager ; Gerhard Wellein |
title_short | Introduction to high performance computing for scientists and engineers |
title_sort | introduction to high performance computing for scientists and engineers |
topic | High performance computing Science / Data processing Engineering / Data processing Datenverarbeitung Ingenieurwissenschaften Naturwissenschaft Naturwissenschaften (DE-588)4041421-8 gnd Hochleistungsrechnen (DE-588)4532701-4 gnd Ingenieurwissenschaften (DE-588)4137304-2 gnd |
topic_facet | High performance computing Science / Data processing Engineering / Data processing Datenverarbeitung Ingenieurwissenschaften Naturwissenschaft Naturwissenschaften Hochleistungsrechnen |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=020541244&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
volume_link | (DE-604)BV039102824 |
work_keys_str_mv | AT hagergeorg introductiontohighperformancecomputingforscientistsandengineers AT welleingerhard introductiontohighperformancecomputingforscientistsandengineers |
Table of contents
THWS Würzburg, branch library SHL, room I.2.11
Call number: |
1340 ST 150 H144 |
---|---|
Copy 1 | loanable, available |