Performance optimization of numerically intensive codes:
Main authors: | Goedecker, Stefan ; Hoisie, Adolfy |
---|---|
Format: | Book |
Language: | English |
Published: | Philadelphia : SIAM, 2001 |
Series: | Software, environments, tools |
Subjects: | |
Online access: | Table of contents |
Description: | XI, 173 pp., illustrations |
ISBN: | 9780898714845 0898714842 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV013770609 | ||
003 | DE-604 | ||
005 | 20090715 | ||
007 | t | ||
008 | 010612s2001 d||| |||| 00||| eng d | ||
020 | |a 9780898714845 |9 978-0-89871-484-5 | ||
020 | |a 0898714842 |9 0-89871-484-2 | ||
035 | |a (OCoLC)614132117 | ||
035 | |a (DE-599)BVBBV013770609 | ||
040 | |a DE-604 |b ger |e rakwb | ||
041 | 0 | |a eng | |
049 | |a DE-29T |a DE-703 |a DE-83 |a DE-20 |a DE-91G | ||
084 | |a ST 130 |0 (DE-625)143588: |2 rvk | ||
084 | |a 68M20 |2 msc | ||
084 | |a MAT 679f |2 stub | ||
084 | |a 68W10 |2 msc | ||
084 | |a DAT 532f |2 stub | ||
100 | 1 | |a Goedecker, Stefan |e Verfasser |4 aut | |
245 | 1 | 0 | |a Performance optimization of numerically intensive codes |c Stefan Goedecker ; Adolfy Hoisie |
264 | 1 | |a Philadelphia |b SIAM |c 2001 | |
300 | |a XI, 173 S. |b graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 0 | |a Software, environments, tools | |
650 | 0 | 7 | |a Numerisches Verfahren |0 (DE-588)4128130-5 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Performanz |g Linguistik |0 (DE-588)4128325-9 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Datenverarbeitung |0 (DE-588)4011152-0 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Datenverarbeitung |0 (DE-588)4011152-0 |D s |
689 | 0 | 1 | |a Numerisches Verfahren |0 (DE-588)4128130-5 |D s |
689 | 0 | 2 | |a Performanz |g Linguistik |0 (DE-588)4128325-9 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Hoisie, Adolfy |e Verfasser |4 aut | |
856 | 4 | 2 | |m HBZ Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=009412159&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-009412159 |
Record in the search index
_version_ | 1804128589542588416 |
---|---|
adam_text | Contents
Preface ix
1 Introduction 1
2 Notions of Computer Architecture 3
2.1 The on chip parallelism of superscalar architectures 3
2.2 Overview of the memory hierarchy of RISC architectures 7
2.3 Mapping rules for caches 10
2.4 A taxonomy of cache misses 11
2.5 TLB misses 12
2.6 Multilevel cache configurations 12
2.7 Characteristics of memory hierarchies on some common machines 13
2.8 Parallel architectures 13
2.8.1 Shared memory architectures 13
2.8.2 Distributed memory architectures 16
2.8.3 Distributed shared memory architectures 22
2.9 A comparison between vector and superscalar architectures 23
3 A Few Basic Efficiency Guidelines 27
3.1 Selection of best algorithm 27
3.2 Use of efficient libraries 27
3.3 Optimal data layout 29
3.4 Use of compiler optimizations 30
3.5 Basic optimizations done by the compiler 30
4 Timing and Profiling of a Program 37
4.1 Subroutine level profiling 38
4.2 Tick based profiling 39
4.3 Timing small sections of your program 40
4.4 Assembler output 42
4.5 Hardware performance monitors 42
4.6 Profiling parallel programs 44
5 Optimization of Floating Point Operations 47
5.1 Fused multiply add instructions 48
5.2 Exposing instruction level parallelism in a program 48
5.3 Software pipelining 50
5.4 Improving the ratio of floating point operations to memory accesses 53
5.5 Running out of registers 55
5.6 An example where automatic loop unrolling fails 59
5.7 Loop unrolling overheads 60
5.8 Aliasing 60
5.9 Array arithmetic in Fortran90 64
5.10 Operator overloading in C++ 67
5.11 Elimination of floating point exceptions 67
5.12 Type conversions 67
5.13 Sign conversions 68
5.14 Complex arithmetic 69
5.15 Special functions 69
5.16 Eliminating overheads 72
5.16.1 If statements 72
5.16.2 Loop overheads 76
5.16.3 Subroutine calling overheads 77
5.17 Copy overheads in Fortran90 78
5.18 Attaining peak speed 78
6 Optimization of Memory Access 79
6.1 An illustration of the memory access times on RISC machines 79
6.2 Performance of various computers for unit and large stride data access 82
6.3 Loop reordering for optimal data locality 85
6.4 Loop fusion to reduce unnecessary memory references 87
6.5 Data locality and the conceptual layout of a program 88
6.6 Cache thrashing 89
6.7 Experimental determination of cache and TLB parameters 91
6.8 Finding optimal strides 93
6.9 Square blocking 95
6.10 Line blocking 98
6.11 Prefetching 103
6.12 Misalignment of data 105
7 Miscellaneous Optimizations 107
7.1 Balancing the load of the functional units 107
7.2 Accessing the instructions 107
7.3 I/O: Writing to and reading from files 108
7.4 Memory fragmentation in Fortran90 108
7.5 Optimizations for vector architectures 110
8 Optimization of Parallel Programs 113
8.1 Ideal and observed speedup 113
8.2 Message passing libraries 115
8.3 Data locality 116
8.4 Load balancing 117
8.5 Minimizing the surface to volume ratio in grid based methods 117
8.6 Coarse grain parallelism against fine grain parallelism 118
8.7 Adapting parallel programs to the computer topology 120
9 Case Studies 121
9.1 Matrix vector multiplication 121
9.2 Sparse matrix vector multiplication 124
9.3 Two loops from a configuration interaction program 126
9.4 A two dimensional wavelet transform 131
9.5 A three dimensional fast Fourier transform 135
9.5.1 Vector machines 136
9.5.2 RISC machines 137
9.5.3 Parallel machines 138
9.6 Multigrid methods on parallel machines 140
9.7 A real world electronic structure code 145
10 Benchmarks 147
Appendix 151
A.1 Timing routine for BLAS library 151
A.2 MPI timing routine 152
A.3 Program that should run at peak speed 155
A.4 Program for memory testing 159
A.5 Program to test suitability of parallel computers for fine grained tasks 163
A.6 Unrolled CI loop structure 164
Bibliography 167
Index 169
|
any_adam_object | 1 |
author | Goedecker, Stefan Hoisie, Adolfy |
author_facet | Goedecker, Stefan Hoisie, Adolfy |
author_role | aut aut |
author_sort | Goedecker, Stefan |
author_variant | s g sg a h ah |
building | Verbundindex |
bvnumber | BV013770609 |
classification_rvk | ST 130 |
classification_tum | MAT 679f DAT 532f |
ctrlnum | (OCoLC)614132117 (DE-599)BVBBV013770609 |
discipline | Informatik Mathematik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01731nam a2200445 c 4500</leader><controlfield tag="001">BV013770609</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20090715 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">010612s2001 d||| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780898714845</subfield><subfield code="9">978-0-89871-484-5</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">0898714842</subfield><subfield code="9">0-89871-484-2</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)614132117</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV013770609</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-29T</subfield><subfield code="a">DE-703</subfield><subfield code="a">DE-83</subfield><subfield code="a">DE-20</subfield><subfield code="a">DE-91G</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 130</subfield><subfield code="0">(DE-625)143588:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">68M20</subfield><subfield code="2">msc</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">MAT 679f</subfield><subfield code="2">stub</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">68W10</subfield><subfield code="2">msc</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">DAT 532f</subfield><subfield 
code="2">stub</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Goedecker, Stefan</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Performance optimization of numerically intensive codes</subfield><subfield code="c">Stefan Goedecker ; Adolfy Hoisie</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Philadelphia</subfield><subfield code="b">SIAM</subfield><subfield code="c">2001</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XI, 173 S.</subfield><subfield code="b">graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="0" ind2=" "><subfield code="a">Software, environments, tools</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Numerisches Verfahren</subfield><subfield code="0">(DE-588)4128130-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Performanz</subfield><subfield code="g">Linguistik</subfield><subfield code="0">(DE-588)4128325-9</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Datenverarbeitung</subfield><subfield code="0">(DE-588)4011152-0</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Datenverarbeitung</subfield><subfield 
code="0">(DE-588)4011152-0</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Numerisches Verfahren</subfield><subfield code="0">(DE-588)4128130-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="2"><subfield code="a">Performanz</subfield><subfield code="g">Linguistik</subfield><subfield code="0">(DE-588)4128325-9</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Hoisie, Adolfy</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HBZ Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=009412159&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-009412159</subfield></datafield></record></collection> |
id | DE-604.BV013770609 |
illustrated | Illustrated |
indexdate | 2024-07-09T18:51:39Z |
institution | BVB |
isbn | 9780898714845 0898714842 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-009412159 |
oclc_num | 614132117 |
open_access_boolean | |
owner | DE-29T DE-703 DE-83 DE-20 DE-91G DE-BY-TUM |
owner_facet | DE-29T DE-703 DE-83 DE-20 DE-91G DE-BY-TUM |
physical | XI, 173 S. graph. Darst. |
publishDate | 2001 |
publishDateSearch | 2001 |
publishDateSort | 2001 |
publisher | SIAM |
record_format | marc |
series2 | Software, environments, tools |
spelling | Goedecker, Stefan Verfasser aut Performance optimization of numerically intensive codes Stefan Goedecker ; Adolfy Hoisie Philadelphia SIAM 2001 XI, 173 S. graph. Darst. txt rdacontent n rdamedia nc rdacarrier Software, environments, tools Numerisches Verfahren (DE-588)4128130-5 gnd rswk-swf Performanz Linguistik (DE-588)4128325-9 gnd rswk-swf Datenverarbeitung (DE-588)4011152-0 gnd rswk-swf Datenverarbeitung (DE-588)4011152-0 s Numerisches Verfahren (DE-588)4128130-5 s Performanz Linguistik (DE-588)4128325-9 s DE-604 Hoisie, Adolfy Verfasser aut HBZ Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=009412159&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Goedecker, Stefan Hoisie, Adolfy Performance optimization of numerically intensive codes Numerisches Verfahren (DE-588)4128130-5 gnd Performanz Linguistik (DE-588)4128325-9 gnd Datenverarbeitung (DE-588)4011152-0 gnd |
subject_GND | (DE-588)4128130-5 (DE-588)4128325-9 (DE-588)4011152-0 |
title | Performance optimization of numerically intensive codes |
title_auth | Performance optimization of numerically intensive codes |
title_exact_search | Performance optimization of numerically intensive codes |
title_full | Performance optimization of numerically intensive codes Stefan Goedecker ; Adolfy Hoisie |
title_fullStr | Performance optimization of numerically intensive codes Stefan Goedecker ; Adolfy Hoisie |
title_full_unstemmed | Performance optimization of numerically intensive codes Stefan Goedecker ; Adolfy Hoisie |
title_short | Performance optimization of numerically intensive codes |
title_sort | performance optimization of numerically intensive codes |
topic | Numerisches Verfahren (DE-588)4128130-5 gnd Performanz Linguistik (DE-588)4128325-9 gnd Datenverarbeitung (DE-588)4011152-0 gnd |
topic_facet | Numerisches Verfahren Performanz Linguistik Datenverarbeitung |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=009412159&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT goedeckerstefan performanceoptimizationofnumericallyintensivecodes AT hoisieadolfy performanceoptimizationofnumericallyintensivecodes |