Programming massively parallel processors: a hands-on approach
Main authors: | Kirk, David 1960- ; Hwu, Wen-mei W. |
---|---|
Format: | Book |
Language: | English |
Published: | Burlington, MA [et al.] : Elsevier, 2010 |
Subjects: | Parallel processors ; Massive parallelism ; Programming |
Online access: | Table of contents |
Description: | Includes bibliographical references and index |
Description: | XVIII, 258 p. : ill., graphs |
ISBN: | 9780123814722 |
Internal format
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV035989667 | ||
003 | DE-604 | ||
005 | 20230126 | ||
007 | t | ||
008 | 100201s2010 ad|| |||| 00||| eng d | ||
010 | |a 2009048259 | ||
020 | |a 9780123814722 |c alk. paper |9 978-0-12-381472-2 | ||
035 | |a (OCoLC)489718737 | ||
035 | |a (DE-599)BSZ316485241 | ||
040 | |a DE-604 |b ger | ||
041 | 0 | |a eng | |
049 | |a DE-91G |a DE-92 |a DE-634 |a DE-29T |a DE-703 |a DE-706 |a DE-83 |a DE-384 |a DE-11 |a DE-526 |a DE-20 |a DE-Aug4 |a DE-523 | ||
050 | 0 | |a QA76.642 | |
082 | 0 | |a 004/.35 |2 22 | |
084 | |a ST 151 |0 (DE-625)143595: |2 rvk | ||
084 | |a ST 170 |0 (DE-625)143602: |2 rvk | ||
084 | |a DAT 516f |2 stub | ||
100 | 1 | |a Kirk, David |d 1960- |e author |0 (DE-588)141606215 |4 aut | |
245 | 1 | 0 | |a Programming massively parallel processors |b a hands-on approach |c David B. Kirk and Wen-mei W. Hwu |
264 | 1 | |a Burlington, MA [et al.] |b Elsevier |c 2010 | |
300 | |a XVIII, 258 p. |b ill., graphs | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
500 | |a Includes bibliographical references and index | ||
650 | 4 | |a Parallel programming (Computer science) | |
650 | 4 | |a Parallel processing (Electronic computers) | |
650 | 4 | |a Multiprocessors | |
650 | 4 | |a Computer architecture | |
650 | 0 | 7 | |a Massive Parallelität |0 (DE-588)4324752-0 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Parallelprozessor |0 (DE-588)4173279-0 |2 gnd |9 rswk-swf |
650 | 0 | 7 | |a Programmierung |0 (DE-588)4076370-5 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a Parallelprozessor |0 (DE-588)4173279-0 |D s |
689 | 0 | 1 | |a Massive Parallelität |0 (DE-588)4324752-0 |D s |
689 | 0 | 2 | |a Programmierung |0 (DE-588)4076370-5 |D s |
689 | 0 | |5 DE-604 | |
700 | 1 | |a Hwu, Wen-mei W. |e author |0 (DE-588)1016974647 |4 aut | |
856 | 4 | 2 | |m HBZ data exchange |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=018882404&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Table of contents |
999 | |a oai:aleph.bib-bvb.de:BVB01-018882404 |
Search index record
_version_ | 1804141013410775040 |
---|---|
adam_text | Title: Programming massively parallel processors
Author: Kirk, David B.
Year: 2010
Contents
Preface
Acknowledgments
Dedication
CHAPTER 1 INTRODUCTION
1.1 GPUs as Parallel Computers
1.2 Architecture of a Modern GPU
1.3 Why More Speed or Parallelism?
1.4 Parallel Programming Languages and Models
1.5 Overarching Goals
1.6 Organization of the Book
CHAPTER 2 HISTORY OF GPU COMPUTING
2.1 Evolution of Graphics Pipelines
2.1.1 The Era of Fixed-Function Graphics Pipelines
2.1.2 Evolution of Programmable Real-Time Graphics
2.1.3 Unified Graphics and Computing Processors
2.1.4 GPGPU: An Intermediate Step
2.2 GPU Computing
2.2.1 Scalable GPUs
2.2.2 Recent Developments
2.3 Future Trends
CHAPTER 3 INTRODUCTION TO CUDA
3.1 Data Parallelism
3.2 CUDA Program Structure
3.3 A Matrix-Matrix Multiplication Example
3.4 Device Memories and Data Transfer
3.5 Kernel Functions and Threading
3.6 Summary
3.6.1 Function declarations
3.6.2 Kernel launch
3.6.3 Predefined variables
3.6.4 Runtime API
CHAPTER 4 CUDA THREADS
4.1 CUDA Thread Organization
4.2 Using blockIdx and threadIdx
4.3 Synchronization and Transparent Scalability
4.4 Thread Assignment
4.5 Thread Scheduling and Latency Tolerance
4.6 Summary
4.7 Exercises
CHAPTER 5 CUDA™ MEMORIES
5.1 Importance of Memory Access Efficiency
5.2 CUDA Device Memory Types
5.3 A Strategy for Reducing Global Memory Traffic
5.4 Memory as a Limiting Factor to Parallelism
5.5 Summary
5.6 Exercises
CHAPTER 6 PERFORMANCE CONSIDERATIONS
6.1 More on Thread Execution
6.2 Global Memory Bandwidth
6.3 Dynamic Partitioning of SM Resources
6.4 Data Prefetching
6.5 Instruction Mix
6.6 Thread Granularity
6.7 Measured Performance and Summary
6.8 Exercises
CHAPTER 7 FLOATING POINT CONSIDERATIONS
7.1 Floating-Point Format
7.1.1 Normalized Representation of M
7.1.2 Excess Encoding of E
7.2 Representable Numbers
7.3 Special Bit Patterns and Precision
7.4 Arithmetic Accuracy and Rounding
7.5 Algorithm Considerations
7.6 Summary
7.7 Exercises
CHAPTER 8 APPLICATION CASE STUDY: ADVANCED MRI RECONSTRUCTION
8.1 Application Background
8.2 Iterative Reconstruction
8.3 Computing FHd
Step 1. Determine the Kernel Parallelism Structure
Step 2. Getting Around the Memory Bandwidth Limitation
Step 3. Using Hardware Trigonometry Functions
Step 4. Experimental Performance Tuning
8.4 Final Evaluation
8.5 Exercises
CHAPTER 9 APPLICATION CASE STUDY: MOLECULAR VISUALIZATION AND ANALYSIS
9.1 Application Background
9.2 A Simple Kernel Implementation
9.3 Instruction Execution Efficiency
9.4 Memory Coalescing
9.5 Additional Performance Comparisons
9.6 Using Multiple GPUs
9.7 Exercises
CHAPTER 10 PARALLEL PROGRAMMING AND COMPUTATIONAL THINKING
10.1 Goals of Parallel Programming
10.2 Problem Decomposition
10.3 Algorithm Selection
10.4 Computational Thinking
10.5 Exercises
CHAPTER 11 A BRIEF INTRODUCTION TO OPENCL™
11.1 Background
11.2 Data Parallelism Model
11.3 Device Architecture
11.4 Kernel Functions
11.5 Device Management and Kernel Launch
11.6 Electrostatic Potential Map in OpenCL
11.7 Summary
11.8 Exercises
CHAPTER 12 CONCLUSION AND FUTURE OUTLOOK
12.1 Goals Revisited
12.2 Memory Architecture Evolution
12.2.1 Large Virtual and Physical Address Spaces
12.2.2 Unified Device Memory Space
12.2.3 Configurable Caching and Scratch Pad
12.2.4 Enhanced Atomic Operations
12.2.5 Enhanced Global Memory Access
12.3 Kernel Execution Control Evolution
12.3.1 Function Calls within Kernel Functions
12.3.2 Exception Handling in Kernel Functions
12.3.3 Simultaneous Execution of Multiple Kernels
12.3.4 Interruptible Kernels
12.4 Core Performance
12.4.1 Double-Precision Speed
12.4.2 Better Control Flow Efficiency
12.5 Programming Environment
12.6 A Bright Outlook
APPENDIX A MATRIX MULTIPLICATION HOST-ONLY VERSION SOURCE CODE
A.1 matrixmul.cu
A.2 matrixmul_gold.cpp
A.3 matrixmul.h
A.4 assist.h
A.5 Expected Output
APPENDIX B GPU COMPUTE CAPABILITIES
B.1 GPU Compute Capability Tables
B.2 Memory Coalescing Variations
Index
|
any_adam_object | 1 |
author | Kirk, David 1960- Hwu, Wen-mei W. |
author_GND | (DE-588)141606215 (DE-588)1016974647 |
author_facet | Kirk, David 1960- Hwu, Wen-mei W. |
author_role | aut aut |
author_sort | Kirk, David 1960- |
author_variant | d k dk w m w h wmw wmwh |
building | Union catalog index |
bvnumber | BV035989667 |
callnumber-first | Q - Science |
callnumber-label | QA76 |
callnumber-raw | QA76.642 |
callnumber-search | QA76.642 |
callnumber-sort | QA 276.642 |
callnumber-subject | QA - Mathematics |
classification_rvk | ST 151 ST 170 |
classification_tum | DAT 516f |
ctrlnum | (OCoLC)489718737 (DE-599)BSZ316485241 |
dewey-full | 004/.35 |
dewey-hundreds | 000 - Computer science, information, general works |
dewey-ones | 004 - Computer science |
dewey-raw | 004/.35 |
dewey-search | 004/.35 |
dewey-sort | 14 235 |
dewey-tens | 000 - Computer science, information, general works |
discipline | Computer science |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02050nam a2200493 c 4500</leader><controlfield tag="001">BV035989667</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20230126 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">100201s2010 ad|| |||| 00||| eng d</controlfield><datafield tag="010" ind1=" " ind2=" "><subfield code="a">2009048259</subfield></datafield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780123814722</subfield><subfield code="c">alk. paper</subfield><subfield code="9">978-0-12-381472-2</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)489718737</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BSZ316485241</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-91G</subfield><subfield code="a">DE-92</subfield><subfield code="a">DE-634</subfield><subfield code="a">DE-29T</subfield><subfield code="a">DE-703</subfield><subfield code="a">DE-706</subfield><subfield code="a">DE-83</subfield><subfield code="a">DE-384</subfield><subfield code="a">DE-11</subfield><subfield code="a">DE-526</subfield><subfield code="a">DE-20</subfield><subfield code="a">DE-Aug4</subfield><subfield code="a">DE-523</subfield></datafield><datafield tag="050" ind1=" " ind2="0"><subfield code="a">QA76.642</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">004/.35</subfield><subfield code="2">22</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 151</subfield><subfield code="0">(DE-625)143595:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " 
ind2=" "><subfield code="a">ST 170</subfield><subfield code="0">(DE-625)143602:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">DAT 516f</subfield><subfield code="2">stub</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Kirk, David</subfield><subfield code="d">1960-</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)141606215</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Programming massively parallel processors</subfield><subfield code="b">a hands-on approach</subfield><subfield code="c">David B. Kirk and Wen-mei W. Hwu</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Burlington, MA [u.a.]</subfield><subfield code="b">Elsevier</subfield><subfield code="c">2010</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XVIII, 258 S.</subfield><subfield code="b">Ill., graph. 
Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="500" ind1=" " ind2=" "><subfield code="a">Includes bibliographical references and index</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Parallel programming (Computer science)</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Parallel processing (Electronic computers)</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Multiprocessors</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Computer architecture</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Massive Parallelität</subfield><subfield code="0">(DE-588)4324752-0</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Parallelprozessor</subfield><subfield code="0">(DE-588)4173279-0</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">Programmierung</subfield><subfield code="0">(DE-588)4076370-5</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">Parallelprozessor</subfield><subfield code="0">(DE-588)4173279-0</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2="1"><subfield code="a">Massive Parallelität</subfield><subfield code="0">(DE-588)4324752-0</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" 
ind2="2"><subfield code="a">Programmierung</subfield><subfield code="0">(DE-588)4076370-5</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Hwu, Wen-mei W.</subfield><subfield code="e">Verfasser</subfield><subfield code="0">(DE-588)1016974647</subfield><subfield code="4">aut</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HBZ Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=018882404&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-018882404</subfield></datafield></record></collection> |
id | DE-604.BV035989667 |
illustrated | Illustrated |
indexdate | 2024-07-09T22:09:07Z |
institution | BVB |
isbn | 9780123814722 |
language | English |
lccn | 2009048259 |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-018882404 |
oclc_num | 489718737 |
open_access_boolean | |
owner | DE-91G DE-BY-TUM DE-92 DE-634 DE-29T DE-703 DE-706 DE-83 DE-384 DE-11 DE-526 DE-20 DE-Aug4 DE-523 |
owner_facet | DE-91G DE-BY-TUM DE-92 DE-634 DE-29T DE-703 DE-706 DE-83 DE-384 DE-11 DE-526 DE-20 DE-Aug4 DE-523 |
physical | XVIII, 258 p. ill., graphs |
publishDate | 2010 |
publishDateSearch | 2010 |
publishDateSort | 2010 |
publisher | Elsevier |
record_format | marc |
spelling | Kirk, David 1960- author (DE-588)141606215 aut Programming massively parallel processors a hands-on approach David B. Kirk and Wen-mei W. Hwu Burlington, MA [et al.] Elsevier 2010 XVIII, 258 p. ill., graphs txt rdacontent n rdamedia nc rdacarrier Includes bibliographical references and index Parallel programming (Computer science) Parallel processing (Electronic computers) Multiprocessors Computer architecture Massive Parallelität (DE-588)4324752-0 gnd rswk-swf Parallelprozessor (DE-588)4173279-0 gnd rswk-swf Programmierung (DE-588)4076370-5 gnd rswk-swf Parallelprozessor (DE-588)4173279-0 s Massive Parallelität (DE-588)4324752-0 s Programmierung (DE-588)4076370-5 s DE-604 Hwu, Wen-mei W. author (DE-588)1016974647 aut HBZ data exchange application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=018882404&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Table of contents |
spellingShingle | Kirk, David 1960- Hwu, Wen-mei W. Programming massively parallel processors a hands-on approach Parallel programming (Computer science) Parallel processing (Electronic computers) Multiprocessors Computer architecture Massive Parallelität (DE-588)4324752-0 gnd Parallelprozessor (DE-588)4173279-0 gnd Programmierung (DE-588)4076370-5 gnd |
subject_GND | (DE-588)4324752-0 (DE-588)4173279-0 (DE-588)4076370-5 |
title | Programming massively parallel processors a hands-on approach |
title_auth | Programming massively parallel processors a hands-on approach |
title_exact_search | Programming massively parallel processors a hands-on approach |
title_full | Programming massively parallel processors a hands-on approach David B. Kirk and Wen-mei W. Hwu |
title_fullStr | Programming massively parallel processors a hands-on approach David B. Kirk and Wen-mei W. Hwu |
title_full_unstemmed | Programming massively parallel processors a hands-on approach David B. Kirk and Wen-mei W. Hwu |
title_short | Programming massively parallel processors |
title_sort | programming massively parallel processors a hands on approach |
title_sub | a hands-on approach |
topic | Parallel programming (Computer science) Parallel processing (Electronic computers) Multiprocessors Computer architecture Massive Parallelität (DE-588)4324752-0 gnd Parallelprozessor (DE-588)4173279-0 gnd Programmierung (DE-588)4076370-5 gnd |
topic_facet | Parallel programming (Computer science) Parallel processing (Electronic computers) Multiprocessors Computer architecture Massive Parallelität Parallelprozessor Programmierung |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=018882404&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT kirkdavid programmingmassivelyparallelprocessorsahandsonapproach AT hwuwenmeiw programmingmassivelyparallelprocessorsahandsonapproach |