CUDA programming: a developer's guide to parallel computing with GPUs
Saved in:
Main author: | |
---|---|
Format: | Book |
Language: | English |
Published: |
Amsterdam [et al.]
Morgan Kaufmann
2013
|
Subjects: | |
Online access: | Table of contents |
Description: | XIV, 576 pp., illustrations, diagrams |
ISBN: | 9780124159334 |
Staff view
MARC
LEADER | 00000nam a2200000 c 4500 | ||
---|---|---|---|
001 | BV040667407 | ||
003 | DE-604 | ||
005 | 20131007 | ||
007 | t | ||
008 | 130111s2013 ad|| |||| 00||| eng d | ||
020 | |a 9780124159334 |9 978-0-12-415933-4 | ||
035 | |a (OCoLC)827010116 | ||
035 | |a (DE-599)HBZHT017309436 | ||
040 | |a DE-604 |b ger |e rakwb | ||
041 | 0 | |a eng | |
049 | |a DE-91G |a DE-384 |a DE-703 |a DE-11 |a DE-573 | ||
084 | |a ST 151 |0 (DE-625)143595: |2 rvk | ||
084 | |a ST 230 |0 (DE-625)143617: |2 rvk | ||
084 | |a DAT 516f |2 stub | ||
084 | |a 54.25 |2 bkl | ||
084 | |a DAT 752f |2 stub | ||
084 | |a 54.73 |2 bkl | ||
100 | 1 | |a Cook, Shane |e Verfasser |4 aut | |
245 | 1 | 0 | |a CUDA programming |b a developer's guide to parallel computing with GPUs |c Shane Cook |
264 | 1 | |a Amsterdam [u.a.] |b Morgan Kaufmann |c 2013 | |
300 | |a XIV, 576 S. |b Ill., graph. Darst. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
650 | 0 | 7 | |a CUDA |g Informatik |0 (DE-588)7719528-0 |2 gnd |9 rswk-swf |
689 | 0 | 0 | |a CUDA |g Informatik |0 (DE-588)7719528-0 |D s |
689 | 0 | |5 DE-604 | |
856 | 4 | 2 | |m HBZ Datenaustausch |q application/pdf |u http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025494052&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |3 Inhaltsverzeichnis |
999 | |a oai:aleph.bib-bvb.de:BVB01-025494052 |
Record in the search index
_version_ | 1804149793596899328 |
---|---|
adam_text | Title: CUDA programming
Author: Cook, Shane
Year: 2013
Contents
Preface................................................................................................................................................xiii
CHAPTER 1 A Short History of Supercomputing................................................1
Introduction................................................................................................................1
Von Neumann Architecture........................................................................................2
Cray.............................................................................................................................5
Connection Machine...................................................................................................6
Cell Processor.............................................................................................................7
Multinode Computing................................................................................................9
The Early Days of GPGPU Coding.........................................................................11
The Death of the Single-Core Solution...................................................................12
NVIDIA and CUDA.................................................................................................13
GPU Hardware.........................................................................................................15
Alternatives to CUDA..............................................................................................16
OpenCL...............................................................................................................16
DirectCompute....................................................................................................17
CPU alternatives..................................................................................................17
Directives and libraries.......................................................................................18
Conclusion................................................................................................................19
CHAPTER 2 Understanding Parallelism with GPUs.........................................21
Introduction..............................................................................................................21
Traditional Serial Code............................................................................................21
Serial/Parallel Problems...........................................................................................23
Concurrency..............................................................................................................24
Locality................................................................................................................25
Types of Parallelism.................................................................................................27
Task-based parallelism........................................................................................27
Data-based parallelism........................................................................................28
Flynn's Taxonomy....................................................................................................30
Some Common Parallel Patterns..............................................................................31
Loop-based patterns............................................................................................31
Fork/join pattern..................................................................................................33
Tiling/grids..........................................................................................................35
Divide and conquer.............................................................................................35
Conclusion................................................................................................................36
CHAPTER 3 CUDA Hardware Overview...........................................................37
PC Architecture........................................................................................................37
GPU Hardware.........................................................................................................42
CPUs and GPUs.........................................................................................................46
Compute Levels........................................................................................
Compute 1.0........................................................................................................
Compute 1.1........................................................................................................
Compute 1.2........................................................................................................
Compute 1.3........................................................................................................
Compute 2.0........................................................................................................
Compute 2.1........................................................................................................
CHAPTER 4 Setting Up CUDA........................................................................53
Introduction..............................................................................................................
Installing the SDK under Windows.........................................................................53
Visual Studio............................................................................................................54
Projects................................................................................................................
64-bit users..........................................................................................................
Creating projects.................................................................................................
Linux.........................................................................................................................58
Kernel base driver installation (CentOS, Ubuntu 10.4).....................................59
Mac...........................................................................................................................62
Installing a Debugger...............................................................................................62
Compilation Model...................................................................................................66
Error Handling..........................................................................................................67
Conclusion................................................................................................................68
CHAPTER 5 Grids, Blocks, and Threads.........................................................69
What it all Means.....................................................................................................69
Threads.....................................................................................................................69
Problem decomposition.......................................................................................69
How CPUs and GPUs are different....................................................................71
Task execution model..........................................................................................72
Threading on GPUs.............................................................................................73
A peek at hardware.............................................................................................74
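CUDA kernels......................................................................................................77
Blocks.......................................................................................................................78
Block arrangement..............................................................................................80
Grids.........................................................................................................................83
Stride and offset..................................................................................................84
X and Y thread indexes.......................................................................................85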
Warps........................................................................................................................91
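Branching............................................................................................................92
GPU utilization...................................................................................................93
Block Scheduling......................................................................................................95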
A Practical Example: Histograms..........................................................................97
Conclusion..............................................................................................................103
Questions...........................................................................................................104
Answers.............................................................................................................104
CHAPTER 6 Memory Handling with CUDA....................................................107
Introduction............................................................................................................107
Caches.....................................................................................................................108
Types of data storage........................................................................................110
Register Usage........................................................................................................111
Shared Memory......................................................................................................120
Sorting using shared memory...........................................................................121
Radix sort..........................................................................................................125
Merging lists......................................................................................................131
Parallel merging................................................................................................137
Parallel reduction...............................................................................................140
A hybrid approach.............................................................................................144
Shared memory on different GPUs...................................................................148
Shared memory summary.................................................................................148
Questions on shared memory............................................................................149
Answers for shared memory.............................................................................149
Constant Memory...................................................................................................150
Constant memory caching.................................................................................150
Constant memory broadcast..............................................................................152
Constant memory updates at runtime...............................................................162
Constant question..............................................................................................166
Constant answer................................................................................................167
Global Memory......................................................................................................167
Score boarding...................................................................................................176
Global memory sorting.....................................................................................176
Sample sort........................................................................................................179
Questions on global memory............................................................................198
Answers on global memory..............................................................................199
Texture Memory.....................................................................................................200
Texture caching.................................................................................................200
Hardware manipulation of memory fetches.....................................................200
Restrictions using textures................................................................................201
Conclusion..............................................................................................................202
CHAPTER 7 Using CUDA in Practice............................................................203
Introduction............................................................................................................203
Serial and Parallel Code.........................................................................................203
Design goals of CPUs and GPUs.....................................................................203
Algorithms that work best on the CPU versus the GPU..................................206
Processing Datasets................................................................................................
Using ballot and other intrinsic operations.......................................................211
Profiling..................................................................................................................219
An Example Using AES........................................................................................231
The algorithm....................................................................................................232
Serial implementations of AES........................................................................236
An initial kernel................................................................................................239
Kernel performance...........................................................................................244
Transfer performance........................................................................................248
A single streaming version...............................................................................249
How do we compare with the CPU..................................................................250
Considerations for running on other GPUs......................................................260
Using multiple streams......................................................................................263
AES summary...................................................................................................264
Conclusion..............................................................................................................265
Questions...........................................................................................................265
Answers.............................................................................................................265
References..............................................................................................................266
CHAPTER 8 Multi-CPU and Multi-GPU Solutions..........................................267
Introduction............................................................................................................267
Locality...................................................................................................................267
Multi-CPU Systems................................................................................................267
Multi-GPU Systems................................................................................................268
Algorithms on Multiple GPUs...............................................................................269
Which GPU?............................................................................................................270
Single-Node Systems..............................................................................................274
Streams...................................................................................................................275
Multiple-Node Systems......................................................................................290
Conclusion..............................................................................................................
Questions...........................................................................................................302
Answers.............................................................................................................302
CHAPTER 9 Optimizing Your Application......................................................305
Strategy 1: Parallel/Serial GPU/CPU Problem Breakdown......................................305
Analyzing the problem......................................................................................305
I 1*....................................................~II~II~II~.3Q5
Problem decomposition.....................................................................................307
Dependencies.....................................................................................................
Dataset size........................................................................................................
Resolution..........................................................................................................312
Identifying the bottlenecks................................................................................313
Grouping the tasks for CPU and GPU..............................................................317
Section summary...............................................................................................320
Strategy 2: Memory Considerations......................................................................320
Memory bandwidth...........................................................................................320
Source of limit...................................................................................................321
Memory organization........................................................................................323
Memory accesses to computation ratio............................................................325
Loop and kernel fusion.....................................................................................331
Use of shared memory and cache.....................................................................332
Section summary...............................................................................................333
Strategy 3: Transfers..............................................................................................334
Pinned memory.................................................................................................334
Zero-copy memory............................................................................................338
Bandwidth limitations.......................................................................................347
GPU timing.......................................................................................................351
Overlapping GPU transfers...............................................................................356
Section summary...............................................................................................360
Strategy 4: Thread Usage, Calculations, and Divergence.....................................361
Thread memory patterns...................................................................................361
Inactive threads..................................................................................................364
Arithmetic density.............................................................................................365
Some common compiler optimizations............................................................369
Divergence.........................................................................................................374
Understanding the low-level assembly code....................................................379
Register usage...................................................................................................383
Section summary...............................................................................................385
Strategy 5: Algorithms...........................................................................................386
Sorting...............................................................................................................386
Reduction...........................................................................................................392
Section summary...............................................................................................414
Strategy 6: Resource Contentions..........................................................................414
Identifying bottlenecks......................................................................................414
Resolving bottlenecks.......................................................................................427
Section summary...............................................................................................434
Strategy 7: Self-Tuning Applications.....................................................................435
Identifying the hardware...................................................................................436
Device utilization..............................................................................................437
Sampling performance......................................................................................438
Section summary...............................................................................................439
Conclusion..............................................................................................................439
Questions on Optimization................................................................................439
Answers.............................................................................................................440
CHAPTER 10 Libraries and SDK..................................................................441
Introduction..........................................................................................................441
Libraries...............................................................................................................
General library conventions...........................................................................442
NPP (Nvidia Performance Primitives)...........................................................442
Thrust..............................................................................................................451
CuRAND.........................................................................................................
CuBLAS (CUDA basic linear algebra) library..............................................471
CUDA Computing SDK......................................................................................475
Device Query..................................................................................................4
Bandwidth test................................................................................................478
SimpleP2P.......................................................................................................479
asyncAPI and cudaOpenMP...........................................................................482
Aligned types..................................................................................................489
Directive-Based Programming............................................................................491
OpenACC........................................................................................................492
Writing Your Own Kernels..................................................................................499
Conclusion...........................................................................................................502
CHAPTER 11 Designing GPU-Based Systems................................................503
Introduction..........................................................................................................503
CPU Processor.....................................................................................................505
GPU Device.........................................................................................................507
Large memory support...................................................................................507
ECC memory support.....................................................................................508
Tesla compute cluster driver (TCC)...............................................................508
Higher double-precision math........................................................................508
Larger memory bus width..............................................................................508
SMI.................................................................................................................509
Status LEDs....................................................................................................509
PCI-E Bus...........................................................................................................509
GeForce cards......................................................................................................510
CPU Memory......................................................................................................510
Air Cooling..........................................................................................................512
Liquid Cooling.....................................................................................................513
Desktop Cases and Motherboards.......................................................................517
Mass Storage........................................................................................................518
Motherboard-based I/O..................................................................................518
Dedicated RAID controllers...........................................................................519
HDSL..............................................................................................................520
Mass storage requirements.............................................................................521
Networking...........................................................................................................521
Power Considerations..........................................................................................522
Operating Systems...............................................................................................525
Windows.........................................................................................................525
Linux...............................................................................................................525
Conclusion...........................................................................................................526
CHAPTER 12 Common Problems, Causes, and Solutions...............................527
Introduction..........................................................................................................527
Errors With CUDA Directives.............................................................................527
CUDA error handling.....................................................................................527
Kernel launching and bounds checking.........................................................528
Invalid device handles....................................................................................529
Volatile qualifiers............................................................................................530
Compute level-dependent functions..............................................................532
Device, global, and host functions.................................................................534
Kernels within streams...................................................................................535
Parallel Programming Issues...............................................................................536
Race hazards...................................................................................................536
Synchronization..............................................................................................537
Atomic operations...........................................................................................541
Algorithmic Issues...............................................................................................544
Back-to-back testing.......................................................................................544
Memory leaks.................................................................................................546
Long kernels...................................................................................................546
Finding and Avoiding Errors...............................................................................547
How many errors does your GPU program have?.........................................547
Divide and conquer.........................................................................................548
Assertions and defensive programming.........................................................549
Debug level and printing................................................................................551
Version control................................................................................................555
Developing for Future GPUs...............................................................................555
Kepler..............................................................................................................555
What to think about........................................................................................558
Further Resources................................................................................................560
Introduction.....................................................................................................560
Online courses................................................................................................560
Taught courses................................................................................................561
Books..............................................................................................................562
NVIDIA CUDA certification..........................................................................562
Conclusion...........................................................................................................562
References............................................................................................................563
Index..................................................................................................................................................565
|
any_adam_object | 1 |
author | Cook, Shane |
author_facet | Cook, Shane |
author_role | aut |
author_sort | Cook, Shane |
author_variant | s c sc |
building | Verbundindex |
bvnumber | BV040667407 |
classification_rvk | ST 151 ST 230 |
classification_tum | DAT 516f DAT 752f |
ctrlnum | (OCoLC)827010116 (DE-599)HBZHT017309436 |
discipline | Informatik |
format | Book |
fullrecord | <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>01382nam a2200373 c 4500</leader><controlfield tag="001">BV040667407</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">20131007 </controlfield><controlfield tag="007">t</controlfield><controlfield tag="008">130111s2013 ad|| |||| 00||| eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9780124159334</subfield><subfield code="9">978-0-12-415933-4</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)827010116</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)HBZHT017309436</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakwb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-91G</subfield><subfield code="a">DE-384</subfield><subfield code="a">DE-703</subfield><subfield code="a">DE-11</subfield><subfield code="a">DE-573</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 151</subfield><subfield code="0">(DE-625)143595:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">ST 230</subfield><subfield code="0">(DE-625)143617:</subfield><subfield code="2">rvk</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">DAT 516f</subfield><subfield code="2">stub</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.25</subfield><subfield code="2">bkl</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">DAT 752f</subfield><subfield code="2">stub</subfield></datafield><datafield tag="084" ind1=" " ind2=" "><subfield code="a">54.73</subfield><subfield 
code="2">bkl</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Cook, Shane</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">CUDA programming</subfield><subfield code="b">a developer's guide to parallel computing with GPUs</subfield><subfield code="c">Shane Cook</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Amsterdam [u.a.]</subfield><subfield code="b">Morgan Kaufmann</subfield><subfield code="c">2013</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">XIV, 576 S.</subfield><subfield code="b">Ill., graph. Darst.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="650" ind1="0" ind2="7"><subfield code="a">CUDA</subfield><subfield code="g">Informatik</subfield><subfield code="0">(DE-588)7719528-0</subfield><subfield code="2">gnd</subfield><subfield code="9">rswk-swf</subfield></datafield><datafield tag="689" ind1="0" ind2="0"><subfield code="a">CUDA</subfield><subfield code="g">Informatik</subfield><subfield code="0">(DE-588)7719528-0</subfield><subfield code="D">s</subfield></datafield><datafield tag="689" ind1="0" ind2=" "><subfield code="5">DE-604</subfield></datafield><datafield tag="856" ind1="4" ind2="2"><subfield code="m">HBZ Datenaustausch</subfield><subfield code="q">application/pdf</subfield><subfield code="u">http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025494052&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA</subfield><subfield 
code="3">Inhaltsverzeichnis</subfield></datafield><datafield tag="999" ind1=" " ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-025494052</subfield></datafield></record></collection> |
id | DE-604.BV040667407 |
illustrated | Illustrated |
indexdate | 2024-07-10T00:28:41Z |
institution | BVB |
isbn | 9780124159334 |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-025494052 |
oclc_num | 827010116 |
open_access_boolean | |
owner | DE-91G DE-BY-TUM DE-384 DE-703 DE-11 DE-573 |
owner_facet | DE-91G DE-BY-TUM DE-384 DE-703 DE-11 DE-573 |
physical | XIV, 576 S. Ill., graph. Darst. |
publishDate | 2013 |
publishDateSearch | 2013 |
publishDateSort | 2013 |
publisher | Morgan Kaufmann |
record_format | marc |
spelling | Cook, Shane Verfasser aut CUDA programming a developer's guide to parallel computing with GPUs Shane Cook Amsterdam [u.a.] Morgan Kaufmann 2013 XIV, 576 S. Ill., graph. Darst. txt rdacontent n rdamedia nc rdacarrier CUDA Informatik (DE-588)7719528-0 gnd rswk-swf CUDA Informatik (DE-588)7719528-0 s DE-604 HBZ Datenaustausch application/pdf http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025494052&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA Inhaltsverzeichnis |
spellingShingle | Cook, Shane CUDA programming a developer's guide to parallel computing with GPUs CUDA Informatik (DE-588)7719528-0 gnd |
subject_GND | (DE-588)7719528-0 |
title | CUDA programming a developer's guide to parallel computing with GPUs |
title_auth | CUDA programming a developer's guide to parallel computing with GPUs |
title_exact_search | CUDA programming a developer's guide to parallel computing with GPUs |
title_full | CUDA programming a developer's guide to parallel computing with GPUs Shane Cook |
title_fullStr | CUDA programming a developer's guide to parallel computing with GPUs Shane Cook |
title_full_unstemmed | CUDA programming a developer's guide to parallel computing with GPUs Shane Cook |
title_short | CUDA programming |
title_sort | cuda programming a developer s guide to parallel computing with gpus |
title_sub | a developer's guide to parallel computing with GPUs |
topic | CUDA Informatik (DE-588)7719528-0 gnd |
topic_facet | CUDA Informatik |
url | http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&local_base=BVB01&doc_number=025494052&sequence=000002&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA |
work_keys_str_mv | AT cookshane cudaprogrammingadevelopersguidetoparallelcomputingwithgpus |