| Views: 1251 | Replies: 12 |
| [Reward] This post was rated 11 times; the author, pkusiyuan, received 8.6 gold coins. |
pkusiyuan, Silver Bug (Regular Writer)
[Resource] 2010 Programming Massively Parallel Processors
Contents
Preface
Acknowledgments
Dedication
CHAPTER 1 INTRODUCTION
  1.1 GPUs as Parallel Computers
  1.2 Architecture of a Modern GPU
  1.3 Why More Speed or Parallelism?
  1.4 Parallel Programming Languages and Models
  1.5 Overarching Goals
  1.6 Organization of the Book
CHAPTER 2 HISTORY OF GPU COMPUTING
  2.1 Evolution of Graphics Pipelines
    2.1.1 The Era of Fixed-Function Graphics Pipelines
    2.1.2 Evolution of Programmable Real-Time Graphics
    2.1.3 Unified Graphics and Computing Processors
    2.1.4 GPGPU: An Intermediate Step
  2.2 GPU Computing
    2.2.1 Scalable GPUs
    2.2.2 Recent Developments
  2.3 Future Trends
CHAPTER 3 INTRODUCTION TO CUDA
  3.1 Data Parallelism
  3.2 CUDA Program Structure
  3.3 A Matrix–Matrix Multiplication Example
  3.4 Device Memories and Data Transfer
  3.5 Kernel Functions and Threading
  3.6 Summary
    3.6.1 Function declarations
    3.6.2 Kernel launch
    3.6.3 Predefined variables
    3.6.4 Runtime API
CHAPTER 4 CUDA THREADS
  4.1 CUDA Thread Organization
  4.2 Using blockIdx and threadIdx
  4.3 Synchronization and Transparent Scalability
  4.4 Thread Assignment
  4.5 Thread Scheduling and Latency Tolerance
  4.6 Summary
  4.7 Exercises
CHAPTER 5 CUDA MEMORIES
  5.1 Importance of Memory Access Efficiency
  5.2 CUDA Device Memory Types
  5.3 A Strategy for Reducing Global Memory Traffic
  5.4 Memory as a Limiting Factor to Parallelism
  5.5 Summary
  5.6 Exercises
CHAPTER 6 PERFORMANCE CONSIDERATIONS
  6.1 More on Thread Execution
  6.2 Global Memory Bandwidth
  6.3 Dynamic Partitioning of SM Resources
  6.4 Data Prefetching
  6.5 Instruction Mix
  6.6 Thread Granularity
  6.7 Measured Performance and Summary
  6.8 Exercises
CHAPTER 7 FLOATING POINT CONSIDERATIONS
  7.1 Floating-Point Format
    7.1.1 Normalized Representation of M
    7.1.2 Excess Encoding of E
  7.2 Representable Numbers
  7.3 Special Bit Patterns and Precision
  7.4 Arithmetic Accuracy and Rounding
  7.5 Algorithm Considerations
  7.6 Summary
  7.7 Exercises
CHAPTER 8 APPLICATION CASE STUDY: ADVANCED MRI RECONSTRUCTION
  8.1 Application Background
  8.2 Iterative Reconstruction
  8.3 Computing FHd
    Step 1. Determine the Kernel Parallelism Structure
    Step 2. Getting Around the Memory Bandwidth Limitation
    Step 3. Using Hardware Trigonometry Functions
    Step 4. Experimental Performance Tuning
  8.4 Final Evaluation
  8.5 Exercises
CHAPTER 9 APPLICATION CASE STUDY: MOLECULAR VISUALIZATION AND ANALYSIS
  9.1 Application Background
  9.2 A Simple Kernel Implementation
  9.3 Instruction Execution Efficiency
  9.4 Memory Coalescing
  9.5 Additional Performance Comparisons
  9.6 Using Multiple GPUs
  9.7 Exercises
CHAPTER 10 PARALLEL PROGRAMMING AND COMPUTATIONAL THINKING
  10.1 Goals of Parallel Programming
  10.2 Problem Decomposition
  10.3 Algorithm Selection
  10.4 Computational Thinking
  10.5 Exercises
CHAPTER 11 A BRIEF INTRODUCTION TO OPENCL
  11.1 Background
  11.2 Data Parallelism Model
  11.3 Device Architecture
  11.4 Kernel Functions
  11.5 Device Management and Kernel Launch
  11.6 Electrostatic Potential Map in OpenCL
  11.7 Summary
  11.8 Exercises
CHAPTER 12 CONCLUSION AND FUTURE OUTLOOK
  12.1 Goals Revisited
  12.2 Memory Architecture Evolution
    12.2.1 Large Virtual and Physical Address Spaces
    12.2.2 Unified Device Memory Space
    12.2.3 Configurable Caching and Scratch Pad
    12.2.4 Enhanced Atomic Operations
    12.2.5 Enhanced Global Memory Access
  12.3 Kernel Execution Control Evolution
    12.3.1 Function Calls within Kernel Functions
    12.3.2 Exception Handling in Kernel Functions
    12.3.3 Simultaneous Execution of Multiple Kernels
    12.3.4 Interruptible Kernels
  12.4 Core Performance
    12.4.1 Double-Precision Speed
    12.4.2 Better Control Flow Efficiency
  12.5 Programming Environment
  12.6 A Bright Outlook
APPENDIX A MATRIX MULTIPLICATION HOST-ONLY VERSION SOURCE CODE
  A.1 matrixmul.cu
  A.2 matrixmul_gold.cpp
  A.3 matrixmul.h
  A.4 assist.h
  A.5 Expected Output
APPENDIX B GPU COMPUTE CAPABILITIES
  B.1 GPU Compute Capability Tables
  B.2 Memory Coalescing Variations
Index
Algorithm | love physics | E-book materials | CUDA | Research software |