Book Introduction

Principles of Parallel Programming (English Edition)
  • Publisher:
  • ISBN:
  • Publication date: unknown
  • Listed page count: 0
  • File size: 36 MB
  • File page count: 357
  • Subject terms:


Table of Contents

PART 1 Foundations ... 1

Chapter 1 Introduction ... 1
The Power and Potential of Parallelism ... 2
Parallelism, a Familiar Concept ... 2
Parallelism in Computer Programs ... 3
Multi-Core Computers, an Opportunity ... 4
Even More Opportunities to Use Parallel Hardware ... 5
Parallel Computing versus Distributed Computing ... 6
System Level Parallelism ... 6
Convenience of Parallel Abstractions ... 8
Examining Sequential and Parallel Programs ... 8
Parallelizing Compilers ... 8
A Paradigm Shift ... 9
Parallel Prefix Sum ... 13
Parallelism Using Multiple Instruction Streams ... 15
The Concept of a Thread ... 15
A Multithreaded Solution to Counting 3s ... 15
The Goals: Scalability and Performance Portability ... 25
Scalability ... 25
Performance Portability ... 26
Principles First ... 27
Chapter Summary ... 27
Historical Perspective ... 28
Exercises ... 28

Chapter 2 Understanding Parallel Computers ... 30
Balancing Machine Specifics with Portability ... 30
A Look at Six Parallel Computers ... 31
Chip Multiprocessors ... 31
Symmetric Multiprocessor Architectures ... 34
Heterogeneous Chip Designs ... 36
Clusters ... 39
Supercomputers ... 40
Observations from Our Six Parallel Computers ... 43
An Abstraction of a Sequential Computer ... 44
Applying the RAM Model ... 44
Evaluating the RAM Model ... 45
The PRAM: A Parallel Computer Model ... 46
The CTA: A Practical Parallel Computer Model ... 47
The CTA Model ... 47
Communication Latency ... 49
Properties of the CTA ... 52
Memory Reference Mechanisms ... 53
Shared Memory ... 53
One-Sided Communication ... 54
Message Passing ... 54
Memory Consistency Models ... 55
Programming Models ... 56
A Closer Look at Communication ... 57
Applying the CTA Model ... 58
Chapter Summary ... 59
Historical Perspective ... 59
Exercises ... 59

Chapter 3 Reasoning about Performance ... 61
Motivation and Basic Concepts ... 61
Parallelism versus Performance ... 61
Threads and Processes ... 62
Latency and Throughput ... 62
Sources of Performance Loss ... 64
Overhead ... 64
Non-Parallelizable Code ... 65
Contention ... 67
Idle Time ... 67
Parallel Structure ... 68
Dependences ... 68
Dependences Limit Parallelism ... 70
Granularity ... 72
Locality ... 73
Performance Trade-Offs ... 73
Communication versus Computation ... 74
Memory versus Parallelism ... 75
Overhead versus Parallelism ... 75
Measuring Performance ... 77
Execution Time ... 77
Speedup ... 78
Superlinear Speedup ... 78
Efficiency ... 79
Concerns with Speedup ... 79
Scaled Speedup versus Fixed-Size Speedup ... 81
Scalable Performance ... 81
Scalable Performance Is Difficult to Achieve ... 81
Implications for Hardware ... 82
Implications for Software ... 83
Scaling the Problem Size ... 83
Chapter Summary ... 84
Historical Perspective ... 84
Exercises ... 85

PART 2 Parallel Abstractions ... 87

Chapter 4 First Steps Toward Parallel Programming ... 88
Data and Task Parallelism ... 88
Definitions ... 88
Illustrating Data and Task Parallelism ... 89
The Peril-L Notation ... 89
Extending C ... 90
Parallel Threads ... 90
Synchronization and Coordination ... 91
Memory Model ... 92
Synchronized Memory ... 94
Reduce and Scan ... 95
The Reduce Abstraction ... 96
Count 3s Example ... 97
Formulating Parallelism ... 97
Fixed Parallelism ... 97
Unlimited Parallelism ... 98
Scalable Parallelism ... 99
Alphabetizing Example ... 100
Unlimited Parallelism ... 101
Fixed Parallelism ... 102
Scalable Parallelism ... 104
Comparing the Three Solutions ... 109
Chapter Summary ... 110
Historical Perspective ... 110
Exercises ... 110

Chapter 5 Scalable Algorithmic Techniques ... 112
Blocks of Independent Computation ... 112
Schwartz's Algorithm ... 113
The Reduce and Scan Abstractions ... 115
Example of Generalized Reduces and Scans ... 116
The Basic Structure ... 118
Structure for Generalized Reduce ... 119
Example of Components of a Generalized Scan ... 122
Applying the Generalized Scan ... 124
Generalized Vector Operations ... 125
Assigning Work to Processes Statically ... 125
Block Allocations ... 126
Overlap Regions ... 128
Cyclic and Block Cyclic Allocations ... 129
Irregular Allocations ... 132
Assigning Work to Processes Dynamically ... 134
Work Queues ... 134
Variations of Work Queues ... 137
Case Study: Concurrent Memory Allocation ... 137
Trees ... 139
Allocation by Sub-Tree ... 139
Dynamic Allocations ... 140
Chapter Summary ... 141
Historical Perspective ... 142
Exercises ... 142

PART 3 Parallel Programming Languages ... 143

Chapter 6 Programming with Threads ... 145
POSIX Threads ... 145
Thread Creation and Destruction ... 146
Mutual Exclusion ... 150
Synchronization ... 153
Safety Issues ... 163
Performance Issues ... 167
Case Study: Successive Over-Relaxation ... 174
Case Study: Overlapping Synchronization with Computation ... 179
Case Study: Streaming Computations on a Multi-Core Chip ... 187
Java Threads ... 187
Synchronized Methods ... 189
Synchronized Statements ... 189
The Count 3s Example ... 190
Volatile Memory ... 192
Atomic Objects ... 192
Lock Objects ... 193
Executors ... 193
Concurrent Collections ... 193
OpenMP ... 193
The Count 3s Example ... 194
Semantic Limitations on parallel for ... 195
Reduction ... 196
Thread Behavior and Interaction ... 197
Sections ... 199
Summary of OpenMP ... 199
Chapter Summary ... 200
Historical Perspective ... 200
Exercises ... 200

Chapter 7 MPI and Other Local View Languages ... 202
MPI: The Message Passing Interface ... 202
The Count 3s Example ... 203
Groups and Communicators ... 211
Point-to-Point Communication ... 212
Collective Communication ... 214
Example: Successive Over-Relaxation ... 219
Performance Issues ... 222
Safety Issues ... 228
Partitioned Global Address Space Languages ... 229
Co-Array Fortran ... 230
Unified Parallel C ... 231
Titanium ... 232
Chapter Summary ... 233
Historical Perspective ... 234
Exercises ... 234

Chapter 8 ZPL and Other Global View Languages ... 236
The ZPL Programming Language ... 236
Basic Concepts of ZPL ... 237
Regions ... 237
Array Computation ... 240
Life, an Example ... 242
The Problem ... 242
The Solution ... 242
How It Works ... 243
The Philosophy of Life ... 245
Distinguishing Features of ZPL ... 245
Regions ... 245
Statement-Level Indexing ... 245
Restrictions Imposed by Regions ... 246
Performance Model ... 246
Addition by Subtraction ... 247
Manipulating Arrays of Different Ranks ... 247
Partial Reduce ... 248
Flooding ... 249
The Flooding Principle ... 250
Data Manipulation, an Example ... 251
Flood Regions ... 252
Matrix Multiplication ... 253
Reordering Data with Remap ... 255
Index Arrays ... 255
Remap ... 256
Ordering Example ... 258
Parallel Execution of ZPL Programs ... 260
Role of the Compiler ... 260
Specifying the Number of Processes ... 261
Assigning Regions to Processes ... 261
Array Allocation ... 262
Scalar Allocation ... 263
Work Assignment ... 263
Performance Model ... 264
Applying the Performance Model: Life ... 265
Applying the Performance Model: SUMMA ... 266
Summary of the Performance Model ... 266
NESL Parallel Language ... 267
Language Concepts ... 267
Matrix Product Using Nested Parallelism ... 268
NESL Complexity Model ... 269
Chapter Summary ... 269
Historical Perspective ... 269
Exercises ... 270

Chapter 9 Assessing the State of the Art ... 271
Four Important Properties of Parallel Languages ... 271
Correctness ... 271
Performance ... 273
Scalability ... 274
Portability ... 274
Evaluating Existing Approaches ... 275
POSIX Threads ... 275
Java Threads ... 276
OpenMP ... 276
MPI ... 276
PGAS Languages ... 277
ZPL ... 278
NESL ... 278
Lessons for the Future ... 279
Hidden Parallelism ... 279
Transparent Performance ... 280
Locality ... 280
Constrained Parallelism ... 280
Implicit versus Explicit Parallelism ... 281
Chapter Summary ... 282
Historical Perspective ... 282
Exercises ... 282

PART 4 Looking Forward ... 283

Chapter 10 Future Directions in Parallel Programming ... 284
Attached Processors ... 284
Graphics Processing Units ... 285
Cell Processors ... 288
Attached Processors Summary ... 288
Grid Computing ... 290
Transactional Memory ... 291
Comparison with Locks ... 292
Implementation Issues ... 293
Open Research Issues ... 295
MapReduce ... 296
Problem Space Promotion ... 298
Emerging Languages ... 299
Chapel ... 300
Fortress ... 300
X10 ... 302
Chapter Summary ... 304
Historical Perspective ... 304
Exercises ... 304

Chapter 11 Writing Parallel Programs ... 305
Getting Started ... 305
Access and Software ... 305
Hello, World ... 306
Parallel Programming Recommendations ... 307
Incremental Development ... 307
Focus on the Parallel Structure ... 307
Testing the Parallel Structure ... 308
Sequential Programming ... 309
Be Willing to Write Extra Code ... 309
Controlling Parameters during Testing ... 310
Functional Debugging ... 310
Capstone Project Ideas ... 311
Implementing Existing Parallel Algorithms ... 311
Competing with Standard Benchmarks ... 312
Developing New Parallel Computations ... 313
Performance Measurement ... 314
Comparing against a Sequential Solution ... 315
Maintaining a Fair Experimental Setting ... 315
Understanding Parallel Performance ... 316
Performance Analysis ... 317
Experimental Methodology ... 318
Portability and Tuning ... 319
Chapter Summary ... 319
Historical Perspective ... 319
Exercises ... 320

Glossary ... 321
References ... 325
Index ... 328
