Yahoo Search Web Search

Search results

  1. May 6, 2014 · Cliff Woolley is a senior developer technology engineer with NVIDIA. He received his master's degree in Computer Science from the University of Virginia in 2003, where he was among the earliest academic researchers to explore the use of GPUs for general-purpose computing.

  2. View Cliff Woolley's profile on LinkedIn, a professional community of 1 billion members. Experience: NVIDIA · Location: San Jose, California, United States · 12 connections on LinkedIn.

    • 1 Instruction-Level Parallelism
    • 2 Data-Level Parallelism
    • Example 35-1. The Standard Rasterization Pipeline as a Series of Nested Loops
    • 1 Precomputation of Loop Invariants
    • 2 Precomputation Using Lookup Tables
    • 3 Avoid Inner-Loop Branching
    • 4 The Swizzle Operator
    • Copyright

    While modern CPUs do have SIMD processing extensions such as MMX or SSE, most CPU programmers never attempt to use these capabilities themselves. Some count on the compiler to make use of SIMD extensions when possible; some ignore the extensions entirely. As a result, it's not uncommon to see new GPU programmers writing code that ineffectively utilizes …
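    To make the SSE point concrete, here is a minimal C sketch (the function name and sizes are illustrative, not from the chapter) that adds two float arrays four lanes at a time using compiler intrinsics instead of a scalar loop:

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Add two float arrays four elements at a time in 128-bit SSE registers.
   Assumes n is a multiple of 4 for brevity. */
static void add_ps(const float *a, const float *b, float *out, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* load 4 floats (unaligned ok) */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));  /* 4 adds in one op */
    }
}
```

    A scalar loop would issue one add per element; here each `_mm_add_ps` performs four, which is the same lane-wise model the GPU's four-vector registers expose.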

    Some problems are inherently scalar in nature and can be more effectively parallelized by operating on multiple data elements simultaneously. This data-level parallelism is particularly common in GPGPU applications, where textures and render targets are used to store large 2D arrays of scalar source data and output values. Packing the data more efficiently …
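    The packing idea can be sketched in plain C (the `float4` struct stands in for an RGBA texel; the names are illustrative): storing four neighboring scalars in one texel means a quarter as many texels to fetch for the same data.

```c
typedef struct { float x, y, z, w; } float4;  /* stand-in for an RGBA texel */

/* Pack a row of n scalars (n a multiple of 4) into n/4 four-channel texels. */
static void pack_row(const float *src, float4 *dst, int n)
{
    for (int i = 0; i < n / 4; ++i) {
        dst[i].x = src[4 * i + 0];
        dst[i].y = src[4 * i + 1];
        dst[i].z = src[4 * i + 2];
        dst[i].w = src[4 * i + 3];
    }
}
```

    A fragment program operating on such a texture can then do arithmetic on all four channels in a single vector instruction.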

    For each operation we perform, we must be mindful of how computationally expensive that operation is and how frequently it is performed. In a normal CPU program, this is fairly straightforward. With an actual series of nested loops (as opposed to the merely conceptual nested loops seen here), it's easy to see that a given expression inside an inner loop …

    The first mistake a new GPU programmer is likely to make is to needlessly recompute, inside a fragment program, values that vary linearly or are uniform across the geometric primitives. Texture coordinates are a prime example. They vary linearly across the primitive being drawn, and the rasterizer interpolates them automatically. …
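    What the rasterizer provides for free is just linear interpolation. A C sketch of the same idea (with illustrative names) steps a linearly varying value, such as a texture coordinate, across a scanline with one add per pixel instead of recomputing it from scratch each time:

```c
/* Step a linearly varying value across a scanline: start value u0,
   per-pixel delta du. Each u[i] equals u0 + i*du, computed incrementally. */
static void interp_scanline(float u0, float du, float *u, int n)
{
    float v = u0;
    for (int i = 0; i < n; ++i) {
        u[i] = v;   /* one add per pixel, no per-pixel re-derivation */
        v += du;
    }
}
```

    In a fragment program, reading the interpolated coordinate is the analogue of this incremental step; re-deriving it per fragment repeats work the hardware has already done.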

    In the more classic sense, "precomputation" means computation that is done offline in advance—the classic storage versus computation trade-off. This concept also maps readily onto GPUs: functions with a constant-size domain and range that are constant across runs of an algorithm—even if they vary in complex ways based on their input—can be precomputed …

    In CPU programming, it is often desirable to avoid branching inside inner loops. This usually involves making several copies of the loop, with each copy acting on a subset of the data and following the execution path specific to that subset. This technique is sometimes called static branch resolution or substreaming. The same concept applies to GPU programming …
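    A CPU-side sketch of static branch resolution (the example and names are invented for illustration): rather than testing every element for the boundary case, run one branch-free loop per data subset. On the GPU, the analogous move is to draw the interior and boundary regions in separate passes.

```c
/* 1D smoothing with special-cased endpoints. Instead of an
   "if (i == 0 || i == n-1)" test inside one loop, each subset
   gets its own straight-line code path. */
static void smooth_resolved(const float *in, float *out, int n)
{
    out[0] = in[0];                  /* boundary subset: pass through */
    out[n - 1] = in[n - 1];
    for (int i = 1; i < n - 1; ++i)  /* interior subset: no branch inside */
        out[i] = 0.5f * (in[i - 1] + in[i + 1]);
}
```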

    An easily overlooked or underutilized feature of GPU programming is the swizzle operator. Because all registers on the GPU are four-vectors but not all instructions take four-vectors as arguments, some mechanism for creating other-sized vectors out of these four-vector registers is necessary. The swizzle operator provides this functionality. …
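    In a shading language the swizzle is written directly, e.g. `a.xyz` or `b.wzyx`. As a rough C analogy (the struct and helper are invented for illustration; on real hardware a swizzle is free register addressing, not a copy), reordering a four-vector's components looks like this:

```c
typedef struct { float x, y, z, w; } float4;

/* Emulate the shader swizzle v.wzyx: reverse the component order. */
static float4 swizzle_wzyx(float4 v)
{
    float4 r = { v.w, v.z, v.y, v.x };
    return r;
}
```

    The same mechanism also extracts smaller vectors (`v.xy`) or replicates one component (`v.xxxx`), which is why it is so useful for feeding scalar data through vector instructions.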

    Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. The authors and publisher have taken care in the preparation of this book …


  4. Cliff Woolley, Sr. Manager, Developer Technology Software, NVIDIA. "NCCL: Accelerated Multi-GPU Collective Communications."

  5. cuDNN: Efficient Primitives for Deep Learning. Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran. NVIDIA, Santa Clara, CA 95050. {schetlur, jwoolley, philippev, jocohen, johntran}@nvidia.com.

  6. Mar 3, 2024 · Abstract. We present a library of efficient implementations of deep learning primitives. Deep learning workloads are computationally intensive, and optimizing their kernels is difficult and time-consuming. As parallel architectures evolve, kernels must be reoptimized, which makes maintaining codebases difficult over time.