## Why Consider LayoutTensor?

Looking at our traditional implementation above, you might notice some potential issues:

### Current approach

```mojo
local_i = thread_idx.x          # each thread handles exactly one element
out[local_i] = a[local_i] + 10.0
```
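For context, the full kernel around that snippet looks roughly like this. It's a minimal sketch, not the book's exact listing: the `SIZE` and `dtype` aliases are illustrative, the parameter is named `output` (since `out` is a Mojo argument-convention keyword), and the host-side launch code is omitted.

```mojo
from gpu import thread_idx
from memory import UnsafePointer

alias SIZE = 4
alias dtype = DType.float32

# 1D kernel sketch: thread i reads a[i] and writes output[i].
fn add_10(
    output: UnsafePointer[Scalar[dtype]],
    a: UnsafePointer[Scalar[dtype]],
):
    local_i = thread_idx.x
    output[local_i] = a[local_i] + 10.0
```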

This works for 1D arrays, but what happens when we need to:

- Handle 2D or 3D data?
- Deal with different memory layouts?
- Ensure coalesced memory access?

### Preview of future challenges

As we progress through the puzzles, array indexing will become more complex:

```mojo
# 2D indexing coming in later puzzles
idx = row * WIDTH + col

# 3D indexing
idx = (batch * HEIGHT + row) * WIDTH + col

# With padding
idx = (batch * padded_height + row) * padded_width + col
```
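To make the 2D flattening concrete, here is a small worked example. The matrix shape and the `flat_index` helper are hypothetical, purely for illustration:

```mojo
alias WIDTH = 4  # columns of a hypothetical 2 x 4 row-major matrix

# Row-major flattening: element (row, col) lives at
# offset row * WIDTH + col in the flat buffer.
fn flat_index(row: Int, col: Int) -> Int:
    return row * WIDTH + col

fn main():
    # (row=1, col=2) -> 1 * 4 + 2 = 6
    print(flat_index(1, 2))  # prints 6
```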

### LayoutTensor preview

LayoutTensor will help us handle these cases more elegantly:

```mojo
# Future preview - don't worry about this syntax yet!
out[i, j] = a[i, j] + 10.0  # 2D indexing
out[b, i, j] = a[b, i, j] + 10.0  # 3D indexing
```
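If you're curious what makes that possible: such a tensor is built from a layout description, roughly as sketched below. This is a hedged preview using the `layout` package's `Layout.row_major`; the shape here is illustrative, and the full tensor signature is deferred to Puzzle 4.

```mojo
from layout import Layout, LayoutTensor

# A row-major 2 x 4 layout: LayoutTensor uses this description to
# translate a[i, j] into the flat offset i * 4 + j for us.
alias layout = Layout.row_major(2, 4)
```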

We’ll learn about LayoutTensor in detail in Puzzle 4, where these concepts become essential. For now, focus on understanding:

- Basic thread indexing
- Simple memory access patterns
- One-to-one mapping of threads to data

💡 Key Takeaway: While direct indexing works for simple cases, we’ll soon need more sophisticated tools for complex GPU programming patterns.