LayoutTensor Version
Overview
Implement a kernel that adds 10 to each position of the 2D LayoutTensor a and stores the result in the 2D LayoutTensor out.
Note: You have more threads than positions.
Key concepts
In this puzzle, you’ll learn about:
- Using LayoutTensor for 2D array access
- Direct 2D indexing with tensor[i, j]
- Handling bounds checking with LayoutTensor
The key insight is that LayoutTensor provides a natural 2D indexing interface, abstracting away the underlying memory layout while still requiring bounds checking.
- 2D access: Natural \((i,j)\) indexing with LayoutTensor
- Memory abstraction: No manual row-major calculation needed
- Guard condition: Still need bounds checking in both dimensions
- Thread bounds: More threads \((3 \times 3)\) than tensor elements \((2 \times 2)\)
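To make the memory abstraction point concrete, here is a minimal CPU-only sketch of the index arithmetic that a row-major layout implies for a 2 x 2 tensor. The helper row_major_offset is hypothetical and written only for illustration; it is not part of the LayoutTensor API, which handles this mapping for you when the layout comes from Layout.row_major.

```mojo
# Hypothetical helper for illustration; not part of the LayoutTensor API.
fn row_major_offset(i: Int, j: Int, width: Int) -> Int:
    # Row-major order stores each row contiguously, so row i starts at i * width.
    return i * width + j


fn main():
    alias SIZE = 2
    # Walk every (i, j) position and print the flat offset it maps to.
    for i in range(SIZE):
        for j in range(SIZE):
            print("(", i, ",", j, ") -> offset", row_major_offset(i, j, SIZE))
```

The four positions map to offsets 0, 1, 2, and 3. With tensor[i, j] you never write this calculation yourself, but the guard on both indices is still your responsibility.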
Code to complete
from gpu import thread_idx
from layout import Layout, LayoutTensor

alias SIZE = 2
alias BLOCKS_PER_GRID = 1
alias THREADS_PER_BLOCK = (3, 3)
alias dtype = DType.float32
alias layout = Layout.row_major(SIZE, SIZE)


fn add_10_2d(
    out: LayoutTensor[mut=True, dtype, layout],
    a: LayoutTensor[mut=True, dtype, layout],
    size: Int,
):
    local_i = thread_idx.x
    local_j = thread_idx.y
    # FILL ME IN (roughly 2 lines)
View full file: problems/p04/p04_layout_tensor.mojo
Tips
- Get 2D indices: local_i = thread_idx.x, local_j = thread_idx.y
- Add guard: if local_i < size and local_j < size
- Inside guard: out[local_i, local_j] = a[local_i, local_j] + 10.0
Running the code
To test your solution, run the following command in your terminal:
magic run p04_layout_tensor
Your output will look like this if the puzzle isn’t solved yet:
out: HostBuffer([0.0, 0.0, 0.0, 0.0])
expected: HostBuffer([10.0, 11.0, 12.0, 13.0])
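The expected values can be reproduced on the CPU with a short sketch. It assumes the input tensor holds 0.0, 1.0, 2.0, 3.0 in row-major order; that is inferred from the expected buffer above rather than stated in this section, so treat it as an assumption and check the full file for how a is actually initialized.

```mojo
fn main():
    alias SIZE = 2
    # Assumption: the input holds 0.0, 1.0, 2.0, 3.0 in row-major order,
    # inferred from the expected output shown above.
    for i in range(SIZE):
        for j in range(SIZE):
            var a_value = Float32(i * SIZE + j)
            # Each output element is the matching input element plus 10.
            print("expected[", i, ",", j, "] =", a_value + 10.0)
```

Under that assumption the four results are 10.0, 11.0, 12.0, and 13.0, matching the expected buffer.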
Solution
fn add_10_2d(
    out: LayoutTensor[mut=True, dtype, layout],
    a: LayoutTensor[mut=True, dtype, layout],
    size: Int,
):
    local_i = thread_idx.x
    local_j = thread_idx.y
    if local_i < size and local_j < size:
        out[local_i, local_j] = a[local_i, local_j] + 10.0
This solution:
- Gets 2D thread indices with local_i = thread_idx.x, local_j = thread_idx.y
- Guards against out-of-bounds with if local_i < size and local_j < size
- Uses LayoutTensor’s 2D indexing: out[local_i, local_j] = a[local_i, local_j] + 10.0
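To see why the guard matters, the following CPU-only sketch loops over the same 3 x 3 index space that the thread block covers and counts how many index pairs pass the guard. It is a simulation under stated assumptions, not GPU code: the nested loops stand in for thread_idx.x and thread_idx.y, and the name THREADS is introduced here for illustration only.

```mojo
fn main():
    alias SIZE = 2     # tensor is SIZE x SIZE
    alias THREADS = 3  # hypothetical name: threads per block in each dimension

    var active = 0
    var idle = 0
    # The loops stand in for thread_idx.y and thread_idx.x on the GPU.
    for local_j in range(THREADS):
        for local_i in range(THREADS):
            # Same guard as the kernel: both indices must be inside the tensor.
            if local_i < SIZE and local_j < SIZE:
                active += 1
            else:
                idle += 1

    print("active:", active, "idle:", idle)
```

Four of the nine index pairs pass the guard and five are rejected. Without the guard, those five threads would read and write positions outside the 2 x 2 tensor.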