Key concepts
In this puzzle, you’ll learn about:
- Broadcasting 1D vectors across different dimensions
- Using 2D thread indices for broadcast operations
- Handling boundary conditions in broadcast patterns
The key insight is understanding how to map elements from two 1D vectors to create a 2D output matrix through broadcasting, while handling thread bounds correctly.
- Broadcasting: Each element of vector a combines with each element of vector b
- Thread mapping: 2D thread grid \((3 \times 3)\) for \(2 \times 2\) output
- Vector access: Different access patterns for a and b (a is indexed by the x thread index, b by the y thread index)
- Bounds checking: Guard against threads outside matrix dimensions
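To make the thread mapping concrete, here is a small plain-Python sketch (a CPU illustration, not Mojo GPU code) that enumerates the \(3 \times 3\) thread grid and lists which threads fall inside the \(2 \times 2\) output. The helper name active_threads is hypothetical and exists only for this illustration.

```python
SIZE = 2
THREADS_PER_BLOCK = (3, 3)  # (x, y) threads, mirroring the puzzle's alias


def active_threads(size, block_dim):
    """Return the (x, y) thread indices that pass the bounds guard."""
    return [
        (x, y)
        for y in range(block_dim[1])
        for x in range(block_dim[0])
        if x < size and y < size
    ]


active = active_threads(SIZE, THREADS_PER_BLOCK)
total = THREADS_PER_BLOCK[0] * THREADS_PER_BLOCK[1]
print(f"{len(active)} of {total} threads write output: {active}")
```

Under these sizes, four threads do useful work and the remaining five must be filtered out by the guard.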
Code to complete
alias SIZE = 2
alias BLOCKS_PER_GRID = 1
alias THREADS_PER_BLOCK = (3, 3)
alias dtype = DType.float32

fn broadcast_add(
    out: UnsafePointer[Scalar[dtype]],
    a: UnsafePointer[Scalar[dtype]],
    b: UnsafePointer[Scalar[dtype]],
    size: Int,
):
    local_i = thread_idx.x
    local_j = thread_idx.y
    # FILL ME IN (roughly 2 lines)
View full file: problems/p05/p05.mojo
Tips
- Get 2D indices: local_i = thread_idx.x, local_j = thread_idx.y
- Add guard: if local_i < size and local_j < size
- Inside guard: out[local_j * size + local_i] = a[local_i] + b[local_j]
Running the code
To test your solution, run the following command in your terminal:
magic run p05
Your output will look like this if the puzzle isn’t solved yet:
out: HostBuffer([0.0, 0.0, 0.0, 0.0])
expected: HostBuffer([0.0, 1.0, 1.0, 2.0])
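The expected buffer can be reproduced on the CPU with a short plain-Python reference (not Mojo). This sketch assumes both input vectors hold [0.0, 1.0]; the page does not show the inputs, so treat those values and the helper names as assumptions chosen to be consistent with the expected output above.

```python
def broadcast_add_reference(a, b):
    """Row j of the result is the whole vector a plus the scalar b[j]."""
    return [[a_i + b_j for a_i in a] for b_j in b]


def flatten(rows):
    """Flatten a list of rows into one row-major list."""
    return [value for row in rows for value in row]


# Assumed inputs (not stated on this page).
a = [0.0, 1.0]
b = [0.0, 1.0]

rows = broadcast_add_reference(a, b)
flat = flatten(rows)
print(rows)
print(flat)
```

Viewed as a matrix, the flat buffer is two rows of two elements each, stored one row after the other.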
Solution
fn broadcast_add(
    out: UnsafePointer[Scalar[dtype]],
    a: UnsafePointer[Scalar[dtype]],
    b: UnsafePointer[Scalar[dtype]],
    size: Int,
):
    local_i = thread_idx.x
    local_j = thread_idx.y
    if local_i < size and local_j < size:
        out[local_j * size + local_i] = a[local_i] + b[local_j]
This solution:
- Gets 2D thread indices with local_i = thread_idx.x, local_j = thread_idx.y
- Guards against out-of-bounds with if local_i < size and local_j < size
- Broadcasts by adding a[local_i] and b[local_j] and storing the sum at row local_j, column local_i of the row-major output matrix
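As a rough illustration of why the guard matters, the following plain-Python sketch (a sequential CPU stand-in, not Mojo GPU code) runs the kernel body once per simulated thread and then lists the flat index each out-of-range thread would compute if the guard were missing. The names launch, broadcast_add_thread, and unguarded_targets are hypothetical, and the inputs [0.0, 1.0] are assumed.

```python
SIZE = 2
THREADS_PER_BLOCK = (3, 3)


def broadcast_add_thread(out, a, b, size, tx, ty):
    """Body of one simulated thread; tx, ty stand in for thread_idx.x/.y."""
    if tx < size and ty < size:
        out[ty * size + tx] = a[tx] + b[ty]


def launch(a, b, size, block_dim):
    """Run the thread body once per thread, sequentially."""
    out = [0.0] * (size * size)
    for ty in range(block_dim[1]):
        for tx in range(block_dim[0]):
            broadcast_add_thread(out, a, b, size, tx, ty)
    return out


def unguarded_targets(size, block_dim):
    """Flat indices that out-of-range threads would compute without the guard."""
    return {
        (tx, ty): ty * size + tx
        for ty in range(block_dim[1])
        for tx in range(block_dim[0])
        if not (tx < size and ty < size)
    }


a = [0.0, 1.0]  # assumed input values
b = [0.0, 1.0]  # assumed input values
result = launch(a, b, SIZE, THREADS_PER_BLOCK)
skipped = unguarded_targets(SIZE, THREADS_PER_BLOCK)
print(result)
print(skipped)
```

One detail worth noticing: thread (2, 0) would compute flat index 2, which is a valid slot in the output (row 1, column 0). Without the guard it would read past the end of a and could overwrite a correct result, so the bug would not necessarily show up as an obvious crash.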