Puzzle 2: Zip
Overview
Implement a kernel that adds vector a and vector b element-wise and stores the result in out.
Note: You have 1 thread per position.
Key concepts
In this puzzle, you’ll learn about:
- Processing multiple input arrays in parallel
- Element-wise operations with multiple inputs
- Thread-to-data mapping across arrays
- Memory access patterns with multiple arrays
For each thread \(i\): \[\Large out[i] = a[i] + b[i]\]
Memory access pattern
Thread 0: a[0] + b[0] → out[0]
Thread 1: a[1] + b[1] → out[1]
Thread 2: a[2] + b[2] → out[2]
...
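The mapping above can be simulated on the CPU to make the one-thread-per-element idea concrete. This is a minimal Python sketch, not Mojo and not GPU code: `add_kernel` and `launch` are hypothetical helper names, and the input values for `a` and `b` are an assumption chosen so the result matches the expected output shown later in this puzzle.

```python
# CPU-side simulation of the one-thread-per-element mapping (not GPU code).
SIZE = 4  # mirrors the puzzle's SIZE alias


def add_kernel(thread_i, out, a, b):
    # Body run by one simulated thread: it reads and writes index thread_i only.
    out[thread_i] = a[thread_i] + b[thread_i]


def launch(kernel, num_threads, *buffers):
    # Hypothetical stand-in for a GPU launch: run the kernel body once per
    # thread index. Real GPU threads run concurrently; a sequential loop gives
    # the same result here because no two threads touch the same element.
    for thread_i in range(num_threads):
        kernel(thread_i, *buffers)


a = [float(i) for i in range(SIZE)]  # assumed input: 0.0, 1.0, 2.0, 3.0
b = [float(i) for i in range(SIZE)]  # assumed input: same values as a
out = [0.0] * SIZE

launch(add_kernel, SIZE, out, a, b)
print(out)  # [0.0, 2.0, 4.0, 6.0]
```

Because every simulated thread uses the same index for all three arrays, the order in which threads run does not affect the result.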
💡 Note: We’re now managing three arrays (a, b, out) in our kernel. As we progress to more complex operations, managing multiple array accesses will become increasingly challenging.
Code to complete
alias SIZE = 4
alias BLOCKS_PER_GRID = 1
alias THREADS_PER_BLOCK = SIZE
alias dtype = DType.float32
fn add(
    out: UnsafePointer[Scalar[dtype]],
    a: UnsafePointer[Scalar[dtype]],
    b: UnsafePointer[Scalar[dtype]],
):
    local_i = thread_idx.x
    # FILL ME IN (roughly 1 line)
View full file: problems/p02/p02.mojo
Tips
- Store thread_idx.x in local_i
- Add a[local_i] and b[local_i]
- Store the result in out[local_i]
Running the code
To test your solution, run the following command in your terminal:
magic run p02
Your output will look like this if the puzzle isn’t solved yet:
out: HostBuffer([0.0, 0.0, 0.0, 0.0])
expected: HostBuffer([0.0, 2.0, 4.0, 6.0])
Solution
fn add(
    out: UnsafePointer[Scalar[dtype]],
    a: UnsafePointer[Scalar[dtype]],
    b: UnsafePointer[Scalar[dtype]],
):
    local_i = thread_idx.x
    out[local_i] = a[local_i] + b[local_i]
This solution:
- Gets the thread index with local_i = thread_idx.x
- Adds the values from both arrays and stores the result: out[local_i] = a[local_i] + b[local_i]
Looking ahead
While this direct indexing works for simple element-wise operations, consider:
- What if arrays have different layouts?
- What if we need to broadcast one array to another?
- How do we ensure coalesced access across multiple arrays?
These questions will be addressed when we introduce LayoutTensor in Puzzle 4.
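To see why the first question matters, consider the same logical matrix stored in two different memory orders. The sketch below is plain Python for illustration only and does not use the LayoutTensor API: the 2x3 shape, the sample values, and the helper names `row_major_offset` and `col_major_offset` are all assumptions made for this example.

```python
# Hypothetical illustration: one logical 2x3 matrix stored in two layouts.
ROWS, COLS = 2, 3


def row_major_offset(r, c):
    # Rows are contiguous in memory.
    return r * COLS + c


def col_major_offset(r, c):
    # Columns are contiguous in memory.
    return c * ROWS + r


logical = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

a_flat = [0.0] * (ROWS * COLS)  # stored row-major
b_flat = [0.0] * (ROWS * COLS)  # same values, stored column-major
for r in range(ROWS):
    for c in range(COLS):
        a_flat[row_major_offset(r, c)] = logical[r][c]
        b_flat[col_major_offset(r, c)] = logical[r][c]

# Direct indexing pairs up elements that sit at the same flat offset,
# which are different logical positions when the layouts differ.
naive = [a_flat[i] + b_flat[i] for i in range(ROWS * COLS)]

# Layout-aware indexing translates (row, col) to each array's own offset.
correct = [0.0] * (ROWS * COLS)
for r in range(ROWS):
    for c in range(COLS):
        correct[row_major_offset(r, c)] = (
            a_flat[row_major_offset(r, c)] + b_flat[col_major_offset(r, c)]
        )

print(naive)    # [2.0, 6.0, 5.0, 9.0, 8.0, 12.0]
print(correct)  # [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
```

Only the layout-aware version doubles every element, as adding a matrix to itself should. The direct-index version silently mixes positions.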