Key concepts
In this puzzle, you’ll learn about:
- Basic GPU kernel structure
- Thread indexing with thread_idx.x
- Simple parallel operations

- Parallelism: Each thread executes independently
- Thread indexing: Access element at position i = thread_idx.x
- Memory access: Read from a[i] and write to out[i]
- Data independence: Each output depends only on its corresponding input
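The concepts above can be mimicked on the CPU. The following is a minimal Python sketch, not Mojo and not GPU code: the names add_10_kernel and launch are hypothetical, the input values are an assumption, and the for loop stands in for threads that a real GPU would run concurrently.

```python
SIZE = 4  # one simulated thread per element, as in the puzzle


def add_10_kernel(out, a, thread_idx_x):
    """Body executed by a single simulated thread."""
    i = thread_idx_x      # thread indexing: this thread owns element i
    out[i] = a[i] + 10.0  # read a[i], write out[i]; no other element is touched


def launch(kernel, out, a, threads_per_block):
    """Hypothetical launcher: calls the kernel once per thread index."""
    for thread_idx_x in range(threads_per_block):
        kernel(out, a, thread_idx_x)


a = [float(i) for i in range(SIZE)]  # assumed input: [0.0, 1.0, 2.0, 3.0]
out = [0.0] * SIZE
launch(add_10_kernel, out, a, SIZE)
print(out)
```

Because each call touches only index i, the loop iterations never interact, which is the property that lets a GPU run them at the same time.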
Code to complete
    alias SIZE = 4
    alias BLOCKS_PER_GRID = 1
    alias THREADS_PER_BLOCK = SIZE
    alias dtype = DType.float32

    fn add_10(out: UnsafePointer[Scalar[dtype]], a: UnsafePointer[Scalar[dtype]]):
        local_i = thread_idx.x
        # FILL ME IN (roughly 1 line)
View full file: problems/p01/p01.mojo
Tips
- Store thread_idx.x in local_i
- Add 10 to a[local_i]
- Store result in out[local_i]
Running the code
To test your solution, run the following command in your terminal:
    magic run p01
Your output will look like this if the puzzle isn’t solved yet:
    out: HostBuffer([0.0, 0.0, 0.0, 0.0])
    expected: HostBuffer([10.0, 11.0, 12.0, 13.0])
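The two printed buffers can be reproduced with a short Python sketch. It assumes the input is [0.0, 1.0, 2.0, 3.0] (inferred from the expected values, not stated in this section) and that the output buffer starts zeroed. The helper check is hypothetical and is not part of the puzzle's test harness.

```python
SIZE = 4

a = [float(i) for i in range(SIZE)]  # assumed input values
out = [0.0] * SIZE                   # output buffer before any kernel writes to it
expected = [x + 10.0 for x in a]     # what a correct add_10 should produce


def check(out, expected):
    """Return the indices where out and expected differ."""
    return [i for i, (o, e) in enumerate(zip(out, expected)) if o != e]


print("out:", out)
print("expected:", expected)
print("mismatched indices:", check(out, expected))
```

With the kernel body left empty, every index is reported as a mismatch. Once each thread writes its element, the list of mismatches becomes empty.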
Solution
    fn add_10(out: UnsafePointer[Scalar[dtype]], a: UnsafePointer[Scalar[dtype]]):
        local_i = thread_idx.x
        out[local_i] = a[local_i] + 10.0
This solution:
- Gets thread index with local_i = thread_idx.x
- Adds 10 to input value: out[local_i] = a[local_i] + 10.0
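One consequence of this solution is that the result cannot depend on the order in which threads run, since each thread reads and writes only its own index. A minimal Python sketch of that property follows. It is a CPU simulation rather than Mojo; add_10_thread and run_in_order are hypothetical names and the input values are assumed.

```python
import random

SIZE = 4


def add_10_thread(out, a, local_i):
    """Simulated per-thread body, mirroring the solution line."""
    out[local_i] = a[local_i] + 10.0


def run_in_order(order, a):
    """Run the simulated threads in the given order and return the output."""
    out = [0.0] * len(a)
    for local_i in order:
        add_10_thread(out, a, local_i)
    return out


a = [float(i) for i in range(SIZE)]  # assumed input values
forward = run_in_order(range(SIZE), a)

shuffled_order = list(range(SIZE))
random.shuffle(shuffled_order)       # stand-in for an arbitrary scheduling order
shuffled = run_in_order(shuffled_order, a)

print(forward == shuffled)  # the two runs produce the same output
```

If the kernel instead read a neighbouring element that another thread also wrote, this equality would no longer be guaranteed.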