Key concepts
In this puzzle, you’ll learn about:
- Working with 2D block and thread arrangements
- Handling matrix data larger than block size
- Converting between 2D and linear memory access
The key insight is understanding how to coordinate multiple blocks of threads to process a 2D matrix that’s larger than a single block’s dimensions.
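To make the coordination concrete, the following minimal sketch runs on the host (no GPU involved) and works backwards from a matrix coordinate to the block and thread that would cover it. The helper names owning_block and local_thread are hypothetical and not part of the puzzle code; the block size of 3 is taken from this puzzle's configuration.

```mojo
alias BLOCK_DIM = 3  # threads per block along each axis, as in this puzzle


fn owning_block(global_index: Int) -> Int:
    # Which block along this axis covers the given global index.
    return global_index // BLOCK_DIM


fn local_thread(global_index: Int) -> Int:
    # Position of the covering thread inside that block.
    return global_index % BLOCK_DIM


fn main():
    # Corner element of the 5 x 5 matrix: row 4, column 4.
    var row = 4
    var col = 4
    print("block (x, y):", owning_block(col), owning_block(row))
    print("thread (x, y):", local_thread(col), local_thread(row))
    # Rebuild the global column index the same way the kernel does.
    print("global_i:", BLOCK_DIM * owning_block(col) + local_thread(col))
```

Under these assumptions, the element at row 4, column 4 lies beyond the first \(3 \times 3\) block and is covered by thread (1, 1) of block (1, 1).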
Configuration
- Matrix size: \(5 \times 5\) elements
- 2D blocks: Each block processes a \(3 \times 3\) region
- Grid layout: Blocks arranged in \(2 \times 2\) grid
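- Total threads: \(2 \times 2\) blocks of \(3 \times 3\) threads give \(36\) threads for \(25\) elements, so \(11\) threads have no element to process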
- Memory pattern: Row-major storage for 2D data
- Coverage: The \(6 \times 6\) arrangement of threads must cover the whole \(5 \times 5\) matrix, and threads that fall outside it need a bounds check
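The thread arithmetic in this configuration can be checked with a small host-side sketch. It loops over every (block, thread) pair in plain Mojo rather than launching a kernel, and the names BLOCKS_PER_AXIS, THREADS_PER_AXIS, and count_active_threads are hypothetical, introduced only for this illustration.

```mojo
alias SIZE = 5
alias BLOCKS_PER_AXIS = 2   # hypothetical name: the grid is 2 x 2
alias THREADS_PER_AXIS = 3  # hypothetical name: each block is 3 x 3


fn count_active_threads() -> Int:
    # Walk over every (block, thread) pair and count the threads whose
    # global coordinates land inside the SIZE x SIZE matrix.
    var active = 0
    for block_y in range(BLOCKS_PER_AXIS):
        for block_x in range(BLOCKS_PER_AXIS):
            for thread_y in range(THREADS_PER_AXIS):
                for thread_x in range(THREADS_PER_AXIS):
                    var global_i = THREADS_PER_AXIS * block_x + thread_x
                    var global_j = THREADS_PER_AXIS * block_y + thread_y
                    if global_i < SIZE and global_j < SIZE:
                        active += 1
    return active


fn main():
    var per_axis = BLOCKS_PER_AXIS * THREADS_PER_AXIS
    var total = per_axis * per_axis
    var active = count_active_threads()
    print("total threads:", total)
    print("active threads:", active)
    print("idle threads:", total - active)
```

With the values above, the sketch counts 36 threads in total, 25 of which map to a matrix element, leaving 11 that only the bounds check keeps from touching memory outside the matrix.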
Code to complete
alias SIZE = 5
alias BLOCKS_PER_GRID = (2, 2)
alias THREADS_PER_BLOCK = (3, 3)
alias dtype = DType.float32


fn add_10_blocks_2d(
    out: UnsafePointer[Scalar[dtype]],
    a: UnsafePointer[Scalar[dtype]],
    size: Int,
):
    global_i = block_dim.x * block_idx.x + thread_idx.x
    global_j = block_dim.y * block_idx.y + thread_idx.y
    # FILL ME IN (roughly 2 lines)
View full file: problems/p07/p07.mojo
Tips
- Calculate global indices: global_i = block_dim.x * block_idx.x + thread_idx.x
- Add guard: if global_i < size and global_j < size
- Inside guard: out[global_j * size + global_i] = a[global_j * size + global_i] + 10.0
Running the code
To test your solution, run the following command in your terminal:
magic run p07
Your output will look like this if the puzzle isn’t solved yet:
out: HostBuffer([0.0, 0.0, 0.0, ... , 0.0])
expected: HostBuffer([11.0, 11.0, 11.0, ... , 11.0])
Solution
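fn add_10_blocks_2d(
    out: UnsafePointer[Scalar[dtype]],
    a: UnsafePointer[Scalar[dtype]],
    size: Int,
):
    # Global column (x) and row (y) of this thread across all blocks
    global_i = block_dim.x * block_idx.x + thread_idx.x
    global_j = block_dim.y * block_idx.y + thread_idx.y
    # Threads that fall outside the size x size matrix do nothing
    if global_i < size and global_j < size:
        # Row-major offset: row * width + column
        out[global_j * size + global_i] = a[global_j * size + global_i] + 10.0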
This solution:
- Computes global indices with block_dim * block_idx + thread_idx
- Guards against out-of-bounds access with if global_i < size and global_j < size
- Uses row-major indexing, global_j * size + global_i, to access and update matrix elements
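The three steps above can be replayed on the host as a rough sanity check. The sketch below is not the puzzle's test harness: it replaces the GPU launch with nested loops over blocks and threads, stores the matrix in a List, and assumes the input is filled with 1.0 (inferred from the expected output of 11.0). The names GRID_DIM, BLOCK_DIM, row_major_index, and simulate_add_10 are hypothetical.

```mojo
alias SIZE = 5
alias GRID_DIM = 2   # blocks per axis (hypothetical name)
alias BLOCK_DIM = 3  # threads per block per axis (hypothetical name)


fn row_major_index(row: Int, col: Int, width: Int) -> Int:
    # Row-major: whole rows are stored one after another.
    return row * width + col


fn simulate_add_10(a: List[Float32]) -> List[Float32]:
    # Start from zeros, like the output buffer before the kernel runs.
    var result = List[Float32]()
    for _ in range(SIZE * SIZE):
        result.append(0.0)
    # Each loop iteration plays the role of one GPU thread.
    for block_y in range(GRID_DIM):
        for block_x in range(GRID_DIM):
            for thread_y in range(BLOCK_DIM):
                for thread_x in range(BLOCK_DIM):
                    var global_i = BLOCK_DIM * block_x + thread_x
                    var global_j = BLOCK_DIM * block_y + thread_y
                    # Same guard as the kernel: skip threads outside the matrix.
                    if global_i < SIZE and global_j < SIZE:
                        var idx = row_major_index(global_j, global_i, SIZE)
                        result[idx] = a[idx] + 10.0
    return result^


fn main():
    # Assumption: the input matrix holds 1.0 everywhere.
    var a = List[Float32]()
    for _ in range(SIZE * SIZE):
        a.append(1.0)
    var result = simulate_add_10(a)
    var all_match = True
    for idx in range(SIZE * SIZE):
        if result[idx] != 11.0:
            all_match = False
    print("all elements equal 11.0:", all_match)
```

Removing the guard from this sketch would let the loop compute indices for threads with global_i or global_j equal to 5, which is the out-of-bounds access the kernel's bounds check prevents.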