Key concepts

In this puzzle, you’ll learn about:

  • Broadcasting 1D vectors across different dimensions
  • Using 2D thread indices for broadcast operations
  • Handling boundary conditions in broadcast patterns

The key insight is how to map elements of two 1D vectors onto a 2D output matrix: the thread at (thread_idx.x, thread_idx.y) adds one element of a to one element of b, and any thread that falls outside the matrix must do nothing.

  • Broadcasting: Each element of a is combined with every element of b
  • Thread mapping: One 2D thread block \((3 \times 3)\) covers the \(2 \times 2\) output, so 5 of the 9 threads have no element to write
  • Vector access: a is indexed by thread_idx.x and b is indexed by thread_idx.y
  • Bounds checking: Guard against threads whose indices fall outside the matrix dimensions
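
The mapping described in the list above can be traced on the CPU. The following is a minimal sketch in plain Python, not Mojo GPU code: the nested loops stand in for the 3×3 thread block, broadcast_add_sim is a hypothetical helper name, and the input vectors are an assumption chosen to be consistent with the expected output listed under "Running the code".

```python
SIZE = 2                      # output matrix is SIZE x SIZE
THREADS_PER_BLOCK = (3, 3)    # one block with more threads than output elements


def broadcast_add_sim(a, b, size, block_dim):
    """CPU stand-in for the kernel: visit every (x, y) thread index in the block."""
    out = [0.0] * (size * size)
    idle = []  # threads that the guard turns away
    for local_j in range(block_dim[1]):        # plays the role of thread_idx.y
        for local_i in range(block_dim[0]):    # plays the role of thread_idx.x
            if local_i < size and local_j < size:
                out[local_j * size + local_i] = a[local_i] + b[local_j]
            else:
                idle.append((local_i, local_j))
    return out, idle


# Assumed inputs (not stated on this page): a = b = [0.0, 1.0]
out, idle = broadcast_add_sim([0.0, 1.0], [0.0, 1.0], SIZE, THREADS_PER_BLOCK)
print(out)         # [0.0, 1.0, 1.0, 2.0]
print(len(idle))   # 5
```

On a GPU the nine threads run concurrently rather than in loop order, but each in-bounds thread writes a distinct element, so the order does not affect the result.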

Code to complete

alias SIZE = 2
alias BLOCKS_PER_GRID = 1
alias THREADS_PER_BLOCK = (3, 3)
alias dtype = DType.float32


fn broadcast_add(
    out: UnsafePointer[Scalar[dtype]],
    a: UnsafePointer[Scalar[dtype]],
    b: UnsafePointer[Scalar[dtype]],
    size: Int,
):
    local_i = thread_idx.x
    local_j = thread_idx.y
    # FILL ME IN (roughly 2 lines)


View full file: problems/p05/p05.mojo

Tips
  1. Get 2D indices: local_i = thread_idx.x, local_j = thread_idx.y
  2. Add guard: if local_i < size and local_j < size
  3. Inside guard: out[local_j * size + local_i] = a[local_i] + b[local_j]
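
The expression local_j * size + local_i in tip 3 flattens a 2D position into an offset in the 1D out buffer. A small sketch of that arithmetic in plain Python, assuming the usual row-major reading in which local_j is the row and local_i is the column (flat_index is a hypothetical helper, not part of the puzzle code):

```python
def flat_index(row, col, size):
    """Row-major offset of (row, col) in a size x size matrix stored as a 1D buffer."""
    return row * size + col


size = 2
for row in range(size):
    for col in range(size):
        offset = flat_index(row, col, size)
        # divmod inverts the mapping, so each (row, col) owns exactly one offset
        assert divmod(offset, size) == (row, col)
        print(f"(row={row}, col={col}) -> out[{offset}]")
```

Because the mapping is invertible for in-range indices, no two in-bounds threads write to the same element.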

Running the code

To test your solution, run the following command in your terminal:

magic run p05

Your output will look like this if the puzzle isn’t solved yet:

out: HostBuffer([0.0, 0.0, 0.0, 0.0])
expected: HostBuffer([0.0, 1.0, 1.0, 2.0])

Solution

fn broadcast_add(
    out: UnsafePointer[Scalar[dtype]],
    a: UnsafePointer[Scalar[dtype]],
    b: UnsafePointer[Scalar[dtype]],
    size: Int,
):
    local_i = thread_idx.x
    local_j = thread_idx.y
    if local_i < size and local_j < size:
        out[local_j * size + local_i] = a[local_i] + b[local_j]


This solution:

  • Gets the 2D thread indices with local_i = thread_idx.x and local_j = thread_idx.y
  • Guards against out-of-bounds threads with if local_i < size and local_j < size
  • Broadcasts by writing a[local_i] + b[local_j] to out[local_j * size + local_i]
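
To see why the guard checks both indices rather than only the flattened offset, it may help to list where the five extra threads would write if the guard were missing. This is a sketch of the index arithmetic only, in plain Python; it makes no claim about what a particular GPU does with such accesses.

```python
SIZE = 2
BLOCK_DIM = (3, 3)
BUFFER_LEN = SIZE * SIZE

aliased = []    # unguarded offset lands on a slot owned by an in-bounds thread
past_end = []   # unguarded offset lies beyond the last element of out

for local_j in range(BLOCK_DIM[1]):
    for local_i in range(BLOCK_DIM[0]):
        if local_i < SIZE and local_j < SIZE:
            continue  # in-bounds thread: the guard lets it through
        offset = local_j * SIZE + local_i
        if offset < BUFFER_LEN:
            aliased.append(((local_i, local_j), offset))
        else:
            past_end.append(((local_i, local_j), offset))

print("aliased:", aliased)     # [((2, 0), 2)]
print("past end:", past_end)   # [((2, 1), 4), ((0, 2), 4), ((1, 2), 5), ((2, 2), 6)]
```

Thread (2, 0) is the subtle case: its offset 2 is inside the four-element buffer, so a check on the offset alone would let it overwrite the slot that belongs to thread (0, 1), and it would also read a[2], which does not exist.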