# Introduction to LayoutTensor
Let's take a quick break from solving puzzles to preview a powerful abstraction that will make our GPU programming journey more enjoyable: 🥁 … the LayoutTensor.
💡 This is a motivational overview of LayoutTensor's capabilities. Don't worry about understanding everything now; we'll explore each feature in depth as we progress through the puzzles.
## The challenge: Growing complexity
Let's look at the challenges we've faced so far:
```mojo
# Puzzle 1: Simple indexing
output[i] = a[i] + 10.0

# Puzzle 2: Multiple array management
output[i] = a[i] + b[i]

# Puzzle 3: Bounds checking
if i < size:
    output[i] = a[i] + 10.0
```
As dimensions grow, code becomes more complex:
```mojo
# Traditional manual indexing for a row-major 2D matrix
idx = row * WIDTH + col
if row < HEIGHT and col < WIDTH:
    output[idx] = a[idx] + 10.0
```
## The solution: A peek at LayoutTensor
LayoutTensor will help us tackle these challenges with elegant solutions. Here's a glimpse of what's coming:
- Natural Indexing: Use `tensor[i, j]` instead of manual offset calculations (see the sketch below)
- Flexible Memory Layouts: Support for row-major, column-major, and tiled organizations
- Performance Optimization: Efficient memory access patterns for the GPU
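To make the contrast concrete, here is a minimal sketch of the same guarded element-wise operation written both ways. The kernel names (`add_10_raw`, `add_10_tensor`) and the thread-to-element mapping are illustrative assumptions, not code from a specific puzzle; the aliases match the example later in this section:

```mojo
from gpu import thread_idx
from layout import Layout, LayoutTensor
from memory import UnsafePointer

alias dtype = DType.float32
alias HEIGHT = 2
alias WIDTH = 3
alias layout = Layout.row_major(HEIGHT, WIDTH)


# Manual style: flatten the 2D index by hand, as in the puzzles so far
fn add_10_raw(
    output: UnsafePointer[Scalar[dtype]], a: UnsafePointer[Scalar[dtype]]
):
    row = thread_idx.y
    col = thread_idx.x
    if row < HEIGHT and col < WIDTH:
        output[row * WIDTH + col] = a[row * WIDTH + col] + 10.0


# LayoutTensor style: the layout carries the offset math for us
fn add_10_tensor(
    output: LayoutTensor[dtype, layout, MutAnyOrigin],
    a: LayoutTensor[dtype, layout, MutAnyOrigin],
):
    row = thread_idx.y
    col = thread_idx.x
    if row < HEIGHT and col < WIDTH:
        output[row, col] = a[row, col] + 10.0
```

Both kernels do identical work; the second simply moves the index arithmetic into the layout, which is what later lets us swap in column-major or tiled layouts without rewriting kernel bodies.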
## A taste of what's ahead
Let's look at a few examples of what LayoutTensor can do. Don't worry about understanding all the details now; we'll cover each feature thoroughly in upcoming puzzles.
### Basic usage example
```mojo
from layout import Layout, LayoutTensor

# Define layout
alias HEIGHT = 2
alias WIDTH = 3
alias layout = Layout.row_major(HEIGHT, WIDTH)

# Create tensor
tensor = LayoutTensor[dtype, layout](buffer.unsafe_ptr())

# Access elements naturally
tensor[0, 0] = 1.0  # First element
tensor[1, 2] = 2.0  # Last element
```
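Concretely, the layout is what turns `[i, j]` coordinates into memory offsets. As a rough sketch (the exact printed form of a layout may vary across Mojo versions): `Layout.row_major(2, 3)` has strides `(3, 1)`, so element `(i, j)` lives at offset `i * 3 + j`, while `Layout.col_major(2, 3)` has strides `(1, 2)`:

```mojo
from layout import Layout


def main():
    alias row_major = Layout.row_major(2, 3)
    alias col_major = Layout.col_major(2, 3)
    # A layout prints as (shape):(stride); the strides encode the offset of
    # element (i, j): row-major -> i * 3 + j, column-major -> i + j * 2
    print(row_major)  # e.g. ((2, 3):(3, 1))
    print(col_major)  # e.g. ((2, 3):(1, 2))
```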
To learn more about `Layout` and `LayoutTensor`, see the dedicated guides in the Mojo manual.
### Quick example
Let's put everything together with a simple example that demonstrates the basics of LayoutTensor:
```mojo
from gpu.host import DeviceContext
from layout import Layout, LayoutTensor

alias HEIGHT = 2
alias WIDTH = 3
alias dtype = DType.float32
alias layout = Layout.row_major(HEIGHT, WIDTH)


fn kernel[
    dtype: DType, layout: Layout
](tensor: LayoutTensor[dtype, layout, MutAnyOrigin]):
    print("Before:")
    print(tensor)
    tensor[0, 0] += 1
    print("After:")
    print(tensor)


def main():
    ctx = DeviceContext()
    a = ctx.enqueue_create_buffer[dtype](HEIGHT * WIDTH).enqueue_fill(0)
    tensor = LayoutTensor[dtype, layout, MutAnyOrigin](a)
    # Note: since `tensor` is a device tensor we can't print it without the kernel wrapper
    ctx.enqueue_function_checked[kernel[dtype, layout], kernel[dtype, layout]](
        tensor, grid_dim=1, block_dim=1
    )
    ctx.synchronize()
```
When we run this code with one of the following commands:

```bash
pixi run layout_tensor_intro           # NVIDIA GPU (default environment)
pixi run -e amd layout_tensor_intro    # AMD GPU
pixi run -e apple layout_tensor_intro  # Apple GPU
uv run poe layout_tensor_intro
```

we see the following output:
```txt
Before:
0.0 0.0 0.0
0.0 0.0 0.0
After:
1.0 0.0 0.0
0.0 0.0 0.0
```
Let's break down what's happening:
- We create a `2 x 3` tensor with row-major layout
- Initially, all elements are zero
- Using natural indexing, we modify a single element
- The change is reflected in our output (and can be checked from the host, as sketched below)
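As a hedged aside, if you want to confirm the change from the host instead of printing inside the kernel, one option is to map the device buffer back to host memory at the end of `main()`. This snippet is an assumed extension of the example above, not part of it:

```mojo
    # Hypothetical addition at the end of `main()` above: map the device
    # buffer back to host memory and spot-check the element the kernel changed.
    with a.map_to_host() as host:
        print(host[0])  # element (0, 0) in row-major order: now 1.0
        print(host[1])  # element (0, 1): still 0.0
```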
This simple example demonstrates key LayoutTensor benefits:
- Clean syntax for tensor creation and access
- Automatic memory layout handling
- Natural multi-dimensional indexing
While this example is straightforward, the same patterns will scale to complex GPU operations in upcoming puzzles. You'll see how these basic concepts extend to:
- Multi-threaded GPU operations
- Shared memory optimizations
- Complex tiling strategies
- Hardware-accelerated computations
Ready to start your GPU programming journey with LayoutTensor? Let's dive into the puzzles!
💡 Tip: Keep this example in mind as we progress; we'll build upon these fundamental concepts to create increasingly sophisticated GPU programs.