Introduction to TileTensor
Let's take a quick break from solving puzzles to preview a powerful abstraction that will make our GPU programming journey more enjoyable: the TileTensor.
💡 This is a motivational overview of TileTensor's capabilities. Don't worry about understanding everything now - we'll explore each feature in depth as we progress through the puzzles.
The challenge: Growing complexity
Let's look at the challenges we've faced so far:
# Puzzle 1: Simple indexing
output[i] = a[i] + 10.0
# Puzzle 2: Multiple array management
output[i] = a[i] + b[i]
# Puzzle 3: Bounds checking
if i < size:
    output[i] = a[i] + 10.0
As dimensions grow, code becomes more complex:
# Traditional indexing for a row-major 2D matrix
idx = row * WIDTH + col
if row < height and col < width:
    output[idx] = a[idx] + 10.0
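To make the offset arithmetic concrete, here is a short Python sketch (an illustrative analogy, not TileTensor itself) of the traditional style: a flat buffer standing in for GPU memory, with the row-major offset and bounds check computed by hand.

```python
# Illustrative Python sketch of manual row-major indexing (not TileTensor).
HEIGHT, WIDTH = 2, 3

# A flat buffer standing in for GPU memory, holding HEIGHT * WIDTH elements.
buffer = [0.0] * (HEIGHT * WIDTH)

def store_manual(buf, row, col, value):
    """Traditional style: compute the flat offset by hand, with bounds checks."""
    if row < HEIGHT and col < WIDTH:
        buf[row * WIDTH + col] = value

# Write to the last element: offset = 1 * 3 + 2 = 5.
store_manual(buffer, 1, 2, 42.0)
print(buffer)
```

Every 2D access requires repeating this offset-and-bounds boilerplate, which is exactly the bookkeeping TileTensor is designed to hide.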
The solution: A peek at TileTensor
TileTensor will help us tackle these challenges with elegant solutions. Here's a glimpse of what's coming:
- Natural Indexing: Use tensor[i, j] instead of manual offset calculations
- Flexible Memory Layouts: Support for row-major, column-major, and tiled organizations
- Performance Optimization: Efficient memory access patterns for GPU
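The "flexible memory layouts" point boils down to how a layout maps a logical (row, col) coordinate to a flat memory offset. A small Python sketch (hypothetical helper names, for illustration only) shows the two most common mappings:

```python
# Hypothetical sketch of the two most common layouts for a HEIGHT x WIDTH matrix.
HEIGHT, WIDTH = 2, 3

def row_major_offset(row, col):
    # Rows are contiguous in memory: element (r, c) lives at r * WIDTH + c.
    return row * WIDTH + col

def col_major_offset(row, col):
    # Columns are contiguous in memory: element (r, c) lives at c * HEIGHT + r.
    return col * HEIGHT + row

# The same logical element lands at different flat offsets in each layout:
print(row_major_offset(0, 1))  # 1
print(col_major_offset(0, 1))  # 2
print(row_major_offset(1, 0))  # 3
print(col_major_offset(1, 0))  # 1
```

With a layout abstraction, kernel code keeps writing `tensor[i, j]` while the layout alone decides which mapping is used, so the access pattern can be tuned for the GPU without touching the kernel logic.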
A taste of what's ahead
Let's look at a few examples of what TileTensor can do. Don't worry about understanding all the details now - we'll cover each feature thoroughly in upcoming puzzles.
Basic usage example
from layout import TileTensor
from layout.tile_layout import row_major
# Define layout
comptime HEIGHT = 2
comptime WIDTH = 3
comptime layout = row_major[HEIGHT, WIDTH]()
comptime LayoutType = type_of(layout)
# Create tensor
tensor = TileTensor(buffer, layout)
# Access elements naturally
tensor[0, 0] = 1.0 # First element
tensor[1, 2] = 2.0 # Last element
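Under the hood, natural indexing like this is just offset arithmetic wrapped over a flat buffer. A minimal Python mock (a hypothetical toy, not the real TileTensor) makes the idea concrete:

```python
class MockTileTensor:
    """Toy stand-in for TileTensor: tuple indexing over a flat row-major buffer."""

    def __init__(self, height, width):
        self.height, self.width = height, width
        self.buffer = [0.0] * (height * width)

    def _offset(self, row, col):
        return row * self.width + col  # row-major layout

    def __getitem__(self, idx):
        row, col = idx
        return self.buffer[self._offset(row, col)]

    def __setitem__(self, idx, value):
        row, col = idx
        self.buffer[self._offset(row, col)] = value

t = MockTileTensor(2, 3)
t[0, 0] = 1.0  # first element -> buffer offset 0
t[1, 2] = 2.0  # last element  -> buffer offset 5
print(t.buffer)  # [1.0, 0.0, 0.0, 0.0, 0.0, 2.0]
```

The real TileTensor does this mapping at compile time via its layout parameter, but the mental model is the same: natural `[row, col]` access over flat memory.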
To learn more about Layout and TileTensor, see the relevant guides in the Mojo manual.
Quick example
Let's put everything together with a simple example that demonstrates the basics of TileTensor:
# ===----------------------------------------------------------------------=== #
#
# This file is Modular Inc proprietary.
#
# ===----------------------------------------------------------------------=== #
from std.gpu.host import DeviceContext
from layout import TileTensor
from layout.tile_layout import row_major
comptime HEIGHT = 2
comptime WIDTH = 3
comptime dtype = DType.float32
comptime layout = row_major[HEIGHT, WIDTH]()
comptime LayoutType = type_of(layout)
def kernel(
tensor: TileTensor[mut=True, dtype, LayoutType, MutAnyOrigin],
):
print("Before:")
print(tensor)
tensor[0, 0] += 1
print("After:")
print(tensor)
def main() raises:
ctx = DeviceContext()
a = ctx.enqueue_create_buffer[dtype](HEIGHT * WIDTH)
a.enqueue_fill(0)
tensor = TileTensor(a, layout)
# Note: since `tensor` is a device tensor we can't print it without the kernel wrapper
ctx.enqueue_function[kernel](tensor, grid_dim=1, block_dim=1)
ctx.synchronize()
When we run this code with one of the following commands (pick the one that matches your environment and tooling):
pixi run tile_tensor_intro
pixi run -e amd tile_tensor_intro
pixi run -e apple tile_tensor_intro
uv run poe tile_tensor_intro
we see:
Before:
0.0 0.0 0.0
0.0 0.0 0.0
After:
1.0 0.0 0.0
0.0 0.0 0.0
Let's break down what's happening:
- We create a 2 x 3 tensor with row-major layout
- Initially, all elements are zero
- Using natural indexing, we modify a single element
- The change is reflected in our output
This simple example demonstrates key TileTensor benefits:
- Clean syntax for tensor creation and access
- Automatic memory layout handling
- Natural multi-dimensional indexing
While this example is straightforward, the same patterns will scale to complex GPU operations in upcoming puzzles. You'll see how these basic concepts extend to:
- Multi-threaded GPU operations
- Shared memory optimizations
- Complex tiling strategies
- Hardware-accelerated computations
Ready to start your GPU programming journey with TileTensor? Let's dive into the puzzles!
💡 Tip: Keep this example in mind as we progress - we'll build upon these fundamental concepts to create increasingly sophisticated GPU programs.