Puzzle 8: Shared Memory
Overview
Implement a kernel that adds 10 to each position of a and stores it in out.
Note: You have fewer threads per block than the size of a.
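To make the indexing concrete, here is a minimal CPU-side sketch in Python of how such a kernel might behave. It is not the puzzle's solution and does not use any GPU API: the function name add_10_shared, the threads_per_block parameter, and the list named shared are all hypothetical, chosen only for illustration. Each block is assumed to copy its slice of a into a block-local buffer, wait until every thread in the block has finished loading, and then write the value plus 10 into out.

```python
def add_10_shared(a, threads_per_block):
    """Model a block-wise 'add 10' kernel on the CPU (illustrative only)."""
    size = len(a)
    # Enough blocks to cover every element, since one block is too small.
    num_blocks = (size + threads_per_block - 1) // threads_per_block
    out = [0] * size

    for block_idx in range(num_blocks):
        # Stand-in for a per-block shared memory buffer.
        shared = [0] * threads_per_block

        # Phase 1: each "thread" loads one element into the shared buffer.
        for local_i in range(threads_per_block):
            global_i = block_idx * threads_per_block + local_i
            if global_i < size:  # guard against out-of-bounds threads
                shared[local_i] = a[global_i]

        # The end of the loop above plays the role of a barrier:
        # every load in this block has completed before any read below.

        # Phase 2: each "thread" reads from the shared buffer and writes out.
        for local_i in range(threads_per_block):
            global_i = block_idx * threads_per_block + local_i
            if global_i < size:
                out[global_i] = shared[local_i] + 10

    return out


result = add_10_shared([1, 1, 1, 1, 1, 1, 1, 1], threads_per_block=4)
```

On a real GPU the two phases run concurrently across threads, so an explicit barrier is needed between them; the sequential loops here only imitate that ordering.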
Implementation approaches
🔰 Raw memory approach
Learn how to manually manage shared memory and synchronization.
📐 LayoutTensor Version
Use LayoutTensor’s built-in shared memory management features.
💡 Note: Experience how LayoutTensor simplifies shared memory operations while maintaining performance.