Puzzle 10: Dot Product

Overview

Implement a kernel that computes the dot product of vectors a and b and stores the result in out. The dot product is the sum of the element-wise products, so out holds a single value.
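As a point of comparison for the kernel's output, here is a minimal CPU reference in plain Python. It is not the kernel itself. The function name `dot_reference` and the sample inputs are hypothetical, chosen only for illustration.

```python
def dot_reference(a, b):
    """CPU reference: sum of element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    return sum(x * y for x, y in zip(a, b))


# Hypothetical sample inputs, chosen only for illustration.
a = [0, 1, 2, 3]
b = [0, 1, 2, 3]

# out holds a single scalar: 0*0 + 1*1 + 2*2 + 3*3 = 14
out = [dot_reference(a, b)]
```

A reference like this is handy for checking a kernel's result on small inputs before moving to larger sizes.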

Note: You have 1 thread per position. You only need 2 global reads and 1 global write per thread.
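One way to read this budget, offered as an assumption rather than the intended solution, is that each thread reads its own a[i] and b[i], stages the product in block-shared memory, and a single thread then writes the total. The sketch below models that on the CPU in plain Python and counts global accesses per simulated thread. `simulate_dot`, `shared`, `reads`, and `writes` are hypothetical names, and the two loops stand in for code before and after a barrier.

```python
def simulate_dot(a, b):
    """CPU model of one block with one simulated thread per position."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    n = len(a)
    shared = [0] * n   # stands in for block-shared memory (assumption)
    out = [0]          # the single global output slot
    reads = [0] * n    # global reads issued by each simulated thread
    writes = [0] * n   # global writes issued by each simulated thread

    # Phase 1: thread i reads a[i] and b[i] once each (2 global reads).
    for i in range(n):
        x = a[i]
        reads[i] += 1
        y = b[i]
        reads[i] += 1
        shared[i] = x * y  # shared-memory write, not a global write

    # A real kernel would need a barrier here so all products are visible.

    # Phase 2: thread 0 alone sums the staged products and writes out.
    if n > 0:
        total = 0
        for j in range(n):
            total += shared[j]
        out[0] = total
        writes[0] += 1  # the only global write in the block

    return out, reads, writes


result, reads, writes = simulate_dot([0, 1, 2, 3], [0, 1, 2, 3])
```

In this model no thread exceeds 2 global reads or 1 global write. The serial sum in thread 0 is the simplest choice, not the fastest; a tree-style reduction over shared memory is a common alternative.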

[Figure: dot product visualization]