Puzzle 10: Dot Product
Overview
Implement a kernel that computes the dot product of vector a and vector b and stores the result in out.
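For reference, the value the kernel must produce is the ordinary sum of element-wise products. The small input below is illustrative only and is not taken from the puzzle's tests:

```latex
\text{out} = \sum_{i=0}^{n-1} a_i \, b_i,
\qquad
a = b = (0, 1, 2, 3) \;\Rightarrow\; \text{out} = 0 + 1 + 4 + 9 = 14
```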
Note: You have 1 thread per position. You only need 2 global reads and 1 global write per thread.
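One way to stay within that read/write budget is a two-phase pattern using shared memory. The sketch below is a plain-Python CPU simulation, not GPU code. It assumes a single block whose size equals the input length, and the function name dot_kernel_sim is hypothetical. Each simulated thread does its two global reads and stores the product in a shared array. After a barrier, only thread 0 sums the shared values and performs the single global write, so every thread uses 2 global reads and at most 1 global write.

```python
# CPU simulation (not real GPU code) of a shared-memory dot product.
# Assumptions: one block, one simulated "thread" per position, and a
# shared array as long as the inputs. dot_kernel_sim is a hypothetical name.

def dot_kernel_sim(a, b):
    size = len(a)
    assert len(b) == size, "a and b must have the same length"
    out = [0]                # global output buffer holding one value
    shared = [0] * size      # per-block shared memory

    # Phase 1: thread i makes its 2 global reads (a[i], b[i])
    # and writes the product into its own shared-memory slot.
    for i in range(size):
        shared[i] = a[i] * b[i]

    # Barrier: on a real GPU every thread must reach a block-wide sync
    # here before any thread reads another thread's shared slot.

    # Phase 2: only thread 0 sums the shared products and makes the
    # single global write; the other threads write nothing.
    out[0] = sum(shared)
    return out[0]


print(dot_kernel_sim([0, 1, 2, 3], [0, 1, 2, 3]))  # 14
```

Letting one thread do the final sum keeps the sketch simple but makes that step serial; a parallel tree reduction over the shared array is the usual refinement.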