Jul 9, 2024
vecAdd function adds two vectors and stores the result in a third array
- include cuda.h and cuda_runtime.h
- cudaMalloc for memory on GPU
- cudaMemcpy from host (CPU) to device (GPU)
- cudaGetLastError to check for errors
- cudaMemcpy from device to host
- cudaFree to free device memory

cudaMalloc allocates memory on the GPU
- cast of the pointer argument to (void **) required
- returns cudaSuccess on success
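The allocation notes above can be sketched roughly as follows. This is a minimal illustration, not code from the original notes: the helper name allocDeviceFloats and the choice to return nullptr on failure are assumptions made for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the notes): allocate room for n floats
// on the device. Returns nullptr if cudaMalloc does not report cudaSuccess.
float *allocDeviceFloats(size_t n) {
    float *d_ptr = nullptr;
    // cudaMalloc writes the device address through a void ** argument,
    // hence the cast on the address of the typed pointer.
    cudaError_t err = cudaMalloc((void **)&d_ptr, n * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return nullptr;
    }
    return d_ptr;
}
```

A pointer obtained this way is a device address: it should be passed to cudaMemcpy, kernels, and eventually cudaFree, and not dereferenced on the host.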

cudaMemcpy transfers data between host and device
- direction: cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, cudaMemcpyDeviceToDevice
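As a rough sketch of the two most common directions, the example below copies a host buffer to the device and straight back. The helper name roundTrip and the use of std::vector are assumptions for illustration only.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper (not from the notes): copy host data to the device
// and back again. Returns true only if every CUDA call reports cudaSuccess.
bool roundTrip(const std::vector<float> &src, std::vector<float> &dst) {
    size_t bytes = src.size() * sizeof(float);
    dst.resize(src.size());

    float *d_buf = nullptr;
    if (cudaMalloc((void **)&d_buf, bytes) != cudaSuccess) return false;

    // Host (CPU) to device (GPU): destination first, then source.
    cudaError_t err = cudaMemcpy(d_buf, src.data(), bytes, cudaMemcpyHostToDevice);
    // Device back to host; skipped if the first copy failed.
    if (err == cudaSuccess)
        err = cudaMemcpy(dst.data(), d_buf, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_buf);
    return err == cudaSuccess;
}
```

The argument order is destination, source, byte count, direction; the count is in bytes, not elements.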

kernel<<<blocksPerGrid, threadsPerBlock>>>(args)
- blocksPerGrid: Number of blocks in the grid
- threadsPerBlock: Number of threads in each block (max 1024)
- n threads: n/256 blocks rounded up, i.e. (n + 255) / 256, with 256 threads per block

vectorAdd Function
- C[i] = A[i] + B[i]
- cudaGetLastError after the launch to check for errors
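Putting the pieces together, a minimal sketch of the whole flow might look like the following. The float element type, the bounds check inside the kernel, the chained error handling, and the assumption that n is greater than 0 are choices made for this example; the notes only name the vectorAdd kernel, the vecAdd host function, and the API calls.

```cuda
#include <cuda_runtime.h>

// Kernel: each thread computes one element of C.
__global__ void vectorAdd(const float *A, const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may contain threads past the end
        C[i] = A[i] + B[i];
    }
}

// Host wrapper: A, B and C are host arrays of n floats (assumes n > 0).
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t vecAdd(const float *A, const float *B, float *C, int n) {
    size_t bytes = (size_t)n * sizeof(float);
    float *d_A = nullptr, *d_B = nullptr, *d_C = nullptr;

    cudaError_t err = cudaMalloc((void **)&d_A, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_B, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_C, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_A, A, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_B, B, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        int threadsPerBlock = 256;
        // Round up so that every element gets a thread.
        int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
        vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, n);
        err = cudaGetLastError();  // reports launch errors
    }
    if (err == cudaSuccess) err = cudaMemcpy(C, d_C, bytes, cudaMemcpyDeviceToHost);

    // cudaFree on a null pointer performs no operation.
    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);
    return err;
}
```

With n = 1000 this launches 4 blocks of 256 threads, so 24 threads in the last block fail the i < n test and do nothing.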