Jul 9, 2024
vecAdd function adds two vectors and stores the result in a third array:
- cuda.h and cuda_runtime.h headers
- cudaMalloc for memory on GPU
- cudaMemcpy from host (CPU) to device (GPU)
- cudaGetLastError to check for errors
- cudaMemcpy from device to host
- cudaFree to free device memory

cudaMalloc allocates memory in GPU
- Address of the device pointer (cast to (void **)) required
- Returns cudaSuccess on success

cudaMemcpy transfers data between host and device
- Direction: cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, cudaMemcpyDeviceToDevice

kernel<<<blocksPerGrid, threadsPerBlock>>>(args)
- blocksPerGrid: Number of blocks in the grid
- threadsPerBlock: Number of threads in each block (max 1024)
- n threads: ceil(n/256) blocks, 256 threads per block; with integer division this is (n + 255) / 256, so the last block may be only partly used

vectorAdd Function
- C[i] = A[i] + B[i], one element per thread, guarded by i < n
- cudaGetLastError after the launch to catch launch errors
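
The steps in these notes can be put together as a minimal sketch. The names vecAdd and vectorAdd come from the notes; the float element type, the h_/d_ pointer prefixes, the parameter lists, and the choice to return a cudaError_t are assumptions made for illustration, not necessarily how the original code was written.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

// Kernel: each thread computes one element, C[i] = A[i] + B[i].
__global__ void vectorAdd(const float *A, const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may contain threads past the end of the arrays
        C[i] = A[i] + B[i];
    }
}

// Host wrapper (assumed signature): adds h_A and h_B, both of length n,
// into h_C. Returns cudaSuccess, or the first error encountered.
cudaError_t vecAdd(const float *h_A, const float *h_B, float *h_C, int n) {
    if (n <= 0) {
        return cudaSuccess;  // nothing to do; a grid of 0 blocks is not a valid launch
    }

    size_t bytes = (size_t)n * sizeof(float);
    float *d_A = NULL, *d_B = NULL, *d_C = NULL;

    // Allocate device memory; cudaMalloc takes the address of the pointer.
    cudaError_t err = cudaMalloc((void **)&d_A, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_B, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_C, bytes);

    // Copy the inputs from host (CPU) to device (GPU).
    if (err == cudaSuccess) err = cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_B, h_B, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        int threadsPerBlock = 256;
        // Round up so that every element gets a thread: ceil(n / 256).
        int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
        vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, n);
        err = cudaGetLastError();  // reports launch errors such as a bad configuration
    }

    // Copy the result back; this cudaMemcpy waits for the kernel to finish.
    if (err == cudaSuccess) err = cudaMemcpy(h_C, d_C, bytes, cudaMemcpyDeviceToHost);

    // Free device memory; cudaFree on a NULL pointer does nothing.
    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);
    return err;
}
```

A caller passes three host arrays of the same length and checks the return value against cudaSuccess; cudaGetErrorString turns any other value into a readable message. Returning the first error instead of exiting is a design choice for this sketch, so the device buffers are always freed on the way out.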