COM4521/COM6521: Lab 4 FAQ

Lab 4 - FAQ

These are some common questions that come up during this lab.

Unsupported PTX Toolchain

For an unknown reason, the default CUDA project (within Visual Studio) is not currently compatible with the managed desktops.

This can be resolved by targeting CUDA compilation to the architecture of the managed desktops’ GPU.

Open: Project Properties -> CUDA C/C++ -> Device

Update Code Generation to compute_61,sm_61

If you are affected on your personal machine you may need to change this to a different value based on your GPU, refer to this week’s lecture notes (slide 44).

#Intellisense Errors (Red Lines Under Code) Visual Studio’s Intellisense (static analysis tool), which suggests when your code is incorrect prior to compilation, does not understand CUDA. Hence, it regularly complains about components of CUDA (for example the <<< notation that is used to launch kernels).

You should ignore Intellisense errors related to CUDA syntax, and only pay attention to compilation errors and warnings.

The “Error List” panel within Visual Studio has a drop down set to Build + Intellisense, this can be changed to Build to prevent Intellisense errors from being listed alongside compilation warnings and errors. It’s not possible to disable the red lines.

Exercise 2.3: Enable CUDA Memory Checker

The lab sheet was out of date, and has now been updated.

NSight debugger no longer has a built in memory checker, instead the standalone command line tool Compute Sanitizer (which installs alongside CUDA) must be used.

The assignment handout code comes with a .bat, that can be downloaded here, which helps with using Compute Sanitizer.

Call the below command, in a command window (outside of visual studio), specifying the path to your application and any required arguments.

cudamemchk.bat "<path to executable>" <args>

For example:

cudamemchk.bat "x64\Debug\Lab04_Exercise01.exe"

If you are working on a personal machine with a different version of CUDA you may need to update the path to Compute Sanitizer within cudamemchk.bat.

This should produce a log similar to that below, explaining the first memory error detected.

========= COMPUTE-SANITIZER
========= Invalid __global__ write of size 4 bytes
=========     at 0x3c8 in U:/com4521/Lab04_Exercise01/exercise01_sln.cu:42:affine_decrypt_multiblock(int*,int*)
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x4 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:GlobalInitializer [0x7ff8b9c080d4]
=========                in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\compute-sanitizer\sanitizer-collection.dll
=========     Host Frame:GlobalInitializer [0x7ff8b9bc3fc7]
=========                in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\compute-sanitizer\sanitizer-collection.dll
=========     Host Frame: [0x7ff8b9bac9b5]
=========                in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\compute-sanitizer\sanitizer-collection.dll
=========     Host Frame: [0x7ff8b9bb39c8]
=========                in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\compute-sanitizer\sanitizer-collection.dll
=========     Host Frame: [0x7ff8d6af79e0]
=========                in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\compute-sanitizer\sanitizer-public.dll
=========     Host Frame:sanitizerUnsubscribe [0x7ff8d6afe1f1]
=========                in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\compute-sanitizer\sanitizer-public.dll
=========     Host Frame: [0x7ff8b6fccde5]
=========                in C:\Windows\system32\DriverStore\FileRepository\nv_dispui.inf_amd64_bf0f2e72ecc4a6c6\nvcuda64.dll
=========     Host Frame:cuProfilerStop [0x7ff8b7162342]
=========                in C:\Windows\system32\DriverStore\FileRepository\nv_dispui.inf_amd64_bf0f2e72ecc4a6c6\nvcuda64.dll
=========     Host Frame: [0x7ff8b6e982fd]
=========                in C:\Windows\system32\DriverStore\FileRepository\nv_dispui.inf_amd64_bf0f2e72ecc4a6c6\nvcuda64.dll
=========     Host Frame: [0x7ff8b6e987ec]
=========                in C:\Windows\system32\DriverStore\FileRepository\nv_dispui.inf_amd64_bf0f2e72ecc4a6c6\nvcuda64.dll
=========     Host Frame: [0x7ff8b6e98ac4]
=========                in C:\Windows\system32\DriverStore\FileRepository\nv_dispui.inf_amd64_bf0f2e72ecc4a6c6\nvcuda64.dll
=========     Host Frame:cuLaunchKernel [0x7ff8b7060e14]
=========                in C:\Windows\system32\DriverStore\FileRepository\nv_dispui.inf_amd64_bf0f2e72ecc4a6c6\nvcuda64.dll
=========     Host Frame: [0x7ff8d6b8a5fa]
=========                in U:\com4521\x64\Debug\cudart64_110.dll
=========     Host Frame: [0x7ff8d6b8a4a6]
=========                in U:\com4521\x64\Debug\cudart64_110.dll
=========     Host Frame:cudaLaunchKernel [0x7ff8d6bad1c4]
=========                in U:\com4521\x64\Debug\cudart64_110.dll
=========     Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\include\cuda_runtime.h:211:cudaLaunchKernel<char> [0x7ff769fe221f]
=========                in U:\com4521\x64\Debug\Lab04_Exercise01.exe
=========     Host Frame:C:\Users\ac1rch\AppData\Local\Temp\tmpxft_000040a0_00000000-7_exercise01_sln.cudafe1.stub.c:9:__device_stub__Z25affine_decrypt_multiblockPiS_ [0x7ff769fe200d]
=========                in U:\com4521\x64\Debug\Lab04_Exercise01.exe
=========     Host Frame:C:\Users\ac1rch\AppData\Local\Temp\tmpxft_000040a0_00000000-7_exercise01_sln.cudafe1.stub.c:12:affine_decrypt_multiblock [0x7ff769fe1992]
=========                in U:\com4521\x64\Debug\Lab04_Exercise01.exe
=========     Host Frame:U:\com4521\Lab04_Exercise01\exercise01_sln.cu:78:main [0x7ff769fe1af0]
=========                in U:\com4521\x64\Debug\Lab04_Exercise01.exe
=========     Host Frame:D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:79:invoke_main [0x7ff769fe4879]
=========                in U:\com4521\x64\Debug\Lab04_Exercise01.exe
=========     Host Frame:D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288:__scrt_common_main_seh [0x7ff769fe475e]
=========                in U:\com4521\x64\Debug\Lab04_Exercise01.exe
=========     Host Frame:D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:331:__scrt_common_main [0x7ff769fe461e]
=========                in U:\com4521\x64\Debug\Lab04_Exercise01.exe
=========     Host Frame:D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp:17:mainCRTStartup [0x7ff769fe4909]
=========                in U:\com4521\x64\Debug\Lab04_Exercise01.exe
=========     Host Frame:BaseThreadInitThunk [0x7ff901ec7c24]
=========                in C:\Windows\System32\KERNEL32.DLL
=========     Host Frame:RtlUserThreadStart [0x7ff902eed721]
=========                in C:\Windows\SYSTEM32\ntdll.dll
========= 
========= Target application returned an error
========= ERROR SUMMARY: 1026 errors

Only the first few lines are really important

========= Invalid __global__ write of size 4 bytes

There was an invalid read from global device memory (something allocated with cudaMalloc() or at compile time using __device__).

=========     at 0x3c8 in U:/com4521/Lab04_Exercise01/exercise01_sln.cu:42:affine_decrypt_multiblock(int*,int*)

It occurred at line 42 of exercise01_sln.cu in the kernel/function with prototype affine_decrypt_multiblock(int*,int*)

=========     by thread (0,0,0) in block (0,0,0)

It first occurred in thread 0 of block 0. (Without requesting 1 error, this whole error stack would be repeated for every thread with an error, possibly thousands of times).

=========     Address 0x4 is out of bounds

The bad address in this case was 0x4 (e.g. I forced an error by writing int *test=0; test[1]=12;).

COM4521/COM6521: Lab 4 FAQ

Lab 4 - FAQ

Unsupported PTX Toolchain

Exercise 2.3: Enable CUDA Memory Checker

Contact Us