I realize the devtalk.nvidia forums might be a more appropriate place for my question, but Linux questions seem to get more support here on the Khronos forums.
I have followed vulkan-tutorial.com all the way up to Rendering and presentation. The equivalent to my code would be:
https://vulkan-tutorial.com/code/hello_triangle.cpp
My code is available on GitHub at davidhubbard/v0lum3 (GPL3 Vulkan voxel library). It should compile on Windows, but I have not tested that: I have not yet created a project file or linked the external dependencies there. To build on Linux, clone the repo and run “build.sh” then “make”. “build.sh” will ask you to type “export PKG_CONFIG_PATH=$PWD/vendor/lib/pkgconfig” in your shell.
The issue I am seeing is that if the main loop runs for 665 iterations, vkDeviceWaitIdle hangs for a while but exits cleanly. (I bisected the number of iterations by changing the test if (count > 1000) break; in main.cpp.)
If the main loop runs for 666 iterations, vkDeviceWaitIdle fails with VK_ERROR_DEVICE_LOST.
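For reference, the bisection I did by hand amounts to something like the sketch below. The names first_failing and fails are hypothetical and not from my repo; in practice fails(n) would mean "rebuild with the loop limit set to n, run, and see whether vkDeviceWaitIdle returns an error". It also assumes the failure is monotonic (once a count fails, every larger count fails too), which I have not proven.

```cpp
#include <cassert>
#include <functional>

// Returns the smallest n in (lo, hi] for which fails(n) is true.
// Assumes fails(lo) is false, fails(hi) is true, and that fails is
// monotonic: once it becomes true it stays true for larger n.
int first_failing(int lo, int hi, const std::function<bool(int)>& fails) {
    assert(lo < hi);
    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;  // midpoint without overflow
        if (fails(mid)) {
            hi = mid;  // failure is at mid or earlier
        } else {
            lo = mid;  // mid is still good, look later
        }
    }
    return hi;
}
```

With a known-good count of 0 and a known-bad count of 1000, this needs about ten runs to land on the first failing iteration count.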
Setting VK_INSTANCE_LAYERS=VK_LAYER_LUNARG_standard_validation does not reveal any useful information about why the NVIDIA driver is pausing for so long. The following output is easy to connect to the code in main.cpp:
vkQueuePresentKHR done, end of loop 665
vkAcquireNextImageKHR
vkQueueSubmit
D MEM: code0: Details of Memory Object list (of size 0 elements)
D MEM: code0: =============================
D MEM: code0: Details of CB list (of size 3 elements)
D MEM: code0: ==================
D MEM: code0: CB Info (0x0x2581dd0) has CB 0x0x2588720
D MEM: code0: CB Info (0x0x257ecf0) has CB 0x0x2587900
D MEM: code0: CB Info (0x0x2592be0) has CB 0x0x257f5d0
vkQueuePresentKHR
vkQueuePresentKHR done, end of loop 666
vkDeviceWaitIdle
W ParameterValidation: code9: vkDeviceWaitIdle: returned VK_ERROR_DEVICE_LOST, indicating that the logical device has been lost
vkDeviceWaitIdle returned -4
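To make the raw "returned -4" easier to read in my own logging, a small lookup like the following could translate the number back into a name. This is only a sketch: vk_result_name is a hypothetical helper, and the codes are copied in as plain integers from the core Vulkan 1.0 VkResult enum so the snippet builds without <vulkan/vulkan.h>. It covers only the core codes, not the extension ones.

```cpp
#include <cstdint>
#include <string>

// Maps a core Vulkan 1.0 VkResult value to its enumerator name.
// Extension codes (for example the KHR swapchain ones) are not listed
// and fall through to the "unknown" branch.
std::string vk_result_name(std::int32_t code) {
    switch (code) {
        case 0:   return "VK_SUCCESS";
        case 1:   return "VK_NOT_READY";
        case 2:   return "VK_TIMEOUT";
        case 3:   return "VK_EVENT_SET";
        case 4:   return "VK_EVENT_RESET";
        case 5:   return "VK_INCOMPLETE";
        case -1:  return "VK_ERROR_OUT_OF_HOST_MEMORY";
        case -2:  return "VK_ERROR_OUT_OF_DEVICE_MEMORY";
        case -3:  return "VK_ERROR_INITIALIZATION_FAILED";
        case -4:  return "VK_ERROR_DEVICE_LOST";
        case -5:  return "VK_ERROR_MEMORY_MAP_FAILED";
        case -6:  return "VK_ERROR_LAYER_NOT_PRESENT";
        case -7:  return "VK_ERROR_EXTENSION_NOT_PRESENT";
        case -8:  return "VK_ERROR_FEATURE_NOT_PRESENT";
        case -9:  return "VK_ERROR_INCOMPATIBLE_DRIVER";
        case -10: return "VK_ERROR_TOO_MANY_OBJECTS";
        case -11: return "VK_ERROR_FORMAT_NOT_SUPPORTED";
        default:  return "unknown VkResult (" + std::to_string(code) + ")";
    }
}
```

Passing -4 gives back VK_ERROR_DEVICE_LOST, which matches what the ParameterValidation layer printed above.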
This is my first attempt at a Vulkan project and I am trying to stick close to vulkan-tutorial.com, but I have wrapped many of the function calls in a class hierarchy. I suspect the issue is one of the following:
- Maybe I am passing an incorrect parameter in one of the function calls that initialize everything. I have double- and triple-checked the values and can’t find where I am passing the wrong parameters. standard_validation of course reports no errors. I have also run the app under valgrind to look for memory errors, and I don’t think I have any.
- Maybe I am missing a necessary function call or have the sequencing wrong. It seems I am leaking a resource within the graphics driver, and this results in the driver “crashing” after the 666th iteration. Googling VK_ERROR_DEVICE_LOST shows that this error generally means the driver crashed and was restarted. I also notice that when I run the app (and the X server freezes up for a few seconds) I see this in /var/log/Xorg.0.log, which seems to mean the GPU was reset and reinitialized:
[709440.056] (--) NVIDIA(GPU-0): CRT-0: disconnected
[709440.056] (--) NVIDIA(GPU-0): CRT-0: 400.0 MHz maximum pixel clock
[709440.056] (--) NVIDIA(GPU-0):
[709440.059] (--) NVIDIA(GPU-0): DFP-0: disconnected
[709440.059] (--) NVIDIA(GPU-0): DFP-0: Internal TMDS
[709440.059] (--) NVIDIA(GPU-0): DFP-0: 330.0 MHz maximum pixel clock
[709440.059] (--) NVIDIA(GPU-0):
[709440.059] (--) NVIDIA(GPU-0): DFP-1: disconnected
[709440.059] (--) NVIDIA(GPU-0): DFP-1: Internal TMDS
[709440.059] (--) NVIDIA(GPU-0): DFP-1: 165.0 MHz maximum pixel clock
[709440.059] (--) NVIDIA(GPU-0):
[709440.060] (--) NVIDIA(GPU-0): HP Z30i (DFP-2): connected
[709440.060] (--) NVIDIA(GPU-0): HP Z30i (DFP-2): Internal DisplayPort
[709440.060] (--) NVIDIA(GPU-0): HP Z30i (DFP-2): 960.0 MHz maximum pixel clock
[709440.060] (--) NVIDIA(GPU-0):
[709440.066] (--) NVIDIA(GPU-0): CRT-0: disconnected
[709440.066] (--) NVIDIA(GPU-0): CRT-0: 400.0 MHz maximum pixel clock
[709440.066] (--) NVIDIA(GPU-0):
[709440.069] (--) NVIDIA(GPU-0): DFP-0: disconnected
[709440.069] (--) NVIDIA(GPU-0): DFP-0: Internal TMDS
[709440.069] (--) NVIDIA(GPU-0): DFP-0: 330.0 MHz maximum pixel clock
[709440.069] (--) NVIDIA(GPU-0):
[709440.069] (--) NVIDIA(GPU-0): DFP-1: disconnected
[709440.069] (--) NVIDIA(GPU-0): DFP-1: Internal TMDS
[709440.069] (--) NVIDIA(GPU-0): DFP-1: 165.0 MHz maximum pixel clock
[709440.069] (--) NVIDIA(GPU-0):
[709440.070] (--) NVIDIA(GPU-0): HP Z30i (DFP-2): connected
[709440.070] (--) NVIDIA(GPU-0): HP Z30i (DFP-2): Internal DisplayPort
[709440.070] (--) NVIDIA(GPU-0): HP Z30i (DFP-2): 960.0 MHz maximum pixel clock
[709440.070] (--) NVIDIA(GPU-0):
Here is the output of nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.35                 Driver Version: 367.35                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K600         Off  | 0000:04:00.0      On |                  N/A |
| 25%   47C    P8    N/A /  N/A |    187MiB /   979MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 0000:05:00.0     Off |                  N/A |
| 22%   34C    P8    15W / 250W |      1MiB / 12206MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      3817    G   /usr/bin/X                                      96MiB |
|    0      5543    G   ...iveTaskBlocking/Enabled/StrictSecureCooki    89MiB |
+-----------------------------------------------------------------------------+