Possible pImmutableSamplers Error(?)

Hi,
Until now I have not used VkDescriptorSetLayoutBinding.pImmutableSamplers and always set samplers through descriptor sets instead.

Now using VkDescriptorSetLayoutBinding.pImmutableSamplers with a single VkSampler pointer creates no problem at all, but I have a TextureCube[3] in one of my shaders, and setting pImmutableSamplers to an array of VkSamplers seems to wreak havoc within Vulkan and crashes my program at totally random locations every time I execute it.
I assume a buffer overflow or another pointer problem, but I can’t find the source.
The pointer to the first VkSampler array element is valid at all times. I even added a NULL element at the end.

Is there an open, known bug in that direction?

Code:

class Vulkan::DescriptorSetLayout {
	/*-------------------------------------------------------------------------------------------------------------------------------------------------*/
	/*//////// Variables //////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////*/
	/*-------------------------------------------------------------------------------------------------------------------------------------------------*/
private:
	VkDescriptorSetLayout _vkDescriptorSetLayout = VK_NULL_HANDLE;

	VkDescriptorSetLayoutCreateInfo vkDescriptorSetLayoutCreateInfo = {};
	std::vector<VkDescriptorSetLayoutBinding> vkDescriptorSetLayoutBindings;
	std::vector<VkSampler> _vkSamplers;

public:
	const std::shared_ptr<Vulkan::LogicalDevice> logicalDevice;
	const VkDescriptorSetLayout& vkDescriptorSetLayout = _vkDescriptorSetLayout;
	const std::vector<VkSampler>& vkSamplers = _vkSamplers;


	/*-------------------------------------------------------------------------------------------------------------------------------------------------*/
	/*//////// Functions //////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////*/
	/*-------------------------------------------------------------------------------------------------------------------------------------------------*/
public:
	DescriptorSetLayout(const std::shared_ptr<Vulkan::LogicalDevice>& logicaldevice, VkDescriptorSetLayoutCreateFlags flags = 0) : logicalDevice(logicaldevice) {
		vkDescriptorSetLayoutCreateInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
		vkDescriptorSetLayoutCreateInfo.flags = flags;
	}

	~DescriptorSetLayout() {
		if (vkDescriptorSetLayout != VK_NULL_HANDLE) vkDestroyDescriptorSetLayout(logicalDevice->vkLogicalDevice, vkDescriptorSetLayout, NULL);
	}


	/* Controller */
	/*-------------------------------------------------------------------------------------------------------------------------------------------------*/
public:
	void Initialize() {
		if (vkDescriptorSetLayout != VK_NULL_HANDLE) vkDestroyDescriptorSetLayout(logicalDevice->vkLogicalDevice, vkDescriptorSetLayout, NULL);
		if (vkCreateDescriptorSetLayout(logicalDevice->vkLogicalDevice, &vkDescriptorSetLayoutCreateInfo, NULL, &_vkDescriptorSetLayout) != VK_SUCCESS)
			throw "Vulkan::DescriptorSetLayout.Initialize():\n Failed to create Vulkan descriptor Set layout.\n\n";
	}

	void SetImmutableSamplers(const std::vector<VkSampler>& vksamplers) { _vkSamplers = vksamplers; }

	void SetBindings(const std::vector<VkDescriptorSetLayoutBinding> vkdescriptorsetlayoutbinding) {
		vkDescriptorSetLayoutBindings = vkdescriptorsetlayoutbinding;
		vkDescriptorSetLayoutCreateInfo.bindingCount = (uint32)vkDescriptorSetLayoutBindings.size();
		vkDescriptorSetLayoutCreateInfo.pBindings = vkDescriptorSetLayoutBindings.data();
	}
};
    static std::shared_ptr<Utilities::Vulkan::DescriptorSetLayout> CreateDescriptorSetLayout(const std::shared_ptr<Utilities::Vulkan::LogicalDevice>& logicaldevice,
        const std::shared_ptr<Utilities::Vulkan::Sampler>& sampler)
    {
        std::shared_ptr<Utilities::Vulkan::DescriptorSetLayout> descriptorsetlayout(new Utilities::Vulkan::DescriptorSetLayout(logicaldevice));
        descriptorsetlayout->SetImmutableSamplers({ sampler->vkSampler, sampler->vkSampler, sampler->vkSampler, NULL });
        descriptorsetlayout->SetBindings({
            CreateTextureDescriptorSetLayoutBinding(BindingIndexes::uAlbedoTexture, &sampler->vkSampler),
            CreateTextureDescriptorSetLayoutBinding(BindingIndexes::uSpecularEmissionTexture, &sampler->vkSampler),
            CreateTextureDescriptorSetLayoutBinding(BindingIndexes::uNormalGlossTexture, &sampler->vkSampler),
            CreateTextureDescriptorSetLayoutBinding(BindingIndexes::uDiffuseEnvironmentTextureCube, descriptorsetlayout->vkSamplers.data(), MAX_ENVIRONMENTS),//descriptorsetlayout->vkSamplers.data()
            CreateTextureDescriptorSetLayoutBinding(BindingIndexes::uSpecularPrefilteredEnvironmentTextureCube, NULL, MAX_ENVIRONMENTS)
        });
        descriptorsetlayout->Initialize();
        return descriptorsetlayout;
    }

private:
    static VkDescriptorSetLayoutBinding CreateTextureDescriptorSetLayoutBinding(uint32 bindingindex, const VkSampler* samplers, uint32 descriptorcount = 1) {
        VkDescriptorSetLayoutBinding result = {};
        result.binding = bindingindex;
        result.descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
        result.descriptorCount = descriptorcount;
        result.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT;
        result.pImmutableSamplers = samplers;
        return result;
    }

sampler as well as descriptorsetlayout are saved in a higher context and are valid at all times after their creation.

Do you have any idea?

You are not allowed to use NULL as an element of pImmutableSamplers.
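
A minimal sketch of how the array could be sized so that a stray NULL cannot slip in. The helper name `MakeImmutableSamplerBinding` is hypothetical (it is not part of the posted code), and the stand-in types in the `#else` branch are assumptions that exist only so the snippet compiles without the SDK. The helper takes descriptorCount from the sampler array itself and rejects null handles:

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

#if __has_include(<vulkan/vulkan.h>)
#include <vulkan/vulkan.h>
#else
// Stand-ins (assumption): mirror only the parts of the Vulkan API used below.
using VkSampler = struct VkSampler_T*;
using VkFlags = uint32_t;
enum VkDescriptorType { VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER = 1 };
enum { VK_SHADER_STAGE_FRAGMENT_BIT = 0x10 };
struct VkDescriptorSetLayoutBinding {
    uint32_t binding;
    VkDescriptorType descriptorType;
    uint32_t descriptorCount;
    VkFlags stageFlags;
    const VkSampler* pImmutableSamplers;
};
#endif

// Hypothetical helper: builds a combined-image-sampler binding whose
// descriptorCount always equals the number of immutable samplers supplied.
inline VkDescriptorSetLayoutBinding MakeImmutableSamplerBinding(
    uint32_t bindingIndex, const std::vector<VkSampler>& samplers)
{
    if (samplers.empty())
        throw std::invalid_argument("no immutable samplers supplied");
    for (VkSampler s : samplers) {
        // A value-initialized handle is the null handle.
        if (s == VkSampler{})
            throw std::invalid_argument("null handle in pImmutableSamplers");
    }

    VkDescriptorSetLayoutBinding result = {};
    result.binding = bindingIndex;
    result.descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
    result.descriptorCount = static_cast<uint32_t>(samplers.size());
    result.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT;
    result.pImmutableSamplers = samplers.data(); // points into the caller's vector
    return result;
}
```

With something like this, a TextureCube[3] binding would be built from a three-element vector with no trailing NULL. As far as I understand the spec, the handles are copied into the layout by vkCreateDescriptorSetLayout, so the vector has to stay alive and unmodified at least until that call returns.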

What is MAX_ENVIRONMENTS?

Yes, the NULL was just an experiment since MAX_ENVIRONMENTS is just 3 and NULL is the 4th element.
It doesn’t make any difference.

In the last months I upgraded the LunarG Vulkan SDK twice.
Nothing changed.
I found the exact same issue on Github for “KhronosGroup/Vulkan-LoaderAndValidationLayers”.

Isn’t this the same stuff used in the (newest) LunarG Vulkan SDK?

The crashes only occur in debug mode, when using the validation layer.

The crashes now mostly occur when calling vkAllocateDescriptorSets or vkCreateDescriptorSetLayout.
I checked the parameters a dozen times and they all seem valid.

Is there anyone I can ask to look into this matter on the other side?
As it seems, almost no one uses pImmutableSamplers except me, as always. <.<

My environment: OS = Windows 10, GPU = NVIDIA GeForce GTX 1070, Driver = 388.59, Vulkan SDK = 1.0.65.1.

It is exactly the same stuff used for LunarG Vulkan SDK.
That being said, you can make an Issue there, or even fix it yourself.

It is suspicious though. Crashes in debug mode should be quite deterministic – no “randomly” about it. Is this multithreaded?
Maybe I can help if you give me a repro project (preferably a minimal one).

Hm, I couldn’t find any source code there, that used the written code snippets of the (related) posts, but have to admit, that I didn’t perform an in depth search.
I certainly do not have enough knowledge of their code structures and practices and an issue report would most certainly go unanswered. (As it did here.) ^^

I’ll try to create a minimal project ASAP.
But that will take a lot of time since my original project is massive and extremely multithreaded. I can’t use it here.
I have to write the thing from the ground up which is still faster than trying to make an “easy” internet example work.

If you already have some test projects in your environment it would most certainly go faster if you just make some adjustments locally and test it directly.
Maybe I can’t reproduce the error that easily since my original project is so much more complex.

Wow, that was fast. I could recycle far more than I thought and the error occurred immediately.

Please use Windows 10, since there are reports that this kind of error does not occur on Windows 7, if I remember correctly.
It is a single threaded, straightforward, header-only Visual Studio 2017 Project which uses latest standard C++ 17. (I love the new shit!)
In a perfect world you just have to set the path to the LunarG SDK “include” and “lib” directories and press F5.

I have zipped and uploaded the solution to OneDrive.
https://1drv.ms/u/s!Arfx89YPy2U1jQA96ukkLME4LjCX

Still I hope I just use something in a way I shouldn’t. Resolving that would be faster than waiting for yet another LunarG SDK release. <.<

gn8

Actually, in this world, you can just set $(VULKAN_SDK)\Include and $(VULKAN_SDK)\Lib or Lib32 and it will just automagically work for most people.

Anyway, works for me with no crashes; well, prints “Press Enter to Finish!”. I had to rewrite some #include <> into #include "", not sure what that’s about. Windows 10 1709, VS 2017 15.5.1, SDK 1.0.65.1, AMD software 17.11.4

Trying with trunk version of layers reveals new error:

vkCreateSampler(): Anisotropic sampling feature is not enabled, pCreateInfo->anisotropyEnable must be VK_FALSE. The spec valid usage text states ‘If the anisotropic sampling feature is not enabled, anisotropyEnable must be VK_FALSE’ (Vulkan® 1.0.274 - A Specification)

Hm, I get a:

Exception thrown at 0x00007FF811ED07D4 (VkLayer_threading.dll) in VulkanError.exe: 0xC0000005: Access violation reading location 0x0000000000000010.

at “vkAllocateDescriptorSets(logicalDevice->vkLogicalDevice, &vkDescriptorSetAllocateInfo, &_vkDescriptorSet)”
when opening the same (cleaned) project I just uploaded.

Windows 10 1709, VS 2017 15.5.1, SDK 1.0.65.1, NVIDIA driver 388.59

I already got the error some months ago with a different Windows subversion, VS, SDK and even a different NVIDIA driver version.
Do you know anyone with an NVIDIA card to test it? Even at work and in my family no one uses AMD.

I will try to test the project at work tomorrow. (Just got the rights to install the SDK.)

Is there anything you can think of that could cause this weird behavior?

Now I also did some more test changes to the DescriptorLayout bindings:

  1. set “auto maxshadowcount = 4;”
  2. commented “//CreateDescriptorSetLayoutBinding(9, VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, vksamplers.data(), maxshadowcount),”

Result:
RANDOM(!) crashes. Either at vkAllocateDescriptorSets or even AFTER pressing ENTER (startup worked) when destructing the VkLogicalDevice.
In the original project I also had crashes at vkCreateDescriptorSetLayout.

I replaced the last multi-sampler binding ( 8 ) with multiple single-sampler bindings ( 21-28 ) and nothing happened.

It is most obvious in my opinion that there is some kind of freeing/deleting/overwriting memory issue behind the curtain since there is still a lot of dangerous raw array usage in the SDK from what I saw.

That’s suspicious…

Well:

  1. “undefined behavior” means your PC can become sentient and seduce your wife and steal your kids. So you should fix the error in my previous post.

  2. I have seen drivers input new (bad) Vulkan commands and run them through the layer chain. It should be visible in the callstack. Or if you put api_dump as your first layer.

  3. I have seen entitled support software do all kinds of weird stuff. Firstly check what kind of Implicit Layers you have. Then also make a minidump of the app and look what dlls are loaded.

  4. I got a corrupted SDK or Vulkan RT quite a few times, because of AV interference or because someone messed up the installer. At the minimum it is worth looking in HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\Vulkan\ExplicitLayers and HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Khronos\Vulkan\ExplicitLayers to check that only the latest versions of the layers are there, and each one only once. Running via (the Vulkan Installation Analyzer that ships with the SDK) should theoretically also help troubleshoot installation issues.

That’s surely true (also not sure I trust the mutex locking strategy), but should crash for me as well for same inputs. Your example is not exactly minimal, it is possible it does something else based on limits or something. Maybe make the api_dump and we can diff.

At that point I am tempted to recommend an exorcist :stuck_out_tongue:

Then also make a minidump of the app and look what dlls are loaded.

  • or simply “Modules” tab in the Debug mode when paused on the crash.

Were it that easy!
Since I neither have wife nor children I would call her Cortana and try to find the next Halo ring with her! ^^

Done. Didn’t change anything. Btw. samplerAnisotropy is a feature my graphics card is able to handle. :wink: https://vulkan.gpuinfo.org/displayreport.php?id=2204#features


I don’t think there are installation issues. The registry folders/paths are totally fine. Only one SDK version exists.
The "mini"dump is 145 MB… I am not going to plow through that!

If you know the generated assembler code of the SDK you might be able to tell where the error really occurs. -.-

All used dlls in the moment of the crash.

There are a lot of implicit layers and extensions since VK_LAYER_LUNARG_standard_validation uses:
“VK_LAYER_GOOGLE_threading”,
“VK_LAYER_LUNARG_parameter_validation”,
“VK_LAYER_LUNARG_object_tracker”,
“VK_LAYER_LUNARG_core_validation”,
“VK_LAYER_GOOGLE_unique_objects”
And in the registry there are also implicit entries for RenderDoc (default windows installed application) and nv-vk64 (Driver I believe).

api_dump seems not to work. I now added it as the first element to the layer list (VK_LAYER_LUNARG_api_dump, VK_LAYER_LUNARG_standard_validation). see LogicalDevice.h
I tried to make use of the vk_layer_settings.txt and copied it to the working directory of the test application, but it neither prints anything to stdout in “lunarg_api_dump.file = FALSE” mode nor creates a file in “lunarg_api_dump.file = TRUE” mode. (used https://vulkan.lunarg.com/doc/view/1.0.65.1/windows/layers.html and the comments in the file for help)
Btw. I already listen to messages of the validation layer (VK_DEBUG_REPORT_ERROR_BIT_EXT | VK_DEBUG_REPORT_WARNING_BIT_EXT) via a callback function. see Instance.h

I couldn’t find an example how to configure VK_LAYER_LUNARG_api_dump other than the vk_layer_settings.txt.
I also wonder how it should be able to dump anything if it doesn’t log everything all the time but instead is supposed to print something when an error occurs.
In this case the “error” is a crash which also prevents the api_dump from printing.

Can you recommend someone/thing?
I could buy a silver cross or some garlic… -.o
I live in a protestant area and personally only believe in the “Architect”. Thus I don’t have much knowledge in this regard.

Btw. Symbol loading might help to hammer out the problem but that would require a debug compilation of the LunarG SDK and the source files. The stuff isn’t open source, is it?
And no luck at work yet. I only have a GTX 570 there… which NVIDIA originally wanted to support Vulkan… <.<

That’s not quite how that works. You have vkSamplerCreateInfo.anisotropyEnable = VK_TRUE;, but you do not have any feature enabled – you are passing VK_FALSE initialized VkPhysicalDeviceFeatures to your vkCreateDevice. You either have to stop using that feature or enable it when you vkCreateDevice.
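
A rough sketch of what that could look like, assuming two hypothetical helpers (`SelectFeatures` and `ConfigureAnisotropy` are names made up for this example). The stand-in types in the `#else` branch are assumptions so the snippet compiles without the SDK; with the SDK installed the real structs are used. The point is that the sampler only turns anisotropy on when the same feature was actually enabled on the device:

```cpp
#include <cassert>
#include <cstdint>

#if __has_include(<vulkan/vulkan.h>)
#include <vulkan/vulkan.h>
#else
// Stand-ins (assumption): mirror only the fields used below.
using VkBool32 = uint32_t;
constexpr VkBool32 VK_TRUE = 1;
constexpr VkBool32 VK_FALSE = 0;
struct VkPhysicalDeviceFeatures { VkBool32 samplerAnisotropy; };
struct VkSamplerCreateInfo { VkBool32 anisotropyEnable; float maxAnisotropy; };
#endif

// Hypothetical helper: start from an all-VK_FALSE feature set and copy over
// only what the application needs. `supported` would come from
// vkGetPhysicalDeviceFeatures; the result is what you would pass as
// VkDeviceCreateInfo::pEnabledFeatures.
inline VkPhysicalDeviceFeatures SelectFeatures(const VkPhysicalDeviceFeatures& supported)
{
    VkPhysicalDeviceFeatures enabled = {};
    enabled.samplerAnisotropy = supported.samplerAnisotropy;
    return enabled;
}

// Hypothetical helper: only request anisotropic filtering when the feature
// was enabled at device creation. requestedMax is assumed to be already
// clamped to VkPhysicalDeviceLimits::maxSamplerAnisotropy by the caller.
inline void ConfigureAnisotropy(VkSamplerCreateInfo& info,
                                const VkPhysicalDeviceFeatures& enabled,
                                float requestedMax)
{
    if (enabled.samplerAnisotropy == VK_TRUE) {
        info.anisotropyEnable = VK_TRUE;
        info.maxAnisotropy = requestedMax;
    } else {
        info.anisotropyEnable = VK_FALSE;
        info.maxAnisotropy = 1.0f;
    }
}
```

That a GPU reports the feature on gpuinfo only tells you what `supported` contains; the validation message is about what was passed as enabled at vkCreateDevice.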

That won’t be necessary.
Seeing the callstack of the crash would be nice though. Is the screenshot the crash? It shows that it happens inside the NVIDIA driver.

What is the hooxpot64.dll? Try to disable/uninstall it.

Not quite what I meant. I mean like Steam, AMD\NV switchable graphics, and possibly others, which some software feels the need to enforce upon unsuspecting users. But never mind – it would show in the dll list too, which seems to be clear of any unwanted Vulkan layers.

It does not require any settings. By default, you just add it to the enabled layers list on vkCreateInstance and it dumps every Vulkan call as well as its arguments to cout.

Oh, but it is. I believe I mentioned you are perfectly allowed to find the bug and contribute a fix back (assuming the bug is in the layers).
The source is even part of the installed SDK, and so are even the compiled debug versions with symbols. They are in the Source\lib folder. If you link to that, then you should be able to do step-by-step debugging even inside the layer code. You can do so by setting VK_LAYER_PATH environment variable to that path.

Alternatively, the source repo is on GitHub at KhronosGroup/Vulkan-LoaderAndValidationLayers (now a deprecated repository for the Vulkan loader and validation layers).

Yeah, I just forgot to copy’n’paste it too. In my last post I wanted to say, that I edited VK_TRUE to VK_FALSE but nothing changed. :wink:

Yes it is the crashed call stack.
Things have to be reproducible on another NVIDIA machine then. Please, could anyone else test the project and give feedback? https://1drv.ms/u/s!Arfx89YPy2U1jQA96ukkLME4LjCX
I found some minor code … “bugs” in the meantime but they do not influence the compiler or the troublesome behavior.

Still I wouldn’t be so sure to say NVIDIA’s Driver causes the error. I had crash call stacks that ended after multiple calls in VkLayer_object_tracker but some where NVIDIA dlls were directly after my Vulkan calls.
I’d rather say those end points are only the results of a real memory bug.

It is just an ancient desktop virtualization tool called Dexpot, I am used to. Shouldn’t make any trouble.

I found the reason it (api_dump) didn’t work… because it is not supported for my environment. (info via vkEnumerateInstanceLayerProperties) see Instance.h
Only RenderDoc, NVIDIA Optimus and Standard Validation layer are supported.
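
A small sketch of that check, assuming a hypothetical helper `FilterSupportedLayers` (not from the project). It is plain C++ so it runs without the SDK; in the real application `available` would be filled from VkLayerProperties::layerName as returned by vkEnumerateInstanceLayerProperties:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: keeps only those requested layers that the loader
// reported as available, preserving the requested order (the first entry
// ends up closest to the application).
inline std::vector<std::string> FilterSupportedLayers(
    const std::vector<std::string>& requested,
    const std::vector<std::string>& available)
{
    std::vector<std::string> result;
    for (const std::string& name : requested) {
        if (std::find(available.begin(), available.end(), name) != available.end())
            result.push_back(name);
    }
    return result;
}
```

If VK_LAYER_LUNARG_api_dump drops out of the result, the loader did not find that layer at all, which points at the layer search path (registry or VK_LAYER_PATH) rather than at the layer's settings file. The surviving names would then go into VkInstanceCreateInfo::ppEnabledLayerNames via c_str().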

I also put the VULKAN_SDK\Source\lib path into the environment PATH replacing the previous VULKAN_SDK\bin (+restart) and also added VULKAN_SDK\Source\lib to the Visual Studio lib directories.
Still I do get only the disassembler code when looking into a layer library in the call stack when crashed.
How exactly do I have to configure Visual Studio to get all the layer debug/code information into the call stack? (C++ library/binary/header/source import/export/debug madness is still somewhat new to me, since I have mostly done header-only stuff so far.)

Btw. there are f*cker files of more than 16000(!!) LOC in the Source directory. I simply do not have the time to work my way into the layer code and find the error manually.

You need to add the program database files (*.pdb) for the layers to your debugger settings. If you’re using the prebuilt SDK they’re located in the lib/lib32 sub folder. Add these to your debugger via settings->debugging->symbols.

See this thread for details.

Neither am I if you say it works with layers disabled.
The VK_LAYER_LUNARG_standard_validation is the same as

VK_LAYER_GOOGLE_threading
VK_LAYER_LUNARG_parameter_validation
VK_LAYER_LUNARG_object_tracker
VK_LAYER_LUNARG_core_validation
VK_LAYER_GOOGLE_unique_objects

(in this order). Maybe try disabling one of them at a time to find the culprit. My bet would be on VK_LAYER_GOOGLE_unique_objects. Location 0x010 seems like a Vulkan object, and maybe the layer does not properly translate all the VkSamplers there; I will give it a look.
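
The "one at a time" bisection could be sketched like this. `StandardValidationLayers` and `LeaveOneOut` are hypothetical helper names; the layer names and their order are the ones listed above. Each returned list is one test run with exactly one layer left out:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The layers that VK_LAYER_LUNARG_standard_validation expands to, in order.
inline std::vector<std::string> StandardValidationLayers()
{
    return {
        "VK_LAYER_GOOGLE_threading",
        "VK_LAYER_LUNARG_parameter_validation",
        "VK_LAYER_LUNARG_object_tracker",
        "VK_LAYER_LUNARG_core_validation",
        "VK_LAYER_GOOGLE_unique_objects",
    };
}

// Hypothetical helper: produces one layer list per layer, each with that
// single layer removed and the relative order of the rest unchanged.
inline std::vector<std::vector<std::string>> LeaveOneOut(
    const std::vector<std::string>& layers)
{
    std::vector<std::vector<std::string>> runs;
    for (std::size_t skip = 0; skip < layers.size(); ++skip) {
        std::vector<std::string> run;
        for (std::size_t i = 0; i < layers.size(); ++i) {
            if (i != skip)
                run.push_back(layers[i]);
        }
        runs.push_back(run);
    }
    return runs;
}
```

If the crash disappears in exactly one of the five runs, the layer missing from that run is the prime suspect. If it crashes in all of them, the layers are probably not the cause.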

You would be surprised. I mean, if you have to inject dlls into other processes, then your middle name is Trouble. Just sayin’ from experience. I have seen “innocent” software, e.g. the software that accompanies a mouse, cause crashes.

Why the hell not? It only needs the SDK installed, that’s all. Weird.

Yea, it is bit hairy to set up (mostly VS fault).
Anyway:

  1. Use the debug (unoptimized) versions of layers. You do so by adding VK_LAYER_PATH=$(VULKAN_SDK)\Source\lib into your project-properties -> Debugging -> Environment field.

  2. In Tools -> Options -> Debugging -> Symbols add %VULKAN_SDK%\Source\lib to the list and check it enabled.

  3. If you also want to see the Loader code, you have to somehow force your app to use the SDK version of the loader. I do that by copying %VULKAN_SDK%\Source\lib\vulkan-1.dll next to the app executable.

  4. In solution-properties -> Debug Source Files add $(VULKAN_SDK)\Source\ to the list.

  5. Make the app crash in Debug x64 config and debug mode. In the call stack you should be seeing proper function names when inside layers, and if you double click you should see the code (and everything). Of course, you still won’t see much inside the NV driver.

Alternatively you can do all in steps 2) 3) 4) at this point. In the Modules window (I think even the Call Stack window) you can right click the layer dll and there should be an option to choose a symbols (*.pdb) file. Then clicking inside Call Stack into some point in the layers should trigger a file chooser for source files.

Uuhh, the grandmaster, Sascha Willems himself gave me a perfect advice. O.O
Thanks!

That and your 5 steps, krOoze, enabled me to get much deeper information.
It also gave me the required support for api_dump. (For whatever reason. Registry stuff seems not to work as expected. <.<)

First tests revealed that with the debug dll version the crash now occurs when destroying the LogicalDevice (vkDestroyDevice) or when waiting for it to finish just before destroying it (vkDeviceWaitIdle). see LogicalDevice.h (Destructor)
But no matter with or without any layers enabled, the crash happens. (as before)
The moment I remove the two sampler array DescriptorSetLayoutBinding definitions it works perfectly.
Could mean that either the driver itself is corrupt or it is fed wrong data at some point (DescriptorSetLayout creation).

Now. I disabled all layers except api_dump and copied the printout.

Current Project:
https://1drv.ms/u/s!Arfx89YPy2U1jQOX0PahehzOEK_c

api_dump (crash within vkDeviceWaitIdle deep in driver):
https://1drv.ms/t/s!Arfx89YPy2U1jQK-cDnJX12Ug6TQ

All data fed into the driver api function seemed valid, but of course I don’t have any idea what it means.

Seems I have to wait for a wonder within NVIDIA and hope for a fixed driver release(?) <.<

I do get crashes with your code on my GTX 980 too, mostly when destroying the device but also randomly at different points. You’re using two bindings, each with 64 samplers set, so you may hit some odd (undocumented) driver limit that causes this behaviour.
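
Before suspecting an undocumented limit, the documented one can at least be ruled out. A minimal sketch, assuming hypothetical helpers `CountStageSamplers` and `FitsSamplerLimit`; the stand-in types in the `#else` branch are assumptions so it compiles without the SDK. It only looks at a single set layout, whereas the real limit applies to everything a stage can access through the whole pipeline layout:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

#if __has_include(<vulkan/vulkan.h>)
#include <vulkan/vulkan.h>
#else
// Stand-ins (assumption): mirror only the parts of the Vulkan API used below.
using VkSampler = struct VkSampler_T*;
using VkFlags = uint32_t;
enum VkDescriptorType {
    VK_DESCRIPTOR_TYPE_SAMPLER = 0,
    VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER = 1
};
enum { VK_SHADER_STAGE_FRAGMENT_BIT = 0x10 };
struct VkDescriptorSetLayoutBinding {
    uint32_t binding;
    VkDescriptorType descriptorType;
    uint32_t descriptorCount;
    VkFlags stageFlags;
    const VkSampler* pImmutableSamplers;
};
struct VkPhysicalDeviceLimits { uint32_t maxPerStageDescriptorSamplers; };
#endif

// Hypothetical helper: sums descriptorCount over all sampler and
// combined-image-sampler bindings visible to the given shader stage.
inline uint32_t CountStageSamplers(
    const std::vector<VkDescriptorSetLayoutBinding>& bindings, VkFlags stage)
{
    uint32_t total = 0;
    for (const VkDescriptorSetLayoutBinding& b : bindings) {
        const bool isSampler =
            b.descriptorType == VK_DESCRIPTOR_TYPE_SAMPLER ||
            b.descriptorType == VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
        if (isSampler && (b.stageFlags & stage) != 0)
            total += b.descriptorCount;
    }
    return total;
}

// Hypothetical helper: compares that sum against the documented limit
// (limits would come from VkPhysicalDeviceProperties::limits).
inline bool FitsSamplerLimit(
    const std::vector<VkDescriptorSetLayoutBinding>& bindings, VkFlags stage,
    const VkPhysicalDeviceLimits& limits)
{
    return CountStageSamplers(bindings, stage) <= limits.maxPerStageDescriptorSamplers;
}
```

If this check passes and the crash still happens, the documented limit is not the explanation, and the driver vendor is the right place to take the repro.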

So you may want to supply this repro case to NVIDIA for further investigation. They have a Vulkan driver support forum on the NVIDIA Developer Forums.

I will do that. Thanks for the link.
I hope reaction times in this case are not what I see in other threads over there (months). <.<

Btw. the crash already happens when using <10 sampler array sizes. I only used 64 “to make sure” it crashes.
In reality the potential lookup of 64 textures would be a bit overkill. ^^

Login is impossible on NVIDIA DevTalk.
I have filed a support ticket.
Starts nicely. <.<