OpenCL problem with Nvidia Driver 378.49 and CUDA SDK v8.0

Hello everyone,

I’m quite new to OpenCL and everything around it. At the moment I’m working on my first OpenCL project, an implementation of AES in OpenCL. My development environment consists of three PCs: two with NVIDIA graphics cards and one with an ATI card. I’ve been working on it for about a month now, and the algorithm works quite well by now, but something strange happened over the weekend:

I was using one of my notebooks with an NVIDIA graphics card and installed a new driver for it, exactly the version mentioned in the thread title. I didn’t think about it any further and continued my work, but from then on my algorithm no longer worked correctly. This is what has been happening since then:

My kernel contains, among other things, a __constant array for my S-box:


__constant unsigned char AES_SBox[256] =
{
	0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76,
	0xCA, 0x82, 0xC9, 0x7D, 0xFA, 0x59, 0x47, 0xF0, 0xAD, 0xD4, 0xA2, 0xAF, 0x9C, 0xA4, 0x72, 0xC0,
	0xB7, 0xFD, 0x93, 0x26, 0x36, 0x3F, 0xF7, 0xCC, 0x34, 0xA5, 0xE5, 0xF1, 0x71, 0xD8, 0x31, 0x15,
	0x04, 0xC7, 0x23, 0xC3, 0x18, 0x96, 0x05, 0x9A, 0x07, 0x12, 0x80, 0xE2, 0xEB, 0x27, 0xB2, 0x75,
	0x09, 0x83, 0x2C, 0x1A, 0x1B, 0x6E, 0x5A, 0xA0, 0x52, 0x3B, 0xD6, 0xB3, 0x29, 0xE3, 0x2F, 0x84,
	0x53, 0xD1, 0x00, 0xED, 0x20, 0xFC, 0xB1, 0x5B, 0x6A, 0xCB, 0xBE, 0x39, 0x4A, 0x4C, 0x58, 0xCF,
	0xD0, 0xEF, 0xAA, 0xFB, 0x43, 0x4D, 0x33, 0x85, 0x45, 0xF9, 0x02, 0x7F, 0x50, 0x3C, 0x9F, 0xA8,
	0x51, 0xA3, 0x40, 0x8F, 0x92, 0x9D, 0x38, 0xF5, 0xBC, 0xB6, 0xDA, 0x21, 0x10, 0xFF, 0xF3, 0xD2,
	0xCD, 0x0C, 0x13, 0xEC, 0x5F, 0x97, 0x44, 0x17, 0xC4, 0xA7, 0x7E, 0x3D, 0x64, 0x5D, 0x19, 0x73,
	0x60, 0x81, 0x4F, 0xDC, 0x22, 0x2A, 0x90, 0x88, 0x46, 0xEE, 0xB8, 0x14, 0xDE, 0x5E, 0x0B, 0xDB,
	0xE0, 0x32, 0x3A, 0x0A, 0x49, 0x06, 0x24, 0x5C, 0xC2, 0xD3, 0xAC, 0x62, 0x91, 0x95, 0xE4, 0x79,
	0xE7, 0xC8, 0x37, 0x6D, 0x8D, 0xD5, 0x4E, 0xA9, 0x6C, 0x56, 0xF4, 0xEA, 0x65, 0x7A, 0xAE, 0x08,
	0xBA, 0x78, 0x25, 0x2E, 0x1C, 0xA6, 0xB4, 0xC6, 0xE8, 0xDD, 0x74, 0x1F, 0x4B, 0xBD, 0x8B, 0x8A,
	0x70, 0x3E, 0xB5, 0x66, 0x48, 0x03, 0xF6, 0x0E, 0x61, 0x35, 0x57, 0xB9, 0x86, 0xC1, 0x1D, 0x9E,
	0xE1, 0xF8, 0x98, 0x11, 0x69, 0xD9, 0x8E, 0x94, 0x9B, 0x1E, 0x87, 0xE9, 0xCE, 0x55, 0x28, 0xDF,
	0x8C, 0xA1, 0x89, 0x0D, 0xBF, 0xE6, 0x42, 0x68, 0x41, 0x99, 0x2D, 0x0F, 0xB0, 0x54, 0xBB, 0x16
};

This is the very first thing in my kernel. Because the byte substitution returned unexpected results, after some tests I simply tried to iterate through the array:


for(int i = 0; i < 256; i++)
	{
		// dump every S-box entry as hex and decimal
		printf("%x, %d\n", (uint)AES_SBox[i], (int)AES_SBox[i]);
	}

This is the console output:
NVIDIA CUDA GeForce GTX 970
Total device memory: 4096MB
Maximum buffer size: 1024 MB
Number of compute units: 13
0, 0
0, 0
0, 0
63, 99
7c, 124
77, 119
7b, 123
f2, 242
6b, 107
6f, 111
c5, 197
30, 48
1, 1
67, 103
2b, 43
fe, 254
d7, 215
ab, 171
76, 118
ca, 202
82, 130
c9, 201
7d, 125
fa, 250
59, 89
47, 71
f0, 240
ad, 173
d4, 212
a2, 162
af, 175
9c, 156
a4, 164
72, 114

You can see that the first three lines are incorrect, which surprised me a lot. But it goes further: as you can see, I declared the AES_SBox array with 256 elements, yet I can also read a 257th to 259th element (indices 256 to 258)! After that array I have another one (the inverse S-box), and its first three elements are the last three of the first array.

So I assume that the data in the allocated memory has been shifted by three bytes.
Just to say it again: before I installed that update, everything worked fine.

Does anybody know anything about this?

Best regards
Patrick

I had some weird behavior with printf on AMD’s driver. I worked around it by loading the value into private memory first. It could be a presentation problem of some sort in your case as well.

for(int i = 0; i < 256; i++)
{
	int t = AES_SBox[i]; // copy the __constant value into a private variable first
	printf("%x, %d\n", t, t);
}

I just tried that, but got the same result. For now I’ll use an older driver, but I’m curious what the problem with the current driver is…