Performance: one .cl file vs. several .cl files

In terms of performance, does it make a difference whether I use a single .cl file containing several kernels or, for example, one file per kernel?

That depends heavily on the OpenCL implementation and on whether the kernels share auxiliary functions: helpers that are duplicated across separately built programs have to be compiled once per program. You will have to try both layouts and measure the performance.

Generally speaking, most of the time will be spent running the kernels rather than loading them, so for regular applications I doubt that you will see a significant difference either way.
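
If you want to check this on your own setup, a small timing harness is enough. The following is only a sketch: `now_ms`, `time_phase_ms`, `stand_in_phase` and `compare_phases` are names I made up, and the stand-in callback does no OpenCL work at all. In a real test you would replace it with one callback that wraps your `clBuildProgram` call(s) and another that wraps `clEnqueueNDRangeKernel` followed by `clFinish`.

```c
#include <stdio.h>
#include <time.h>

/* Wall-clock time in milliseconds (C11 timespec_get; not a monotonic clock). */
double now_ms(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
}

/* Call fn(arg) `repeats` times and return the average time per call in ms. */
double time_phase_ms(void (*fn)(void *), void *arg, int repeats) {
    double start = now_ms();
    for (int i = 0; i < repeats; ++i) {
        fn(arg);
    }
    return (now_ms() - start) / repeats;
}

/* Hypothetical stand-in for a real phase: it only counts how often it ran.
   Replace it with callbacks that build the program(s) or launch a kernel. */
void stand_in_phase(void *arg) {
    int *calls = arg;
    ++*calls;
}

/* Time the build phase once and the run phase ten times, then print both. */
void compare_phases(void (*build)(void *), void (*run)(void *), void *arg) {
    double build_ms = time_phase_ms(build, arg, 1);
    double run_ms = time_phase_ms(run, arg, 10);
    printf("build: %.3f ms, run: %.3f ms per launch\n", build_ms, run_ms);
}
```

Call `compare_phases` once with the single-file build and once with the multi-file build, then compare the two "build" figures against the "run" figure. If the build time is small next to the total kernel time of your application, the file layout is unlikely to matter.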