For example, if I'm declaring a function like so:
__global__ void square(float *d_out, float *d_in);
CodePudding user response:
This is not standard C; it is either a compiler extension or a #define.
It looks like CUDA.
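A __global__ function (a kernel) is executed on the GPU. It is normally launched from the CPU, and it can also be launched from GPU code on devices that support dynamic parallelism.
Launching a __global__ function is usually more expensive than calling a __device__ function, because a kernel launch carries launch overhead, while a __device__ function is an ordinary call inside GPU code that the compiler can often inline.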
If it is a #define, run only the preprocessor on the file (-E with gcc, or the equivalent option for your compiler) and see how the macro is expanded.
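To make the __global__ / __device__ distinction concrete, here is a minimal sketch built around the declaration from the question. The helper names square_value and square_on_gpu, the array size, and the single-block launch are my own assumptions for illustration, not something from the original code. It needs to be compiled with nvcc and run on a machine with a CUDA-capable GPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __device__ function: runs on the GPU and can only be called from GPU code.
// (Hypothetical helper, added for illustration.)
__device__ float square_value(float x) {
    return x * x;
}

// __global__ function (kernel): runs on the GPU, launched from the host.
// Each thread handles one element; assumes a single block of n threads.
__global__ void square(float *d_out, float *d_in) {
    int idx = threadIdx.x;
    d_out[idx] = square_value(d_in[idx]);
}

// Hypothetical host wrapper: copies input to the GPU, launches the kernel,
// and copies the result back. Returns 0 on success, 1 on any CUDA error.
// n must not exceed the maximum threads per block (1024 on current GPUs).
int square_on_gpu(const float *h_in, float *h_out, int n) {
    float *d_in = nullptr;
    float *d_out = nullptr;
    size_t bytes = n * sizeof(float);

    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return 1;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return 1; }

    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    // Kernel launch syntax: <<<number of blocks, threads per block>>>.
    square<<<1, n>>>(d_out, d_in);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}

int main() {
    const int N = 8;  // assumed size, small enough for one block
    float h_in[N], h_out[N];
    for (int i = 0; i < N; ++i) h_in[i] = (float)i;

    if (square_on_gpu(h_in, h_out, N) != 0) {
        printf("CUDA error\n");
        return 1;
    }
    for (int i = 0; i < N; ++i) printf("%g\n", h_out[i]);
    return 0;
}
```

Note that square is started with the <<<...>>> launch syntax, which a plain C compiler will reject, while square_value is called like any normal function from inside the kernel.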
