I am locking this topic.
As per the OpenCL specification:
6.5.2 __local (or local)
The __local or local address space name is used to describe variables that need to be
allocated in local memory and are shared by all work-items of a work-group. This qualifier can
be used with arguments to functions (including __kernel functions) declared as pointers, or
with variables declared inside a __kernel function.
6.5.4 __private (or private)
All variables inside a function (including __kernel functions), or passed into the function as
arguments are in the __private or private address space. Variables declared as pointers
are considered to point to the __private address space if an address space qualifier is not
specified except for arguments declared to be of type image2d_t and image3d_t which
implicitly point to the __global address space.
In other words, a __local variable is a single copy shared by the whole work-group, so the hardware threads are stomping on each other's work and producing nonsense.
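
For anyone landing here later, here is a rough illustration of that effect. It is plain C running on the host, not OpenCL, and it only simulates a work-group. The names (WG_SIZE, run_with_shared_scratch, run_with_private_scratch) and the two-phase lockstep schedule are my own assumptions for the sketch; on a real device the order in which work-items run is not defined, so the actual garbage you get can differ from run to run.

```c
#define WG_SIZE 4  /* hypothetical work-group size, chosen for illustration */

/* Models a scratch variable declared __local: one copy for the whole
 * work-group. The two loops stand in for work-items running in lockstep
 * (an assumed schedule, real hardware gives no such guarantee). */
void run_with_shared_scratch(const int *in, int *out)
{
    int scratch = 0;  /* single copy seen by every simulated work-item */

    /* Phase 1: every work-item stores its intermediate value. */
    for (int id = 0; id < WG_SIZE; ++id)
        scratch = in[id] * 2;      /* each write overwrites the previous one */

    /* Phase 2: every work-item reads the value back. */
    for (int id = 0; id < WG_SIZE; ++id)
        out[id] = scratch + 1;     /* all of them see the last writer's value */
}

/* Models a scratch variable left in the default __private address space:
 * every work-item has its own copy, so nothing is overwritten. */
void run_with_private_scratch(const int *in, int *out)
{
    int scratch[WG_SIZE];  /* one slot per simulated work-item */

    for (int id = 0; id < WG_SIZE; ++id)
        scratch[id] = in[id] * 2;

    for (int id = 0; id < WG_SIZE; ++id)
        out[id] = scratch[id] + 1;
}
```

With the input {1, 2, 3, 4}, the shared version writes 9 into every output slot because only the last store survives, while the private version produces 3, 5, 7, 9. If a value really does need to live in __local memory, give each work-item its own element and synchronize with a barrier before other work-items read it.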