c++ - How to reduce in CUDA if __syncthreads can't be called inside conditional branches?
The reduction method suggested by NVIDIA uses __syncthreads() inside conditional branches, e.g.:
if (blockSize >= 512) { if (tid < 256) { sdata[tid] += sdata[tid + 256]; } __syncthreads(); }
or:
for (unsigned int s = blockDim.x / 2; s > 32; s >>= 1) { if (tid < s) sdata[tid] += sdata[tid + s]; __syncthreads(); }
In the second example, __syncthreads() is inside the body of the for loop, which is itself a conditional branch.
However, a number of questions raise the problem of __syncthreads() inside conditional branches (e.g. "Can I use __syncthreads() after having dropped threads?" and "Conditional syncthreads & deadlock (or not)"), and the answers say that __syncthreads() in conditional branches may lead to a deadlock. Consequently, the reduction method suggested by NVIDIA may deadlock (if I believe the documentation on which those answers are based).
Furthermore, if __syncthreads() can't be used inside conditional branches, I'm afraid that many basic operations are blocked, the reduction being just one example.
So how can I do a reduction in CUDA without using __syncthreads() in conditional branches? Or is this a bug in the documentation?
The limitation is not:
__syncthreads cannot be used in conditional branches.
The limitation is:
__syncthreads cannot be used in branches that are not traversed by all threads of the block at the same time.
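To make the rule concrete, here is a minimal sketch (uniformBranchKernel and runUniformBranch are hypothetical names, not from the question). The barrier sits inside an if/else, but the condition depends only on blockIdx.x, so every thread of a given block takes the same side and reaches the same __syncthreads(). Each block reverses its slice of the input through shared memory; even-numbered blocks also double the values.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: each block reverses its slice of `in` via shared memory;
// even-numbered blocks also double the values.
// The if/else condition depends only on blockIdx.x, which is identical for every
// thread in a block, so the whole block enters the same branch and barrier.
__global__ void uniformBranchKernel(const int* in, int* out)
{
    extern __shared__ int tile[];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + tid;

    if (blockIdx.x % 2 == 0) {                 // block-uniform condition
        tile[tid] = 2 * in[i];
        __syncthreads();                       // OK: the whole block is in this branch
        out[i] = tile[blockDim.x - 1 - tid];   // read a value written by another thread
    } else {
        tile[tid] = in[i];
        __syncthreads();                       // OK: the whole block is in this branch
        out[i] = tile[blockDim.x - 1 - tid];
    }
    // NOT OK (do not do this): if (tid < 16) { __syncthreads(); }
    // Threads with tid >= 16 would skip that barrier, so the behaviour is undefined
    // (it may hang or silently produce wrong results).
}

// Hypothetical host helper. Assumes h_in.size() is a non-zero multiple of threadsPerBlock.
std::vector<int> runUniformBranch(const std::vector<int>& h_in, unsigned int threadsPerBlock)
{
    size_t n = h_in.size();
    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    unsigned int blocks = static_cast<unsigned int>(n / threadsPerBlock);
    uniformBranchKernel<<<blocks, threadsPerBlock, threadsPerBlock * sizeof(int)>>>(d_in, d_out);

    std::vector<int> h_out(n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

For example, runUniformBranch({1, 2, 3, 4, 5, 6, 7, 8}, 4) yields {8, 6, 4, 2, 8, 7, 6, 5}: block 0 doubles and reverses, block 1 only reverses.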
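Notice that in both examples you give, __syncthreads is not covered by a condition that depends on the thread ID (or on per-thread data). In the first case, blockSize is a template parameter, which does not depend on the thread ID. In the second case, __syncthreads likewise comes after the thread-dependent if (tid < s), not inside it.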
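For the first case, a complete kernel built around the blockSize pattern might look like the sketch below. It is written in the spirit of NVIDIA's unrolled reduction but is not copied from it; reduceUnrolled and sumWithUnrolledReduce are hypothetical names, and it assumes a power-of-two block size of at most 512 with an input length that is a multiple of the block size. Every outer if (blockSize >= N) is resolved at compile time and is identical for all threads; only the inner if (tid < N) is thread-dependent, and it is closed before each __syncthreads().

```cuda
#include <cuda_runtime.h>
#include <vector>

// blockSize is a compile-time constant: each `if (blockSize >= N)` has the same value
// for every thread, so the barrier inside it is reached (or skipped) by the whole block.
template <unsigned int blockSize>
__global__ void reduceUnrolled(const int* in, int* blockSums)
{
    static_assert(blockSize >= 1 && blockSize <= 512 && (blockSize & (blockSize - 1)) == 0,
                  "sketch assumes a power-of-two block size <= 512");
    __shared__ int sdata[blockSize];
    unsigned int tid = threadIdx.x;
    sdata[tid] = in[blockIdx.x * blockSize + tid];
    __syncthreads();

    // Inner `if (tid < N)` is thread-dependent and is closed before each barrier.
    if (blockSize >= 512) { if (tid < 256) { sdata[tid] += sdata[tid + 256]; } __syncthreads(); }
    if (blockSize >= 256) { if (tid < 128) { sdata[tid] += sdata[tid + 128]; } __syncthreads(); }
    if (blockSize >= 128) { if (tid <  64) { sdata[tid] += sdata[tid +  64]; } __syncthreads(); }
    if (blockSize >=  64) { if (tid <  32) { sdata[tid] += sdata[tid +  32]; } __syncthreads(); }
    if (blockSize >=  32) { if (tid <  16) { sdata[tid] += sdata[tid +  16]; } __syncthreads(); }
    if (blockSize >=  16) { if (tid <   8) { sdata[tid] += sdata[tid +   8]; } __syncthreads(); }
    if (blockSize >=   8) { if (tid <   4) { sdata[tid] += sdata[tid +   4]; } __syncthreads(); }
    if (blockSize >=   4) { if (tid <   2) { sdata[tid] += sdata[tid +   2]; } __syncthreads(); }
    if (blockSize >=   2) { if (tid <   1) { sdata[tid] += sdata[tid +   1]; } __syncthreads(); }

    if (tid == 0) blockSums[blockIdx.x] = sdata[0];
}

// Hypothetical host helper: 256-thread blocks; h_in.size() must be a multiple of 256.
long long sumWithUnrolledReduce(const std::vector<int>& h_in)
{
    constexpr unsigned int kBlock = 256;
    size_t n = h_in.size();
    unsigned int blocks = static_cast<unsigned int>(n / kBlock);
    if (blocks == 0) return 0;

    int* d_in = nullptr;
    int* d_sums = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_sums, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    reduceUnrolled<kBlock><<<blocks, kBlock>>>(d_in, d_sums);

    std::vector<int> partial(blocks);
    cudaMemcpy(partial.data(), d_sums, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_sums);

    long long total = 0;
    for (int p : partial) total += p;   // finish the per-block sums on the host
    return total;
}
```

Summing the per-block results on the host keeps the sketch short; a second kernel launch over blockSums would work just as well.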
Yes, the loop's s > 32 is a condition, but its truth value does not depend on the thread or its data in any way: blockDim.x is the same for all threads, and all threads apply exactly the same modifications to s. This means that all threads reach __syncthreads at the same point in their control flow, which is perfectly OK.
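The loop form can be sketched as a full kernel as follows (reduceLoop and sumWithLoopReduce are hypothetical names; blockDim.x is assumed to be a power of two). Unlike the question's snippet, this sketch runs the loop down to s > 0 instead of stopping at s > 32, so it does not rely on a separate warp-level tail for the last 32 elements. Threads whose global index is past the end of the input still take part in every barrier; they just contribute 0.

```cuda
#include <cuda_runtime.h>
#include <vector>

// The loop bound depends only on blockDim.x (identical for all threads), so every
// thread runs the same iterations and reaches each __syncthreads() together.
__global__ void reduceLoop(const int* in, int* blockSums, unsigned int n)
{
    extern __shared__ int sdata[];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + tid;

    sdata[tid] = (i < n) ? in[i] : 0;   // out-of-range threads contribute 0 but stay in the block
    __syncthreads();

    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) {                  // thread-dependent condition...
            sdata[tid] += sdata[tid + s];
        }                               // ...closed before the barrier
        __syncthreads();
    }

    if (tid == 0) blockSums[blockIdx.x] = sdata[0];
}

// Hypothetical host helper; threadsPerBlock must be a power of two.
long long sumWithLoopReduce(const std::vector<int>& h_in, unsigned int threadsPerBlock)
{
    unsigned int n = static_cast<unsigned int>(h_in.size());
    if (n == 0) return 0;
    unsigned int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    int* d_in = nullptr;
    int* d_sums = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_sums, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    reduceLoop<<<blocks, threadsPerBlock, threadsPerBlock * sizeof(int)>>>(d_in, d_sums, n);

    std::vector<int> partial(blocks);
    cudaMemcpy(partial.data(), d_sums, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_sums);

    long long total = 0;
    for (int p : partial) total += p;
    return total;
}
```

For instance, 1000 copies of the value 3 reduced with 128-thread blocks give 3000; the eighth block is only partly filled, and its idle threads simply load 0.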
The case where you cannot use __syncthreads is a condition that can be true for some threads and false for others. In such a case, you have to close all such conditions before calling __syncthreads. So instead of this:
if (threadIdx.x < some_constant) { operation1(); __syncthreads(); operation2(); }
you must do this:
if (threadIdx.x < some_constant) { operation1(); } __syncthreads(); if (threadIdx.x < some_constant) { operation2(); }
Both of the examples you gave demonstrate this too: the thread-ID-dependent condition is closed before __syncthreads is called.
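Here is a concrete version of the "close the condition" pattern, as a minimal single-block sketch. closedConditionKernel and runClosedCondition are hypothetical names, and active plays the role of some_constant. In this sketch operation1 writes each active thread's input into shared memory and operation2 reads a value written by another thread. Putting the barrier inside a single if (tid < active) would be undefined whenever active < blockDim.x; splitting the condition lets every thread reach __syncthreads().

```cuda
#include <cuda_runtime.h>
#include <vector>

// Only the first `active` threads do work, but every thread reaches the barrier.
__global__ void closedConditionKernel(const int* in, int* out, unsigned int active)
{
    extern __shared__ int buf[];
    unsigned int tid = threadIdx.x;

    if (tid < active) {                 // operation1: publish this thread's value
        buf[tid] = in[tid];
    }
    __syncthreads();                    // condition closed: the whole block arrives here

    if (tid < active) {                 // operation2: same condition reopened
        unsigned int next = (tid + 1) % active;
        out[tid] = buf[next] - buf[tid];   // reads a value written by another thread
    }
}

// Hypothetical host helper. Assumes 0 < h_in.size() <= threadsPerBlock (one block).
std::vector<int> runClosedCondition(const std::vector<int>& h_in, unsigned int threadsPerBlock)
{
    unsigned int active = static_cast<unsigned int>(h_in.size());
    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, active * sizeof(int));
    cudaMalloc(&d_out, active * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), active * sizeof(int), cudaMemcpyHostToDevice);

    closedConditionKernel<<<1, threadsPerBlock, threadsPerBlock * sizeof(int)>>>(d_in, d_out, active);

    std::vector<int> h_out(active);
    cudaMemcpy(h_out.data(), d_out, active * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

With input {1, 4, 9, 16} and 32 threads, only threads 0 to 3 do any work and the result is {3, 5, 7, -15}; the other 28 threads still pass through the barrier.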