c++ - How to reduce in CUDA if __syncthreads can't be called inside conditional branches?
The reduction method suggested by NVIDIA uses __syncthreads() inside conditional branching, e.g.:

    if (blockSize >= 512) {
        if (tid < 256) { sdata[tid] += sdata[tid + 256]; }
        __syncthreads();
    }

or

    for (unsigned int s = blockDim.x / 2; s > 32; s >>= 1) {
        if (tid < s)
            sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

In the second example, __syncthreads() is inside the for loop body, which is also a conditional branch.
However, a number of questions raise the problem of __syncthreads() inside conditional branches (e.g. "Can I use __syncthreads() after having dropped threads?" and "Conditional syncthreads & deadlock (or not)"), and the answers say that __syncthreads() in conditional branches may lead to a deadlock. Consequently, the reduction method suggested by NVIDIA may deadlock (if one believes the documentation on which those answers are based).

Furthermore, if __syncthreads() can't be used inside conditional branches, I'm afraid that many basic operations are blocked, and reduction is just one example.

So how do I do a reduction in CUDA without using __syncthreads() in conditional branches? Or is this a bug in the documentation?
The limitation is not

    __syncthreads cannot be used in conditional branches

The limitation is

    __syncthreads cannot be used in branches which will not be traversed by all threads at the same time
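Notice that in both examples you give, __syncthreads is not covered by a condition that depends on the thread ID (or on some per-thread data). In the first case, blockSize is a template parameter, which does not depend on the thread ID. In the second case, __syncthreads is likewise placed after the if, not inside it.

Yes, the for loop's s > 32 is a condition, but it is a condition whose truth value does not depend on the thread or its data in any way. blockDim.x is the same for all threads, and all threads execute exactly the same modifications of s. This means that all threads reach the __syncthreads at exactly the same point of their control flow, which is perfectly OK.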
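To make that concrete, here is a minimal sketch of a block-level sum built around the same loop. The names blockSum and sumOnDevice, the block size of 256, and the final summation on the CPU are my own assumptions for illustration, not NVIDIA's code. The loop runs down to s > 0 instead of s > 32 because NVIDIA's version finishes the last warp separately, and that part is not shown in the question. Error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: each block sums up to blockDim.x elements of `in`
// and writes its partial sum to out[blockIdx.x].
// Assumes blockDim.x is a power of two and that the launch provides
// blockDim.x * sizeof(float) bytes of dynamic shared memory.
__global__ void blockSum(const float* in, float* out, unsigned int n)
{
    extern __shared__ float sdata[];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Out-of-range threads load 0 instead of returning early, so every
    // thread of the block still reaches each barrier below.
    sdata[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // The loop condition depends only on blockDim.x and s, which are the
    // same for all threads of the block, so control flow is uniform here.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) {                      // thread-dependent condition
            sdata[tid] += sdata[tid + s];
        }                                   // closed before the barrier
        __syncthreads();
    }

    if (tid == 0) out[blockIdx.x] = sdata[0];
}

// Hypothetical host wrapper: runs blockSum and adds the per-block
// partial sums on the CPU.
float sumOnDevice(const std::vector<float>& host)
{
    const unsigned int n = static_cast<unsigned int>(host.size());
    if (n == 0) return 0.0f;
    const unsigned int threads = 256;       // assumed power-of-two block size
    const unsigned int blocks = (n + threads - 1) / threads;

    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, blocks * sizeof(float));
    cudaMemcpy(dIn, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    blockSum<<<blocks, threads, threads * sizeof(float)>>>(dIn, dOut, n);

    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), dOut, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);

    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}
```

Calling sumOnDevice on a std::vector<float> returns the sum of its elements. The thing to look at is that the only thread-dependent if closes before each __syncthreads, while the loop around it is taken identically by every thread.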
The other case, where you cannot use __syncthreads, is a condition which can be true for some threads and false for other ones. In such a case, you have to close all conditions before you use __syncthreads. So instead of this:

    if (threadIdx.x < some_constant) {
        operation1();
        __syncthreads();
        operation2();
    }

you must do this:

    if (threadIdx.x < some_constant) {
        operation1();
    }
    __syncthreads();
    if (threadIdx.x < some_constant) {
        operation2();
    }

Both of the examples you gave demonstrate this too: the thread-ID-dependent condition is closed before __syncthreads is called.
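As a rough illustration of the split pattern, the sketch below fills in the two operations with something concrete. Everything in it is assumed for the example: kBlock and kActive (which stands in for some_constant), the kernel rotateLeft, and the host wrapper rotateOnDevice. Operation 1 stages a value in shared memory and operation 2 reads a neighbour's staged value, which is why a barrier is needed between them. Error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr unsigned int kBlock = 128;   // hypothetical threads per block
constexpr unsigned int kActive = 96;   // hypothetical: only threads below this do work

// Hypothetical single-block kernel: active threads copy their input into
// shared memory, then each reads the value staged by its right neighbour.
__global__ void rotateLeft(const int* in, int* out)
{
    __shared__ int buf[kActive];
    unsigned int tid = threadIdx.x;

    if (tid < kActive) {
        buf[tid] = in[tid];                     // operation 1
    }

    // Outside any thread-dependent branch: all kBlock threads arrive here,
    // including the idle ones with tid >= kActive.
    __syncthreads();

    if (tid < kActive) {
        out[tid] = buf[(tid + 1) % kActive];    // operation 2
    }
}

// Hypothetical host wrapper; this sketch only handles exactly kActive inputs
// and returns zeros otherwise.
std::vector<int> rotateOnDevice(const std::vector<int>& host)
{
    std::vector<int> result(kActive, 0);
    if (host.size() != kActive) return result;

    int* dIn = nullptr;
    int* dOut = nullptr;
    cudaMalloc(&dIn, kActive * sizeof(int));
    cudaMalloc(&dOut, kActive * sizeof(int));
    cudaMemcpy(dIn, host.data(), kActive * sizeof(int), cudaMemcpyHostToDevice);

    rotateLeft<<<1, kBlock>>>(dIn, dOut);

    cudaMemcpy(result.data(), dOut, kActive * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

Repeating the if on both sides of the barrier costs one extra comparison per thread, and in exchange the 32 idle threads still take part in the synchronization.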