c++ - How to reduce in CUDA if __syncthreads can't be called inside conditional branches?


The reduction method suggested by NVIDIA uses __syncthreads() inside conditional branches, e.g.:

    if (blockSize >= 512) {
        if (tid < 256) { sdata[tid] += sdata[tid + 256]; }
        __syncthreads();
    }

or

    for (unsigned int s = blockDim.x / 2; s > 32; s >>= 1) {
        if (tid < s)
            sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

In the second example, __syncthreads() is inside the body of a for loop, which is also a conditional branch.

However, a number of questions raise the problem of __syncthreads() inside conditional branches (e.g. "Can I use __syncthreads() after having dropped threads?" and "conditional syncthreads & deadlock (or not)"), and the answers say that __syncthreads() in conditional branches may lead to a deadlock. Consequently, the reduction method suggested by NVIDIA may deadlock (if one believes the documentation on which those answers are based).

Furthermore, if __syncthreads() can't be used inside conditional branches, I'm afraid that many basic operations are blocked, and reduction is just one example.

So how do you do a reduction in CUDA without using __syncthreads() in conditional branches? Or is it a bug in the documentation?

The limitation is not

"__syncthreads cannot be used in conditional branches"

The limitation is

"__syncthreads cannot be used in branches which are not traversed by all threads at the same time"

Notice that in both examples you give, __syncthreads is not covered by a condition that depends on the thread ID (or on some per-thread data). In the first case, blockSize is a template parameter, which does not depend on the thread ID. In the second case, it likewise comes after the if.

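Yes, the for loop's s > 32 is a condition, but it is a condition whose truth value does not depend on the thread or its data in any way. blockDim.x is the same for all threads, and all threads execute exactly the same modifications of s. This means that all threads reach the __syncthreads at exactly the same point of their control flow, which is perfectly OK.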
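To make the uniform-loop argument concrete, here is a minimal sketch of a complete block-level sum. The names blockReduceSum and reduceOnDevice are hypothetical, the block size is assumed to be a power of two, and the loop runs down to s > 0 instead of stopping at s > 32 as the NVIDIA sample does (that sample finishes the last warp separately). Error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: each block sums blockDim.x elements of `in` and
// writes one partial sum to `out`. Assumes blockDim.x is a power of two.
__global__ void blockReduceSum(const float* in, float* out, unsigned int n)
{
    extern __shared__ float sdata[];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + tid;

    // Out-of-range threads load 0 instead of returning early, so every
    // thread of the block still reaches every barrier below.
    sdata[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // `s` starts from blockDim.x and is halved identically by all threads,
    // so the loop condition has the same value for the whole block.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) {                  // thread-dependent branch ...
            sdata[tid] += sdata[tid + s];
        }                               // ... closed before the barrier
        __syncthreads();
    }

    if (tid == 0) out[blockIdx.x] = sdata[0];
}

// Hypothetical host helper: launches the kernel and adds up the per-block
// partial sums on the CPU.
float reduceOnDevice(const std::vector<float>& host)
{
    if (host.empty()) return 0.0f;
    const unsigned int n = static_cast<unsigned int>(host.size());
    const unsigned int threads = 256;   // power of two, as assumed above
    const unsigned int blocks = (n + threads - 1) / threads;

    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, blocks * sizeof(float));
    cudaMemcpy(dIn, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    blockReduceSum<<<blocks, threads, threads * sizeof(float)>>>(dIn, dOut, n);

    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), dOut, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);

    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}
```

The only thread-dependent condition, tid < s, guards the addition alone. The barrier sits in the loop body, where all threads of the block arrive on every iteration.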

The other case, where you cannot use __syncthreads, is a condition which can be true for some threads and false for others. In such a case, you have to close all such conditions before using __syncthreads. So instead of this:

    if (threadIdx.x < some_constant) {
        operation1();
        __syncthreads();
        operation2();
    }

You must do this:

    if (threadIdx.x < some_constant) {
        operation1();
    }
    __syncthreads();
    if (threadIdx.x < some_constant) {
        operation2();
    }

Both of the examples you gave demonstrate this too: the thread-ID-dependent condition is closed before __syncthreads is called.
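As an illustration of closing the condition before the barrier, the sketch below fills in operation1 and operation2 with something concrete: the first `active` threads stage a value in shared memory, then each of them reads its neighbour's value. The names rotateLeftActive and rotateOnDevice, the block size of 128 and the rotation itself are assumptions made up for this example; they are not part of the original question or answer.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: rotates the first `active` elements left by one.
// Only threads with tid < active do work, but all threads hit the barrier.
__global__ void rotateLeftActive(const int* in, int* out, unsigned int active)
{
    extern __shared__ int buf[];
    unsigned int tid = threadIdx.x;

    if (tid < active) {                 // "operation1": stage own element
        buf[tid] = in[tid];
    }                                   // thread-dependent branch closed here

    __syncthreads();                    // reached by every thread in the block

    if (tid < active) {                 // "operation2": read the neighbour's element
        out[tid] = buf[(tid + 1) % active];
    }
}

// Hypothetical host helper: runs one block of 128 threads, of which only
// host.size() are active. Returns an empty vector for unsupported sizes.
std::vector<int> rotateOnDevice(const std::vector<int>& host)
{
    const unsigned int threads = 128;
    if (host.empty() || host.size() > threads) return {};
    const unsigned int active = static_cast<unsigned int>(host.size());

    int* dIn = nullptr;
    int* dOut = nullptr;
    cudaMalloc(&dIn, active * sizeof(int));
    cudaMalloc(&dOut, active * sizeof(int));
    cudaMemcpy(dIn, host.data(), active * sizeof(int), cudaMemcpyHostToDevice);

    rotateLeftActive<<<1, threads, active * sizeof(int)>>>(dIn, dOut, active);

    std::vector<int> result(active);
    cudaMemcpy(result.data(), dOut, active * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

Moving the barrier inside either if block would mean the idle threads (tid >= active) never reach it, which is exactly the situation the limitation rules out.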

