C++ - Busy loop slows down latency-critical computation


My code does the following:

  1. Do a long-running, intense computation (called "useless" below)
  2. Do a small latency-critical task

I find that the time it takes to execute the latency-critical task is higher with the long-running computation than without it.

Here is stand-alone C++ code to reproduce this effect:

    #include <stdio.h>
    #include <stdint.h>

    #define LEN 128
    #define USELESS 1000000000
    //#define USELESS 0

    // Read the timestamp counter
    static inline long long get_cycles()
    {
            unsigned low, high;
            unsigned long long val;
            asm volatile ("rdtsc" : "=a" (low), "=d" (high));
            val = high;
            val = (val << 32) | low;
            return val;
    }

    // Compute a simple hash
    static inline uint32_t hash(uint32_t *arr, int n)
    {
            uint32_t ret = 0;
            for(int i = 0; i < n; i++) {
                    ret = (ret + (324723947 + arr[i])) ^ 93485734985;
            }
            return ret;
    }

    int main()
    {
            uint32_t sum = 0;       // For adding dependencies
            uint32_t arr[LEN];      // We'll compute the hash of this array

            for(int iter = 0; iter < 3; iter++) {
                    // Create a new array to hash for this iteration
                    for(int i = 0; i < LEN; i++) {
                            arr[i] = (iter + i);
                    }

                    // Intense computation
                    for(int useless = 0; useless < USELESS; useless++) {
                            sum += (sum + useless) * (sum + useless);
                    }

                    // Latency-critical task
                    long long start_cycles = get_cycles() + (sum & 1);
                    sum += hash(arr, LEN);
                    long long end_cycles = get_cycles() + (sum & 1);

                    printf("iteration %d cycles: %lld\n", iter, end_cycles - start_cycles);
            }
    }

When compiled with -O3 and the "useless" loop count set to 1 billion, the three iterations took 588, 4184, and 536 cycles, respectively. When compiled with the loop count set to 0, the iterations took 394, 358, and 362 cycles, respectively.

Why could this (particularly the 4184 cycles) be happening? I suspected cache misses or branch mispredictions induced by the intense computation. However, without the intense computation, the zeroth iteration of the latency-critical task is pretty fast, so I don't think a cold cache or branch predictor is the cause.

Moving my speculative comment to an answer:

It is possible that while the busy loop is running, other tasks on the server are pushing the cached arr data out of the L1 cache, so that the first memory access in hash needs to reload it from a lower-level cache. Without the compute loop this wouldn't happen. You could try moving the arr initialization to after the computation loop, just to see what the effect is.
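
The experiment suggested above could be sketched roughly as follows. This is only a minimal sketch under some assumptions, not the asker's code: the names `fill_array`, `hash_array`, `time_hash_ns` and the `init_after_busy_loop` flag are made up for illustration, and `std::chrono::steady_clock` stands in for `rdtsc` so that the snippet does not depend on x86 inline assembly. It therefore reports nanoseconds rather than cycles.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

constexpr int LEN = 128;

// Same simple hash as in the question (hash_array is a hypothetical name).
static inline uint32_t hash_array(const uint32_t *arr, int n)
{
        assert(n >= 0);  // guard against a negative length
        uint32_t ret = 0;
        for (int i = 0; i < n; i++) {
                ret = static_cast<uint32_t>((ret + (324723947u + arr[i])) ^ 93485734985ULL);
        }
        return ret;
}

// Fill the array the same way the question does for a given iteration.
static void fill_array(uint32_t *arr, int iter)
{
        for (int i = 0; i < LEN; i++) {
                arr[i] = static_cast<uint32_t>(iter + i);
        }
}

// Run one iteration and return the nanoseconds spent in the hash.
// init_after_busy_loop chooses whether arr is written before the busy
// loop (as in the question) or after it (as the answer suggests trying).
long long time_hash_ns(int iter, uint32_t busy_iters,
                       bool init_after_busy_loop, uint32_t &sum)
{
        uint32_t arr[LEN];
        if (!init_after_busy_loop) fill_array(arr, iter);

        // Intense computation, unsigned so that overflow wraps around
        for (uint32_t u = 0; u < busy_iters; u++) {
                sum += (sum + u) * (sum + u);
        }

        if (init_after_busy_loop) fill_array(arr, iter);

        // Latency-critical task (steady_clock is an assumption, not rdtsc)
        auto start = std::chrono::steady_clock::now();
        sum += hash_array(arr, LEN);
        auto end = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
}
```

Calling `time_hash_ns(iter, 1000000000, false, sum)` for a few iterations and then repeating with `true` would show whether the slow iteration goes away when `arr` is written just before it is hashed. If it does, eviction of `arr` during the busy loop becomes a more plausible explanation; if not, something else is probably responsible.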

