While optimizing matrix manipulation code in C, I used CilkPlus to spawn a thread so that two functions, which are data independent and fairly computationally intensive, execute in parallel. cilk_spawn is used in only one place in the code, as follows:
    // (test_function declarations)
    cilk_spawn highPrep(d, x, half);

    d = temp_0;
    r = malloc(sizeof(int) * half);
    temp_1 = r;
    x = x_alloc + F_EXTPAD;
    lowPrep(r, d, x, half);

    cilk_sync;
    // test_function return
According to the documentation I have read so far, cilk_spawn is expected to take the highPrep() function and execute it on a different hardware thread if one is available (only "maybe", since CilkPlus does not enforce parallelism). Meanwhile, execution continues with the rest of the code, including lowPrep(), until cilk_sync is reached. At that point the threads synchronize before execution proceeds.
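To make the expected behaviour concrete, here is a minimal, self-contained sketch of the same spawn/sync pattern. The functions high_stub, low_stub and spawn_sync_demo are hypothetical stand-ins, not the real highPrep/lowPrep, and the fallback macros assume the compiler predefines __cilk when CilkPlus is enabled; without it the code simply runs serially.

```c
#include <assert.h>

/* Use the real keywords when CilkPlus is enabled (assumption: the
 * compiler predefines __cilk); otherwise expand them to nothing so
 * the same code runs serially. */
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_spawn
#define cilk_sync
#endif

/* Hypothetical stand-in for highPrep: pairwise differences of x. */
void high_stub(int *d, const int *x, int half)
{
    for (int i = 0; i < half; i++)
        d[i] = x[2 * i + 1] - x[2 * i];
}

/* Hypothetical stand-in for lowPrep: pairwise sums of x. */
void low_stub(int *r, const int *x, int half)
{
    for (int i = 0; i < half; i++)
        r[i] = x[2 * i] + x[2 * i + 1];
}

/* d and r must each hold at least `half` ints, x at least 2 * half. */
void spawn_sync_demo(int *d, int *r, const int *x, int half)
{
    assert(half >= 0);

    /* The spawned call MAY run in parallel with the continuation. */
    cilk_spawn high_stub(d, x, half);

    /* Continuation: runs in the calling function while high_stub is
     * (possibly) executed elsewhere. It writes only to r. */
    low_stub(r, x, half);

    /* Both calls are guaranteed to have finished after this point. */
    cilk_sync;
}
```

The two stubs write to disjoint arrays, which is the data independence I am relying on in the real code.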
The tests are run on a Xeon E5-2680 dedicated to these experiments. When I set the environment variable CILK_NWORKERS to values such as 2, 4, 8, and 16, the execution time of test_function increases as the number of available workers grows beyond 2.
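For anyone wanting to reproduce the measurement, one way it could be taken is sketched below. This is an illustration under assumptions, not necessarily the exact harness used: workers_from_env, best_time_seconds and count_call are hypothetical names, and clock_gettime with CLOCK_MONOTONIC assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Returns the worker count requested via CILK_NWORKERS, or 0 if the
 * variable is unset or does not hold a positive number. */
int workers_from_env(void)
{
    const char *s = getenv("CILK_NWORKERS");
    if (s == NULL)
        return 0;
    long n = strtol(s, NULL, 10);
    return n > 0 ? (int)n : 0;
}

/* Monotonic timestamp in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Calls fn(arg) `repeats` times and returns the fastest run in
 * seconds, or -1.0 if repeats is not positive. */
double best_time_seconds(void (*fn)(void *), void *arg, int repeats)
{
    assert(fn != NULL);
    double best = -1.0;
    for (int i = 0; i < repeats; i++) {
        double t0 = now_seconds();
        fn(arg);
        double dt = now_seconds() - t0;
        if (best < 0.0 || dt < best)
            best = dt;
    }
    return best;
}

/* Trivial placeholder workload: counts how often it was called. */
void count_call(void *arg)
{
    (*(int *)arg)++;
}
```

Taking the minimum over several repetitions reduces noise from the first run, where the runtime may still be starting its workers.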
I would expect the number of available threads not to change anything in the execution of this code. If 2 threads are available, I would expect highPrep to be executed on a thread other than the main one, and any additional threads to remain idle.
Could anyone help me understand what is going wrong here?
Thank you in advance.