I compiled the new release of gcc-5.1 with the Cilkplus parallel processing extensions and runtime library for the ARMv7 architecture on the Raspberry Pi 2B single-board computer. Two changes were needed.
The first change corrects a typo in generic/cilk-abi-vla.c by changing the second-to-last line of the file from
vla_internal_heap_free(t, full_size);
to
vla_internal_heap_free(p, full_size);
The second change is to generic/os-fence.c and is ARM-specific. Comment out the line
COMMON_SYSDEP void __cilkrts_fence(void); ///< MFENCE instruction
as
// COMMON_SYSDEP void __cilkrts_fence(void); ///< MFENCE instruction
and then add the define
#define __cilkrts_fence() __asm__ volatile ("DSB")
right above it.
I've been testing the resulting build and getting reasonable parallel speedup on 4 cores for a number of algorithms. My results are posted on the Raspberry Pi forum under the topic "Programming C/C++" in the thread "Cilkplus on RPi2B."
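For anyone who wants to try the fence idea outside the runtime, here is a minimal standalone sketch. It is not the contents of os-fence.c: the names demo_fence, publish, consume and roundtrip are hypothetical, the "memory" clobber is my addition to the macro shown above (it stops the compiler from reordering memory accesses across the barrier), and the fallback to the GCC builtin __sync_synchronize() on non-ARM targets is an assumption made only so the sketch compiles everywhere.

```c
#include <assert.h>

/* Hypothetical standalone sketch; names here are not from the Cilk runtime. */
#if defined(__ARM_ARCH) && __ARM_ARCH >= 7
/* ARMv7 and later: data synchronization barrier over the full system.
   The "memory" clobber also acts as a compiler-level barrier. */
#  define demo_fence() __asm__ volatile ("dsb sy" ::: "memory")
#else
/* Assumption: on any other target, use GCC's full memory barrier builtin. */
#  define demo_fence() __sync_synchronize()
#endif

static volatile int payload;  /* data being handed over */
static volatile int ready;    /* flag announcing that payload is valid */

/* Writer side: store the data, fence, then raise the flag. */
static void publish(int value) {
    payload = value;
    demo_fence();
    ready = 1;
}

/* Reader side: return -1 if nothing was published yet,
   otherwise fence and read the data. */
static int consume(void) {
    if (!ready) {
        return -1;
    }
    demo_fence();
    return payload;
}

/* Single-threaded self-check: publish a value and read it back. */
static int roundtrip(int value) {
    publish(value);
    int got = consume();
    assert(got == value);
    return got;
}
```

Calling roundtrip(42) from a small main() only confirms that the macro assembles and executes on the target; it does not demonstrate ordering between cores, which needs a multi-threaded test.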
It appears that cilk_spawn, cilk_sync and cilk_for are running without errors; however, I've not optimized the stack-swapping code in generic/ as has been done for Intel-architecture CPUs.
Is anyone working on this?
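For reference, a small smoke test of the three keywords mentioned above could look like the sketch below. It is an illustration, not the code behind my posted results. The function names fib and fill_fib are hypothetical. It assumes GCC defines __cilk when -fcilkplus is given; without that flag the keywords are defined away (the serial elision), so the same file also builds as plain C99.

```c
#ifdef __cilk
#include <cilk/cilk.h>
#else
/* Serial elision: without Cilk Plus, the keywords reduce to plain C. */
#define cilk_spawn
#define cilk_sync
#define cilk_for for
#endif

/* Recursive Fibonacci: one branch is spawned, the other runs in the
   current strand, and cilk_sync waits for the spawned branch. */
static long fib(int n) {
    if (n < 2) {
        return n;
    }
    long a = cilk_spawn fib(n - 1);
    long b = fib(n - 2);
    cilk_sync;
    return a + b;
}

/* Fill out[0..count-1] with Fibonacci numbers using a parallel loop. */
static void fill_fib(long *out, int count) {
    cilk_for (int i = 0; i < count; ++i) {
        out[i] = fib(i);
    }
}
```

Built with something like gcc -std=c99 -fcilkplus and linked against the Cilk runtime, fib(20) should return 6765 whether it runs on one worker or four; only the timing should differ.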