Abstract
One of the key problems in designing and implementing graph analysis algorithms for distributed platforms is finding an optimal way to manage communication flows in a massively parallel processing network. Message‐passing and global synchronization are powerful abstractions in this regard, especially when used in combination. This paper studies the use of a hardware‐implemented refutable global barrier as a design optimization technique aimed at unifying these abstractions at the API level. The paper explores the trade‐offs between the related overheads and performance factors on a message‐passing prototype machine with 49,152 RISC‐V threads distributed over 48 FPGAs (the Partially Ordered Event‐Triggered Systems platform). Our experiments show that some graph applications favour synchronized communication, but the effect is hard to predict in general because of the interplay between multiple hardware and software factors. A classifier model is therefore proposed and implemented to make this prediction from the application graph topology parameters: graph diameter, degree of connectivity, and reconvergence metric. The experimental results demonstrate that the correct choice of communication mode, enabled by the new model‐driven approach, makes computation 3.22 times faster on average than the baseline platform operation.