Cover Image for System.Linq.Enumerable+EnumerablePartition`1[System.Char]

Compilation and Automatic Parallelisation of Functional Code for Data-Parallel Architectures

OAI: oai:purehost.bath.ac.uk:publications/df1a79b0-e1ed-48ef-b534-5e9f2060f400

Abstract

Over recent years, there has been a stagnation of the increase in CPU clock speed, and consequently, it has become increasingly popular to offload general-purpose computing problems to graphics processors to try to exploit the massively data-parallel processing capabilities of these devices.This project presents the design of a functional programming language and the implementation of a prototype compiler which aims to produce code that exploits the powerful processing capabilities of data-parallel hardware components, such as CUDA enabled graphics processors. One of the long-term goals is to provide programmers with a tool that simplifies the development of algorithms for parallel architectures.Previous work in the area of automatic parallellisation of code is predominantly concerned with the exploitation of task parallelism in functional languages, such as Lisp and Haskell, and data parallelism in imperative languages, such as Fortran. In the cases where data-parallelism has been exploited in functional languages, e.g., in Data Parallel Haskell, this has mostly been done by introducing library support for CUDA, OpenCL and other data-parallel frameworks.The main focus in the course of this project has been directed towards the optimisation techniques that can be applied to seemingly sequential, functional-style code to prepare it for automatic parallelisation. The pre-eminent transformation in this context is the conversion of augmenting recursion and tail recursion into iteration which, consequently,can enable the translation of iterative constructs into parallel loops, given that there are no loop-carried dependences.Thee compiler strives to identify natural mapping and reduction constructs in sequential code. Furthermore, a dynamic performance model is employed to ensure that only beneficial sections of the code are parallelised. It is concluded from the initial results, that tenfold to hundredfold speed-ups can be achieved from the parallelisation of sequential representations of naturally data-parallel constructs, depending on the pointof comparison.