Papers by Sven-Bodo Scholz
With the rising variety of hardware designs for multi-core systems, the effectiveness of exploiting implicit concurrency of programs plays a more vital role for programming such systems than ever before. We believe that a combination of a data-parallel approach with a declarative programming style is up to that task: data-parallel approaches are known to enable compilers to make efficient use of multi-processors without requiring low-level program annotations. Combining the data-parallel approach with a declarative programming style guarantees semantic equivalence between sequential and concurrent executions of data-parallel operations. Furthermore, the side-effect-free setting and the explicit model of dependencies enable compilers to maximise the size of the data-parallel program sections. However, the rigidity of the declarative approach is both its strength and its weakness: being bound to observe all data dependencies categorically rules out the use of side-effecting operations within data-parallel sections. Not only does this limit the size of these regions in certain situations, but it may also hamper an effective workload distribution. Considering side effects such as plotting individual pixels of an image or output for debugging purposes, there are situations where a non-deterministic order of side effects would not be considered harmful at all. We propose a mechanism for enabling such non-determinism in the execution of side-effecting operations within data-parallel sections without sacrificing the side-effect-free setting in general. Outside of the data-parallel sections we ensure single-threading of side-effecting operations using uniqueness typing. Within data-parallel operations, however, we allow the side-effecting operations of different threads to occur in any order, as long as the effects of different threads are not interleaved.
Furthermore, we still model the dependencies arising from the manipulated states within the data-parallel sections. This measure preserves the explicitness of all data dependencies and therefore the transformational potential of any restructuring compiler.
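The buffering idea can be sketched outside of SaC as well. The following Python sketch is purely illustrative (all names are made up here; the paper's actual mechanism relies on uniqueness typing in SaC): each thread buffers its side effects locally and flushes them atomically, so the effects of different threads may appear in any order but are never interleaved, while the data-parallel result itself stays deterministic.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

flush_lock = threading.Lock()
log = []  # collected effect output (stands in for e.g. debug printing)

def process_chunk(chunk):
    """Element-wise work plus buffered side effects for one thread."""
    buffer = []
    results = []
    for x in chunk:
        results.append(x * x)          # the pure, data-parallel part
        buffer.append(f"debug: {x}")   # side effect, buffered locally
    # Flush the whole buffer atomically: buffers of different threads
    # land in a non-deterministic order, but are never interleaved.
    with flush_lock:
        log.extend(buffer)
    return results

chunks = [[0, 1], [2, 3], [4, 5]]
with ThreadPoolExecutor(max_workers=3) as pool:
    squared = [r for part in pool.map(process_chunk, chunks) for r in part]

print(squared)  # deterministic result: [0, 1, 4, 9, 16, 25]
```

The order of entries in `log` varies between runs, but each thread's two entries always stay adjacent.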
Proceedings of the 29th Symposium on the Implementation and Application of Functional Programming Languages
Recursive value definitions in the context of functional programming languages that are based on a call-by-value semantics are known to be challenging. A lot of prior work exists in the context of languages such as Scheme and OCaml. In this paper, we look at the problem of recursive array definitions within a call-by-value setting. We propose a solution that enables recursive array definitions as long as there are no cyclic dependences between array elements. The paper provides a formal semantics definition, sketches possible compiler implementations and relates these to a prototypical implementation of an interpreter in OCaml. Furthermore, we briefly discuss how this approach could be extended to other data structures and how it could serve as a basis to further extend mutually recursive value definitions in a call-by-value setting in general. CCS CONCEPTS • Software and its engineering → Functional languages; Recursion; Compilers; • Computing methodologies → Parallel programming languages;
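A minimal sketch of the core idea, in Python rather than the paper's formal semantics or its OCaml interpreter: array elements are evaluated on demand with memoisation, and a cyclic dependence between elements is detected and rejected. The function names here are illustrative assumptions, not the paper's API.

```python
def recursive_array(n, spec):
    """Evaluate a recursive array definition element by element.

    `spec(get, i)` defines element i and may read other elements via
    `get(j)`; evaluation is demand-driven, memoised, and rejects any
    cyclic dependence between elements.
    """
    IN_PROGRESS = object()  # marker: element currently being evaluated
    cells = {}

    def get(i):
        if i in cells:
            if cells[i] is IN_PROGRESS:
                raise ValueError(f"cyclic dependence at index {i}")
            return cells[i]
        cells[i] = IN_PROGRESS
        cells[i] = spec(get, i)
        return cells[i]

    return [get(i) for i in range(n)]

# a[i] = i for i < 2, else a[i-1] + a[i-2]  (Fibonacci-style recurrence)
fib = recursive_array(8, lambda get, i: i if i < 2 else get(i - 1) + get(i - 2))
print(fib)  # [0, 1, 1, 2, 3, 5, 8, 13]
```

A definition such as `a[0] = a[1]; a[1] = a[0]` is cyclic and raises an error instead of looping.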
33rd Symposium on Implementation and Application of Functional Languages
Data-parallelism on multi-dimensional arrays can be conveniently specified as element-wise computations that are executed across n-dimensional index spaces. While this at first glance matches very well the concept of n-dimensional thread spaces as provided by the programming model of GPUs, in practice a one-to-one correspondence often does not work out. This paper proposes a small set of combinators for mapping multi-dimensional index spaces onto multi-dimensional thread-index spaces suitable for execution on GPUs. For each combinator, we provide an inverse operation, which allows the original indices to be recovered within the individual threads. This setup enables arbitrary-dimensional array computations to be executed on GPUs with arbitrary thread-space constraints, provided the overall resources required for the computation do not exceed those of the GPU.
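One such combinator pair can be sketched as follows (a Python model, not the paper's actual combinator set): a `flatten` combinator maps an n-dimensional index space onto a 1-dimensional thread space, and its inverse recovers the original n-dimensional index inside each thread.

```python
def flatten(shape):
    """Collapse an n-d index space of `shape` into a 1-d thread space.

    Returns (size, to_thread, to_index): `to_thread` maps an n-d index
    to a flat thread id in row-major order; `to_index` is its inverse.
    """
    size = 1
    for d in shape:
        size *= d

    def to_thread(idx):
        tid = 0
        for i, d in zip(idx, shape):
            tid = tid * d + i
        return tid

    def to_index(tid):
        idx = []
        for d in reversed(shape):      # peel off the innermost axis first
            idx.append(tid % d)
            tid //= d
        return tuple(reversed(idx))

    return size, to_thread, to_index

size, fwd, inv = flatten((3, 4, 5))
assert size == 60
# every original index is recoverable inside "its" thread
assert all(inv(fwd((i, j, k))) == (i, j, k)
           for i in range(3) for j in range(4) for k in range(5))
```

A `split` combinator going the other way (one axis into two, to satisfy a block-size constraint) would follow the same pattern: a forward mapping plus an exact inverse.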
33rd Symposium on Implementation and Application of Functional Languages
Memory management plays a key role when trying to compile functional programs into efficiently executable code. In particular when using flat representations for multi-dimensional arrays, i.e., when using a single memory block for the entire data of a multi-dimensional array, in-place updates become crucial for highly competitive performance. This paper proposes a novel code generation technique for performing fold-operations on hyper-planes of multi-dimensional arrays, where the fold-operation itself operates on non-scalar subarrays, i.e., on vectors or higher-dimensional arrays. This technique allows for a single result array allocation over the entire folding operation without requiring the folding operation itself to be scalarised. It enables the utilisation of vector operations without any added memory allocation or copying overhead. We describe our technique in the context of SaC, sketch our implementation in the context of the compiler sac2c and provide some initial performance measurements that give an indication of the effectiveness of this new technique. CCS CONCEPTS • Software and its engineering → Compilers; Functional languages.
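The effect of a single result allocation can be illustrated with a small Python model (purely illustrative; the actual technique operates on SaC's flat array representation inside sac2c): the fold over the outer dimension reuses one preallocated result vector and writes partial results back in place, instead of allocating a fresh vector per fold step.

```python
def fold_rows_in_place(matrix, op, init):
    """Fold the rows (hyper-planes) of a 2-d array with a vector-valued
    operation, reusing one result buffer for the entire fold."""
    width = len(matrix[0])
    acc = [init] * width              # single allocation for the result
    for row in matrix:                # one pass over the outer dimension
        for j in range(width):        # the non-scalar fold step, applied
            acc[j] = op(acc[j], row[j])  # element-wise, written in place
    return acc

m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(fold_rows_in_place(m, lambda a, b: a + b, 0))  # [12, 15, 18]
```

A naive compilation would instead build a new intermediate vector for every row processed; here only `acc` is ever allocated.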
This poster outlines the speed increase in the inversion part of the BGS global modelling software after rewriting some of it in the SAC language.
This paper investigates how branch and bound algorithms can be implemented in a functional, data-parallel setting. We identify a general programming pattern for such algorithms and we discuss compilation and runtime aspects when it comes to mapping the programming pattern into parallel code. We use the maximum clique problem in undirected graphs as a running example and we present first experiences in the context of SaC.
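The general pattern can be sketched sequentially in Python (a hypothetical illustration, not the paper's data-parallel formulation): a recursive search over the running example, maximum clique, that branches on including or excluding a vertex and prunes whenever the bound cannot beat the incumbent clique.

```python
def max_clique(adj):
    """Branch-and-bound maximum clique on an undirected graph given as
    an adjacency dict {vertex: set of neighbours}."""
    best = []

    def expand(clique, candidates):
        nonlocal best
        # Bound: even taking every remaining candidate cannot win.
        if len(clique) + len(candidates) <= len(best):
            return
        if not candidates:
            best = clique[:]          # new incumbent
            return
        for v in sorted(candidates):
            # Branch 1: put v into the clique; only v's neighbours remain.
            expand(clique + [v], candidates & adj[v])
            # Branch 2: leave v out and re-check the bound.
            candidates = candidates - {v}
            if len(clique) + len(candidates) <= len(best):
                return

    expand([], set(adj))
    return best

g = {0: {1, 2, 3, 4}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0}}
print(sorted(max_clique(g)))  # [0, 1, 2, 3]
```

The two recursive branches are independent, which is what makes the pattern amenable to a data-parallel mapping in the first place.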
Lecture Notes in Computer Science, 1999
SaC is a functional C variant with efficient support for high-level array operations. This paper investigates the applicability of a SaC-specific optimization technique called with-loop-folding to real-world applications. As an example program which originates from the Numerical Aerodynamic Simulation (NAS) Program developed at NASA Ames Research Center, the so-called NAS benchmark MG is chosen. It comprises a kernel from the NAS Program which implements 3-dimensional multigrid relaxation. Several run-time measurements exploit two different benefits of with-loop-folding: First, an overall speed-up of about 20% can be observed. Second, a comparison between the run-times of a hand-optimized specification and of APL-like specifications yields identical run-times, although a naive compilation that does not apply with-loop-folding leads to slowdowns of more than an order of magnitude. Furthermore, with-loop-folding makes a slight variation of the algorithm feasible which substantially simplifies the program specification and requires less memory during execution. Finally, the optimized run-times are compared against run-times gained from the original Fortran program, which shows that for different problem sizes, the code generated from the SaC program not only reaches the execution times of the code generated from the Fortran program but even outperforms them by about 10%.
ACM SIGAPL APL Quote Quad, 1999
Most of the existing high-level array-processing languages support a fixed set of pre-defined array operations and a few higher-order functions for constructing new array operations from existing ones. In this paper, we discuss a more general approach made feasible by SAC (for Single Assignment C), a functional variant of C. SAC provides a meta-level language construct called WITH-loop which may be considered a sophisticated variant of the FORALL-loops in HPF or of array comprehensions in functional languages. It allows for the element-wise specification of high-level operations on arrays of any dimensionality: any set of high-level array operations can be specified by means of WITH-loops and be made available in a library. This not only improves the flexibility of specifications, but also simplifies the compilation process. By means of a few examples it is shown that the high-level operations that are typically available in array processing languages such as APL or FORTRAN 9...
Lecture Notes in Computer Science, 1998
This paper introduces a new compiler optimization called with-loop-folding. It is based on a special loop construct, the with-loop, which in the functional language SaC (for Single Assignment C) serves as a versatile vehicle to describe array operations on an element-wise basis. A general mechanism for combining two of these with-loops into a single loop construct is presented. This mechanism constitutes a powerful tool when it comes to generating efficiently executable code from high-level array specifications. By means of a few examples it is shown that even complex nestings of array operations similar to those available in APL can be transformed into single loop operations which are similar to hand-optimized with-loop specifications. As a consequence, the way a complex array operation is composed from primitive array operations does not affect the runtime performance of the compiled code, i.e., the programmer is liberated from the burden of taking performance considerations into account when specifying complex array operations.
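The effect of the optimization can be illustrated in Python (an analogue only; with-loop-folding operates on SaC's with-loops, not on list comprehensions): folding two element-wise passes into a single one eliminates the intermediate array while preserving the result.

```python
def unfused(a):
    # Two separate element-wise passes, materialising an intermediate
    # array -- analogous to composing two primitive array operations.
    b = [x + 1 for x in a]         # first with-loop-like pass
    return [2 * x for x in b]      # second pass over the intermediate

def fused(a):
    # What folding produces: one pass, no intermediate array.
    return [2 * (x + 1) for x in a]

a = list(range(5))
assert unfused(a) == fused(a) == [2, 4, 6, 8, 10]
```

Because both versions compute the same values, the programmer may write the composed form while the compiler emits the fused one.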
Lecture Notes in Computer Science, 2006
These notes present an introduction to array-based programming from a functional, i.e., side-effect-free perspective. The first part focuses on promoting arrays as the predominant, stateless data structure. This leads to a programming style that favors compositions of generic array operations that manipulate entire arrays over specifications that are made in an element-wise fashion. An algebraically consistent set of such operations is defined and several examples are given demonstrating the expressiveness of the proposed set of operations. The second part shows how such a set of array operations can be defined within the first-order functional array language SaC. It not only discusses the language design issues involved but also tackles implementation issues that are crucial for achieving acceptable runtimes from such generically specified array operations.
Lecture Notes in Computer Science, 2005
A general homomorphic overloading in a first-order type system is discussed. Type inference is applied within predefined classes, each containing an arbitrary first-order subtyping hierarchy. We propose a computationally efficient type inference algorithm by converting the attendant constraint-satisfaction problem into the algebraic path problem for a constraint graph weighted with elements of a specially constructed non-commutative star semiring. The elements of the semiring are monotonic functions from integers to integers (including ±∞), with pointwise maximum and function composition as the semiring operations. The computational efficiency of our method is due to the cubic complexity of Kleene's algebraic path algorithm. Our algorithm is applicable to type inference in the presence of unknown external types and supports distributed type inference.
2008 23rd IEEE/ACM International Conference on Automated Software Engineering - Workshops, 2008
Text-based approaches to the analysis of software evolution are attractive because of the fine-grained, token-level comparisons they can generate. The use of such approaches has, however, been constrained by the lack of an efficient implementation. In this paper we demonstrate the ability of Ferret, which uses n-grams of 3 tokens, to characterise the evolution of software code. Ferret's implementation operates in almost linear time and is at least an order of magnitude faster than the diff tool. Ferret's output can be analysed to reveal several characteristics of software evolution, such as: the lifecycle of a single file, the degree of change between two files, and possible regression. In addition, the similarity scores produced by Ferret can be aggregated to measure larger parts of the system being analysed.
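The kind of score such a trigram-based tool computes can be sketched in a few lines of Python (an illustrative model; Ferret's actual tokenisation and implementation differ): a Jaccard-style resemblance over the sets of token 3-grams of the two files.

```python
def trigrams(tokens):
    """The set of token 3-grams of a token sequence."""
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}

def similarity(text_a, text_b):
    """Jaccard-style resemblance over token trigrams: shared trigrams
    divided by all distinct trigrams of both texts."""
    ta, tb = trigrams(text_a.split()), trigrams(text_b.split())
    if not (ta | tb):
        return 0.0
    return len(ta & tb) / len(ta | tb)

old = "int main ( void ) { return 0 ; }"
new = "int main ( void ) { return 1 ; }"
print(round(similarity(old, new), 3))  # 0.455 -- one changed token
```

A single changed token invalidates only the three trigrams covering it, which is what makes the comparison fine-grained.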
2013 International Conference on High Performance Computing & Simulation (HPCS), 2013
Amorphous Data Parallelism has proven to be a suitable vehicle for implementing concurrent graph algorithms effectively on multi-core architectures. In view of the growing complexity of graph algorithms for information analysis, there is a need to facilitate modular design techniques in the context of Amorphous Data Parallelism. In this paper, we investigate what it takes to formulate algorithms possessing Amorphous Data Parallelism in a modular fashion, enabling a large degree of code re-use. Using the betweenness centrality algorithm, a widely used algorithm in the analysis of social networks, we demonstrate that a single optimisation technique can suffice to enable a modular programming style without losing the efficiency of a tailor-made monolithic implementation.
Proceedings of the Fifth International Plagiarism Conference, 2012
Many academic staff will recognise that unusual shared elements in student submissions trigger suspicion of inappropriate collusion. These elements may be odd phrases, strange constructs, peculiar layout, or spelling mistakes. In this paper we review twenty-nine approaches to source-code plagiarism detection, showing that the majority focus on overall file similarity, and not on unusual shared elements, and that none directly measure these elements. We describe an approach to detecting similarity between files ...
This document explains a method for identifying dense blocks of copied text in pairs of files. The files are compared using Ferret, a copy-detection tool which computes a similarity score based on trigrams. This similarity score cannot determine the arrangement of copied text in a file; two files with the same similarity to another file may have different distributions of matched trigrams. For example, in one file the matched trigrams may form a single large block, while in the other they are scattered throughout the file. However, Ferret produces an XML ...
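The density idea can be sketched in Python (an illustrative model with assumed parameter names, not the method described in the document): given, for each token position, whether the trigram starting there also occurs in the other file, a sliding window reports regions whose match density reaches a threshold, merging overlapping windows into blocks.

```python
def dense_blocks(matched, window=5, threshold=0.6):
    """Find dense runs of copied text.

    `matched[i]` is True when the trigram starting at token i also
    occurs in the other file. A block (start, end) is reported
    whenever a sliding window's match density reaches `threshold`;
    overlapping windows are merged into one block.
    """
    blocks = []
    for i in range(len(matched) - window + 1):
        if sum(matched[i:i + window]) / window >= threshold:
            if blocks and i <= blocks[-1][1]:
                blocks[-1] = (blocks[-1][0], i + window)  # extend block
            else:
                blocks.append((i, i + window))
    return blocks

# one dense copied region around positions 2..7, scattered noise later
m = [False, False, True, True, True, True, True, True,
     False, False, True, False, False, True, False]
print(dense_blocks(m))  # [(0, 11)]
```

The scattered matches near the end never reach the density threshold on their own, so only the contiguous copied region is reported.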