Skip to main content

Alex Shafarenko

University of Hertfordshire, Computer Science, Faculty Member

Followers

9

Following

4

Co-authors

3

Public Views

Address: No address

less

Friedrich-Schiller-Universität Jena

Alex Shafarenko

Richard (Rich) Vuduc

Georgia Institute of Technology

University of Hertfordshire

Sven-Bodo Scholz

Rice University

Nguyen Thuy Nga

Dragoș Sbîrlea

Interests

Uploads

Papers by Alex Shafarenko

A Case Study in Coordination Programming: Performance Evaluation of S-Net vs Intel's Concurrent Collections

We present a programming methodology and runtime performance case study comparing the declarative... more We present a programming methodology and runtime performance case study comparing the declarative data flow coordination language S-NET with Intel's Concurrent Collections (CnC). As a coordination language S-NET achieves a near-complete separation of concerns between sequential software components implemented in a separate algorithmic language and their parallel orchestration in an asynchronous data flow streaming network. We investigate the merits of S-NET and CnC with the help of a relevant and non-trivial linear algebra problem: tiled Cholesky decomposition. We describe two alternative S-NET implementations of tiled Cholesky factorization and compare them with two CnC implementations, one with explicit performance tuning and one without, that have previously been used to illustrate Intel CnC. Our experiments on a 48-core machine demonstrate that S-NET manages to outperform CnC on this problem.

Guest Editors’ Editorial: Special Issue on the Second International Workshop on Microgrids

International Journal of Parallel Programming, Oct 27, 2009

Arrays and streams, this is what multi-/manycore computing could soon be all about. Indeed array ... more Arrays and streams, this is what multi-/manycore computing could soon be all about. Indeed array processing is the basis of much DSP, machine graphics and multimedia, while streams are a universal glue that keeps processing pipelines chugging away at full speed. Both possess regularity and both are amenable to heterogeneity. The regularity of arrays is spatial, and of streams temporal. Heterogeneity is a requirement that stems from specialisation of computing and communication resources according to stable patterns of usage. The regularity makes it possible to develop compiler optimisations and hierarchical abstractions to achieve both expressiveness and performance. But it is heterogeneity that gives computing platforms their adaptivity and improves the ability to specialize at run time, when the power of static analyses and optimisations has been exhausted. The second Microgrids Workshop that took place in Hertfordshire in December 2007 explored the issues of regularity and heterogeneity at great length. Out of 2 days of talks, four presentations have been selected for publication as journal papers, for which extended versions have been solicited from the authors. After substantial revisions, we present them here in a special issue of IJPP. The motifs of array computing and stream processing permeate all four publications. The first, "Compilation techniques for high level parallel code" contributed by authors from the AMD Corp., ClearSpeed and XMOS, the latter two being parallel computing specialist ventures, focuses on retargeting existing compilers towards dataparallel computing platforms. This is done by extending the datatypes of conventional C by attributes that indicate whether a particular C statement is to be executed on each processor element or only once by the master thread. For a distributed control statement this introduces heterogeneity, which is the focus of this work. Heterogeneity

RETRAN: a recurrent paradigm for data-parallel computing

Oxford University Press eBooks, Sep 14, 1995

Symmetries in data parallelism

The Computer Journal, May 1, 1995

A Massively Aliasing System for Support of Agent-based Internet Computing

Parallel and Distributed Processing Techniques and Applications, 1999

Mitigation of Patterning Effects at 40 Gb/s by Skewed Channel Pre-Encoding

Through modelling of direct error computation, a reduction of pattern- dependent errors in a stan... more

Linear Support Vector Machines for Error Correction in Optical Data Transmission

Lecture Notes in Computer Science, 2013

Reduction of bit error rates in optical transmission systems is an important task that is difficu... more Reduction of bit error rates in optical transmission systems is an important task that is difficult to achieve. As speeds increase, the difficulty in reducing bit error rates also increases. Channels have differing characteristics, which may change over time, and any error correction employed must be capable of operating at extremely high speeds. In this paper, a linear support vector machine is used to classify large-scale data sets of simulated optical transmission data in order to demonstrate their effectiveness at reducing bit error rates and their adaptability to the specifics of each channel. For the classification, LIBLINEAR is used, which is related to the popular LIBSVM classifier. It is found that is possible to reduce the error rate on a very noisy channel to about 3 bits in a thousand. This is done by a linear separator that can be built in hardware and can operate at the high speed required of an operationally useful decoder.

An adaptive, reconfigurable interconnect for computational clusters

This paper describes the principles of an original adaptive interconnect for a computational clus... more This paper describes the principles of an original adaptive interconnect for a computational cluster. Torus topology (2d or 3d) is used as a basis but nodes are allowed to effectively migrate along the torus cycles. An optoelectronic scheme which makes such migrations possible with only local synchronisation is outlined. Between the instances of migration the interconnect behaves as a direct packet-routing network which constantly monitors its traffic parameters. A decentralised predictive algorithm is applied periodically to decide whether the current topology is consistent with the predominant traffic flow and if it is not, a reconfiguration to a better-matched topology occurs. We present simulation results that show that on some standard computational benchmarks a significant speedup is possible as a result of automatic matching between the effective topology of the application's message-passing infrastructure and that of the interconnect.

Use of F-code as a very high level intermediate laguage for DSP

Lecture Notes in Computer Science, 1997

Block diagram languages provide an effective approach to developing Digital Signal Processing app... more Block diagram languages provide an effective approach to developing Digital Signal Processing applications. The tools that support block diagram languages use existing compilation systems to produce code. The inefficiencies of the compilation systems are compounded with inefficiencies interfacing to them. Generation of intermediate code direct from the block diagram bypasses these inefficiencies. We describe the direct generation of F-code, a

A simple yet accurate neural branch predictor

In this paper, we examine the application of simple neural processing elements to the problem of ... more In this paper, we examine the application of simple neural processing elements to the problem of dynamic branch prediction in high-performance processors. A single neural network model is considered: the Perceptron. We demonstrate that a predictor based on the Perceptron can achieve a prediction accuracy in excess of that given by conventional Two-level Adaptive Predictors and suggest that neural predictors merit further investigation.

Concurrency engineering

This is a discussion paper on a very important topic that is about to become mainstream. It deals... more This is a discussion paper on a very important topic that is about to become mainstream. It deals with the issues of software engineering in concurrent systems. It introduces this topic and illustrates the arguments for a change of perspective. It underlines these arguments with two examples, an asynchronous stream-based programming model and an asynchronous thread-based virtual machine model. Both support concurrency on very different abstractions but both capture similar support for concurrency engineering.

Patterning effects in a WDM RZ-DBPSK SMF/DCF optical transmission at 40Gbit/s channel rate

Optics Communications, Sep 1, 2007

Through extensive direct modelling we quantify the error statistics and patterning effects in a W... more Through extensive direct modelling we quantify the error statistics and patterning effects in a WDM RZ-DBPSK SMF/DCF fibre link using hybrid Raman/ EDFA amplification at 40 Gbit/s channel rate. We examine the BER improvement through skewed channel precoding reducing the frequency of appearance of the triplets 101 and 010 in a long data stream.

Information-Theory Analysis of Skewed Coding for Suppression of Pattern-Dependent Errors in Digital Communications

IEEE Transactions on Communications, Feb 1, 2007

We present information-theory analysis of the tradeoff between bit-error rate improvement and the... more We present information-theory analysis of the tradeoff between bit-error rate improvement and the data-rate loss using skewed channel coding to suppress pattern-dependent errors in digital communications. Without loss of generality, we apply developed general theory to the particular example of a high-speed fiber communication system with a strong patterning effect.

The effect of weak anisotropy of the source on the Kolmogorov spectrum of acoustic turbulence

Soviet physics. Doklady, Jul 1, 1988

Configuring Cloud-Service Interfaces Using Flow Inheritance

arXiv (Cornell University), Oct 25, 2016

Technologies for composition of loosely-coupled web services in a modular and flexible way are in... more Technologies for composition of loosely-coupled web services in a modular and flexible way are in high demand today. On the one hand, the services must be flexible enough to be reused in a variety of contexts. On the other hand, they must be specific enough so that their composition may be provably consistent. The existing technologies (WS-CDL, WSCI and session types) require a behavioural contract associated with each service, which is impossible to derive automatically. Furthermore, neither technology supports flow inheritance: a mechanism that automatically and transparently propagates data through service pipelines. This paper presents a novel mechanism for automatic interface configuration of such services. Instead of checking consistency of the behavioural contracts, our approach focuses solely on that of data formats in the presence of subtyping, polymorphism and flow inheritance. The paper presents a toolchain that automatically derives service interfaces from the code and performs interface configuration taking non-local constraints into account. Although the configuration mechanism is global, the services are compiled separately. As a result, the mechanism does not raise source security issues despite global service availability in adaptable form.

Special issue on Micro-grids – Guest Editor Introduction

International Journal of Parallel Programming, Jun 1, 2006

Adaptive electrical signal post-processing in optical communications systems

Artificial Neural Networks and Intelligent Information Processing, May 1, 2008

Improving bit error rates in optical communication systems is a difficult and important problem. ... more Improving bit error rates in optical communication systems is a difficult and important problem. The error correction must take place at high speed and be extremely accurate. We show the feasibility of using hardware implementable machine learning techniques. This may enable some error correction at the speed required.

A Constraint Satisfaction Method for Configuring Non-local Service Interfaces

Lecture Notes in Computer Science, 2016

Modularity and decontextualisation are core principles of a service-oriented architecture. Howeve... more Modularity and decontextualisation are core principles of a service-oriented architecture. However, the principles are often lost when it comes to an implementation of services, as a result of a rigidly defined service interface. The interface, which defines a data format, is typically specific to a particular context and its change entails significant redevelopment costs. This paper focuses on a twofold problem. On the one hand, the interface description language must be flexible enough for maintaining service compatibility in a variety of different contexts without modification of the service itself. On the other hand, the composition of interfaces in a distributed environment must be provably consistent. The existing approaches for checking compatibility of service choreographies are either inflexible (WS-CDL and WSCI) or require behaviour specification associated with each service, which is often impossible to provide in practice. We present a novel approach for automatic interface configuration in distributed stream-connected components operating as closed-source services (i.e. the behavioural protocol is unknown). We introduce a Message Definition Language (MDL), which can extend the existing interfaces description languages, such as WSDL, with support of subtyping, inheritance and polymorphism. The MDL supports configuration variables that link input and output interfaces of a service and propagate requirements over an application graph. We present an algorithm that solves the interface reconciliation problem using constraint satisfaction that relies on Boolean satisfiability as a subproblem.

RETRAN: a recurrent paradigm for massively parallel array computing

An applicative paradigm of parallel array processing based on recurrence relations and a data-par... more An applicative paradigm of parallel array processing based on recurrence relations and a data-parallel overloading of constants is presented. It is shown that the suggested principle of anti-currying together with introduction of function-based, eager arrays result in a denotational system superior to array extensions of pragmatic languages in that it can exploit spatial symmetries of arrays to unify the notation. The main novelty here is completely asynchronous treatment of arrays of arrow types (arrays of possibly array-valued functions) which lends itself nicely to a massively parallel data-flow implementation with yet static scheduling due to the imposed strictness of the array constructor. The evolution of data is defined in the tradition form of stream transformation.<<ETX>>

Dynamic evaluation strategy for fine-grain data-parallel computing

IEE proceedings, 1996

The placement of elemental operations (as opposed to data) of a data-driven dataparallel computat... more The placement of elemental operations (as opposed to data) of a data-driven dataparallel computation in a network of processors is examined. A fast suboptimal algorithm is proposed for such placement which tends to minimise the overall network load when the computation is essentially nonlocal. The cases of grid, torus and hypercube topology are considered. It is shown that the proposed algorithm, while having moderate computational complexity, demonstrates up to a 50% reduction in required network throughput over some straightforward placement schemes in the practical range of network sizes.

A Case Study in Coordination Programming: Performance Evaluation of S-Net vs Intel's Concurrent Collections

We present a programming methodology and runtime performance case study comparing the declarative... more We present a programming methodology and runtime performance case study comparing the declarative data flow coordination language S-NET with Intel's Concurrent Collections (CnC). As a coordination language S-NET achieves a near-complete separation of concerns between sequential software components implemented in a separate algorithmic language and their parallel orchestration in an asynchronous data flow streaming network. We investigate the merits of S-NET and CnC with the help of a relevant and non-trivial linear algebra problem: tiled Cholesky decomposition. We describe two alternative S-NET implementations of tiled Cholesky factorization and compare them with two CnC implementations, one with explicit performance tuning and one without, that have previously been used to illustrate Intel CnC. Our experiments on a 48-core machine demonstrate that S-NET manages to outperform CnC on this problem.

Guest Editors’ Editorial: Special Issue on the Second International Workshop on Microgrids

International Journal of Parallel Programming, Oct 27, 2009

Arrays and streams, this is what multi-/manycore computing could soon be all about. Indeed array ... more Arrays and streams, this is what multi-/manycore computing could soon be all about. Indeed array processing is the basis of much DSP, machine graphics and multimedia, while streams are a universal glue that keeps processing pipelines chugging away at full speed. Both possess regularity and both are amenable to heterogeneity. The regularity of arrays is spatial, and of streams temporal. Heterogeneity is a requirement that stems from specialisation of computing and communication resources according to stable patterns of usage. The regularity makes it possible to develop compiler optimisations and hierarchical abstractions to achieve both expressiveness and performance. But it is heterogeneity that gives computing platforms their adaptivity and improves the ability to specialize at run time, when the power of static analyses and optimisations has been exhausted. The second Microgrids Workshop that took place in Hertfordshire in December 2007 explored the issues of regularity and heterogeneity at great length. Out of 2 days of talks, four presentations have been selected for publication as journal papers, for which extended versions have been solicited from the authors. After substantial revisions, we present them here in a special issue of IJPP. The motifs of array computing and stream processing permeate all four publications. The first, "Compilation techniques for high level parallel code" contributed by authors from the AMD Corp., ClearSpeed and XMOS, the latter two being parallel computing specialist ventures, focuses on retargeting existing compilers towards dataparallel computing platforms. This is done by extending the datatypes of conventional C by attributes that indicate whether a particular C statement is to be executed on each processor element or only once by the master thread. For a distributed control statement this introduces heterogeneity, which is the focus of this work. Heterogeneity

RETRAN: a recurrent paradigm for data-parallel computing

Oxford University Press eBooks, Sep 14, 1995

Symmetries in data parallelism

The Computer Journal, May 1, 1995

A Massively Aliasing System for Support of Agent-based Internet Computing

Parallel and Distributed Processing Techniques and Applications, 1999

Mitigation of Patterning Effects at 40 Gb/s by Skewed Channel Pre-Encoding

Through modelling of direct error computation, a reduction of pattern- dependent errors in a stan... more

Linear Support Vector Machines for Error Correction in Optical Data Transmission

Lecture Notes in Computer Science, 2013

Reduction of bit error rates in optical transmission systems is an important task that is difficu... more Reduction of bit error rates in optical transmission systems is an important task that is difficult to achieve. As speeds increase, the difficulty in reducing bit error rates also increases. Channels have differing characteristics, which may change over time, and any error correction employed must be capable of operating at extremely high speeds. In this paper, a linear support vector machine is used to classify large-scale data sets of simulated optical transmission data in order to demonstrate their effectiveness at reducing bit error rates and their adaptability to the specifics of each channel. For the classification, LIBLINEAR is used, which is related to the popular LIBSVM classifier. It is found that is possible to reduce the error rate on a very noisy channel to about 3 bits in a thousand. This is done by a linear separator that can be built in hardware and can operate at the high speed required of an operationally useful decoder.

An adaptive, reconfigurable interconnect for computational clusters

This paper describes the principles of an original adaptive interconnect for a computational clus... more This paper describes the principles of an original adaptive interconnect for a computational cluster. Torus topology (2d or 3d) is used as a basis but nodes are allowed to effectively migrate along the torus cycles. An optoelectronic scheme which makes such migrations possible with only local synchronisation is outlined. Between the instances of migration the interconnect behaves as a direct packet-routing network which constantly monitors its traffic parameters. A decentralised predictive algorithm is applied periodically to decide whether the current topology is consistent with the predominant traffic flow and if it is not, a reconfiguration to a better-matched topology occurs. We present simulation results that show that on some standard computational benchmarks a significant speedup is possible as a result of automatic matching between the effective topology of the application's message-passing infrastructure and that of the interconnect.

Use of F-code as a very high level intermediate laguage for DSP

Lecture Notes in Computer Science, 1997

Block diagram languages provide an effective approach to developing Digital Signal Processing app... more Block diagram languages provide an effective approach to developing Digital Signal Processing applications. The tools that support block diagram languages use existing compilation systems to produce code. The inefficiencies of the compilation systems are compounded with inefficiencies interfacing to them. Generation of intermediate code direct from the block diagram bypasses these inefficiencies. We describe the direct generation of F-code, a

A simple yet accurate neural branch predictor

In this paper, we examine the application of simple neural processing elements to the problem of ... more In this paper, we examine the application of simple neural processing elements to the problem of dynamic branch prediction in high-performance processors. A single neural network model is considered: the Perceptron. We demonstrate that a predictor based on the Perceptron can achieve a prediction accuracy in excess of that given by conventional Two-level Adaptive Predictors and suggest that neural predictors merit further investigation.

Concurrency engineering

This is a discussion paper on a very important topic that is about to become mainstream. It deals... more This is a discussion paper on a very important topic that is about to become mainstream. It deals with the issues of software engineering in concurrent systems. It introduces this topic and illustrates the arguments for a change of perspective. It underlines these arguments with two examples, an asynchronous stream-based programming model and an asynchronous thread-based virtual machine model. Both support concurrency on very different abstractions but both capture similar support for concurrency engineering.

Patterning effects in a WDM RZ-DBPSK SMF/DCF optical transmission at 40Gbit/s channel rate

Optics Communications, Sep 1, 2007

Through extensive direct modelling we quantify the error statistics and patterning effects in a W... more Through extensive direct modelling we quantify the error statistics and patterning effects in a WDM RZ-DBPSK SMF/DCF fibre link using hybrid Raman/ EDFA amplification at 40 Gbit/s channel rate. We examine the BER improvement through skewed channel precoding reducing the frequency of appearance of the triplets 101 and 010 in a long data stream.

Information-Theory Analysis of Skewed Coding for Suppression of Pattern-Dependent Errors in Digital Communications

IEEE Transactions on Communications, Feb 1, 2007

We present information-theory analysis of the tradeoff between bit-error rate improvement and the... more We present information-theory analysis of the tradeoff between bit-error rate improvement and the data-rate loss using skewed channel coding to suppress pattern-dependent errors in digital communications. Without loss of generality, we apply developed general theory to the particular example of a high-speed fiber communication system with a strong patterning effect.

The effect of weak anisotropy of the source on the Kolmogorov spectrum of acoustic turbulence

Soviet physics. Doklady, Jul 1, 1988

Configuring Cloud-Service Interfaces Using Flow Inheritance

arXiv (Cornell University), Oct 25, 2016

Technologies for composition of loosely-coupled web services in a modular and flexible way are in... more Technologies for composition of loosely-coupled web services in a modular and flexible way are in high demand today. On the one hand, the services must be flexible enough to be reused in a variety of contexts. On the other hand, they must be specific enough so that their composition may be provably consistent. The existing technologies (WS-CDL, WSCI and session types) require a behavioural contract associated with each service, which is impossible to derive automatically. Furthermore, neither technology supports flow inheritance: a mechanism that automatically and transparently propagates data through service pipelines. This paper presents a novel mechanism for automatic interface configuration of such services. Instead of checking consistency of the behavioural contracts, our approach focuses solely on that of data formats in the presence of subtyping, polymorphism and flow inheritance. The paper presents a toolchain that automatically derives service interfaces from the code and performs interface configuration taking non-local constraints into account. Although the configuration mechanism is global, the services are compiled separately. As a result, the mechanism does not raise source security issues despite global service availability in adaptable form.

Special issue on Micro-grids – Guest Editor Introduction

International Journal of Parallel Programming, Jun 1, 2006

Adaptive electrical signal post-processing in optical communications systems

Artificial Neural Networks and Intelligent Information Processing, May 1, 2008

Improving bit error rates in optical communication systems is a difficult and important problem. ... more Improving bit error rates in optical communication systems is a difficult and important problem. The error correction must take place at high speed and be extremely accurate. We show the feasibility of using hardware implementable machine learning techniques. This may enable some error correction at the speed required.

A Constraint Satisfaction Method for Configuring Non-local Service Interfaces

Lecture Notes in Computer Science, 2016

Modularity and decontextualisation are core principles of a service-oriented architecture. Howeve... more Modularity and decontextualisation are core principles of a service-oriented architecture. However, the principles are often lost when it comes to an implementation of services, as a result of a rigidly defined service interface. The interface, which defines a data format, is typically specific to a particular context and its change entails significant redevelopment costs. This paper focuses on a twofold problem. On the one hand, the interface description language must be flexible enough for maintaining service compatibility in a variety of different contexts without modification of the service itself. On the other hand, the composition of interfaces in a distributed environment must be provably consistent. The existing approaches for checking compatibility of service choreographies are either inflexible (WS-CDL and WSCI) or require behaviour specification associated with each service, which is often impossible to provide in practice. We present a novel approach for automatic interface configuration in distributed stream-connected components operating as closed-source services (i.e. the behavioural protocol is unknown). We introduce a Message Definition Language (MDL), which can extend the existing interfaces description languages, such as WSDL, with support of subtyping, inheritance and polymorphism. The MDL supports configuration variables that link input and output interfaces of a service and propagate requirements over an application graph. We present an algorithm that solves the interface reconciliation problem using constraint satisfaction that relies on Boolean satisfiability as a subproblem.

RETRAN: a recurrent paradigm for massively parallel array computing

An applicative paradigm of parallel array processing based on recurrence relations and a data-par... more An applicative paradigm of parallel array processing based on recurrence relations and a data-parallel overloading of constants is presented. It is shown that the suggested principle of anti-currying together with introduction of function-based, eager arrays result in a denotational system superior to array extensions of pragmatic languages in that it can exploit spatial symmetries of arrays to unify the notation. The main novelty here is completely asynchronous treatment of arrays of arrow types (arrays of possibly array-valued functions) which lends itself nicely to a massively parallel data-flow implementation with yet static scheduling due to the imposed strictness of the array constructor. The evolution of data is defined in the tradition form of stream transformation.<<ETX>>

Dynamic evaluation strategy for fine-grain data-parallel computing

IEE proceedings, 1996

The placement of elemental operations (as opposed to data) of a data-driven dataparallel computat... more The placement of elemental operations (as opposed to data) of a data-driven dataparallel computation in a network of processors is examined. A fast suboptimal algorithm is proposed for such placement which tends to minimise the overall network load when the computation is essentially nonlocal. The cases of grid, torus and hypercube topology are considered. It is shown that the proposed algorithm, while having moderate computational complexity, demonstrates up to a 50% reduction in required network throughput over some straightforward placement schemes in the practical range of network sizes.