Twelfth Annual IEEE International ASIC/SOC Conference (Cat. No.99TH8454)
A novel methodology for realizing Globally-Asynchronous Locally-Synchronous (GALS) architectures is reported. We developed a library of predesigned modules that facilitate the assembly of independently clocked modules to on-chip systems. The components of this library establish high-performance data exchange channels which are instrumental in constructing flexible architectures. The validity of our concept is proven by applying it to an ASIC design
Embedded cores are gaining widespread use in complex DSP systems where flexibility is of utmost importance. The design of such a system poses several problems that are not addressed by existing methodology. The authors previously presented an integrated grammar-based DSP design methodology that separates architectural and functional specification, can create a virtual prototype and has
ISCAS'99. Proceedings of the 1999 IEEE International Symposium on Circuits and Systems VLSI (Cat. No.99CH36349), 1999
Clock nets are the major source of power consumption in large, high-performance ASICs and a design bottleneck when it comes to tolerable clock skew. One way to eliminate the global clock net is to partition the design into large synchronous blocks, each with its own clock. Data is exchanged with other blocks asynchronously using handshake signals. Adopting such a strategy requires a methodology that supports: 1) a partitioning method that divides a design into a number of synchronous blocks such that the gain from removing the global clock net exceeds the communication overhead, and 2) synthesis of handshake protocols to implement the data transfer between synchronous blocks. We describe this methodology and present results of applying it to a realistic design done in 0.25 micron, ranging in operating frequencies from 20 MHz to 1 GHz. The results show that the net power savings compared to fully synchronous designs are about 30% on average.
We present an analysis of a fully automatic method to accelerate standard software in C or C++ by use of field programmable gate arrays. Traditional compiler techniques are applied to the hardware/software partitioning problem, and a compiler is linked to state-of-the-art hardware synthesis tools. Time-critical regions are identified by means of profiling and are automatically implemented in user programmable logic with high-level and logic synthesis design tools. The underlying architecture is an add-on board with user programmable logic connected to a SPARC-based workstation via the system bus. We present an analysis and case study of this method. Eight programs are used as test cases, and the data collected by applying this method to them is used to discuss the potential and limitations of this and similar methods. We discuss architectural parameters, programming language properties, and analysis techniques
High Level Synthesis has made it possible to describe designs at behavioural level in languages like VHDL and to synthesize detailed circuits automatically. Recently, new ways of describing and synthesizing control-dominated communicating hardware and protocols have emerged to challenge HLS as the method for the future: hardware synthesis from grammar-based specifications. Grammar-based specifications allow the designer to specify the behaviour of control-dominated designs at a high level in terms of sequences of incoming symbols that together form a valid input sentence. When a meaningful sequence has been detected, the associated action is performed. In this case study, we compare the results obtained using an in-house hardware compiler for a grammar-based specification language called ProGram with the results obtained using three other modern methods: 1) the in-house CMIST approach to HLS, 2) conventional HLS and 3) direct synthesis of behavioural RTL VHDL.
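The idea of firing an action once a valid sequence of input symbols has been seen can be illustrated with a small software sketch. This is not ProGram syntax and not the hardware compiler described in the abstract; the function `run_recognizer`, the rule table, and the symbol names (`SOF`, `ADDR`, `DATA`, `ACK`) are hypothetical names chosen only for illustration.

```python
def run_recognizer(rules, symbols):
    """Fire an action each time a complete 'sentence' of symbols is seen.

    rules   : dict mapping a tuple of symbols (a sentence) to an action name
              (hypothetical format, not ProGram syntax)
    symbols : iterable of incoming symbols
    Returns the list of action names in the order they fired.
    """
    fired = []
    buffer = []
    for sym in symbols:
        buffer.append(sym)
        key = tuple(buffer)
        if key in rules:
            # A full sentence was recognised: perform the associated action.
            fired.append(rules[key])
            buffer = []
        elif not any(rule[:len(key)] == key for rule in rules):
            # No sentence starts with what we have seen so far.
            # This simple sketch just discards the partial input.
            buffer = []
    return fired


# Hypothetical protocol: two sentences, each bound to one action.
rules = {
    ("SOF", "ADDR", "DATA"): "store",
    ("SOF", "ACK"): "release",
}
stream = ["SOF", "ADDR", "DATA", "SOF", "ACK", "ADDR", "SOF", "ACK"]
actions = run_recognizer(rules, stream)
```

A hardware realisation would typically turn such rules into a state machine; the dictionary lookup here only stands in for that behaviour.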
In this paper we show how a High Level Synthesis (HLS) tool can be used efficiently for DSP ASIC development. The performance of a general HLS tool is improved with simple transformations and code optimizations, and a direct mapping to a technology-optimized, parameterizable ASIC Register Transfer Level (RTL) library. The library mapping contains three phases: structure recognition, architecture selection and parameter optimization. As an optimization framework, the SYNT, Synopsys and Matlab design environments are integrated. Lsi10k and Xilinx 4000 series are used as target technologies to demonstrate the performance of the approach
Even though high-level and logic synthesis tools can speed up the design process, the growing complexity of designs together with the number of tradeoffs possible during synthesis makes it impossible to exhaustively search the design space. Estimation tools that predict the impact of tentative design decisions can be extremely beneficial in reducing the iteration time. Besides serving as exploratory tools, estimators can also act as a potent guide for the synthesis tools. We present a novel approach to estimation based on a multilayer feed-forward neural network. The main benefit of a neural network based estimator over a predefined analytical function is that the estimation function is built during the learning process, which allows the neural net to learn arbitrarily complex estimation functions. This flexible approach allows tuning to varying application areas and design environments.
Overview. Part I: Introduction, Axel Jantsch, KTH. Part II: Physical Issues in NOCs, Li-Rong Zheng, KTH. Part III: Introduction to concepts in parallel computing, Martti Forsell, VTT. Part IV: NOC Architecture, Axel Jantsch, KTH. Part V: A NOC Design Methodology, Juha Pekka Soininen, VTT. (A. Jantsch, KTH, ESSCIRC, September 2001, Villach.) The Challenge (# Gates, # Processors, Year): 6 M, 4, 2000; 24 M, 16, 2003; 96 M, 64, 2006; 384 M, 256, 2009. Functions, Architecture, and Physics: concurrent processes, large number of resources, physical issues. Challenge Areas: Physical Issues: deep submicron effects, nois
An efficient hardware implementation of a Gaussian Random Number (GRN) generator based on the Central Limit Theorem (CLT) is presented. CLT, although very simple to implement, is never used to generate high quality Gaussian numbers. This is because a direct implementation of CLT provides very poor accuracy in the tail regions of the probability density function. In this work, we have shown that it is possible to achieve high tail accuracy by empirically computing the error in CLT, which can be compensated with a simple correction algorithm. The error has been modeled as a first-degree piecewise polynomial approximation, using a novel non-uniform segmentation algorithm to compute the coefficients of the polynomial segments. A novel hardware architecture of the GRN generator is presented which requires only 420 slices and 1 DSP block of a Xilinx Virtex-4 XC4VLX15 operating at 220 MHz. This resource utilization is better than any of the previously reported designs. Demonstrated for the tail accu...
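A minimal sketch of the plain, uncorrected CLT method may help show why its tails are poor. It sums a fixed number of uniform samples and normalises the result; the error correction and segmentation algorithm from the abstract are not reproduced here. The function names and the default of 12 uniforms are assumptions made for illustration.

```python
import random


def clt_gaussian(rng, n_uniform=12):
    """Approximate one standard-normal sample by summing uniforms (plain CLT).

    n_uniform is an illustrative choice, not a value taken from the paper.
    """
    total = sum(rng.random() for _ in range(n_uniform))
    mean = n_uniform / 2.0             # mean of the sum of U(0,1) samples
    std = (n_uniform / 12.0) ** 0.5    # each U(0,1) has variance 1/12
    return (total - mean) / std


def max_reachable_sigma(n_uniform=12):
    """Largest magnitude (in standard deviations) the plain CLT sum can produce."""
    return (n_uniform / 2.0) / (n_uniform / 12.0) ** 0.5


rng = random.Random(0)
samples = [clt_gaussian(rng) for _ in range(1000)]
```

Because the sum of a finite number of bounded uniforms is itself bounded, the output can never exceed `max_reachable_sigma(n_uniform)` standard deviations, so the true Gaussian tails beyond that point are missing entirely.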
... of Electronics.). Hemani, Ahmed. Ellervee, Peeter. Postula, Adam (Univ. of Queensland). Öberg, Johnny. Jantsch, Axel. Tenhunen, Hannu (KTH, Superseded Departments, Electronic Systems Design). Title: Modelling and Synthesis of Operational and Management System (OAM ...
Proceedings of the IEEE International Conference on VLSI Design
Communication sub-systems that deal with switching, routing and protocol implementation often have their functionality dominated by control logic and interaction with memory. Synthesis of such Control and Memory Intensive Systems (hereafter abbreviated to CMISTs) poses demands that in the past have not been met satisfactorily by general purpose high-level synthesis (HLS) tools and have led to several research efforts to address these demands. In this paper we: characterise CMISTs from the synthesis viewpoint; contend that the synthesis demands of CMISTs can be met within the framework of a general purpose HLS tool by making parts of it adaptive to the input, rather than by developing a complete tool for a particular type of application; present an allocation strategy that automatically adapts for CMISTs; present the Operation and Maintenance (OAM) Protocol of the ATM, its modelling in VHDL and synthesis aspects of the VHDL model; present the results of applying the synth...
We present a grammar-based specification method for hardware synthesis of data communication protocols in which the specification is independent of the port size. Instead, the port size is used during the synthesis process as a constraint. When the width of the output assignments exceeds the chosen output port width, the assignments are split and scheduled over the available states. We present a solution to this problem and results of applying it to some relevant problems.
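The splitting step can be pictured with a small sketch: an assignment wider than the port is cut into port-sized chunks, one per state. This illustrates the idea only and is not the scheduling algorithm from the paper; the function name `split_assignment` and the most-significant-chunk-first order are assumptions.

```python
def split_assignment(value, width, port_width):
    """Split a `width`-bit value into port-sized chunks, one per state.

    Chunks are returned most significant first (an assumed ordering).
    `value` is assumed to fit in `width` bits.
    """
    if width <= 0 or port_width <= 0:
        raise ValueError("width and port_width must be positive")
    n_chunks = -(-width // port_width)   # ceiling division
    padded_width = n_chunks * port_width
    mask = (1 << port_width) - 1
    return [
        (value >> shift) & mask
        for shift in range(padded_width - port_width, -1, -port_width)
    ]


# A 16-bit assignment sent through an 8-bit port takes two states.
chunks = split_assignment(0xABCD, 16, 8)
```

When the width is not a multiple of the port width, this sketch zero-pads the most significant chunk; a real synthesis tool could choose a different alignment.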
The emergence of intellectual property blocks and embedded cores has made interface design one of the crucial design problems. The validation of these interfaces can often only be done at the system level, when all the involved blocks are developed and ...
We propose a conceptual framework, called the Rugby Model, in which designs, design processes and design tools can be studied. The model has similar objectives to the well-known Y chart (1), but its scope is extended to handle designs and design processes for complex systems requiring concurrent processes and mixed HW/SW implementation. The Rugby model has four domains, namely Computation, Communication, Data and Time. The behavioural domain of the Y chart is replaced with a more restricted Computation domain. The structural and physical domains of the Y chart are merged into a more generic domain called Communication. The new domains Data and Time have become necessary to model data abstractions used at various levels of design, and to explicitly model timing constraints at various levels in the design process, respectively. We show that the Rugby model is able to represent mixed HW/SW designs and design processes for HW/SW codesign at various levels of abstraction. It no...
We propose a conceptual framework, called the Rugby Model, in which designs, design processes and design tools can be studied. It is an extension of the Y chart and adds two dimensions for design representation, namely Data and Time. The behavioural domain of the Y chart is replaced by a more restricted domain called Computation. The structural and physical domains of the Y chart are merged into a more general domain called Communication. A fifth dimension deals with design manipulations and transformations at three abstraction levels. The model is intended to establish a common understanding of modelling and design process concepts for communication and education in the community. In a case study we illustrate how a design can be characterized with the concepts of the Rugby model.
In this paper, we describe the design of an Internal Representation for our Hardware-Software codesign environment. We envisage that our environment will allow specification in a mix of many languages like SDL, Matlab, C etc. The tool is expected to integrate synthesis, partitioning, co-simulation, testing and formal verification, performance
Papers by Ahmed Hemani