Proceedings of the 1989 26th ACM/IEEE conference on Design automation conference - DAC '89
Synthesis of digital systems involves a number of tasks, ranging from scheduling to generating interconnections.
Proceedings Design, Automation and Test in Europe Conference and Exhibition
In the context of portable embedded systems, reducing energy consumption is one of the prime objectives. Most high-end embedded microprocessors include on-chip instruction and data caches, along with a small, energy-efficient scratchpad. Previous approaches for utilizing the scratchpad did not consider caches and hence fail for this current architecture. In the presented work, we use the scratchpad for storing instructions and propose a generic Cache Aware Scratchpad Allocation (CASA) algorithm. We report an average reduction of 8-29% in instruction memory energy consumption compared to a previously published technique for benchmarks from the Mediabench suite. The scratchpad in the presented architecture is similar to a preloaded loop cache. Comparing the energy consumption of our approach against preloaded loop caches, we report average energy savings of 20-44%.
Proceedings of International Conference on Computer Aided Design
This paper presents DSP code optimization techniques that are motivated by dedicated memory address generation hardware. We define a generic model of DSP address generation units. Based on this model, we present efficient heuristics for computing memory layouts for program variables that optimize the utilization of parallel address generation units. Improvements and generalizations of previous work are described, and the efficacy of the proposed algorithms is demonstrated through experimental evaluation.
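To make the link between memory layout and address generation concrete, the sketch below scores a layout by counting how many consecutive accesses cannot be served by a free auto-increment or auto-decrement of one address register. It assumes a single AGU with a post-modify range of ±1 and unit cost per explicit address-register load; the paper's generic model covers several parallel AGUs and is solved with heuristics, whereas this sketch uses exhaustive search, which is only feasible for a handful of variables.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def layout_cost(access_seq: Sequence[str], layout: Sequence[str]) -> int:
    """Count explicit address-register loads for one AGU with +/-1 post-modify.

    access_seq: variable names in access order.
    layout: variable names, where the list index is the memory offset.
    The initial load of the address register is ignored.
    """
    offset = {var: i for i, var in enumerate(layout)}
    cost = 0
    for prev, cur in zip(access_seq, access_seq[1:]):
        # A distance of 0 or 1 is covered by the free post-modify;
        # any larger jump needs an extra address-register load.
        if abs(offset[cur] - offset[prev]) > 1:
            cost += 1
    return cost


def best_layout(access_seq: Sequence[str]) -> Tuple[str, ...]:
    """Exhaustive baseline: try every layout and keep the cheapest one."""
    variables: List[str] = sorted(set(access_seq))
    return min(permutations(variables), key=lambda p: layout_cost(access_seq, p))
```

For the access sequence `a b c a c b`, the naive layout `a, b, c` needs two reloads, while placing `c` between `a` and `b` needs only one, because the frequent `a`/`c` and `b`/`c` transitions become adjacent.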
Proceedings European Design and Test Conference. ED & TC 97
Besides high code quality, a primary issue in embedded code generation is the retargetability of code generators. This paper presents techniques for the automatic generation of code selectors from externally specified processor models. In contrast to previous work, our retargetable compiler Record does not require tool-specific modelling formalisms, but starts from general HDL processor models. From an HDL model, all processor aspects needed for code generation are automatically derived. As demonstrated by experimental results, short turnaround times for retargeting are achieved, which makes it possible to study the HW/SW trade-off between processor architectures and program execution speed.
Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition
The number of embedded systems is increasing, and a considerable percentage of them are designed as mobile applications. For the latter, energy consumption is a limiting factor because of today's battery capacities. Besides the processor, memory accesses consume a large amount of energy. The use of additional, less power-hungry memories such as caches or scratchpads is thus common. Caches incorporate hardware control logic for moving data in and out automatically; however, this logic requires chip area and energy. A scratchpad memory is much more energy-efficient, but its contents must be controlled by software. In this paper, an algorithm integrated into a compiler is presented which analyses the application and selects the program and data parts to be placed into the scratchpad. Comparisons against a cache solution show energy consumption advantages of between 12% and 43% for designs with the same memory size.
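Selecting which program and data parts go into a fixed-size scratchpad can be framed as a 0/1 knapsack problem: each candidate object has a size and an estimated energy saving, and the goal is to maximize the total saving without exceeding the scratchpad capacity. The sketch below assumes the per-object savings are already known (for example from profiling); the abstract does not spell out the exact formulation, and all object names and numbers are hypothetical.

```python
from typing import List, Tuple


def allocate_scratchpad(
    objects: List[Tuple[str, int, float]], capacity: int
) -> Tuple[float, List[str]]:
    """0/1 knapsack: pick objects maximizing total energy saving within capacity.

    objects: (name, size_in_bytes, estimated_energy_saving) triples (hypothetical).
    capacity: scratchpad size in bytes.
    Returns (total_saving, names_of_selected_objects).
    """
    n = len(objects)
    # best[i][c] = maximum saving using the first i objects with capacity c.
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i, (_, size, saving) in enumerate(objects, start=1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: leave object i in main memory
            if size <= c and best[i - 1][c - size] + saving > best[i][c]:
                best[i][c] = best[i - 1][c - size] + saving  # option 2: place it
    # Backtrack through the table to recover which objects were placed.
    chosen: List[str] = []
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            name, size, _ = objects[i - 1]
            chosen.append(name)
            c -= size
    chosen.reverse()
    return best[n][capacity], chosen
```

The dynamic program runs in O(n · capacity) time, which is practical for scratchpads of a few kilobytes; a compiler could equally hand the same model to an integer programming solver.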
Proceedings EURO-DAC '96. European Design Automation Conference with EURO-VHDL '96 and Exhibition
...tion in code generation for embedded digital signal processors. Recent work has shown that this task can be efficiently solved by tree covering with dynamic programming, even in combination with the task of register allocation. However, performing instruction selection by tree covering alone does not exploit available instruction-level parallelism, for instance in the form of multiply-accumulate instructions or parallel data moves. In this paper, we investigate how such complex instructions may affect the detection of optimal tree covers, and we present a two-phase scheme for instruction selection that exploits available instruction-level parallelism. At the expense of higher compilation time, this technique may significantly increase code quality compared to previous work, as demonstrated for a widespread DSP.
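The baseline technique mentioned above, tree covering with dynamic programming, computes the cheapest cover of each subtree bottom-up and reuses those costs at the parent. The sketch below illustrates it with a hypothetical instruction set in which a fused multiply-accumulate pattern `add(mul(a, b), c)` costs the same as a single add. It does not model parallel data moves, register allocation, or the paper's two-phase scheme; the costs and tuple encoding are assumptions for illustration.

```python
from functools import lru_cache
from typing import Tuple

# Hypothetical unit costs; "mac" computes a * b + c in one instruction.
COST = {"load": 1, "add": 1, "mul": 1, "mac": 1}


@lru_cache(maxsize=None)
def min_cost(node: Tuple) -> int:
    """Minimum-cost tree cover of an expression tree via dynamic programming.

    Nodes are tuples: ("var", name), ("add", left, right) or ("mul", left, right).
    """
    op = node[0]
    if op == "var":
        return COST["load"]
    left, right = node[1], node[2]
    # Default pattern: cover this node with a single add or mul instruction,
    # children covered optimally (their costs are memoized).
    best = COST[op] + min_cost(left) + min_cost(right)
    if op == "add":
        # Complex pattern: add(mul(a, b), c) or add(c, mul(a, b)) as one MAC.
        for prod, other in ((left, right), (right, left)):
            if prod[0] == "mul":
                a, b = prod[1], prod[2]
                best = min(best, COST["mac"] + min_cost(a) + min_cost(b) + min_cost(other))
    return best
```

For `a * b + c`, separate `mul` and `add` instructions give a cost of 5 (three loads plus two operations), while the MAC pattern covers the same tree at cost 4.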
A compiler for the generation of microcode for a high-level microprogramming language is presented. The compiler is target-machine independent. The input to the compiler consists of a hardware description, a high-level microprogram, and a set of program transformation rules. The compiler is able to take advantage of optimization techniques used by microprogrammers, because many of these can be represented as program transformation rules.
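The idea of expressing optimizations as program transformation rules can be illustrated with a toy rewriter that applies rules to expression trees bottom-up until no rule matches. This is not the paper's rule language: the tuple encoding and the two example rules (`mul_by_two_to_add`, `add_zero`) are hypothetical, chosen only to show how a rule such as "replace a multiplication by two with an addition" can be applied mechanically.

```python
from typing import Callable, List, Optional, Tuple, Union

# An expression is a register/variable name, an integer constant, or (op, left, right).
Expr = Union[str, int, Tuple]
Rule = Callable[[Expr], Optional[Expr]]


def mul_by_two_to_add(e: Expr) -> Optional[Expr]:
    """Hypothetical rule: x * 2 -> x + x (e.g. for a datapath without a multiplier)."""
    if isinstance(e, tuple) and e[0] == "mul" and e[2] == 2:
        return ("add", e[1], e[1])
    return None


def add_zero(e: Expr) -> Optional[Expr]:
    """Hypothetical rule: x + 0 -> x."""
    if isinstance(e, tuple) and e[0] == "add" and e[2] == 0:
        return e[1]
    return None


def rewrite(e: Expr, rules: List[Rule]) -> Expr:
    """Apply rules bottom-up, re-rewriting each result until no rule matches."""
    if isinstance(e, tuple):
        e = (e[0], rewrite(e[1], rules), rewrite(e[2], rules))
    for rule in rules:
        out = rule(e)
        if out is not None:
            return rewrite(out, rules)
    return e
```

With both rules, `(r1 * 2) + 0` is reduced to `r1 + r1`. Termination here relies on the example rules never reintroducing a pattern they remove; a real rule set would need the same property or an explicit bound.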
The MIMOLA design method is a method for the design of digital processors from a very high-level behavioral specification. A key feature of this method is the synthesis of a processor from a description of programs which are expected to be typical for the applications of that processor. Design cycles, in which the designer tries to improve automatically generated hardware structures, are supported by a retargetable microcode generator and by a utilization and performance analyzer. This paper describes the design method, the available software tools, and some applications.
Journal of Statistical Computation and Simulation, 2014
Many different models for the analysis of high-dimensional survival data have been developed over the past years. While some of the models and implementations come with a built-in automatic parameter tuning mechanism, others require the user to adjust defaults carefully, which often feels like a guessing game. Exhaustively trying out all model and parameter combinations quickly becomes tedious or infeasible in computationally intensive settings, even if parallelization is employed. Therefore, we propose to use modern algorithm configuration techniques, e.g., iterated F-racing, to efficiently move through the model hypothesis space and to simultaneously configure algorithm classes and their respective hyperparameters. In our application, we study four lung cancer microarray data sets. For these, we configure a predictor based on five survival analysis algorithms in combination with eight feature selection filters. We parallelize the optimization and all comparison experiments with the BatchJobs and BatchExperiments R packages.
One of the key problems in hardware/software codesign is hardware/software partitioning. This paper describes a new approach to hardware/software partitioning using integer programming (IP). The advantage of using IP is that optimal results are calculated for a chosen objective function. The partitioning approach works fully automatically and supports multi-processor systems, interfacing, and hardware sharing. In contrast to other approaches, where special estimators are used, we use compilation and synthesis tools for cost estimation. The increased time for calculating values for the cost metrics is compensated by an improved quality of the values. Therefore, fewer partitioning iterations are needed. The paper presents an algorithm using integer programming for solving the hardware/software partitioning problem, leading to promising results.
Design automation for embedded systems comprising both hardware and software components demands code generators integrated into electronic CAD systems. These code generators provide the necessary link between software synthesis tools in HW/SW codesign systems and embedded processors. General-purpose compilers for standard processors are often insufficient, because they do not provide flexibility with respect to different target processors and also suffer from inferior code quality. While recent research on code generation for embedded processors has primarily focussed on code quality issues, in this contribution we emphasize the importance of retargetability, and we describe an approach to achieve it. We propose the use of uniform, external target processor models in code generation, which describe embedded processors by means of RT-level netlists. Such structural models incorporate more hardware details than purely behavioral models, thereby permitting a close link to hardware design tools and fast adaptation to different target processors. The MSSQ compiler, which is part of the MIMOLA hardware design system, operates on structural models. We describe input formats, central data structures, and code generation techniques in MSSQ. The compiler has been successfully retargeted to a number of real-life processors, which demonstrates the feasibility of our approach with respect to retargetability. We discuss the capabilities and limitations of MSSQ and identify possible areas of improvement.
Proceedings the European Design and Test Conference. ED&TC 1995
In this paper, we present a unified frontend for retargetable compilers that performs analysis of the target processor model. Our approach bridges the gap between structural and behavioral processor models for retargetable compilation. This is achieved by means of instruction set extraction. The extraction technique is based on a BDD data structure, which significantly improves control signal analysis in the target processor compared to previous approaches.
Proceedings ED&TC European Design and Test Conference
One of the key problems in hardware/software codesign is hardware/software partitioning. This paper describes a new approach to hardware/software partitioning using integer programming (IP). The advantage of using IP is that optimal results are calculated with respect to the chosen objective function. The partitioning approach works fully automatically and supports multi-processor systems, interfacing, and hardware sharing. In contrast to other approaches, where special estimators are used, we use compilation and synthesis tools for cost estimation. The increased time for calculating the cost metrics is compensated by an improved quality of the estimates compared to the results of estimators. Therefore, fewer partitioning iterations are needed. The paper shows that using integer programming to solve the hardware/software partitioning problem is feasible and leads to promising results.
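In an IP formulation of partitioning, each task gets a binary decision variable (1 = hardware, 0 = software), and the objective and constraints are linear in those variables. The sketch below uses a deliberately simplified, hypothetical model: sequential execution, a single hardware area budget, and no interface, communication, or hardware-sharing terms, all of which the paper's approach does support. It enumerates all assignments as a stand-in for an IP solver, so it is only suitable for tiny instances; the task names and numbers are invented.

```python
from itertools import product
from typing import Dict, List, Tuple


def partition(
    tasks: Dict[str, Tuple[float, float, float]], area_budget: float
) -> Tuple[float, List[str]]:
    """Exhaustively solve a toy 0/1 hardware/software partitioning model.

    tasks: name -> (sw_time, hw_time, hw_area); all values hypothetical.
    Decision variable x_i = 1 maps task i to hardware, 0 to software.
    Objective:  minimize sum_i (x_i * hw_time_i + (1 - x_i) * sw_time_i)
    Constraint: sum_i x_i * hw_area_i <= area_budget
    Returns (best_total_time, tasks_mapped_to_hardware).
    """
    names = list(tasks)
    best_time, best_hw = float("inf"), []
    for x in product((0, 1), repeat=len(names)):
        area = sum(xi * tasks[n][2] for xi, n in zip(x, names))
        if area > area_budget:
            continue  # assignment violates the hardware area constraint
        time = sum(tasks[n][1] if xi else tasks[n][0] for xi, n in zip(x, names))
        if time < best_time:
            best_time, best_hw = time, [n for xi, n in zip(x, names) if xi]
    return best_time, best_hw
```

Because the objective and constraint are linear in the binary variables, the same model can be written directly for an IP solver, which then returns a provably optimal assignment without enumerating all 2^n combinations.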
Papers by Peter Marwedel