Guillermo Paya-Vaya

Papers by Guillermo Paya-Vaya

Instruction-set extension for an ASIP-based SIFT feature extraction

by Werner Ritter, Holger Blume, and Guillermo Paya-Vaya

2014 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), 2014

One of the key problems in the field of computer vision is recovering the geometry from multiple views of the same scene. A feature-based approach to finding matching points in different views is the scale-invariant feature transform (SIFT). To meet the strict constraints of advanced driver assistance systems (ADAS) with regard to power consumption, processing speed and flexibility for future algorithms, SIFT feature extraction must be accelerated while keeping energy requirements low. This paper presents and evaluates an application-specific instruction-set extension for a Tensilica Xtensa LX4 ASIP that accelerates SIFT feature extraction. Compared to the same arithmetic functions processed on an ASIP without any extensions, the accelerated basic elements of digital image processing and specialized SIFT processing tasks reach a speed-up factor of 1300 for arithmetic functions. At the same time, the accuracy of the SIFT features is preserved. SIFT feature extraction on the extended processor was accelerated by a factor of 167 compared to the base processor. In addition, the proposed processor extensions maintain the full flexibility of an ASIP for fast integration of future feature extractors for advanced driver assistance systems.

FPGA Custom DSP for ECG Signal Analysis and Compression

Lecture Notes in Computer Science, 2004

This work describes a Virtex-II implementation of a custom DSP for QRS-complex detection, ECG signal analysis and data compression for optimum transmission and storage. For QRS-complex detection we introduce a custom architecture based on a modification of the Hamilton-Tompkins (HT) algorithm oriented to area saving. We also use a biorthogonal wavelet transform for ECG signal compression and for estimating the main ECG parameters.

A mobile SoC-based platform for evaluating hearing aid algorithms and architectures

2015 IEEE 5th International Conference on Consumer Electronics - Berlin (ICCE-Berlin), 2015

An area efficient real- and complex-valued multiply-accumulate SIMD unit for digital signal processors

2015 IEEE Workshop on Signal Processing Systems (SiPS), 2015

Exploring Dynamic Reconfigurable CORDIC Co-Processors Tightly Coupled with a VLIW-SIMD Soft-Processor Architecture

Lecture Notes in Computer Science, 2015

Most embedded signal processing applications require the computation of complex arithmetic operations. Tightly coupled with a host processor, the CORDIC algorithm is one possible solution for establishing a processing base that is capable of performing a wide spectrum of operations with high precision. This paper compares a software implementation with fully programmable and dedicated hardware implementations of the CORDIC algorithm. In particular, non-programmable, area-efficient modules with a pre-determined functionality are well suited for use in dynamic partial reconfiguration (DPR) systems. The paper discusses the design issues of the different implementation strategies and shows that the initial reconfiguration overhead introduced by DPR systems is negligible compared to the speedup gained by reconfiguring the hardware accelerator when the same function is computed more than 51 times consecutively (depending on the arithmetic function to be computed).
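
As background for the entry above, here is a minimal software sketch of rotation-mode CORDIC. It is illustrative only and is not the implementation evaluated in the paper. The function name `cordic_sin_cos`, the iteration count, and the use of floating point are assumptions made for this sketch; hardware versions typically use fixed-point shifts and adds.

```python
import math


def cordic_sin_cos(theta, iterations=32):
    """Approximate (sin(theta), cos(theta)) with rotation-mode CORDIC.

    Assumes |theta| <= pi/2, which lies inside the convergence range
    of the basic algorithm (about +/- 1.74 rad).
    """
    # Elementary rotation angles atan(2^-i); a hardware unit would keep
    # these in a small lookup table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i).
    # Starting from the inverse of the total gain yields a unit vector.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = gain, 0.0, theta
    for i in range(iterations):
        # Rotate towards the remaining angle z.
        d = 1.0 if z >= 0 else -1.0
        # Multiplying by 2^-i is a right shift in fixed-point hardware.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x


if __name__ == "__main__":
    s, c = cordic_sin_cos(math.pi / 6)
    print(f"sin ~ {s:.6f}, cos ~ {c:.6f}")
```

The same iteration structure, with different initial values and a different rule for choosing `d`, is what allows a single CORDIC unit to cover several arithmetic functions.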

RAPANUI: Rapid Prototyping for Media Processor Architecture Exploration

Lecture Notes in Computer Science, 2005

This paper describes a new rapid prototyping-based design framework for exploring and validating complex multiprocessor architectures for multimedia applications. The new methodology combines a typical ASIC flow with an FPGA flow focused on rapid prototyping. To verify the system architecture exhaustively, a reference model that specifies the hardware implementation is used to validate both the HDL description and the emulated system. Functional coverage, in addition to traditional code coverage, is used to test 100% of the data, control and structural hazards of the system architecture. The reference model is also part of a stand-alone simulation environment. This allows hardware and application development to be supported by a single system model.

Architectures for ICT on FPGA

Proceedings. 2004 IEEE International Conference on Field- Programmable Technology (IEEE Cat. No.04EX921), 2004

by Arturo Mendez Patiño, Marcos A. Martinez Peiro, Francisco B...

1-Gigabit Ethernet Media Access Controller Core

Recursive Bipyramid Architecture for Discrete Wavelet Packet Transform

Fast Ethernet Media Access Controller Core

Evaluation of 2D-DCT Architecture for FPGA

A comprehensive ASIC/FPGA prototyping environment for exploring embedded processing systems for advanced driver assistance applications

2014 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), 2014

The development of complex algorithms for advanced driver assistance systems is a challenging task, due to the high innovation rate and processing demands of applications in this field. The development is usually supported by a software development framework that provides an infrastructure (e.g., access to sensor data) for simulating and evaluating the algorithms. One problem, especially with computationally intensive algorithms, is the slow simulation speed. This paper presents a prototyping environment that connects a software development framework with an FPGA-based hardware platform, which allows computationally intensive tasks to be implemented in hardware. The proposed rapid prototyping system not only reduces the simulation time, giving the software designer quicker feedback when evaluating algorithmic parameters, but also allows hardware modules to be verified and evaluated. A case study is presented in which a traffic sign detection algorithm is implemented on a soft-core processor. Using the hardware implementation, the simulation was accelerated by a factor of 65 compared to the pure software implementation.

A Synthesizable Temperature Sensor on FPGA Using DSP-Slices for Reduced Calibration Overhead and Improved Stability

Lecture Notes in Computer Science, 2015

Current research on synthesizable temperature sensors, which use the reconfigurable logic of the FPGA to measure temperature anywhere on the device, uses an oscillating, temperature-dependent route on the FPGA. These LUT-based routes require a complex calibration process and have a large footprint on the die. The proposed synthesizable temperature sensor uses DSP-slices to reduce both the calibration overhead and the footprint. The sensor can achieve a resolution of up to 0.12 °C, depending on configuration. A sample rate of up to 1040 samples per second is feasible in the fastest configuration. In the evaluation and comparison, the sensor is more stable, easier to calibrate and has a smaller footprint, which allows a higher density of temperature sensors than before. It uses 45 FFs, 69 LUTs, 6 shift registers (SRL32) and 4 DSP-slices to realize a fully digital, synthesizable temperature sensor, including a calibration circuit, a reading circuit and a buffer structure to save multiple data samples.

Performance evaluation of the Intel Xeon Phi manycore architecture using parallel video-based driver assistance algorithms

2014 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), 2014

Computationally intensive algorithms are often realized with dedicated or customized hardware architectures, which suffer from high development costs and low flexibility thereafter. In contrast, modern multicore and manycore processors can execute a variety of software applications (e.g., driver assistance systems) written in portable high-level programming languages, resulting in less porting effort at lower cost for fields that can tolerate the power consumption. For instance, the Intel Xeon Phi manycore processor featuring 61 cores offers not only a high theoretical peak performance but also a supportive tool chain for software development in high-level programming languages. Compared to traditional general-purpose multicore processors, however, this manycore architecture exhibits different processor characteristics, inter-core communication topologies, and instruction sets. In this paper, we introduce the parallel implementation of a histogram of oriented gradients algorithm for pedestrian detection. Using a parallel semi-global matching algorithm as an additional driver assistance algorithm, we present an in-depth performance analysis case study on the Intel Xeon Phi and note distinct characteristics of this target platform. To allow a fair comparison, we not only rate an Intel Xeon 16-core general-purpose processor, but also present a platform comparison to customized hardware architectures.

FLINT: Layout-Oriented FPGA-Based Methodology for Fault Tolerant ASIC Design

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015

Research into efficient fault tolerance techniques for digital systems requires insight into the fault propagation mechanism inside the ASIC design. Radiation, high temperature, or charge sharing effects in ultra-deep submicron technologies influence fault generation and propagation depending on die location. The proposed methodology links efficient fault injection to fault propagation in the floorplan view of a standard cell ASIC. This is achieved by instrumenting the gate netlist after place and route, emulating it in an FPGA system, and controlling experiments via an interactive user interface. Further, automated fault injection campaigns allow exhaustive fault tolerance evaluations that take single faults as well as adjacent cell faults into account. The proposed methodology can be used to identify vulnerable cell nodes in the design and to classify placement strategies of fault tolerant ASIC designs.

Automatic Situation Assessment for Event-Driven Video Analysis

Many complex maneuvers involving aircraft, vehicles and persons are carried out at airport aprons. Manual video surveillance used for safety and security purposes is inefficient, and privacy protection must be guaranteed. In this paper, we propose a system named ASEV that automatically assesses situations for airport surveillance. It combines four main components: a low-level image processing unit based on a new hardware implementation to extract features in real time, a high-level image processing unit for scene analysis, a real-time inference engine for scene understanding, and a data protection stage for log encryption. In addition, four often neglected aspects are successfully addressed: two-way communication between system and operator, power consumption, privacy of monitored people, and operator activity control. Extensive evaluation at a real airport shows that the proposed system improves the operator performance with sound and visual alerts based on the automatic assessment of v...

An Enhanced DMA Controller in SIMD Processors for Video Applications

Lecture Notes in Computer Science, 2009

Although current SIMD processor architectures can improve processing performance by exploiting the data-level parallelism inherent in video applications, a significant performance penalty appears when processing data that is not formatted in an amenable way, e.g. unaligned memory access. This paper presents an enhanced DMA controller that performs block-based data transfers and a realignment when accessing a word in an external memory that is not aligned with the natural data memory/bus width boundary. Moreover, the enhanced DMA controller performs a signal extension while accessing data outside a specific region, e.g. a video frame, decreasing the total number of processing cycles required for a typical video application. Performance improvements of up to 25% can be achieved when running a highly time-consuming video encoding task (motion estimation) on a generic VLIW architecture with the enhanced DMA controller compared to a basic block-transfer DMA controller.

A forwarding-sensitive instruction scheduling approach to reduce register file constraints in VLIW architectures

ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors, 2010

This paper presents a forwarding-based approach to increase the code compaction, and consequently the processing performance, of VLIW media processors that implement monolithic or partitioned register file (RF) organizations with a reduced number of read/write ports. The approach exploits the forwarding mechanism implemented in common pipelined VLIW architectures to reduce the number of RF accesses, which is one of the main limiting factors of the code compaction process. This reduction in RF accesses enables a higher instruction scheduling efficiency and eventually decreases the power consumption, without requiring extra hardware. A forwarding-sensitive code generation algorithm based on an enhanced list scheduling algorithm is described in detail. In addition, three case studies are presented in which the proposed scheduling algorithm leads to performance improvements of up to 8.4% when running common image and video codec tasks on a generic VLIW architecture. This is close to the maximum performance improvement (11.4%) that can be achieved by investing in hardware, i.e. by using an RF with twice the number of ports.
