Papers by Guillermo Paya-Vaya
2014 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), 2014
One of the key problems in computer vision is recovering the geometry of a scene from multiple views of it. The scale-invariant feature transform (SIFT) is a feature-based approach to finding matching points in different views. To meet the strict constraints of advanced driver assistance systems (ADAS) on power consumption, processing speed and flexibility for future algorithms, SIFT feature extraction must be accelerated while keeping energy requirements low. This paper presents an application-specific instruction-set extension for a Tensilica Xtensa LX4 ASIP that accelerates SIFT feature extraction, together with its evaluation. Compared to the same arithmetic functions processed on an ASIP without any extensions, the accelerated basic elements of digital image processing and specialized SIFT processing tasks reach a speed-up factor of x1300 for arithmetic functions, while the accuracy of the SIFT features is preserved. The complete SIFT feature extraction on the extended processor was accelerated by a factor of x167 compared to the base processor. In addition, the proposed processor extensions retain the full flexibility of an ASIP, so future feature extractors for advanced driver assistance systems can be integrated quickly.
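SIFT detects keypoints as local extrema of a difference-of-Gaussians (DoG) stack. The input is blurred at increasing scales, adjacent blurred versions are subtracted, and a sample is kept if it is larger or smaller than all of its neighbours in position and scale. The sketch below is a one-dimensional illustration of that detection step only. It assumes a clamped-border Gaussian blur and an arbitrary list of scales. It is not the paper's Xtensa implementation, and it omits orientation assignment and descriptor computation.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at three standard deviations."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, sigma):
    """Convolve with a Gaussian kernel, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp to the valid range
            acc += w * signal[j]
        out.append(acc)
    return out


def dog_extrema(signal, sigmas):
    """Return (dog_level, position) pairs that are strict extrema of the DoG stack.

    Needs at least four scales so that at least one DoG level has a level above
    and below it to compare against.
    """
    blurred = [blur(signal, s) for s in sigmas]
    dog = [[hi - lo for lo, hi in zip(blurred[i], blurred[i + 1])]
           for i in range(len(blurred) - 1)]
    found = []
    for s in range(1, len(dog) - 1):
        for x in range(1, len(signal) - 1):
            v = dog[s][x]
            # 8 neighbours: 3 positions x 3 scales, minus the sample itself
            neighbours = [dog[s + ds][x + dx]
                          for ds in (-1, 0, 1) for dx in (-1, 0, 1) if (ds, dx) != (0, 0)]
            if v > max(neighbours) or v < min(neighbours):
                found.append((s, x))
    return found
```

In 2D, the neighbourhood grows to 26 samples and the blurs become separable row and column passes. The inner multiply-accumulate loop is the kind of code that a custom instruction could plausibly fuse.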
Lecture Notes in Computer Science, 2004
This work describes a Virtex-II implementation of a custom DSP for QRS-complex detection, ECG signal analysis, and data compression for optimum transmission and storage. For QRS-complex detection, we introduce a custom architecture based on an area-saving modification of the Hamilton-Tompkins (HT) algorithm. We also use the biorthogonal wavelet transform for ECG signal compression and for estimating the main ECG parameters.
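The Hamilton-Tompkins detector builds on the Pan-Tompkins preprocessing chain. The ECG is differentiated, squared, and integrated over a moving window, so each QRS complex becomes a single energy bump. The sketch below shows a simplified integer version of those stages with a fixed threshold and a refractory period. Several simplifications apply:

- The band-pass filter is omitted.
- The real detector's adaptive, rule-based thresholds are replaced by a fixed one.
- The window, threshold and refractory parameters are hypothetical.

It is not the paper's area-saving architecture.

```python
def derivative(x):
    """Causal five-point derivative y[n] = (2x[n] + x[n-1] - x[n-3] - 2x[n-4]) / 8.

    Samples before the start are treated as zero; // floors, like an arithmetic
    right shift by 3 would in hardware.
    """
    def at(i):
        return x[i] if i >= 0 else 0
    return [(2 * at(n) + at(n - 1) - at(n - 3) - 2 * at(n - 4)) // 8 for n in range(len(x))]


def moving_window_integration(x, width):
    """Running mean over the last `width` samples (integer division)."""
    out, acc = [], 0
    for n, v in enumerate(x):
        acc += v
        if n >= width:
            acc -= x[n - width]  # drop the sample that left the window
        out.append(acc // width)
    return out


def detect_qrs(ecg, window=8, threshold_fraction=0.3, refractory=40):
    """Return the index of the integrated-energy peak for each detected QRS complex."""
    squared = [d * d for d in derivative(ecg)]
    integrated = moving_window_integration(squared, window)
    threshold = threshold_fraction * max(integrated)  # fixed, not adaptive
    peaks, start = [], None
    for i, v in enumerate(integrated + [0]):  # trailing 0 closes a final region
        if start is None and v >= threshold and v > 0:
            start = i
        elif start is not None and v < threshold:
            region = integrated[start:i]
            peak = start + region.index(max(region))
            if not peaks or peak - peaks[-1] >= refractory:
                peaks.append(peak)
            start = None
    return peaks
```

Integer arithmetic and the divide-by-8 (a shift) keep the datapath simple, in the spirit of area-oriented hardware. Because of the derivative and integration delays, the reported indices lag the centre of each synthetic spike by a few samples.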
2015 IEEE 5th International Conference on Consumer Electronics - Berlin (ICCE-Berlin), 2015
2015 IEEE Workshop on Signal Processing Systems (SiPS), 2015
Lecture Notes in Computer Science, 2015
Most embedded signal processing applications require the computation of complex arithmetic operations. Tightly coupled with a host processor, the CORDIC algorithm is one possible processing base, capable of performing a wide spectrum of operations with high precision. This paper compares a software implementation of the CORDIC algorithm with fully programmable and dedicated hardware implementations. Non-programmable, area-efficient modules with predetermined functionality are particularly suited for use in dynamic partial reconfiguration (DPR) systems. This paper discusses the design issues of the different implementation strategies. It shows that the initial reconfiguration overhead introduced by DPR systems is negligible compared to the speed-up gained by reconfiguring the hardware accelerator. This holds when the same function is computed more than 51 times consecutively, a threshold that depends on the arithmetic function to be computed.
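In rotation mode, CORDIC computes cosine and sine by rotating the vector (K, 0) through a fixed sequence of angles atan(2^-i). Each rotation's direction is chosen from the sign of the residual angle, so every step needs only shifts, additions and a small angle table. The sketch below uses floating point for readability; a hardware module would use fixed-point registers, where multiplying by 2^-i is a shift. The iteration count of 32 is an assumption, and the input is restricted to |theta| <= pi/2.

```python
import math


def cordic_cos_sin(theta, iterations=32):
    """Rotation-mode CORDIC: return (cos(theta), sin(theta)) for |theta| <= pi/2."""
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]  # arctangent table
    # Pre-scale by the CORDIC gain compensation K = prod(1 / sqrt(1 + 2^-2i)).
    k = 1.0
    for i in range(iterations):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = k, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate toward a zero residual angle
        # Shift-and-add micro-rotation; 2^-i would be a right shift in hardware.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x, y
```

Changing the direction rule to drive y toward zero (vectoring mode) yields the vector magnitude and its angle from the same datapath. This is why a single CORDIC base can cover a wide spectrum of functions.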
Lecture Notes in Computer Science, 2005
This paper describes a new rapid-prototyping-based design framework for exploring and validating complex multiprocessor architectures for multimedia applications. The methodology combines a typical ASIC flow with an FPGA flow focused on rapid prototyping. To verify the system architecture exhaustively, a reference model that specifies the hardware implementation is used to validate both the HDL description and the emulated system. Functional coverage, in addition to traditional code coverage, is used to test 100% of the data, control and structural hazards of the system architecture. The reference model is also part of a stand-alone simulation environment. This allows both hardware and application development to be supported by a single system model.
Proceedings. 2004 IEEE International Conference on Field- Programmable Technology (IEEE Cat. No.04EX921), 2004
Architectures for ICT on FPGA. Arturo Mendez Patiño, Marcos A. Martinez Peiro, Francisco Ballester, G. Paya. Instituto Tecnológico de Morelia (México), Universidad Politécnica de Valencia (España). mpeiro@eln.upv.es. Abstract ...
2014 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), 2014
The development of complex algorithms for advanced driver assistance systems is a challenging task because of the high innovation rate and processing demands of applications in this field. Development is usually supported by a software development framework that provides an infrastructure (e.g., access to sensor data) for simulating and evaluating the algorithms. One problem, especially with computationally intensive algorithms, is slow simulation speed. This paper presents a prototyping environment that connects a software development framework to an FPGA-based hardware platform, so that computationally intensive tasks can be implemented in hardware. The proposed rapid prototyping system reduces simulation time, giving the software designer quicker feedback when evaluating algorithmic parameters. It also allows hardware modules to be verified and evaluated for rapid prototyping. A case study is presented in which a traffic sign detection algorithm is implemented on a soft-core processor. Using the hardware implementation accelerated the simulation by a factor of 65 compared to the pure software implementation.
Lecture Notes in Computer Science, 2015
Current research on synthesizable temperature sensors uses the reconfigurable logic of the FPGA to measure temperature anywhere on the device. These sensors rely on an oscillating, temperature-dependent route through the FPGA. Such LUT-based routes require a complex calibration process and have a large footprint on the die. The proposed synthesizable temperature sensor uses DSP slices to reduce both the calibration overhead and the footprint. Depending on its configuration, the sensor achieves a resolution of up to 0.12 °C. In the fastest configuration, a sample rate of up to 1040 samples per second is feasible. The sensor was evaluated and compared with LUT-based designs: it is more stable, easier to calibrate and has a smaller footprint, which allows a higher density of temperature sensors than before. It uses 45 FFs, 69 LUTs, 6 shift registers (SRL32) and 4 DSP slices to realize a fully digital, synthesizable temperature sensor. This includes a calibration circuit, a reading circuit and a buffer structure that stores multiple data samples.
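Oscillator-based temperature sensors are commonly read out by counting oscillations during a fixed gate time and mapping the count to a temperature. A two-point linear calibration is the simplest form of that mapping. The sketch below is a generic illustration, not the paper's DSP-slice calibration circuit. The class name and the reference counts in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TwoPointCalibration:
    """Linear count-to-temperature mapping from two (hypothetical) reference points."""
    count_low: int
    temp_low: float
    count_high: int
    temp_high: float

    @property
    def degrees_per_count(self) -> float:
        # Magnitude of this slope is the sensor's resolution in degrees C per count.
        return (self.temp_high - self.temp_low) / (self.count_high - self.count_low)

    def to_celsius(self, count: int) -> float:
        # Interpolate (or extrapolate) linearly from the low reference point.
        return self.temp_low + (count - self.count_low) * self.degrees_per_count


# Hypothetical reference counts: fewer oscillations at the hotter point.
cal = TwoPointCalibration(count_low=20000, temp_low=25.0, count_high=19000, temp_high=85.0)
```

A longer gate time yields more counts per degree, which improves resolution but lowers the sample rate. This is one way a sensor can trade resolution against speed across configurations.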
2014 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), 2014
Computationally intensive algorithms are often realized with dedicated or customized hardware architectures, which suffer from high development costs and low flexibility afterwards. Instead, modern multicore and manycore processors can execute a variety of software applications (e.g., driver assistance systems) written in portable high-level programming languages. In fields that can tolerate higher power consumption, this means less porting effort at lower cost. For instance, the Intel Xeon Phi manycore processor, featuring 61 cores, offers not only a high theoretical peak performance but also a supportive tool chain for software development in high-level programming languages. In contrast to traditional general-purpose multicore processors, however, this manycore architecture exhibits different processor characteristics, inter-core communication topologies and instruction sets. In this paper, we introduce a parallel implementation of a histogram of oriented gradients (HOG) algorithm for pedestrian detection. We also use a parallel semi-global matching algorithm as an additional driver assistance algorithm. With both, we present an in-depth performance analysis case study on the Intel Xeon Phi and note distinctive characteristics of this target platform. To allow a fair comparison, we rate an Intel Xeon 16-core general-purpose processor and also compare the platform with customized hardware architectures.
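The core of HOG is a per-cell histogram of gradient orientations, weighted by gradient magnitude. The sketch below computes one cell's histogram under several assumptions:

- 9 unsigned orientation bins over 0 to 180 degrees.
- Centred [-1, 0, 1] differences.
- Interior pixels only.
- No bilinear vote interpolation and no block normalization.

It is an illustration, not the paper's parallel implementation.

```python
import math


def cell_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees).

    `cell` is a list of rows of intensities; only interior pixels are used here,
    whereas a full implementation would borrow border pixels from neighbouring cells.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    rows, cols = len(cell), len(cell[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]  # horizontal centred difference
            gy = cell[r + 1][c] - cell[r - 1][c]  # vertical centred difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # fold to unsigned
            hist[int(angle // bin_width) % bins] += magnitude
    return hist
```

Each cell depends only on its own pixels, plus a one-pixel border in a full implementation. Cells can therefore be distributed across cores or vector lanes independently, which is the kind of data parallelism a manycore processor such as the Xeon Phi can exploit.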
Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015, 2015
Research on efficient fault tolerance techniques for digital systems requires insight into the fault propagation mechanisms inside an ASIC design. In ultra-deep-submicron technologies, radiation, high temperature and charge-sharing effects influence fault generation and propagation depending on die location. The proposed methodology links efficient fault injection to fault propagation in the floorplan view of a standard-cell ASIC. This is achieved by three steps:

- instrumenting the gate netlist after place and route;
- emulating it in an FPGA system;
- controlling experiments via an interactive user interface.

Furthermore, automated fault injection campaigns allow exhaustive fault tolerance evaluations that take single faults as well as adjacent-cell faults into account. The proposed methodology can be used to identify vulnerable cell nodes in the design and to classify placement strategies for fault-tolerant ASIC designs.
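The paper instruments a post-place-and-route netlist and emulates it on an FPGA. The software-only sketch below shows the underlying idea of a fault injection campaign. It flips one node (a single fault) or two neighbouring nodes (an adjacent-cell fault), compares the result against a fault-free golden run, and counts mismatching input vectors. Several things are assumptions:

- the four-gate netlist;
- the bit-flip fault model;
- the definition of "adjacent" as consecutive entries in a hypothetical placement list.

```python
from itertools import product

# Hypothetical gate-level netlist in topological order: node -> (gate type, inputs).
NETLIST = {
    "n1": ("AND", ("a", "b")),
    "n2": ("OR", ("b", "c")),
    "n3": ("AND", ("n1", "n2")),
    "out": ("NOT", ("n3",)),
}
GATES = {
    "AND": lambda v: v[0] & v[1],
    "OR": lambda v: v[0] | v[1],
    "NOT": lambda v: 1 - v[0],
}
# Hypothetical physical order of the cells, used only to define adjacency.
PLACEMENT = ["n1", "n2", "n3", "out"]


def simulate(inputs, flipped=frozenset()):
    """Evaluate the netlist; nodes listed in `flipped` have their output inverted."""
    values = dict(inputs)
    for node, (gate, ins) in NETLIST.items():
        v = GATES[gate]([values[i] for i in ins])
        values[node] = v ^ 1 if node in flipped else v
    return values["out"]


def campaign(fault_sets):
    """For each fault set, count input vectors whose output differs from the golden run."""
    results = {}
    for faults in fault_sets:
        failures = 0
        for a, b, c in product((0, 1), repeat=3):
            vector = {"a": a, "b": b, "c": c}
            if simulate(vector, frozenset(faults)) != simulate(vector):
                failures += 1
        results[tuple(faults)] = failures
    return results


single_faults = [(node,) for node in PLACEMENT]
adjacent_faults = list(zip(PLACEMENT, PLACEMENT[1:]))
```

In this toy netlist, flipping n1 and n2 together corrupts the output on fewer input vectors than flipping n1 alone. This is a small example of how adjacent faults can partially mask each other. Emulating the instrumented netlist on an FPGA makes the same kind of experiment feasible for real designs.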
Many complex maneuvers involving aircraft, vehicles and people are carried out on airport aprons. Manual video surveillance for safety and security purposes is inefficient, and privacy protection must be guaranteed. In this paper, we propose a system named ASEV that automatically assesses situations for airport surveillance. It combines four main components:

- a low-level image processing unit, based on a new hardware implementation, that extracts features in real time;
- a high-level image processing unit for scene analysis;
- a real-time inference engine for scene understanding;
- a data protection stage for log encryption.

In addition, the system successfully addresses four often-neglected aspects: two-way communication between system and operator, power consumption, the privacy of monitored people, and operator activity control. Extensive evaluation at a real airport shows that the proposed system improves operator performance with sound and visual alerts based on the automatic assessment of v...
Lecture Notes in Computer Science, 2009
Current SIMD processor architectures can improve processing performance by exploiting the data-level parallelism inherent in video applications. However, a significant performance penalty appears when processing data that is not formatted in an amenable way, e.g., unaligned memory accesses. This paper presents an enhanced DMA controller that performs block-based data transfers and realigns data when accessing a word in external memory that is not aligned with the natural data memory/bus width boundary. The enhanced DMA controller also performs signal extension when accessing data outside a specific region, e.g., a video frame. This decreases the total number of processing cycles required by a typical video application. Running a highly time-consuming video encoding task (motion estimation) on a generic VLIW architecture, the enhanced DMA controller achieves performance improvements of up to 25% compared to a basic block-transfer DMA controller.
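Two operations from the abstract can be sketched in software:

- assembling an unaligned 32-bit word from the two aligned words that straddle it;
- extending a frame at its borders so that blocks lying partly outside the frame can still be fetched.

The sketch assumes a little-endian, word-indexed memory model. It also assumes edge-pixel replication as the extension rule, as used for unrestricted motion vectors in video coding. The function names are hypothetical, and this is not the controller's actual design.

```python
WORD_BYTES = 4
MASK32 = 0xFFFFFFFF


def read_unaligned(memory, byte_addr):
    """Assemble the little-endian 32-bit word at any byte address from two aligned reads.

    `memory` is a list of 32-bit words (hypothetical model of the external memory).
    """
    index, offset = divmod(byte_addr, WORD_BYTES)
    low = memory[index]
    if offset == 0:
        return low  # already aligned: a single access suffices
    high = memory[index + 1]
    shift = 8 * offset
    # Upper bytes of the low word become the result's low bytes, and vice versa.
    return ((low >> shift) | (high << (32 - shift))) & MASK32


def fetch_block(frame, x0, y0, width, height):
    """Fetch a block whose top-left corner is (x0, y0); pixels outside the frame
    are replaced by the nearest edge pixel (replicate padding)."""
    rows, cols = len(frame), len(frame[0])
    return [[frame[min(max(y, 0), rows - 1)][min(max(x, 0), cols - 1)]
             for x in range(x0, x0 + width)]
            for y in range(y0, y0 + height)]
```

Doing this realignment and border handling in the DMA path means the SIMD datapath always receives aligned, in-range data. That is the source of the cycle savings the abstract describes.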
ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors, 2010
This paper presents a forwarding-based approach to increase code compaction, and consequently processing performance, in VLIW media processors. The target processors implement monolithic or partitioned register file (RF) organizations with a reduced number of read/write ports. The approach exploits the forwarding mechanism implemented in common pipelined VLIW architectures to reduce the number of RF accesses, one of the main limiting factors of the code compaction process. This reduction in RF accesses enables more efficient instruction scheduling and also decreases power consumption, without requiring extra hardware. A forwarding-sensitive code generation algorithm based on an enhanced list scheduling algorithm is described in detail. In addition, three case studies are presented in which the proposed scheduling algorithm yields performance improvements of up to 8.4% when running common image and video codec tasks on a generic VLIW architecture. This is close to the maximum performance improvement (11.4%) that can be achieved by investing in hardware, i.e., using an RF with twice the number of ports.
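The sketch below is a toy model of the idea, not the paper's enhanced list scheduler. A greedy list scheduler fills each cycle subject to an issue-width limit and a register-file read-port limit. An operand produced in the immediately preceding cycle is taken from the bypass network and does not consume a read port. Its assumptions:

- single-cycle latency;
- a one-cycle forwarding window;
- live-in values always read from the RF;
- priority given by the longest dependence chain.

The `Op` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    sources: tuple  # names of producing ops or of live-in registers


def list_schedule(ops, issue_width, read_ports, forwarding=True):
    """Greedy list scheduling under issue-width and RF read-port limits.

    Returns (number of cycles, {op name: issue cycle}).
    """
    if issue_width < 1 or any(len(op.sources) > read_ports for op in ops):
        raise ValueError("machine cannot issue some op even with all resources free")
    producers = {op.name for op in ops}
    users = {op.name: [] for op in ops}
    for op in ops:
        for src in op.sources:
            if src in producers:
                users[src].append(op.name)
    height = {}

    def chain(name):
        # Priority: length of the longest dependence chain starting at this op.
        if name not in height:
            height[name] = 1 + max((chain(u) for u in users[name]), default=0)
        return height[name]

    issued, cycle = {}, 0
    while len(issued) < len(ops):
        # Ready: all producing ops were issued in earlier cycles (latency 1).
        ready = [op for op in ops if op.name not in issued and all(
            src not in producers or issued.get(src, cycle) < cycle for src in op.sources)]
        ready.sort(key=lambda op: -chain(op.name))
        slots, ports = issue_width, read_ports
        for op in ready:
            # Operands from the previous cycle use the bypass path, not a read port.
            needed = sum(1 for src in op.sources if not (
                forwarding and src in producers and issued[src] == cycle - 1))
            if slots > 0 and needed <= ports:
                issued[op.name] = cycle
                slots -= 1
                ports -= needed
        cycle += 1
    return cycle, issued


example = [Op("a", ("r1", "r2")), Op("b", ("r3", "r4")), Op("c", ("a", "b")),
           Op("d", ("c", "r1")), Op("e", ("r4",))]
```

With two issue slots and two read ports, operation c reads b from the bypass path when forwarding is enabled. This frees a read port for e in the same cycle and shortens the example from 5 to 4 cycles, without adding RF ports.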