Papers by Narayanan Harihar

Tutorial Four Mathematical Methods in VLSI

The theme of the tutorial is the use of mathematical methods in VLSI. The traditional use of mathematics in engineering disciplines is via mathematical modeling: concepts and interactions in the problem domain are mapped to objects and relationships of a specific mathematical topic, and then the formal deductions within the topic are re-interpreted in the problem domain. After a structured review of VLSI design flow and the identification of mathematical topics applicable to each step of the design flow, the tutorial illustrates these themes by a sampling of mathematical techniques applicable to analysis of modeling and simulation, partitioning, structural and behavioral decomposition, and symbolic reasoning about behavior. The tutorial is aimed at illustrating the importance of mathematics in VLSI, especially in the development of various tools, which are critical for design. The audience will be made to appreciate how some of the tools which are used by designers actually have some...

On the linking of number lattices

arXiv: Number Theory, 2019

In this paper we study ideas which have proved useful in topological network theory in the context of lattices of numbers. A number lattice $L_S$ is a collection of row vectors, over $\mathbb{Q}$ on a finite column set $S,$ generated by integral linear combinations of a finite set of row vectors. A generalized number lattice $K_S$ is the sum of a number lattice $L_S$ and a vector space $V_S$ which has only the zero vector in common with it. The dual $K^d_S$ of a generalized number lattice is the collection of all vectors whose dot products with vectors in $K_S$ are integral, and is another generalized number lattice. We consider a linking operation ('matched composition') between generalized number lattices $K_{SP},K_{P}$ (regarded as collections of row vectors on column sets $S\cup P, P,$ respectively, with $S,P$ disjoint) defined by $K_{SP}\leftrightarrow K_{P}\equiv \{f_S:(f_S,g_P)\in K_{SP},\ g_P \in K_{P}\}.$ We show that this operation together with contraction and restrictio...

On the Maximum Power Transfer Theorem

The International Journal of Electrical Engineering & Education, 1978

A reformulation of the maximum power transfer theorem, in terms of terminal conditions of the net...

Chapter 1 Introduction

Submodular Functions and Electrical Networks, 1997

Publisher Summary: This chapter introduces the methods for studying the properties of electrical networks which are independent of the device characteristic. Only topological constraints are used, namely Kirchhoff's current law (KCL) and Kirchhoff's voltage law (KVL). These methods are also called “network topological.” The chapter presents applications to circuit simulation and circuit partitioning, and relates the optimization problems that arise naturally when using these methods to the central problems in the theory of submodular functions. There are more immediate applications possible. The most popular general-purpose simulator currently running, SPICE, uses the modified nodal analysis approach. In this approach, the devices are divided into two classes: the generalized admittance type, whose currents can be written in terms of voltages appearing somewhere in the circuit, and the remaining devices. The final variables in terms of which the solution is carried out are the set of all nodal voltages and current variables. The resulting coefficient matrix is very sparse but suffers from several defects.

Polyhedrally tight set functions and discrete convexity

Pacific Journal of Optimization, 2008

Fast On-Line/Off-Line Algorithms for Optimal Reinforcement of a Network and Its Connections with Principal Partition

Lecture Notes in Computer Science, 2000

The problem of computing the strength and performing optimal reinforcement for an edge-weighted graph G(V, E, w) is well-studied [1],[2],[3],[6],[7],[9]. In this paper, we present fast (sequential linear time and parallel logarithmic time) on-line algorithms for optimally reinforcing the graph when the reinforcement material is available continuously on-line. These are the first on-line algorithms for this problem. Although we invest some time

Principal lattice of partitions of submodular functions on graphs: Fast algorithms for principal partition and generic rigidity

Lecture Notes in Computer Science, 1992

In this paper we use a single unifying approach (which we call the Principal Lattice of Partitions approach) to construct simple and fast algorithms for problems including and related to the Principal Partition and the Generic Rigidity of graphs. Most of our algorithms are at least as fast as presently known algorithms for these problems, while our algorithm for Principal

Time domain method for reduced order network synthesis of large RC circuits

ISCAS '98. Proceedings of the 1998 IEEE International Symposium on Circuits and Systems (Cat. No.98CH36187)

In this article we present a time domain method to approximate linear RC multiport networks, accurate up to a user-specified frequency. We use the Lanczos algorithm to compute eigenvalues and an electrical network based method to compute eigenvectors, and use these to synthesize a reduced order RC network. A similar method for RC multiport approximation has been proposed by Yang and Kerns [4], which works with admittance matrices (G and sC) of the multiport and also uses the Lanczos algorithm to compute dominant eigenvalues and eigenvectors. However, their method forbids capacitor cutsets in the multiport, which are a common occurrence in multilayered interconnect modeling, signal integrity analyses, etc. We propose an algorithm which works implicitly on the state matrix description of the network and allows capacitor cutsets in the graph of the network. We use an efficient DC analysis algorithm [1] to make implicit computations of eigenvalues, eigenvectors and port admittance matrices.

Chapter 13 Algorithms for the PLP of a Submodular Function

Annals of Discrete Mathematics, 1997

Publisher Summary: This chapter presents algorithms for the principal lattice of partitions (PLP) of a general submodular function and extends these to important instances of functions based on bipartite graphs. The general algorithms of the chapter are parallel to the algorithms for principal partition. The main algorithms in this context are presented. In the case of both principal partition (PP) and PLP, the problem of minimizing a submodular function is at the heart of the algorithms. The chapter explains the application of general PLP algorithms to the important cases of weighted adjacency and weighted exclusivity functions associated with a bipartite graph. In both these cases, the minimization of the basic submodular function reduces to appropriate flow problems, which can be solved extremely efficiently. The chapter also describes several useful techniques for improving the efficiency of the algorithms in those cases where the maximum value of the (integral) submodular function is less than the size of the underlying set.

A state assignment scheme targeting performance and area

Proceedings Twelfth International Conference on VLSI Design. (Cat. No.PR00013), 1999

In this paper, we address the state assignment problem for Finite State Machines (FSMs). In particular, we study the effect of certain sparse state encoding strategies on the area and performance of the FSM when implemented using multi-level logic circuits. We present the results of a systematic study conducted for characterizing the effects of some encoding schemes on the area and delay of FSM implementations. Based on these results, we conclude that two-hot encodings preserve the speed advantages of one-hot encodings while reducing the area of the implemented circuit. We show that the problem of finding an optimal two-hot encoding can be posed as a constrained partitioning problem on a certain graph. We describe a greedy heuristic algorithm for this partitioning problem. Finally, we present some results and comparisons between the circuits obtained using two-hot encodings as opposed to those obtained using one-hot encoding, and to those obtained using JEDI [1] and NOVA [2]. The results are encouraging, particularly for FSMs with a large number of states.
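The area saving claimed for two-hot codes can be illustrated with a small count. The sketch below is not the paper's method (which poses the optimal assignment as a constrained graph partitioning problem); it only enumerates two-hot code words in an arbitrary order. The function names `min_two_hot_bits` and `two_hot_codes` are hypothetical, introduced here for illustration.

```python
from itertools import combinations
from math import comb


def min_two_hot_bits(num_states):
    """Smallest number of flip-flops k such that C(k, 2) >= num_states."""
    if num_states < 1:
        raise ValueError("num_states must be positive")
    k = 2
    while comb(k, 2) < num_states:
        k += 1
    return k


def two_hot_codes(num_states):
    """Return num_states distinct code words, each with exactly two 1 bits.

    The order of assignment is arbitrary; choosing which state gets which
    code word is the optimization problem the abstract refers to.
    """
    k = min_two_hot_bits(num_states)
    codes = []
    for i, j in combinations(range(k), 2):
        bits = ["0"] * k
        bits[i] = bits[j] = "1"
        codes.append("".join(bits))
        if len(codes) == num_states:
            break
    return codes


if __name__ == "__main__":
    # A 10-state machine needs 10 flip-flops one-hot, but only 5 two-hot.
    print(min_two_hot_bits(10), two_hot_codes(10))
```

With k flip-flops a two-hot code offers k(k-1)/2 code words, so the flip-flop count grows roughly as the square root of the number of states rather than linearly as in one-hot encoding.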

On the membership problem over polymatroid intersection

Also published as report no. SFB-303--94825. SIGLE. Available from TIB Hannover: RN 4052(94825) / FIZ...

Mathematical programming and resistor transformer diode networks

In this paper, we present results for networks with ideal transformers, conical diodes and resistors which are conical analogues of vector space based fundamental results for networks with ideal transformers and resistors. By a conical diode we mean a device which satisfies $V_D \le 0$, $i_D \ge 0$, where $V_D$, $i_D$ represent the voltage and current associated with the device. We show: 1) if

Large Scale VLSI Circuit Simulation Using Point Relaxation

International Conference on Scientific Computing, 2010

The aim of this paper is to show that by using the elementary technique of point relaxation (i.e., Gauss-Seidel iteration of static nonlinear equations at each time point), we can accurately solve very large digital circuits of several million nodes. The characteristic of such circuits is a well-defined direction of signal flow, which can be determined beforehand and used for ordering the nodes during the iteration. An attendant, and important, benefit is the relatively small requirement of storage space. We have been able to simulate MOS digital circuits with approximately 1.6 million transistors (512 × 512 SRAM memory array) in about half an hour using less than 1.3 GB working memory (Pentium-4, 2.2 GHz). We also propose a simple parallelization technique and experimentally demonstrate that we can solve digital circuits with tens of millions of transistors in a few hours.
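As a rough illustration of point relaxation, the sketch below runs Gauss-Seidel sweeps on a small linear system. This is a simplification: the paper applies the iteration to static nonlinear equations at each time point, and orders nodes along the signal flow, neither of which is modeled here. The function name `gauss_seidel`, the tolerance, and the example matrix are assumptions made for illustration.

```python
def gauss_seidel(A, b, x0=None, tol=1e-10, max_sweeps=1000):
    """Solve A x = b by Gauss-Seidel sweeps (linear case only).

    Each unknown is updated in place, one node at a time, using the
    newest values of the other unknowns. Only the current solution
    vector is stored, which is why the storage requirement is small.
    """
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    for sweep in range(max_sweeps):
        max_change = 0.0
        for i in range(n):
            # Contribution of all other nodes, using already-updated values.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            new_xi = (b[i] - s) / A[i][i]
            max_change = max(max_change, abs(new_xi - x[i]))
            x[i] = new_xi
        if max_change < tol:
            return x, sweep + 1
    raise RuntimeError("Gauss-Seidel did not converge")


if __name__ == "__main__":
    # Diagonally dominant example; the exact solution is (1, 2, 3).
    A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
    b = [2.0, 4.0, 10.0]
    print(gauss_seidel(A, b))
```

Convergence is guaranteed here because the example matrix is strictly diagonally dominant; for general systems the iteration may fail to converge, and the node ordering strongly affects the number of sweeps.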

Two Graph Based Circuit Simulator for PDE-Electrical Analogy

2012 25th International Conference on VLSI Design, 2012

The aim of the paper is to develop an efficient circuit simulator to solve circuits arising out of an electrical analogy for Partial Differential Equations (PDEs). This electrical analogy arises when we solve a PDE through the finite element method (FEM). The paper also proposes an optimal method for simulation of such circuits. We have built simulators based on Modified Nodal Analysis (MNA) and the Two Graph method for solution of PDEs through electrical analogy and compared their timing performance with commercial simulators. The timing performance of circuit simulators is improved for special PDE problems (such as convection-diffusion) by an efficient implementation of iterative Cholesky with the Two Graph method. The method is based on a graph representation of linear systems of equations. Such iterative methods would not be feasible with MNA. Using this method, we have been able to simulate circuits arising from the convection-diffusion problem with approximately 1.6 million nodes and 47 million edges in less than 8 minutes.

On Duality of Behavioural Systems

Multidimensional Systems and Signal Processing, 2002

The notions of controllability and observability are duals in the traditional input-state-output framework of systems theory for 1-D systems. Recently, there has been a study on the duality between controllability and observability in behavioural systems [2] by the construction of a suitable adjoint for 1-D behavioural systems. In this paper we show, among other things, that formally this definition carries

Application of DC Analyzer to Combinatorial Optimization Problems

20th International Conference on VLSI Design held jointly with 6th International Conference on Embedded Systems (VLSID'07), 2007

Solutions of many combinatorial optimization problems can be found by analyzing appropriate electrical networks made up of positive resistors, voltage sources, current sources and ideal diodes. This method is an alternative approach for the approximate solution of such problems. A fast simulator based on the Two Graph method is a more suitable option for this purpose than conventional simulators based on Modified Nodal Analysis. Using this approach we have made an attempt to solve min cost flow and single source shortest path problems. A planar min cost flow problem of size 200,000 nodes and 600,000 edges is solved by our simulator to within approximately 0.1% of the optimum solution in about 11 minutes. We have exactly solved a planar single source shortest path problem (with negative edge weights as well) of size 100,000 nodes and 600,000 edges in about 2 minutes. We have performed our experiments on a PIV processor having 1 GB RAM.

Orthogonal partitioning and gated clock architecture for low power realization of FSMs

Proceedings of 13th Annual IEEE International ASIC/SOC Conference (Cat. No.00TH8541)

In this paper, we address the issue of low power realization of FSMs using decomposition and gated clock architecture. We decompose the N-state machine into two interacting machines with N1 and N2 states such that N = N1 × N2. Our cost function is the number of self-edges, which is to be maximized. For all the self-edge conditions, the inputs and clock of the respective machine are disabled to reduce the switching activity, and a reduction in power can therefore be achieved. We describe a greedy algorithm which maximizes the cost function. We attempt to keep the area the same by keeping the number of flip-flops minimum. We compared the results of our algorithm with JEDI [7]. In one case, we could achieve a power reduction of up to 67% with less area as well. Based on the results, we conclude that our approach is suitable for machines with a large number of states and a small number of outputs.

An efficient practical heuristic for good ratio-cut partitioning

16th International Conference on VLSI Design, 2003. Proceedings.

We present an efficient heuristic for finding good bipartitions of the vertex set of a graph in the sense of the well-known measure of ratioCut [2, 8] (essentially the ratio between the weight of cut edges and the product of weights of the node sets of the bipartition). The widely accepted ratioCut bipartitioning algorithm of Wei and Cheng [13] is similar in spirit to the Fiduccia-Mattheyses [9] algorithm (F-M algorithm). Our approach uses the F-M algorithm as the first phase, which takes random bipartitions as input. In the later phase of our algorithm we use a new coarsening strategy and follow it up with a submodular function optimization algorithm on the coarsened graph. We also compare the results of this approach, applied to benchmark circuits, with well-established algorithms such as the Wei-Cheng algorithm [13] for ratioCut bipartitioning and pmetis of the Metis [7] package. The comparative study shows not only that this new approach indeed produces good quality ratioCut bipartitions, but also that it has the potential of finding a large number of such good partitions in comparison with other approaches. The key subroutine in our heuristic strategies is based on the recent finding published in [12] about the role of submodular functions in designing new heuristics and approximation algorithms for some NP-hard problems.
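The ratioCut measure described in the abstract can be computed directly from its definition. The sketch below only evaluates the measure for a given bipartition; it does not implement the paper's heuristic (F-M phase, coarsening, submodular optimization). The function name `ratio_cut` and the edge-list data layout are assumptions made for illustration.

```python
def ratio_cut(edges, node_weight, part_a):
    """Ratio of cut-edge weight to the product of the two side weights.

    edges:       iterable of (u, v, weight) tuples
    node_weight: dict mapping every node to its weight
    part_a:      nodes on one side; the other side is the complement
    """
    part_a = set(part_a)
    part_b = set(node_weight) - part_a
    if not part_a or not part_b:
        raise ValueError("both sides of the bipartition must be non-empty")
    # An edge is cut when its endpoints lie on different sides.
    cut = sum(w for u, v, w in edges if (u in part_a) != (v in part_a))
    weight_a = sum(node_weight[n] for n in part_a)
    weight_b = sum(node_weight[n] for n in part_b)
    return cut / (weight_a * weight_b)


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge, all weights 1.
    edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1),
             (3, 4, 1), (4, 5, 1), (3, 5, 1),
             (2, 3, 1)]
    weights = {n: 1 for n in range(6)}
    print(ratio_cut(edges, weights, {0, 1, 2}))  # balanced split at the bridge
    print(ratio_cut(edges, weights, {0}))        # isolating a single node
```

In this example the balanced split across the bridge scores 1/9, while isolating one node scores 2/5, which shows how the product in the denominator penalizes lopsided partitions without fixing the side sizes in advance.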

On the notion of generalized minor in topological network theory and matroids

Linear Algebra and its Applications, 2014

FPGA Based High Performance Double-Precision Matrix Multiplication

2009 22nd International Conference on VLSI Design, 2009

We present two designs (I and II) for IEEE 754 double precision floating point matrix multiplication, an important kernel in many tile-based BLAS algorithms, optimized for implementation on high-end FPGAs. The designs, both based on the rank-1 update scheme, can handle arbitrary matrix sizes, and are able to sustain their peak performance except during an initial latency period. Through these designs, the trade-offs involved in terms of local memory and bandwidth for an FPGA implementation are demonstrated, and an analysis is presented for the optimal choice of design parameters. The designs, implemented on a Virtex-5 SX240T FPGA, scale gracefully from 1 to 40 processing elements (PEs) with less than 1% degradation in the design frequency of 373 MHz. With 40 PEs and a design speed of 373 MHz, a sustained performance of 29.8 GFLOPS is possible with a bandwidth requirement of 750 MB/s for design-II and 5.9 GB/s for design-I.
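The rank-1 update scheme named in the abstract can be sketched in software as follows. This shows only the arithmetic ordering (one column of A times one row of B per step), not the FPGA designs, their tiling, or their memory hierarchy. The function name `matmul_rank1` is hypothetical.

```python
def matmul_rank1(A, B):
    """Compute C = A B as a sum of rank-1 (outer product) updates.

    For each k, column k of A and row k of B update every entry of C:
        C[i][j] += A[i][k] * B[k][j]
    so C is touched once per k while A and B are each streamed once.
    """
    m, p = len(A), len(B)
    n = len(B[0])
    if any(len(row) != p for row in A):
        raise ValueError("inner dimensions of A and B do not match")
    C = [[0.0] * n for _ in range(m)]
    for k in range(p):
        col = [A[i][k] for i in range(m)]  # column k of A
        row = B[k]                          # row k of B
        for i in range(m):
            for j in range(n):
                C[i][j] += col[i] * row[j]
    return C


if __name__ == "__main__":
    print(matmul_rank1([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The quoted throughput is consistent with a simple estimate, assuming each PE completes one multiply and one add per cycle (an assumption, not stated in the abstract): 40 PEs × 373 MHz × 2 operations ≈ 29.8 GFLOPS.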

Tutorial Four Mathematical Methods in VLSI

The theme of the tutorial is the use of mathematical methods in VLSI. The traditional use of math... more The theme of the tutorial is the use of mathematical methods in VLSI. The traditional use of mathematics in engineering disciplines is via mathematical modeling- concepts and interactions in the problem domain are mapped to objects and relationships of a specific mathematical topic and then the formal deductions within the topic are re-interpreted in the problem domain. After a structured review of VLSI design flow and the identification of mathematical topics applicable to each step of the design flow, the tutorial illustrates these themes by a sampling of mathematical techniques applicable to analysis of modeling and simulation, partitioning, structural and behavioral decomposition, and symbolic reasoning about behavior. The tutorial is aimed at illustrating the importance of mathematics in VLSI, especially in the development of various tools, which are critical for design. The audience will be made to appreciate how some of the tools which are used by designers actually have some...

On the linking of number lattices

arXiv: Number Theory, 2019

In this paper we study ideas which have proved useful in topological network theory in the contex... more In this paper we study ideas which have proved useful in topological network theory in the context of lattices of numbers. A number lattice $L_S$ is a collection of row vectors, over $\mathbb{Q}$ on a finite column set $S,$ generated by integral linear combination of a finite set of row vectors. A generalized number lattice $K_S$ is the sum of a number lattice $L_S$ and a vector space $V_S$ which has only the zero vector in common with it. The dual $K^d_S$ of a generalized number lattice is the collection of all vectors whose dot product with vectors in $K_S$ are integral and is another generalized number lattice. We consider a linking operation ('matched composition`) between generalized number lattices $K_{SP},K_{P}$ (regarded as collections of row vectors on column sets $S\cup P, P,$ respectively with $S,P,$ disjoint) defined by $K_{SP}\leftrightarrow K_{P}\equiv \{f_S:((f_S,g_P)\in K_{SP}, g_P \in K_{P}\}.$ We show that this operation together with contraction and restrictio...

On the Maximum Power Transfer Theorem

The International Journal of Electrical Engineering & Education, 1978

A reformulation of the maximum power transfer theorem, in terms of terminal conditions of the net... more

Chaper 1 Introduction

Submodular Functions and Electrical Networks, 1997

Publisher Summary This chapter introduces the methods for studying the properties of electrical n... more Publisher Summary This chapter introduces the methods for studying the properties of electrical networks, which are independent of the device characteristic. Only topological constraints are used—namely, Krichoff's current law (KCL) and Kirchoff's voltage law (KVL). These methods are also called “network topological.” The chapter presents applications to circuit simulation and circuit partitioning and establishes the relations between the optimization problems that arise naturally, while using these methods, to the central problems in the theory of submodular functions. There are more immediate applications possible. The most popular general purpose simulator currently running—SPICE—uses the modified nodal analysis approach. In this approach, the devices are divided into two classes, generalized admittance type whose currents can be written in terms of voltages appearing somewhere in the circuit, and the remaining devices. The final variables in terms of which the solution is carried out is the set of all nodal voltages and current variables. The resulting coefficient matrix is very sparse but suffers from several defects.

Polyhedrally tight set functions and discrete convexity

Pacific Journal of Optimization, 2008

Fast On-Line/Off-Line Algorithms for Optimal Reinforcement of a Network and Its Connections with Principal Partition

Lecture Notes in Computer Science, 2000

The problem of computing the strength and performing optimal reinforcement for an edge-weighted g... more The problem of computing the strength and performing optimal reinforcement for an edge-weighted graph G(V, E,w) is well-studied [1],[2],[3],[6],[7],[9]. In this paper, we present fast (sequential linear time and parallel logarithmic time) on-line algorithms for optimally reinforcing the graph when the reinforcement material is available continuosly online. These are first on-line algortithms for this problem. Although we invest some time

Principal lattice of partitions of submodular functions on graphs: Fast algorithms for principal partition and generic rigidity

Lecture Notes in Computer Science, 1992

In this paper we use a single unifying approach (which we call the Principal Lattice of Partition... more In this paper we use a single unifying approach (which we call the Principal Lattice of Partitions approach) to construct simple and fast algorithms for problems including and related to the Principal Partition and the Generic Rigidity of graphs. Most of our algorithms are at least as fast as presently known algorithms for these problems, while our algorithm for Principal

Time domain method for reduced order network synthesis of large RC circuits

ISCAS '98. Proceedings of the 1998 IEEE International Symposium on Circuits and Systems (Cat. No.98CH36187)

In this article we present a time domain method to approximate linear RC multiport networks, accu... more In this article we present a time domain method to approximate linear RC multiport networks, accurate up to user specified frequency. We use Lanczos algorithm to compute eigenvalues, an electrical network based method to compute eigenvectors and use these to synthesize a reduced order RC network. A similar method for RC multiport approximation has been proposed by Yang and Kerns[4], which works with admittance matrices (G and sC) of the multiport and also uses Lanczos algorithm to compute dominant eigenvalues and eigenvectors. However their method forbids capacitor cutsets in the multiport, which is a common occurrence in multilayered interconnect modeling, signal integrity analyses etc. We propose an algorithm which works implicitly on the state matrix description of the network and allows capacitor cutsets in the graph of the network. We use an efficient DC analysis algorithm[l] to make implicit computations of eigenvalues, eigenvectors and port admittance matrices.

Chapter 13 Algorithms for the PLP of a Submodular Function

Annals of Discrete Mathematics, 1997

Publisher Summary This chapter presents algorithms for the principal lattice of partitions (PLP) ... more Publisher Summary This chapter presents algorithms for the principal lattice of partitions (PLP) of a general submodular function and extends these to important instances of functions based on bipartite graphs. The general algorithms of the chapter are parallel to the algorithms for principal partition. Main algorithms in this context are presented. In the case of both principal partition (PP) and PLP, the problem of minimizing a submodular function is at the heart of the algorithms. The chapter explains the application of general PLP algorithms to the important cases of weighted adjacency and weighted exclusivity functions associated with a bipartite graph. In both these cases, the minimization of the basic submodular function reduces to appropriate flow problems, which can be solved extremely efficiently. Several useful techniques for improving the efficiency of the algorithms in those cases where the maximum value of the (integral) submodular function is less than the size of the underlying set are described.

A state assignment scheme targeting performance and area

Proceedings Twelfth International Conference on VLSI Design. (Cat. No.PR00013), 1999

In this paper, we address the state assignment problem for Finite State Machines (FSMs). In parti... more In this paper, we address the state assignment problem for Finite State Machines (FSMs). In particular, we study the effect of certain sparse state encoding strategies on the area and performance of the FSM when implemented using multi-level logic circuits. We present the results of a systematic study conducted for characterizing the effects of some encoding schemes on the area and delay of FSM implementations. Based on these results, we conclude that two-hot encodings preserve the speed advantages of onehot encodings while reducing the area of the implemented circuit. We show that the problem of finding an optimal twohot encoding can be posed as a constrained partitioning problem on a certain graph. We describe a greedy heuristic algorithm for this partitioning problem. Finally, we present some results and comparisons between the circuits obtained using two-hot encodings as opposed to those obtained using one-hot encoding, and to those obtained using JEDI[1] and NOVA[2]. The results are encouraging, particularly for FSMs with a large number of states.

On the membership problem over polymatroid intersection

Also published as report no. SFB-303--94825SIGLEAvailable from TIB Hannover: RN 4052(94825) / FIZ... more

Mathematical programming and resistor transformer diode networks

In this paper, we present results for networks with ideal transformers, conical diodes and resist... more In this paper, we present results for networks with ideal transformers, conical diodes and resistors which are conical analogues of vector space based fundamental results for networks with ideal transformers and resistors. By a conical diode we mean a device which satisfies VD≤0, iD≥0, where VD, iD represent the voltage and current associated with the device. We show: 1) if

Large Scale VLSI Circuit Simulation Using Point Relaxation

International Conference on Scientific Computing, 2010

The aim of this paper is to show that by using the elementary technique of point relaxation (i.e.... more The aim of this paper is to show that by using the elementary technique of point relaxation (i.e., Gauss seidel iteration of static nonlinear equations at each time point), we can solve accurately very large digital circuits of several million nodes. The characteristic of such circuits is a well defined direction of signal flow, which can be defined beforehand and used for ordering the nodes during the iteration. An attendant, and important, benefit is the relatively small requirement of storage space. We have been able to simulate MOS digital circuits with approximately 1.6 million transistors (512 × 512 SRAM memory array) in about half an hour using less than 1.3GB working memory (Pentium-4, 2.2GHz). We also propose a simple parallelization technique and experimentally demonstrate that we can solve digital circuits with tens of million transistors in a few hours.

Two Graph Based Circuit Simulator for PDE-Electrical Analogy

2012 25th International Conference on VLSI Design, 2012

The aim of the paper is to develop an efficient circuit simulator to solve circuits arising out o... more The aim of the paper is to develop an efficient circuit simulator to solve circuits arising out of an electrical analogy for Partial Differential Equations (PDEs). This electrical analogy arises when we solve PDE through finite element method (FEM). The paper also proposes an optimal method for simulation of such circuits. We have built simulators based on Modified Nodal Analysis and Two Graph method for solution of PDEs through electrical analogy and compared their timing performance with commercial simulators. The timing performance of circuit simulators is improved for special PDE problems (such as Convection-diffusion) by an efficient implementation of iterative Cholesky with Two Graph method. The method is based on a graph representation of linear systems of equations. Such iterative methods would not be feasible with MNA. Using this method, we have been able to simulate circuits arising from the Convection-Diffusion problem with approximately 1.6 million nodes and 47 million edges in less than 8 minutes.

On Duality of Behavioural Systems

Multidimensional Systems and Signal Processing, 2002

The notions of controllability and observability are duals in the traditional input-state-output ... more The notions of controllability and observability are duals in the traditional input-state-output framework of systems theory for 1-D systems. Recently, there has been a study on the duality between controllability and observability in behavioural systems [2] by the construction of a suitable adjoint for 1-D behavioural systems. In this paper we show, among other things, that formally this definition carries

Application of DC Analyzer to Combinatorial Optimization Problems

20th International Conference on VLSI Design held jointly with 6th International Conference on Embedded Systems (VLSID'07), 2007

Solution of many combinatorial optimization problems can be found by analyzing appropriate electr... more Solution of many combinatorial optimization problems can be found by analyzing appropriate electrical networks made up of positive resistors, voltage sources, current sources and ideal diodes. This method is an alternative approach for the approximate solution of such problems. Two Graph method based fast simulator is a more suitable option for this purpose than Modified Nodal Analysis based conventional simulators. Using this approach we have made an attempt to solve min cost flow and single source shortest path problems. A planar min cost flow problem of size 200, 000 nodes and 600, 000 edges is solved by our simulator approximately within 0.1% of the optimum solution in about 11 mins. We have exactly solved a planar single source shortest path problem (having negative edge weights also) of size 100, 000 nodes and 600, 000 edges in about 2 mins. We have performed our experiments on a PIV processor having 1 GB RAM.

Orthogonal partitioning and gated clock architecture for low power realization of FSMs

Proceedings of 13th Annual IEEE International ASIC/SOC Conference (Cat. No.00TH8541)

In this paper, we address the issue of low power realization of FSMs using decomposition and gated clock architecture. We decompose the N-state machine into two interacting machines with N1 and N2 states such that N = N1 × N2. Our cost function is the number of self-edges, which is to be maximized. For all the self-edge conditions, the inputs and clock of the respective machine are disabled to reduce the switching activity, and a reduction in power can therefore be achieved. We describe a greedy algorithm which maximizes the cost function. We attempt to keep the area the same by keeping the number of flip-flops minimum. We compared the results of our algorithm with JEDI [7]. In one case, we could achieve a power reduction of up to 67% with less area as well. Based on the results, we conclude that our approach is suitable for machines with a large number of states and few outputs.
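The cost function named in this abstract can be illustrated with a small sketch. It is not the paper's greedy algorithm; it only counts, for a given assignment of each original state to a pair of sub-machine states, how many transitions leave each sub-machine in place (a self-edge, during which that sub-machine's clock could be gated). The function name `count_self_edges`, the 4-state example machine and both encodings are hypothetical.

```python
def count_self_edges(transitions, encoding):
    """Count transitions that leave each sub-machine's state unchanged.

    transitions: iterable of (state, input_symbol, next_state)
    encoding: dict mapping each original state to a pair (s1, s2),
              the states of the two sub-machines (an assumed representation)
    Returns (self_edges_in_machine_1, self_edges_in_machine_2).
    """
    self1 = 0
    self2 = 0
    for state, _input_symbol, next_state in transitions:
        a1, a2 = encoding[state]
        b1, b2 = encoding[next_state]
        if a1 == b1:
            self1 += 1  # machine 1 stays put on this transition
        if a2 == b2:
            self2 += 1  # machine 2 stays put on this transition
    return self1, self2


# Hypothetical 4-state machine (N = 4 = 2 x 2) cycling A -> B -> C -> D -> A.
transitions = [("A", 0, "B"), ("B", 0, "C"), ("C", 0, "D"), ("D", 0, "A")]

# Two candidate encodings of the same machine.
enc_good = {"A": (0, 0), "B": (0, 1), "C": (1, 1), "D": (1, 0)}
enc_bad = {"A": (0, 0), "B": (1, 1), "C": (0, 1), "D": (1, 0)}

print(count_self_edges(transitions, enc_good))
print(count_self_edges(transitions, enc_bad))
```

Under these assumptions the first encoding yields more self-edges in total than the second, which is the quantity a decomposition heuristic of this kind would try to increase.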

An efficient practical heuristic for good ratio-cut partitioning

16th International Conference on VLSI Design, 2003. Proceedings.

We present an efficient heuristic for finding good bipartitions of the vertex set of a graph in the sense of the well-known measure of ratioCut [2, 8] (essentially the ratio between the weight of the cut edges and the product of the weights of the node sets of the bipartition). The widely accepted ratioCut bipartitioning algorithm of Wei and Cheng [13] is similar in spirit to the Fiduccia-Mattheyses [9] algorithm (F-M algorithm). Our approach uses the F-M algorithm as the first phase, which takes random bipartitions as input. In the later phase of our algorithm we use a new coarsening strategy and follow it up with a submodular function optimization algorithm on the coarsened graph. We also compare the results of this approach on benchmark circuits with well-established algorithms such as the Wei-Cheng algorithm [13] for ratioCut bipartitioning and pmetis of the Metis [7] package. The comparative study shows not only that this new approach produces good quality ratioCut bipartitions, but also that it has the potential to find a large number of such good partitions in comparison with other approaches. The key subroutine in our heuristic strategies is based on the recent finding published in [12] about the role of submodular functions in designing new heuristics and approximate algorithms for some NP-hard problems.
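The ratioCut measure described in this abstract (weight of cut edges divided by the product of the weights of the two node sets) can be written out directly. The sketch below only evaluates the measure for a given bipartition; it does not implement the F-M phase, the coarsening strategy or the submodular optimization. The function name `ratio_cut` and the 4-node path graph are hypothetical.

```python
def ratio_cut(edges, node_weight, part_a):
    """Evaluate the ratioCut measure of a bipartition.

    edges: iterable of (u, v, weight) for an undirected graph
    node_weight: dict mapping node -> weight
    part_a: nodes on one side; all remaining nodes form the other side
    """
    part_a = set(part_a)
    # Total weight of edges with exactly one endpoint in part_a.
    cut = sum(w for u, v, w in edges if (u in part_a) != (v in part_a))
    weight_a = sum(wt for n, wt in node_weight.items() if n in part_a)
    weight_b = sum(wt for n, wt in node_weight.items() if n not in part_a)
    if weight_a == 0 or weight_b == 0:
        raise ValueError("both sides of the bipartition must have positive weight")
    return cut / (weight_a * weight_b)


# Hypothetical path graph a - b - c - d with unit edge and node weights.
edges = [("a", "b", 1), ("b", "c", 1), ("c", "d", 1)]
node_weight = {"a": 1, "b": 1, "c": 1, "d": 1}

balanced = ratio_cut(edges, node_weight, {"a", "b"})   # cut 1, sides 2 and 2
lopsided = ratio_cut(edges, node_weight, {"a"})        # cut 1, sides 1 and 3
print(balanced, lopsided)
```

In this example both bipartitions cut one edge, but the balanced split has the smaller ratio because the denominator rewards evenly sized sides.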

On the notion of generalized minor in topological network theory and matroids

Linear Algebra and its Applications, 2014

FPGA Based High Performance Double-Precision Matrix Multiplication

2009 22nd International Conference on VLSI Design, 2009

We present two designs (I and II) for IEEE 754 double precision floating point matrix multiplication, an important kernel in many tile-based BLAS algorithms, optimized for implementation on high-end FPGAs. The designs, both based on the rank-1 update scheme, can handle arbitrary matrix sizes, and are able to sustain their peak performance except during an initial latency period. Through these designs, the trade-offs involved in terms of local memory and bandwidth for an FPGA implementation are demonstrated, and an analysis is presented for the optimal choice of design parameters. The designs, implemented on a Virtex-5 SX240T FPGA, scale gracefully from 1 to 40 processing elements (PEs) with less than 1% degradation in the design frequency of 373 MHz. With 40 PEs and a design speed of 373 MHz, a sustained performance of 29.8 GFLOPS is possible with a bandwidth requirement of 750 MB/s for design-II and 5.9 GB/s for design-I.
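The rank-1 update scheme mentioned in this abstract forms the product as a sum of outer products, one per column of the left matrix and matching row of the right matrix. The software sketch below shows only that arithmetic ordering, not the FPGA designs. It also includes a back-of-the-envelope check of the quoted throughput, assuming each PE completes one multiply and one add per cycle (an assumption, not stated in the abstract). The names `matmul_rank1` and `peak_gflops` are hypothetical.

```python
def matmul_rank1(a, b):
    """Multiply matrices (lists of rows) by accumulating rank-1 updates."""
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * p for _ in range(n)]
    for k in range(m):
        col = [a[i][k] for i in range(n)]  # k-th column of a
        row = b[k]                         # k-th row of b
        # Rank-1 update: c += outer(col, row)
        for i in range(n):
            for j in range(p):
                c[i][j] += col[i] * row[j]
    return c


def peak_gflops(num_pes, freq_mhz, flops_per_pe_per_cycle=2):
    """Peak rate in GFLOPS; 2 FLOPs per PE per cycle is an assumption."""
    return num_pes * freq_mhz * flops_per_pe_per_cycle / 1000.0


product = matmul_rank1([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)
print(peak_gflops(40, 373))
```

Under the stated assumption, 40 PEs at 373 MHz give 40 × 373 × 2 = 29,840 MFLOPS, which is consistent with the 29.8 GFLOPS figure quoted above.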