Srinivasa Satti

Norwegian University of Science and Technology, Department of Computer and Information Science, Faculty Member

Papers by Srinivasa Satti

Energy Efficient Sorting, Selection and Searching

Theoretical Computer Science, Apr 1, 2024

Practical Implementation of Encoding Range Top-2 Queries

Symposium on Experimental and Efficient Algorithms, 2021

We design a practical variant of an encoding for range Top-2 queries (RT2Q) and evaluate its performance. Given an array A[1, n] of n elements from a total order, the range Top-2 encoding problem is to construct a data structure that can answer RT2Q queries, which return the positions of the largest and second largest elements within a given query range of A, without accessing the array A at query time. Davoodi et al. [Phil. Trans. Royal Soc. A, 2016] proposed a (3.272n + o(n))-bit encoding, which answers RT2Q queries in O(1) time, while Gawrychowski and Nicholson [ICALP, 2015] gave an optimal (2.755n + (n))-bit encoding which does not support efficient queries. In this paper, we propose the first practical implementation of the encoding data structure for answering RT2Q. Our implementation is based on an alternative representation of Davoodi et al.'s data structure. The experimental results show that our implementation is efficient in practice and gives improved time-space trade-offs compared to the indexing data structures (which keep the original array A as part of the data structure) for range maximum queries. 2012 ACM Subject Classification: Theory of computation → Data structures design and analysis. Keywords and phrases: Range top-2 query, Range minimum query, Cartesian tree, Succinct encoding.
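To make the query itself concrete, here is a minimal brute-force sketch of what an RT2Q query returns. It is only a reference baseline that scans the array, so it is not the encoding discussed in the abstract (which answers queries without accessing A). The function name `range_top2`, the 0-based inclusive indexing, and the leftmost-wins tie rule are assumptions made for illustration.

```python
from typing import List, Tuple


def range_top2(a: List[int], i: int, j: int) -> Tuple[int, int]:
    """Return the positions of the largest and second largest values in a[i..j].

    Indices are 0-based and the range is inclusive (an assumption; the
    abstract uses A[1, n]). Ties are broken in favour of the leftmost position.
    """
    if not (0 <= i < j < len(a)):
        raise ValueError("the range must contain at least two elements")
    # Initialise with the first two positions of the range.
    first, second = (i, i + 1) if a[i] >= a[i + 1] else (i + 1, i)
    for p in range(i + 2, j + 1):
        if a[p] > a[first]:
            first, second = p, first
        elif a[p] > a[second]:
            second = p
    return first, second


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6]
    print(range_top2(data, 0, 4))
    print(range_top2(data, 2, 7))
```

A scan like this takes time linear in the range length and needs the array at query time, which is exactly the cost an encoding is meant to avoid.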

Approximate Query Processing over Static Sets and Sliding Windows

International Symposium on Algorithms and Computation, 2018

Indexing of static and dynamic sets is fundamental to a large set of applications such as information retrieval and caching. Denoting the characteristic vector of the set by B, we consider the problem of encoding sets and multisets to support approximate versions of the operations rank(i) (i.e., computing Σ_{j≤i} B[j]) and select(i) (i.e., finding min{p | rank(p) ≥ i}). We study multiple types of approximations (allowing an error in the query or the result) and present lower bounds and succinct data structures for several variants of the problem. We also extend our model to sliding windows, in which we process a stream of elements and compute suffix sums. This is a generalization of the window summation problem that allows the user to specify the window size at query time. Here, we provide an algorithm that supports updates and queries in constant time while requiring just a (1 + o(1)) factor more space than the fixed-window summation algorithms.
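The two operations defined above can be illustrated with an exact (non-approximate, non-succinct) baseline. The sketch below stores all prefix sums explicitly, so it uses far more space than the structures the abstract is about. The class name `RankSelect` and the 1-based positions are assumptions chosen to match the definitions rank(i) = Σ_{j≤i} B[j] and select(i) = min{p | rank(p) ≥ i}.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List


class RankSelect:
    """Exact rank/select over a characteristic vector B with 1-based positions."""

    def __init__(self, bits: List[int]) -> None:
        # prefix[i] = B[1] + ... + B[i], with prefix[0] = 0.
        # Entries of `bits` may exceed 1, which models a multiset.
        self.prefix = [0] + list(accumulate(bits))

    def rank(self, i: int) -> int:
        """Number of set elements among positions 1..i."""
        return self.prefix[i]

    def select(self, i: int) -> int:
        """Smallest position p with rank(p) >= i."""
        if i < 1 or i > self.prefix[-1]:
            raise ValueError("select argument out of range")
        # prefix is non-decreasing, so binary search finds the first p
        # whose prefix sum reaches i.
        return bisect_left(self.prefix, i)


if __name__ == "__main__":
    rs = RankSelect([0, 1, 1, 0, 1])
    print(rs.rank(3), rs.select(1), rs.select(3))
```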

On Succinct Representations of Binary Trees

arXiv (Cornell University), Oct 18, 2014

We observe that a standard transformation between ordinal trees (arbitrary rooted trees with ordered children) and binary trees leads to interesting succinct binary tree representations. There are four symmetric versions of these transformations. Via these transformations we get four succinct representations of n-node binary trees that use 2n + n/(log n)^{O(1)} bits and support (among other operations) navigation, inorder numbering, one of pre- or post-order numbering, subtree size and lowest common ancestor (LCA) queries. The ability to support inorder numbering is crucial for the well-known range-minimum query (RMQ) problem on an array A of n ordered values. While this functionality, and more, is also supported in O(1) time using 2n + o(n) bits by Davoodi et al.'s (Phil. Trans. Royal Soc. A 372 (2014)) extension of a representation by Farzan and Munro (Algorithmica 6 (2014)), their redundancy, or the o(n) term, is much larger, and their approach may not be suitable for practical implementations. One of these transformations is related to Zaks' sequence (S. Zaks, Theor. Comput. Sci. 10 (1980)) for encoding binary trees, and we thus provide the first succinct binary tree representation based on Zaks' sequence. Another of these transformations is equivalent to Fischer and Heun's (SIAM J. Comput. 40 (2011)) 2d-Min-Heap structure for this problem. Yet another variant allows an encoding of the Cartesian tree of A to be constructed from A using only O(√n log n) bits of working space.

A Simple Code System for Integer Arrays Supporting Random Access (임의 접근을 지원하는 간단한 정수 배열 코드 시스템)

Biconnectivity, $st$-numbering and other applications of DFS using $O(n)$ bits

arXiv (Cornell University), Jun 28, 2016

We consider space efficient implementations of some classical applications of DFS, including the problems of testing biconnectivity and 2-edge connectivity, finding cut vertices and cut edges, and computing a chain decomposition and an st-numbering of a given undirected graph G on n vertices and m edges. Classical algorithms for them typically use DFS and some Ω(lg n) bits of information at each vertex (lg denotes the logarithm to base 2). Building on a recent O(n)-bit implementation of DFS due to Elmasry et al. (STACS 2015), we provide O(n)-bit implementations for all these applications of DFS. Our algorithms take O(m lg^c n lg lg n) time for some small constant c (where c ≤ 2). Central to our implementation is a succinct representation of the DFS tree and a space efficient partitioning of the DFS tree into connected subtrees, which may be of independent interest for designing other space efficient graph algorithms. Some of these results were announced in preliminary form in the proceedings of 27th International Symposium on Algorithms and

Minimum Transactions Problem

Lecture Notes in Computer Science, 2018

We are given a directed graph G(V, E) on n vertices and m edges where each edge has a positive weight associated with it. The influx of a vertex is defined as the difference between the sum of the weights of edges entering the vertex and the sum of the weights of edges leaving the vertex. The goal is to find a graph \(G'(V,E')\) such that the influx of each vertex in \(G'(V,E')\) is the same as its influx in G(V, E) and \(|E'|\) is minimal. We show that 1. finding the optimal solution for this problem is NP-hard, 2. the optimal solution has at most \(n-1\) edges, and we give an algorithm to find one such solution with at most \(n-1\) edges in \(O(m \log n)\) time, and 3. for one variant of the problem where we can delete as well as add extra edges to the graph, we can compute a solution that is within a factor of 3/2 of the optimal solution.
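The influx definition and the \(n-1\) edge bound can be illustrated with a simple greedy sketch. This is a folklore two-pointer pairing of vertices with negative influx against vertices with positive influx, chosen here for illustration; it is not claimed to be the \(O(m \log n)\) algorithm of the paper, and it does not compute an optimal (minimum) solution. The names `net_influx` and `settle_greedy`, and the use of integer weights, are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int]  # (tail, head, positive integer weight)


def net_influx(edges: List[Edge]) -> Dict[str, int]:
    """Influx of each vertex: weight entering minus weight leaving."""
    net: Dict[str, int] = defaultdict(int)
    for u, v, w in edges:
        net[u] -= w
        net[v] += w
    return dict(net)


def settle_greedy(edges: List[Edge]) -> List[Edge]:
    """Build a graph with the same influx at every vertex using at most n - 1 edges."""
    net = net_influx(edges)
    senders = [[v, -x] for v, x in net.items() if x < 0]   # must send -x in total
    receivers = [[v, x] for v, x in net.items() if x > 0]  # must receive x in total
    result: List[Edge] = []
    s = r = 0
    while s < len(senders) and r < len(receivers):
        w = min(senders[s][1], receivers[r][1])
        result.append((senders[s][0], receivers[r][0], w))
        senders[s][1] -= w
        receivers[r][1] -= w
        # Every new edge exhausts at least one endpoint, and the last edge
        # exhausts both because the influxes sum to zero.
        if senders[s][1] == 0:
            s += 1
        if receivers[r][1] == 0:
            r += 1
    return result


if __name__ == "__main__":
    g = [("a", "b", 4), ("c", "b", 1), ("a", "d", 2), ("b", "c", 0 + 3), ("c", "b", 3)]
    print(settle_greedy(g))
```

Because each edge added retires at least one vertex and the final edge retires two, the output has at most (number of vertices with nonzero influx) − 1 edges, which is consistent with the \(n-1\) bound stated above.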

Finding Mode Using Equality Comparisons

Lecture Notes in Computer Science, 2016

We consider the problem of finding the mode (an element that appears the maximum number of times) in a list of elements that are not necessarily from a totally ordered set. Here, the relation between elements is determined by ‘equality’ comparisons whose outcome is \(=\) when the two elements being compared are equal and \(\ne \) otherwise. In sharp contrast to the \(\varTheta (\frac{n\lg n}{m})\) bound known in the classical three-way comparison model where elements are from a totally ordered set, a recent paper gave an \(O(\frac{n^2}{m})\) upper bound and \(\varOmega (\frac{n^2}{m})\) lower bound for the number of comparisons required to find the mode, where m is the frequency of the mode. While the number of comparisons made by the algorithm is roughly \(\frac{n^2}{m}\), it is not clear how the necessary bookkeeping required can be done to make the rest of the operations take \(\varTheta (\frac{n^2}{m})\) time.
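The equality-comparison model can be made concrete with a naive baseline that never orders or hashes the elements. The sketch below keeps one representative per distinct value and compares each new element against the representatives; with d distinct values it makes up to n·d comparisons, so it does not achieve the \(O(\frac{n^2}{m})\) bound mentioned above. The function name `mode_by_equality` and the `equal` parameter are assumptions for illustration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def mode_by_equality(
    items: Sequence[T],
    equal: Callable[[T, T], bool] = lambda x, y: x == y,
) -> Tuple[T, int]:
    """Return (mode, frequency) using only equality tests between elements."""
    if not items:
        raise ValueError("items must be non-empty")
    reps: List[T] = []     # one representative per distinct value seen so far
    counts: List[int] = []  # counts[k] is the frequency of reps[k]
    for x in items:
        for k, r in enumerate(reps):
            if equal(x, r):
                counts[k] += 1
                break
        else:
            # x differs from every representative: it is a new distinct value.
            reps.append(x)
            counts.append(1)
    best = max(range(len(reps)), key=counts.__getitem__)
    return reps[best], counts[best]


if __name__ == "__main__":
    print(mode_by_equality(["x", "y", "x", "z", "x", "y"]))
```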

Space Efficient Linear Time Algorithms for BFS, DFS and Applications

Theory of Computing Systems, Jan 22, 2018

Research on space efficient graph algorithms, particularly for st-connectivity, has a long history, including the celebrated polynomial-time, O(lg n)-bit algorithm for undirected graphs by Reingold (J. ACM 55(4), 2008) and the polynomial-time, n/2^{Θ(√lg n)}-bit algorithm for directed graphs by Barnes et al. (SICOMP 27(5), 1273-1282, 1998). Recent works by Asano et al. (ISAAC 2014) and Elmasry et al. (STACS 2015) reconsidered classical fundamental graph algorithms, focusing on improving the space complexity. Elmasry et al. gave, among others, an implementation of breadth first search (BFS) in a graph G with n vertices and m edges, taking the optimal O(m + n) time using O(n) bits, improving the naïve O(n lg n) bits. (lg denotes the logarithm to base 2.) Some of these results were announced in preliminary form in the proceedings of 22nd International

Succinct Representation for (Non)Deterministic Finite Automata

arXiv (Cornell University), Jul 22, 2019

Deterministic finite automata are one of the simplest and most practical models of computation studied in automata theory. Their conceptual extension is the non-deterministic finite automata, which also have plenty of applications. In this article, we study these models through the lens of succinct data structures, where our ultimate goal is to encode these mathematical objects using an information-theoretically optimal number of bits while supporting queries on them efficiently. Towards this goal, we first design a succinct data structure for representing any deterministic finite automaton D having n states over a σ-letter alphabet Σ using (σ − 1)n log n + O(n log σ) bits of space, which can determine, given an input string x over Σ, whether D accepts x in O(|x| log σ) time, using a constant number of words of working space. When the input deterministic finite automaton is acyclic, not only can we improve the above space bound significantly to (σ − 1)(n − 1) log n + 3n + O(log² σ) + o(n) bits, we also obtain optimal query time for string acceptance checking. More specifically, using our succinct representation, we can check if a given input string x can be accepted by the acyclic deterministic finite automaton using time proportional to the length of x, hence the optimal query time. We also exhibit a succinct data structure for representing a non-deterministic finite automaton N having n states over a σ-letter alphabet Σ using σn² + n bits of space, such that given an input string x, we can decide whether N accepts x efficiently in O(n²|x|) time. Finally, we also provide time and space efficient algorithms for performing several standard operations such as union, intersection and complement on the languages accepted by deterministic finite automata.

Finding kings in tournaments

Discrete Applied Mathematics, Dec 1, 2022

Simultaneous encodings for range and next/previous larger/smaller value queries

arXiv (Cornell University), Dec 22, 2016

Given an array of n elements from a total order, we propose encodings that support various range queries (range minimum, range maximum and their variants), and previous and next smaller/larger value queries. When query time is not of concern, we obtain a (4.088n + o(n))-bit encoding that supports all these queries. For the case when we need to support all these queries in constant time, we give an encoding that takes 4.585n + o(n) bits, where n is the length of the input array. This improves the (5.08n + o(n))-bit encoding obtained by encoding the colored 2d-Min and Max heaps proposed by Fischer [TCS, 2011]. We first extend the original DFUDS [Algorithmica, 2005] encoding of the colored 2d-Min (Max) heap that supports the queries in constant time. Then, we combine the extended DFUDS of 2d-Min heap and 2d-Max heap using the Min-Max encoding of Gawrychowski and Nicholson [ICALP, 2015] with some modifications. We also obtain encodings that take less space and support a subset of these queries.
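For readers unfamiliar with previous and next smaller value queries, the following sketch precomputes the answers for every position with the standard monotone-stack scan. It is an explicit table built from the array, so it is not one of the encodings described above. The function names, 0-based indices, strict comparison, and the use of -1 for "no such position" are assumptions.

```python
from typing import List


def previous_smaller(a: List[int]) -> List[int]:
    """res[i] = largest j < i with a[j] < a[i], or -1 if none exists."""
    res = [-1] * len(a)
    stack: List[int] = []  # indices whose values are strictly increasing
    for i, x in enumerate(a):
        while stack and a[stack[-1]] >= x:
            stack.pop()
        res[i] = stack[-1] if stack else -1
        stack.append(i)
    return res


def next_smaller(a: List[int]) -> List[int]:
    """res[i] = smallest j > i with a[j] < a[i], or -1 if none exists."""
    res = [-1] * len(a)
    stack: List[int] = []
    for i in range(len(a) - 1, -1, -1):
        while stack and a[stack[-1]] >= a[i]:
            stack.pop()
        res[i] = stack[-1] if stack else -1
        stack.append(i)
    return res


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5]
    print(previous_smaller(data))
    print(next_smaller(data))
```

Each index is pushed and popped at most once, so both scans run in linear time; the larger-value variants follow by flipping the comparison.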

Random Access to Grammar Compressed Strings

arXiv (Cornell University), Jan 11, 2010

Grammar based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many of the popular compression schemes, including the Lempel-Ziv family, Run-Length Encoding, Byte-Pair Encoding, Sequitur, and RePair. In this paper, we present a novel grammar representation that allows efficient random access to any character or substring without decompressing the string. Let S be a string of length N compressed into a context-free grammar S of size n. We present two representations of S achieving O(log N) random access time, and either O(n · α_k(n)) construction time and space on the pointer machine model, or O(n) construction time and space on the RAM. Here, α_k(n) is the inverse of the k-th row of Ackermann's function. Our representations also efficiently support decompression of any substring in S: we can decompress any substring of length m in the same complexity as a single random access query and additional O(m) time. Combining these results with fast algorithms for uncompressed approximate string matching leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can find all approximate occurrences of a pattern P with at most k errors in time O(n(min{|P|k, k⁴ + |P|} + log N) + occ), where occ is the number of occurrences of P in S. Finally, we generalize our results to navigation and other operations on grammar-compressed ordered trees. All of the above bounds significantly improve the currently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data structure, two "biased" weighted ancestor data structures, and a compact representation of heavy paths in grammars.
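The idea of accessing a character without decompressing can be sketched with a straight-line program, i.e. a grammar in which each nonterminal either produces one character or the concatenation of two nonterminals. The sketch below annotates each nonterminal with the length of its expansion and walks down from the start symbol. This simple descent takes time proportional to the height of the grammar, not the O(log N) achieved by the representations in the abstract. The rule format and the names `expansion_lengths` and `access` are assumptions.

```python
from typing import Dict, Tuple, Union

# A rule is either a single character or a pair of nonterminal names.
Rule = Union[str, Tuple[str, str]]


def expansion_lengths(rules: Dict[str, Rule]) -> Dict[str, int]:
    """Length of the string generated by each nonterminal (memoised recursion)."""
    length: Dict[str, int] = {}

    def visit(sym: str) -> int:
        if sym not in length:
            rule = rules[sym]
            if isinstance(rule, str):
                length[sym] = 1
            else:
                length[sym] = visit(rule[0]) + visit(rule[1])
        return length[sym]

    for sym in rules:
        visit(sym)
    return length


def access(rules: Dict[str, Rule], length: Dict[str, int], start: str, i: int) -> str:
    """Return character i (0-based) of the string generated by `start`."""
    if not 0 <= i < length[start]:
        raise IndexError("position out of range")
    sym = start
    while True:
        rule = rules[sym]
        if isinstance(rule, str):
            return rule
        left, right = rule
        if i < length[left]:
            sym = left            # the position lies in the left part
        else:
            i -= length[left]     # skip the left part and go right
            sym = right


if __name__ == "__main__":
    g: Dict[str, Rule] = {
        "A": "a", "B": "b",
        "X": ("A", "B"), "Y": ("X", "A"), "S": ("Y", "X"),
    }
    n = expansion_lengths(g)
    print("".join(access(g, n, "S", i) for i in range(n["S"])))
```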

Encoding Two-Dimensional Range Top-k Queries

Combinatorial Pattern Matching, 2016

We consider the problem of encoding two-dimensional arrays, whose elements come from a total order, for answering Top-k queries. The aim is to obtain encodings that use space close to the information-theoretic lower bound and that can be constructed efficiently. For an m × n array, with m ≤ n, we first propose an encoding for answering 1-sided Top-k queries, whose query range is restricted to [1..m][1..a], for 1 ≤ a ≤ n. Next, we propose an encoding for answering general (4-sided) Top-k queries that takes (m lg \(\binom{(k+1)n}{n}\) + 2nm(m − 1) + o(n)) bits, which generalizes the joint Cartesian tree of Golin et al. [TCS 2016]. Compared with the trivial O(nm lg n)-bit encoding, our encoding takes less space when m = o(lg n). In addition to the upper bound results for the encodings, we also give lower bounds on encodings for answering 1-sided and 4-sided Top-k queries, which show that our upper bound results are almost optimal.

Optimal In-place Algorithms for Basic Graph Problems

arXiv (Cornell University), Jul 22, 2019

We present linear time in-place algorithms for several basic and fundamental graph problems, including the well-known graph search methods (like depth-first search, breadth-first search, maximum cardinality search), connectivity problems (like biconnectivity, 2-edge connectivity), and decomposition problems (like chain decomposition), among various others, improving the running time (by a polynomial multiplicative factor) of the recent results of Chakraborty et al. [ESA, 2018], who designed O(n³ lg n) time in-place algorithms for a strict subset of the above mentioned problems. The running times of all our algorithms are essentially optimal as they run in linear time. One of the main ideas behind obtaining these algorithms is the detection and careful exploitation of sortedness present in the input representation for any graph without loss of generality. This observation alone is powerful enough to design some basic linear time in-place algorithms, but more non-trivial graph problems require extra techniques which, we believe, may find other applications in designing in-place algorithms for different graph problems in the future.

Succinct Navigational Oracles for Families of Intersection Graphs on a Circle

arXiv (Cornell University), Oct 8, 2020

We consider the problem of designing succinct navigational oracles, i.e., succinct data structures supporting basic navigational queries such as degree, adjacency and neighborhood efficiently, for intersection graphs on a circle, which include graph classes such as circle graphs, k-polygon-circle graphs, circle-trapezoid graphs, and trapezoid graphs. The degree query reports the number of edges incident to a given vertex, the adjacency query asks if there is an edge between two given vertices, and the neighborhood query enumerates all the neighbors of a given vertex. We first prove a general lower bound for these intersection graph classes, and then present a uniform approach that lets us obtain matching lower and upper bounds for representing each of these graph classes. More specifically, our lower bound proofs use a unified technique to produce tight bounds for all these classes, and this is followed by our data structures, which are also obtained from a unified representation method to achieve succinctness for each class. In addition, we prove a lower bound on the space for representing trapezoid graphs, and give a succinct navigational oracle for this class of graphs.

Approximate Query Processing over Static Sets and Sliding Windows

arXiv (Cornell University), Sep 14, 2018

Indexing of static and dynamic sets is fundamental to a large set of applications such as information retrieval and caching. Denoting the characteristic vector of the set by B, we consider the problem of encoding sets and multisets to support approximate versions of the operations rank(i) (i.e., computing Σ_{j≤i} B[j]) and select(i) (i.e., finding min{p | rank(p) ≥ i}). We study multiple types of approximations (allowing an error in the query or the result) and present lower bounds and succinct data structures for several variants of the problem. We also extend our model to sliding windows, in which we process a stream of elements and compute suffix sums. This is a generalization of the window summation problem that allows the user to specify the window size at query time. Here, we provide an algorithm that supports updates and queries in constant time while requiring just a (1 + o(1)) factor more space than the fixed-window summation algorithms.

Succinct Data Structures for Families of Interval Graphs

arXiv (Cornell University), Feb 25, 2019

We consider the problem of designing succinct data structures for interval graphs with n vertices while supporting degree, adjacency, neighborhood and shortest path queries in optimal time. Towards showing succinctness, we first show that at least n log₂ n − 2n log₂ log₂ n − O(n) bits are necessary to represent any unlabeled interval graph G with n vertices, answering an open problem of Yang and Pippenger [Proc. Amer. Math. Soc. 2017]. This is complemented by a data structure of size n log₂ n + O(n) bits that not only supports the above queries optimally but is also capable of executing various combinatorial algorithms (like proper coloring, maximum independent set, etc.) on interval graphs efficiently. Finally, we extend our ideas to other variants of interval graphs, for example, proper/unit interval graphs, k-improper interval graphs, and circular-arc graphs, and design succinct data structures for these graph classes as well, along with supporting queries on them efficiently.

Space Efficient Top-k Query Encoding Based on Data Distribution

Journal of KIISE (정보과학회논문지), Mar 31, 2020

We consider an encoding that supports a range top-k query on a two-dimensional array without accessing the original array. We propose a more space-efficient encoding method for top-k queries with better average-case query time. Our experiments also show that our encoding is more space-efficient than the earlier ones. We also propose applying learning-based data structures to succinct data structures.

Encoding 2D range maximum queries

Theoretical Computer Science, 2016

We consider the two-dimensional range maximum query (2D-RMQ) problem: given an array containing elements from an ordered set, encode the array so that the position of the maximum element in any specified range of rows and range of columns can be found efficiently. We focus on determining the effective entropy of 2D-RMQ, i.e., how many bits are needed to encode an array so that 2D-RMQ queries can be answered without accessing the array. We give tight upper and lower bounds on the expected effective entropy for the case when A contains independent identically-distributed random values, and give new upper and lower bounds for the case when the array contains few rows. The latter results improve upon the upper and lower bounds by Brodal et al. [4]. We also give some efficient data structures for 2D-RMQ whose space usage is close to the effective entropy.

Energy Efficient Sorting, Selection and Searching

Theoretical computer science, Apr 1, 2024

Practical Implementation of Encoding Range Top-2 Queries

Symposium on Experimental and Efficient Algorithms, 2021

We design a practical variant of an encoding for range Top-2 queries (RT2Q), and evaluate its per... more We design a practical variant of an encoding for range Top-2 queries (RT2Q), and evaluate its performance. Given an array A[1, n] of n elements from a total order, the range Top-2 encoding problem is to construct a data structure that can answer RT2Q queries, which return the positions of the first and the second largest elements within a given query range of A, without accessing the array A at query time. Davoodi et al. [Phil. Trans. Royal Soc. A, 2016] proposed a (3.272n + o(n))-bit encoding, which answers RT2Q queries in O(1) time, while Gawrychowski and Nicholson [ICALP, 2015] gave an optimal (2.755n + (n))-bit encoding which doesn't support efficient queries. In this paper, we propose the first practical implementation of the encoding data structure for answering RT2Q. Our implementation is based on an alternative representation of Davoodi et al.'s data structure. The experimental results show that our implementation is efficient in practice, and gives improved time-space trade-offs compared to the indexing data structures (which keep the original array A as part of the data structure) for range maximum queries. 2012 ACM Subject Classification Theory of computation → Data structures design and analysis Keywords and phrases Range top-2 query, Range minimum query, Cartesian tree, Succinct encoding

Approximate Query Processing over Static Sets and Sliding Windows

International Symposium on Algorithms and Computation, 2018

Indexing of static and dynamic sets is fundamental to a large set of applications such as informa... more Indexing of static and dynamic sets is fundamental to a large set of applications such as information retrieval and caching. Denoting the characteristic vector of the set by B, we consider the problem of encoding sets and multisets to support approximate versions of the operations rank(i) (i.e., computing j≤i B[j]) and select(i) (i.e., finding min{p | rank(p) ≥ i}) queries. We study multiple types of approximations (allowing an error in the query or the result) and present lower bounds and succinct data structures for several variants of the problem. We also extend our model to sliding windows, in which we process a stream of elements and compute suffix sums. This is a generalization of the window summation problem that allows the user to specify the window size at query time. Here, we provide an algorithm that supports updates and queries in constant time while requiring just (1 + o(1)) factor more space than the fixed-window summation algorithms.

On Succinct Representations of Binary Trees

arXiv (Cornell University), Oct 18, 2014

We observe that a standard transformation between ordinal trees (arbitrary rooted trees with orde... more We observe that a standard transformation between ordinal trees (arbitrary rooted trees with ordered children) and binary trees leads to interesting succinct binary tree representations. There are four symmetric versions of these transformations. Via these transformations we get four succinct representations of n-node binary trees that use 2n + n/(log n) O(1) bits and support (among other operations) navigation, inorder numbering, one of pre-or post-order numbering, subtree size and lowest common ancestor (LCA) queries. The ability to support inorder numbering is crucial for the well-known range-minimum query (RMQ) problem on an array A of n ordered values. While this functionality, and more, is also supported in O(1) time using 2n + o(n) bits by Davoodi et al.'s (Phil. Trans. Royal Soc. A 372 (2014)) extension of a representation by Farzan and Munro (Algorithmica 6 (2014)), their redundancy, or the o(n) term, is much larger, and their approach may not be suitable for practical implementations. One of these transformations is related to the Zaks' sequence (S. Zaks, Theor. Comput. Sci. 10 (1980)) for encoding binary trees, and we thus provide the first succinct binary tree representation based on Zaks' sequence. Another of these transformations is equivalent to Fischer and Heun's (SIAM J. Comput. 40 (2011)) 2d-Min-Heap structure for this problem. Yet another variant allows an encoding of the Cartesian tree of A to be constructed from A using only O(√ n log n) bits of working space.

임의 접근을 지원하는 간단한 정수 배열 코드 시스템

Biconnectivity, $st$-numbering and other applications of DFS using $O(n)$ bits

arXiv (Cornell University), Jun 28, 2016

We consider space efficient implementations of some classical applications of DFS including the p... more We consider space efficient implementations of some classical applications of DFS including the problem of testing biconnectivity and 2-edge connectivity, finding cut vertices and cut edges, computing chain decomposition and st-numbering of a given undirected graph G on n vertices and m edges. Classical algorithms for them typically use DFS and some Ω(lg n) bits 1 of information at each vertex. Building on a recent O(n)-bits implementation of DFS due to Elmasry et al. (STACS 2015) we provide O(n)-bit implementations for all these applications of DFS. Our algorithms take O(m lg c n lg lg n) time for some small constant c (where c ≤ 2). Central to our implementation is a succinct representation of the DFS tree and a space efficient partitioning of the DFS tree into connected subtrees, which maybe of independent interest for designing other space efficient graph algorithms. 1 We use lg to denote logarithm to the base 2. $ Some of these results were announced in preliminary form in the proceedings of 27th International Symposium on Algorithms and

Minimum Transactions Problem

Lecture Notes in Computer Science, 2018

We are given a directed graph G(V, E) on n vertices and m edges where each edge has a positive we... more We are given a directed graph G(V, E) on n vertices and m edges where each edge has a positive weight associated with it. The influx of a vertex is defined as the difference between the sum of the weights of edges entering the vertex and the sum of the weights of edges leaving the vertex. The goal is to find a graph \(G'(V,E')\) such that the influx of each vertex in \(G'(V,E')\) is same as the influx of each vertex in G(V, E) and \(|E'|\) is minimal. We show that 1. finding the optimal solution for this problem is NP-hard, 2. the optimal solution has at most \(n-1\) edges, and we give an algorithm to find one such solution with at most \(n-1\) edges in \(O(m \log n)\) time, and 3. for one variant of the problem where we can delete as well as add extra edges to the graph, we can compute a solution that is within a factor 3 / 2 from the optimal solution.

Finding Mode Using Equality Comparisons

Lecture Notes in Computer Science, 2016

We consider the problem of finding the mode (an element that appears the maximum number of times)... more We consider the problem of finding the mode (an element that appears the maximum number of times) in a list of elements that are not necessarily from a totally ordered set. Here, the relation between elements is determined by ‘equality’ comparisons whose outcome is \(=\) when the two elements being compared are equal and \(\ne \) otherwise. In sharp contrast to the \(\varTheta (\frac{n\lg n}{m})\) bound known in the classical three way comparison model where elements are from a totally ordered set, a recent paper gave an \(O(\frac{n^2}{m})\) upper bound and \(\varOmega (\frac{n^2}{m})\) lower bound for the number of comparisons required to find the mode, where m is the frequency of the mode. While the number of comparisons made by the algorithm is roughly \(\frac{n^2}{m}\), it is not clear how the necessary bookkeeping required can be done to make the rest of the operations take \(\varTheta (\frac{n^2}{m})\) time.

Space Efficient Linear Time Algorithms for BFS, DFS and Applications

Theory of computing systems, Jan 22, 2018

Research on space efficient graph algorithms, particularly for stconnectivity, has a long history... more Research on space efficient graph algorithms, particularly for stconnectivity, has a long history including the celebrated polynomial time, O(lg n) bits 1 algorithm in undirected graphs by Reingold J. JACM. 55(4) (2008), and polynomial time, n/2 (√ lg n) bits algorithm in directed graphs by Barnes et al. SICOMP. 27(5), 1273-1282 (1998). Recent works by Asano et al. ISAAC (2014) and Elmasry et al. STACS (2015), reconsidered classical fundamental graph algorithms focusing on improving the space complexity. Elmasry et al. gave, among others, an implementation of breadth first search (BFS) in a graph G with n vertices and m edges, taking the optimal O(m + n) time using O(n) bits improving the naïve O(n lg n) bits We use lg to denote logarithm to the base 2. Some of these results were announced in preliminary form in the proceedings of 22nd International

Succinct Representation for (Non)Deterministic Finite Automata

arXiv (Cornell University), Jul 22, 2019

Deterministic finite automata are one of the simplest and most practical models of computation st... more Deterministic finite automata are one of the simplest and most practical models of computation studied in automata theory. Their conceptual extension is the non-deterministic finite automata which also have plenty of applications. In this article, we study these models through the lens of succinct data structures where our ultimate goal is to encode these mathematical objects using information theoretically optimal number of bits along with supporting queries on them efficiently. Towards this goal, we first design a succinct data structure for representing any deterministic finite automaton D having n states over a σ-letter alphabet Σ using (σ−1)n log n+O(n log σ) bits of space, which can determine, given an input string x over Σ, whether D accepts x in O(|x| log σ) time, using constant words of working space. When the input deterministic finite automaton is acyclic, not only we can improve the above space bound significantly to (σ − 1)(n − 1) log n + 3n + O(log 2 σ) + o(n) bits, we also obtain optimal query time for string acceptance checking. More specifically, using our succinct representation, we can check if a given input string x can be accepted by the acyclic deterministic finite automaton using time proportional to the length of x, hence, the optimal query time. We also exhibit a succinct data structure for representing a non-deterministic finite automaton N having n states over a σ-letter alphabet Σ using σn 2 + n bits of space, such that given an input string x, we can decide whether N accepts x efficiently in O(n 2 |x|) time. Finally, we also provide time and space efficient algorithms for performing several standard operations such as union, intersection and complement on the languages accepted by deterministic finite automata.

Finding kings in tournaments

Discrete Applied Mathematics, Dec 1, 2022

Simultaneous encodings for range and next/previous larger/smaller value queries

arXiv (Cornell University), Dec 22, 2016

Given an array of n elements from a total order, we propose encodings that support various range queries (range minimum, range maximum and their variants), and previous and next smaller/larger value queries. When query time is not of concern, we obtain a 4.088n + o(n)-bit encoding that supports all these queries. For the case when we need to support all these queries in constant time, we give an encoding that takes 4.585n + o(n) bits, where n is the length of the input array. This improves the 5.08n + o(n)-bit encoding obtained by encoding the colored 2d-Min and Max heaps proposed by Fischer [TCS, 2011]. We first extend the original DFUDS [Algorithmica, 2005] encoding of the colored 2d-Min (Max) heap that supports the queries in constant time. Then, we combine the extended DFUDS of the 2d-Min heap and 2d-Max heap using the Min-Max encoding of Gawrychowski and Nicholson [ICALP, 2015] with some modifications. We also obtain encodings that take less space and support a subset of these queries.
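To make the query semantics concrete, here is a naive sketch of two of the queries named in this abstract: next smaller value (computed for all positions with a monotone stack) and range minimum. Unlike the encodings in the paper, it reads the array directly. The 0-based indexing, the strict comparison, the leftmost tie-breaking rule, and the function names are assumptions for illustration.

```python
from typing import List


def next_smaller_values(a: List[int]) -> List[int]:
    """For each i, return the smallest j > i with a[j] < a[i], or len(a) if none."""
    n = len(a)
    nsv = [n] * n
    stack: List[int] = []  # indices whose next smaller value is still unknown
    for i, value in enumerate(a):
        # Every pending index holding a strictly larger value is resolved by i.
        while stack and a[stack[-1]] > value:
            nsv[stack.pop()] = i
        stack.append(i)
    return nsv


def range_min(a: List[int], left: int, right: int) -> int:
    """Return the position of the leftmost minimum in a[left..right] (inclusive)."""
    return min(range(left, right + 1), key=a.__getitem__)


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 2]
    print(next_smaller_values(data))
    print(range_min(data, 2, 5))
```

Previous smaller value and the larger-value variants follow by scanning in the opposite direction or flipping the comparison.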

Random Access to Grammar Compressed Strings

arXiv (Cornell University), Jan 11, 2010

Grammar based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many of the popular compression schemes, including the Lempel-Ziv family, Run-Length Encoding, Byte-Pair Encoding, Sequitur, and RePair. In this paper, we present a novel grammar representation that allows efficient random access to any character or substring without decompressing the string. Let S be a string of length N compressed into a context-free grammar 𝒮 of size n. We present two representations of 𝒮 achieving O(log N) random access time, and either O(n · α_k(n)) construction time and space on the pointer machine model, or O(n) construction time and space on the RAM. Here, α_k(n) is the inverse of the k-th row of Ackermann's function. Our representations also efficiently support decompression of any substring in S: we can decompress any substring of length m in the same complexity as a single random access query and additional O(m) time. Combining these results with fast algorithms for uncompressed approximate string matching leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can find all approximate occurrences of a pattern P with at most k errors in time O(n(min{|P|k, k⁴ + |P|} + log N) + occ), where occ is the number of occurrences of P in S. Finally, we generalize our results to navigation and other operations on grammar-compressed ordered trees. All of the above bounds significantly improve the currently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data structure, two "biased" weighted ancestor data structures, and a compact representation of heavy paths in grammars.

Encoding Two-Dimensional Range Top-k Queries

Combinatorial Pattern Matching, 2016

We consider the problem of encoding two-dimensional arrays, whose elements come from a total order, for answering Top-k queries. The aim is to obtain encodings that use space close to the information-theoretic lower bound, which can be constructed efficiently. For an m × n array, with m ≤ n, we first propose an encoding for answering 1-sided Top-k queries, whose query range is restricted to [1..m][1..a], for 1 ≤ a ≤ n. Next, we propose an encoding for answering the general (4-sided) Top-k queries that takes $m \lg \binom{(k+1)n}{n} + 2nm(m-1) + o(n)$ bits, which generalizes the joint Cartesian tree of Golin et al. [TCS 2016]. Compared with the trivial O(nm lg n)-bit encoding, our encoding takes less space when m = o(lg n). In addition to the upper bound results for the encodings, we also give lower bounds on encodings for answering 1 and 4-sided Top-k queries, which show that our upper bound results are almost optimal.

Optimal In-place Algorithms for Basic Graph Problems

arXiv (Cornell University), Jul 22, 2019

We present linear time in-place algorithms for several basic and fundamental graph problems including the well-known graph search methods (like depth-first search, breadth-first search, maximum cardinality search), connectivity problems (like biconnectivity, 2-edge connectivity), and decomposition problems (like chain decomposition), among various others, improving the running time (by a polynomial multiplicative factor) of the recent results of Chakraborty et al. [ESA, 2018], who designed O(n³ lg n) time in-place algorithms for a strict subset of the above mentioned problems. The running times of all our algorithms are essentially optimal as they run in linear time. One of the main ideas behind obtaining these algorithms is the detection and careful exploitation of sortedness present in the input representation for any graph without loss of generality. This observation alone is powerful enough to design some basic linear time in-place algorithms, but more non-trivial graph problems require extra techniques which, we believe, may find other applications while designing in-place algorithms for different graph problems in future.

Succinct Navigational Oracles for Families of Intersection Graphs on a Circle

arXiv (Cornell University), Oct 8, 2020

We consider the problem of designing succinct navigational oracles, i.e., succinct data structures supporting basic navigational queries such as degree, adjacency and neighborhood efficiently for intersection graphs on a circle, which include graph classes such as circle graphs, k-polygon-circle graphs, circle-trapezoid graphs, and trapezoid graphs. The degree query reports the number of incident edges to a given vertex, the adjacency query asks if there is an edge between two given vertices, and the neighborhood query enumerates all the neighbors of a given vertex. We first prove a general lower bound for these intersection graph classes, and then present a uniform approach that lets us obtain matching lower and upper bounds for representing each of these graph classes. More specifically, our lower bound proofs use a unified technique to produce tight bounds for all these classes, and this is followed by our data structures which are also obtained from a unified representation method to achieve succinctness for each class. In addition, we prove a lower bound of space for representing trapezoid graphs, and give a succinct navigational oracle for this class of graphs.

Approximate Query Processing over Static Sets and Sliding Windows

arXiv (Cornell University), Sep 14, 2018

Indexing of static and dynamic sets is fundamental to a large set of applications such as information retrieval and caching. Denoting the characteristic vector of the set by B, we consider the problem of encoding sets and multisets to support approximate versions of the operations rank(i) (i.e., computing Σ_{j≤i} B[j]) and select(i) (i.e., finding min{p | rank(p) ≥ i}). We study multiple types of approximations (allowing an error in the query or the result) and present lower bounds and succinct data structures for several variants of the problem. We also extend our model to sliding windows, in which we process a stream of elements and compute suffix sums. This is a generalization of the window summation problem that allows the user to specify the window size at query time. Here, we provide an algorithm that supports updates and queries in constant time while requiring just a (1 + o(1)) factor more space than the fixed-window summation algorithms.
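The exact versions of the two operations defined in this abstract can be written down directly from a prefix-sum array. The sketch below is a naive reference, not one of the approximate or succinct structures from the paper; the 1-based positions and the helper names `build_prefix`, `rank`, and `select` are assumptions for illustration.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List, Optional


def build_prefix(b: List[int]) -> List[int]:
    """prefix[i] = B[1] + ... + B[i] for 1-based positions, with prefix[0] = 0."""
    return [0] + list(accumulate(b))


def rank(prefix: List[int], i: int) -> int:
    """Exact rank(i): the sum of B[j] over all positions j <= i."""
    return prefix[i]


def select(prefix: List[int], i: int) -> Optional[int]:
    """Exact select(i): the smallest p with rank(p) >= i, or None if there is none."""
    # prefix is non-decreasing (entries of B are non-negative), so binary search applies.
    p = bisect_left(prefix, i)
    return p if p < len(prefix) else None


if __name__ == "__main__":
    prefix = build_prefix([0, 1, 1, 0, 1])  # the set {2, 3, 5}
    print(rank(prefix, 3), select(prefix, 3))
```

Because the prefix array stays non-decreasing for any non-negative counts, the same code covers the multiset case, where B[j] stores a multiplicity rather than a single bit.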

Succinct Data Structures for Families of Interval Graphs

arXiv (Cornell University), Feb 25, 2019

We consider the problem of designing succinct data structures for interval graphs with n vertices while supporting degree, adjacency, neighborhood and shortest path queries in optimal time. Towards showing succinctness, we first show that at least n log₂ n − 2n log₂ log₂ n − O(n) bits are necessary to represent any unlabeled interval graph G with n vertices, answering an open problem of Yang and Pippenger [Proc. Amer. Math. Soc. 2017]. This is complemented by a data structure of size n log₂ n + O(n) bits that not only supports the above queries optimally but is also capable of executing various combinatorial algorithms (like proper coloring, maximum independent set etc.) on interval graphs efficiently. Finally, we extend our ideas to other variants of interval graphs, for example, proper/unit interval graphs, k-improper interval graphs, and circular-arc graphs, and design succinct data structures for these graph classes as well, along with supporting queries on them efficiently.

Space Efficient Top-k Query Encoding Based on Data Distribution

정보과학회논문지, Mar 31, 2020

We consider an encoding that supports a range top-k query on a two-dimensional array without accessing the original array. We propose a more space-efficient encoding method for top-k queries with better average-case query time. Our experiments also show that our encoding is more space-efficient than the earlier ones. We also propose applying learning-based data structures to succinct data structures.

Encoding 2D range maximum queries

Theoretical Computer Science, 2016

We consider the two-dimensional range maximum query (2D-RMQ) problem: given an array containing elements from an ordered set, encode the array so that the position of the maximum element in any specified range of rows and range of columns can be found efficiently. We focus on determining the effective entropy of 2D-RMQ, i.e., how many bits are needed to encode an array so that 2D-RMQ queries can be answered without accessing the array. We give tight upper and lower bounds on the expected effective entropy for the case when A contains independent identically-distributed random values, and give new upper and lower bounds for the case when the array contains few rows. The latter results improve upon the upper and lower bounds by Brodal et al. [4]. We also give some efficient data structures for 2D-RMQ whose space usage is close to the effective entropy.