A TPR-tree is a well-known index structure developed to answer queries about the current or future locations of moving objects. For space efficiency, the TPR-tree employs the notion of a VBR (velocity bounding rectangle), so that a single regional rectangle represents the varying positions of a group of moving objects. Since the rectangle computed from a VBR always encloses the maximum possible range of an indexed object group, a search only has to follow the VBR-based rectangles that overlap a given query range while descending toward candidate leaf nodes. Although the TPR-tree is space-efficient, it easily suffers from dead space, which results from the fast and constant expansion of VBR-based rectangles. To counter this, the TPR-tree is forced to update leaf nodes in order to reduce the dead space within them. This update-prone nature of the TPR-tree becomes more problematic when the tree is stored in flash storage. This is because ...
The Journal of the Institute of Webcasting, Internet and Telecommunication, 2019
In this paper, we propose a new logging and recovery scheme suited to high-performance transaction processing systems based on flash memory storage. The proposed scheme is designed around flash's I/O characteristic of asymmetric costs between page update and page read operations. That is, we replace the costly update operation with the writing and real-time use of a snapshot log, which serves page-level physical redo. In this way, we can avoid the costly rewriting of a dirty page when it is evicted from the buffer pool, while supporting an efficient recovery procedure. The proposed scheme would not be beneficial for an HDD-based system. For flash storage such as an SSD (solid state drive), however, it offers performance advantages such as a reduced number of updates and a fast system recovery time. Because the proposed scheme can easily be applied to existing systems by saving our snapshot records together with ordinary log records, it can be used to improve the performance of upcoming SSD-based database systems through a small modification to existing REDO algorithms.
Recently, as its price per bit has been decreasing rapidly, flash memory has come to be considered as primary storage for large-scale database systems. Although flash memory offers fast page reads, its performance degrades noticeably as update workloads increase. In particular, when updates are requested for pages with random page IDs, this shortcoming tends to significantly impair the overall performance of a flash-based database system. Therefore, it is important to have a way to efficiently update the B+-tree when it is stored in flash storage, because most updates in the B+-tree arise at leaf nodes, whose page IDs are random. In this light, we propose a new flash B+-tree that stores up-to-date versions of leaf nodes in sibling-leaf blocks (SLBs) while updating them. The use of SLBs improves the update performance of B+-trees and provides a mechanism for fast key range searches. To verify the p...
The B+-tree is the most popular index structure used in disk-based DBMSs. Its fast key searches and efficient storage usage are the major reasons for its long-standing popularity. When we adopt the B+-tree as the primary indexing scheme for databases stored in flash storage, however, these advantages may diminish because of the distinctive I/O features of flash memory. Unlike the hard disk drive, flash memory suffers from considerable performance asymmetry between page reads and page updates. Therefore, it is crucial to reduce the number of page updates in flash-based databases. Since random updates can severely degrade storage performance, the efficiency of leaf-node updates is very important for a B+-tree stored in flash storage. In this context, we propose a new way of updating B+-tree leaf nodes at low cost. To this end, we devised new algorithms for tree reconstruction that is performed in t...
Since web applications are accessed by anonymous users over the web, they are exposed to greater security risks. In particular, because security vulnerabilities caused by insecure source code cannot be properly handled by system-level security mechanisms such as intrusion detection systems, it is necessary to eliminate such problems in advance. In this paper, to enhance the security of web applications, we develop a static analyzer for detecting the well-known PHP file inclusion vulnerability. Using semantics-based static analysis, our vulnerability analyzer guarantees the soundness of vulnerability detection and imposes no runtime overhead, unlike other approaches such as penetration testing and application firewalls. To this end, our analyzer adopts the abstract interpretation framework and uses an abstract analysis domain designed for detecting the target vulnerability in PHP programs. Thus, our analyzer...
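ABSTRACT Unlike traditional information retrieval (IR) systems, web search often estimates the intrinsic importance of each web document from the hyperlink information between documents. Among the algorithms based on link analysis, the PageRank algorithm is known to have been applied to Google's web search service. When importance is computed with the PageRank algorithm, the CPU resources needed for the computation grow with the number of indexed web documents, and once the number of documents reaches hundreds of millions of pages, the computation can no longer be performed on a single server. To resolve this problem, this paper presents a method for using multiple servers as a cluster for PageRank computation. The presented method connects multiple servers over a high-speed LAN and performs the iterative matrix computations in parallel, thereby shortening the computation time. To implement this server cluster, a multi-threaded program was written, and the matrix data used in the PageRank computation was made representable with only a small amount of memory.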
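The iterative matrix computation mentioned in this abstract can be illustrated with a minimal single-process sketch of PageRank power iteration. It is not the paper's cluster implementation: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the toy graph are all assumptions made for illustration. The link matrix is held as an adjacency list, which is one common way to keep memory use small for sparse link data.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank over an adjacency-list graph.

    out_links maps each page to the list of pages it links to.
    Every link target must also appear as a key (assumption of this sketch).
    """
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        nxt = {v: base for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                # Each page passes an equal share of its rank to its targets.
                share = damping * rank[v] / len(targets)
                for t in targets:
                    nxt[t] += share
        rank = nxt
    return rank


# Hypothetical four-page web graph.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
```

In a cluster setting, the inner loop over `nodes` is the part that could be partitioned across servers, with each server accumulating contributions for its own slice of pages before the partial results are combined once per iteration.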
Proceedings of the 33rd Annual ACM Symposium on Applied Computing, 2018
A time-series is defined as a sequence of real numbers sampled at a particular time interval. To index a large volume of time-series data without excessive dimensionality, the DFT (Discrete Fourier Transform) technique is widely used. It is challenging to support fast similarity searches on normalized time-series without false dismissals. Here, the normalization pre-processing of time-series is vital for the similar-trend searches tackled in our work. To address this problem, we locate multiple sub-queries within a given user query and map them to points in the normalized DFT index space. Then, a join-like operation is executed using those points and newly computed Euclidean (similarity) distances. We propose a new cost function for choosing the sub-queries that are likely to have the smallest intersection in the index space. With this approach, we can enhance query performance significantly. Our performance evaluation verifies that our approach can reduce the query processing time by about 62% compared to the existing one.
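The reason a DFT-based index can avoid false dismissals is that, for an orthonormal DFT, the Euclidean distance computed on the first few coefficients never exceeds the true distance between the series (a consequence of Parseval's theorem). The sketch below demonstrates this on two short series. It is a generic illustration and not the paper's sub-query or cost-function method; the helper names, the choice of z-normalization, and the sample data are assumptions.

```python
import cmath
import math


def z_normalize(series):
    """Shift to zero mean and scale to unit variance (assumes a non-constant series)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    return [(x - mean) / std for x in series]


def dft_features(series, k):
    """First k coefficients of the orthonormal DFT (scaled by 1/sqrt(n))."""
    n = len(series)
    return [
        sum(x * cmath.exp(-2j * math.pi * f * t / n) for t, x in enumerate(series))
        / math.sqrt(n)
        for f in range(k)
    ]


def euclidean(a, b):
    """Euclidean distance; works for real or complex coordinates."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical sample series of equal length.
s1 = z_normalize([1, 2, 3, 4, 5, 4, 3, 2])
s2 = z_normalize([2, 2, 3, 5, 6, 5, 3, 1])

true_distance = euclidean(s1, s2)
index_distance = euclidean(dft_features(s1, 3), dft_features(s2, 3))
```

Because `index_distance` is a lower bound on `true_distance`, a range search in the reduced index space may return extra candidates but cannot miss a true match.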
ABSTRACT Recently, on-demand streaming of continuous media (CM) has become crucial for successful Internet businesses. To ensure quality of service for online CM streams, the Sweep scheme was proposed to provide high I/O throughput as well as hiccup-free playback. When this scheme is applied to a system using a zoned disk, however, it may suffer from significant bandwidth losses because of its inherent scheduling inflexibility. Since the zones of a multi-zone disk have different data transfer rates, much slack time occurs when data requests are made to read data blocks located in inner disk zones. Such slack time cannot be efficiently reclaimed in Sweep. In this paper, we propose an EDF-style variant of the Sweep scheme, called the Dynamic Sweep Scheme, to handle the slack time that increases in the zoned disk.
Recently, with the advent of applications that use the locations of moving objects, it has become crucial to develop efficient index schemes for spatio-temporal databases. The TPR*-tree is the most widely accepted index structure for processing future-time queries. In the TPR*-tree, the future locations of moving objects are predicted based on the CBR (Conservative Bounding Rectangle). Since the areas predicted from CBRs tend to grow rapidly over time, the enlarged CBRs lead to serious performance degradation in query processing. To address this problem, we propose a new method that adjusts CBRs to be tight, thereby improving query processing performance. Our method examines whether a CBR needs adjusting when a leaf node is accessed to process a user query. Thus, this examination incurs no extra disk I/Os. Also, to make a correct decision, we devise a cost model that considers both the I/O overhead of the CBR adjustment and the future performance gain it brings. With the cost model, we can prevent unusual expansions of BRs even when updates on nodes are infrequent, and also avoid unnecessary executions of the CBR adjustment. For performance evaluation, we conducted a variety of experiments. The results show that our method significantly improves the performance of the original TPR*-tree.
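To make the growth of a conservative bounding rectangle concrete, the following sketch expands a rectangle by the extreme velocities of its objects and compares it with the tight rectangle recomputed from the objects' actual positions. It is a generic time-parameterized rectangle model under stated assumptions, not the paper's adjustment algorithm or cost model; the `MovingPoint` class, the helper names, and the two sample objects are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MovingPoint:
    x: float
    y: float
    vx: float
    vy: float

    def at(self, t):
        """Position after t time units of linear motion."""
        return (self.x + self.vx * t, self.y + self.vy * t)


def bounding_rect(points, t):
    """Tight position rectangle at time t and the velocity bounding rectangle."""
    pos = [p.at(t) for p in points]
    mbr = (min(x for x, _ in pos), min(y for _, y in pos),
           max(x for x, _ in pos), max(y for _, y in pos))
    vbr = (min(p.vx for p in points), min(p.vy for p in points),
           max(p.vx for p in points), max(p.vy for p in points))
    return mbr, vbr


def expand(mbr, vbr, dt):
    """Conservative rectangle dt time units later: each edge moves at its extreme velocity."""
    return tuple(edge + speed * dt for edge, speed in zip(mbr, vbr))


def area(rect):
    return (rect[2] - rect[0]) * (rect[3] - rect[1])


# Two objects moving toward each other along the x-axis.
objects = [MovingPoint(0.0, 0.0, 1.0, 0.0), MovingPoint(2.0, 1.0, -1.0, 0.0)]
mbr0, vbr0 = bounding_rect(objects, 0.0)
conservative = expand(mbr0, vbr0, 2.0)      # rectangle predicted for t = 2
tight, _ = bounding_rect(objects, 2.0)      # rectangle recomputed at t = 2
```

Here the conservative rectangle at t = 2 has area 6.0 while the recomputed one has area 2.0; the difference is dead space that a tightening step would remove.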
Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, 2015
In this paper, we address how to reduce the number of page updates in a flash-based DBMS equipped with an SSD (Solid State Drive). We propose a novel buffering scheme that evicts a dirty page X without flushing it to the SSD, and restores the correct image of X when X is later requested. The restoration of a page X that was evicted without flushing is performed through online redo actions on X. We call this page-restoring online redo the on-the-fly redo. Although our on-the-fly redo mechanism adds some overhead by increasing the number of page reads, this is compensated for by less frequent page updates. Additionally, since the proposed buffering scheme with the on-the-fly redo can easily support the no-steal policy in buffer management, we gain the advantages of smaller logging overhead and faster recovery. Through TPC-C benchmarks using Berkeley DB, we show that our scheme shortens transaction processing times by up to 53%.
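The idea of evicting a dirty page without flushing it and rebuilding it on the next access can be sketched with a toy in-memory buffer pool. This is only an illustration of the general idea under simplifying assumptions, not the paper's design: the class `BufferPool`, its methods, the log record layout, and the linear log scan are all hypothetical, and a real system would index log records by page and bound how much log must be replayed.

```python
class BufferPool:
    """Toy buffer pool that restores evicted dirty pages by replaying redo records."""

    def __init__(self, storage):
        self.storage = storage  # page_id -> (page_lsn, {key: value}); stands in for the SSD
        self.log = []           # redo records: (lsn, page_id, key, value)
        self.pages = {}         # buffered pages: page_id -> [page_lsn, data]
        self.next_lsn = 1

    def fetch(self, page_id):
        """Return the current page image, redoing logged updates if it was evicted."""
        if page_id not in self.pages:
            page_lsn, stored = self.storage.get(page_id, (0, {}))
            data = dict(stored)
            # Online redo: apply every record newer than the stored image.
            for lsn, pid, key, value in self.log:
                if pid == page_id and lsn > page_lsn:
                    data[key] = value
                    page_lsn = lsn
            self.pages[page_id] = [page_lsn, data]
        return self.pages[page_id][1]

    def update(self, page_id, key, value):
        """Log the change, then apply it to the buffered page."""
        self.fetch(page_id)
        lsn = self.next_lsn
        self.next_lsn += 1
        self.log.append((lsn, page_id, key, value))
        page = self.pages[page_id]
        page[0] = lsn
        page[1][key] = value

    def evict_without_flush(self, page_id):
        """Drop the buffered page; the stored image stays stale on purpose."""
        self.pages.pop(page_id, None)


storage = {1: (0, {"a": 1})}
pool = BufferPool(storage)
pool.update(1, "a", 2)
pool.update(1, "b", 3)
pool.evict_without_flush(1)
restored = pool.fetch(1)
```

After the eviction, `storage` still holds the old image of page 1, yet `restored` reflects both updates, at the cost of one extra page read and a log replay instead of a page write.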
Recently, the widespread use of mobile devices such as smartphones and tablet PCs has led to a new demand for distributed mobile applications. When a mobile application has a long lifetime and comprises many parallel tasks, its processing state must be safely checkpointed against abrupt failures. To this end, much work has been done to reduce the wireless traffic overhead caused by distributed checkpointing protocols. In this paper, we propose a new recovery scheme with lower networking overhead and high protocol flexibility. For this, we deploy logging agents across mobile support stations so that they can gather the causal dependency vectors of the involved application processes without wireless data transmission. Because of these features, our scheme provides high flexibility at checkpointing time and low traffic overhead, while efficiently preventing severe cascading rollbacks.
Because of the fast-growing volume of web documents over the past decades, the efficiency of web search engines has become more crucial than ever. This efficiency can be assessed by two factors: the query relevance of the returned search results and the financial cost of query processing. Of the two, ways of improving the query relevance of web searches have been intensively studied in research topics such as hyperlink-based ranking, topic-sensitive document classification, and semantic awareness in rank evaluation. However, there have been no studies that provide an efficient solution for cutting the financial cost of query processing while retaining high query relevance. In this light, we propose a distributed cache scheme and a server-clustering technique that can be used to reduce the query processing cost. With the help of these techniques for accelerating web query processing, we saved around 70% of the server cost of a commercial web search engine implemented in...
Asia-pacific Journal of Convergent Research Interchange, 2020
As the price per bit of flash storage is rapidly decreasing, diverse research has been done to devise flash-aware B+-tree indexes. Since the original B+-tree structure was devised for indexing data records stored in hard disk drives, a naive transplant of the B+-tree into flash may degrade index performance. This is because flash storage suffers from a significant performance disparity between update operations and read/write operations. To solve the problem, we adopt a probabilistic index structure called the Bloom filter. By using the Bloom filter, we free up some space in each node whose child nodes are leaf nodes. We refer to such a node as a BF node. In the free space of a BF node, our proposed F2B+-tree stores update logs that record the history of key inserts and deletes that have occurred in leaf nodes. Since B+-tree nodes other than leaf nodes are usually manipulated in a memory buffer pool, the F2B+-tree can considerably reduce the number of physical updates on flash storage. Additionally, we cluster a set of sibling leaf nodes in a flash block so that garbage collection can be performed cheaply without full-merges or half-merges. As a result, the F2B+-tree can prevent unpredictable fluctuations in the performance of flash-based databases, which could be caused by the background actions needed for garbage collection.
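For readers unfamiliar with the Bloom filter named in this abstract, the following is a minimal generic sketch of the data structure. How the F2B+-tree lays a filter out inside a BF node is not described in the abstract and is not modeled here; the class name, the bit-array size, the number of hash functions, and the use of SHA-256 to derive bit positions are all assumptions of this example.

```python
import hashlib


class BloomFilter:
    """Compact set summary: no false negatives, a tunable rate of false positives."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


bf = BloomFilter()
for key in (10, 20, 30):
    bf.add(key)
```

Every key that was added is reported as possibly present, while a key that was never added is usually, though not always, reported as absent. The fixed, small size of the bit array is what makes such a filter a space-saving substitute for storing the keys themselves.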
Thanks to remarkably fast random reads and rapidly decreasing prices per bit, flash storage has been regarded as a promising alternative to traditional hard disk drives (HDDs). Although flash storage has many distinctive hardware features, it still suffers from poor I/O performance for update operations. Because it lacks in-place updates, unlike HDDs, flash storage must modify data through out-of-place updates. For this reason, it must continuously renew the mapping between a logical page address and its new physical address, invalidating the old physical address. When invalidated pages consume most of the free space in flash storage, garbage reclamation is needed. Since garbage reclamation is very costly, it is crucial to reduce the number of update operations when using flash storage in enterprise-scale database systems. In this light, we propose a new buffering scheme that evicts dirty p...
Papers by Sungchae Lim