Papers by Swapnil Chankhore Patil
We examine the problem of scalable file system directories, motivated by data-intensive applications requiring millions to billions of small files to be ingested in a single directory at rates of hundreds of thousands of file creates every second. We introduce a POSIX-compliant scalable directory design, GIGA+, that distributes directory entries over a cluster of server nodes. For scalability, each server makes only local, independent decisions about migration for load balancing. GIGA+ uses two internal implementation tenets, asynchrony and eventual consistency, to: (1) partition an index among all servers without synchronization or serialization, and (2) gracefully tolerate stale index state at the clients. Applications, however, are provided traditional strong synchronous consistency semantics. We have built and demonstrated that the GIGA+ approach scales better than existing distributed directory implementations, delivers a sustained throughput of more than 98,000 file creates per second …
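The partitioning behavior this abstract describes, where each server splits its own hash partition independently while clients may hold stale mappings, can be sketched in a simplified single-process form. The split threshold, hash choice, and class names below are illustrative assumptions, not GIGA+'s actual implementation:

```python
import hashlib

SPLIT_THRESHOLD = 4  # toy value; split a partition once it holds this many entries

def h(name: str) -> int:
    """Map a filename to a large integer hash."""
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:8], "big")

class Partition:
    """One directory partition: covers hashes whose low `depth` bits equal `suffix`."""
    def __init__(self, suffix: int, depth: int):
        self.suffix, self.depth = suffix, depth
        self.entries = {}

    def covers(self, key: int) -> bool:
        return key & ((1 << self.depth) - 1) == self.suffix

class Directory:
    """A GIGA+-style incrementally partitioned directory (single-process sketch)."""
    def __init__(self):
        self.partitions = [Partition(0, 0)]  # one partition initially holds everything

    def lookup_partition(self, key: int) -> Partition:
        return next(p for p in self.partitions if p.covers(key))

    def create(self, name: str):
        key = h(name)
        p = self.lookup_partition(key)
        p.entries[name] = key
        if len(p.entries) > SPLIT_THRESHOLD:
            self.split(p)

    def split(self, p: Partition):
        # The partition splits itself using one more hash bit -- a local,
        # independent decision requiring no global coordination.
        new = Partition(p.suffix | (1 << p.depth), p.depth + 1)
        p.depth += 1
        moved = {n: k for n, k in p.entries.items() if new.covers(k)}
        for n in moved:
            del p.entries[n]
        new.entries = moved
        self.partitions.append(new)
```

A client caching an old partition map would simply be corrected on its next lookup, which is the stale-state tolerance the abstract refers to.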
This paper presents the design and implementation of DOT, a flexible architecture for data transfer. This architecture separates content negotiation from the data transfer itself. Applications determine what data they need to send and then use a new transfer service to send it. This transfer service acts as a common interface between applications and the lower-level network layers, facilitating innovation both above and below. The transfer service frees developers from re-inventing transfer mechanisms in each new application. New transfer mechanisms, in turn, can be easily deployed without modifying existing applications. We discuss the benefits that arise from separating data transfer into a service and the challenges this service must overcome. The paper then examines the implementation of DOT and its plugin framework for creating new data transfer mechanisms. A set of microbenchmarks shows that the DOT prototype performs well, and that the overhead it imposes is unnoticeable …
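The separation the abstract describes, applications negotiating content while a pluggable transfer service moves the bytes, can be illustrated with a minimal sketch. The interface and plugin names here are invented for illustration and are not the DOT API:

```python
# A sketch of a DOT-style transfer service: applications hand opaque data
# objects to the service, which dispatches to pluggable transfer mechanisms.
class TransferService:
    def __init__(self):
        self.plugins = {}

    def register(self, name, send_fn):
        """Install a new transfer mechanism without touching application code."""
        self.plugins[name] = send_fn

    def send(self, data: bytes, mechanism: str):
        # The application never sees the mechanism's details; swapping
        # mechanisms is a one-argument change.
        return self.plugins[mechanism](data)

svc = TransferService()
# Two toy mechanisms: an in-process loopback and a chunking sender that
# reports how many 4-byte chunks it would transmit.
svc.register("loopback", lambda data: ("ok", len(data)))
svc.register("chunked",
             lambda data: ("ok", len([data[i:i + 4] for i in range(0, len(data), 4)])))
```

The application-facing call stays identical while the mechanism underneath changes, which is the deployment flexibility the abstract claims.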
High performance computing applications are becoming increasingly widespread in a large number of fields. However, the performance of I/O subsystems within such HPC environments has not kept pace with the extreme processing and communication speeds of these computing clusters. System architects tackle this problem by employing a variety of storage technologies such as parallel file systems. While such solutions significantly alleviate the problem of I/O performance scaling, there is still a lot of room for improvement because they do not endeavor to significantly scale the performance of metadata operations. Such a situation can easily arise in database or telecommunication applications that create thousands of files per second in a single directory. When this happens, the consequent performance degradation caused by the slowdown of metadata operations can severely slow down the overall I/O system. GIGA+ affords a potential solution …
Modern science has available to it, and is more productively pursued with, massive amounts of data, typically either gathered from sensors or output from some simulation or processing. The table below shows a sampling of data sets that a few scientists at Carnegie Mellon University have available to them or intend to construct soon. Data Intensive Scalable Computing (DISC) couples computational resources with the data storage and access capabilities needed to handle massive data science quickly and efficiently. Our topic in this extended abstract is the effectiveness of the data-intensive file systems embedded in a DISC system. We are interested in understanding the differences between data-intensive file system implementations and high performance computing (HPC) parallel file system implementations. Both are used at comparable scale and speed. Beyond feature inclusions, which we expect to evolve as data-intensive file systems see wider use, we find that performance does not need to be …
Data-intensive distributed file systems are emerging as a key component of large scale Internet services and cloud computing platforms. They are designed from the ground up and are tuned for specific application workloads. Leading examples, such as the Google File System, the Hadoop distributed file system (HDFS) and Amazon S3, are defining this new purpose-built paradigm. It is tempting to classify file systems for large clusters into two disjoint categories, those for Internet services and those for high performance computing. In this paper we compare and contrast parallel file systems, developed for high performance computing, and data-intensive distributed file systems, developed for Internet services. Using PVFS as a representative for parallel file systems and HDFS as a representative for Internet services file systems, we configure a parallel file system into a data-intensive Internet services stack, Hadoop, and test performance with microbenchmarks and macrobenchmarks running on …
2004 First Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks, 2004. IEEE SECON 2004.
This paper considers serial fusion as a mechanism for collaborative signal detection. The advantage of this technique is that it can use only the sensor observations that are really necessary for signal detection and thus can be very communication efficient. We develop the signal processing mechanisms for serial fusion based on simple models. We also develop a space-filling curve-based routing mechanism for message routing to implement serial fusion. We demonstrate via simulations that serial fusion with curve-based routing performs better, both in terms of detection errors and message cost, relative to commonly used mechanisms such as parallel fusion with a tree-based aggregation scheme.
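Serial fusion is closely related to sequential detection: sensors are visited one at a time, and the walk stops as soon as the accumulated evidence is decisive. A minimal sketch along the lines of Wald's sequential probability ratio test follows; this is an assumed formulation with illustrative parameters, and the paper's models may differ:

```python
import math

def serial_fusion(observations, p1=0.8, p0=0.2, a=0.01, b=0.01):
    """Visit sensors one at a time, accumulating a log-likelihood ratio over
    their binary local decisions, and stop as soon as the evidence crosses an
    SPRT-style threshold, so only the observations actually needed are consumed.
      p1: P(local detection | signal present)
      p0: P(local detection | signal absent)
      a, b: target false-alarm / miss probabilities
    Returns (decision, number_of_sensors_used)."""
    upper = math.log((1 - b) / a)   # declare "signal" above this
    lower = math.log(b / (1 - a))   # declare "no signal" below this
    llr = 0.0
    for used, obs in enumerate(observations, start=1):
        llr += math.log(p1 / p0) if obs else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "signal", used
        if llr <= lower:
            return "no signal", used
    # Ran out of sensors: fall back to the sign of the accumulated evidence.
    return ("signal" if llr > 0 else "no signal"), len(observations)
```

The early stopping is what yields the communication savings: with clearly informative observations, only a handful of sensors along the routing curve ever need to be queried.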
2004 IEEE Wireless Communications and Networking Conference (IEEE Cat. No.04TH8733)
Current security mechanisms in ad-hoc sensor networks do not guarantee reliable and robust network functionality. Even with these mechanisms, the sensor nodes could be made non-operational by malicious attackers or physical breakdown of the infrastructure. …
2013 IEEE International Symposium on Technology and Society (ISTAS): Social Implications of Wearable Computing and Augmediated Reality in Everyday Life, 2013
Privacy risks have been addressed through technical solutions such as Privacy-Enhancing Technologies (PETs) as well as regulatory measures including Do Not Track. These approaches are inherently limited as they are grounded in the paradigm of a rational end user who can determine, articulate, and manage consistent privacy preferences. This assumes that self-serving efforts to enact privacy preferences lead to socially optimal outcomes with regard to information sharing. We argue that this assumption typically does not hold true. Consequently, solutions to specific risks are developed, even mandated, without effective reduction in the overall harm of privacy breaches. We present a systematic framework to examine these limitations of current technical and policy solutions. To address the shortcomings of existing privacy solutions, we argue for considering information sharing to be transactions within a community. Outcomes of privacy management can be improved at a lower overall cost if peers, as a community, are empowered by appropriate technical and policy mechanisms. Designing for a community requires encouraging dialogue, enabling transparency, and supporting enforcement of community norms. We describe how peer production of privacy is possible through PETs that are grounded in the notion of information as a common-pool resource subject to community governance.
2012 Third International Conference on Computing, Communication and Networking Technologies (ICCCNT'12), 2012
The increasing need for automatic authentication of persons has led to extensive research in biometrics. Among all biometrics, iris recognition is one of the most promising methods due to the rich and unique textures of the iris, non-invasiveness, stability of the iris pattern throughout the human lifetime, public acceptance, and availability of user-friendly capturing devices. Iris segmentation is the vital step in iris recognition systems because all subsequent steps depend highly on its precision. For instance, even an effective feature extraction method would not be able to obtain useful information from an iris image that is not segmented accurately, which will unavoidably result in poor recognition performance. A robust method for iris segmentation should be used to remove the influence of noise as much as possible. In this paper, we present an accuracy-based comparative analysis of three different methods for iris segmentation, viz. Geodesic Active Contours (GACs), the traditional integro-differential operator, and the Hough transform. Along with accurate segmentation, quality enhancement of the encoded template is performed by employing super-resolution based on a sparse signal representation approach. By directly super-resolving only the features essential for recognition, obtained from accurately segmented irises, an improvement in recognition performance is achieved. The CASIA Interval version 3 dataset is used for experimentation in a MATLAB-based implementation.
2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT), 2013
The rapid growth of the World Wide Web has increased online communication. The use of social networking sites is one of the important approaches to communication. In order to improve textual methods of communication such as tweets, blogs, and chat, it is necessary to analyze the emotion of a user by studying the input text. Much of the current work in the area of emotion recognition from text has typically focused on recognizing the polarity of sentiment (positive/negative). This paper presents a novel approach for emotion estimation from the text entered by a user on social networking sites. The work proposed in this paper uses affective words and sentence context analysis methods for emotion recognition. Also, to help users effectively express their emotion, we have developed a visual image generation approach that generates images according to the emotion in the text. The results of evaluations show that the proposed system obtains better results for emotion prediction than existing methods.
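An affective-word lookup with simple negation handling, the flavor of sentence context analysis the abstract mentions, might look like the following sketch. The lexicon, negation list, and flipping rule are toy assumptions, not the paper's actual method; a real system would draw on a resource with thousands of entries:

```python
from collections import Counter

# A hypothetical miniature affect lexicon (illustrative only).
LEXICON = {
    "happy": "joy", "glad": "joy", "love": "joy",
    "sad": "sadness", "cry": "sadness",
    "angry": "anger", "hate": "anger",
    "afraid": "fear", "scared": "fear",
}
NEGATIONS = {"not", "never", "no"}
OPPOSITE = {"joy": "sadness", "sadness": "joy"}

def estimate_emotion(text: str) -> str:
    """Count affective words, flipping joy/sadness right after a negation word."""
    counts = Counter()
    negated = False
    for raw in text.lower().split():
        word = raw.strip(".,!?")
        if word in NEGATIONS:
            negated = True       # affect the next affective word only
            continue
        if word in LEXICON:
            emotion = LEXICON[word]
            if negated:
                emotion = OPPOSITE.get(emotion, emotion)
            counts[emotion] += 1
        negated = False
    return counts.most_common(1)[0][0] if counts else "neutral"
```

The predicted label could then select among a set of emotion-themed images, in the spirit of the paper's visual generation step.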
Proceedings of the 2nd ACM Symposium on Cloud Computing - SOCC '11, 2011
Inspired by Google's BigTable, a variety of scalable, semi-structured, weak-semantic table stores have been developed and optimized for different priorities such as query speed, ingest speed, availability, and interactivity. As these systems mature, performance benchmarking will advance from measuring the rate of simple workloads to understanding and debugging the performance of advanced features such as ingest speed-up techniques and function shipping filters from clients to servers. This paper describes YCSB++, a set of extensions to the Yahoo! Cloud Serving Benchmark (YCSB) to improve performance understanding and debugging of these advanced features. YCSB++ includes multi-tester coordination for increased load and eventual consistency measurement, multi-phase workloads to quantify the consequences of work deferment and the benefits of anticipatory configuration optimization such as B-tree pre-splitting or bulk loading, and abstract APIs for explicit incorporation of advanced features in benchmark tests. To enhance performance debugging, we customized an existing cluster monitoring tool to gather the internal statistics of YCSB++, table stores, system services like HDFS, and operating systems, and to offer easy post-test correlation and reporting of performance behaviors. YCSB++ features are illustrated in case studies of two BigTable-like table stores, Apache HBase and Accumulo, developed to emphasize high ingest rates and fine-grained security.
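One of the extensions the abstract mentions, measuring the read-after-write inconsistency window of an eventually consistent store, can be sketched as follows. The store class is a stand-in with an artificial replication delay, and the function and parameter names are illustrative, not the YCSB++ API:

```python
import time

class EventuallyConsistentStore:
    """Toy store whose reads lag writes by `lag` seconds, standing in for a
    real table store's replication delay (an assumption for this sketch)."""
    def __init__(self, lag: float):
        self.lag = lag
        self.data = {}  # key -> (value, time at which it becomes visible)

    def put(self, key, value):
        self.data[key] = (value, time.monotonic() + self.lag)

    def get(self, key):
        value, visible_at = self.data.get(key, (None, 0.0))
        return value if time.monotonic() >= visible_at else None

def read_after_write_lag(store, key, value, timeout=1.0):
    """One tester writes; this poller measures how long until the value is
    visible, i.e. the observed inconsistency window in seconds."""
    store.put(key, value)
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if store.get(key) == value:
            return time.monotonic() - start
        time.sleep(0.005)
    return None  # never became visible within the timeout
```

In YCSB++ the writer and the pollers are separate coordinated testers on different machines; collapsing them into one process keeps the sketch self-contained.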
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11, 2011
SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, 2014
The growing size of modern storage systems is expected to exceed billions of objects, making metadata scalability critical to overall performance. Many existing distributed file systems only focus on providing highly parallel fast access to file data, and lack a scalable metadata service. In this paper, we introduce a middleware design called IndexFS that adds support to existing file systems such as PVFS, Lustre, and HDFS for scalable high-performance operations on metadata and small files. IndexFS uses a table-based architecture that incrementally partitions the namespace on a per-directory basis, preserving server and disk locality for small directories. An optimized log-structured layout is used to store metadata and small files efficiently. We also propose two client-based storm-free caching techniques: bulk namespace insertion for creation-intensive workloads such as N-N checkpointing, and stateless consistent metadata caching for hot-spot mitigation. By combining these techniques, we have demonstrated IndexFS scaling to 128 metadata servers. Experiments show our out-of-core metadata throughput outperforming existing solutions such as PVFS, Lustre, and HDFS by 50% to two orders of magnitude.
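A table-based metadata architecture of the kind the abstract describes keys each file or directory row by its parent directory, so one directory's entries form a contiguous key range that can be scanned and partitioned per directory. A minimal sketch follows; the schema and names are assumptions for illustration, not the IndexFS implementation:

```python
import itertools

class MetadataTable:
    """Namespace as one table: each file or directory is a row keyed by
    (parent directory id, name)."""
    def __init__(self):
        self.rows = {}                   # (parent_id, name) -> attributes
        self._ids = itertools.count(1)   # inode-number generator; 0 is "/"

    def mkdir(self, parent_id: int, name: str) -> int:
        dir_id = next(self._ids)
        self.rows[(parent_id, name)] = {"type": "dir", "id": dir_id}
        return dir_id

    def create(self, parent_id: int, name: str):
        self.rows[(parent_id, name)] = {"type": "file", "id": next(self._ids)}

    def readdir(self, parent_id: int):
        # A range scan over one directory's keys; in a sorted table store this
        # range is contiguous on disk, preserving locality for small
        # directories, and a large directory can be split off on its own.
        return sorted(name for (pid, name) in self.rows if pid == parent_id)
```

Because the partitioning unit is a directory's key range rather than the whole namespace, small directories stay on one server while only large ones pay the cost of distribution.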
Frontiers in Endocrinology, 2014
The worldwide increase in the prevalence of Diabetes mellitus (DM) has highlighted the need for increased research efforts into treatment options for both the disease itself and its associated complications. In recent years, mesenchymal stromal cells (MSCs) have been highlighted as a new emerging regenerative therapy due to their multipotency but also due to their paracrine secretion of angiogenic factors, cytokines, and immunomodulatory substances. This review focuses on the potential use of MSCs as a regenerative medicine in microvascular and secondary complications of DM and will discuss the challenges and future prospects of MSCs as a regenerative therapy in this field. MSCs are believed to have an important role in tissue repair. Evidence in recent years has demonstrated that MSCs have potent immunomodulatory functions resulting in active suppression of various components of the host immune response. MSCs may also have glucose-lowering properties, providing another attractive and unique feature of this therapeutic approach. Through a combination of the above characteristics, MSCs have been shown to exert beneficial effects in pre-clinical models of diabetic complications, prompting initial clinical studies in diabetic wound healing and nephropathy. Challenges that remain in the clinical translation of MSC therapy include issues of MSC heterogeneity, optimal mode of cell delivery, homing of these cells to tissues of interest with high efficiency, clinically meaningful engraftment, and challenges with cell manufacture. An issue of added importance is whether an autologous or allogeneic approach will be used. In summary, MSC administration has significant potential in the treatment of diabetic microvascular and secondary complications, but challenges remain in terms of engraftment, persistence, tissue targeting, and cell manufacture.
Proceedings of the twentieth ACM symposium on Operating systems principles - SOSP '05, 2005
This WIP proposes a new architecture for applications that perform bulk data transfers. This architecture, called DOT (for data-oriented transfer), cleanly separates out two functions that are commingled in today's applications. Using DOT, applications perform content negotiation to determine what content to send. They then pass that data object to the transfer service to perform the actual data transmission. This separation increases application flexibility, enables the rapid development of innovative transfer mechanisms, reduces developer effort, and allows increased efficiency through cross-application sharing.
Scripta Materialia, 2009
Figure 2. [0 0 1] IPF plot showing the change in crystal orientation (a) along the path A–B–C–D indicated in Figure 1 for the undeformed specimen and (b) along the path A–B at an LPD of 0.2 mm.
Proceedings of the 2nd international workshop on Petascale data storage held in conjunction with Supercomputing '07 - PDSW '07, 2007
2012 SC Companion: High Performance Computing, Networking Storage and Analysis, 2012
IEEE International Conference on Electro-Information Technology , EIT 2013, 2013