2010
…
9 pages
Abstract: Monitoring systems are necessary for managing anything beyond the smallest computer networks. While specialised monitoring systems can be deployed to detect specific problems, more general systems are required to detect unexpected issues and to track performance trends.
2008
There is no existing monitoring system that both scales to large fleets of computers and provides effective visualisation. The overall goal of the project is to develop a system that allows effective monitoring of a large fleet of computers. The project will consist of three major parts: Collection, Storage & Retrieval, and Visualisation.
Computer Communications, 1996
Continuous monitoring of computer network performance is probably the only solution that allows prompt identification of anomalous operating conditions and provides knowledge of the parameters on which to base recovery interventions. A number of performance management tools already exist, but their effectiveness is limited because they are essentially embedded in proprietary network management solutions. In this paper we describe the realization of PMt, a platform for developing performance management applications for the control and real-time management of a heterogeneous computer network. PMt uses an object-oriented approach to managing the various elements of the distributed system, and defines a methodology for designing performance monitoring applications. The methodology is intended mainly for distributed systems, and provides the user with a set of tools and services that can be used to develop management applications.
2016
In this paper we present the concept of a scalable, job-centric monitoring infrastructure. The overall performance of this distributed, layer-based architecture, called SLAte, can be increased by installing additional servers to adapt to the demands of the monitored resources and users. Another important aspect is offering a uniform global view of all data, which are stored in a distributed manner, to provide easy access for users or visualisation tools. Additionally, we discuss the impact of this uniform access layer on scalability.
Cluster Computing - CLUSTER, 2002
Distributed systems based on clusters of workstations are increasingly difficult to manage due to the growing number of processors involved and the complexity of the associated applications. Such systems need efficient and flexible monitoring mechanisms to fulfill the requirements of administration services. In this paper, we present PHOENIX, a distributed platform supporting both application and operating-system monitoring with variable granularity. The granularity is defined using logical expressions that specify complex monitoring conditions. These conditions can be modified dynamically during application execution. Observation techniques based on automatic probe insertion are combined with a system agent to minimize the PHOENIX execution-time overhead. The platform's extensibility offers a suitable environment for designing distributed value-added services (performance monitoring, load balancing, accounting, cluster management, etc.).
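The idea of monitoring granularity defined by logical expressions that can be replaced at run time can be illustrated with a small sketch. It assumes conditions are plain Python predicates over a dictionary of metric samples; the helpers `metric_above`, `all_of`, `any_of` and the `Probe` class are hypothetical and are not PHOENIX's interface.

```python
from typing import Callable, Dict, List

# A sample maps metric names to values, e.g. {"cpu": 0.95, "mem": 0.4}.
Metrics = Dict[str, float]
# A monitoring condition is a boolean predicate over one sample.
Condition = Callable[[Metrics], bool]


def metric_above(name: str, limit: float) -> Condition:
    """Atomic condition: true when the named metric exceeds `limit`."""
    return lambda m: m.get(name, 0.0) > limit


def all_of(*conds: Condition) -> Condition:
    """Logical AND of several conditions."""
    return lambda m: all(c(m) for c in conds)


def any_of(*conds: Condition) -> Condition:
    """Logical OR of several conditions."""
    return lambda m: any(c(m) for c in conds)


class Probe:
    """Records a sample only when the current condition holds."""

    def __init__(self, condition: Condition) -> None:
        self.condition = condition
        self.events: List[Metrics] = []

    def set_condition(self, condition: Condition) -> None:
        # The condition can be swapped while the monitored program runs.
        self.condition = condition

    def sample(self, metrics: Metrics) -> None:
        if self.condition(metrics):
            self.events.append(dict(metrics))


# Usage: start with a strict condition (CPU AND memory high), then relax it.
probe = Probe(all_of(metric_above("cpu", 0.9), metric_above("mem", 0.8)))
probe.sample({"cpu": 0.95, "mem": 0.5})   # not recorded
probe.set_condition(any_of(metric_above("cpu", 0.9), metric_above("mem", 0.8)))
probe.sample({"cpu": 0.95, "mem": 0.5})   # recorded
```

Composing conditions as closures keeps each probe cheap to evaluate and lets a controller change what is observed without reinserting probes.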
This paper addresses a central challenge in PRISM, a large-scale distributed monitoring system: coping with the uncertainties and ambiguities introduced by network and node failures. In particular, in a large-scale monitoring system, such failures interact badly with techniques needed for scalability like hierarchy, arithmetic filtering, and temporal batching. For example, if a monitoring subtree is silent over an interval, it is difficult to distinguish between two cases: (a) the subtree has sent no updates because the inputs have not significantly changed or (b) the inputs have significantly changed but the subtree is unable to transmit its report. As a result, reported results can be arbitrarily far from their true values.
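The ambiguity between cases (a) and (b) can be reproduced with a toy model. This is a minimal sketch, not PRISM's implementation: it assumes a two-level tree computing a SUM, a fixed per-leaf arithmetic-filter threshold, and no temporal batching; the names `FilteredLeaf` and `SumAggregator` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FilteredLeaf:
    """Leaf that reports only when its input moves more than `threshold`
    away from the last value it successfully reported."""
    threshold: float
    last_reported: float = 0.0
    alive: bool = True

    def observe(self, value: float) -> Optional[float]:
        if abs(value - self.last_reported) <= self.threshold:
            return None          # case (a): change too small to report
        if not self.alive:
            return None          # case (b): change is large but report is lost
        self.last_reported = value
        return value


class SumAggregator:
    """Parent that sums the most recent report received from each leaf."""

    def __init__(self, leaves: List[FilteredLeaf]) -> None:
        self.leaves = leaves
        self.cached = [leaf.last_reported for leaf in leaves]

    def step(self, inputs: List[float]) -> float:
        for i, (leaf, value) in enumerate(zip(self.leaves, inputs)):
            report = leaf.observe(value)
            if report is not None:
                self.cached[i] = report
        return sum(self.cached)


a, b = FilteredLeaf(threshold=1.0), FilteredLeaf(threshold=1.0)
agg = SumAggregator([a, b])
print(agg.step([0.5, 0.5]))    # 0.0; true sum 1.0, error within 2 * threshold
b.alive = False
print(agg.step([0.5, 100.0]))  # 0.0 again; true sum 100.5, error unbounded
```

From the parent's point of view both steps look identical (no updates arrive), yet the first error is bounded by the filter while the second is not.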
Applications running in large-scale distributed systems face many challenges. Constraints imposed on such systems need to be thoroughly checked to ensure proper service delivery to the client. This paper proposes a monitoring solution for large-scale distributed systems based on abstract state machines. Data gathered from the monitoring components are used to calculate metrics and establish a diagnosis of the system. Emphasis is placed on failure detection and on ensuring non-functional requirements of the system such as fault tolerance and resilience. The model introduced in this paper will be integrated into a cloud-enabled large-scale distributed system. The novelty of the solution lies in finding the best integration architecture for state-of-the-art algorithms and tools and refining them into an efficient version for large-scale distributed systems.
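Failure detection of this kind is often expressed as a per-node state machine driven by heartbeats. The sketch below uses a simple finite state machine with timeouts rather than the abstract state machine formalism the paper relies on; the states, the `HeartbeatMonitor` class and its two timeout parameters are illustrative assumptions.

```python
from enum import Enum
from typing import Dict


class NodeState(Enum):
    ALIVE = "alive"
    SUSPECTED = "suspected"
    FAILED = "failed"


class HeartbeatMonitor:
    """Derives each node's state from the time since its last heartbeat."""

    def __init__(self, suspect_after: float, fail_after: float) -> None:
        if not 0 < suspect_after < fail_after:
            raise ValueError("require 0 < suspect_after < fail_after")
        self.suspect_after = suspect_after
        self.fail_after = fail_after
        self.last_seen: Dict[str, float] = {}
        self.state: Dict[str, NodeState] = {}

    def heartbeat(self, node: str, now: float) -> None:
        # Any heartbeat moves the node back to ALIVE.
        self.last_seen[node] = now
        self.state[node] = NodeState.ALIVE

    def tick(self, now: float) -> None:
        # Re-evaluate every known node against the two silence thresholds.
        for node, seen in self.last_seen.items():
            silence = now - seen
            if silence >= self.fail_after:
                self.state[node] = NodeState.FAILED
            elif silence >= self.suspect_after:
                self.state[node] = NodeState.SUSPECTED
            else:
                self.state[node] = NodeState.ALIVE
```

Separating SUSPECTED from FAILED is a common design choice: a diagnosis layer can react to slow nodes before declaring them dead.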
[1992 Proceedings] Second Workshop on the Management of Replicated Data
Modeling the reliability of distributed systems requires a good understanding of the reliability of the components. Careful modeling allows highly fault-tolerant distributed applications to be constructed at the least cost. Realistic estimates can be found by measuring the performance of actual systems. An enormous amount of information about system performance can be acquired with no special privileges via the Internet. A distributed monitoring tool called a tattler is described. The system is composed of a group of tattler processes that monitor a set of selected hosts. The tattlers cooperate to provide a fault-tolerant distributed data base of information about the hosts they monitor. They use weak-consistency replication techniques to ensure their own fault-tolerance and the eventual consistency of the data base that they maintain.
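Weak-consistency replication with eventual consistency can be sketched as pairwise anti-entropy sessions over timestamped records. This is an illustrative sketch only: the last-writer-wins merge rule, the record layout and the `Tattler`/`anti_entropy` names are assumptions, not details taken from the tattler paper.

```python
from typing import Dict, Tuple

# host name -> (observation timestamp, observed status)
Record = Tuple[float, str]


class Tattler:
    """One replica of the host information database."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.db: Dict[str, Record] = {}

    def merge_entry(self, host: str, record: Record) -> None:
        # Last writer wins; comparing whole tuples breaks timestamp ties
        # deterministically, so every replica keeps the same record.
        current = self.db.get(host)
        if current is None or record > current:
            self.db[host] = record

    def observe(self, host: str, timestamp: float, status: str) -> None:
        self.merge_entry(host, (timestamp, status))


def anti_entropy(a: Tattler, b: Tattler) -> None:
    """Pairwise session: afterwards both replicas hold, for every host,
    the newest record that either of them knew."""
    for host, record in list(a.db.items()):
        b.merge_entry(host, record)
    for host, record in list(b.db.items()):
        a.merge_entry(host, record)


t1, t2, t3 = Tattler("t1"), Tattler("t2"), Tattler("t3")
t1.observe("alpha", 1.0, "up")
t2.observe("beta", 1.5, "up")
t3.observe("alpha", 2.0, "down")
anti_entropy(t1, t2)
anti_entropy(t2, t3)
anti_entropy(t1, t2)
```

No replica has to be reachable at all times: updates spread through whichever sessions occur, and the replicas converge once every update has passed along some chain of sessions.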
1998
Workstation clusters have lately become a cost-effective solution for high-performance computing. C-DAC's PARAM OpenFrame is a large cluster of high-performance workstations interconnected through low-latency, high-bandwidth communication networks. Monitoring such huge systems is a tedious and challenging task, since typical workstations are designed to work as standalone systems rather than as part of a workstation cluster. System administrators require tools to monitor such huge systems effectively. PARMON provides a solution to this challenging problem. PARMON is a portable, flexible, interactive, scalable, location-transparent, and comprehensive environment for monitoring large clusters. It follows a client-server methodology and provides transparent access to all monitored nodes from a monitoring machine. PARMON allows monitoring of critical system resource activities and their utilization at three different levels: entire system, node, and component. It allo...
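Reporting at system, node and component level amounts to rolling component samples up a two-step hierarchy. A minimal sketch, assuming homogeneous components (per-processor CPU utilisation in [0, 1]) and unweighted means at each level; the abstract does not describe PARMON's actual aggregation rules, and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict

# Component level: node name -> processor id -> CPU utilisation in [0, 1].
Samples = Dict[str, Dict[str, float]]


def node_level(samples: Samples) -> Dict[str, float]:
    """Average utilisation of each node across its processors."""
    return {node: mean(cpus.values()) for node, cpus in samples.items()}


def system_level(samples: Samples) -> float:
    """Average utilisation of the whole cluster across nodes."""
    return mean(node_level(samples).values())


samples = {
    "node1": {"cpu0": 0.5, "cpu1": 1.0},
    "node2": {"cpu0": 0.25, "cpu1": 0.25},
}
print(node_level(samples))    # {'node1': 0.75, 'node2': 0.25}
print(system_level(samples))  # 0.5
```

With unweighted node means, a node with many processors counts the same as a small one; weighting by processor count is an equally plausible alternative.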
2002
Developers and users of high-performance distributed systems often observe performance problems such as unexpectedly low throughput or high latency. Determining the source of these problems requires detailed end-to-end instrumentation of all components, including the applications, operating systems, hosts, and networks. However, the instrumentation must be carefully designed to have extremely low overhead so that it does not perturb the system being monitored. In this paper we present a very lightweight instrumentation system that can be dynamically activated to unobtrusively collect and aggregate detailed end-to-end monitoring information from distributed applications. We also show how emerging "web services" can be used to facilitate remote interaction with this system.
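Dynamic activation with low overhead can be sketched as a probe whose disabled path costs a single flag check and whose enabled path aggregates counts and total elapsed time per event instead of logging every call. This is a sketch under those assumptions; the `Instrumentation` class and its `probe` decorator are hypothetical and not the paper's interface.

```python
import functools
import time
from collections import defaultdict
from typing import Any, Callable, DefaultDict


class Instrumentation:
    """Switchable timing probes that aggregate per-event statistics."""

    def __init__(self) -> None:
        self.enabled = False
        self.count: DefaultDict[str, int] = defaultdict(int)
        self.total_seconds: DefaultDict[str, float] = defaultdict(float)

    def probe(self, name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            @functools.wraps(fn)
            def wrapper(*args: Any, **kwargs: Any) -> Any:
                if not self.enabled:
                    return fn(*args, **kwargs)  # disabled: no timing work
                start = time.perf_counter()
                try:
                    return fn(*args, **kwargs)
                finally:
                    # Aggregate instead of emitting one record per call.
                    self.count[name] += 1
                    self.total_seconds[name] += time.perf_counter() - start
            return wrapper
        return decorator


inst = Instrumentation()


@inst.probe("parse")
def parse(payload: str) -> int:
    return len(payload.split(","))


parse("a,b")           # not measured: probes are off
inst.enabled = True    # activated at run time, no restart needed
parse("a,b,c")
parse("d")
print(inst.count["parse"])  # 2
```

Keeping only aggregates on the hot path is one way to bound overhead; a remote interface could then read `count` and `total_seconds` periodically.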