Several 64-processor XMT systems have now been shipped to customers, and 128-processor, 256-processor, and 512-processor systems have been tested in Cray's development lab. We describe some techniques we have used for tuning performance, in hopes that applications will continue to scale on these larger systems. We discuss how the programmer must work with the XMT compiler to extract maximum parallelism and performance, especially from multiply nested loops, and how the performance tools provide vital information about whether or how the compiler has parallelized loops and where performance bottlenecks may be occurring. We also show data indicating that the maximum performance of a given application on a given-size XMT system is limited by memory or network bandwidth, in a way that is somewhat independent of the number of processors used.
Real Virtual Environment Applications - Now. Paul T. Breen, Jr., Chair; Department Head, Applied Technology, The MITRE Corporation, Bedford, MA. Georges G. Grinstein, Ph.D., Principal Engineer, Continental C3, The ...
This paper is a preliminary report on a set of experiments designed to compare an immersive, head-tracked VR system to a typical graphics workstation display screen, with respect to whether VR makes it easier for a user to comprehend complex, 3-D objects. Experimental subjects were asked to build a physical replica of a three-dimensional "wire sculpture" which they viewed either physically, on a workstation screen, or in a stereoscopic "boom" VR display. Preliminary results show less speed but slightly fewer errors with the VR display. The slower speed is probably explainable by the overhead involved in moving to and grasping the boom display.
For the past three years, we have been developing augmented reality technology for application to a variety of touch labor tasks in aircraft manufacturing and assembly. The system would be worn by factory workers to provide them with better-quality information for performing their tasks than was previously available. Using a see-through head-mounted display (HMD) whose optics are set at a focal length of about 18 in., the display and its associated head tracking system can be used to superimpose and stabilize graphics on the surface of a workpiece. This technology would obviate many expensive marking systems now used in aerospace manufacturing. The most challenging technical issue with respect to factory applications of AR is head position and orientation tracking. It requires high-accuracy, long-range tracking in a high-noise environment. The approach we have chosen uses a head-mounted miniature video camera. The user's wearable computer system utilizes the camera to find fiducial markings that have been placed on known coordinates on or near the workpiece. The system then computes the user's position and orientation relative to the fiducial marks. It is referred to as a 'videometric' head tracker. In this paper, we describe the steps we took and the results we obtained in the process of prototyping our videometric head tracker, beginning with analytical and simulation results, and continuing through the working prototypes.
Summary form only given. The author is working on two research projects in Boeing Computer Services that have to do with virtual reality technology. The first involves importing aircraft CAD data into a VR environment. Applications include a wide range of engineering and design activities, all of which involve being able to view and interact with the CAD geometry as if one were inside an actual physical mockup of the aircraft. He refers to the technology being explored in the second project as "Augmented Reality". This entails the use of a see-through head-mounted display with an optical focal length of about 20 inches, along with a VR-style position/orientation sensing system. The intended application area is in touch labor manufacturing: superimposing diagrams or text onto the surface of a workpiece and stabilizing them there at specific coordinates, so that the appropriate information needed by a factory worker for each step of a manufacturing or assembly operation appears on the surface of the workpiece as if it were painted there. The hardest technical problem for augmented reality is position tracking. Long-range head position/orientation sensing systems that can operate in factory environments are needed. This requirement and others give rise to some interesting computational problems, including wearer registration and position sensing using image processing.
Much of the early domain-specific success with graph analytics has been with algorithms whose results are based on global graph structure. An example of such an algorithm is betweenness centrality, whose value for any vertex potentially depends on the number of shortest paths between all pairs of vertices in the entire graph. YarcData's Urika™ customers use SPARQL's graph-oriented pattern-matching capabilities, but many of them also require a capability to call graph functions such as betweenness centrality. This customer feedback led us to combine SPARQL 1.1's query capabilities with classical and emerging graph-analytic algorithms (e.g., community detection, shortest path, betweenness, BadRank). With this capability, a SPARQL query can select a specific subgraph of interest, pass that subgraph to a graph algorithm for deep analysis, and then pass those results back to an enclosing SPARQL query that post-processes them as needed. With the Summer 2014 Urika release, we have extended the SPARQL implementation with a graph-function capability and a small set of built-in graph functions. We describe our design approach and our experiences with this first release, including anecdotal evidence of dramatically higher performance. Built-in graph functions represent an important step in the maturation of graph analysis and SPARQL. As common motifs emerge from use cases, those motifs may be mapped to specific graph functions that can be highly tuned for much higher performance than will be possible for SPARQL. Identifying those motifs and developing the underlying graph functions to accelerate their execution is a topic of intense effort industry-wide. Graph functions merged with SPARQL provide a new mechanism by which third-party graph-algorithm developers may expose their algorithms to widespread use.
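The select / analyze / post-process pattern the abstract describes can be sketched in ordinary Python. In this illustrative sketch (none of the names below are part of the Urika API), the edge list stands in for a subgraph that a SPARQL WHERE clause has selected, and `shortest_path` stands in for a built-in graph function invoked on it:

```python
from collections import deque

def shortest_path(edges, source, target):
    """Unweighted shortest path over an edge list, via BFS.
    Stands in for a built-in graph function that an enclosing
    SPARQL query could call on a selected subgraph."""
    adj = {}
    for s, o in edges:
        adj.setdefault(s, []).append(o)
    prev = {source: None}          # also serves as the visited set
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:            # walk back through predecessors
            path = []
            while v is not None:
                path.append(v)
                v = prev[v]
            return path[::-1]
        for w in adj.get(v, []):
            if w not in prev:
                prev[w] = v
                queue.append(w)
    return None                    # target unreachable

# The "subgraph of interest" a query might have selected:
subgraph = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(shortest_path(subgraph, "a", "d"))  # → ['a', 'c', 'd']
```

The result (a path) would then flow back to the enclosing query for post-processing, as the abstract describes.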
We describe the design and prototyping steps we have taken toward the implementation of a heads-up, see-through, head-mounted display (HUDset). Combined with head position sensing and a real-world registration system, this technology allows a computer-produced diagram to be superimposed and stabilized on a specific position on a real-world object. Successful development of the HUDset technology will enable cost reductions and efficiency improvements in many of the human-involved operations in aircraft manufacturing, by eliminating templates, formboard diagrams, and other masking devices.
Several wearable computing or ubiquitous computing research projects have detected and distinguished user motion activities by attaching accelerometers in known positions and orientations on the user's body. This paper observes that the orientation constraint can probably be relaxed. An estimate of the constant gravity vector can be obtained by averaging accelerometer samples. This gravity vector estimate in turn enables estimation of the vertical component and the magnitude of the horizontal component of the user's motion, independently of how the three-axis accelerometer system is oriented.
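The decomposition described above is short enough to sketch directly. A minimal version (sample values and function name are illustrative, not from the paper): average the samples to estimate gravity, project each sample onto the gravity direction to get the signed vertical component, and take the magnitude of what remains as the horizontal component.

```python
import math

def decompose_motion(samples):
    """Estimate gravity by averaging three-axis accelerometer samples,
    then split each sample into a vertical component (its projection
    onto the gravity estimate) and the magnitude of the horizontal
    remainder. Works regardless of sensor orientation on the body."""
    n = len(samples)
    g = [sum(s[i] for s in samples) / n for i in range(3)]  # gravity estimate
    g_mag = math.sqrt(sum(c * c for c in g))
    g_unit = [c / g_mag for c in g]
    out = []
    for s in samples:
        vertical = sum(s[i] * g_unit[i] for i in range(3))  # signed, along gravity
        horiz = [s[i] - vertical * g_unit[i] for i in range(3)]
        out.append((vertical, math.sqrt(sum(c * c for c in horiz))))
    return out

# A sensor tilted arbitrarily: gravity shows up on all three axes,
# yet at rest the vertical component recovers |g| and horizontal ≈ 0.
samples = [(5.66, 5.66, 5.66), (5.66, 5.66, 5.66)]
for v, h in decompose_motion(samples):
    print(round(v, 2), round(h, 2))  # vertical ≈ 9.8, horizontal ≈ 0.0
```

Note the vertical component is recovered without knowing which sensor axis points "down" — exactly the orientation independence the paper claims.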
OSTI OAI (U.S. Department of Energy Office of Scientific and Technical Information), May 1, 2011
To date, the application of high-performance computing resources to Semantic Web data has largely focused on commodity hardware and distributed memory platforms. In this paper we make the case that more specialized hardware can offer superior scaling and close to an order of magnitude improvement in performance. In particular we examine the Cray XMT. Its key characteristics, a large global shared memory and processors with a memory-latency-tolerant design, offer an environment conducive to programming for the Semantic Web and have engendered results that far surpass the current state of the art. We examine three fundamental pieces requisite for a fully functioning semantic database: dictionary encoding, RDFS inference, and query processing. We show scaling up to 512 processors (the largest configuration we had available), and the ability to process 20 billion triples completely in memory.
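Of the three pieces named above, dictionary encoding is the easiest to illustrate: each distinct RDF term is mapped to a compact integer ID so that triples can be stored and compared as integer tuples in memory. A minimal single-threaded sketch (the class and terms below are illustrative, not the paper's parallel implementation):

```python
class Dictionary:
    """Minimal RDF dictionary encoder: bidirectional mapping
    between terms (IRIs/literals) and dense integer IDs."""
    def __init__(self):
        self.term_to_id = {}
        self.id_to_term = []

    def encode(self, term):
        # Assign the next free ID on first sight; reuse it thereafter.
        if term not in self.term_to_id:
            self.term_to_id[term] = len(self.id_to_term)
            self.id_to_term.append(term)
        return self.term_to_id[term]

    def decode(self, tid):
        return self.id_to_term[tid]

d = Dictionary()
triple = ("<ex:alice>", "<foaf:knows>", "<ex:bob>")
encoded = tuple(d.encode(t) for t in triple)
print(encoded)               # → (0, 1, 2)
print(d.decode(encoded[2]))  # → <ex:bob>
```

At the 20-billion-triple scale the paper reports, the point of this step is that all downstream inference and query work operates on fixed-width integers rather than variable-length strings.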
The design of ISI's SDI architecture simulator was intended to minimize the software development necessary to add new simulation models to the system or to refine the detail of existing ones. The key software design approach used to accomplish this goal was the modeling of each simulated defense system component by a software object called a 'technology module.' Each technology module provided a carefully defined abstract interface between the component model and the rest of the simulation system, particularly the simulation models of battle managers. This report documents the first test of the validity of this software design approach. A new technology module modeling a 'kinetic kill vehicle' (KKV) was added to the simulator. Although this technology module had an impact on several parts of the simulation system in the form of new data structures and functions that had to be created, the integration of the new module was accomplished without the necessity of replacing any existing code.
We present an RDFS closure algorithm, specifically designed and implemented on the Cray XMT supercomputer, that obtains inference rates of 13 million inferences per second on the largest system configuration we used. The Cray XMT, with its large global memory (4 TB for our experiments), permits the construction of a conceptually straightforward algorithm, fundamentally a series of operations on a shared hash table. Each thread is given a partition of triple data to process, a dedicated copy of the ontology to apply to the data, and a reference to the hash table into which it inserts inferred triples. The global nature of the hash table allows the algorithm to avoid a common obstacle for distributed memory machines: the creation of duplicate triples. On LUBM data sets ranging between 1.3 billion and 5.3 billion triples, we obtain nearly linear speedup except for two portions: file I/O, which can be ameliorated with additional service nodes, and data structure initialization, which requires nearly constant time for runs involving 32 processors or more.
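The shared-hash-table idea can be sketched in a few lines. In the toy version below, a Python set plays the role of the global hash table — inserting an already-present triple is a no-op, so duplicates never accumulate — and only two RDFS rules (type propagation and subclass transitivity) are iterated to a fixed point. This is illustrative only; the paper's algorithm is multithreaded and covers the full RDFS rule set.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def rdfs_closure(triples):
    """Fixed-point sketch of two RDFS rules over a hash-based store."""
    store = set(triples)
    changed = True
    while changed:
        changed = False
        sub = {(s, o) for s, p, o in store if p == SUBCLASS}
        new = set()
        for s, p, o in store:
            if p == RDF_TYPE:
                for c1, c2 in sub:
                    if o == c1:
                        new.add((s, RDF_TYPE, c2))   # rdfs9: type propagation
            elif p == SUBCLASS:
                for c1, c2 in sub:
                    if o == c1:
                        new.add((s, SUBCLASS, c2))   # rdfs11: transitivity
        if not new <= store:       # set insertion dedups for free
            store |= new
            changed = True
    return store

data = {("ex:fido", RDF_TYPE, "ex:Dog"),
        ("ex:Dog", SUBCLASS, "ex:Mammal"),
        ("ex:Mammal", SUBCLASS, "ex:Animal")}
closed = rdfs_closure(data)
print(("ex:fido", RDF_TYPE, "ex:Animal") in closed)  # → True
```

On the XMT, many threads insert into one such table concurrently; the duplicate-suppression property that falls out of the shared structure is exactly what distributed-memory implementations struggle to replicate.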
Report documentation page (condensed): Security classification: Unclassified. Distribution/availability: approved for public release; distribution is unlimited. Performing organization report number: ISI/RR-89-225. Performing organization: USC/Information Sciences Institute, 4676 Admiralty Way, Marina del Rey, CA 90292-6695. Monitoring organization: Office of Naval Research, 800 N. Quincy Street, Arlington, VA 22217. Funding/sponsoring organization: SDIO. Procurement instrument identification number: N00014-87-K-0022.
The concept of a timed Markov model is introduced and proposed as a means of analyzing software development workflows. A simple model of a single researcher developing a new code for a high-performance computer system is proposed, and data from a classroom experiment in programming a high-performance system is fitted to the model.
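A timed Markov model of this kind can be sketched by Monte Carlo simulation. The states, transition probability, and dwell-time rates below are all invented for illustration (the paper fits its model to classroom data instead): a programmer codes, runs, and loops through debug/rerun with some failure probability, each state holding for an exponentially distributed time.

```python
import random

def simulate_workflow(p_bug=0.6, t_code=2.0, t_debug=1.0, t_run=0.5,
                      trials=10000, seed=1):
    """Monte Carlo estimate of mean time-to-completion for a toy
    timed Markov model: code -> run -> (debug -> run)* -> done.
    Each state's dwell time is exponential with the given mean;
    each run fails (returning to debug) with probability p_bug."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        t = rng.expovariate(1.0 / t_code)        # initial coding
        t += rng.expovariate(1.0 / t_run)        # first run
        while rng.random() < p_bug:              # run failed
            t += rng.expovariate(1.0 / t_debug)  # debug
            t += rng.expovariate(1.0 / t_run)    # rerun
        total += t
    return total / trials

# Analytic check: mean failures = p/(1-p) = 1.5, so the expected
# completion time is 2.0 + 0.5 + 1.5 * (1.0 + 0.5) = 4.75.
print(round(simulate_workflow(), 2))
```

Fitting such a model to observed data, as the paper does, amounts to choosing the rates and branch probabilities that best reproduce measured completion times.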
As semantic graph database technology grows to address components ranging from large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to understand their inherent semantic structure, whether codified in explicit ontologies or not. Our group is researching novel methods for what we call descriptive semantic analysis of RDF triplestores, to serve purposes of analysis, interpretation, visualization, and optimization. But data size and computational complexity make it increasingly necessary to bring high performance computational resources to bear on this task. Our research group built a high performance hybrid system comprising computational capability for semantic graph database processing utilizing the multi-threaded architecture of the Cray XMT platform, conventional servers, and large data stores. In this paper we describe that architecture and our methods, and present the results of our analyses of basic properties, connected components, namespace interaction, and typed paths of the Billion Triple Challenge 2010 dataset.
As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to bring high performance computational resources to bear on their analysis, interpretation, and visualization, especially with respect to their innate semantic structure. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multithreaded architecture of the Cray XMT platform, conventional clusters, and large data stores. In this paper we describe that architecture, and present the results of deploying it for the analysis of the Billion Triple dataset with respect to its semantic factors, including basic properties, connected components, namespace interaction, and typed paths.
Papers by David Mizell