Optimizing the design of water distribution systems often faces difficulties due to continuous variations in water demands, pressure requirements, and disinfectant concentrations. The complexity of this optimization increases further when both the hydraulic and the water quality design models are optimized. Most previous works in the literature did not investigate the linkage between the two models, either by combining them into one general model or by selecting a representative solution to carry from one model to the other. This work introduces an integrated two-step framework that optimizes both designs while investigating the selection of a reasonable network configuration from the hydraulic design view before proceeding to the water quality design. The framework is mainly based on a modified version of the multi-objective particle swarm optimization algorithm. The algorithm’s first step optimizes the hydraulic design of the network by minimizing the system’s capital cost whi...
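The abstract's core tool, multi-objective particle swarm optimization with an external archive of non-dominated solutions, can be sketched on a toy two-objective problem. This is a minimal illustration, not the authors' modified algorithm: the objectives, bounds, and coefficients below are all fabricated stand-ins for capital cost and a hydraulic penalty.

```python
import numpy as np

rng = np.random.default_rng(0)

def objectives(x):
    # Fabricated stand-ins: f1 ~ capital cost, f2 ~ hydraulic/pressure penalty.
    return np.array([x.sum(), ((x - 2.0) ** 2).sum()])

def dominates(a, b):
    # a Pareto-dominates b: no worse in all objectives, strictly better in one.
    return bool(np.all(a <= b) and np.any(a < b))

dim, n_particles, iters = 4, 20, 60
pos = rng.uniform(0.0, 4.0, (n_particles, dim))   # e.g. candidate pipe diameters
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_f = np.array([objectives(p) for p in pos])
archive = []  # external archive of non-dominated (position, objectives) pairs

for _ in range(iters):
    for i in range(n_particles):
        f = objectives(pos[i])
        if dominates(f, pbest_f[i]):
            pbest[i], pbest_f[i] = pos[i].copy(), f
        # keep the archive mutually non-dominated
        if not any(dominates(af, f) for _, af in archive):
            archive = [(p, af) for p, af in archive if not dominates(f, af)]
            archive.append((pos[i].copy(), f))
    for i in range(n_particles):
        # a random archive member serves as the global guide (one common MOPSO choice)
        leader = archive[rng.integers(len(archive))][0]
        r1, r2 = rng.random(dim), rng.random(dim)
        vel[i] = 0.5 * vel[i] + 1.5 * r1 * (pbest[i] - pos[i]) + 1.5 * r2 * (leader - pos[i])
        pos[i] = np.clip(pos[i] + vel[i], 0.0, 4.0)
```

After the loop, `archive` approximates the Pareto front between the two objectives; a two-step framework like the one described would then pick a reasonable configuration from this front before optimizing water quality.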
Background and objective: Selective electrical stimulation of target brain locations (stimulation focality) is a difficult problem because it comprises conflicting goals. The stimulating current density field needs to be strong enough to stimulate targeted locations but weak enough not to stimulate nearby non-targeted locations. The objective of this study is to suggest a methodology for improving electrical stimulation focality based on the time-division multiplexing principle. Proposed methodology: The complex problem of exciting a group of target locations is decomposed into a series of simpler problems in which a single location is targeted. Time-division multiplexing between the solutions of the simpler problems achieves seemingly parallel excitation of the selected target locations with minimal excitation of non-targeted locations. Results: A high-fidelity finite element-based simulation of a cortical vision prosthesis is used to demonstrate the proposed idea and highlight important facts about neuron dynamics that must be taken into consideration in order to design a successful time-division multiplexing based stimulation scheme. Conclusion: The study offers a clear, detailed procedure for designing focal electrical stimulation setups based on the time-division multiplexing principle. The included results and experiments show that the proposed strategy is a step towards more focal stimulation setups.
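The scheduling idea behind the methodology, cycling through single-target solutions in short time slots, can be sketched as follows. The electrode patterns, target names, and slot lengths are all hypothetical; the paper determines the per-target patterns from a finite element model, which is not reproduced here.

```python
# Each single-target solution is a current pattern over a set of electrodes.
# Time-division multiplexing cycles through these patterns in short time slots,
# so each slot excites one target while average off-target drive stays low.
single_target_patterns = {          # hypothetical per-target electrode currents (uA)
    "T1": [10.0, -10.0, 0.0],
    "T2": [0.0, 12.0, -12.0],
    "T3": [-8.0, 0.0, 8.0],
}

def tdm_schedule(patterns, slot_us, total_us):
    """Return a list of (start_us, pattern) slots cycling through the targets."""
    schedule, t, targets = [], 0, list(patterns)
    while t < total_us:
        target = targets[(t // slot_us) % len(targets)]
        schedule.append((t, patterns[target]))
        t += slot_us
    return schedule

slots = tdm_schedule(single_target_patterns, slot_us=100, total_us=900)
```

The slot duration would have to respect the neuron dynamics the abstract highlights: slots short enough that the interleaved targets integrate their stimuli as if driven in parallel, but long enough to actually excite each target.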
We introduce pyMune, an open-source Python library for robust clustering of complex real-world datasets without density cutoff parameters. It implements DenMune (Abbas et al., 2021), a mutual nearest neighbor algorithm that uses dimensionality reduction and approximate nearest neighbor search to identify and expand cluster cores. Noise is removed with a mutual nearest-neighbor voting system. In addition to clustering, pyMune provides classification, visualization, and validation functionalities. It is fully compatible with scikit-learn and has been accepted into the scikit-learn-contrib repository. The code, documentation, and demos are available on GitHub, PyPI, and CodeOcean for easy use and reproducibility.
2022 10th International Japan-Africa Conference on Electronics, Communications, and Computations (JAC-ECC)
Traffic fatalities are increasing in developing countries where there is little investment in road safety. Culture and road conditions also affect driving habits. Therefore, automatic detection and reporting of driver behavior to concerned entities can potentially save lives. In particular, we analyze a driving-maneuver dataset collected in one environment (country) but tested in another environment with aggressive driving habits and irregular road conditions. We also develop an on-edge system with fast response time to serve users at large scale. Specifically, we propose an approach for detecting aggressive and normal events using a random forest classifier. We utilize accelerometer and gyroscope readings from a smartphone to classify driving maneuver events into five types (aggressive acceleration, sudden braking, aggressive right turn, aggressive left turn, and normal). We achieved an accuracy of only 63.4% when training our model on an available dataset collected in a foreign environment and testing it in our environment; the lowest precision value was 54% and the lowest recall was 42%. However, we achieved an accuracy of 98.4% when augmenting the available dataset with data collected with our application; the lowest precision value was 98% and the lowest recall was 90%. The results show that available datasets do not generalize well to different driving habits and road conditions. Finally, an implementation of the random forest model using OpenCV on an Android platform is analyzed.
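The classification pipeline described, windowed inertial readings summarized into features and fed to a random forest, can be sketched with scikit-learn. The data below is fabricated (each class gets a different bias so the toy model has something to learn), and the statistical featurization is one common choice, not necessarily the paper's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

CLASSES = ["aggressive_acceleration", "sudden_braking",
           "aggressive_right_turn", "aggressive_left_turn", "normal"]

def featurize(window):
    # window: (T, 6) array of [ax, ay, az, gx, gy, gz] readings;
    # summarize each axis by mean, std, min, max.
    return np.concatenate([window.mean(0), window.std(0),
                           window.min(0), window.max(0)])

# Fabricated training windows: a per-class bias stands in for real maneuver signatures.
X, y = [], []
for label, bias in zip(CLASSES, [2.0, -2.0, 1.0, -1.0, 0.0]):
    for _ in range(40):
        w = rng.normal(bias, 1.0, size=(50, 6))
        X.append(featurize(w))
        y.append(label)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
pred = clf.predict([featurize(rng.normal(2.0, 1.0, size=(50, 6)))])
```

A trained forest like this is compact enough to export for on-device inference, which is the role OpenCV's ML module plays in the Android implementation the abstract mentions.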
Node embedding refers to learning or generating low-dimensional representations for nodes in a given graph. In the era of big data and large graphs, there has been a growing interest in node embedding across a wide range of applications, from social media to healthcare. Numerous research efforts have been invested in searching for node embeddings that maximally preserve the associated graph properties. However, each embedding technique has its own limitations. This paper presents a method for generating deep neural node embeddings that encode dissimilarity scores between pairs of nodes with the help of prototype nodes spread throughout the target graph. The proposed technique is adaptable to various notions of dissimilarity and yields efficient embeddings capable of estimating the dissimilarity between any two nodes in a graph. We compare our technique against relevant state-of-the-art embedding techniques. Superior results have been demonstrated in a number of experiments using several benchmark datasets. INDEX TERMS Deep neural networks, link prediction, node embedding, unsupervised node classification.
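The role of prototype nodes can be illustrated with a much simpler landmark embedding: each node is represented by its vector of shortest-path distances to a few prototypes, and distances in embedding space then approximate graph dissimilarity. This is only the underlying intuition, not the paper's learned deep model; the toy graph and prototype choice are arbitrary.

```python
from collections import deque

# Toy graph as adjacency lists; prototype nodes act as landmarks.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}
prototypes = [0, 5]

def bfs_distances(graph, source):
    # unweighted shortest-path distances from `source` to every node
    dist = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

dist_maps = [bfs_distances(graph, p) for p in prototypes]
embedding = {v: tuple(d[v] for d in dist_maps) for v in graph}

def estimated_dissimilarity(u, v):
    # Euclidean distance in embedding space approximates graph dissimilarity.
    return sum((a - b) ** 2 for a, b in zip(embedding[u], embedding[v])) ** 0.5
```

In the paper's setting, a neural network would replace the raw distance vectors, learning embeddings whose pairwise distances match an arbitrary dissimilarity notion rather than hop counts.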
In the microarray-based approach for automated cancer diagnosis, the application of the traditional k-nearest neighbors (kNN) algorithm suffers from several difficulties, such as the large number of genes (high dimensionality of the feature space) with many irrelevant genes (noise) relative to the small number of available samples, and the imbalance in the size of the samples of the target classes. This research provides an ensemble classifier based on decision models derived from kNN that is applicable to problems characterized by imbalanced, small-size datasets. The proposed classification method is an ensemble of the traditional kNN algorithm and four novel classification models derived from it. The proposed models exploit the increase in density and connectivity using a K1-nearest neighbors table (KNN-table) created during the training phase. In the density model, an unseen sample u is classified as belonging to a class t if it achieves the highest increase in density when added to that class, i.e., the unseen sample can replace more neighbors in the KNN-table for samples of class t than for other classes. In the other three connectivity models, the mean and standard deviation of the distribution of the average, minimum, as well as the maximum distance to the K neighbors of the members of each class are computed in the training phase. The class t for which u achieves the highest likelihood of belonging to its distribution is chosen, i.e., the addition of u to the samples of this class produces the least change to the distribution of the corresponding decision model for class t. Combining the predicted results of the four individual models along with traditional kNN makes the decision space more discriminative.
With the help of the KNN-table, which can be updated online in the training phase, improved performance has been achieved compared to the traditional kNN algorithm, with a slight increase in classification time. The proposed ensemble method achieves a significant increase in accuracy compared to the accuracy achieved using any of its base classifiers on the Kentridge, GDS3257, Notterman, Leukemia and CNS datasets. The method is also compared to several existing ensemble methods and state-of-the-art techniques using different dimensionality reduction techniques on several standard datasets. The results show clear superiority of the proposed ensemble (EKNN) over several individual and ensemble classifiers regardless of the choice of the gene selection strategy.
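The density model's voting rule, an unseen sample belongs to the class whose members would most often admit it into their neighbor tables, can be sketched in a few lines. The 2-D Gaussian data below is a fabricated stand-in for gene-expression profiles, and this reproduces only the density model, not the full five-model ensemble.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 3

# Two toy classes in 2-D standing in for gene-expression feature vectors.
X0 = rng.normal(0.0, 0.5, (15, 2))
X1 = rng.normal(3.0, 0.5, (15, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 15 + [1] * 15)

# KNN-table built at training time: each sample's distance to its K-th nearest neighbor.
d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
np.fill_diagonal(d, np.inf)
kth_dist = np.sort(d, axis=1)[:, K - 1]

def density_vote(u):
    """Class whose members would most often admit u into their K-NN tables,
    i.e. for whom u is closer than their current K-th neighbor."""
    du = np.linalg.norm(X - u, axis=1)
    votes = [(du[y == c] < kth_dist[y == c]).sum() for c in (0, 1)]
    return int(np.argmax(votes))

pred = density_vote(np.array([0.1, -0.2]))
```

Because `kth_dist` is precomputed per training sample, classifying an unseen sample needs only one distance pass, which matches the abstract's claim of only a slight increase in classification time.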
We develop a galaxy cluster finding algorithm based on the spectral clustering technique to identify optical counterparts and estimate optical redshifts for X-ray selected cluster candidates. As an application, we run our algorithm on a sample of X-ray cluster candidates selected from the third XMM-Newton serendipitous source catalog (3XMM-DR5) that are located in Stripe 82 of the Sloan Digital Sky Survey (SDSS). Our method works on galaxies described in the color-magnitude feature space. We begin by examining 45 galaxy clusters with published spectroscopic redshifts in the range of 0.1 to 0.8 with a median of 0.36. As a result, we are able to identify their optical counterparts and estimate their photometric redshifts, which have a typical accuracy of 0.025 and agree with the published ones. Then, we investigate another 40 X-ray cluster candidates (from the same cluster survey) with no redshift information in the literature and find that 12 candidates are considered galaxy clus...
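The core operation, spectral clustering of galaxies in color-magnitude space, can be sketched with scikit-learn on fabricated data: a tight red-sequence-like overdensity plus scattered field galaxies. The feature values, sample sizes, and affinity settings below are illustrative only.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(7)

# Fabricated color-magnitude features: a tight cluster overdensity plus field galaxies.
cluster = rng.normal([1.2, 18.0], [0.05, 0.5], (60, 2))   # similar colors and magnitudes
field = rng.uniform([0.0, 15.0], [2.5, 22.0], (60, 2))    # scattered field galaxies
X = np.vstack([cluster, field])

labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                            n_neighbors=10, random_state=0).fit_predict(X)
```

The nearest-neighbor affinity graph lets spectral clustering separate the dense red-sequence clump from the sparse field, after which the cluster members' photometry can be used to estimate a photometric redshift.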
2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI)
Speed estimation is an open research area because of its importance in many applications and the necessity of replacing GPS due to smartphone battery drainage. Relying on integrating accelerometer values is challenging and generally requires continual use of external references to correct the speed, because of error accumulation. Highway speed estimation, in particular, is a difficult problem because vehicles maintain high speeds for long periods and reference points like uneven road surfaces, turns, and stops are scarce. In this paper, we investigate exploiting micro road-surface unevenness that produces vibrations in the accelerometer readings, without integrating the acceleration signal. In particular, we employ deep 1D convolutional neural networks to extract robust features that capture the relation between such complex vibrations and speed. We also investigate the use of bidirectional LSTMs to benefit from both forward and backward dependencies in the sensed data and to allow a form of integration. Specifically, two highway speed estimation models are proposed: the first uses a deep convolutional neural network with 5 layers, and the second uses a deep bidirectional LSTM neural network. The inputs to both networks are the readings from the accelerometer and gyroscope sensors of a smartphone. The methods achieved mean absolute errors of 5.53 km/hr and 3.71 km/hr, respectively, whereas a related LSTM-based method resulted in a high error of 68.05 km/hr. Finally, an implementation of the proposed CNN model on Android and iOS smartphones is described and analyzed.
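The first stage of such a model, 1D convolution over raw inertial signals followed by a nonlinearity and pooling, can be shown in plain NumPy. The kernels here are random (untrained) and the vibration signal is fabricated; a real network would learn the kernels end to end and stack several such layers.

```python
import numpy as np

rng = np.random.default_rng(3)

def conv1d(signal, kernel):
    # valid-mode 1-D convolution (cross-correlation form, as in CNN layers)
    n, k = len(signal), len(kernel)
    return np.array([signal[i:i + k] @ kernel for i in range(n - k + 1)])

def feature_map(window, kernels):
    # window: (T,) accelerometer signal; one ReLU + global-average-pooled
    # feature per kernel, mimicking a conv layer followed by pooling.
    return np.array([np.maximum(conv1d(window, k), 0).mean() for k in kernels])

window = rng.normal(0, 1, 200)            # fabricated road-vibration signal
kernels = [rng.normal(0, 1, 9) for _ in range(4)]
features = feature_map(window, kernels)    # (4,) vector fed to later layers
```

Each kernel acts as a matched filter for a particular vibration pattern, which is how the network can relate micro road-surface unevenness to vehicle speed without ever integrating the acceleration signal.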
International Journal of Mining, Reclamation and Environment
The decision to exploit a mineral deposit that can be extracted by open pit and/or underground mining involves consideration of the following options: (a) independent open pit extraction; (b) independent underground extraction; (c) simultaneous open pit and underground (OPUG) extraction; (d) sequential OPUG extraction; and (e) combinations of (c) and (d). This paper investigates the extraction strategy for deposits using a Mixed Integer Linear Programming (MILP) optimisation framework to maximise the net present value and determine the schedules for mining, processing, underground capital and operational developments, and the 3D crown pillar position. The MILP framework is implemented for a gold deposit. The results showed that a combined sequential and simultaneous OPUG mining strategy with a crown pillar is the optimal extraction option, generating an NPV that is 11% and 13% better than independent OP or UG mining, respectively.
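The NPV comparison driving the decision among options (a) to (e) can be illustrated with a toy enumeration. The per-period cash flows and discount rate below are entirely fabricated; a real MILP optimises the mining and processing schedules jointly rather than comparing fixed cash-flow streams.

```python
# Toy illustration: enumerate candidate extraction strategies with fabricated
# yearly cash flows and pick the NPV-maximising one.
def npv(cash_flows, rate=0.10):
    # discounted sum; cash flow t arrives at the end of year t
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))

options = {                       # hypothetical cash flows ($M) per year
    "open_pit_only":     [40, 40, 30, 10, 0, 0],
    "underground_only":  [10, 25, 30, 30, 25, 20],
    "simultaneous_opug": [45, 50, 40, 25, 15, 10],
    "sequential_opug":   [40, 40, 30, 25, 25, 20],
}
best = max(options, key=lambda k: npv(options[k]))
```

In the paper's framework, the binary scheduling variables effectively make this choice implicitly: the optimiser is free to mix sequential and simultaneous OPUG extraction wherever that raises the discounted value.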
Background: MicroRNAs (miRNAs) are endogenous ∼22 nt RNAs that have been identified in many species as powerful regulators of gene expression. Experimental identification of miRNAs is still slow since miRNAs are difficult to isolate by cloning due to their low expression, low stability, tissue specificity and the high cost of the cloning procedure. Thus, computational identification of miRNAs from genomic sequences provides a valuable complement to cloning. Different approaches for identification of miRNAs have been proposed based on homology, thermodynamic parameters, and cross-species comparisons. Results: The present paper focuses on the integration of miRNA classifiers in a meta-classifier and the identification of miRNAs from metagenomic sequences collected from different environments. An ensemble of classifiers is proposed for miRNA hairpin prediction based on four well-known classifiers (Triplet-SVM, MiPred, Virgo and EumiR), with non-identical features, which have been trained on different data. Their decisions are combined using a single hidden layer neural network to increase the accuracy of the predictions. Our ensemble classifier achieved 89.3% accuracy, 82.2% F-measure, 74% sensitivity, 97% specificity, 92.5% precision and 88.2% negative predictive value when tested on real miRNA and pseudo-sequence data. The area under the receiver operating characteristic curve of our classifier is 0.9, which represents a high performance index. The proposed classifier yields a significant performance improvement relative to Triplet-SVM, Virgo and EumiR and a minor refinement over MiPred. The developed ensemble classifier is used for miRNA prediction in mine drainage, groundwater and marine metagenomic sequences downloaded from the NCBI Sequence Read Archive. By consulting the miRBase repository, 179 miRNAs have been identified as highly probable miRNAs.
Our new approach could thus be used for mining metagenomic sequences and finding new and homologous miRNAs. Conclusions: The paper investigates a computational tool for miRNA prediction in genomic or metagenomic data. It has been applied to three metagenomic samples from different environments (mine drainage, groundwater and marine metagenomic sequences). The prediction results provide a set of highly promising miRNA hairpins as candidates for cloning and prediction methods. Among the ensemble's predictions are pre-miRNA candidates that were validated using miRBase but not recognized by some of the base classifiers.
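The meta-classifier design, a single-hidden-layer network stacked on top of four base classifiers, can be sketched with scikit-learn. The base-classifier scores below are fabricated noisy views of the true label, standing in for Triplet-SVM, MiPred, Virgo and EumiR outputs, which are not reimplemented here.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(5)

# Stacking sketch: each example is the vector of four base-classifier scores,
# and a single-hidden-layer network learns how to combine them.
n = 400
y = rng.integers(0, 2, n)                   # 1 = real pre-miRNA, 0 = pseudo hairpin
base_scores = np.column_stack([
    np.clip(y + rng.normal(0, s, n), 0, 1)  # four noisy views of the truth
    for s in (0.3, 0.4, 0.5, 0.6)
])

meta = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
meta.fit(base_scores[:300], y[:300])
accuracy = meta.score(base_scores[300:], y[300:])
```

The combiner can learn that some base classifiers are more reliable than others (here, the less noisy columns), which is how the ensemble can outperform each of its members, including cases one base classifier misses.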
Finding optimal phase durations for a controlled intersection is a computationally intensive task requiring O(N³) operations. In this paper we introduce a cost-optimal parallelization of a dynamic programming algorithm that reduces the complexity to O(N²). Three implementations that span a wide range of parallel hardware are developed. The first is based on shared-memory architecture, using the OpenMP programming model. The second implementation is based on message passing, targeting massively parallel machines including high-performance clusters and supercomputers. The third implementation is based on the data-parallel programming model mapped onto Graphics Processing Units (GPUs). Key optimizations include loop reversal, communication pruning, load balancing, and efficient thread-to-processor assignment. Experiments have been conducted on an 8-core server, IBM BlueGene/L supercomputer 2-node boards with 128 processors, and an Nvidia GeForce GTX 470 GPU with 448 cores. Results indicate practical scalability on all platforms, with maximum speedup reaching 76x for the GTX 470.
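The structural fact all three implementations exploit is that the DP proceeds stage by stage, with every cell of a stage depending only on the previous stage, so cells within a stage can be computed in parallel. Below is a Python thread-pool sketch of that pattern (not the paper's OpenMP/MPI/GPU code), with a hypothetical transition cost standing in for the signal-timing value function.

```python
from concurrent.futures import ThreadPoolExecutor

N = 64                                    # discretised time steps per stage
cost = lambda j, k: (k - j) ** 2 % 7      # hypothetical stage transition cost

def dp_serial(stages):
    prev = [0] * (N + 1)
    for _ in range(stages):
        prev = [min(prev[j] + cost(j, k) for j in range(N + 1))
                for k in range(N + 1)]
    return prev

def dp_parallel(stages, workers=4):
    prev = [0] * (N + 1)
    with ThreadPoolExecutor(max_workers=workers) as ex:
        for _ in range(stages):
            # each cell of the new stage is an independent min over the previous stage
            new = list(ex.map(
                lambda k: min(prev[j] + cost(j, k) for j in range(N + 1)),
                range(N + 1)))
            prev = new
    return prev

par = dp_parallel(3)
```

With P processors the O(N²) work per stage drops to O(N²/P) time, which is the cost-optimality argument; the OpenMP, MPI and CUDA versions differ mainly in how the per-stage cells are mapped onto threads, ranks or GPU cores.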
IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 1996
A new formulation of the Maximal Common Subgraph Problem (MCSP), implemented using a two-stage Hopfield neural network, is presented. The relative merits of the proposed formulation, with respect to current neural network-based solutions as well as classical sequential-search-based solutions, are discussed.
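A standard way to see the problem the network solves: the maximal common (induced) subgraph of G1 and G2 corresponds to a maximum clique in their association graph, whose vertices are node pairs (u, v) and whose edges connect consistent pairs. The sketch below uses brute-force clique search on tiny fabricated graphs to illustrate that reduction; the paper's contribution is solving it with a two-stage Hopfield network instead.

```python
from itertools import combinations

G1 = {(0, 1), (1, 2), (0, 2)}          # triangle
G2 = {(0, 1), (1, 2), (2, 3)}          # path
V1, V2 = {0, 1, 2}, {0, 1, 2, 3}

def adj(E, a, b):
    return (a, b) in E or (b, a) in E

# Association-graph vertices: all node pairs (u in G1, v in G2).
pairs = [(u, v) for u in V1 for v in V2]

def compatible(p, q):
    # two assignments are consistent iff they map distinct nodes and
    # preserve adjacency/non-adjacency (induced-subgraph matching)
    (u1, v1), (u2, v2) = p, q
    return u1 != u2 and v1 != v2 and adj(G1, u1, u2) == adj(G2, v1, v2)

best = []
for r in range(len(pairs), 0, -1):     # largest clique first
    for subset in combinations(pairs, r):
        if all(compatible(p, q) for p, q in combinations(subset, 2)):
            best = list(subset)
            break
    if best:
        break
```

Here the triangle and the path share at most a single edge as an induced common subgraph, so the maximum clique has size 2. A Hopfield formulation encodes the same constraints in an energy function and lets the network's dynamics settle into such a clique.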
A novel clustering algorithm, CSHARP, is presented for the purpose of finding clusters of arbitrary shapes and arbitrary densities in high-dimensional feature spaces. It can be considered a variation of the Shared Nearest Neighbor algorithm (SNN), in which each sample data point votes for the points in its k-nearest neighborhood. Sets of points sharing a common mutual nearest neighbor are considered dense regions (blocks). These blocks are the seeds from which clusters may grow. Therefore, CSHARP is not a point-to-point clustering algorithm; rather, it is a block-to-block clustering technique. Much of its advantage comes from two facts: noise points and outliers correspond to blocks of small sizes, and homogeneous blocks highly overlap. The proposed technique is less likely to merge clusters of different densities or different homogeneity. The algorithm has been applied to a variety of low- and high-dimensional data sets with superior results over existing techniques such as ...
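The block idea can be sketched directly: points vote for their k nearest neighbors, mutual pairs (each point in the other's k-NN list) seed dense blocks, and tiny blocks behave like noise. The 2-D data below is fabricated, and this shows only block formation, not CSHARP's full block-to-block merging.

```python
import numpy as np

rng = np.random.default_rng(11)

X = np.vstack([rng.normal(0, 0.3, (20, 2)),      # dense region A
               rng.normal(5, 0.3, (20, 2)),      # dense region B
               rng.uniform(-3, 8, (4, 2))])      # scattered outliers
k = 5
d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
np.fill_diagonal(d, np.inf)
knn = np.argsort(d, axis=1)[:, :k]               # each point's k-NN votes

# mutual pairs: i votes for j AND j votes for i
mutual = {(i, j) for i in range(len(X)) for j in knn[i] if i in knn[j] and i < j}

# Union-find grouping of mutual pairs into blocks.
parent = list(range(len(X)))
def find(a):
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a
for i, j in mutual:
    parent[find(i)] = find(j)

blocks = {}
for i in range(len(X)):
    blocks.setdefault(find(i), []).append(i)
big_blocks = [b for b in blocks.values() if len(b) >= 3]   # small blocks ~ noise
```

The two dense regions yield large blocks while the scattered outliers end up in singleton or tiny blocks, which is exactly the property CSHARP exploits to separate noise from cluster seeds.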
Optimizing the design of water distribution systems often faces difficulties due to continuous va... more Optimizing the design of water distribution systems often faces difficulties due to continuous variations in water demands, pressure requirements, and disinfectant concentrations. The complexity of this optimization even increases when trying to optimize both the hydraulic and the water quality design models. Most of the previous works in the literature did not investigate the linkage between both models, either by combining them into one general model or by selecting any representative solution to proceed from one model to another. This work introduces an integrated two-step framework to optimize both designs while investigating the reasonable network configuration selection from the hydraulic design view before proceeding to the water quality design. The framework is mainly based on a modified version of the multi-objective particle swarm optimization algorithm. The algorithm’s first step is optimizing the hydraulic design of the network by minimizing the system’s capital cost whi...
Background and objective: Selective electrical stimulation of target brain locations (stimulation... more Background and objective: Selective electrical stimulation of target brain locations (stimulation focality) is a difficult problem because it comprises conflicting goals. The stimulating current density field needs to be strong enough to stimulate targeted locations but weak enough not to stimulate nearby non-targeted locations. The objective of this study is to suggest a methodology for improving electrical stimulation focality based on time-division multiplexing principle. Proposed methodology: The complex problem of exciting a group of target locations is decomposed into a series of simpler problems in which a single location is targeted. Time-division multiplexing between the solutions of the simpler problems achieves seemingly parallel excitation of the selected target locations with minimal excitation of non-targeted locations. Results: A high fidelity finite element-based simulation of a cortical vision prosthesis is used to demonstrate the proposed idea and highlight important facts about neurons dynamics that must be taken into consideration in order to design a successful time-division multiplexing based stimulation scheme. Conclusion: The study offers a clear detailed procedure for designing focal electrical stimulation setups based on time division multiplexing principle. The included results and experiments prove that the proposed strategy is a step forward towards more focal stimulation setups.
We introduce pyMune, an open-source Python library for robust clustering of complex real-world da... more We introduce pyMune, an open-source Python library for robust clustering of complex real-world datasets without density cutoff parameters. It implements DenMune (Abbas et al., 2021), a mutual nearest neighbor algorithm that uses dimensionality reduction and approximate nearest neighbor search to identify and expand cluster cores. Noise is removed with a mutual nearest-neighbor voting system. In addition to clustering, pyMune provides classification, visualization, and validation functionalities. It is fully compatible with scikit-learn and has been accepted into the scikit-learn-contrib repository. The code, documentation, and demos are available on GitHub, PyPi, and CodeOcean for easy use and reproducibility.
2022 10th International Japan-Africa Conference on Electronics, Communications, and Computations (JAC-ECC)
Traffic fatalities are increasing in developing countries where there are few investments in road... more Traffic fatalities are increasing in developing countries where there are few investments in road safety. Culture and road conditions also affect driving habits. Therefore, automatic detection and reporting of driver behavior to concerned entities can potentially save lives. In particular, we analyze a driving maneuvers dataset collected from one environment (country) but tested in another environment with aggressive driving habits and irregular road conditions. We also develop an on-edge system with fast response time to serve users on a large scale. Specifically, we propose an approach for detecting aggressive and normal events using random forest classifier. We utilize the accelerometer and gyroscope smartphone readings to classify driving maneuvers events to five types (aggressive acceleration, suddenly break, aggressive turn right, aggressive turn left, and normal). We achieved an accuracy of only 63.4% by training our model on an available dataset collected from a foreigner environment and tested on our environment. The lowest precision value was 54% while the lowest recall was 42%. However, we achieved an accuracy of 98.4% when augmenting an available dataset with data collected with our application. The lowest precision value was 98% while the lowest recall was 90%. From the results, it is shown that the available datasets do not generalize well to different driving habits and road conditions. Finally, an implementation of the random forest model using OpenCV on an Android platform is analyzed.
Node embedding refers to learning or generating low-dimensional representations for nodes in a gi... more Node embedding refers to learning or generating low-dimensional representations for nodes in a given graph. In the era of big data and large graphs, there has been a growing interest in node embedding across a wide range of applications, ranging from social media to healthcare. Numerous research efforts have been invested in searching for node embeddings that maximally preserve the associated graph properties. However, each embedding technique has its own limitations. This paper presents a method for generating deep neural node embeddings that encode dissimilarity scores between pairs of nodes with the help of prototype nodes spread throughout the target graph. The proposed technique is adaptable to various notions of dissimilarity and yields efficient embeddings capable of estimating the dissimilarity between any two pairs of nodes in a graph. We compare our technique against relevant state-of-the-art similar embedding techniques. Superior results have been demonstrated in a number of experiments using several benchmark datasets. INDEX TERMS Deep neural networks, link prediction, node embedding, unsupervised node classification.
In the microarray-based approach for automated cancer diagnosis, the application of the tradition... more In the microarray-based approach for automated cancer diagnosis, the application of the traditional k-nearest neighbors kNN algorithm suffers from several difficulties such as the large number of genes (high dimensionality of the feature space) with many irrelevant genes (noise) relative to the small number of available samples and the imbalance in the size of the samples of the target classes. This research provides an ensemble classifier based on decision models derived from kNN that is applicable to problems characterized by imbalanced small size datasets. The proposed classification method is an ensemble of the traditional kNN algorithm and four novel classification models derived from it. The proposed models exploit the increase in density and connectivity using K1-nearest neighbors table (KNN-table) created during the training phase. In the density model, an unseen sample u is classified as belonging to a class t if it achieves the highest increase in density when this sample is added to it i.e. the unseen sample can replace more neighbors in the KNN-table for samples of class t than other classes. In the other three connectivity models, the mean and standard deviation of the distribution of the average, minimum as well the maximum distance to the K neighbors of the members of each class are computed in the training phase. The class t to which u achieves the highest possibility of belongness to its distribution is chosen, i.e. the addition of u to the samples of this class produces the least change to the distribution of the corresponding decision model for class t. Combining the predicted results of the four individual models along with traditional kNN makes the decision space more discriminative. 
With the help of the KNN-table which can be updated online in the training phase, an improved performance has been achieved compared to the traditional kNN algorithm with slight increase in classification time. The proposed ensemble method achieves significant increase in accuracy compared to the accuracy achieved using any of its base classifiers on Kentridge, GDS3257, Notterman, Leukemia and CNS datasets. The method is also compared to several existing ensemble methods and state of the art techniques using different dimensionality reduction techniques on several standard datasets. The results prove clear superiority of EKNN over several individual and ensemble classifiers regardless of the choice of the gene selection strategy.
We develop a galaxy cluster finding algorithm based on spectral clustering technique to identify ... more We develop a galaxy cluster finding algorithm based on spectral clustering technique to identify optical counterparts and estimate optical redshifts for X-ray selected cluster candidates. As an application, we run our algorithm on a sample of X-ray cluster candidates selected from the third XMM-Newton serendipitous source catalog (3XMM-DR5) that are located in the Stripe 82 of the Sloan Digital Sky Survey (SDSS). Our method works on galaxies described in the color-magnitude feature space. We begin by examining 45 galaxy clusters with published spectroscopic redshifts in the range of 0.1 to 0.8 with a median of 0.36. As a result, we are able to identify their optical counterparts and estimate their photometric redshifts, which have a typical accuracy of 0.025 and agree with the published ones. Then, we investigate another 40 X-ray cluster candidates (from the same cluster survey) with no redshift information in the literature and found that 12 candidates are considered as galaxy clus...
2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI)
Speed estimation is an open research area because of its importance in many applications and the ... more Speed estimation is an open research area because of its importance in many applications and the necessity of replacing GPS due to smartphones battery drainage. Relying on integrating accelerometer values is challenging and generally requires continual utilization of external references to correct speed, because of error accumulation. Therefore, highway speed estimation, in particular, is a difficult problem due to the maintenance of high speeds by vehicles for a long time and the scarcity of reference points like uneven road surfaces, turns, and stops. In this paper, we investigate exploiting micro road surface unevenness that results in vibrations on the accelerometer readings without integrating the acceleration signal. In particular, we employ deep 1D convolutional neural networks to learn and extract robust features that learn the relation between such complex vibrations and speed. Also, the use of bidirectional LSTMs is investigated to benefit from both forward and backward dependencies in the sensed data, and allow a form of integration. Specifically, two highway speed estimation models are proposed. The first uses a deep convolutional neural network with 5 layers and the second uses a deep bidirectional LSTM neural network. The inputs to both networks are the readings from the accelerometer and gyroscope sensors of a smartphone. The methods achieved mean absolute error results of 5.53 km/hr and 3.71 km/hr, respectively; whereas a related LSTM based method, resulted in a high error rate of 68.05 km/hr. Finally, an implementation of the proposed CNN model on an android and iOS smartphones is described and analyzed.
International Journal of Mining, Reclamation and Environment
ABSTRACT The decision to exploit a mineral deposit that can be extracted by open pit and/or under... more ABSTRACT The decision to exploit a mineral deposit that can be extracted by open pit and/or underground mining involves consideration of the following options: (a) independent open pit extraction; (b) independent underground extraction; (c) simultaneous open pit and underground (OPUG) extraction; (d) sequential OPUG extraction; and (e) combinations of (c) and (d). This paper investigates the extraction strategy for deposits using Mixed Integer Linear Programming (MILP) optimisation framework to maximise the net present value and determine the schedules for mining, processing, underground capital and operational developments, and 3D crown pillar position. The MILP framework is implemented for a gold deposit. The results showed a combined sequential and simultaneous open pit and underground (OPUG) mining with crown pillar as the optimal extraction option generating NPV that is 11% and 13% better than independent OP or UG mining, respectively.
Background: MicroRNAs (miRNAs) are endogenous ∼22 nt RNAs identified in many species as powerful regulators of gene expression. Experimental identification of miRNAs is still slow, since miRNAs are difficult to isolate by cloning due to their low expression, low stability, tissue specificity and the high cost of the cloning procedure. Thus, computational identification of miRNAs from genomic sequences provides a valuable complement to cloning. Different approaches for the identification of miRNAs have been proposed based on homology, thermodynamic parameters, and cross-species comparisons. Results: The present paper focuses on the integration of miRNA classifiers into a meta-classifier and on the identification of miRNAs in metagenomic sequences collected from different environments. An ensemble classifier is proposed for miRNA hairpin prediction based on four well-known classifiers (Triplet-SVM, MiPred, Virgo and EumiR) that use non-identical features and have been trained on different data. Their decisions are combined using a single-hidden-layer neural network to increase the accuracy of the predictions. The ensemble classifier achieved 89.3% accuracy, 82.2% F-measure, 74% sensitivity, 97% specificity, 92.5% precision and 88.2% negative predictive value when tested on real miRNA and pseudo-hairpin sequence data. The area under its receiver operating characteristic curve is 0.9, a high performance index. The proposed classifier yields a significant performance improvement relative to Triplet-SVM, Virgo and EumiR, and a minor refinement over MiPred. The ensemble classifier was then used for miRNA prediction in mine drainage, groundwater and marine metagenomic sequences downloaded from the NCBI Sequence Read Archive. By consulting the miRBase repository, 179 sequences were identified as highly probable miRNAs. Our approach could thus be used for mining metagenomic sequences and finding new and homologous miRNAs. Conclusions: The paper investigates a computational tool for miRNA prediction in genomic or metagenomic data, applied to three metagenomic samples from different environments (mine drainage, groundwater and marine sequences). The predictions provide a set of highly promising miRNA hairpin candidates for validation by cloning. Among the ensemble's predictions are pre-miRNA candidates that are confirmed in miRBase but were not recognized by some of the individual base classifiers.
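The metrics quoted above all derive from a single confusion matrix. As a sketch, the counts below reproduce the quoted figures under the assumption of a test set of 100 real and 200 pseudo hairpins; the abstract does not state the actual set sizes, so these counts are illustrative.

```python
# Standard binary-classification metrics from a confusion matrix.
# Counts are hypothetical, chosen to be consistent with the quoted figures.

def metrics(tp, fp, tn, fn):
    accuracy    = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)          # recall on real miRNAs
    specificity = tn / (tn + fp)          # recall on pseudo hairpins
    precision   = tp / (tp + fp)
    npv         = tn / (tn + fn)          # negative predictive value
    f_measure   = 2 * precision * sensitivity / (precision + sensitivity)
    return dict(accuracy=accuracy, f_measure=f_measure,
                sensitivity=sensitivity, specificity=specificity,
                precision=precision, npv=npv)

m = metrics(tp=74, fp=6, tn=194, fn=26)   # assumed 100 real + 200 pseudo
for name, value in m.items():
    print(f"{name:12s} {value:.3f}")
```

Note how low sensitivity (74%) can coexist with high accuracy when negatives dominate the test set, which is why the abstract reports the full metric panel rather than accuracy alone.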
Finding optimal phase durations for a controlled intersection is a computationally intensive task requiring O(N³) operations. In this paper we introduce a cost-optimal parallelization of a dynamic programming algorithm that reduces the complexity to O(N²). Three implementations spanning a wide range of parallel hardware are developed. The first is based on a shared-memory architecture, using the OpenMP programming model. The second is based on message passing, targeting massively parallel machines including high-performance clusters and supercomputers. The third is based on the data-parallel programming model mapped onto Graphics Processing Units (GPUs). Key optimizations include loop reversal, communication pruning, load balancing, and efficient thread-to-processor assignment. Experiments have been conducted on an 8-core server, on IBM BlueGene/L supercomputer two-node boards with 128 processors, and on an Nvidia GeForce GTX 470 GPU with 448 cores. Results indicate practical scalability on all platforms, with the maximum speedup reaching 76x on the GTX 470.
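A serial sketch of the kind of dynamic program being parallelized: allocate a cycle of N time units among K signal phases so that total delay is minimized. The delay tables below are hypothetical, and the OpenMP/MPI/GPU mappings themselves are not reproduced; the point is to show the (budget, split) loop nest that the parallel versions distribute across threads and processors.

```python
# Serial DP sketch: split a cycle of N green-time units among K phases.
# delays[k][g] = (hypothetical) delay of phase k when given g green units.
# Serial work is O(K * N^2) -- cubic in N when K grows with N -- and the
# two inner loops are the natural target for parallelization.

def optimal_phases(N, delays):
    K = len(delays)
    INF = float("inf")
    # best[k][n] = minimal delay using phases 0..k-1 with n units spent
    best = [[INF] * (N + 1) for _ in range(K + 1)]
    choice = [[0] * (N + 1) for _ in range(K + 1)]
    best[0][0] = 0.0
    for k in range(1, K + 1):
        for n in range(N + 1):              # budget loop
            for g in range(n + 1):          # split loop: green for phase k-1
                cand = best[k - 1][n - g] + delays[k - 1][g]
                if cand < best[k][n]:
                    best[k][n], choice[k][n] = cand, g
    # Backtrack the per-phase green times.
    plan, n = [], N
    for k in range(K, 0, -1):
        plan.append(choice[k][n])
        n -= choice[k][n]
    return best[K][N], plan[::-1]

# Hypothetical delay tables for 3 phases over a 6-unit cycle.
delays = [[9, 6, 4, 3, 2, 2, 2],
          [8, 5, 3, 2, 2, 2, 2],
          [7, 4, 2, 1, 1, 1, 1]]
total, plan = optimal_phases(6, delays)
print("total delay:", total, "green units per phase:", plan)
```

Because `best[k][*]` depends only on `best[k-1][*]`, all `(n, g)` pairs within one `k` layer are independent, which is what makes the shared-memory, message-passing, and GPU versions possible.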
IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 1996
A new formulation of the Maximal Common Subgraph Problem (MCSP), implemented using a two-stage Hopfield neural network, is given. The relative merits of the proposed formulation, with respect to current neural-network-based solutions as well as classical sequential-search-based solutions, are discussed.
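For readers unfamiliar with the approach, a minimal sketch of the discrete Hopfield dynamics that such formulations build on: states are binary, and asynchronous updates never increase the energy E = -½ xᵀWx - bᵀx, so the network settles into a local minimum that encodes a candidate solution. The tiny symmetric weight matrix below is a hypothetical instance, not the paper's two-stage MCSP encoding.

```python
# Minimal discrete Hopfield network: asynchronous threshold updates that
# monotonically decrease the energy E = -0.5 * x^T W x - b^T x.
# W and b are hypothetical, not the paper's MCSP encoding.

def energy(W, b, x):
    n = len(x)
    quad = sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return -0.5 * quad - sum(b[i] * x[i] for i in range(n))

def hopfield(W, b, x, sweeps=10):
    n = len(x)
    for _ in range(sweeps):
        changed = False
        for i in range(n):                      # asynchronous update
            field = sum(W[i][j] * x[j] for j in range(n)) + b[i]
            new = 1 if field > 0 else 0
            if new != x[i]:
                x[i], changed = new, True
        if not changed:                         # fixed point reached
            break
    return x

# Symmetric weights with zero diagonal (required for guaranteed convergence).
W = [[0, 2, -3],
     [2, 0, -1],
     [-3, -1, 0]]
b = [1, 1, 1]
x = hopfield(W, b, [1, 0, 0])
print("stable state:", x, "energy:", energy(W, b, x))
```

In an MCSP encoding, each neuron would represent a candidate vertex pairing and the weights would reward consistent pairings and penalize conflicting ones; the two-stage design in the paper refines this basic scheme.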
A novel clustering algorithm, CSHARP, is presented for finding clusters of arbitrary shapes and arbitrary densities in high-dimensional feature spaces. It can be considered a variation of the Shared Nearest Neighbor (SNN) algorithm, in which each sample data point votes for the points in its k-nearest neighborhood. Sets of points sharing a common mutual nearest neighbor are considered dense regions/blocks. These blocks are the seeds from which clusters may grow; therefore, CSHARP is not a point-to-point clustering algorithm but a block-to-block clustering technique. Much of its advantage comes from two facts: noise points and outliers correspond to blocks of small sizes, and homogeneous blocks highly overlap. The proposed technique is less likely to merge clusters of different densities or different homogeneity. The algorithm has been applied to a variety of low- and high-dimensional data sets with superior results over existing techniques such as ...
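A rough sketch of the block-formation step described above: compute each point's k nearest neighbours, keep only mutual neighbour links, and group each point with its mutual neighbours into a candidate dense block. The subsequent block-to-block merging and the treatment of small (noise) blocks are omitted, so this is an illustration of the seeding idea, not the full CSHARP algorithm.

```python
# Block formation from mutual k-nearest neighbours (illustrative sketch).
from math import dist   # Euclidean distance, Python 3.8+

def knn(points, k):
    """For each point, the index set of its k nearest neighbours."""
    nbrs = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: dist(p, points[j]))
        nbrs.append(set(order[:k]))
    return nbrs

def mutual_blocks(points, k):
    """Group each point with its mutual neighbours into candidate blocks."""
    nbrs = knn(points, k)
    blocks = set()
    for i in range(len(points)):
        mutual = {j for j in nbrs[i] if i in nbrs[j]}   # links must be mutual
        if mutual:
            blocks.add(frozenset(mutual | {i}))          # dedupe equal blocks
    return blocks

pts = [(0, 0), (0, 1), (1, 0),    # tight group -> one block
       (10, 10), (10, 11),        # second group -> smaller block
       (5, 30)]                   # isolated point -> no mutual link, no block
for block in sorted(mutual_blocks(pts, k=2), key=min):
    print(sorted(block))
```

The isolated point ends up in no block at all, illustrating why small or empty blocks are a natural handle for noise and outliers.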
Papers by Amin Shoukry