Papers by Raquel Vassallo
With the advance of technology, machines are getting closer to people. It is therefore necessary to develop interfaces, such as gestures, that provide an intuitive way of interaction. This work proposes a modification of the Star RGB technique, which condenses the temporal information of a video into a single RGB image. The proposal, called Star RGB+, applies the Star RGB technique to each color channel of a video. Thus, rather than a single RGB image, it yields three images as a condensed representation of a gesture in an RGB video clip. As a complement, an ensemble-like architecture is also proposed, using three pretrained VGG16 networks as feature extractors, one for each image, and a fully connected classifier that receives the fused information coming from the extractors. The main experiments were carried out on the GRIT (Gesture Commands for Robot inTeraction) dataset, used for human-robot interaction, and achieved more than 97% accuracy, precision, recall, and F1-score, outperforming the authors' original results by more than 5% on every metric. To compare results with the original Star RGB proposal, a secondary experiment was carried out on the Montalbano dataset, achieving 92.34% accuracy and outperforming the authors' results by more than 9%. This shows the contribution of this work to the dynamic gesture recognition field, mainly for gestures used in human-robot interaction.
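A minimal PyTorch sketch of the ensemble-like architecture described above, assuming ImageNet-pretrained VGG16 backbones from torchvision, 224x224 inputs, and a hypothetical hidden-layer size and class count (the abstract does not specify these); it illustrates the idea rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn
from torchvision import models

class StarRGBPlusClassifier(nn.Module):
    """Ensemble-like classifier: one VGG16 feature extractor per Star RGB+ image,
    followed by a fully connected classifier on the fused (concatenated) features."""
    def __init__(self, num_classes):
        super().__init__()
        # Three pretrained VGG16 backbones, one per condensed image.
        self.extractors = nn.ModuleList(
            [models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features for _ in range(3)]
        )
        self.pool = nn.AdaptiveAvgPool2d((7, 7))
        # Fully connected classifier over the fused features (3 x 512 x 7 x 7); sizes are assumptions.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(3 * 512 * 7 * 7, 1024),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(1024, num_classes),
        )

    def forward(self, img_r, img_g, img_b):
        feats = [self.pool(ext(x)) for ext, x in zip(self.extractors, (img_r, img_g, img_b))]
        fused = torch.cat(feats, dim=1)  # fuse the features coming from the three extractors
        return self.classifier(fused)

# Usage: three 224x224 Star RGB+ images per gesture clip (class count is hypothetical).
model = StarRGBPlusClassifier(num_classes=8)
logits = model(torch.rand(1, 3, 224, 224), torch.rand(1, 3, 224, 224), torch.rand(1, 3, 224, 224))
```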
German Conference on Robotics, Jun 7, 2010
Off-road Place Recognition using Fused Image Features. Tobias Föhst, Michael Arndt, Karsten Berns. RRLab, University of Kaiserslautern, PO Box 3049, 67663 Kaiserslautern, Germany.
Smart Cities, Aug 9, 2023
Proceedings of the XXII Congresso Brasileiro de Automática, 2018
Static and dynamic gestures are considered important tools for human-machine interaction. Even though they are more complex, dynamic gestures are preferred because they are considered more natural. Many works try to recognize dynamic gestures using multimodal information, captured with more than one type of sensor. However, most places have only cameras installed (for surveillance and monitoring), since other sensors usually have limited range. Thus, recognizing gestures using only visual information can be a very interesting alternative, allowing such an approach to be used in less sophisticated and more common environments. Therefore, this work proposes a dynamic gesture recognizer based only on color, in which the applied technique represents temporal information as spatial information. Transfer learning is also used to speed up model convergence and obtain better results. The method was evaluated using 3579 gestures, taken from the Montalbano gesture dataset and distributed over 20 distinct classes. As a result, an accuracy of 83.10% was obtained, with 65% of the gestures reaching more than 80% accuracy. This shows that the proposed approach performs adequately, and can still be improved, for future use in human-machine interaction tasks.
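The abstract does not name the backbone network used with transfer learning; as an illustration only, the sketch below shows the usual recipe with a hypothetical ImageNet-pretrained ResNet-50: freeze the convolutional layers and train a new 20-class head.

```python
import torch.nn as nn
from torchvision import models

# Transfer learning: reuse ImageNet weights, freeze the convolutional backbone,
# and train only a new classification head for the 20 gesture classes.
def build_transfer_model(num_classes=20):
    backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    for param in backbone.parameters():
        param.requires_grad = False                               # keep pretrained features fixed
    backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)  # new trainable head
    return backbone
```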
Advanced wireless communication network testbeds are now being widely deployed across Europe and at cross-continental scale. This represents an interesting opportunity for vertical industries and academia to perform experimentation and validation before a real deployment. In this paper, we present 5GinFIRE as a suitably flexible platform towards an open 5G Network Function Virtualization (NFV) ecosystem and playground. On top of this platform, we designed and deployed a smart city safety system as a vertical use case, exploring 5G capabilities through a combination of NFV and machine learning to provide end-to-end communication and a low-latency smart city service. This safety system helps detect criminals around the city and sends a notification to the security center. A Virtual Network Function (VNF) has been developed to enable video transcoding, face detection, and face recognition at the cloud or the edge of the network. The overall system is validated through the deployment of the use case indoors (Smart Internet Lab) and outdoors (Millennium Square, Bristol). We show the VNF specification and present a quantitative analysis of bandwidth, response time, processing time, and transmission speed in terms of Quality of Experience (QoE).
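A minimal sketch of the kind of face-detection stage such a VNF might run at the edge, assuming OpenCV with a classical Haar-cascade detector; the paper does not state which detector or framework was used, and the camera URL is hypothetical. Transcoding and face recognition are omitted.

```python
import cv2

# Minimal face-detection stage, standing in for the video-analytics part of the VNF.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def detect_faces(frame):
    """Return bounding boxes (x, y, w, h) of faces found in a BGR video frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

cap = cv2.VideoCapture("rtsp://camera.example/stream")  # hypothetical camera URL
ok, frame = cap.read()
if ok:
    boxes = detect_faces(frame)
    if len(boxes) > 0:
        print(f"{len(boxes)} face(s) detected, notify the security center")
```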
The main issue addressed in this work is multi-robot formation control for cooperative load pushing. Specifically, the principal concern is how to achieve and maintain a formation of two or more inexpensive robots, keeping them in contact with the load and coordinating their movement in order to perform the task. A final position controller is also implemented for defining …
Demand for weapons has grown along with crime rates, a contemporary problem haunting many countries. This has motivated scientists to devise solutions that can aid public safety in general. This paper proposes the detection of firearms in images through convolutional neural networks, using the YOLO (You Only Look Once) object detector. To improve learning, YOLO was used to generate annotations for an unlabeled database, which was integrated into a new database. The proposal was evaluated on a database containing 608 images, of which 304 contained weapons. The experiments carried out indicated an accuracy of 89.15% and a sensitivity of 100.00%, surpassing results presented in the current literature. These results show that the proposed methodology can be applied to the detection of firearms in images.
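A hedged sketch of the pseudo-labelling step (using a trained detector to annotate an unlabelled image set), assuming the Ultralytics YOLO API and hypothetical weights and confidence threshold; the paper does not specify the YOLO version or tooling used.

```python
from pathlib import Path
from ultralytics import YOLO  # assumed framework; the paper does not name one

# Pseudo-labelling: run a trained detector over unlabelled images and keep
# confident detections as YOLO-format annotations for a new training set.
model = YOLO("firearm_detector.pt")  # hypothetical weights from a first training round

def pseudo_label(image_dir, label_dir, conf_threshold=0.5):
    Path(label_dir).mkdir(parents=True, exist_ok=True)
    for image_path in Path(image_dir).glob("*.jpg"):
        result = model(str(image_path))[0]
        lines = []
        for box in result.boxes:
            if float(box.conf) >= conf_threshold:
                x, y, w, h = box.xywhn[0].tolist()  # normalised centre/size, YOLO format
                lines.append(f"{int(box.cls)} {x:.6f} {y:.6f} {w:.6f} {h:.6f}")
        if lines:
            (Path(label_dir) / f"{image_path.stem}.txt").write_text("\n".join(lines))
```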
Neurocomputing, Aug 1, 2020
Due to the advance of technologies, machines are increasingly present in people's daily lives. Thus, there has been more and more effort to develop interfaces, such as dynamic gestures, that provide an intuitive way of interaction. Currently, the most common trend is to use multimodal data, such as depth and skeleton information, to enable dynamic gesture recognition. However, using only color information would be more interesting, since RGB cameras are usually available in almost every public place and could be used for gesture recognition without the need to install other equipment. The main problem with such an approach is the difficulty of representing spatio-temporal information using just color. With this in mind, we propose a technique capable of condensing a dynamic gesture, shown in a video, into just one RGB image. We call this technique Star RGB. This image is then passed to a classifier formed by two ResNet CNNs, a soft-attention ensemble, and a fully connected layer, which indicates the class of the gesture present in the input video. Experiments were carried out using both the Montalbano and GRIT datasets. For the Montalbano dataset, the proposed approach achieved an accuracy of 94.58%, which reaches the state of the art for this dataset when only color information is considered. Regarding the GRIT dataset, our proposal achieves more than 98% accuracy, recall, precision, and F1-score, outperforming the reference approach by more than 6%.
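A minimal PyTorch sketch of a classifier along these lines, assuming ResNet-50 backbones and a simple learned soft-attention weighting over the two branches; the ResNet variant and the exact attention formulation are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
from torchvision import models

class SoftAttentionEnsemble(nn.Module):
    """Two ResNet feature extractors whose outputs are combined by learned
    soft-attention weights before a fully connected classification layer."""
    def __init__(self, num_classes):
        super().__init__()
        def backbone():
            net = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
            net.fc = nn.Identity()           # keep the 2048-d feature vector
            return net
        self.resnet_a, self.resnet_b = backbone(), backbone()
        self.attention = nn.Linear(2048, 1)  # one scalar score per branch
        self.classifier = nn.Linear(2048, num_classes)

    def forward(self, star_rgb_image):
        feats = torch.stack(
            [self.resnet_a(star_rgb_image), self.resnet_b(star_rgb_image)], dim=1
        )                                                       # (batch, 2, 2048)
        weights = torch.softmax(self.attention(feats), dim=1)   # soft attention over branches
        fused = (weights * feats).sum(dim=1)                    # weighted combination
        return self.classifier(fused)
```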
Anais do ... Simpósio Brasileiro de Automação Inteligente, 2021
A data pipeline consists of a sequence of actions that preprocess and extract information from datasets. In the context of anomaly detection, a data pipeline helps make structured and relevant information available for the detection task. This article proposes an approach to the problem of anomaly detection in single-scene video, with spatial localization of the anomalous events, based on a data pipeline composed of two parts: a patch extractor and a patch classification model. We performed experiments on the Street Scene dataset, achieving AUC = 0.898 (area under the ROC curve) and AUPRC = 0.916 (area under the precision vs. recall curve), which are results compatible with the current literature.
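A minimal sketch of the patch-extraction stage, assuming a simple sliding window with hypothetical patch size and stride (the article's actual extractor may differ); keeping each patch's coordinates is what allows the spatial localization of anomalies.

```python
import numpy as np

def extract_patches(frame, patch_size=64, stride=32):
    """Slide a window over a frame and return (patch, top-left coordinate) pairs,
    so that each patch classification can be mapped back to a spatial location."""
    height, width = frame.shape[:2]
    patches = []
    for y in range(0, height - patch_size + 1, stride):
        for x in range(0, width - patch_size + 1, stride):
            patches.append((frame[y:y + patch_size, x:x + patch_size], (x, y)))
    return patches

# Usage: feed each patch to the patch classification model; patches flagged as
# anomalous give the spatial localization of the anomaly in the frame.
frame = np.zeros((360, 640, 3), dtype=np.uint8)  # placeholder frame
patches = extract_patches(frame)
```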
Journal of Control, Automation and Electrical Systems, Oct 6, 2021
The anomaly detection problem consists in identifying events that do not conform to an expected behavior pattern. In law enforcement and security, the detection of anomalous events has application in the identification of suspicious behaviors. This paper addresses this problem in public areas by monitoring surveillance videos. Our approach involves a convolutional neural network for spatial feature extraction, followed by a time-series classifier with a one-dimensional convolutional layer and an ensemble of stacked bidirectional recurrent networks. The proposed methodology selects a pre-trained convolutional architecture for the spatial features and applies transfer learning to specialize this architecture for anomaly detection in surveillance videos. We performed the experiments on the UCSD Anomaly Detection Dataset and the CUHK Avenue Dataset for Abnormal Event Detection to compare our approach with other works. Our evaluation protocol uses the Area Under the Receiver Operating Characteristic Curve (AUC), the Equal Error Rate (EER), and the Area Under the Precision vs. Recall Curve (AUPRC). During the experiments, the model obtained AUC above 92% and EER below 15%, which is compatible with the current literature.
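A minimal PyTorch sketch of the temporal part of such a pipeline, assuming 2048-dimensional per-frame CNN features and hypothetical layer sizes; for brevity it uses a single stacked bidirectional LSTM rather than the ensemble described in the paper.

```python
import torch
import torch.nn as nn

class TemporalAnomalyClassifier(nn.Module):
    """Time-series classifier over per-frame CNN features: a 1-D convolution
    followed by stacked bidirectional recurrent layers and a frame-level score."""
    def __init__(self, feature_dim=2048, hidden_dim=128):
        super().__init__()
        self.conv1d = nn.Conv1d(feature_dim, 256, kernel_size=3, padding=1)
        self.rnn = nn.LSTM(256, hidden_dim, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, 1)          # anomaly score per time step

    def forward(self, features):                          # features: (batch, time, feature_dim)
        x = self.conv1d(features.transpose(1, 2)).transpose(1, 2)
        x, _ = self.rnn(x)
        return torch.sigmoid(self.head(x)).squeeze(-1)    # (batch, time) anomaly scores

# Usage: per-frame features come from a pre-trained CNN (e.g. a ResNet without its
# classification head); here a random tensor stands in for a 30-frame clip.
scores = TemporalAnomalyClassifier()(torch.rand(1, 30, 2048))
```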
Universidade Federal do Espírito Santo, Aug 1, 2018
This doctoral thesis proposes navigation and control systems for small unmanned aerial vehicles, which are widely accessible nowadays. The focus is on applying techniques with a simplified implementation and efficient experimental results, while respecting the limitations of the equipment used. The proposed systems are tested in different flight missions, including positioning tasks, trajectory following, and formation control among robots, carried out under varied flight conditions both indoors and outdoors. To contextualize the topics addressed, the adopted terminology and general concepts about the operation of a modern aerial vehicle are first presented, emphasizing its integration with an autopilot. Then, a simplified mathematical model is proposed to represent the movements of this vehicle, from which the automatic control systems are derived. In parallel, the sensor fusion techniques of the navigation systems are presented, making explicit the processing applied to the information used in the control feedback. Throughout the chapters, several experiments are discussed with the purpose of evaluating the proposed systems and verifying their effectiveness as a solution to the problems addressed in this work. Finally, the conclusions and some possibilities for future applications are highlighted.
Anais do Congresso Brasileiro de Automática 2020, 2020
Anomaly detection consists in identifying events that do not conform to an expected behavior pattern. In the context of public road safety, the automatic detection of anomalous events through video has application in the identification of suspicious behaviors. This article proposes an approach to the problem of automatic detection of anomalous events in videos of public roads based on an end-to-end deep neural network model, composed of two parts: a spatial feature extractor based on a pre-trained convolutional neural network, and a temporal sequence classifier based on stacked recurrent layers. We performed experiments on the UCSD Anomaly Detection Dataset. The results were evaluated with the metrics Area Under the Receiver Operating Characteristic Curve (AUC), Area Under the Precision vs. Recall Curve (AUPRC), and Equal Error Rate (EER). During the experiments, the model obtained an AUC above …
arXiv (Cornell University), Oct 1, 2019
To interact with humans in collaborative environments, machines need to be able to predict (i.e., anticipate) future events and execute actions in a timely manner. However, the observation of human limb movements may not be sufficient to anticipate their actions unambiguously. In this work, we consider two additional sources of information (i.e., context) over time, gaze movement and object information, and study how these additional contextual cues improve the action anticipation performance. We address action anticipation as a classification task, where the model takes the available information as input and predicts the most likely action. We propose to use the uncertainty about each prediction as an online decision-making criterion for action anticipation. Uncertainty is modeled as a stochastic process applied to a time-based neural network architecture, which improves on the conventional class-likelihood (i.e., deterministic) criterion. The main contributions of this paper are four-fold: (i) we propose a novel and effective decision-making criterion that can be used to anticipate actions even in situations of high ambiguity; (ii) we propose a deep architecture that outperforms previous results on the action anticipation task when using the Acticipate collaborative dataset; (iii) we show that contextual information is important to disambiguate the interpretation of similar actions; and (iv) we also provide a formal description of three existing performance metrics that can be easily used to evaluate action anticipation models. Our results on the Acticipate dataset show the importance of contextual information and the uncertainty criterion for action anticipation. We achieve an average accuracy of 98.75% in the anticipation task using, on average, only 25% of the observations. Also, considering that a good anticipation model should perform well in the action recognition task, we achieve an average accuracy of 100% in action recognition on the Acticipate dataset when the entire observation set is used.
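The paper models uncertainty as a stochastic process over a time-based architecture; as a simpler stand-in, the sketch below uses Monte Carlo dropout over a hypothetical GRU classifier and commits to a decision once predictive entropy falls below a threshold. All layer sizes and thresholds are assumptions, not the authors' exact method.

```python
import torch
import torch.nn as nn

class AnticipationModel(nn.Module):
    """Recurrent classifier over partial observations with dropout kept active
    at inference time so repeated passes give a Monte Carlo uncertainty estimate."""
    def __init__(self, input_dim, num_actions, hidden_dim=64):
        super().__init__()
        self.rnn = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.dropout = nn.Dropout(0.3)
        self.head = nn.Linear(hidden_dim, num_actions)

    def forward(self, observations):              # (batch, time, input_dim)
        _, h = self.rnn(observations)
        return torch.softmax(self.head(self.dropout(h[-1])), dim=-1)

def anticipate(model, observations, num_samples=20, entropy_threshold=0.3):
    """Commit to an action as soon as the predictive entropy drops below a threshold."""
    model.train()                                  # keep dropout stochastic (MC dropout)
    for t in range(1, observations.shape[1] + 1):
        probs = torch.stack([model(observations[:, :t]) for _ in range(num_samples)]).mean(0)
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1)
        if entropy.item() < entropy_threshold:
            return probs.argmax(-1).item(), t      # decided action and decision time
    return probs.argmax(-1).item(), observations.shape[1]
```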
IFAC Proceedings Volumes, 2006
In this work, a mobile robot cooperation strategy based on computer vision is presented. The strategy is applied to a mobile robot team formed by simple, inexpensive robots and a leader robot with more computational power. The leader has an omnidirectional vision system and uses color segmentation to obtain the pose of the followers. This visual information is used by a nonlinear stable controller that manages the team formation. Simulations and tests were run, and the current results are encouraging.
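The paper's nonlinear formation controller is not reproduced here; as an illustration of the overall loop, the sketch below shows a generic kinematic go-to-goal controller that drives a follower toward its desired slot in the formation, using poses the leader would measure by color segmentation. Gains and poses are hypothetical.

```python
import math

def formation_control(follower_pose, desired_pose, k_v=0.5, k_w=1.5):
    """Generic kinematic controller: drive the follower toward its desired slot
    in the formation, given poses measured by the leader's omnidirectional camera.
    Returns (linear velocity, angular velocity) commands for a unicycle robot."""
    x, y, theta = follower_pose          # measured via color segmentation (leader frame)
    x_d, y_d, _ = desired_pose           # desired slot in the formation
    dx, dy = x_d - x, y_d - y
    distance = math.hypot(dx, dy)
    heading_error = math.atan2(dy, dx) - theta
    heading_error = math.atan2(math.sin(heading_error), math.cos(heading_error))  # wrap to [-pi, pi]
    v = k_v * distance * math.cos(heading_error)   # slow down when misaligned
    w = k_w * heading_error
    return v, w

# Usage: follower at the origin facing +x, desired slot 1 m ahead and 0.5 m to the left.
v, w = formation_control((0.0, 0.0, 0.0), (1.0, 0.5, 0.0))
```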
DOAJ: Directory of Open Access Journals, Feb 1, 2010
This paper presents the use of a hybrid collaborative stereo vision system (3D distributed visual sensing using different kinds of cameras) for the autonomous navigation of a wheeled robot team. A triangulation-based method is proposed for computing the 3D posture of an unknown object using the collaborative hybrid stereo vision system, and in this way steering the robot team to a desired position relative to the object while maintaining a desired robot formation. Experimental results with real mobile robots are included to validate the proposed vision system.
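A minimal sketch of ray-based triangulation, assuming each camera provides its optical centre and a viewing ray toward the object (i.e., calibration and back-projection are already done); the point is recovered as the midpoint of the shortest segment between the two rays. This is a generic illustration, not the paper's exact method.

```python
import numpy as np

def triangulate(origin_a, dir_a, origin_b, dir_b):
    """Triangulate a 3-D point from two viewing rays (camera centre + direction)
    by finding the midpoint of the shortest segment between the rays."""
    dir_a, dir_b = dir_a / np.linalg.norm(dir_a), dir_b / np.linalg.norm(dir_b)
    # Solve for the ray parameters t_a, t_b that minimise the distance between rays.
    w0 = origin_a - origin_b
    a, b, c = dir_a @ dir_a, dir_a @ dir_b, dir_b @ dir_b
    d, e = dir_a @ w0, dir_b @ w0
    denom = a * c - b * b
    t_a = (b * e - c * d) / denom
    t_b = (a * e - b * d) / denom
    p_a, p_b = origin_a + t_a * dir_a, origin_b + t_b * dir_b
    return (p_a + p_b) / 2.0

# Usage: two cameras 1 m apart observing a point roughly 2 m in front of them.
point = triangulate(np.array([0.0, 0.0, 0.0]), np.array([0.1, 0.0, 1.0]),
                    np.array([1.0, 0.0, 0.0]), np.array([-0.4, 0.0, 1.0]))
```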
Scene classification is a very popular topic in the field of computer vision and has many applications, such as content-based image organization and retrieval, and robot navigation. However, scene classification is quite a challenging task due to the occurrence of occlusions, shadows and reflections, illumination changes, and scale variability.