Data mining, Information Extraction, Deep Web Research Papers

DLP is a data security technology that detects and prevents data breach incidents by monitoring data in use, in motion, and at rest. It has been widely applied for regulatory compliance, data privacy, and intellectual property...

- by Liwei Ren
  Algorithms, Regulatory Compliance, Network Security, Software Architecture
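
As a rough illustration of content inspection for data in motion, the sketch below scans outgoing text for candidate payment-card numbers and keeps only those that pass the Luhn checksum. It is a toy under stated assumptions, not the detection method of the paper. The regular expression, the 13-to-19-digit range, and the function names are illustrative choices.

```python
import re

# Assumed pattern: 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return normalized digit strings that look like valid card numbers."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


if __name__ == "__main__":
    outgoing = "Invoice for card 4111 1111 1111 1111, ref 1234567890123."
    print(find_card_numbers(outgoing))  # ['4111111111111111']
```

A real DLP engine would combine many such detectors, for example keywords, document fingerprints, and classifiers, with policies that decide whether to block or log.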

When computers emerged, they promised to serve as a repository of knowledge and wisdom, but instead they sent an enormous volume of data our way. Web mining is the process of discovering information and knowledge from web data. In web mining, this data from the server side, the client...

Traffic classification, i.e. associating network traffic with the application that generated it, is an important tool for several tasks spanning different fields (security, management, traffic engineering, R&D). This process is...

- by Domenico Ciuonzo and Giuseppe Aceto
  Privacy, Anonymity, Privacy and data protection, Data mining, Information Extraction, Deep Web
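
The classic baseline for associating traffic with an application is a lookup of well-known transport ports. The sketch below shows that baseline under stated assumptions. The `Flow` record and the small port table are hypothetical, and the sketch is not the technique studied in the paper.

```python
from dataclasses import dataclass

# Illustrative subset of IANA well-known port assignments used as naive labels.
WELL_KNOWN_PORTS = {
    ("tcp", 22): "ssh",
    ("tcp", 25): "smtp",
    ("tcp", 80): "http",
    ("tcp", 443): "https",
    ("udp", 53): "dns",
    ("udp", 123): "ntp",
}


@dataclass(frozen=True)
class Flow:
    """Hypothetical minimal flow record: transport protocol plus both ports."""
    proto: str
    src_port: int
    dst_port: int


def classify_by_port(flow: Flow) -> str:
    """Label a flow by a well-known port, checking the destination port first."""
    for port in (flow.dst_port, flow.src_port):
        label = WELL_KNOWN_PORTS.get((flow.proto, port))
        if label is not None:
            return label
    return "unknown"


if __name__ == "__main__":
    print(classify_by_port(Flow("tcp", 51514, 443)))    # https
    print(classify_by_port(Flow("udp", 53, 40000)))     # dns
    print(classify_by_port(Flow("tcp", 50000, 50001)))  # unknown
```

Some flows are mislabeled or left as "unknown": those on non-standard or dynamic ports, and applications tunnelled over port 443. This weakness is why port lookup is usually treated only as a reference point.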

The Deep Web is the data on the Internet that is not accessible through popular search engines. It is much larger than the Surface Web we use. The Deep Web grants anonymity, and with it come the horrors of underground misuse. It is a shady (and...

A brief explanation and analysis of the Deep Web and its contents, with references to how TOR works and to Hidden Services.

- by Marco Rapaccini
  Information Technology, Computer Forensics, Cybercrimes, Web Mining

Abstract: This document was written to present the concepts behind the enormous hidden part of the Internet that lies beneath the surface of the iceberg called "information", what it contains, and how it is accessed. ...

Abstract: The Internet is undoubtedly a revolutionary invention in human history, and it is still developing. Many people ... the Internet for communication, social media, shopping, following the political and social agenda, and more...

Objective: The paper analyzes money laundering through crypto-assets and offers a legal perspective on how this new technology can be used to commit these felonies. The study intends to shed light on the matter, helping to visualize how...

This study aimed to propose a seven-phase process for mining social media content to support the management of tourist destinations. The process builds on the methodologies proposed by Neves (2013); Hea, Zha and Li (2013); Kalampokis, Tambouris and Tarabanis (2013); and Abrahams, Jiao, Fan, Wang and Zhang (2013). It also builds on the knowledge discovery models proposed by Fayyad, Piatetsky-Shapiro and Smyth (1996), Chapman et al. (2000), and Han, Kamber and Pei (2012).

The work is exploratory and descriptive research, and it uses mixed methods as its method of investigation. It explores monitoring on Facebook, Twitter, and YouTube. However, the proposed process was verified by mining only Twitter content that contained terms from the application ontology of tourist attractions and services (lodging, food, and transportation) for the cities of Curitiba (PR) and Foz do Iguaçu (PR). This restriction was partly a methodological choice and partly due to the difficulty of obtaining relevant data from the other social media investigated, because of limitations in their Application Programming Interfaces (APIs).

The process proved effective for:
- collecting relevant content and identifying popular topics on social media;
- performing quantitative and qualitative analyses;
- supporting Destination Management Organizations (DMOs) in managing tourist destinations and in making strategic and operational decisions.

The analysis of how the investigated DMOs use social media found that Facebook and Twitter are used more than YouTube, which is still little exploited compared with the others. Although the actions, strategies, and content published are similar, the approaches and objectives vary, and the DMOs' efforts and actions on social media are still experimental.

Personal semi-structured interviews were conducted with the staff who manage and update the DMOs' social media profiles. They revealed that no DMO effectively monitors social media using social media monitoring software or content mining techniques. The DMOs do use Facebook's analytics tool, albeit superficially, to monitor and analyze the performance of their actions and posts. The main limitations on the use and monitoring of social media by the investigated DMOs were inexperience, lack of technical knowledge, and lack of human and financial resources.

Suggested future work includes:
- developing a theoretical knowledge management model, so that the results and knowledge obtained can be made explicit to the levels of governance (federal, state, and municipal) and to the other public and private actors involved in tourism;
- expanding the application ontologies developed;
- monitoring and mining social media content about other public and private tourism organizations, or about other events such as the 2016 Olympics in Rio de Janeiro.

Keywords: tourism; tourist destination management; social media; social media monitoring; social media content mining.
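
The verification step filters posts using the terms of an application ontology for tourist attractions and services (lodging, food, transportation). A minimal sketch of that filtering is accent-insensitive keyword matching against per-category term lists. The category names, terms, and posts below are invented for illustration. A real ontology would also cover synonyms and multi-word expressions.

```python
import re
import unicodedata

# Hypothetical miniature application ontology: category -> Portuguese trigger terms.
ONTOLOGY = {
    "lodging": {"hotel", "pousada", "hostel"},
    "food": {"restaurante", "churrascaria", "cafe"},
    "transport": {"onibus", "aeroporto", "taxi"},
}


def normalize(text: str) -> str:
    """Lowercase and strip accents so that 'Ônibus' matches 'onibus'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def categorize(post: str) -> set[str]:
    """Return every ontology category with at least one term in the post."""
    words = set(re.findall(r"[a-z0-9]+", normalize(post)))
    return {category for category, terms in ONTOLOGY.items() if words & terms}


if __name__ == "__main__":
    posts = [
        "Ótimo hotel perto do aeroporto de Foz do Iguaçu!",
        "Café excelente no centro de Curitiba",
        "Dia lindo hoje",
    ]
    for post in posts:
        print(sorted(categorize(post)), post)
```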

The Internet as a whole is a network of multiple computer networks and their massive infrastructure. The part of the web made up of websites accessible through search engines such as Google is known as the Surface Web. The...

General principles of searching for information on the Internet - Ways of accessing Deep Web resources - Scientific resources in the Deep Web - Obtaining data, publications, and content (including scientific content) from...

This article describes our rule-based Arabic Named Entity Recognition (NER) and classification system. It is based on lists of classified proper names (PN) and, in particular, on syntactico-semantic patterns resulting in fine...
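
Rule-based NER of this kind combines lookups in lists of classified names with contextual patterns. The sketch below illustrates only that general idea, using English toy data. The gazetteer, the trigger words, and the capitalization cue are all hypothetical. They do not reproduce the authors' Arabic resources or their syntactico-semantic patterns, and Arabic script has no letter case.

```python
import re

# Hypothetical gazetteer of classified proper names (lowercased keys).
GAZETTEER = {"tunis": "LOCATION", "cairo": "LOCATION", "unesco": "ORGANIZATION"}

# Hypothetical trigger words: a name token right after one of these inherits its label.
TRIGGERS = {"president": "PERSON", "minister": "PERSON", "city": "LOCATION"}


def tag_entities(sentence: str) -> list[tuple[str, str]]:
    """Label tokens by gazetteer lookup first, then by the preceding trigger word."""
    tokens = re.findall(r"\w+", sentence)
    entities = []
    for i, token in enumerate(tokens):
        label = GAZETTEER.get(token.lower())
        # Capitalization stands in for a "candidate name" cue (an assumption for English text).
        if label is None and i > 0 and token[0].isupper():
            label = TRIGGERS.get(tokens[i - 1].lower())
        if label is not None:
            entities.append((token, label))
    return entities


if __name__ == "__main__":
    print(tag_entities("President Karim visited Tunis and UNESCO"))
```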

Cyber and related technologies such as the Internet were introduced to the world only in the late 1980s, and today it is unimaginable to think of life without these all-pervasive technologies. Despite being ubiquitous around the world, cyber...

When we type the familiar www acronym into an electronic device (computer, smartphone, tablet, among others), followed by the address of a web page, within seconds we have access to all the information that, without the revolution of...

Software project estimation is important for allocating resources and planning a reasonable work schedule. Estimation models are typically built using data from completed projects. While organizations have their historical data...
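
Estimation models built from completed projects are often regressions of effort on size. As an illustration only, and not the model studied in the paper, the sketch below fits a power law, effort = a * size^b, by ordinary least squares on log-transformed data. The project figures are made up.

```python
import math


def fit_power_law(sizes: list[float], efforts: list[float]) -> tuple[float, float]:
    """Fit effort = a * size**b by least squares on (log size, log effort)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log space = size exponent
    a = math.exp(mean_y - b * mean_x)  # intercept in log space = scale factor
    return a, b


def predict_effort(a: float, b: float, size: float) -> float:
    """Apply the fitted model to a new project size."""
    return a * size ** b


if __name__ == "__main__":
    # Hypothetical completed projects: size in KLOC, effort in person-months.
    sizes = [12, 25, 40, 68, 110]
    efforts = [30, 70, 105, 200, 330]
    a, b = fit_power_law(sizes, efforts)
    print(f"a={a:.2f}, b={b:.2f}, estimate for 50 KLOC: {predict_effort(a, b, 50):.0f} PM")
```

Fitting in log space keeps the model linear and lets the exponent b capture economies or diseconomies of scale.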

Journalistic information is obtained from sources, documents, and data. These exist in both the offline and online worlds and are captured by search engines and by cross-referencing data. There is yet another, anonymous source, which is not captured by...

- by Krishma Carreira and Lucas V Araujo
  Technology, Internet Studies, Communication, Journalism

Arguably the biggest challenge in analyzing English tense is to account for the double access interpretation, which arises when a present tensed verb is embedded under a past attitude, e.g. "John said that Mary is pregnant"....

Coreference resolution plays an important role in Information Extraction. This paper investigates two strategies based on a mention-pair resolver that uses a Decision Tree classifier on structured and unstructured datasets,...
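
A mention-pair resolver is trained on pairs of mentions labeled coreferent or not. One commonly used way to build those pairs is the closest-antecedent scheme. Each anaphoric mention is paired positively with its closest preceding coreferent mention, and negatively with every mention in between. The sketch below generates such pairs with a few toy features. The `Mention` type and the feature set are hypothetical. The resulting rows are what a classifier such as a decision tree would be trained on.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    text: str
    chain_id: int  # gold coreference chain; mentions sharing an id corefer


def pair_features(antecedent: Mention, anaphor: Mention, distance: int) -> dict:
    """Hypothetical toy features for a candidate (antecedent, anaphor) pair."""
    return {
        "exact_match": antecedent.text.lower() == anaphor.text.lower(),
        "anaphor_is_pronoun": anaphor.text.lower() in {"he", "she", "it", "they", "her", "him"},
        "distance": distance,
    }


def training_pairs(mentions: list[Mention]) -> list[tuple[dict, bool]]:
    """Closest-antecedent scheme: one positive per anaphor, negatives for mentions in between."""
    rows = []
    for j, anaphor in enumerate(mentions):
        # Search backwards for the closest earlier mention in the same chain.
        for i in range(j - 1, -1, -1):
            if mentions[i].chain_id == anaphor.chain_id:
                rows.append((pair_features(mentions[i], anaphor, j - i), True))
                for k in range(i + 1, j):
                    rows.append((pair_features(mentions[k], anaphor, j - k), False))
                break
    return rows


if __name__ == "__main__":
    doc = [Mention("John", 0), Mention("Mary", 1), Mention("he", 0), Mention("her", 1)]
    for features, label in training_pairs(doc):
        print(label, features)
```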

In this paper, we outline our work on developing a disk-based infrastructure for efficient visualization and graph exploration operations over very large graphs. The proposed platform, called graphVizdb, is based on a novel technique for...

- by Nikos Bikakis
  Web 2.0, Semantic Web Technologies, Knowledge Management, Visualization

ABSTRACT The market situation can provide valuable benefits for increasing the productivity of selling a product, both conventionally and online. Indonesian e-commerce map data for the second quarter show that online sales of goods and services are increasing, as seen in the three major e-marketplaces: Tokopedia, Shopee, and Bukalapak. In January 2019, online purchasing of products or services in Indonesian e-commerce reached 86%, meaning that 86% of this activity involves buying products and services on Indonesian e-commerce platforms. This situation creates fierce competition for sellers, business people, and entrepreneurs who are just starting their businesses. One way for sellers to gain an edge in this competition is to segment the products they sell in their online shops. However, a study by an association revealed that if a product stays online for a long time (more than 550 days), 78% of such products are very likely never to be purchased. Product placement in online shops must therefore be segmented appropriately. This research applies data extraction technology with web crawlers to present this segmentation. A prototype was tested with 9 entrepreneurs, 210 students planning to start an online business, and 19 private-sector employees. The web crawler technology for product segmentation achieved a 79% success rate, rated in the Good category, for placing segmented products online.
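
The web-crawler step can be pictured as a breadth-first traversal of links from a seed page. The sketch below keeps the network out of the picture. `fetch_links` is a hypothetical stand-in that reads from an in-memory link graph, so the traversal logic runs as-is. A real crawler would fetch pages over HTTP, parse their links, and respect robots.txt and rate limits.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(seed: str, fetch_links: Callable[[str], Iterable[str]], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl from `seed`, visiting each URL at most once."""
    visited: list[str] = []
    seen = {seed}
    frontier = deque([seed])
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:  # enqueue each URL only once
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical in-memory site standing in for real e-marketplace pages.
SITE = {
    "/": ["/category/shoes", "/category/bags"],
    "/category/shoes": ["/product/1", "/product/2", "/"],
    "/category/bags": ["/product/3"],
    "/product/1": [],
    "/product/2": [],
    "/product/3": ["/category/bags"],
}

if __name__ == "__main__":
    print(crawl("/", lambda url: SITE.get(url, [])))
```

The visited product pages would then be parsed for titles, prices, and categories before any segmentation step.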

Semantically, objects in unstructured documents are related to each other through certain entity relations, such as drug-drug interactions through their compounds, or buyer-seller relationships through the goods or...

Abstract: How does copyright apply to scientific and educational content? This research aims to analyze the application of the intellectual property regime in the context of the Internet and other technologies. The methodological approach...

- by Marcos Vinício Chein Feres and Jordan Vinícius de Oliveira
  Open Access, Academic Libraries, Creative Commons, Copyright

With the rapid growth of users in social networking services, data is generated in thousands of terabytes every day. Practical frameworks for data extraction from social networking sites have not been well investigated yet. In this paper,...

ABSTRACT Data from the wearesocial.com site show 4.437 billion Internet users, together with 75% online product purchases, 82% product searches, and 92% online retail visits. This trend prompted a study by an association, which revealed that if a product remains online for a long time (more than 550 days), 78% are very unlikely to be purchased. Product placement in online shops should therefore be segmented appropriately. Good product segmentation increases the potential of e-commerce and e-marketplaces at the national level, for example by improving market efficiency and operational efficiency, expanding market access, and creating linkages. Building on this trend, the researchers developed an automated script, known as a web crawler or web spider, to traverse Indonesian e-marketplace sites. They also implemented a prototype that analyzes the text of products listed on Indonesian e-marketplaces and produces market-segmentation clusters. This is offered as an effective solution for business people placing their products on e-commerce sites and e-marketplaces, helping them gain a competitive advantage and a good Store Image in Indonesia.
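
The prototype produces market-segmentation clusters from product texts, but the abstract does not name the clustering algorithm. The sketch below therefore substitutes a deliberately simple one. It groups product titles in a single greedy pass by the Jaccard similarity of their word sets, with an assumed threshold of 0.3. The titles are illustrative Indonesian listings (shoes, bags, T-shirts).

```python
def tokens(title: str) -> set[str]:
    """Lowercased word set of a product title."""
    return set(title.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B| of two word sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_titles(titles: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Greedy single pass: join the first cluster whose seed title is similar enough, else start one."""
    clusters: list[tuple[set[str], list[str]]] = []
    for title in titles:
        words = tokens(title)
        for seed_words, members in clusters:
            if jaccard(words, seed_words) >= threshold:
                members.append(title)
                break
        else:
            clusters.append((words, [title]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    titles = [
        "sepatu running pria",
        "sepatu running wanita",
        "tas ransel laptop",
        "tas laptop kulit",
        "kaos polos pria",
    ]
    for segment in cluster_titles(titles):
        print(segment)
```

A production system would more likely use TF-IDF vectors with k-means or hierarchical clustering. The greedy pass simply keeps the idea visible in a few lines.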

One of the most time-consuming steps in association rule mining is computing how frequently itemsets occur in the database. The hash table index approach converts a transaction database into a hash index tree...
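
Counting itemset frequencies (support) is the expensive step described above. The sketch below shows the plain hash-table approach for k-itemsets: every k-subset of each transaction is used as a key into a dictionary of counts. It is a generic illustration, not the paper's hash index tree, and the transactions are toy data.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions: list[list[str]], k: int) -> Counter:
    """Count every k-itemset occurring in the transactions using a hash table (Counter)."""
    counts: Counter = Counter()
    for transaction in transactions:
        # Sort so that each itemset has exactly one canonical tuple key.
        for itemset in combinations(sorted(set(transaction)), k):
            counts[itemset] += 1
    return counts


def frequent_itemsets(transactions: list[list[str]], k: int, min_support: int) -> dict:
    """Keep only k-itemsets whose count reaches the minimum support."""
    counts = count_itemsets(transactions, k)
    return {itemset: c for itemset, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["bread", "diapers", "beer", "eggs"],
        ["milk", "diapers", "beer", "cola"],
        ["bread", "milk", "diapers", "beer"],
        ["bread", "milk", "diapers", "cola"],
    ]
    for itemset, count in sorted(frequent_itemsets(baskets, 2, 3).items()):
        print(itemset, count)
```

Enumerating all k-subsets grows combinatorially with transaction length. This growth is why index structures and candidate pruning, as in Apriori, matter in practice.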

The Islamic State of Iraq and Syria (ISIS) has made great use of the Internet and online social media sites to spread its message and encourage others, particularly young people, to support the organization; to travel to the Middle East...

Audio Event Detection (AED) aims to recognize sounds within audio and video recordings. AED employs machine learning algorithms commonly trained and tested on annotated datasets. However, available datasets are limited in number of...

Because of the enormous amount of data stored in databases and other information repositories and warehouses, there is an increasing need for new technologies to extract the hidden, valuable knowledge from this data. This knowledge became...

In Natural Language Processing, Part-of-Speech (POS) tagging plays a vital role in any kind of machine processing and understanding of language. In each of the areas of machine translation, information retrieval, or speech...
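
A common reference point when evaluating POS taggers is the most-frequent-tag baseline. Each word receives the tag it most often had in a tagged training corpus, and unseen words receive a default tag. The sketch below assumes a tiny hand-tagged corpus and a default of NOUN. It is a baseline for comparison, not a method from the paper.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences: list[list[tuple[str, str]]]) -> dict[str, str]:
    """Map each lowercased word to its most frequent tag in the training data."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(model: dict[str, str], words: list[str], default: str = "NOUN") -> list[tuple[str, str]]:
    """Tag each word with its most frequent training tag, or `default` if unseen."""
    return [(w, model.get(w.lower(), default)) for w in words]


if __name__ == "__main__":
    corpus = [
        [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
        [("the", "DET"), ("runs", "NOUN"), ("ended", "VERB")],
        [("she", "PRON"), ("runs", "VERB")],
    ]
    model = train_baseline(corpus)
    print(tag(model, ["The", "cat", "runs"]))
```

Statistical taggers such as HMM or CRF models improve on this baseline by also using the tags of neighbouring words.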

What many people popularly call the Internet is characterized, socioculturally, as cyberspace; it has its own territoriality as well as its own cultural practices, identified as cyberculture. Given that,...

Natural Language Processing is a computerized approach to analyzing text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and built software that will analyze,...

The article addresses the pendulum swing between the free circulation of information and privacy on one side, and security and control on the other, driven by the strong interests at play at both poles. In the case of the Deep Web, because of its...

The Web is a vast, diverse, and dynamic environment in which different users publish their documents. Web mining is one of the applications of data mining, in which patterns on the Web are explored. Studies on web mining can be categorized into three...

The Internet may be free, but the service providers indispensable for accessing it are not, and as websites grow more complex and heavier, surfing the net is becoming more and more expensive. Blocking access...

In this paper, we present a system for personality recognition that exploits linguistic cues and does not require supervision for evaluation. We ran the system on a dataset sampled from a popular social network, FriendFeed. We adopted the...

- by Fabio Celli
  Natural Language Processing, Personality, Self-Efficacy, Autonomy

The enormous growth of information on the Internet makes discovering information challenging and time-consuming. We are surrounded by a plethora of data in the form of blogs, papers, reviews, and comments on different websites. Recommender...

Attended the Software Freedom Kosova 2016 conference, held in Pristina, Kosovo (October 2016). Received a full speaker grant from the conference organizers and presented a lecture titled “Using public library computers anonymously in order to...

- by Dimitar Poposki
  Information Technology, Digital Libraries, Privacy, Academic Libraries

In this paper, we propose a new approach, called FiVaTech, to the problem of Web data extraction. FiVaTech is a page-level data extraction system that deduces the data schema and templates for input pages generated from a CGI...

The World Wide Web organizes information in semi-structured HTML documents. For a template-based web page that contains a list of items, information schema can be implied and structured data can be extracted with a query, i.e. a (web)...
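
On a template-generated list page, the repeated item markup effectively defines the schema. The sketch below uses Python's standard `html.parser` to pull records out of a hypothetical product list. It assumes each item is an `li` element with class `item`, containing `span` elements with classes `name` and `price`. Those class names stand in for whatever the real template uses, and the extractor is hand-written rather than induced automatically.

```python
from html.parser import HTMLParser


class ListExtractor(HTMLParser):
    """Collect {field: text} records from <li class="item"> blocks of a templated page."""

    FIELDS = {"name", "price"}  # hypothetical field classes defined by the template

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None  # record being filled, or None outside an item
        self._field = None    # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "item" in classes:
            self._current = {}
        elif self._current is not None and tag == "span":
            self._field = next((c for c in classes if c in self.FIELDS), None)

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            self.records.append(self._current)
            self._current = None

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] = self._current.get(self._field, "") + data.strip()


def extract(html: str) -> list[dict]:
    """Parse the page and return one dict per list item."""
    parser = ListExtractor()
    parser.feed(html)
    parser.close()
    return parser.records


if __name__ == "__main__":
    page = ('<ul><li class="item"><span class="name">Lamp</span> <span class="price">$12</span></li>'
            '<li class="item"><span class="name">Desk</span><span class="price">$80</span></li></ul>')
    print(extract(page))
```

Systems such as the one described above aim to infer this kind of item template automatically instead of relying on known class names.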

We present an architecture for the integration of shallow and deep NLP components which is aimed at flexible combination of different language technologies for a range of practical current and future applications. In particular, we...

This article presents an analysis of data collected on the Deep Web, discussing its expressions through the concepts of Discipline (FOUCAULT, 2010) and the Dialectic of Enlightenment (ADORNO; HORKHEIMER,...

There are several methods and available tools for terminology extraction, but the quality of the extracted terms is not always high. Hence, an important consideration in terminology extraction is to assess the quality of the extracted...
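
When a reference (gold) term list is available, one straightforward way to assess extracted terms is precision and recall against that list. The sketch below assumes exact matching after lowercasing and whitespace normalization, and the example terms are invented. Manual validation by domain experts is the other common option and is not modeled here.

```python
def normalize(term: str) -> str:
    """Lowercase and collapse internal whitespace."""
    return " ".join(term.lower().split())


def precision_recall(extracted: list[str], reference: list[str]) -> tuple[float, float]:
    """Precision and recall of extracted terms against a reference term list."""
    ext = {normalize(t) for t in extracted}
    ref = {normalize(t) for t in reference}
    hits = len(ext & ref)
    precision = hits / len(ext) if ext else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


if __name__ == "__main__":
    extracted = ["Named Entity", "hash  table", "the system", "decision tree"]
    reference = ["named entity", "hash table", "decision tree", "terminology extraction"]
    print(precision_recall(extracted, reference))
```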

Popular personalities have multiple name aliases used in different web documents. Exact textual identification of a person on the web is useful in information retrieval, sentiment analysis, relation extraction, and name...
