All Questions

0 votes
0 answers
23 views

ERROR flume.SinkRunner: Unable to deliver event

I am trying to transfer data from Flume to HDFS. My flume.conf file looks like: agent.sources = tail agent.channels = channel1 agent.sinks = sink1 agent.sources.tail.type = exec agent.sources.tail....
Rich's user avatar
  • 141
0 votes
0 answers
39 views

Apache flume does not run hadoop 3.1.0 Flume 1.11

I configured my Flume application to perform searches on Twitter, but it has been stuck on that part for a long time and it won't go away. I've tried starting again several times. I was hoping it would ...
Ricardo Henrique's user avatar
0 votes
0 answers
64 views

flume how to collect kafka protobuf data

There is some protobuf data in my Kafka; the data is a byte array. I want to use Flume to collect the Kafka data into HDFS and use Spark to analyze the HDFS data. After I use the Flume config below, the flume ...
pandatyut's user avatar
0 votes
1 answer
90 views

Apache Flume agent does not save the data in HDFS

I am trying to create an agent with Apache Flume, but I am new to this and do not know much about it. The agent has to receive the data from Netcat and save it in an HDFS file system. The data that the ...
xiras2's user avatar
  • 1
0 votes
1 answer
182 views

Error in moving log files from local file system to HDFS via Apache Flume

I have log files in my local file system that need to be transferred to HDFS via Apache Flume. I have the following configuration file, saved as net.conf in the home directory: NetcatAgent....
Samar Pratap Singh's user avatar
0 votes
1 answer
136 views

Escape Sequences not populating hdfs path and file prefix

In my flume flow, I want to have a custom dynamic hdfs path but no data is being populated to the interceptors. Example data: 188 17 2016-06-01 00:31:10 6200.041736 0 Config agent2.sources....
andytcodes's user avatar
0 votes
1 answer
121 views

How do I partition data from a txt/csv file by year and month using Flume? Is it possible to make the HDFS path dynamic?

I want to configure a flume flow so that it takes in a CSV file as a source, checks the data, and dynamically separates each row of data into folders by year/month in HDFS. Is this possible?
andytcodes's user avatar
0 votes
0 answers
92 views

Create /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp, but there is no such file in hdfs

I'm using flume to get data from Kafka to HDFS. (Kafka Source and HDFS Sink). These are the versions I'm using. hadoop-3.2.2 flume-1.9.0 kafka_2.11-0.10.1.0 This is my kafka-fluem-hdfs.conf: a1....
RecharBao's user avatar
  • 371
1 vote
1 answer
197 views

How is an HDFS directory by year, month and day created?

Following the question in this link, there is another question about creating the directory on Hadoop HDFS. I am new to Hadoop/Flume and I have picked up a project which uses Flume to save csv data ...
XYZ's user avatar
  • 382
0 votes
2 answers
1k views

Unable to retrieve Twitter streaming data using Flume

I am trying to stream and retrieve Twitter data using Flume but am unable to do so because of some sort of error. When I try executing it using the command: flume-ng agent -n TwitterAgent -c conf -f /...
Pratheek Menon's user avatar
0 votes
1 answer
149 views

Flume Twitter Streaming issue

I'm trying to get some data from Twitter using Apache Flume and store them in HDFS, but I'm having some trouble. This is my flume-env.sh: export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64 $...
Vinicius Augusto de Souza's user avatar
0 votes
1 answer
109 views

flume is adding a random number to the hdfs file that i want to push (test.csv > test.csv.1591560702234 )

When I put a file in the local directory (vagrant/flume/test.csv), Flume turns it into (/user/inputs/test.csv.1591560702234) in HDFS. I want to know why HDFS adds 1591560702234 and how to remove it! ...
yanis's user avatar
  • 1
0 votes
0 answers
655 views

Flume Twitter Connection Refused

I've been trying to collect data from Twitter using Flume, which is perfect for Hadoop clusters. Another option could be fluentd, but to be honest I don't want to change the component as well as flume ...
Kenry Sanchez's user avatar
0 votes
1 answer
1k views

FLUME [HADOOP_ORG.APACHE.FLUME.TOOLS.GETJAVAPROPERTY_USER: Bad substitution]

I am trying to run the typical Flume first example to get tweets and store them in HDFS using Apache Flume. [Hadoop version 3.1.3; Apache Flume 1.9.0] I have configured flume-env.sh: ` export ...
Javier de la Iglesia 's user avatar
0 votes
0 answers
157 views

How to stream files from an HDFS directory and its subdirectories to Kafka

Avro files with JSON data are written to an HDFS directory every few minutes. For example, if today's date is 26/01/2020, an HDFS directory with name 20200126 will be created. Then there will be a lot of ...
EnthuDev's user avatar
  • 170
7 votes
5 answers
3k views

Flume sink to HDFS error: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument

With: Java 1.8.0_231 Hadoop 3.2.1 Flume 1.8.0 I have created an HDFS service on port 9000. jps: 11688 DataNode 10120 Jps 11465 NameNode 11964 SecondaryNameNode 12621 NodeManager 12239 ResourceManager ...
pingze's user avatar
  • 1,039
2 votes
2 answers
1k views

Loading Oracle data in HDFS without Sqoop

I wanted to import data from an Oracle database to our Hadoop HDFS and considered using Sqoop. When I tried, I discovered that the data connector for Oracle and Hadoop was disconnected. 2019-07-18 09:...
Col Bates - collynomial's user avatar
0 votes
1 answer
312 views

Flume HDFS Sink Write error "no protocol: value"

When trying to run a Flume job I am getting the error given below. I am running this on a Cloudera setup. Kafka is the source. Morphline is used as an interceptor, with Avro records getting created ...
DMin's user avatar
  • 10.3k
0 votes
1 answer
123 views

Connection denied when i use flume to post file to HDFS in real-time display

I am a beginner with Flume. When I tried to write a template to learn how to use Flume to post files to HDFS in real time, I got a connection denied error. What I want to do in the template: -->...
cole's user avatar
  • 11
0 votes
1 answer
891 views

How to stop a flume agent gracefully

Many websites suggest using kill -9 when stopping a Flume agent. However, when I stop the agent with kill -9, the HDFS sink files are left open forever (as *.tmp). How can I stop a Flume agent gracefully ...
user3473222's user avatar
0 votes
1 answer
173 views

Flume HDFS sink with Kafka source - multiple files?

Flume HDFS sink configured as follows : tier1.sinks.sink1.hdfs.path = /project/mgd/ tier1.sinks.sink1.hdfs.filePrefix = EMA_LOG%Y%m%d tier1.sinks.sink1.hdfs.rollInterval = 86400 #tier1.sinks.sink1....
user2025791's user avatar
-2 votes
1 answer
33 views

Not able to store twitter data in flume

We were successful in extracting the data from Twitter, but we couldn't save it on our system using Flume. Can you please explain?
Prajkta Bhandarwar's user avatar
0 votes
1 answer
540 views

How to copy a huge file (200-500GB) every day from Teradata server to HDFS

I have Teradata files on Server A and I need to copy them to Server B into HDFS. What options do I have? distcp is ruled out because Teradata is not on HDFS; scp is not feasible for huge files; Flume and Kafka ...
aloneonthe_edge's user avatar
0 votes
0 answers
112 views

Transfer data from local directory to Azure Data Lake store with flume without hadoop

I'm trying to ingest data located in a directory on my Linux VM into a directory in my Azure Data Lake Store with Flume. I do not know what type of sink to use or even if it is ...
Max's user avatar
  • 1
1 vote
1 answer
680 views

How to write data in real time to HDFS using Flume?

I am using Flume to store sensor data in HDFS. Once the data is received through MQTT, the subscriber posts the data in JSON format to the Flume HTTP listener. It is currently working fine, but the ...
Yassine Fadhlaoui's user avatar
1 vote
1 answer
612 views

flume-ng throws Kafka topic must be specified

I'm trying to pull data off my kafka topic and write it to HDFS, and appear to have my flume conf identical to what I've seen in several examples, but I can't seem to get around the below error. I ...
supahcraig's user avatar
0 votes
0 answers
96 views

Flume is not writing the twitter data into the /tmp/xx folder

I am loading Twitter data into an HDFS location using Flume. The flume-ng command runs successfully and shows a message like the one below: 18/06/24 22:52:33 INFO twitter.TwitterSource: Processed 17,500 ...
PraveenK's user avatar
0 votes
1 answer
309 views

Analysis of Log with Spark Streaming

I recently did analysis on a static log file with Spark SQL (find out stuff like the ip addresses which appear more than ten times). The problem was from this site. But I used my own implementation ...
Amber's user avatar
  • 944
0 votes
1 answer
287 views

roll setting of flume hdfs sink

Below is my setting in flume.conf: agent.sources = srcpv agent.channels = chlpv agent.sinks = hdfsSink agent.sources.srcpv.type = exec agent.sources.srcpv.command = tail -F /var/log/collector/web/pv....
Nan Wang's user avatar
0 votes
1 answer
259 views

Flume leaves .tmp file in HDFS after changing to the new day's directory

I'm using Flume 1.7.0 and HDFS sink. I configured Flume to put data in the date directory in HDFS so it will automatically change the directory when the new day comes. The problem is that I set flume ...
sunnwmy's user avatar
  • 143
-3 votes
2 answers
349 views

Getting data directly from a website to a hdfs

How do I get data that is being entered on a website directly into HDFS as it arrives?
Kshitiz Katiyar's user avatar
1 vote
0 answers
159 views

How to store data on hdfs using flume with existing schema file

I have JSON data coming from a source and I want to dump it on HDFS using Flume in Avro format, for which I already have an avsc file. I am using the following configuration for the sink, but that's not picking my ...
User_qwerty's user avatar
-1 votes
1 answer
73 views

Duplicates between Mysql and hdfs with flume?

Does any duplication occur when we use Flume to get live streaming data from a MySQL database? And how does Flume store the live data in a table created on HDFS?
Kshitiz Katiyar's user avatar
0 votes
1 answer
72 views

Flume - 2 messages in a single file in HDFS

I am trying to ingest messages from IBM MQ using Apache Flume. I got the below configurations: # Source definition u.sources.s1.type=jms u.sources.s1.initialContextFactory=ABC u.sources.s1....
Rishu S's user avatar
  • 3,958
0 votes
2 answers
384 views

Use of flume as a kafka consumer

Is it possible to configure the Flume sink to be my agent's file system? Do I have to sink to HDFS or Hadoop? I am working with Flume 1.6.0 and Kafka 10.1.1. I will show you my flume config and flume command ...
philip78yahoo's user avatar
1 vote
1 answer
1k views

Using flume to import data from kafka topic to hdfs folder

I am using Flume to load messages from a Kafka topic into an HDFS folder. So: I created a topic TT; I sent messages to TT with a Kafka console producer; I configured the Flume agent FF; Run the flume agent ...
wwHh's user avatar
  • 17
1 vote
1 answer
118 views

Spark doesn't read the file properly

I run Flume to ingest Twitter data into HDFS (in JSON format) and run Spark to read that file. But somehow, it doesn't return the correct result: it seems the content of the file is not updated. ...
Yusata's user avatar
  • 309
0 votes
0 answers
734 views

Flume does not write to HDFS from kafka topic

I am trying to read from a Kafka topic and store the data in HDFS using a Flume sink; the input data is JSON. The following is my config file: # components name a1.sources = source1 a1.channels = channel1 a1.sinks = ...
s.1234's user avatar
  • 23
1 vote
0 answers
111 views

Pyspark error reading file. Flume HDFS sink imports file with user=flume and permissions 644

I'm using Cloudera Quickstart VM 5.12. I have a Flume agent moving CSV files from a spooldir source into an HDFS sink. The operation works OK, but the imported files have: User=flume Group=cloudera ...
Taka's user avatar
  • 659
0 votes
0 answers
388 views

Flume Hive Sink Error

I am generating data to a spool directory and redirecting that to a hive table using flume hive sink. Flume sink is connected with hive metastore but after that I am facing the following issue. ...
Praveen Y B R's user avatar
0 votes
1 answer
200 views

Error using flume while fetching twitter data to hdfs

While fetching Twitter data to HDFS using Flume, I am getting this error again and again, even though I have changed the versions of the twitter4j.jar files. Please tell me why this error is occurring....
Devansh Sharma's user avatar
0 votes
1 answer
316 views

EOFException from Kafka in Flume

I am trying to set up a simple data pipeline from a console Kafka producer to the Hadoop file system (HDFS). I am working on a 64bit Ubuntu Virtual Machine and have created separate users for both ...
stefanS's user avatar
  • 341
0 votes
2 answers
1k views

Is there a way to load streaming data from Kafka into HDFS using Spark and without Flume?

I was looking if there is a way to load the streaming data from Kafka directly into HDFS using spark streaming and without using Flume. I have tried it using Flume(Kafka source and HDFS sink) already. ...
Abhishek Jain's user avatar
12 votes
1 answer
1k views

How to configure Flume to listen to web API HTTP requests

I have built a web API application, which is published on IIS Server. I am trying to configure Apache Flume to listen to that web API and save the responses to HTTP requests in HDFS. This is the post ...
MelgoV's user avatar
  • 656
1 vote
1 answer
834 views

Using FLUME to store data in Hadoop

I have followed all the steps for Hadoop and Flume installation from tutorials. I am new to Big Data tools. I am getting the following errors. I don't understand where the problem is. I have also ...
Shivam's user avatar
  • 113
3 votes
0 answers
45 views

HDFS ingestion rate frequently drops drastically from all Flume agents. How to investigate/rectify?

I have a good-sized Hadoop cluster, with multiple Flume agents (1 agent per machine, not part of the cluster) writing to it using HDFSSink. Almost 95% of the time, the Sink batch completion rate is in ...
Viren's user avatar
  • 180
1 vote
1 answer
23 views

Using flume to stream with average active website to HDFS. Is it efficient?

Our organization has a moderately active website that gets around 1000 hits per hour. We are planning to stream those logs to HDFS/Hive. Now the question is about HDFS efficiency around ...
sican's user avatar
  • 11
0 votes
2 answers
815 views

Flume not enough space error while data flow from Kafka to HDFS

We are struggling with a data flow from Kafka to HDFS managed by Flume. Data is not fully transported to HDFS because of the exceptions described below. However, this error looks misleading to us; we have ...
Tymek's user avatar
  • 37
0 votes
1 answer
53 views

Ingest flat data file from edge device to HDFS and process

I have a use case where devices on the vehicle have to send flat binary files to a cloud server, which must process them as they come in and store the data in HBase. I'm wondering what data ingestion ...
Keerthi Jayarajan's user avatar