All Questions
192 questions
0 votes | 0 answers | 23 views
ERROR flume.SinkRunner: Unable to deliver event
I am trying to transfer data from Flume to HDFS. My flume.conf file looks like:
agent.sources = tail
agent.channels = channel1
agent.sinks = sink1
agent.sources.tail.type = exec
agent.sources.tail....
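A minimal sketch of the kind of exec-source-to-HDFS agent this question describes; the command, host name, HDFS path, and channel sizing below are assumptions, not the asker's actual values.
# Hedged sketch: exec source tailing a log file into an HDFS sink.
agent.sources = tail
agent.channels = channel1
agent.sinks = sink1
agent.sources.tail.type = exec
agent.sources.tail.command = tail -F /var/log/app/app.log
agent.sources.tail.channels = channel1
agent.channels.channel1.type = memory
agent.channels.channel1.capacity = 10000
agent.channels.channel1.transactionCapacity = 1000
agent.sinks.sink1.type = hdfs
agent.sinks.sink1.channel = channel1
agent.sinks.sink1.hdfs.path = hdfs://namenode:8020/flume/events
agent.sinks.sink1.hdfs.fileType = DataStream
agent.sinks.sink1.hdfs.writeFormat = Text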
0 votes | 0 answers | 39 views
Apache Flume does not run with Hadoop 3.1.0 / Flume 1.11
I configured my Flume application to perform searches on Twitter, but it has been stuck on that part for a long time and won't progress. I've tried restarting several times.
I was hoping it would ...
0 votes | 0 answers | 64 views
Flume: how to collect Kafka protobuf data
There is some protobuf data in my Kafka topic; the data is a byte array. I want to use Flume to collect the Kafka data into HDFS and use Spark to analyse the HDFS data. After I use the Flume config below,
the Flume ...
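For binary payloads such as protobuf, one common pitfall is that the HDFS sink's default Text serializer appends a newline to every event body. A hedged sketch of a Kafka-source-to-HDFS-sink agent that keeps the bytes intact by writing a SequenceFile; the broker, topic, and path names are assumptions, not the asker's config.
a1.sources = k1
a1.channels = c1
a1.sinks = s1
a1.sources.k1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.k1.kafka.bootstrap.servers = broker1:9092
a1.sources.k1.kafka.topics = proto-events
a1.sources.k1.channels = c1
a1.channels.c1.type = memory
a1.sinks.s1.type = hdfs
a1.sinks.s1.channel = c1
a1.sinks.s1.hdfs.path = hdfs://namenode:8020/data/proto
# SequenceFile + Writable stores raw event bodies, so binary records survive for later Spark reads.
a1.sinks.s1.hdfs.fileType = SequenceFile
a1.sinks.s1.hdfs.writeFormat = Writable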
0 votes | 1 answer | 90 views
Apache Flume agent does not save the data in HDFS
I am trying to create an agent with Apache Flume, but I am new to this and don't have much experience. The agent has to receive data from Netcat and save it to an HDFS file system. The data that the ...
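A hedged sketch of a Netcat-to-HDFS agent of the kind described here; the bind address, port, and HDFS path are assumptions.
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://localhost:8020/user/flume/netcat
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
Events can then be sent with nc localhost 44444 and should appear under the configured HDFS path once the sink rolls a file.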
0 votes | 1 answer | 182 views
Error in moving log files from local file system to HDFS via Apache Flume
I have log files in my local file system that need to be transferred to HDFS via Apache Flume. I have the following configuration file in the home directory, saved as net.conf
NetcatAgent....
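For moving existing log files (rather than a live stream), the spooling-directory source is the usual choice; a hedged sketch, with the spool directory and HDFS path as assumptions rather than the asker's values.
a1.sources = src1
a1.channels = ch1
a1.sinks = snk1
# The spooldir source ingests complete files dropped into a watched directory.
a1.sources.src1.type = spooldir
a1.sources.src1.spoolDir = /home/user/logs
a1.sources.src1.channels = ch1
a1.channels.ch1.type = memory
a1.sinks.snk1.type = hdfs
a1.sinks.snk1.channel = ch1
a1.sinks.snk1.hdfs.path = hdfs://namenode:8020/logs
a1.sinks.snk1.hdfs.fileType = DataStream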
0 votes | 1 answer | 136 views
Escape Sequences not populating hdfs path and file prefix
In my Flume flow, I want to have a custom dynamic HDFS path, but no data is being populated by the interceptors.
Example data:
188 17 2016-06-01 00:31:10 6200.041736 0
Config
agent2.sources....
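The %Y/%m/%d style escapes in hdfs.path and hdfs.filePrefix are resolved from the event's timestamp header, so they stay empty unless something sets that header. Two common fixes, sketched with assumed source and sink names (agent2's real component names are not shown in the excerpt):
# Option 1: stamp each event with a timestamp interceptor on the source.
agent2.sources.src.interceptors = ts
agent2.sources.src.interceptors.ts.type = timestamp
# Option 2: let the sink use the agent's local clock instead of an event header.
agent2.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
# The escapes then resolve, e.g. a path partitioned by year/month:
agent2.sinks.hdfsSink.hdfs.path = hdfs://namenode:8020/data/%Y/%m
agent2.sinks.hdfsSink.hdfs.filePrefix = events-%Y%m%d
Extracting the timestamp out of the event body itself (as in the sample row above) would instead need a regex_extractor interceptor that writes the parsed value into the timestamp header.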
0 votes | 1 answer | 121 views
How do I partition data from a txt/csv file by year and month using Flume? Is it possible to make the HDFS path dynamic?
I want to configure a flume flow so that it takes in a CSV file as a source, checks the data, and dynamically separates each row of data into folders by year/month in HDFS. Is this possible?
0 votes | 0 answers | 92 views
Create /origin_data/gmall/log/topic_start/2021-08-19/logstart-.1629387459287.gz.tmp, but there is no such file in hdfs
I'm using Flume to get data from Kafka to HDFS (Kafka source and HDFS sink). These are the versions I'm using:
hadoop-3.2.2
flume-1.9.0
kafka_2.11-0.10.1.0
This is my kafka-fluem-hdfs.conf:
a1....
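Worth noting for this kind of symptom: while the HDFS sink has a bucket open it writes under a temporary name (the in-use suffix, ".tmp" by default), and the file only takes its final name when a roll trigger or idle timeout closes it. A hedged sketch of the relevant knobs, with an assumed sink name and illustrative values rather than the asker's:
a1.sinks.k1.hdfs.inUseSuffix = .tmp
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
# Close buckets that stop receiving events so the .tmp name is renamed away.
a1.sinks.k1.hdfs.idleTimeout = 60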
1 vote | 1 answer | 197 views
How is an HDFS directory by year, month, and day created?
Following the question in this link, there is another question about creating the directory on Hadoop HDFS.
I am new to Hadoop/Flume and I have picked up a project which uses Flume to save csv data ...
0 votes | 2 answers | 1k views
Unable to retrieve Twitter streaming data using Flume
I am trying to stream and retrieve Twitter data using Flume but am unable to do so because of some sort of error.
When I try executing it using the command:
flume-ng agent -n TwitterAgent -c conf -f /...
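For reference, a hedged sketch of the agent that flume-ng command would load, using the experimental Twitter firehose source bundled with Flume; the credentials and HDFS path are placeholders, not the asker's configuration.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <consumer key>
TwitterAgent.sources.Twitter.consumerSecret = <consumer secret>
TwitterAgent.sources.Twitter.accessToken = <access token>
TwitterAgent.sources.Twitter.accessTokenSecret = <access token secret>
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/flume/tweets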
0 votes | 1 answer | 149 views
Flume Twitter Streaming issue
I'm trying to get some data from Twitter using Apache Flume and store it in HDFS, but I'm having some trouble.
This is my flume-env.sh
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
$...
0 votes | 1 answer | 109 views
Flume is adding a random number to the HDFS file that I want to push (test.csv > test.csv.1591560702234)
When I put a file in the local directory (vagrant/flume/test.csv), Flume turns it into /user/inputs/test.csv.1591560702234 in HDFS. I want to know why 1591560702234 is added and how to remove it!
...
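The number is not random: the HDFS sink appends a counter to every file it opens so that rolled files never collide, and that counter cannot simply be switched off. What can be controlled is the rest of the name; a hedged sketch assuming a spooling-directory source and sink names that are not shown in the excerpt:
# Export the original file name as a header on each event...
a1.sources.spool.type = spooldir
a1.sources.spool.spoolDir = /vagrant/flume
a1.sources.spool.basenameHeader = true
# ...and reuse it as the HDFS file prefix; the numeric counter still follows the prefix.
a1.sinks.hdfs1.hdfs.path = /user/inputs
a1.sinks.hdfs1.hdfs.filePrefix = %{basename}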
0 votes | 0 answers | 655 views
Flume Twitter Connection Refused
I've been trying to collect data from Twitter using Flume, which is perfect for Hadoop clusters. Another option could be fluentd but, to be honest, I don't want to change the component as well as Flume ...
0 votes | 1 answer | 1k views
FLUME [HADOOP_ORG.APACHE.FLUME.TOOLS.GETJAVAPROPERTY_USER: Bad substitution]
I am trying to run the typical Flume first example to get tweets and store them in HDFS using Apache Flume.
[Hadoop version 3.1.3; Apache Flume 1.9.0]
I have configured flume-env.sh:
export ...
0 votes | 0 answers | 157 views
How to stream files from an HDFS directory and its subdirectories to Kafka
Avro files with JSON data are written to an HDFS directory every few minutes. For example, if today's date is 26/01/2020, an HDFS directory with the name 20200126 will be created. Then there will be a lot of ...
7 votes | 5 answers | 3k views
Flume sink to HDFS error: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument
With:
Java 1.8.0_231
Hadoop 3.2.1
Flume 1.8.0
I have created an HDFS service on port 9000.
jps:
11688 DataNode
10120 Jps
11465 NameNode
11964 SecondaryNameNode
12621 NodeManager
12239 ResourceManager
...
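This NoSuchMethodError on Preconditions.checkArgument is the well-known Guava clash between Flume 1.8/1.9 (which bundle guava-11.0.2) and Hadoop 3.x (which requires a much newer Guava). A commonly reported workaround, sketched here with illustrative paths and jar versions that may differ on your installation, is to swap Flume's bundled Guava for Hadoop's:
# Hedged sketch of the commonly reported workaround; adjust paths and versions to your install.
mv $FLUME_HOME/lib/guava-11.0.2.jar $FLUME_HOME/lib/guava-11.0.2.jar.bak
cp $HADOOP_HOME/share/hadoop/common/lib/guava-27.0-jre.jar $FLUME_HOME/lib/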
2 votes | 2 answers | 1k views
Loading Oracle data in HDFS without Sqoop
I wanted to import data from an Oracle database to our Hadoop HDFS and considered using Sqoop. When I tried, I discovered that the data connector for Oracle and Hadoop was disconnected.
2019-07-18 09:...
0 votes | 1 answer | 312 views
Flume HDFS Sink Write error "no protocol: value"
When trying to run a Flume job I am getting the error given below. I am running this on a Cloudera setup.
Kafka is the source.
Morphline is used as an interceptor, with Avro records getting created ...
0 votes | 1 answer | 123 views
Connection denied when I use Flume to post a file to HDFS for real-time display
I am a beginner with Flume. When I try to write a template to study how to use Flume to post a file to HDFS in real time, I get a connection-denied error.
What I want to do in the template:
-->...
0 votes | 1 answer | 891 views
How to stop a flume agent gracefully
Many websites suggest using kill -9 when stopping a flume agent.
However, when I stop the agent with kill -9, the HDFS sink files are left open forever (as *.tmp).
How can I stop a flume agent gracefully ...
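A graceful stop generally means sending SIGTERM instead of SIGKILL: the Flume agent registers a JVM shutdown hook that stops its sources, channels, and sinks in order, which lets the HDFS sink close and rename its open .tmp files. A hedged sketch, assuming the agent runs the standard org.apache.flume.node.Application main class:
FLUME_PID=$(ps -ef | grep '[f]lume.node.Application' | awk '{print $2}')
kill "$FLUME_PID"        # SIGTERM: shutdown hook runs, open HDFS files are closed
# kill -9 "$FLUME_PID"   # SIGKILL: skips the hook and leaves *.tmp files behind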
0 votes | 1 answer | 173 views
Flume HDFS sink with Kafka source - multiple files?
The Flume HDFS sink is configured as follows:
tier1.sinks.sink1.hdfs.path = /project/mgd/
tier1.sinks.sink1.hdfs.filePrefix = EMA_LOG%Y%m%d
tier1.sinks.sink1.hdfs.rollInterval = 86400
#tier1.sinks.sink1....
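One detail worth checking with a configuration like this: rollInterval is not the only roll trigger. The size and count triggers keep their defaults (rollSize 1024 bytes, rollCount 10 events) unless explicitly disabled, so the sink still rolls very frequently and produces many small files instead of one per day. A hedged sketch of time-only rolling:
tier1.sinks.sink1.hdfs.rollInterval = 86400
# Disable the size- and count-based triggers so only the 24 h interval rolls the file.
tier1.sinks.sink1.hdfs.rollSize = 0
tier1.sinks.sink1.hdfs.rollCount = 0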
-2 votes | 1 answer | 33 views
Not able to store Twitter data with Flume
We were successful in extracting the data from Twitter, but we couldn't save it on our system using Flume. Can you please explain?
0 votes | 1 answer | 540 views
How to copy huge files (200-500 GB) every day from a Teradata server to HDFS
I have Teradata files on SERVER A and I need to copy them to Server B into HDFS. What options do I have?
distcp is ruled out because Teradata is not on HDFS
scp is not feasible for huge files
Flume and Kafka ...
0 votes | 0 answers | 112 views
Transfer data from local directory to Azure Data Lake store with flume without hadoop
I'm trying to ingest data located in a directory on my Linux VM
into a directory in my Azure Data Lake Store with Flume. I do not know what type of sink to use or even if it is ...
1 vote | 1 answer | 680 views
How to write data in real time to HDFS using Flume?
I am using Flume to store sensor data in HDFS. The data is received through MQTT, and the subscriber posts it in JSON format to the Flume HTTP listener. It is currently working fine, but the ...
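A hedged sketch of this kind of pipeline: Flume's HTTP source receiving JSON posts and an HDFS sink writing them out. The port, paths, and component names are assumptions; the default JSONHandler expects a JSON array of objects with "headers" and "body" fields.
a1.sources = http1
a1.channels = c1
a1.sinks = k1
a1.sources.http1.type = http
a1.sources.http1.port = 5140
a1.sources.http1.channels = c1
a1.sources.http1.handler = org.apache.flume.source.http.JSONHandler
a1.channels.c1.type = memory
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/sensors/%Y/%m/%d
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.fileType = DataStream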
1 vote | 1 answer | 612 views
flume-ng throws Kafka topic must be specified
I'm trying to pull data off my Kafka topic and write it to HDFS, and my Flume conf appears identical to what I've seen in several examples, but I can't seem to get around the error below. I ...
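A frequent cause of "Kafka topic must be specified" is mixing property generations: many older examples use the Flume 1.6 Kafka source keys (topic, zookeeperConnect), which Flume 1.7+ silently ignores, so the agent ends up seeing no topic at all. A hedged sketch of the newer key names, with broker, topic, and group id as placeholders:
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = broker1:9092
a1.sources.r1.kafka.topics = my-topic
a1.sources.r1.kafka.consumer.group.id = flume-hdfs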
0 votes | 0 answers | 96 views
Flume is not writing the twitter data into the /tmp/xx folder
I am loading Twitter data into an HDFS location using Flume.
The flume-ng command runs successfully and shows messages like the one below:
18/06/24 22:52:33 INFO twitter.TwitterSource: Processed 17,500 ...
0 votes | 1 answer | 309 views
Analysis of Log with Spark Streaming
I recently did analysis on a static log file with Spark SQL (find out stuff like the ip addresses which appear more than ten times). The problem was from this site. But I used my own implementation ...
0 votes | 1 answer | 287 views
Roll settings of the Flume HDFS sink
Below are my settings in flume.conf:
agent.sources = srcpv
agent.channels = chlpv
agent.sinks = hdfsSink
agent.sources.srcpv.type = exec
agent.sources.srcpv.command = tail -F /var/log/collector/web/pv....
0 votes | 1 answer | 259 views
Flume leaves a .tmp file in HDFS when changing to the new day's directory
I'm using Flume 1.7.0 and HDFS sink. I configured Flume to put data in the date directory in HDFS so it will automatically change the directory when the new day comes. The problem is that I set flume ...
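The usual explanation: the sink only renames a .tmp file when it closes that bucket, and once midnight passes nothing writes to yesterday's directory again, so nothing triggers the close. A non-zero idle timeout is the common remedy; a hedged sketch with assumed sink names and illustrative values:
# Close buckets that receive no events for 2 minutes, renaming their .tmp files.
a1.sinks.k1.hdfs.idleTimeout = 120
a1.sinks.k1.hdfs.rollInterval = 3600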
-3 votes | 2 answers | 349 views
Getting data directly from a website into HDFS
How do I get data that is arriving on a website directly into HDFS as it comes in?
1 vote | 0 answers | 159 views
How to store data on HDFS using Flume with an existing schema file
I have JSON data coming from a source and I want to dump it to HDFS in Avro format using Flume, for which I already have an avsc file. I am using the following configuration for the sink, but that's not picking up my ...
-1 votes | 1 answer | 73 views
Duplicates between MySQL and HDFS with Flume?
Is there any duplication occurring when we use Flume to get live streaming data from a MySQL database?
And how does Flume store the live data in a table created on HDFS?
0 votes | 1 answer | 72 views
Flume - 2 messages in a single file in HDFS
I am trying to ingest messages from IBM MQ using Apache Flume. I have the configuration below:
# Source definition
u.sources.s1.type=jms
u.sources.s1.initialContextFactory=ABC
u.sources.s1....
0 votes | 2 answers | 384 views
Use of flume as a kafka consumer
Is it possible to configure the Flume sink to be my agent's local file system? Do I have to sink to HDFS or Hadoop?
I am working with Flume 1.6.0 and Kafka 10.1.1.
I will show you my flume config and flume command ...
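Flume does not require HDFS here at all: the file_roll sink writes events straight to a local directory. A hedged sketch of a Kafka-source-to-local-file agent; the Kafka source keys shown are the Flume 1.7+ names (Flume 1.6.0, which the asker runs, uses topic/zookeeperConnect instead), and the broker, topic, and directory are placeholders.
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = broker1:9092
a1.sources.r1.kafka.topics = my-topic
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
# file_roll writes to the agent's local file system, rolling a new file every 5 minutes.
a1.sinks.k1.type = file_roll
a1.sinks.k1.channel = c1
a1.sinks.k1.sink.directory = /var/flume/out
a1.sinks.k1.sink.rollInterval = 300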
1 vote | 1 answer | 1k views
Using flume to import data from kafka topic to hdfs folder
I am using Flume to load messages from a Kafka topic into an HDFS folder. So:
I created a topic TT
I sent messages to TT with a kafka console producer
I configured the flume agent FF
Run the flume agent ...
1 vote | 1 answer | 118 views
Spark doesn't read the file properly
I run Flume to ingest Twitter data into HDFS (in JSON format) and run Spark to read that file.
But somehow, it doesn't return the correct result: it seems the content of the file is not updated.
...
0 votes | 0 answers | 734 views
Flume does not write to HDFS from kafka topic
I am trying to read from a Kafka topic and store it in HDFS with a Flume sink; the input data is JSON. The following is my config file:
# components name
a1.sources = source1
a1.channels = channel1
a1.sinks = ...
1 vote | 0 answers | 111 views
Pyspark error reading file. Flume HDFS sink imports file with user=flume and permissions 644
I'm using Cloudera Quickstart VM 5.12
I have a Flume agent moving CSV files from spooldir source into HDFS sink. The operation works ok but the imported files have:
User=flume
Group=cloudera
...
0 votes | 0 answers | 388 views
Flume Hive Sink Error
I am generating data into a spool directory and redirecting it to a Hive table using the Flume Hive sink. The Flume sink is connected to the Hive metastore, but after that I am facing the following issue.
...
0 votes | 1 answer | 200 views
Error using Flume while fetching Twitter data to HDFS
While fetching Twitter data to HDFS using Flume, I am getting this error again and again, even though I have changed the versions of the twitter4j jar files. Please tell me why this error is occurring....
0 votes | 1 answer | 316 views
EOFException from Kafka in Flume
I am trying to set up a simple data pipeline from a console Kafka producer to the Hadoop file system (HDFS). I am working on a 64bit Ubuntu Virtual Machine and have created separate users for both ...
0 votes | 2 answers | 1k views
Is there a way to load streaming data from Kafka into HDFS using Spark and without Flume?
I was looking if there is a way to load the streaming data from Kafka directly into HDFS using spark streaming and without using Flume.
I have tried it using Flume(Kafka source and HDFS sink) already.
...
12 votes | 1 answer | 1k views
How to configure Flume to listen to web API HTTP requests
I have built a web API application, which is published on an IIS server. I am trying to configure Apache Flume to listen to that web API and save the responses of HTTP requests in HDFS. This is the post ...
1 vote | 1 answer | 834 views
Using FLUME to store data in Hadoop
I have followed all the steps for Hadoop installation and Flume from tutorials.
I am a novice with Big Data tools. I am getting the following errors and I don't understand where the problem is.
I have also ...
3 votes | 0 answers | 45 views
HDFS ingestion rate frequently drops drastically from all Flume agents. How to investigate/rectify?
I have a good-sized Hadoop cluster, with multiple Flume agents (1 agent per machine, not part of the cluster) writing to it using the HDFS sink. Almost 95% of the time, the sink batch completion rate is in ...
1 vote | 1 answer | 23 views
Using Flume to stream logs from a moderately active website to HDFS. Is it efficient?
Our organization has a moderately active website that gets around 1000 hits per hour. We are planning to stream those logs to HDFS/Hive.
Now the question is about HDFS efficiency, around ...
0 votes | 2 answers | 815 views
Flume "not enough space" error in the data flow from Kafka to HDFS
We are struggling with the data flow from Kafka to HDFS managed by Flume.
Data is not fully transported to HDFS because of the exceptions described below.
However, this error looks misleading to us; we have ...
0 votes | 1 answer | 53 views
Ingest flat data file from edge device to HDFS and process
I have a use case where devices on the vehicle have to send flat binary files to a cloud server, which processes them as and when they come in and stores the data in HBase. I'm wondering what data ingestion ...