Questions tagged [hadoop]
Hadoop provides High Availability services by distributing processing of large data sets across clusters of machines.
68 questions
1
vote
0
answers
23
views
Hadoop + warnings as slow block-receive from data-node machines
We have a Hadoop cluster with 487 data-node machines (each data-node machine also runs the NodeManager service). All machines are physical (Dell), and the OS is RHEL 7.9.
Each ...
0
votes
0
answers
54
views
Hadoop Namenode heap size tuning
The NameNode process runs in a Java virtual machine,
and the Java objects that the NameNode creates are managed in the JVM's memory. As files or directories are created, inode objects and block objects ...
0
votes
1
answer
154
views
Reg. Hadoop Namenode format
I am building a pseudo-distributed Hadoop cluster on my personal machine using CentOS 9 and Hadoop 3.1.1. I completed the Hadoop installation and performed some operations, and everything was fine. Later, ...
1
vote
0
answers
307
views
Clear RAM Memory Cache and buffer on production Hadoop cluster with HDFS filesystem
We have a Hadoop cluster with 265 Linux RHEL machines.
Of the 265 machines, 230 are data-node machines with the HDFS filesystem.
Total memory on each data node is 128G, and we run many Spark ...
1
vote
0
answers
56
views
Informatica job log files process through shell scripting
Log file raw data :
READER_1_1_1> BIGQUERYV2_10000 [2022-11-04 01:55:20.724] [INFO] Job statistics - \n Job ID [job_PsfUvYJkPeBfecxeIzUUrIIa9TEc] \n Job creation time [2022-11-04 01:54:54.724] , \n ...
0
votes
1
answer
3k
views
What is the difference between the buff/cache shown by the free command and the available memory? [closed]
We have 463 RHEL 7.6 machines in the cluster; most of them are HDFS machines (DataNodes).
From the free -g command we can see that buff/cache is usually around 30-50 when total memory is 256G.
As far as I know, a ...
0
votes
0
answers
126
views
How to move the last n files in hdfs
I have a folder in HDFS that contains 830000 files, and I want to move the last 8797 files to another folder in HDFS. I tried using xargs, but it didn't work. Any other ideas?
...
0
votes
1
answer
94
views
I don't know why SSH and Hadoop are connecting to the wrong place
I have 3 virtual machines:
master@master-virtualbox
worker1@worker1-virtualbox
worker2@worker2-virtualbox
When I try to copy the SSH ID from worker1 to master, it asks for a password which I don't know what ...
-1
votes
1
answer
268
views
Unable to upgrade python on cloudera hdfs
I am not able to upgrade Python on Cloudera; it shows an error whenever I run the commands below:
$ sudo yum install python27
error: No package python27 available.
$ sudo yum install python36u
error: No ...
2
votes
0
answers
1k
views
ssh: connect to host localhost port 22: Connection refused
I have installed Hadoop and SSH. Hadoop was working fine, but today I get the error below when I run the command sbin/start-dfs.sh:
Starting namenodes on [localhost]
localhost: ssh: connect ...
1
vote
0
answers
307
views
master: ssh: connect to host master port 22: Connection refused
I am trying to start my Hadoop cluster using the command "start-dfs.sh", but I am getting the errors shown below:
Starting namenodes on [master]
master: ssh: connect to host master port 22: Connection ...
0
votes
0
answers
777
views
curl is stuck when trying to get NameNodeStatus
We have two NameNodes in the Hadoop cluster.
Here is an example of one NameNode that returns a good status:
First NameNode machine IP: 92.3.44.2
curl -v http://92.3.44.2:50070/jmx?qry=...
-2
votes
1
answer
90
views
Hadoop cluster + designing number of disks on data node machine and min requirements
We are using HDP version 2.6.5, and the HDFS block replication factor is 3.
We are trying to understand the minimum disk requirements for data nodes in production, given that block replication is 3.
...
0
votes
0
answers
64
views
centOS 7 access denied when logging on kerberized environment
When I try to log in to one of my Hadoop nodes running CentOS 7, I receive an 'access denied' error. Typically, in these situations, I log in as the local admin for the box, go to /tmp and ...
1
vote
1
answer
871
views
What is the right mkfs cli in order to create xfs file-system on huge disk
We need to create an XFS file system on a Kafka disk.
What is special about the Kafka disk is its size:
the disk is 20TB in our case.
I am not sure about the following mkfs, but I need advice to ...
2
votes
1
answer
5k
views
Copy files from a hdfs folder to another hdfs location by filtering with modified date using shell script
I have one year of data in my HDFS location, and I want to copy the data for the last 6 months into another HDFS location.
Is it possible to copy only the last 6 months of data directly with an hdfs command, or do we need to ...
1
vote
0
answers
594
views
Unable to reach Hortonworks HDP on port 8080: This site can't be reached
My goal is to run HDP on a VirtualBox VM.
Image shows my port forwarding rules
When I try to launch the dashboard
My host machine is Ubuntu 18.04.
Port mapping output
8000/tcp open http-alt
8042/tcp ...
0
votes
1
answer
2k
views
bash: pig: command not found
I am trying to find out what version of pig I am using. I thought I already installed it
# yum install hadoop\* mahout\* oozie\* hbase\* hive\* hue\* pig\* zookeeper\*
When I try to enter a pig ...
0
votes
1
answer
769
views
how to run CLI from root with user hdfs
When I run the following command as the hdfs user, it runs well:
# su hdfs
$ hadoop fs -du -s /home/test/* | awk '{ sum += $1 } END { print sum }'
4182692
But when I run it as root, ...
1
vote
3
answers
2k
views
how to know the right swap size for hadoop Linux machines
We are running many Hadoop clusters.
For now we use a swap size of 16G:
free -g
total used free shared buff/cache available
Mem: 125 ...
1
vote
1
answer
715
views
why sometime need to stop process by kill -9
We have Kafka machines in a Hadoop cluster.
The script that stops the Kafka process does the following:
kill PID
But we noticed that the script that stops Kafka does not really kill the process,
therefore we ...
1
vote
1
answer
2k
views
how to manage cleaning of /tmp better on hadoop machines
As everyone knows, the content of /tmp should be deleted after some time.
In my case we have machines (Red Hat version 7.2) that are configured as follows.
As we can see, the service that is triggered ...
0
votes
2
answers
91
views
How to sum value of specified column by specific date in kornshell?
I'm working in a unique validation framework that validates data. For each validation job there is a SQL job with an accompanying KSH job (kornshell). The SQL queries something in the database, and ...
0
votes
1
answer
936
views
Hadoop : find the hostname or IP address based on the process id
Is it possible to find the IP address or hostname of whoever submitted a job, based on the process ID?
We have some Hadoop jobs running for hundreds of hours. We need to know from which local machine it ...
1
vote
1
answer
14k
views
localhost: Permission denied (publickey,password,keyboard-interactive)
I was trying to run Hadoop on macOS and I got the following errors:
$ hstart
WARNING: Attempting to start all Apache Hadoop daemons as chaklader in 10 seconds.
WARNING: This is not a recommended ...
1
vote
1
answer
1k
views
Not able to exit from interactive mode for yarn top command
I have a bash script on serverA. This script connects to serverB over ssh, runs the yarn top command, pulls the metrics and puts them into a file (test.txt) on serverA. Below is the command I am using:...
0
votes
1
answer
130
views
Can we mix MTU values in cluster
We have a Hadoop cluster (all machines run Red Hat Linux 7.x).
On the VMs we set MTU=8900, and on all other machines we set MTU=9000.
We set MTU=8900 on the VMs because we saw some ...
0
votes
0
answers
180
views
jumbo frame + fine tune
We have a Hadoop cluster (the machines run Red Hat Linux 7), and we set up jumbo frames according to the documentation.
First we set MTU=9000 on all nodes in the cluster,
but we noticed some problems, so ...
1
vote
1
answer
1k
views
how to find the right value of MTU Jumbo frame [closed]
We've made the decision to set jumbo frames on all our Linux machines. We have a Hadoop cluster with master machines, worker machines and Kafka machines.
Our switches (Cisco) are suitable for jumbo ...
0
votes
1
answer
312
views
install hadoop_2_6_1_0_129-hdfs
I tried to install a Hadoop cluster.
App Timeline Server Install returned error:
2018-02-26 19:31:49,406 - Installing package hadoop_2_6_1_0_129-hdfs ('/usr/bin/yum -d 0 -e 0 -y install ...
0
votes
2
answers
4k
views
systemd - define a service without ExecStop and be able to stop it without "fail"
I am on CentOS 7, and I want to start a Kafka standalone producer (File Connector) as a service. The command is:
/opt/kafka/bin/connect-standalone.sh /opt/kafka/config/connect-standalone.properties /...
1
vote
0
answers
175
views
jps process issue about installing Hadoop
I'm trying to install Hadoop in fully distributed mode on CentOS 6.4 (I use 4 VirtualBox VMs).
server1 NameNode
server2 SecondaryNameNode, DataNode
server3 DataNode
server4 DataNode
Maybe I guess I'...
2
votes
0
answers
623
views
Hadoop cluster not listening on port that I configured. What is wrong?
I set up a Hadoop cluster with RHEL 7.4 servers. There is no firewall between them. I am running Hadoop 3.0. On the namenode the core-site.xml file is configured to use port 54310.
I run this ...
2
votes
0
answers
1k
views
The command "hdfs dfsadmin -report" fails because "failed to connect to server"
I am trying to configure a multi-node cluster of open source Hadoop. I have Hadoop 3.0 installed on the namenode and the data node. Both are running Linux (SUSE and Ubuntu). None are CentOS, RedHat ...
5
votes
0
answers
3k
views
How do you get Hadoop commands to work when you get the error "Invalid HADOOP_COMMON_HOME"?
I had Hadoop version 1.x installed on Linux SUSE 12.3. I moved the directory somewhere else to back it up. I tried to install Hadoop 3.0. I expect Hadoop commands to work based on what I did. I ...
0
votes
1
answer
246
views
Ambari and Spark can't start from CLI
We cannot start the Spark service from the Ambari GUI, so we want to start it from the command line as follows:
[spark@mas01 spark2]$ ./sbin/start-thriftserver.sh --master yarn-client --executor-...
0
votes
2
answers
3k
views
Passing inline arguments to shell script being executed on HDFS
I am running a shell script stored on HDFS (so that it can be recognized by my Oozie workflow). To run this script I am using
hadoop fs -cat script.sh |exec sh
However I need to pass inline ...
1
vote
0
answers
279
views
How to connect Kerberos with multiple LDAP servers?
My actual task is to make our kerberized Hadoop cluster usable by all our teams.
Right now we have a very weird setup in our company:
The Hadoop cluster has a dedicated KDC (openSUSE Kerberos with ...
10
votes
5
answers
69k
views
RPC: Port mapper failure - Unable to receive: errno 113 (No route to host)
I am trying to mount HDFS on my local machine (Ubuntu) using NFS by following the link below:
https://www.cloudera.com/documentation/enterprise/5-2-x/topics/cdh_ig_nfsv3_gateway_configure.html#...
3
votes
2
answers
13k
views
mount.nfs: mount system call failed
I am trying to mount HDFS on my local machine running Ubuntu using the following command:
sudo mount -t nfs -o vers=3,proto=tcp,nolock 192.168.170.52:/ /mnt/hdfs_mount/
But I am getting this ...
0
votes
1
answer
882
views
How to upload file into File View on Ambari Web?
I am using HDP 2.6 and Ambari 2.5 on a 5-node cluster. The cluster was set up with Vagrant following these instructions:
https://cwiki.apache.org/confluence/display/AMBARI/Quick+Start+for+New+VM+Users ...
1
vote
1
answer
3k
views
Installing Oracle JDK 1.7 -- 404 error
I'm trying to install Hadoop 2.7.3 on elementary OS (which is roughly Ubuntu, I believe) following the instructions in the BUILDING.txt that came with the Hadoop files.
The file indicates that I need to ...
-1
votes
2
answers
334
views
Linux Hadoop shell script giving .class error
I am trying to run this script to run MapReduce on Hadoop. But when I run it, it gives me the error attached in the screenshot.
Script:
rm -rf /home/sk/Desktop/abc/wordcountc/
...
4
votes
2
answers
639
views
LD_LIBRARY_PATH lost when using mount command
TL;DR
When a fuse filesystem is mounted via the mount command, the environment variables are not passed to the fuse script. Why?
Context
I am trying to mount hdfs (hadoop file system) via fuse.
...
0
votes
1
answer
113
views
How do I install an .rpm that fails with an error about an .so file not being found from Python 2.6?
When I try to install cloudera-manager-agent 5.7 from an .rpm I get an error. The error says that a dependency has not been met because yum could not find libpython2.6.so.1.0(64bit). I would expect ...
-1
votes
2
answers
2k
views
CentOS7 SSH can't connect localhost?
I'm configuring a Hadoop environment.
I have used $ ssh-keygen -t rsa -P "" to generate id_rsa.pub and id_rsa,
and used cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys to set up password-free login.
...
2
votes
1
answer
2k
views
False error starting Hue
I have installed Hue in CentOS 7 from Cloudera CDH5 repository.
Upon starting it reports an error:
# systemctl status hue
hue.service - SYSV: Hue web server
Loaded: loaded (/etc/rc.d/init.d/hue)
...
1
vote
2
answers
4k
views
Delete files 10 days older from hdfs
I am writing a ksh script to clean up HDFS directories and files that are at least 10 days old. I am testing the deletion command in a terminal, but it keeps saying it is wrong:
$ hdfs dfs -find "/file/path/...
0
votes
3
answers
3k
views
How to split the date range into days using script
I have this input:
startdate end date val1 val2
2015-10-13 07:00:02 2015-10-19 00:00:00 45 1900
in which one line specifies a date range that spans multiple ...
2
votes
0
answers
271
views
Which OS should I use for Hadoop cluster?
I have a client setting up a Hadoop cluster. We have all used and are very familiar with CentOS 7. I was told Scientific Linux may be better optimized for Hadoop.
Is there any truth to that?