
Questions tagged [hadoop]

Apache Hadoop is a framework for the distributed storage and processing of large data sets across clusters of machines.

1 vote
0 answers
23 views

Hadoop: warnings about slow block-receive from data-node machines

We have a Hadoop cluster with 487 data-node machines (each data node also runs the NodeManager service). All machines are physical (Dell), and the OS is RHEL 7.9. Each ...
yael • 13.7k
0 votes
0 answers
54 views

Hadoop Namenode heap size tuning

The NameNode process runs in a Java virtual machine, and the Java objects the NameNode creates are managed in JVM memory. As files or directories are created, inode objects and block objects ...
yael • 13.7k
0 votes
1 answer
154 views

Regarding Hadoop NameNode format

I am building a Hadoop pseudo-distributed cluster on my personal machine using CentOS 9 and Hadoop 3.1.1. I completed the Hadoop installation and performed some operations; everything was fine. Later, ...
user3625945
1 vote
0 answers
307 views

Clear RAM memory cache and buffers on a production Hadoop cluster with HDFS filesystem

We have a Hadoop cluster with 265 Linux RHEL machines. Of the 265 machines, 230 are data-node machines with the HDFS filesystem. Total memory on each data node is 128G, and we run many Spark ...
yael • 13.7k
1 vote
0 answers
56 views

Processing Informatica job log files with shell scripting

Log file raw data : READER_1_1_1> BIGQUERYV2_10000 [2022-11-04 01:55:20.724] [INFO] Job statistics - \n Job ID [job_PsfUvYJkPeBfecxeIzUUrIIa9TEc] \n Job creation time [2022-11-04 01:54:54.724] , \n ...
kasim basha
0 votes
1 answer
3k views

What is the difference between the buff/cache shown by the free command and the available memory? [closed]

We have 463 RHEL 7.6 machines in the cluster; most of them are HDFS (DataNode) machines. From the free -g command we can see that buff/cache is usually around 30-50 when total memory is 256G. As I know, a ...
yael • 13.7k
0 votes
0 answers
126 views

How to move the last n files in HDFS

I have a folder in HDFS containing 830,000 files, and I want to move the last 8797 files to another folder in HDFS. I tried using xargs but it didn't work well. Any other ideas? ...
Omar AlSaghier
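For a listing-based bulk move like the one above, a common pattern is sort, tail, then move. A minimal sketch, shown on a local temporary directory so it is self-contained; for the HDFS case you would swap `ls` for `hdfs dfs -ls /src | awk '{print $NF}'` and `mv` for `hdfs dfs -mv` (directory names and N here are illustrative, not the asker's paths):

```shell
# Sketch: move the last N files (by sorted name) from src to dst.
src=$(mktemp -d); dst=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8 9 10; do
  touch "$src/file$(printf '%03d' "$i")"   # file001 .. file010
done
N=3
ls "$src" | sort | tail -n "$N" | while read -r f; do
  mv "$src/$f" "$dst/"                     # hdfs dfs -mv in the HDFS case
done
ls "$dst"                                  # dst now holds file008, file009, file010
```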
0 votes
1 answer
94 views

I don't know why SSH and Hadoop are connecting to the wrong place

I have 3 virtual machines: master@master-virtualbox, worker1@worker1-virtualbox, worker2@worker2-virtualbox. When I try to copy the SSH id from worker1 to master, it asks for a password, which I don't know ...
omer • 3
-1 votes
1 answer
268 views

Unable to upgrade Python on Cloudera HDFS

I am not able to upgrade Python on Cloudera, as I get an error whenever I run the commands below: $ sudo yum install python27 error: No package python27 available. $ sudo yum install python36u error: No ...
vicky sood
2 votes
0 answers
1k views

ssh: connect to host localhost port 22: Connection refused

I have installed Hadoop and SSH. Hadoop was working fine, but today I get the error below when I run sbin/start-dfs.sh: Starting namenodes on [localhost] localhost: ssh: connect ...
Sanaya • 31
1 vote
0 answers
307 views

master: ssh: connect to host master port 22: Connection refused

I am trying to start my Hadoop cluster using the command "start-dfs.sh", but I am getting errors as shown below: Starting namenodes on [master] master: ssh: connect to host master port 22: Connection ...
Sanaya • 31
0 votes
0 answers
777 views

curl is stuck when trying to get the NameNode status

We have two NameNodes in the Hadoop cluster. This is a good example from one of the NameNodes that returns a good status. First NameNode machine IP - 92.3.44.2: curl -v http://92.3.44.2:50070/jmx?qry=...
yael • 13.7k
-2 votes
1 answer
90 views

Hadoop cluster: designing the number of disks on data-node machines and minimum requirements

We are using HDP version 2.6.5, and HDFS block replication is 3. We are trying to understand the minimum data-node disk requirements for production mode, given that block replication = 3 ...
yael • 13.7k
0 votes
0 answers
64 views

CentOS 7 access denied when logging in to a kerberized environment

When I try to log in to one of my Hadoop nodes running CentOS 7, I receive an 'access denied' error. Typically, in these situations, I log in as the local admin for the box, go to /tmp and ...
GenericDisplayName
1 vote
1 answer
871 views

What is the right mkfs command to create an XFS file system on a huge disk?

We need to create an XFS file system on a Kafka disk. The special thing about this disk is its size: 20TB in our case. I am not sure about the following mkfs command, and I need advice to ...
yael • 13.7k
2 votes
1 answer
5k views

Copy files from an HDFS folder to another HDFS location, filtered by modified date, using a shell script

I have 1 year of data in my HDFS location and I want to copy the data for the last 6 months into another HDFS location. Is it possible to copy the data directly with an hdfs command, or do we need to ...
Antony • 131
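Date-based filtering over HDFS is typically done by parsing `hdfs dfs -ls` output, whose sixth column is the modification date; ISO dates compare correctly as plain strings. A hedged sketch, with a fabricated sample listing standing in for real `hdfs dfs -ls /data` output (paths and cutoff are made up for illustration):

```shell
cutoff="2023-01-01"
cat <<'EOF' > listing.txt
-rw-r--r--   3 hdfs hdfs  1024 2022-11-15 10:01 /data/old.csv
-rw-r--r--   3 hdfs hdfs  2048 2023-03-02 09:30 /data/new.csv
EOF
# Column 6 is the modification date; $NF is the path.
awk -v cutoff="$cutoff" '$6 >= cutoff { print $NF }' listing.txt
# Each printed path could then be copied with: hdfs dfs -cp "$path" /target/
```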
1 vote
0 answers
594 views

Unable to reach Hortonworks HDP on port 8080: This site can't be reached

My goal is to run HDP on VirtualBox. The image shows my port-forwarding rules when I try to launch the dashboard. My host machine is Ubuntu 18.04. Port mapping output: 8000/tcp open http-alt 8042/tcp ...
MikiBelavista
0 votes
1 answer
2k views

bash: pig: command not found

I am trying to find out what version of Pig I am using. I thought I had already installed it: # yum install hadoop\* mahout\* oozie\* hbase\* hive\* hue\* pig\* zookeeper\* When I try to enter a pig ...
ubliat • 1
0 votes
1 answer
769 views

How to run a CLI command from root as the hdfs user

When I run the following CLI command as the hdfs user, it runs well: # su hdfs $ hadoop fs -du -s /home/test/* | awk '{ sum += $1 } END { print sum }' 4182692 But when I run it from root, ...
yael • 13.7k
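The usual pitfall in cases like the one above is that a bare `su hdfs` followed by a pipeline runs only part of it as the other user; the whole pipeline has to be passed as one quoted string via `su -c`. A hedged sketch (the hadoop path and awk program are the question's own; since running it needs a real cluster, the nested-quoting rule is demonstrated with a local pipeline):

```shell
# Run the entire pipeline as hdfs in one -c string; escape the inner $1
# so the outer shell does not expand it before awk sees it:
#   su - hdfs -c 'hadoop fs -du -s /home/test/* | awk "{ sum += \$1 } END { print sum }"'
# The same quoting, demonstrated locally:
sh -c 'printf "10 a\n20 b\n" | awk "{ sum += \$1 } END { print sum }"'
# prints 30
```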
1 vote
3 answers
2k views

How to determine the right swap size for Hadoop Linux machines

We are using many Hadoop clusters; for now we use a swap size of 16G. free -g output: total used free shared buff/cache available Mem: 125 ...
yael • 13.7k
1 vote
1 answer
715 views

Why do we sometimes need to stop a process with kill -9?

We have Kafka machines in a Hadoop cluster. The script that stops the Kafka process does kill PID, but we noticed that this script does not really kill the process; therefore we ...
yael • 13.7k
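A common reason a plain `kill PID` appears not to work is that the stop script never waits for the JVM's shutdown hooks before giving up, and never escalates. A hedged sketch of a stop sequence; a background `sleep` stands in for the Kafka process, and the grace period is illustrative:

```shell
sleep 300 &                    # stand-in for the Kafka process
pid=$!
kill "$pid"                    # polite SIGTERM first, lets shutdown hooks run
for i in 1 2 3; do             # wait up to ~3s for the process to exit
  kill -0 "$pid" 2>/dev/null || break
  sleep 1
done
# Only if still alive after the grace period, force it:
if kill -0 "$pid" 2>/dev/null; then
  kill -9 "$pid"
fi
wait "$pid" 2>/dev/null || true   # reap; ignore the non-zero exit status
```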
1 vote
1 answer
2k views

How to better manage the cleaning of /tmp on Hadoop machines

As everyone knows, the content of /tmp should be deleted after some time. In my case we have machines (Red Hat version 7.2) that are configured as follows. As we can see, the service that is triggered ...
yael • 13.7k
0 votes
2 answers
91 views

How to sum the values of a specified column for a specific date in KornShell?

I'm working in a unique validation framework that validates data. For each validation job there is a SQL job with an accompanying KornShell (ksh) job. The SQL queries something in the database, and ...
Nathan • 11
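Summing a column for a given key date is a one-liner in awk, and works the same when invoked from ksh. A hedged sketch with fabricated sample data (column 1 = date, column 2 = value; the real framework's file layout may differ):

```shell
cat <<'EOF' > vals.txt
2022-01-01 10
2022-01-02 5
2022-01-01 7
EOF
# Sum column 2 for rows whose column 1 matches the requested date.
awk -v d="2022-01-01" '$1 == d { s += $2 } END { print s }' vals.txt
# prints 17
```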
0 votes
1 answer
936 views

Hadoop: find the hostname or IP address based on the process ID

Is it possible to find the IP address or hostname of whoever submitted a job, based on the process ID? We have some Hadoop jobs running for hundreds of hours, and we need to know from which local machine it ...
Abhinay • 103
1 vote
1 answer
14k views

localhost: Permission denied (publickey,password,keyboard-interactive)

I was trying to run Hadoop on macOS and I get the following errors: $ hstart WARNING: Attempting to start all Apache Hadoop daemons as chaklader in 10 seconds. WARNING: This is not a recommended ...
Arefe • 233
1 vote
1 answer
1k views

Not able to exit from interactive mode for the yarn top command

I have a bash script on serverA. This script SSHes to serverB, runs the yarn top command, pulls the metrics, and puts them into a file (test.txt) on serverA. Below is the command I am using: ...
MichealMills
0 votes
1 answer
130 views

Can we mix MTU values in a cluster?

We have a Hadoop cluster (all machines are Linux Red Hat 7.x). On the VM machines we set MTU=8900, and on all other machines we set MTU=9000. We set MTU=8900 on the VMs because we saw some ...
yael • 13.7k
0 votes
0 answers
180 views

Jumbo frames and fine tuning

We have a Hadoop cluster (machines are Linux Red Hat 7), and we set up jumbo frames according to the documentation. First we set MTU=9000 on all nodes in the cluster, but we noticed some problems, so ...
yael • 13.7k
1 vote
1 answer
1k views

How to find the right MTU value for jumbo frames [closed]

We've decided to enable jumbo frames on all our Linux machines. We have a Hadoop cluster with master machines, worker machines and Kafka machines. Our Cisco switches are suitable for jumbo ...
yael • 13.7k
0 votes
1 answer
312 views

install hadoop_2_6_1_0_129-hdfs

I tried to install a Hadoop cluster; the App Timeline Server install returned an error: 2018-02-26 19:31:49,406 - Installing package hadoop_2_6_1_0_129-hdfs ('/usr/bin/yum -d 0 -e 0 -y install ...
Nikolay Baranenko
0 votes
2 answers
4k views

systemd - define a service without ExecStop and be able to stop it without "fail"

I am on CentOS 7, and I want to start the Kafka standalone producer (File Connector) as a service. The command is: /opt/kafka/bin/connect-standalone.sh /opt/kafka/config/connect-standalone.properties /...
WesternGun
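For the stop-without-ExecStop case above, systemd does not actually require an ExecStop= line: on `systemctl stop` it sends SIGTERM to the service's control group itself, then SIGKILL after a timeout. A hedged unit sketch (the paths are the question's own; the unit name and option values are illustrative, the connector-file arguments from the truncated command are omitted, and this is not a tested Kafka setup):

```ini
[Unit]
Description=Kafka standalone file connector (sketch)

[Service]
Type=simple
ExecStart=/opt/kafka/bin/connect-standalone.sh /opt/kafka/config/connect-standalone.properties
# No ExecStop needed: systemd sends SIGTERM to the whole control group,
# then SIGKILL after TimeoutStopSec if the process has not exited.
TimeoutStopSec=30
KillMode=control-group

[Install]
WantedBy=multi-user.target
```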
1 vote
0 answers
175 views

jps process issue when installing Hadoop

I'm trying to install Hadoop in fully distributed mode on CentOS 6.4 (I use 4 VirtualBox VMs): server1 NameNode; server2 SecondaryNameNode, DataNode; server3 DataNode; server4 DataNode. Maybe I guess I'...
Seung • 11
2 votes
0 answers
623 views

Hadoop cluster not listening on port that I configured. What is wrong?

I set up a Hadoop cluster with RHEL 7.4 servers. There is no firewall between them, and I am running Hadoop 3.0. On the namenode, the core-site.xml file is configured to use port 54310. I run this ...
Jermoe • 111
2 votes
0 answers
1k views

The command "hdfs dfsadmin -report" fails because "failed to connect to server"

I am trying to configure a multi-node cluster of open-source Hadoop. I have Hadoop 3.0 installed on the namenode and the data node. Both are running Linux (SUSE and Ubuntu); none are CentOS, RedHat ...
Jermoe • 111
5 votes
0 answers
3k views

How do you get Hadoop commands to work when you get the error "Invalid HADOOP_COMMON_HOME"?

I had Hadoop version 1.x installed on Linux SUSE 12.3. I moved the directory somewhere else to back it up, then tried to install Hadoop 3.0. I expect Hadoop commands to work based on what I did. I ...
Jermoe • 111
0 votes
1 answer
246 views

Ambari and Spark can't start from the CLI

From the Ambari GUI we cannot start the Spark service, so we want to start it from the command line as follows: [spark@mas01 spark2]$ ./sbin/start-thriftserver.sh --master yarn-client --executor-...
yael • 13.7k
0 votes
2 answers
3k views

Passing inline arguments to shell script being executed on HDFS

I am running a shell script stored on HDFS (so that it can be recognized by my Oozie workflow). To run this script I am using hadoop fs -cat script.sh | exec sh. However, I need to pass inline ...
user2211504
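One way to pass arguments to a script streamed from HDFS is `sh -s`, which reads the script from stdin and treats the remaining operands as positional parameters, e.g. `hadoop fs -cat script.sh | sh -s -- arg1 arg2`. A local sketch, with `printf` standing in for `hadoop fs -cat`:

```shell
# -s: read commands from stdin; words after -- become $1, $2, ...
printf 'echo "first arg: $1"\n' | sh -s -- hello world
# prints: first arg: hello
```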
1 vote
0 answers
279 views

How to connect Kerberos with multiple LDAP servers?

My actual task is to make our kerberized Hadoop cluster usable by all our teams. Right now we have a very odd setup in our company: the Hadoop cluster has a dedicated KDC (openSUSE Kerberos with ...
Mihail Gershkovich
10 votes
5 answers
69k views

RPC: Port mapper failure - Unable to receive: errno 113 (No route to host)

I am trying to mount HDFS on my local machine (Ubuntu) using NFS, following this link: https://www.cloudera.com/documentation/enterprise/5-2-x/topics/cdh_ig_nfsv3_gateway_configure.html#...
Bhavya Jain
3 votes
2 answers
13k views

mount.nfs: mount system call failed

I am trying to mount HDFS on my local machine running Ubuntu using the following command: sudo mount -t nfs -o vers=3,proto=tcp,nolock 192.168.170.52:/ /mnt/hdfs_mount/ But I am getting this ...
Bhavya Jain
0 votes
1 answer
882 views

How to upload a file into the File View in the Ambari web UI?

I am using HDP 2.6 and Ambari 2.5 on a 5-node cluster. The cluster was set up with Vagrant following these instructions: https://cwiki.apache.org/confluence/display/AMBARI/Quick+Start+for+New+VM+Users ...
yxs8495 • 21
1 vote
1 answer
3k views

Installing Oracle JDK 1.7 -- 404 error

I'm trying to install Hadoop 2.7.3 on elementary OS (which is roughly Ubuntu, I believe), following the instructions in the BUILDING.txt that came with the Hadoop files. The file indicates that I need to ...
patrick • 1,022
-1 votes
2 answers
334 views

Linux Hadoop shell script giving .class error

I am trying to run this script for running MapReduce on Hadoop, but when I run it, it gives me the error attached in the screenshot. Script: rm -rf /home/sk/Desktop/abc/wordcountc/ ...
Hasan Iqbal
4 votes
2 answers
639 views

LD_LIBRARY_PATH lost when using mount command

TL;DR: when a FUSE filesystem is mounted via the mount command, environment variables are not passed to the FUSE script. Why? Context: I am trying to mount HDFS (the Hadoop file system) via FUSE. ...
Guillaume • 211
0 votes
1 answer
113 views

How do I install an .rpm that fails with an error about an .so file not being found from Python 2.6?

When I try to install cloudera-manager-agent 5.7 from an .rpm, I get an error saying that a dependency has not been met because yum could not find libpython2.6.so.1.0(64bit). I would expect ...
Kiran • 321
-1 votes
2 answers
2k views

CentOS 7: SSH can't connect to localhost?

I'm configuring a Hadoop environment. I used $ ssh-keygen -t rsa -P "" to generate id_rsa.pub and id_rsa, and cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys to set up password-free login. ...
kevin4z • 103
2 votes
1 answer
2k views

False error starting Hue

I have installed Hue on CentOS 7 from the Cloudera CDH5 repository. Upon starting, it reports an error: # systemctl status hue hue.service - SYSV: Hue web server Loaded: loaded (/etc/rc.d/init.d/hue) ...
Kombajn zbożowy
1 vote
2 answers
4k views

Delete files older than 10 days from HDFS

I am writing a ksh script to clean up HDFS directories and files at least 10 days old. I am testing the deletion command in a terminal, but it keeps saying it is wrong: $ hdfs dfs -find "/file/path/...
Misha • 13
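The `-find` of `hdfs dfs` is far more limited than GNU find (to my knowledge it supports only a few expressions such as -name/-iname/-print, and no -mtime), which is likely why the command is rejected. Age-based cleanup is therefore usually done by comparing the date column of `hdfs dfs -ls -R` output against a cutoff. A hedged sketch, with a fabricated listing standing in for real output:

```shell
cutoff=$(date -d "10 days ago" +%Y-%m-%d)   # GNU date, as on RHEL
cat <<'EOF' > ls.txt
-rw-r--r-- 3 hdfs hdfs 10 2020-01-01 08:00 /logs/ancient.log
-rw-r--r-- 3 hdfs hdfs 10 2999-01-01 08:00 /logs/future.log
EOF
# Column 6 is the modification date; print paths older than the cutoff,
# which could then be passed to: hdfs dfs -rm -skipTrash <path>
awk -v c="$cutoff" '$6 < c { print $NF }' ls.txt
```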
0 votes
3 answers
3k views

How to split a date range into days using a script

I have this input (columns: startdate, end date, val1, val2): 2015-10-13 07:00:02 2015-10-19 00:00:00 45 1900, in which one line specifies a date range that spans multiple ...
AAA • 13
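Expanding such a start/end pair into one line per day can be done with GNU `date` arithmetic (assumed available, as on most Linux systems; the dates below are the question's sample range):

```shell
start="2015-10-13"; end="2015-10-19"
d="$start"
while [ "$d" != "$end" ]; do        # emit every day from start up to end
  echo "$d"
  d=$(date -d "$d + 1 day" +%Y-%m-%d)
done
echo "$end"                         # 7 lines in total for this range
```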
2 votes
0 answers
271 views

Which OS should I use for Hadoop cluster?

I have a client setting up a Hadoop cluster. We have all used and are very familiar with CentOS 7, but I was told Scientific Linux may be better optimized for Hadoop. Is there any truth to that?
Dovid Bender