264 questions
0
votes
0
answers
25
views
Unable to Move Deleted Files to Trash via Hadoop Web Interface
I have encountered an issue with the Hadoop-3.3.6 Web interface regarding file deletion. By default, when I delete files through the Hadoop Web interface, they are permanently removed and do not go to ...
0
votes
0
answers
66
views
How to check if webhdfs is working properly, and debug it
I have a Hadoop setup on my machine, and I want to use the Python hdfs library to send a file.
At localhost:9870 I see the Hadoop user interface, which works nicely.
But, when I go to ...
2
votes
0
answers
85
views
WebHDFS API successfully CREATEs a file, but APPEND apparently fails despite a 200 response using HttpClient()
I have a WebHdfs API that successfully CREATES a file, but then when I try to APPEND a byte array of data to the same file I run into issues.
I have no issues creating a file, so I know it is not a ...
0
votes
1
answer
602
views
How can I upload a file into HDFS using the WebHDFS REST API?
I want to upload a file from a local server to HDFS via the WebHDFS REST API.
Based on the documentation, this operation takes two steps:
Submit an HTTP PUT request, which returns the location
...
0
votes
1
answer
237
views
WebHDFS REST API and Spring Boot
I have a Hadoop cluster and I want to manipulate data from a Spring Boot microservice: Create folders / Put Data/ Read Data/ Delete Data...
There is an API: https://hadoop.apache.org/docs/stable/...
1
vote
1
answer
615
views
Installing WebHDFS library in Docker failed, Error shows "krb5-config: Permission denied"
I'm trying to install the apache-airflow-providers-apache-hdfs library in my Airflow-Docker 2.5.3.
I've installed all the necessary Kerberos libs, and I got the following error:
#0 5.236 Requirement ...
0
votes
1
answer
697
views
Web interface login Apache Hadoop Cluster with Kerberos
I have a Docker stack with an Apache Hadoop (version 3.3.4) cluster, composed of one namenode and two datanodes, and a container with both the Kerberos admin server and the Kerberos KDC.
I'm trying to configure ...
1
vote
1
answer
2k
views
How to connect to HDFS in Airflow?
How to perform HDFS operations in Airflow?
Make sure you install the following Python package:
pip install apache-airflow-providers-apache-hdfs
#Code Snippet
#Import packages
from airflow import ...
0
votes
1
answer
423
views
Error in the Hadoop web UI related to WebHDFS
I am using single-node Hadoop, version release-3.3.1-RC3. In the Hadoop web UI, under Utilities -> Browse the file system, it is possible to view the contents of the file (beginning and end) directly in ...
0
votes
1
answer
124
views
Not able to access files in a Hadoop cluster
I was trying to read a file in a Hadoop cluster using the following code. The default port used is 9000 (since it does not connect at 50700).
//webhdfs-read-tests.js
// Include ...
0
votes
0
answers
575
views
Set up WebHDFS authentication
I have set up a WebHDFS server with a self-signed SSL certificate for testing. Now I need some kind of authentication on it where the user has to pass some credentials in the WebHDFS REST call. I am ...
0
votes
0
answers
518
views
Failed to retrieve data from /webhdfs/v1/?op=LISTSTATUS: Server Error
vijay@ubuntu:~$ start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as vijay in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C ...
0
votes
1
answer
650
views
How to set up WebHDFS in Hadoop
I am trying to configure Hadoop with WebHDFS enabled, and then I also want to enable SSL on it. My hdfs-site.xml looks like this:
<configuration>
<property>
<name>dfs....
-1
votes
1
answer
153
views
How to get a specific key/value from HDFS via HTTP or the Java API?
How can I get the value of one or more keys in HDFS via HTTP or the Java API from a remote client? For example, the file below has a million keys and values. I just want to get the values of the 'phone' and ...
0
votes
1
answer
371
views
Writing to Kerberized HDFS using Python | Max retries exceeded with url
I am trying to use python to write to secure hdfs using the following lib link
Authentication part:
def init_kinit():
kinit_args = ['/usr/bin/kinit', '-kt', '/tmp/xx.keytab',
'...
0
votes
0
answers
281
views
Do I need to verify the checksum of my file after uploading it to my Hadoop cluster using WebHDFS? How to compare the local file and Hadoop file checksums
Does WebHDFS carry out checksum verification? When I upload a file to my remote Hadoop cluster using WebHDFS, does it carry out checksum verification of the file before upload and after upload to ...
-1
votes
1
answer
246
views
Failed to retrieve data from /webhdfs/v1/?op=LISTSTATUS: Server Error on macOS Monterey
I have installed Hadoop and am able to access the localhost Hadoop interface. When I try to upload files, the interface gives me the error "Failed to retrieve data from /webhdfs/v1/?op=LISTSTATUS: Server ...
-1
votes
1
answer
685
views
Erreur: HTTPConnectionPool(host='dnode2', port=9864): Max retries exceeded with url: /webhdfs
I'm trying to read a file on my HDFS server from my Python app deployed with Docker. During development I don't have any problem, but in production I get this error:
Erreur: HTTPConnectionPool(host='dnode2', ...
0
votes
1
answer
178
views
Azure Data Factory HDFS dataset preview error
I'm trying to connect to HDFS from ADF. I created a folder and a sample file (ORC format) and put it in the newly created folder.
Then in ADF I successfully created a linked service for HDFS using ...
0
votes
1
answer
99
views
In WebHDFS, what is the difference between length and spaceConsumed?
Using WebHDFS we can get the content summary of a directory/file.
However, the following properties are unclear to me:
"length":
{
"description": "The ...
0
votes
1
answer
259
views
Why doesn't a datanode disappear from the Hadoop web site when the datanode job is killed?
I have a 3-node HA cluster in a CentOS 8 VM. I am using ZK 3.7.0 and Hadoop 3.3.1.
In my cluster I have 2 namenodes: node1 is the active namenode and node2 is the standby namenode in case node1 ...
0
votes
0
answers
826
views
PUT data on HDFS via the HTTP Web API
I'm trying to implement a PUT request on HDFS via the HDFS Web API.
So I looked up the documentation on how to do that: https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#CREATE
First do a PUT ...
0
votes
1
answer
682
views
webhdfs sensor-Airflow
I want to use sensors to check the arrival of files in HDFS.
I used the HDFS sensor, but I was not able to install snakebite because it requires Python 2 and I'm running Python 3.
As an alternative I am using ...
1
vote
1
answer
4k
views
Unable to upload file or create directory via Hadoop UI
I have installed hadoop-3.2.1 on Ubuntu 18.04 with Java 8. I am able to send files to HDFS using the hadoop fs -put command via the terminal. But when I try to upload files or create a directory via the UI, I ...
0
votes
2
answers
861
views
How to upload files to the HDFS web page from the terminal?
I just started with Hadoop and am doing the HDFS configuration. I have done all the steps, but this last part, uploading the file, is not working.
I used this to make my directory, and it works:
hadoop fs -mkdir /...
0
votes
0
answers
516
views
Is there a way to read a file from a Kerberized HDFS into a non-Kerberized Spark cluster given the keytab file, principal and other details?
I need to read data from a Kerberized HDFS cluster using WebHDFS in a non-Kerberized Spark cluster. I have access to the keytab file, username/principal, and can access any other details needed to log ...
4
votes
1
answer
724
views
High-availability HDFS client in Python
The HdfsCLI docs say that it can be configured to connect to multiple hosts by adding URLs separated with a semicolon ; (https://hdfscli.readthedocs.io/en/latest/quickstart.html#configuration).
I use ...
0
votes
1
answer
135
views
How to return the list of files from HDFS using the HDFS API
I created a Java function to open a file in HDFS. The function uses only the HDFS API. I do not use any Hadoop dependencies in my code.
My function works well:
public static openFile()
{
...
1
vote
1
answer
285
views
How to upload a file from EFS (WinSCP) to WebHDFS (Hue/Cloudera) in PowerShell?
I've been trying to break that problem down into two parts in order to automate it:
PowerShell: Transfer file from local Desktop to EFS (via WinSCP) - OK
PowerShell: Get that same file on EFS (via ...
1
vote
0
answers
729
views
How to use the hdfscli Python library?
I have the following use case:
I want to connect to a remote Hadoop cluster. So, I got all the Hadoop conf files (core-site.xml, hdfs-site.xml and others) and stored them in one directory in the local file system....
0
votes
1
answer
133
views
Connect to WebHDFS using PowerShell: how to set different credentials
I am trying to connect to WebHDFS with PowerShell and have been receiving some errors. I think the 401 error is because of the credentials.
The code I've been using is:
Invoke-RestMethod -...
0
votes
0
answers
313
views
Can't access WebHDFS using Big Data Europe with docker-compose
I can't access WebHDFS via curl or using Python HDFS when using the Big Data Europe 2020 Hadoop cluster via docker-compose (https://github.com/big-data-europe/docker-hadoop/). For instance, the ...
1
vote
0
answers
174
views
HttpClient behavior differs between .NET Core 3.1 and .NET 5
The below code retrieves a JSON document from a WebHDFS instance using Kerberos authentication:
HttpClientHandler clientHandler = new()
{
Credentials = CredentialCache.DefaultNetworkCredentials,
...
0
votes
1
answer
562
views
Can WebHDFS UI delete functionality be disabled?
Starting from HDP 3.0, the WebHDFS UI (i.e. the namenode UI file explorer on port 50070) now includes a bin icon that can be used to delete HDFS files. It seems to do this by calling a REST API DELETE ...
0
votes
0
answers
1k
views
How can I get the schema details (table structure) from a Parquet file using the HDFS API?
I have a Parquet file located in HDFS. I am using the WebHDFS API to read the file but am not getting the schema details in the proper format.
Any help would be appreciated.
1
vote
1
answer
981
views
How to connect to and access Azure Data Lake Gen1 storage using Azure AD username and password only - C#
I want to connect to and access Azure Data Lake Gen1 storage using an Azure AD username and password only.
I have a service account that has access to the Azure Data Lake Gen1 storage. I am able to connect ...
1
vote
0
answers
416
views
Airflow conn_id with multiple servers
I am using WebHDFSSensor, and for that we need to provide the namenode. However, the active namenode and standby namenode change. I can't just provide the current namenode host to webhdfs_conn_id. I have to create ...
2
votes
2
answers
2k
views
Hadoop: can't access datanode without using the IP
I have the following system:
Windows host
Linux guest with Docker (in Virtual Box)
I have installed HDFS in Docker (Ubuntu, Virtual Box). I have used the bde2020 hadoop image from Docker Hub. This ...
0
votes
1
answer
396
views
How can I get past a connection error in pywebhdfs?
I have a locally hosted single-node Hadoop. My namenode and datanode are the same.
I'm trying to create a file using a Python library.
self.hdfs = PyWebHdfsClient(host='192.168.231.130', port='9870', user_name='...
1
vote
2
answers
3k
views
How to read Parquet files from remote HDFS in Python using Dask/pyarrow
Please help me read Parquet files from a remote HDFS (i.e., set up on a Linux server) using Dask or pyarrow in Python.
Also, please suggest whether there are better ways to do the same other than the above two ...
0
votes
1
answer
295
views
Create a file with WebHDFS
I would like to create a file in HDFS with WebHDFS. I wrote the function below:
public ResponseEntity createFile(MultipartFile f) throws URISyntaxException {
URI uriPut = new URI(
...
1
vote
1
answer
1k
views
WebHDFS FileNotFoundException rest api
I am posting this question as a continuation of the post "webhdfs rest api throwing file not found exception".
I have an image file I would like to OPEN through the WebHDFS REST API.
The file exists in HDFS ...
0
votes
1
answer
1k
views
webHDFS curl --negotiate on Windows
The following command works on Linux but fails on Windows. Before I run the command, I use kinit to get a valid Kerberos ticket.
curl -v -i --negotiate -u : -b ~/cookiejar.txt -c ~/cookiejar.txt "http:...
1
vote
1
answer
533
views
Unable to connect Power BI to Hadoop: HDFS failed to get contents
When I try to connect Power BI to Hadoop WebHDFS, I get this error:
DataSource.Error: HDFS failed to get contents from 'http://xxx.xx.x.x:50070/webhdfs/v1/myFolder/20200626150740_PERSONAL_IDS'. ...
0
votes
1
answer
131
views
SQL Server BDC Pools and Performance
Against an AKS-based SQL Server 2019 BDC, I loaded the Flight_delay dataset that is available at www.kaggle.com. I wanted to test the performance of the various data stores, i.e., master instance, data ...
1
vote
1
answer
1k
views
Logstash to WebHDFS: Failed to APPEND_FILE /user/
I am trying to ingest a CSV file from Filebeat into HDFS via Logstash.
Filebeat successfully transferred it to Logstash, because I'm using stdout{codec=>rubydebug} and I can see the events being parsed. Seems ...
0
votes
0
answers
271
views
Reading an ORC file from the WebHDFS REST API
I have the task of reading an ORC file from my Java program. I am able to read it successfully if the ORC file is on my local machine using the code below, where targetFilePath is the file path and name of ...
0
votes
0
answers
326
views
Downloading a large file from Jetty (Ambari WebHDFS) is slow
I have a file of about 5G that downloads from HDFS using a Python client at 12M/s, but my network can reach 500M/s, and smaller files work fine. I then reproduced this problem with curl.
Here is curl debug ...
0
votes
1
answer
466
views
Hadoop got Expected JSON. Is WebHDFS enabled? Got ''
I already have several CSV files in Hadoop. When I try
hdfs = pyhdfs.HdfsClient(hosts='34.71.193.160:8123', user_name='root')
files_name = hdfs.listdir('/user/input/')
Got this error message, can'...
0
votes
1
answer
3k
views
Hadoop: Failed to connect to HDFS (Hadoop) using Python
I am trying to connect to HDFS, which is in a VM running Ubuntu, using the Python Jupyter tool from Windows 10. Can anybody help me with the connection error below that I am getting? Thank you.
Package used:
...