All Questions

0 votes
0 answers
288 views

Fastest way to copy large data from HDFS location to GCP bucket using command

I have 5 TB of data that needs to be transferred to a GCP bucket using a command. I tried hadoop distcp -m num -strategy dynamic source_path destination_path. It has been running for a long time. ...
Chetan Mane
0 votes
2 answers
890 views

distcp - copy data from cloudera hdfs to cloud storage

I am trying to replicate data between HDFS and my GCP Cloud Storage. This is not a one-time data copy. After the first copy, I want to copy only new and updated files, and if files are deleted on-prem it ...
Gaurang Shah
  • 12.8k
2 votes
0 answers
281 views

Error in accessing google cloud storage bucket via hadoop fs -ls that runs on Cloudera Hadoop CDH 6.3.3 integrated with Kerberos/SSL/LDAP cluster

I am getting the below error while accessing a Google Cloud Storage bucket for the first time via Cloudera CDH 6.3.3 Hadoop Cluster. I am running the command on the edge node where Google Cloud SDK is ...
bobby
  • 21
2 votes
1 answer
589 views

Hadoop distcp copy from on prem to gcp strange behavior

When I use the distcp command as hadoop distcp /a/b/c/d gs:/gcp-bucket/a/b/c/, where d is a folder on HDFS containing subfolders, and folder c already exists on GCP, it copies d (and its ...
Vicky
  • 1,318
0 votes
2 answers
1k views

DISTCP to GCS behind PROXY

I am trying to use distcp to copy some files from HDFS to GCS. My Hadoop cluster connects to the internet through an HTTP proxy, but I can't figure out how to specify this when connecting to ...
AGA ALIAS