44,385 questions
0
votes
0
answers
10
views
Apache Spark: java.lang.NoClassDefFoundError for software.amazon.awssdk.transfer.s3.progress.TransferListener when reading CSV from S3
I am trying to read a CSV file from S3 using Apache Spark, but I encounter the following error:
java.lang.NoClassDefFoundError: software/amazon/awssdk/transfer/s3/progress/TransferListener
at java....
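A minimal sketch of the usual classpath fix, assuming the error comes from hadoop-aws 3.4.x being present without the matching AWS SDK v2 bundle jar; the coordinates and versions below are assumptions and would normally be passed as --packages to spark-submit rather than set in code.

```java
// Sketch only: versions are assumptions; match them to your own Spark/Hadoop build.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class S3CsvRead {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("s3-csv-read")
                // Equivalent of `spark-submit --packages ...`: hadoop-aws plus the AWS SDK v2
                // bundle, which carries software.amazon.awssdk.transfer.s3.progress.TransferListener.
                .config("spark.jars.packages",
                        "org.apache.hadoop:hadoop-aws:3.4.0,software.amazon.awssdk:bundle:2.24.6")
                .getOrCreate();

        Dataset<Row> df = spark.read()
                .option("header", "true")
                .csv("s3a://my-bucket/path/to/file.csv"); // placeholder bucket/path
        df.show();
    }
}
```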
0
votes
0
answers
14
views
Error: `callbackHandler` may not be null when connecting to HDFS using Kerberos in Jakarta EE
I am trying to connect to HDFS using Kerberos authentication in a JakartaEE application. The connection code appears to be set up correctly, but I am encountering the following error when attempting ...
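For comparison, a minimal keytab-based login sketch using Hadoop's UserGroupInformation; the principal, keytab path, and NameNode address are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosHdfsClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");       // placeholder address
        conf.set("hadoop.security.authentication", "kerberos");

        // Log in once from a keytab; UGI handles the Kerberos login module internally.
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab(
                "user@EXAMPLE.COM",                    // hypothetical principal
                "/etc/security/keytabs/user.keytab");  // hypothetical keytab path

        try (FileSystem fs = FileSystem.get(conf)) {
            System.out.println(fs.exists(new Path("/tmp")));
        }
    }
}
```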
0
votes
1
answer
12
views
Flink-SQL dependencies: how to find them in the Maven repo
I'm a beginner with Apache platforms as well as Flink. I'm trying to run the Flink-SQL code below. I have 2 questions:
I need to find the connector "filesystem" (in maven repo or elsewhere)
Is ...
1
vote
1
answer
52
views
Can't configure Hive Metastore client jars in Spark 3.5.1
I need to configure my Spark 3.5.1 application so that it uses a specific version of the Hive Metastore client.
I read in the documentation that I can use:
spark.sql.hive.metastore.jars
spark.sql.hive.metastore....
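For reference, a sketch of how those properties fit together on a SparkSession; the metastore version and jar path used here are assumptions, not values from the question.

```java
import org.apache.spark.sql.SparkSession;

public class HiveMetastoreJars {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("hive-metastore-jars")
                .enableHiveSupport()
                // Assumed metastore version; must match the jars you point at below.
                .config("spark.sql.hive.metastore.version", "2.3.9")
                // "path" tells Spark to load the client jars from an explicit location.
                .config("spark.sql.hive.metastore.jars", "path")
                .config("spark.sql.hive.metastore.jars.path",
                        "file:///opt/hive-2.3.9/lib/*.jar") // hypothetical path
                .getOrCreate();

        spark.sql("SHOW DATABASES").show();
    }
}
```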
0
votes
0
answers
26
views
Spark - Could not initialize class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemConfiguration
I am testing a Spark application locally; when it writes to a GCS bucket I get the error below.
Error:
java.lang.reflect.InvocationTargetException
java.lang.RuntimeException: java.lang.reflect....
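A hedged sketch of a common workaround: pinning the GCS connector and binding the gs:// scheme explicitly. The connector version is an assumption and credential settings are omitted.

```java
import org.apache.spark.sql.SparkSession;

public class GcsWrite {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("gcs-write")
                .master("local[*]")
                // Assumed connector version; normally supplied via --packages or --jars.
                .config("spark.jars.packages",
                        "com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.21")
                // Bind the gs:// scheme to the connector's FileSystem implementations.
                .config("spark.hadoop.fs.gs.impl",
                        "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
                .config("spark.hadoop.fs.AbstractFileSystem.gs.impl",
                        "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
                .getOrCreate();

        // Placeholder bucket; authentication configuration is intentionally left out.
        spark.range(100).write().mode("overwrite").parquet("gs://my-bucket/tmp/out");
    }
}
```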
1
vote
1
answer
35
views
Spark and HDFS cluster running on docker
I'm trying to set up a Spark application running on my local machine to connect to an HDFS cluster where the NameNode is running inside a Docker container.
Here are the relevant details of my setup:
...
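A minimal sketch, assuming the NameNode RPC port is published on localhost:8020; dfs.client.use.datanode.hostname is the usual knob when DataNodes advertise container-internal addresses.

```java
import org.apache.spark.sql.SparkSession;

public class SparkDockerHdfs {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("spark-docker-hdfs")
                .master("local[*]")
                // Assumed port mapping from the Docker container to the host.
                .config("spark.hadoop.fs.defaultFS", "hdfs://localhost:8020")
                // Ask the HDFS client to use DataNode hostnames instead of internal IPs.
                .config("spark.hadoop.dfs.client.use.datanode.hostname", "true")
                .getOrCreate();

        spark.read().text("hdfs://localhost:8020/tmp/sample.txt").show(); // placeholder path
    }
}
```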
0
votes
0
answers
12
views
How to handle duplicate key outputs in the Mapper phase for HDFS PageRank implementation?
I was writing the PageRank code to run on HDFS, so I wrote the Mapper and Reducer. The data I have is in the following format: page 'outgoing_links,' such as:
Page_1 Page_18,Page_109,Page_696,...
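An illustrative mapper sketch for that line format (class and marker names are made up): duplicate keys emitted by the mapper are not a problem, because the shuffle groups all values for a page before the reducer sums them.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class PageRankMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Expected line: "Page_1<whitespace>Page_18,Page_109,..."
        String[] parts = value.toString().trim().split("\\s+", 2);
        if (parts.length < 2) return;

        String page = parts[0];
        String[] links = parts[1].split(",");
        double contribution = 1.0 / links.length; // assumes an initial rank of 1.0 for the sketch

        for (String link : links) {
            // The same target page may be emitted many times; the reducer sums contributions.
            context.write(new Text(link.trim()), new Text(Double.toString(contribution)));
        }
        // Preserve the link structure for the next iteration.
        context.write(new Text(page), new Text("LINKS|" + parts[1]));
    }
}
```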
0
votes
0
answers
49
views
Hadoop : Exception in thread "main" java.lang.UnsupportedOperationException: 'posix:permissions' not supported as initial attribute
I am using this command for word count in Command Prompt in Windows
hadoop jar "C:\hadoop\share\hadoop\mapreduce\hadoop-mapreduce-examples-3.4.0.jar" wordcount /newdir/HadoopSmall.txt /...
-1
votes
1
answer
26
views
Is the Hadoop documentation wrong for set
The documentation of the Hadoop Job API gives as example:
From https://hadoop.apache.org/docs/r3.3.5/api/org/apache/hadoop/mapreduce/Job.html
Here is an example on how to submit a job:
// ...
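For reference, a compilable variant of that snippet: the Job class itself has no setInputPath/setOutputPath methods, so the paths go through FileInputFormat/FileOutputFormat; the identity Mapper and Reducer stand in for real classes here.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyJob {
    public static void main(String[] args) throws Exception {
        // Create a new Job
        Job job = Job.getInstance();
        job.setJarByClass(MyJob.class);
        job.setJobName("myjob");

        // Input/output paths are set via the input/output format helpers, not on Job itself.
        FileInputFormat.addInputPath(job, new Path("in"));
        FileOutputFormat.setOutputPath(job, new Path("out"));

        // Identity mapper/reducer as placeholders for real implementations.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);

        // Submit the job, then poll for progress until it completes.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```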
0
votes
0
answers
19
views
Why does the number of slots a Flink application requests differ between startup and runtime?
I found that when a Flink application starts, the number of slots requested is SUM(maximum parallelism of each task), but when the application is running, the number of slots requested is JobManager(1) + ...
1
vote
1
answer
70
views
how to set "api-version" dynamically in fs.azure.account.oauth2.msi.endpoint
Currently I'm using hadoop-azure-3.4.1 via pyspark library to connect to ABFS. According to the documentation - https://hadoop.apache.org/docs/stable/hadoop-azure/abfs.html#Azure_Managed_Identity - ...
0
votes
0
answers
16
views
Hadoop INIT_FAILD: Data is not Replicated on DataNode
I am using Hadoop 3.2.4 as a standalone service on Windows 11 23H2. I am trying to ingest data from Apache NiFi into a two-node HDFS cluster.
NameNode (Also behaving as Datanode1) - ...
0
votes
0
answers
10
views
How to solve Unrecognized VM option 'UseConcMarkSweepGC' when trying to install Apache HBase [duplicate]
When I try to run ./start-hbase.cmd I get the following error:
PS C:\hbasesetup\hbase-2.6.1\bin> ./start-hbase.cmd
Unrecognized VM option 'UseConcMarkSweepGC'
Error: Could not create the Java ...
-1
votes
0
answers
41
views
java.lang.NoClassDefFoundError: org.apache.hadoop.security.SecurityUtil (initialization failure)
I have a Spark Structured Streaming application with Hadoop dependencies included. To support Java 17 I have added the JVM args below in build.gradle:
test {
jvmArgs += [
'--add-exports=...
0
votes
1
answer
18
views
How to build the Hadoop source code on macOS (M2 chip)
I am trying to build the Hadoop source code on macOS (M2 chip); the system version is macOS Ventura.
The problem:
ld: warning: ignoring file '/Library/Java/JavaVirtualMachines/zulu-8.jdk/Contents/Home/jre/lib/...
1
vote
1
answer
98
views
"MERGE INTO TABLE is not supported temporarily" while trying to merge data into an Iceberg table in a Hadoop/Hive environment
I need to perform a merge operation into my Iceberg table. I am using a Jupyter notebook on an AWS EMR setup.
Spark: '3.4.1-amzn-2' Hadoop: 3.3.6 Hive: 3.1.3 EMR Version: 6.15.0 Scala: 'version 2.12....
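MERGE INTO generally requires the Iceberg SQL extensions and an Iceberg-aware catalog on the Spark side; a sketch of that setup follows, where the catalog name, warehouse path, and table names are placeholders and the Iceberg runtime jar matching your Spark/Scala version is assumed to be on the classpath.

```java
import org.apache.spark.sql.SparkSession;

public class IcebergMerge {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("iceberg-merge")
                // Enable Iceberg's SQL extensions (needed for MERGE INTO).
                .config("spark.sql.extensions",
                        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
                // Register a Hadoop-backed Iceberg catalog under a placeholder name.
                .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
                .config("spark.sql.catalog.my_catalog.type", "hadoop")
                .config("spark.sql.catalog.my_catalog.warehouse", "hdfs:///user/iceberg/warehouse")
                .getOrCreate();

        // Placeholder tables; both must live in the Iceberg catalog for MERGE to be supported.
        spark.sql("MERGE INTO my_catalog.db.target t "
                + "USING my_catalog.db.updates u ON t.id = u.id "
                + "WHEN MATCHED THEN UPDATE SET * "
                + "WHEN NOT MATCHED THEN INSERT *");
    }
}
```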
0
votes
1
answer
42
views
Java action in Apache Oozie workflow
I am trying to configure an Apache Oozie workflow to execute different actions depending on the day of the week. After reading https://stackoverflow.com/questions/71422257/oozie-coordinator-get-day-of-...
1
vote
0
answers
19
views
Dataproc Hive Job - OutOfMemoryError: Java heap space
I have a Dataproc cluster; we are running an INSERT OVERWRITE query through the Hive CLI, which fails with OutOfMemoryError: Java heap space.
We adjusted memory configurations for reducers and Tez tasks, ...
0
votes
0
answers
38
views
java.net.ConnectException: Connection refused when trying to connect to HDFS from Spark on EMR to create an ICEBERG table
I am new to spark and I'm working with a Spark job on an AWS EMR cluster using jupyter notebook. I'm trying to interact with HDFS. Specifically, I am trying to create an Apache Iceberg table. However, ...
0
votes
1
answer
91
views
Missing PutHDFS Processor in Apache NiFi 2.0.0
I'm using Apache NiFi 2.0.0, which unfortunately does not include the PutHDFS processor. My project requires this version of NiFi due to its integration capabilities with Python scripting, so ...
0
votes
0
answers
42
views
Apache Nifi: Puthdfs Processor -replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded
I am using Apache NiFi version 1.28. I am trying to create a minimal data flow where I generate data and ingest it into HDFS on HDP (Hortonworks Data Platform) 2.5.0, and I am getting the ...
0
votes
1
answer
56
views
Apache Nifi: PutHDFS Processor issue - PutHDFS Failed to write to HDFS java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configurable
I am using Apache NiFi version 1.28. I am trying to create a minimal data flow where I generate data and ingest it into HDFS on HDP (Hortonworks Data Platform) 2.5.0, and I am getting the ...
0
votes
1
answer
32
views
Interact with HBase running in Docker through Java code
I am quite new to HBase and Java. I managed to run an HBase image in Docker, and I can interact with it smoothly using the HBase shell. I also have access to the UI for monitoring HBase. However, when I ...
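A minimal Java connection sketch, assuming the container publishes ZooKeeper on localhost:2181 and that the HBase master/regionserver hostnames are resolvable from the host.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class HBaseDockerClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "localhost");        // assumed port mapping
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            // List tables as a simple connectivity check.
            for (TableName table : admin.listTableNames()) {
                System.out.println(table.getNameAsString());
            }
        }
    }
}
```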
0
votes
0
answers
49
views
PutHDFS Processor Failed to write in HDP2.5.0 HDFS
I am using Apache NiFi 1.28, configured on a Windows system. My HDP 2.5.0 is running on a VM, and I want to ingest data from my local file system into the HDFS service running in ...
0
votes
0
answers
18
views
Update statement in hadoop ecosystem using hive (WARNING: An illegal reflective access operation has occurred)
I am having trouble with running a simple update statement.
The query I am executing is the following:
INSERT INTO data_source_metadata VALUES (
'source1',
'User ...
0
votes
0
answers
29
views
How to properly (or at least somehow) use org.apache.hadoop.fs.MultipartUploader?
I have Hadoop in my environment and use it as S3.
My current task is to implement logic for uploading a large file (say, >1 GB) to Hadoop with no buffering, so the data should be streamed into it.
...
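Not the MultipartUploader API itself, but a sketch of the plain streamed alternative: FileSystem.create returns an output stream, so copying an InputStream in fixed-size chunks never buffers the whole file in memory (paths are placeholders).

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class StreamingUpload {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder address

        try (FileSystem fs = FileSystem.get(conf);
             InputStream in = Files.newInputStream(Paths.get("/data/big-file.bin")); // hypothetical source
             FSDataOutputStream out = fs.create(new Path("/uploads/big-file.bin"), true)) {
            // Copy in 4 MB chunks; only the buffer is ever held in memory.
            IOUtils.copyBytes(in, out, 4 * 1024 * 1024);
        }
    }
}
```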
0
votes
0
answers
52
views
Failing to repartition data with PySpark on HDFS
I have over 7k parquet files in an HDFS dataset, totaling a little under 74 GB. One problem is that the file sizes are quite variable, ranging from 12 KB to 622 MB. So, I'd like to repartition the ...
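The question is about PySpark, but the DataFrame call has the same shape in Java; the partition count below is a rough assumption aiming at roughly 128 MB per output file, and the paths are placeholders.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class RepartitionParquet {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("repartition-parquet").getOrCreate();

        Dataset<Row> df = spark.read().parquet("hdfs:///data/source_dataset"); // placeholder path

        // ~74 GB of data / ~128 MB per file is roughly 600 evenly sized partitions.
        df.repartition(600)
          .write()
          .mode("overwrite")
          .parquet("hdfs:///data/source_dataset_repartitioned"); // placeholder path
    }
}
```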
0
votes
2
answers
186
views
java.lang.UnsupportedOperationException: 'posix:permissions'
I am trying to do a wordcount operation using Hadoop. Hadoop is configured and I can see the DataNode, NameNode, ResourceManager, and NodeManager running. I am using Hadoop version 3.4.0 and Java version 8. ...
0
votes
0
answers
37
views
How to connect an Airflow container with an HDFS container
Currently I'm using Docker for both HDFS and Airflow. They share a network, but I can't use HDFS commands inside the Airflow container. I've tried to install apache-airflow-providers-apache-hdfs but I ...
0
votes
0
answers
22
views
how does the CombineHiveInputFormat split the files(ORC/TextFile)
set maxsize of splits:
set mapreduce.input.fileinputformat.split.maxsize=209715200; --200MB(256MB default)
set mapreduce.input.fileinputformat.split.minsize=0;
run the hive sql on a table STORED AS ...
0
votes
1
answer
92
views
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z unresolved
I have looked through answers for similar issues and none have resolved the issue I'm having. Some Hadoop commands seem to work (for example, hadoop fs -cat) while others do not (hadoop fs -ls, which threw ...
1
vote
1
answer
74
views
Reading parquet takes too long in parquet-java
I am using parquet-hadoop to read a Snappy-compressed parquet file. However, I discovered that the reading time is quadratic to the file size, and it is unacceptably long.
The following is the code I ...
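For comparison, a baseline read loop (assuming parquet-avro is available) that opens the file once and pulls records from a single ParquetReader, which should scale linearly with file size; the path is a placeholder.

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;

public class ParquetScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        HadoopInputFile input =
                HadoopInputFile.fromPath(new Path("/data/file.snappy.parquet"), conf);

        long count = 0;
        // One reader instance for the whole file; the footer is read once at open time.
        try (ParquetReader<GenericRecord> reader =
                     AvroParquetReader.<GenericRecord>builder(input).build()) {
            for (GenericRecord record = reader.read(); record != null; record = reader.read()) {
                count++; // process the record here
            }
        }
        System.out.println("rows: " + count);
    }
}
```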
0
votes
0
answers
74
views
Pyspark, Hadoop, and S3: java.lang.NoSuchMethodError: org.apache.hadoop.fs.s3a.Listing$FileStatusListingIterator
I've been facing compatibility issues related to getting delta-spark to work straight out of the box with S3 and wanted to get some advice. I've tried dozens of combinations of versions between Spark, ...
0
votes
1
answer
42
views
Failed to transform org.apache.hadoop.fs.FileSystem$Cache. org.apache.hadoop.fs.FileSystem$Cache$Key class is frozen
I am trying to mock the Hadoop FileSystem in my Scala test. Any idea how to get around this, please?
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs....
0
votes
0
answers
25
views
Unable to Move Deleted Files to Trash via Hadoop Web Interface
I have encountered an issue with the Hadoop-3.3.6 Web interface regarding file deletion. By default, when I delete files through the Hadoop Web interface, they are permanently removed and do not go to ...
-3
votes
1
answer
34
views
.gz files are unsplittable. But if I place them in HDFS, they create multiple blocks depending on block size
We all know .gz is non-splittable, which means only a single core can read it. This means that when I place a huge .gz file on HDFS, it should actually be present as a single block. I see it is getting split ...
0
votes
1
answer
39
views
hdfs dfs -mkdir no such file or directory
I'm new to hadoop and I'm trying to create a directory in hdfs called input_dir. I have set up my vm, installed and started hadoop successfully.
This is the command that I run:
hdfs dfs -mkdir ...
1
vote
1
answer
72
views
Unable to connect to GCS Bucket using hadoop-connector over WIF
Getting the below error while connecting to GCS with Hadoop:
Error reading credentials from stream, 'type' value 'external_account' not recognized. Expecting 'authorized_user' or 'service_account'
I am ...
4
votes
2
answers
6k
views
Hadoop Installation, Error: getSubject is supported only if a security manager is allowed
I tried to install Hadoop on macOS Ventura but it has failed several times. I tried downloading lower versions of Hadoop as well, but no luck so far.
Tried Hadoop Versions: 3.4.0 and 3.3.6
Java ...
0
votes
1
answer
80
views
Hive Always Fails at Mapreduce
I just installed Hadoop 3.3.6 and Hive 4.0.0 with MySQL as the metastore. When running CREATE TABLE or SELECT * FROM ... it runs well. But when I try to do an INSERT or a SELECT with a JOIN, Hive always fails. I'm ...
0
votes
0
answers
36
views
I set up port forwarding on port 50070 to access the Hadoop master node
$ ssh -i C:/Users/amyousufi/Desktop/private-key -L 50070:localhost:50070 [email protected]
After setting up my port forwarding, I receive the following error:
bind [127.0.0.1]:50070: Permission denied
...
0
votes
0
answers
28
views
acl for yarn capacity scheduler is not working
I have a Hadoop cluster (1 master, 1 slave) and I divided the resources into 2 queues: a and b. I then used ACLs to grant permissions so that user1 can submit to queue a and user2 can use queue b. I tried to run [user2@...
1
vote
0
answers
25
views
Tracking URL disappears when Spark 3.5.1 runs on Hadoop 3
For Spark 2.4, 3.3.0, 3.4.3, or any version of Spark below 3.5, the tracking URL shows up fine on the same community version of Apache Hadoop YARN 3.2.
But Spark 3.5.1 just shows the app ID in ...
0
votes
1
answer
43
views
HQL query using the YARN application ID
So I want to know if I can get the HQL query or the SQL query using the applicationId of a hive query that is running on YARN.
I tried using
yarn logs applicationid
But it's showing the entire ...
1
vote
1
answer
289
views
How to set up a connection between Spark code and Spark container using Docker?
I am working with a Docker setup for Hadoop and Spark using the following repository: docker-hadoop-spark. My Docker Compose YAML configuration is working correctly, and I am able to run the ...
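A minimal sketch, assuming the compose file publishes the Spark master on localhost:7077; spark.driver.host is the usual extra setting when the driver runs outside the containers' network, and the host alias shown is a Docker Desktop assumption.

```java
import org.apache.spark.sql.SparkSession;

public class SparkDockerMaster {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("docker-spark-test")
                .master("spark://localhost:7077")                    // assumed published port
                // Assumption: lets executors inside the containers reach the driver on the host.
                .config("spark.driver.host", "host.docker.internal")
                .getOrCreate();

        // Trivial job to confirm executors can talk back to the driver.
        System.out.println(spark.range(10).count());
    }
}
```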
0
votes
0
answers
44
views
Custom NEWLINE symbol in Greenplum
I want to create an external table in Greenplum like this:
CREATE READABLE EXTERNAL TABLE "table"
("id" INTEGER, "name" TEXT)
LOCATION ('<file location>')
FORMAT 'TEXT' (...
0
votes
0
answers
26
views
Issues with ImportTsv - Job Fails with Exit Code 1
I am currently facing an issue while using the ImportTsv command to load data into an HBase table. I am running the command and the job is failing with a non-zero exit code 1.
Hadoop version : 3.4.0
...
0
votes
0
answers
92
views
Py4JJavaError: An error occurred while calling o117.showString
I am encountering a java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$ error when trying to run a PySpark query to show tables from Hive. My environment ...
0
votes
0
answers
8
views
how many calls per handler are allowed in the queue
Maybe someone can tell me how to calculate the size of ipc.server.handler.queue.size in core-site.xml.
From the description of the property:
Specifies how many calls for each handler are allowed in the ...
0
votes
0
answers
13
views
Solr - Hadoop theory of data writing
I'm wondering how the data writing actually works.
In a Solr cluster that saves its indexes onto Hadoop DataNodes, if I trigger a shard split, how is the data managed?
Let's say that I split a ...