0 votes
0 answers
10 views

Apache Spark: java.lang.NoClassDefFoundError for software.amazon.awssdk.transfer.s3.progress.TransferListener when reading CSV from S3

I am trying to read a CSV file from S3 using Apache Spark, but I encounter the following error: java.lang.NoClassDefFoundError: software/amazon/awssdk/transfer/s3/progress/TransferListener at java....
Krishna Basutkar
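For the TransferListener question above, this class ships in the AWS SDK v2; with recent hadoop-aws builds the usual fix reported is to put the full SDK bundle on the Spark classpath. A hedged sketch — the version numbers below are illustrative assumptions and must match your Hadoop build:

```python
# Sketch: a spark-submit invocation that adds hadoop-aws plus the AWS SDK v2
# bundle (which contains software.amazon.awssdk.transfer.s3.progress.*).
# Versions here are placeholder assumptions; the script name is hypothetical.
packages = ",".join([
    "org.apache.hadoop:hadoop-aws:3.3.6",
    "software.amazon.awssdk:bundle:2.23.19",
])
cmd = ["spark-submit", "--packages", packages, "read_csv_from_s3.py"]
print(" ".join(cmd))
```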
0 votes
0 answers
14 views

Error: `callbackHandler` may not be null when connecting to HDFS using Kerberos in Jakarta EE

I am trying to connect to HDFS using Kerberos authentication in a Jakarta EE application. The connection code appears to be set up correctly, but I am encountering the following error when attempting ...
ilham hitam
0 votes
1 answer
12 views

Flink-SQL dependencies: how to find them in the Maven repo

I'm a beginner with Apache platforms as well as Flink. I'm trying to run the Flink-SQL code below. I have 2 questions: I need to find the connector "filesystem" (in the Maven repo or elsewhere) Is ...
Tharindu Wijethunga
1 vote
1 answer
52 views

Can't configure Hive Metastore client jars in Spark 3.5.1

I need to configure my Spark 3.5.1 application so it uses a specific version of the Hive Metastore client. I read in the documentation that I can use: spark.sql.hive.metastore.jars spark.sql.hive.metastore....
mox601
  • 432
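For the metastore-client question above, Spark documents `builtin`, `maven`, and `path` as valid values for spark.sql.hive.metastore.jars. A minimal sketch of the `path` variant — the metastore version and jar location are placeholder assumptions:

```python
# Sketch of the Spark conf entries for pinning a specific Hive Metastore
# client. Version and jar path are illustrative assumptions.
conf = {
    "spark.sql.hive.metastore.version": "2.3.9",
    # "path" makes Spark load the client from the locations listed in
    # spark.sql.hive.metastore.jars.path instead of the builtin jars.
    "spark.sql.hive.metastore.jars": "path",
    "spark.sql.hive.metastore.jars.path": "file:///opt/hive-2.3.9/lib/*.jar",
}
args = [f"--conf {k}={v}" for k, v in conf.items()]
```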
0 votes
0 answers
26 views

Spark - Could not initialize class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemConfiguration

I am testing a Spark application locally; when it writes to a GCS bucket I get the error below. Error: java.lang.reflect.InvocationTargetException java.lang.RuntimeException: java.lang.reflect....
saravana ir
1 vote
1 answer
35 views

Spark and HDFS cluster running on docker

I'm trying to set up a Spark application running on my local machine to connect to an HDFS cluster where the NameNode is running inside a Docker container. Here are the relevant details of my setup: ...
Paul Hartmann
0 votes
0 answers
12 views

How to handle duplicate key outputs in the Mapper phase for HDFS PageRank implementation?

I was writing the PageRank code to run on HDFS, so I wrote the Mapper and Reducer. The data I have is in the following format: page 'outgoing_links,' such as: Page_1 Page_18,Page_109,Page_696,...
Khaled Saleh
0 votes
0 answers
49 views

Hadoop : Exception in thread "main" java.lang.UnsupportedOperationException: 'posix:permissions' not supported as initial attribute

I am using this command for word count in Command Prompt on Windows: hadoop jar "C:\hadoop\share\hadoop\mapreduce\hadoop-mapreduce-examples-3.4.0.jar" wordcount /newdir/HadoopSmall.txt /...
Sudhanshu kumar
-1 votes
1 answer
26 views

Is the Hadoop documentation wrong for set

The documentation of the Hadoop Job API gives as an example, from https://hadoop.apache.org/docs/r3.3.5/api/org/apache/hadoop/mapreduce/Job.html: Here is an example on how to submit a job: // ...
user1551605
0 votes
0 answers
19 views

Why does the number of slots a Flink application requests differ between startup and runtime?

I found that when a Flink application starts, the number of slots requested is SUM(maximum parallelism of each task), but when the application is running, the number of slots requested is JobManager(1) + ...
rock ju
  • 13
1 vote
1 answer
70 views

How to set "api-version" dynamically in fs.azure.account.oauth2.msi.endpoint

Currently I'm using hadoop-azure 3.4.1 via the PySpark library to connect to ABFS. According to the documentation - https://hadoop.apache.org/docs/stable/hadoop-azure/abfs.html#Azure_Managed_Identity - ...
Sergei Varaksin
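For the api-version question above, one sketch is to build the endpoint string (query included) dynamically before it reaches the Hadoop configuration. The address below is Azure's instance-metadata service; the api-version value, and whether your hadoop-azure version passes the query string through unchanged, are assumptions to verify:

```python
# Sketch: assembling the MSI endpoint dynamically before handing it to the
# Hadoop conf. The api-version value is an assumption, and whether the
# connector preserves a query string here must be checked per version.
api_version = "2018-02-01"
msi_endpoint = (
    "http://169.254.169.254/metadata/identity/oauth2/token"
    f"?api-version={api_version}"
)
conf = {"fs.azure.account.oauth2.msi.endpoint": msi_endpoint}
```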
0 votes
0 answers
16 views

Hadoop INIT_FAILED: data is not replicated on the DataNode

I am using Hadoop 3.2.4 as a standalone service on Windows 11 23H2. I am trying to ingest data from Apache NiFi into a two-node Hadoop HDFS cluster. NameNode (also behaving as DataNode1) - ...
Filbadeha
  • 401
0 votes
0 answers
10 views

How to solve Unrecognized VM option 'UseConcMarkSweepGC' when trying to install Apache HBase [duplicate]

When I try to run ./start-hbase.cmd I get the following error: PS C:\hbasesetup\hbase-2.6.1\bin> ./start-hbase.cmd Unrecognized VM option 'UseConcMarkSweepGC' Error: Could not create the Java ...
Giuliano Lorenzo
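Context for the UseConcMarkSweepGC question above: the CMS collector was removed in JDK 14, so modern JVMs reject the flag outright, and the commonly reported fix is stripping CMS options from HBase's HBASE_OPTS in hbase-env. A sketch of the filtering — the starting flag string is an illustrative assumption:

```python
# Sketch: stripping GC flags that modern JDKs reject from an
# HBASE_OPTS-style string. The starting value is an assumption.
hbase_opts = "-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -Xmx1g"
obsolete_gc_flags = {"-XX:+UseConcMarkSweepGC", "-XX:+CMSIncrementalMode"}
cleaned = " ".join(f for f in hbase_opts.split() if f not in obsolete_gc_flags)
print(cleaned)  # prints: -Xmx1g
```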
-1 votes
0 answers
41 views

java.lang.NoClassDefFoundError: org.apache.hadoop.security.SecurityUtil (initialization failure)

I have a Spark Structured Streaming application with Hadoop dependencies included. To support Java 17 I have added the below JVM args in build.gradle test { jvmArgs += [ '--add-exports=...
Ams
  • 19
0 votes
1 answer
18 views

How to build the Hadoop source code on macOS (M2 chip)

I am trying to build the Hadoop source code on macOS (M2 chip), system version macOS Ventura. The problem: ld: warning: ignoring file '/Library/Java/JavaVirtualMachines/zulu-8.jdk/Contents/Home/jre/lib/...
mark
  • 2,507
1 vote
1 answer
98 views

"MERGE INTO TABLE is not supported temporarily" while trying to merge data into an Iceberg table in a Hadoop/Hive environment

I need to perform a merge operation into my Iceberg table. I am using a Jupyter notebook on an AWS EMR setup. Spark: '3.4.1-amzn-2' Hadoop: 3.3.6 Hive: 3.1.3 EMR Version: 6.15.0 Scala: 'version 2.12....
shady xv
0 votes
1 answer
42 views

Java action in Apache Oozie workflow

I am trying to configure an Apache Oozie workflow to execute different actions depending on the day of the week. After reading https://stackoverflow.com/questions/71422257/oozie-coordinator-get-day-of-...
Lorenzo Panebianco
1 vote
0 answers
19 views

Dataproc Hive Job - OutOfMemoryError: Java heap space

I have a Dataproc cluster; we are running an INSERT OVERWRITE query through the Hive CLI, which fails with OutOfMemoryError: Java heap space. We adjusted memory configurations for reducers and Tez tasks, ...
Parmeet Singh
0 votes
0 answers
38 views

java.net.ConnectException: Connection refused when trying to connect to HDFS from Spark on EMR to create an ICEBERG table

I am new to Spark and I'm working with a Spark job on an AWS EMR cluster using a Jupyter notebook. I'm trying to interact with HDFS; specifically, I am trying to create an Apache Iceberg table. However, ...
shady xv
0 votes
1 answer
91 views

Missing PutHDFS Processor in Apache NiFi 2.0.0

I'm using Apache NiFi 2.0.0, which unfortunately does not include the PutHDFS processor. My project requires this version of NiFi due to its integration capabilities with Python scripting, so ...
Filbadeha
  • 401
0 votes
0 answers
42 views

Apache NiFi: PutHDFS processor - "replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded"

I am using Apache NiFi 1.28; I am trying to create a minimal data flow where I generate data and want to ingest it into HDFS on HDP (Hortonworks Data Platform) 2.5.0. I am getting the ...
Filbadeha
  • 401
0 votes
1 answer
56 views

Apache Nifi: PutHDFS Processor issue - PutHDFS Failed to write to HDFS java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configurable

I am using Apache NiFi 1.28; I am trying to create a minimal data flow where I generate data and want to ingest it into HDFS on HDP (Hortonworks Data Platform) 2.5.0. I am getting the ...
Filbadeha
  • 401
0 votes
1 answer
32 views

Interact with HBase running in Docker through Java code

I am quite new to HBase and Java. I managed to run an HBase image in Docker, and I can interact with it using the HBase shell smoothly. I also have access to the UI for monitoring HBase. However, when I ...
Romee Zhou
0 votes
0 answers
49 views

PutHDFS Processor Failed to write in HDP2.5.0 HDFS

I am using Apache NiFi 1.28, configured on a Windows system. My HDP 2.5.0 is running on a VM. I want to ingest data from my local file system to the HDFS service running in ...
Filbadeha
  • 401
0 votes
0 answers
18 views

Update statement in the Hadoop ecosystem using Hive (WARNING: An illegal reflective access operation has occurred)

I am having trouble running a simple update statement. The query I am executing is the following: INSERT INTO data_source_metadata VALUES ( 'source1', 'User ...
Artemis
  • 23
0 votes
0 answers
29 views

How to properly (or at least somehow) use org.apache.hadoop.fs.MultipartUploader?

I have Hadoop in my environment and use it as S3. My current task is to implement logic for uploading a large file (say, >1 GB) to Hadoop with no buffering, so the data should be streamed into it. ...
Michael
  • 41
0 votes
0 answers
52 views

Failing to repartition data with PySpark on HDFS

I have over 7k parquet files in an HDFS dataset, totaling a little under 74 GB. One problem is that the file sizes are quite variable, ranging from 12 KB to 622 MB. So, I'd like to repartition the ...
CopyOfA
  • 853
0 votes
2 answers
186 views

java.lang.UnsupportedOperationException: 'posix:permissions'

I am trying to do a wordcount operation using Hadoop. Hadoop is configured and I can see the datanode, namenode, resourcemanager, and nodemanager running. I am using Hadoop version 3.4.0 and Java version 8. ...
Zumrud Isgandarli
0 votes
0 answers
37 views

How to connect an Airflow container with an HDFS container

Currently I'm using Docker for both HDFS and Airflow; both share the network, but I can't use HDFS commands inside the Airflow container. I've tried to install apache-airflow-providers-apache-hdfs but I ...
Faa
  • 1
0 votes
0 answers
22 views

How does CombineHiveInputFormat split the files (ORC/TextFile)?

Set the max size of splits: set mapreduce.input.fileinputformat.split.maxsize=209715200; -- 200 MB (256 MB default) set mapreduce.input.fileinputformat.split.minsize=0; Run the Hive SQL on a table STORED AS ...
user27636532
0 votes
1 answer
92 views

Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z unresolved

I have looked through answers for similar issues and none have resolved the issue I'm having. Some Hadoop commands seem to work (for example hadoop fs -cat) while others do not (hadoop fs -ls, which threw ...
Colin Hicks
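A commonly reported cause of the NativeIO$Windows.access0 UnsatisfiedLinkError above is a missing or version-mismatched hadoop.dll / winutils.exe under %HADOOP_HOME%\bin (hadoop.dll must also be loadable by the JVM, e.g. via PATH). A small diagnostic sketch — the fallback path is an assumption:

```python
import os

# Sketch: checking for the native Windows binaries that back
# org.apache.hadoop.io.nativeio.NativeIO$Windows.
# The fallback HADOOP_HOME path is an illustrative assumption.
hadoop_home = os.environ.get("HADOOP_HOME", r"C:\hadoop")
required = ["winutils.exe", "hadoop.dll"]
missing = [f for f in required
           if not os.path.exists(os.path.join(hadoop_home, "bin", f))]
if missing:
    print("missing native binaries:", ", ".join(missing))
```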
1 vote
1 answer
74 views

Reading parquet takes too long in parquet-java

I am using parquet-hadoop to read a Snappy-compressed Parquet file. However, I discovered that the reading time is quadratic in the file size, and it is unacceptably long. The following is the code I ...
Desk Reference
0 votes
0 answers
74 views

Pyspark, Hadoop, and S3: java.lang.NoSuchMethodError: org.apache.hadoop.fs.s3a.Listing$FileStatusListingIterator

I've been facing compatibility issues related to getting delta-spark to work straight out of the box with S3 and wanted to get some advice. I've tried dozens of combinations of versions between Spark, ...
Hantoa Tenwhij
0 votes
1 answer
42 views

Failed to transform org.apache.hadoop.fs.FileSystem$Cache. org.apache.hadoop.fs.FileSystem$Cache$Key class is frozen

I am trying to mock the Hadoop filesystem in my Scala test. Any idea how to get around this, please: import java.net.URI import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs....
rahman
  • 4,928
0 votes
0 answers
25 views

Unable to Move Deleted Files to Trash via Hadoop Web Interface

I have encountered an issue with the Hadoop-3.3.6 Web interface regarding file deletion. By default, when I delete files through the Hadoop Web interface, they are permanently removed and do not go to ...
leizhuokin
-3 votes
1 answer
34 views

.gz files are unsplittable, but if I place them in HDFS they create multiple blocks depending on block size

We all know .gz is non-splittable, which means only a single core can read it. This means that when I place a huge .gz file on HDFS, it should actually be present as a single block, yet I see it is getting split ...
Praveen Kumar B N
0 votes
1 answer
39 views

hdfs dfs -mkdir no such file or directory

I'm new to Hadoop and I'm trying to create a directory in HDFS called input_dir. I have set up my VM and installed and started Hadoop successfully. This is the command that I run: hdfs dfs -mkdir ...
Bloom
  • 1
1 vote
1 answer
72 views

Unable to connect to a GCS bucket using the Hadoop connector over WIF

Getting the below error while connecting to GCS with Hadoop: Error reading credentials from stream, 'type' value 'external_account' not recognized. Expecting 'authorized_user' or 'service_account' I am ...
diptam
  • 63
4 votes
2 answers
6k views

Hadoop Installation, Error: getSubject is supported only if a security manager is allowed

I tried to install Hadoop on macOS Ventura but it has failed several times. I tried downloading lower versions of Hadoop as well, but no luck so far. Tried Hadoop versions: 3.4.0 and 3.3.6 Java ...
Suman Bhattarai
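For the getSubject error above: very recent JDKs make Subject.getSubject throw unless a security manager is allowed, which Hadoop's login code still relies on; the commonly reported workarounds are running on an older JDK or starting the JVM with -Djava.security.manager=allow. A sketch of passing the flag through HADOOP_OPTS — the command itself is illustrative:

```python
import os

# Sketch: augmenting the environment so Hadoop's JVM is started with the
# security manager explicitly allowed. The hdfs command is illustrative.
env = dict(os.environ)
env["HADOOP_OPTS"] = (
    env.get("HADOOP_OPTS", "") + " -Djava.security.manager=allow"
).strip()
cmd = ["hdfs", "dfs", "-ls", "/"]
# subprocess.run(cmd, env=env)  # would run with the augmented environment
```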
0 votes
1 answer
80 views

Hive Always Fails at Mapreduce

I just installed Hadoop 3.3.6 and Hive 4.0.0 with MySQL as the metastore. When running create table or select * from... it runs well, but when I try to do an insert or a select with a join, Hive always fails. I'm ...
Dzaki Wicaksono
0 votes
0 answers
36 views

I set up port forwarding on port 50070 to access the Hadoop master node

$ ssh -i C:/Users/amyousufi/Desktop/private-key -L 50070:localhost:50070 [email protected] After setting up my port forwarding, I receive the following error: bind [127.0.0.1]:50070: Permission denied ...
Aminullah Yousufi
0 votes
0 answers
28 views

ACLs for the YARN capacity scheduler are not working

I have a Hadoop cluster (1 master, 1 slave) and I divided the resources into 2 queues, a and b, then used ACLs so that user1 can submit to queue a and user2 can use queue b. I try to run [user2@...
Đắc An Nguyễn
1 vote
0 answers
25 views

Tracking URL disappears when Spark 3.5.1 runs on Hadoop 3

Spark 2.4, Spark 3.3.0, Spark 3.4.3, and any version of Spark below 3.5 show the tracking URL fine on the same community-version Apache hadoop-yarn-3.2, but Spark 3.5.1 just shows the app ID in ...
AppleCEO
0 votes
1 answer
43 views

Getting the HQL query of a job using its YARN application ID

I want to know if I can get the HQL or SQL query using the applicationId of a Hive query that is running on YARN. I tried using yarn logs applicationid but it's showing the entire ...
blackishgray
1 vote
1 answer
289 views

How to set up a connection between Spark code and Spark container using Docker?

I am working with a Docker setup for Hadoop and Spark using the following repository: docker-hadoop-spark. My Docker Compose YAML configuration is working correctly, and I am able to run the ...
Sadegh
  • 412
0 votes
0 answers
44 views

Custom NEWLINE symbol in Greenplum

I want to create extern table in Greenplum like this: CREATE READABLE EXTERNAL TABLE "table" ("id" INTEGER, "name" TEXT) LOCATION ('<file location>') FORMAT 'TEXT' (...
Vadim Myakish
0 votes
0 answers
26 views

Issues with ImportTsv - Job Fails with Exit Code 1

I am currently facing an issue while using the ImportTsv command to load data into an HBase table. I am running the command and the job is failing with a non-zero exit code 1. Hadoop version : 3.4.0 ...
Amandi Ekanayake
0 votes
0 answers
92 views

Py4JJavaError: An error occurred while calling o117.showString

I am encountering a java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$ error when trying to run a PySpark query to show tables from Hive. My environment ...
Gutivan Alief Syahputra
0 votes
0 answers
8 views

How many calls per handler are allowed in the queue?

Maybe someone can tell me how to calculate the size of ipc.server.handler.queue.size in core-site.xml. From the description of the property: Specifies how many calls for each handler are allowed in the ...
Alatau
  • 41
0 votes
0 answers
13 views

Solr - Hadoop: theory of data writing

I'm wondering how the data writing actually works. In a Solr cluster that saves the indexes onto Hadoop datanodes, if I trigger a shard split, how is the data managed? Let's say that I split a ...
Roberto D. Maggi
