Newest 'qubole' Questions

0 votes

1 answer

87 views

Pyspark error- Invalid argument, not a string or column

I have a dataframe in PySpark, df_all. It has some data and I need to compute the following: count = ceil(df_all.count()/1000000). It gives the following error: TypeError: Invalid argument, not a string or ...

user2280352

155

asked Aug 16, 2023 at 18:22
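
A common cause of this TypeError is that `ceil` resolves to `pyspark.sql.functions.ceil` (for example after a wildcard import), which expects a column rather than a plain number. Whether that applies here is an assumption, since the excerpt is truncated. A minimal sketch of the same arithmetic with Python's `math.ceil`; `batch_count` and `row_count` are hypothetical names, with `row_count` standing in for the result of `df_all.count()`:

```python
import math


def batch_count(row_count: int, batch_size: int = 1_000_000) -> int:
    """Return how many batches of batch_size are needed to cover row_count rows."""
    # math.ceil works on plain Python numbers, unlike a Spark column function.
    return math.ceil(row_count / batch_size)


# row_count stands in for df_all.count(), which returns a plain int.
print(batch_count(2_500_000))
```

If both functions are needed in the same script, importing the Spark functions under a namespace (such as `import pyspark.sql.functions as F`) keeps the two `ceil` names apart.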

2 votes

1 answer

794 views

How do you write a presto query to split a string into its own column

Trying to split a string into multiple columns in Qubole using a Presto query. {"field0":[{"startdate":"2022-07-13","lastnightdate":"2022-07-16","...

Abe

23

asked Jul 12, 2022 at 14:37

0 votes

1 answer

834 views

Presto Pivoting Data

I am really new to Presto and having trouble pivoting data in it. The method I am using is the following: select distinct location_id, case when role_group = 'IT' then employee_number end as ...

llorcs

79

asked Mar 12, 2022 at 15:45
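
A `select distinct` with bare `case` expressions usually yields one row per source row, padded with NULLs, rather than one row per location. A common alternative is conditional aggregation: wrap each `case` in `MAX` and group by the key. The sketch below runs that pattern on SQLite through Python's standard library rather than on Presto, so it is only an illustration of the technique; the table name `staff`, the `HR` role, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data; column names follow the excerpt.
rows = [
    (1, "IT", "E100"),
    (1, "HR", "E200"),
    (2, "IT", "E300"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff (location_id INTEGER, role_group TEXT, employee_number TEXT)"
)
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", rows)

# One output row per location: MAX ignores the NULLs that the
# non-matching CASE branches produce.
pivot_sql = """
SELECT location_id,
       MAX(CASE WHEN role_group = 'IT' THEN employee_number END) AS it_employee,
       MAX(CASE WHEN role_group = 'HR' THEN employee_number END) AS hr_employee
FROM staff
GROUP BY location_id
ORDER BY location_id
"""
pivoted = conn.execute(pivot_sql).fetchall()
print(pivoted)
```

`MAX` keeps only one employee per location and role; if several can exist, an aggregate that collects values would be needed instead.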

2 votes

1 answer

510 views

need regexp_extract help, beginner

I have a string column "49b8b35e-b62c-4a42-9d73-192d131d127a,03c8a7e0-5153-11ec-873a-0242ac11000a,eec8aee4-0500-4940-b319-15924cc2d248". This string column has 3 values separated by ","...

ajk

21

asked Dec 13, 2021 at 15:35
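
The underlying logic can be sketched in Python, assuming the goal is to pull the three comma-separated values apart (the excerpt is truncated, so that goal is an assumption). Splitting on the delimiter is the simplest route; the regex variant matches each run of non-comma characters. Presto offers `split` and `regexp_extract` for the same two approaches.

```python
import re

# The sample value from the question: three UUIDs separated by commas.
raw = (
    "49b8b35e-b62c-4a42-9d73-192d131d127a,"
    "03c8a7e0-5153-11ec-873a-0242ac11000a,"
    "eec8aee4-0500-4940-b319-15924cc2d248"
)

# Splitting on the delimiter gives one list element per value.
parts = raw.split(",")

# Regex alternative: each value is a maximal run of non-comma characters.
parts_via_regex = re.findall(r"[^,]+", raw)

first, second, third = parts
print(first, second, third)
```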

2 votes

1 answer

52 views

Data comparisons in Qubole

I am very new to Qubole. We recently migrated Oracle Ebiz data to Salesforce. We have both Ebiz and Salesforce data in the Qubole Data Lake. There are some discrepancies between Ebiz and Salesforce. What ...

user2280352

155

asked Dec 6, 2021 at 23:40

1 vote

0 answers

475 views

Insert overwrite doesn't delete all the old data files

We are trying to insert overwrite a Hive table. Most of the time it overwrites as expected, i.e. it deletes any old files and replaces them with new files. We are seeing some inconsistencies with this behavior, ...

Jas

11

asked May 18, 2021 at 4:57

1 vote

1 answer

970 views

Retrieve value in an array of an array with struct

I have a column in Hive table with type: array<array<struct<type:string,value:string,currency:string>>> Here is the sample of data in the column: [ [ { "type":...

user1761325

93

asked May 6, 2021 at 4:15

0 votes

0 answers

358 views

Query Qubole data in Python

I'm trying to query Qubole data in Python, but running into some issues. Below is my code: from qds_sdk.qubole import Qubole Qubole.configure(api_token="api_token", api_url="https://us....

BirdPlay6

33

asked Apr 30, 2021 at 20:53

0 votes

1 answer

782 views

How to safely insert parameters into a SQL query and get the resulting query?

I have to use a non-DBAPI-compliant library to interact with a database (qds_sdk for Qubole). This library only allows sending raw SQL queries without parameters. Thus I would like a SQL injection-...

Roméo Després

2,103

asked Jan 4, 2021 at 16:51
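
One way to approach this is to render each parameter as a SQL literal before substituting it into the query text. The sketch below is a minimal illustration, not a complete defense: it assumes standard SQL quoting (a single quote escaped by doubling it), rejects backslashes and NUL bytes because some engines, Hive among them, treat backslash as an escape character, and accepts only a few simple types. `render_literal` and `render_query` are hypothetical helpers and not part of qds_sdk.

```python
def render_literal(value) -> str:
    """Render a Python value as a SQL literal (standard-SQL quoting assumed)."""
    if value is None:
        return "NULL"
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # Conservative: refuse characters whose handling differs across engines.
        if "\\" in value or "\x00" in value:
            raise ValueError("backslash and NUL are not supported in string parameters")
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


def render_query(template: str, params: dict) -> str:
    """Fill {name} placeholders in template with rendered literals."""
    return template.format(**{key: render_literal(val) for key, val in params.items()})


query = render_query(
    "SELECT * FROM t WHERE name = {name} AND age > {age}",
    {"name": "x' OR '1'='1", "age": 30},
)
print(query)
```

Identifiers such as table or column names cannot be protected this way; an allow-list of known names is the usual approach for those.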

1 vote

1 answer

83 views

Exclude records with certain values in Qubole

Using Qubole I have Table A (columns in json parsed...) ID Recommendation Decision 1 GOOD GOOD 2 BAD BAD 2 GOOD BAD 3 GOOD BAD 4 ...

Kurlito

13

asked Nov 24, 2020 at 8:00

1 vote

2 answers

126 views

How to connect UiPath to Qubole Hive cluster and run a query

One of the teams using RPA in my company wants to automate reporting that is run in Qubole - Hive environment. The initial approach is to unleash the robot to log in to Okta, then Workbench in Qubole, ...

Krystian Duda

11

asked Sep 21, 2020 at 23:55

0 votes

2 answers

320 views

How to get Python in Qubole to save CSV and TXT files to Azure data lake?

I have Qubole connected to Azure data lake, and I can start a spark cluster, and run PySpark on it. However, I can't save any native Python output, like text files or CSVs. I can't save anything other ...

HT.

211

asked Aug 3, 2020 at 19:21

1 vote

2 answers

305 views

Result-set inconsistency between hive and hive-llap

We are using Hive 3.1.x clusters on HDI 4.0, with one being LLAP and the other plain Hive. We've created managed tables on both clusters with the row count being 272409. Before merge on both ...

Vinay K L

45

asked Jul 30, 2020 at 17:51

0 votes

1 answer

477 views

How to change the timeout value when running commands on QDS

I have a spark-submit command that calls my Python script. The code runs for more than 36 hours; however, because of the QDS timeout limit of 36 hours, my command gets killed after 36 hours. Can someone help ...

Trupti

1

asked Jun 17, 2020 at 4:53

0 votes

1 answer

321 views

Logging and Debugging on Qubole

How does one log on Qubole/access logs from Spark on Qubole? The setup I have: Java library (JAR); Zeppelin Notebook (Scala), simply calling a method from the library; Spark, Yarn cluster; Log4j2 used ...

bde.dev

769

asked May 26, 2020 at 6:50

0 votes

1 answer

293 views

Spark Structured Streaming using spark-acid writeStream (with checkpoint) throwing org.apache.hadoop.fs.FileAlreadyExistsException

In our Spark app, we use Spark Structured Streaming. It uses Kafka as the input stream and HiveAcid as the writeStream to a Hive table. HiveAcid comes from an open source library called spark acid from Qubole: ...

Shuwn Yuan Tee

5,748

asked May 22, 2020 at 6:56

2 votes

1 answer

11k views

Pyspark Logging: Printing information at the wrong log level

Thanks for your time! I'd like to create and print legible summaries of my (hefty) data to my output when debugging my code, but stop creating and printing those summaries once finished to speed ...

Amit

41

asked May 13, 2020 at 19:07
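
The excerpt is truncated, so the following does not diagnose the log-level problem itself. It only sketches one way to skip building an expensive summary unless debug logging is enabled, using Python's standard `logging` module. The logger name `summary_demo` and the functions `build_summary` and `process` are hypothetical.

```python
import logging

logger = logging.getLogger("summary_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)

# Counts how often the expensive summary is actually built.
calls = {"count": 0}


def build_summary(data):
    """Stand-in for an expensive summary of a large dataset."""
    calls["count"] += 1
    return f"n={len(data)}, min={min(data)}, max={max(data)}"


def process(data):
    # The guard avoids building the summary when DEBUG is disabled;
    # logger.debug alone would still evaluate its arguments first.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("summary: %s", build_summary(data))
    return sum(data)


total = process([3, 1, 2])
print(total, calls["count"])
```

Raising the logger to `logging.DEBUG` while debugging re-enables the summaries without touching `process`.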

0 votes

1 answer

1k views

Avoid pre-signed URL expiry when IAM role key rotates

In Airflow I have 2 tasks defined that run every day: the first one creates a zip file and saves it in AWS under s3://{bucket-name}/foo/bar/{date}/archive.zip the second one pre-signs that url (...

Maria Livia

75

asked May 12, 2020 at 13:16

0 votes

3 answers

136 views

How to query table partitions list using

I need to programmatically query Qubole for the list of partitions for a Hive table. I can do this by calling the correct API endpoint as described here, but I would like to use the qds-sdk-java ...

GreenGiant

5,196

asked Apr 22, 2020 at 22:44

-1 votes

1 answer

243 views

trying to execute s3-sqs qubole connector for spark structured streaming

I am trying to follow https://github.com/qubole/s3-sqs-connector and load the connector, but it seems the connector is not available on Maven, and while generating the build manually the ...

Dipesh

1

asked Apr 14, 2020 at 10:32

0 votes

1 answer

424 views

Qubole Presto datatype "Map" using the Like Operator

So I am trying to apply a simple like function for a Qubole query on Presto. For a string datatype I can simply do like '%United States of America%'. However for the column I am trying to apply ...

pp2000

35

asked Apr 3, 2020 at 0:03

1 vote

1 answer

329 views

Spark Submit Default Command line options

How can we change the parameters in the Spark Submit Default Command Line Options in Qubole? Although there is an option to override the values if needed under "Spark Submit Command Line Options", this ...

Throw

11

asked Apr 2, 2020 at 6:51

-1 votes

1 answer

89 views

Can I write an HTML script and pass information from the script to a cell on Qubole?

Is it possible to write an HTML script and have the user interact on the HTML script and pass the data back to the zeppelin cell and have it rerun the data passed back? Thank you! Update: Have some ...

Dillon

11

asked Mar 21, 2020 at 20:44

0 votes

1 answer

131 views

How to upgrade Python version on Qubole?

The current version on Qubole is 3.5.3, and some packages, like PyMC3 and future XGBoost need higher versions. How do I upgrade? And would that affect other clusters' settings?

HT.

211

asked Mar 12, 2020 at 1:23

0 votes

1 answer

462 views

Unable to write or read from S3 bucket with Default AWS KMS encryption enabled

I am unable to read or write into a Default AWS KMS encrypted bucket without using the following configuration on my Qubole cluster fs.s3a.server-side-encryption-algorithm=SSE-KMS fs.s3a.server-side-...

Nunna Krishna Teja

1

asked Feb 20, 2020 at 11:54

0 votes

1 answer

219 views

Qubole Kinesis Connector for Spark structured streaming throws an error

We are using Qubole Kinesis Connector (jar) for Spark structured streaming. This used to work fine but suddenly, it is throwing an error "S3 filesystem not found". We could use the KCL but we need ...

Lightning-Analytics

11

asked Feb 14, 2020 at 1:38

0 votes

2 answers

69 views

Rest api in testdrive account?

Hi, I am using the Qubole trial version. It is a test drive account, so I am not getting an API token from the My Accounts tab of the control panel in Qubole. Is there a way to access the REST APIs now? Thanks in advance.

sai Kumar

43

asked Feb 4, 2020 at 6:30

0 votes

2 answers

374 views

Running Scala jobs in Scheduler

My job runs fine in my notebook, but when I copy and paste the script into the Spark Scala scheduled job, I run into errors like "script.scala:15: error: not found: value sqlContext". What do I need ...

Paul Mineau

11

asked Jan 7, 2020 at 21:43

0 votes

1 answer

85 views

PySpark Machine Learning on Wide Data in Qubole

I have a large dataset, with roughly 250 features, that I would like to use in a gradient-boosted trees classifier. I have millions of observations, but I'm having trouble getting the model to work ...

ErrorJordan

641

asked Jan 2, 2020 at 18:33

0 votes

1 answer

95 views

Setting up AWS Glue to crawl Qubole

Currently I work with Qubole to access Hive data. I've added metadata from several databases, and want to add all the Hive metadata to AWS Glue. Is this possible? Any help is appreciated.

Ash_s94

807

asked Dec 23, 2019 at 19:07

0 votes

1 answer

109 views

Scale plot size of matplotlib plots in Qubole Notebook

Is there a way to increase the size of a plot drawn with z.showplot() in Qubole notebooks? import matplotlib as plt plt.figure() plt.bar(pandas_df_hr_sg[:]['hour'],pandas_df_hr_sg[:]['...

Mustajib Mohammed Khan

1

asked Dec 14, 2019 at 13:18

0 votes

2 answers

265 views

How do I upgrade a library in Qubole's Jupyter Notebook, using PySpark?

Is there a way to do it right from a cell in the notebook? similar to pip install ... --upgrade I didn't know how to do what's instructed on https://docs.qubole.com/en/latest/faqs/general-questions/...

HT.

211

asked Dec 6, 2019 at 16:28

0 votes

1 answer

177 views

How to pass --properties-file to spark-submit in Qubole?

I am using Spark in Qubole by having the clusters created in AWS. In Qubole Workbench, when I execute the below Command Line, it works fine and the command is successful /usr/lib/spark/bin/spark-...

Saravanan

49

asked Nov 27, 2019 at 12:36

0 votes

2 answers

161 views

How to import a .py file to Qubole?

I'm connecting to Azure data lake, and I have the file there, but it's in a different path, and I don't know how to import it. Thank you in advance for your help!

HT.

211

asked Nov 25, 2019 at 15:20

0 votes

1 answer

52 views

In the new Analyze UI, how do I edit the title of my query?

In the new Qubole Analyze UI that came out recently, I cannot seem to find a way to change the title of a command. In the old interface, I could click on the command title and it would become an ...

GreenGiant

5,196

asked Nov 18, 2019 at 17:09

1 vote

1 answer

692 views

How to create hive external table with avro file on qubole?

Can someone point me to the docs for creating an external table on Qubole based on Avro files? CREATE TABLE my_table_name ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS ...

user10714010

935

asked Oct 25, 2019 at 1:25

0 votes

1 answer

129 views

Performance analysis using Sparklens of Spark Streaming Application

I am trying to get performance analysis of a spark streaming application using sparklens. It is giving results like this Executor count 1 ( 80%) estimated time 01m 29s and estimated cluster ...

Abhay

697

asked Aug 2, 2019 at 16:30

0 votes

0 answers

904 views

How to fix 'Malformed class name' error in Spark Scala?

In a Qubole notebook I am trying to get a certain string from an API response. It seems to be working just fine for sample data but fails when I use the full set. Spark version: 2.3.1; Scala version: 2.11; ...

Piotr

1

asked Jul 18, 2019 at 13:49

1 vote

2 answers

201 views

Implement case class inside a class

I am using the below code to run in Qubole Notebook and the code is running successfully. case class cls_Sch(Id:String, Name:String) class myClass { implicit val sparkSession = org.apache.spark....

Sarath Subramanian

21.2k

asked Jul 10, 2019 at 14:17

1 vote

1 answer

741 views

Extracting json field from string in Hive using dataset

I am trying a very basic Hive query. I am trying to extract a JSON field from a dataset, but I always get \N for the JSON field, while some_string comes through fine. Here is my query: WITH dataset AS ...

Bhavya Arora

800

asked May 30, 2019 at 18:25

0 votes

1 answer

80 views

retrieve size of data copied with hadoop distcp

I am running a hadoop distcp command as below: hadoop distcp src-loc target-loc I want to know the size of the data copied by running this command. I am planning to run the command on Qubole. Any ...

sneha salvi

57

asked May 16, 2019 at 23:41

2 votes

1 answer

6k views

How to create external tables from parquet files in s3 using hive 1.2?

I have created an external table in Qubole(Hive) which reads parquet(compressed: snappy) files from s3, but on performing a SELECT * table_name I am getting null values for all columns except the ...

S.Mehra

56

asked May 15, 2019 at 20:21

1 vote

1 answer

73 views

Get Qubole data row wise using java

I am trying to run a Hive query using the Qubole SDK. Though I am able to get the desired result as a string, in order to process it better I am looking to access it row-wise. Something like a list of java ...

roger_that

9,781

asked Apr 18, 2019 at 10:09

1 vote

1 answer

75 views

Recommendation on Performance optimization for SQL code

I have a code in Qubole that's taking almost 3 hours to execute. I am looking for some recommendations to decrease the code execution time. WITH -- Get latest date - 10 days before as day d AS ( ...

Flash

11

asked Apr 11, 2019 at 17:09

1 vote

1 answer

520 views

Syncing Qubole Hive table to Snowflake with Struct field

I have a table like the following in Qubole: use dm; CREATE EXTERNAL TABLE IF NOT EXISTS fact ( id string, fact_attr struct< attr1 : String, attr2 : String > ) STORED AS ...

Ambrish

3,667

asked Oct 23, 2018 at 8:15

1 vote

2 answers

200 views

Different results when distinct count by different time periods

I am trying to get a count of unique visitors. I first checked the total without separating it by any time frame. Main table (big data table sample): +-----------+----+-------+ |theDateTime|vD | ...

noobeerp

427

asked Oct 13, 2018 at 18:15
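
One frequent reason for this kind of mismatch is that distinct counts are not additive: a visitor who appears in two periods is counted once overall but once in each period. Whether that explains this particular case is an assumption, since the excerpt is truncated. A small sketch with hypothetical visit data:

```python
# Hypothetical visits as (date, visitor_id) pairs.
visits = [
    ("2018-10-01", "a"),
    ("2018-10-01", "b"),
    ("2018-10-02", "a"),  # visitor "a" returns on a second day
    ("2018-10-02", "c"),
]

# Distinct visitors over the whole range.
overall = len({visitor for _, visitor in visits})

# Distinct visitors per day, then summed across days.
per_day = {}
for day, visitor in visits:
    per_day.setdefault(day, set()).add(visitor)
summed = sum(len(visitors) for visitors in per_day.values())

print(overall, summed)
```

The summed per-day figure can never be smaller than the overall figure, and it exceeds it whenever any visitor appears in more than one period.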

1 vote

1 answer

1k views

Big files causing shuffle error in hadoop map reduce

I am seeing the following error when I try to process big files (size > 35GB), but it doesn't happen with smaller files (size < 10GB). App > Error: org.apache.hadoop.mapreduce....

Jal

2,292

asked Oct 8, 2018 at 18:18

0 votes

1 answer

208 views

Get correct value from array in Hive QL

I have a Wrapped Array and want to only get the corresponding value struct when I query with LATERAL VIEW EXPLODE. SAMPLE STRUCTURE: COLUMNNAME: theARRAY WrappedArray([null,theVal,valTags,[123,...

noobeerp

427

asked Sep 26, 2018 at 0:49

2 votes

1 answer

137 views

Debug failed shuffles in hadoop map reduces

As the size of the input file increases, I am seeing failed shuffles increase and job completion time increase non-linearly, e.g. 75GB took 1h and 86GB took 5h. I also see average shuffle time increase 10 ...

Jal

2,292

asked Sep 21, 2018 at 18:03

0 votes

0 answers

61 views

Convert column in presto from epoch to date [duplicate]

I tried this but that didn't work. cast(from_unixtime('1532568232662880')) as date Any other ideas?

Nick Knauer

4,233

asked Aug 30, 2018 at 18:59
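
Two things stand out in the quoted expression: the `as date` sits outside the `cast(...)` parentheses, and the 16-digit value looks like microseconds rather than seconds (an assumption based on its magnitude), whereas Presto's `from_unixtime` expects a numeric count of seconds. The conversion can be checked in Python; `micros_to_date` is a hypothetical helper and UTC is assumed.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as a timezone-aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_date(epoch_micros: int):
    """Convert microseconds since the Unix epoch to a UTC calendar date."""
    # timedelta keeps integer microseconds exact, avoiding float rounding.
    return (EPOCH + timedelta(microseconds=epoch_micros)).date()


print(micros_to_date(1532568232662880))
```

In a query, the equivalent step would be to divide the value by 1,000,000 before converting it, and to pass it as a number rather than a quoted string.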
