
All Questions

0 votes
0 answers
33 views

How to transfer data using Zero-ETL to a writable DB in Redshift

I'm using Zero-ETL to move my data from Aurora to Redshift, but this lands the data in a read-only database. How do I then move my data to a full-access database? I have tried creating a materialized ...
Alexander Hernandez
0 votes
0 answers
197 views

Use AWS Glue to execute SQL on Redshift

I have a set of ETL queries and am trying to create an AWS Glue job to run against a Redshift cluster. I have looked at pre/post actions, but that doesn't seem to match my use case as the queries ARE ...
Aaron Luman
0 votes
1 answer
239 views

psycopg2.OperationalError: connection to server at "default-workgroupxxx.redshift-serverlessx" (172.31.1.60), port 5439 failed: Connection timed out

I am trying to load data to AWS Redshift Serverless using Python's psycopg2 #from warehouseInterface import Warehouse import boto3 from dotenv import load_dotenv import os import psycopg2 #any file ...
Max • 23
0 votes
1 answer
823 views

How to import csv file with double quotes in a string from S3 to Redshift using copy command?

I am trying to import a file in CSV format from S3 into Redshift. The failure reported in stl_load_errors is Invalid quote formatting for CSV. Unfortunately I can't control the source it comes from, so I am ...
cmos • 1
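For quote-formatting errors like the one above, the issue usually comes down to escaping: Redshift's CSV format expects embedded quotes to be doubled (`""`) per RFC 4180. A minimal local pre-check with Python's csv module (the data and column count below are hypothetical) can flag malformed rows before the load:

```python
import csv
import io

def find_bad_rows(text, expected_cols):
    """Yield (line_number, row) for rows whose column count is off,
    which usually indicates a stray quote or unquoted delimiter."""
    reader = csv.reader(io.StringIO(text))
    for lineno, row in enumerate(reader, start=1):
        if len(row) != expected_cols:
            yield lineno, row

# A properly escaped embedded quote is doubled ("") per RFC 4180:
good = 'id,desc\n1,"to handle ""cloud"" tasks"\n'
rows = list(csv.reader(io.StringIO(good)))
# rows[1] parses cleanly to ['1', 'to handle "cloud" tasks']
```

Rows flagged by the helper are the ones worth fixing (or routing around with COPY's MAXERROR option) before loading.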
1 vote
1 answer
684 views

Implementing SCD2 in dbt on AWS Redshift, how do I define conditional natural keys?

Implementing SCD2 in dbt on AWS Redshift, how do I define conditional natural keys? unique_id = ['crm_id', 'curr_recrd_flg', 'actve_flg'] I want to provide conditions like curr_recrd_flg = 'Y' and ...
MOT • 91
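dbt's built-in unique test has no conditional form; one common workaround is to derive a key column that is populated only for current, active rows and test uniqueness on that. A plain-Python sketch of the idea, using the hypothetical column names from the question:

```python
def effective_key(row):
    """Sketch: populate the tested key only for current, active rows,
    so historical SCD2 versions are exempt from the uniqueness check."""
    if row["curr_recrd_flg"] == "Y" and row["actve_flg"] == "Y":
        return row["crm_id"]
    return None  # historical/inactive rows carry no key

rows = [
    {"crm_id": 1, "curr_recrd_flg": "Y", "actve_flg": "Y"},
    {"crm_id": 1, "curr_recrd_flg": "N", "actve_flg": "Y"},
]
keys = [effective_key(r) for r in rows]
# Only the current row carries a key, so duplicate crm_ids in history don't fail.
```

In dbt the same effect is usually achieved with a derived column in the model plus a `unique` test on that column, or a `where` config on the test.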
0 votes
1 answer
549 views

Is there any way to skip an invalid record and continue with the next one when using the COPY command to load data from S3 to Redshift?

I have a '.csv' file in S3 that has a lot of text data. I am trying to upload the data from S3 to a Redshift table but my data is not consistent; it has a lot of special characters. Some records may ...
Ashutosh kumar
0 votes
1 answer
136 views

Updating Tables in an AWS Redshift Data Warehouse

Background: We use fivetran to ingest data from multiple sources into AWS Redshift. We've got ETL scripts which we run on top of these tables to create other more relevant tables. Furthermore, we've ...
AIViz • 102
0 votes
2 answers
688 views

Data Ingestion in Amazon Redshift

I have multiple data sources from which I need to build and implement a DWH in AWS. I have one challenge with respect to one of my unstructured data sources (data coming from different APIs). How can I ...
KeenLearner
-1 votes
1 answer
355 views

Redshift Data Transformation

What is the best way to transform data in Redshift? Like creating a stored procedure that transforms data within the same schema. My background is Oracle using PL/SQL and I used to create functions ...
Ian Medina Torreverde
2 votes
1 answer
422 views

Unable to connect to AWS redshift via Meltano

Can anyone help me with an issue regarding Meltano? I want to perform EL (Extract, Load): extract data from Shopify and load it into AWS Redshift. To extract data ...
Zaki Khurshid
0 votes
0 answers
3k views

Redshift - Insert data from pandas dataframe using Redshift Data API

I'm trying to load data from a pandas data frame into a Redshift cluster using AWS Lambda. I can't use a connector with the Redshift endpoint URL because the current VPC setup doesn't allow ...
Cristobal Sarome
3 votes
1 answer
3k views

Getting "invalid quote formatting for csv" error while copying data from S3 to Redshift

My csv file uploaded to S3 is having data as below: id,name,post,description,salary 10001,ajay,jr.Engg,"to handle ""cloud related tasks",100000 10002,bimal,sr.engg,"to handle \...
DataEngineer
0 votes
2 answers
636 views

How to load and transform data from one table to another table in Amazon Redshift?

I am loading data from S3 into a Redshift database and now have a requirement to perform ETL on that table: after filtering the data, load it into another table in another schema in Redshift. How can I load ...
Mohammad Azam
1 vote
0 answers
207 views

What is a proper way to write an Airflow Sensor on top of Redshift Spectrum with option to skip partitions

I have an hourly Airflow ETL that relies on some data in Redshift Spectrum (with Glue as the metastore). I want this ETL to wait for the full data to be available, so I have a SQL sensor that waits for ...
Ivan Rubanau
0 votes
1 answer
510 views

Create dynamic string for IN clause

We are using Matillion ETL API to pass query parameters to the underlying Redshift query. A query parameter variable for States is created using concatenation from the dropdown list and passed in the ...
Gaurav S • 1,009
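The concatenation approach in the question can be sketched as a small helper. This assumes the values come from a trusted, fixed dropdown list (as described above); for free-form user input, parameterized queries are the safer route:

```python
def build_in_clause(values):
    """Build a quoted IN-list string from dropdown selections.
    Single quotes inside values are doubled, the standard SQL escape."""
    quoted = ", ".join("'{}'".format(v.replace("'", "''")) for v in values)
    return "({})".format(quoted)

clause = build_in_clause(["NY", "CA", "TX"])
# -> "('NY', 'CA', 'TX')", ready for: SELECT ... WHERE state IN ('NY', 'CA', 'TX')
```

The resulting string can then be passed as the query-parameter variable that Matillion interpolates into the underlying Redshift query.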
0 votes
0 answers
202 views

Dumping from NiFi to Redshift using the PutDatabaseRecord processor is really slow

I am new to both Redshift and Apache NiFi. I am trying to dump a CSV record (~3000 rows per record) every 15 minutes into a Redshift table over a JDBC connection. The PutDatabaseRecord processor is ...
Amith J Madathil
0 votes
1 answer
122 views

ETL with Talend + AWS Redshift?

Background: I want to build the ETL with the drag-and-drop function in Talend. Question: Can Talend compose an ETL built with drag and drop into Redshift SQL, and run ...
SamMeow • 374
0 votes
2 answers
2k views

AWS S3: How to plug in a dynamic file name in the S3 directory in COPY command

I have a job in Redshift that is responsible for pulling 6 files every month from S3. File names follow a standard naming convention, "file_label_MonthNameYYYY_Batch01.CSV". I'd like to ...
SusanD • 141
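Since COPY takes a literal S3 path, the dynamic part is usually built in the orchestration layer and interpolated into the command. A sketch with Python's strftime, using the naming convention quoted in the question (the label and batch suffix are placeholders):

```python
from datetime import date

def monthly_key(run_date, label="file_label", batch="Batch01"):
    """Build the monthly file name, e.g. file_label_January2023_Batch01.CSV.
    %B gives the full English month name under the default C locale."""
    return "{}_{}_{}.CSV".format(label, run_date.strftime("%B%Y"), batch)

key = monthly_key(date(2023, 1, 15))
# The key can then be spliced into: COPY ... FROM 's3://my-bucket/<key>' ...
```

COPY also matches S3 keys by prefix, so pointing it at everything up to the month token is another option when all matching files should load together.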
1 vote
1 answer
353 views

Redshift Alter table command returns `Target table and source table attributes don't match.`

I have an Airflow pipeline which creates a staging table from an existing table, loads data into it from a CSV, and then executes the following ALTER command: ALTER TABLE "schema"."...
Madhur Kerni
3 votes
1 answer
207 views

Using sortkeys and compression for a staging table with Redshift

Does it make sense to add sortkeys and compression to staging tables which are truncated daily in Redshift if target tables already have them? Does it make any difference when you're performing ...
deepvalue
0 votes
0 answers
170 views

Best Practice to load single fact table from multiple sources in Redshift

Is there a best practice for loading fact data from multiple sources in Redshift? Redshift doesn't have a partitioning concept; it does, however, have distribution. With distribution, one ...
Shaounak Nasikkar
1 vote
1 answer
944 views

How many temporary/staging tables to use during the transform step of ETL?

My first thought is to first load data from S3 to a temporary table, apply the necessary transformations and then INSERT INTO target, final table. All the tables would have the same columns and are in ...
deepvalue
0 votes
0 answers
457 views

Export data from RocksDB to S3 for ETL purposes

My new project uses RocksDB as storage. I would like to run an analysis query on the data in the RocksDB table daily, and it will need to scan the entire table. I would like to export the data from ...
Iris Li • 11
3 votes
1 answer
647 views

Invalid syntax: Create table sortkey auto with initial sortkeys

I'm trying to use target-redshift to push data to AWS Redshift (https://pypi.org/project/target-redshift/). I am using Airflow to monitor ETL status. This is the error log and I have no clue what it means. ...
rojer_1 • 55
5 votes
1 answer
5k views

How to create a Redshift table using Glue Data Catalog

I'm developing ETL pipeline using AWS Glue. So I have a csv file that is transformed in many ways using PySpark, such as duplicate column, change data types, add new columns, etc. I ran a crawler with ...
tikirin tukurun
2 votes
1 answer
2k views

Upload data to Redshift with PySpark

I have a script written in PySpark. What I'm trying to do is read a *.csv file from an S3 bucket in AWS using PySpark. I create a DataFrame with all the data, select all the columns I need, and cast them to the types my ...
4d61726b • 487
1 vote
2 answers
2k views

ETL + sync data between Redshift and DynamoDB

I need to aggregate data coming from DynamoDB into AWS Redshift, and I need to be accurate and in-sync. For the ETL I'm planning to use DynamoDB Streams, a Lambda transform, and Kinesis Firehose to, finally, ...
Courier • 63
2 votes
2 answers
934 views

ELT pipeline for Mongo

I am trying to get my data into Amazon Redshift using Fivetran, but have some questions in general about the ELT/ETL process. My source database is Mongo but I want to perform deep analysis on the ...
joethemow • 1,773
0 votes
0 answers
181 views

SQL to UNION history table with current table (with date range aligned)

Hi, I have the 2 tables below. table_history has columns (version, from_date, to_date, ID, Place): 1, 1900-01-01 00:00:00, 2020-07-08 10:00:49, 123, Delhi; 2, 2020-07-08 10:...
user14464843
0 votes
1 answer
679 views

Building OLTP DB from Datalake?

I'm confused and having trouble finding examples and reference architectures where someone wants to extract data from an existing data lake (S3/Lake Formation in my case) and build an OLTP datastore that ...
wishiwasabigdataguy
0 votes
1 answer
275 views

How do I avoid memory error in EC2 when loading a huge table in Pandas dataframe?

I tried to connect to Redshift and load my huge fact table into a pandas dataframe like below, and I always encounter a memory error when I execute the script. I am thinking either loading by chunk ...
Henry • 23
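For out-of-memory loads like the one above, fetching in fixed-size batches keeps only one chunk resident at a time (the same idea as pandas' chunksize parameter). A sketch using cursor.fetchmany, demonstrated against an in-memory SQLite table standing in for the Redshift fact table:

```python
import sqlite3

def fetch_in_chunks(cursor, query, chunk_size=50000):
    """Generator yielding rows in fixed-size batches so only one
    batch is held in memory at a time."""
    cursor.execute(query)
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        yield rows

# Demo: a tiny stand-in table; in the real case the cursor would come
# from a Redshift connection (psycopg2 uses the same cursor API).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE fact (id INTEGER)")
cur.executemany("INSERT INTO fact VALUES (?)", [(i,) for i in range(10)])
batches = list(fetch_in_chunks(cur, "SELECT id FROM fact ORDER BY id", chunk_size=4))
# 10 rows arrive as batches of 4, 4, and 2
```

Each batch can be processed and discarded before the next is fetched, so peak memory is bounded by chunk_size rather than table size.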
1 vote
0 answers
430 views

KeyError: '_id' CRITICAL ('Exception writing records', KeyError('_id')) Singer.io

I got this error while doing ELT from a Mongo tap to Redshift. When I simply extract data from MongoDB to CSV it works fine, but when piping the output to Redshift I get this _id error. Not sure ...
Waleed Arshad
0 votes
1 answer
4k views

Redshift loading data issue: Specified types or functions (one per INFO message) not supported on Redshift tables

SELECT s.store_id AS store_key, s.store_id, a.address, a.address2, a.district, c.city, co.country, a.postal_code, st.first_name AS ...
wyn • 141
1 vote
1 answer
211 views

How do you design an OLAP system to power a dashboard of hourly (or even more granular) API usage stats?

For background, I collect API usage logs (request, response, latency, userId, etc) for an application. A typical day will accumulate 200-300 million records. This data is currently stored on s3 in ...
Matt H • 409
1 vote
0 answers
7k views

How to transform a debezium message in JSON format such that it can be loaded into Redshift

I need help to achieve a few things. I have created a data pipeline as below: MySQL --> Debezium --> Kafka --> Kafka Connect --> AWS S3. Now S3 will have a Debezium event message in JSON format. ...
user2322440
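A common first step for this pipeline is flattening the Debezium envelope so each record becomes a plain row plus change metadata that Redshift can load. A sketch assuming the standard Debezium JSON envelope fields (payload.after, payload.op, payload.ts_ms):

```python
import json

def flatten_debezium(message):
    """Extract the post-image from a Debezium change event and attach
    the operation type and timestamp as metadata columns."""
    payload = json.loads(message)["payload"]
    row = dict(payload["after"] or {})  # deletes have after=None
    row["_op"] = payload["op"]          # c=create, u=update, d=delete
    row["_ts_ms"] = payload["ts_ms"]
    return row

msg = json.dumps({"payload": {"before": None,
                              "after": {"id": 1, "name": "alice"},
                              "op": "c", "ts_ms": 1700000000000}})
row = flatten_debezium(msg)
```

The flattened rows can then be written out as JSON lines or CSV for COPY; the `_op` column lets a downstream merge step apply updates and deletes.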
0 votes
1 answer
529 views

What would be a good approach to migrate Aurora data into a Redshift DWH?

We need to move and consolidate data from various Aurora databases into a Redshift one. Since our endpoints are AWS services we are learning about Glue, Pipeline and also Matillion. Is it Glue ...
Aleix • 451
0 votes
1 answer
510 views

AWS Glue check file contents correctness

I have a project in AWS to insert data from some files, which will be in S3, to Redshift. The point is that the ETL has to be scheduled each day to find new files in S3 and then check if those files ...
user2728349
0 votes
1 answer
50 views

Select Date1, Date2 From Table while in Redshift

What is the right syntax for creating a transaction based on a date range? For example, this is my data set. Table: DocID, Date1, Date2; row: 0001, 2020-01-01, 2020-01-03. And this is what I ...
Ren • 13
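In Redshift this kind of range expansion is usually done by joining a calendar (date dimension) table on BETWEEN Date1 AND Date2. The intended output shape, one row per day in the range, can be sketched in Python:

```python
from datetime import date, timedelta

def expand_range(doc_id, start, end):
    """One (doc_id, day) row per calendar day between start and end, inclusive."""
    days = (end - start).days
    return [(doc_id, start + timedelta(days=i)) for i in range(days + 1)]

rows = expand_range("0001", date(2020, 1, 1), date(2020, 1, 3))
# three rows: 2020-01-01, 2020-01-02, 2020-01-03
```

The equivalent SQL join against a calendar table produces the same rows set-wise, which scales better in Redshift than row-by-row generation.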
2 votes
1 answer
656 views

Inserting data into one table from second table on multiple join condition

I am trying to insert data in one table (trgt_tbl) from a second table (src_tbl) using join on key fields. The query seems to work fine but it's extremely slow. There are around 16 mil records in ...
Prashant Rai
3 votes
2 answers
2k views

How to transform data from S3 bucket before writing to Redshift DW?

I'm creating a (modern) data warehouse in redshift. All of our infrastructure is hosted at Amazon. So far, I have setup DMS to ingest data (including changed data) from some tables of our business ...
Henrique Miranda
0 votes
1 answer
3k views

Invalid timestamp format in Redshift COPY command

I have tried almost every solution from SO but still have the same issue. I have a CSV file in S3 and a table in Redshift. The table structure is as below: like_id => integer p_id => integer c_id => ...
Muhammad Hashir Anwaar
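COPY rejects rows whose timestamp strings don't match the table's expectations (adjustable with COPY's TIMEFORMAT option, including TIMEFORMAT 'auto'). Pre-validating the file against the format you intend to declare, sketched here with strptime, shows which values would fail; the sample values are hypothetical:

```python
from datetime import datetime

def bad_timestamps(values, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the values that don't parse under the given format,
    i.e. the rows a load with a matching TIMEFORMAT would reject."""
    bad = []
    for v in values:
        try:
            datetime.strptime(v, fmt)
        except ValueError:
            bad.append(v)
    return bad

rejects = bad_timestamps(["2020-01-01 10:00:00", "01/02/2020 10:00"])
# only the second value fails the default ISO-like format
```

If the rejects all share one alternative pattern, declaring that pattern in TIMEFORMAT is usually simpler than rewriting the file.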
0 votes
1 answer
1k views

How to deal with line breaks in a Redshift load?

I have a CSV which has line breaks in one of the columns. I get the error Delimiter not found. If I rewrite the text as continuous, without line breaks, then it works. But how do I deal with line breaks ...
karthikeya akula
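"Delimiter not found" typically means the file is being loaded with the plain DELIMITER format, which treats every newline as a row boundary; COPY's CSV format instead accepts line breaks inside quoted fields. Python's csv module follows the same convention, which makes for a quick local illustration (hypothetical data):

```python
import csv
import io

# A quoted field may legally contain a line break; the parser keeps
# it inside the field instead of starting a new row.
data = 'id,notes\n1,"first line\nsecond line"\n2,plain\n'
rows = list(csv.reader(io.StringIO(data)))
# 3 rows total: header, the multi-line record, and the plain record
```

So the usual fix is to ensure the multi-line fields are quoted and to add the CSV option to the COPY command, rather than stripping the line breaks.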
0 votes
1 answer
587 views

How does the OVERWRITE_EXISTING insert mode work in RedshiftCopyActivity for AWS Data Pipeline?

I am new to AWS Data Pipeline. We have a use case where we copy updated data into Redshift. I wanted to know whether I can use the OVERWRITE_EXISTING insert mode for RedshiftCopyActivity. Also, please ...
SIMRAN MEHTA
3 votes
1 answer
980 views

Does Node-Redshift support the COPY command (query) to load data from S3 to Redshift?

I wanted to know whether the "node-redshift" module supports the COPY FROM query to bulk-load data from an S3 bucket into Redshift. If not, what other options can I use to connect to Redshift and use ...
RST234 • 31
3 votes
1 answer
2k views

Migrating Data From Amazon Redshift into Aurora DB

There are tons of examples to migrate data from Aurora DB to Redshift, but I couldn't find any example or documentation for migrating data from Redshift to Aurora DB. Any suggestion/example/doc for ...
The CoBe
1 vote
1 answer
221 views

How to model S3 storage for query using AWS RedShift Spectrum

There is a users table in a MySQL database. We want to migrate the data into Amazon S3 for further analysis using Amazon Redshift. Day 1 - Export 10 rows of data from the users table (Total Rows: ...
r123 • 618
-1 votes
2 answers
337 views

Is 100-200 upserts and inserts in a 10-second window into a 3-node Redshift cluster a realistic architecture?

On a 3-node Redshift cluster we plan on doing 50-100 inserts every 10 seconds. Within that 10-second window we will also try to do the equivalent of a Redshift upsert as documented here: https://docs....
Brian Yeh • 3,227
0 votes
1 answer
135 views

Query giving wrong results. Any idea why this is happening?

The query is not returning any rows despite running successfully in ETL Manager. Any idea why this is happening? In case you think the first two temp tables might contain zero results, I can assure ...
RaphX • 103
1 vote
0 answers
3k views

Redshift: run multiple queries in parallel

I have 20 ETL scripts with multiple statements, and I have to run them all in one go (that is, in parallel) in Redshift. I'm doing the following: I have created a main.sql file with ...
Aneela Saleem Ramzan
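A single database session runs statements serially, so the parallelism has to come from the client side, with one connection per worker and subject to the cluster's WLM concurrency limits. A sketch with ThreadPoolExecutor; run_script is a placeholder for code that would open a connection and execute one file's statements:

```python
from concurrent.futures import ThreadPoolExecutor

def run_script(path):
    """Placeholder: in the real job this opens its own Redshift
    connection and executes the statements in the given SQL file."""
    return f"ran {path}"

scripts = [f"etl_{i}.sql" for i in range(20)]

# Cap workers below the cluster's WLM query-slot limit so the
# scripts run concurrently instead of queueing.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(run_script, scripts))
```

pool.map preserves input order in the results, so failures can be traced back to the originating script file.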
3 votes
3 answers
10k views

Ingesting Google Analytics data into S3 or Redshift [closed]

I am looking for options to ingest Google Analytics data (historical data as well) into Redshift. Any suggestions regarding tools or APIs are welcome. I searched online and found Stitch as one of ...
Prajakta Yerpude