
All Questions

0 votes
0 answers
33 views

How to transfer data using Zero-ETL to a writable DB in Redshift

I'm using Zero-ETL to move my data from Aurora to Redshift, but this lands the data in a read-only database. How do I then move my data to a full-access database? I have tried creating a materialized ...
Alexander Hernandez
0 votes
0 answers
197 views

Use AWS Glue to execute SQL on Redshift

I have a set of ETL queries and am trying to create an AWS Glue job to run against a Redshift cluster. I have looked at pre/post actions, but that doesn't seem to match my use case as the queries ARE ...
Aaron Luman
0 votes
1 answer
239 views

psycopg2.OperationalError: connection to server at "default-workgroupxxx.redshift-serverlessx" (172.31.1.60), port 5439 failed: Connection timed out

I am trying to load data to AWS Redshift Serverless using Python's psycopg2 #from warehouseInterface import Warehouse import boto3 from dotenv import load_dotenv import os import psycopg2 #any file ...
Max • 23
0 votes
1 answer
823 views

How to import csv file with double quotes in a string from S3 to Redshift using copy command?

I am trying to import a file in CSV format from S3 into Redshift. The failure reported in stl_load_errors is Invalid quote formatting for CSV. Unfortunately I can't control the source it comes from, so I am ...
cmos • 1
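For quote-formatting errors like the one above, the issue usually comes down to escaping: Redshift's CSV format expects embedded quotes to be doubled (`""`) per RFC 4180. A minimal local pre-check with Python's csv module (the data and column count below are hypothetical) can flag malformed rows before the load:

```python
import csv
import io

def find_bad_rows(text, expected_cols):
    """Yield (line_number, row) for rows whose column count is off,
    which usually indicates a stray quote or unquoted delimiter."""
    reader = csv.reader(io.StringIO(text))
    for lineno, row in enumerate(reader, start=1):
        if len(row) != expected_cols:
            yield lineno, row

# A properly escaped embedded quote is doubled ("") per RFC 4180:
good = 'id,desc\n1,"to handle ""cloud"" tasks"\n'
rows = list(csv.reader(io.StringIO(good)))
# rows[1] parses cleanly to ['1', 'to handle "cloud" tasks']
```

Rows flagged by the helper are the ones worth fixing (or routing around with COPY's MAXERROR option) before loading.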
1 vote
1 answer
684 views

Implementing SCD2 in dbt on AWS Redshift, how do I define conditional natural keys?

Implementing SCD2 in dbt on AWS Redshift, how do I define conditional natural keys? unique_id = ['crm_id', 'curr_recrd_flg', 'actve_flg'] I want to provide conditions like curr_recrd_flg = 'Y' and ...
MOT • 91
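dbt's built-in unique test has no conditional form; one common workaround is to derive a key column that is populated only for current, active rows and test uniqueness on that. A plain-Python sketch of the idea, using the hypothetical column names from the question:

```python
def effective_key(row):
    """Sketch: populate the tested key only for current, active rows,
    so historical SCD2 versions are exempt from the uniqueness check."""
    if row["curr_recrd_flg"] == "Y" and row["actve_flg"] == "Y":
        return row["crm_id"]
    return None  # historical/inactive rows carry no key

rows = [
    {"crm_id": 1, "curr_recrd_flg": "Y", "actve_flg": "Y"},
    {"crm_id": 1, "curr_recrd_flg": "N", "actve_flg": "Y"},
]
keys = [effective_key(r) for r in rows]
# Only the current row carries a key, so duplicate crm_ids in history don't fail.
```

In dbt the same effect is usually achieved with a derived column in the model plus a `unique` test on that column, or a `where` config on the test.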
0 votes
1 answer
549 views

Is there any way to skip an invalid record and continue with the next one when using the COPY command to load data from S3 to Redshift?

I have a '.csv' file in S3 that has a lot of text data. I am trying to upload the data from S3 to a Redshift table but my data is not consistent; it has a lot of special characters. Some records may ...
Ashutosh kumar
0 votes
1 answer
136 views

Updating Tables in an AWS Redshift Data Warehouse

Background: We use fivetran to ingest data from multiple sources into AWS Redshift. We've got ETL scripts which we run on top of these tables to create other more relevant tables. Furthermore, we've ...
AIViz • 102
0 votes
2 answers
688 views

Data Ingestion in Amazon Redshift

I have multiple data sources from which I need to build and implement a DWH in AWS. I have one challenge with respect to one of my unstructured data sources (data coming from different APIs). How can I ...
KeenLearner
-1 votes
1 answer
355 views

Redshift Data Transformation

What is the best way to transform data in Redshift? Like creating a stored procedure that transforms data within the same schema. My background is Oracle using PL/SQL and I used to create functions ...
Ian Medina Torreverde
2 votes
1 answer
422 views

Unable to connect to AWS redshift via Meltano

Can anyone help me with an issue regarding Meltano? I want to perform EL (Extract, Load): extract data from Shopify and load it into AWS Redshift. To extract data ...
Zaki Khurshid
0 votes
0 answers
3k views

Redshift - Insert data from pandas dataframe using Redshift Data API

I'm trying to load data from a pandas data frame into a Redshift cluster using AWS Lambda. I can't use a connector with the Redshift endpoint URL because the current VPC setup doesn't allow ...
Cristobal Sarome
3 votes
1 answer
3k views

Getting "invalid quote formatting for csv" error while copying data from S3 to Redshift

My csv file uploaded to S3 is having data as below: id,name,post,description,salary 10001,ajay,jr.Engg,"to handle ""cloud related tasks",100000 10002,bimal,sr.engg,"to handle \...
DataEngineer
0 votes
2 answers
636 views

How to load and transform data from one table to another table in Amazon Redshift?

I am loading data from S3 into a Redshift database and now have a requirement to perform ETL on that table: after filtering the data, load it into another table in another schema in Redshift. How can I load ...
Mohammad Azam
1 vote
0 answers
207 views

What is a proper way to write an Airflow Sensor on top of Redshift Spectrum with option to skip partitions

I have an hourly Airflow ETL that relies on some data in Redshift Spectrum (with Glue as the metastore). I want this ETL to wait for the full data to be available, so I have a SQL sensor that waits for ...
Ivan Rubanau
0 votes
1 answer
510 views

Create dynamic string for IN clause

We are using Matillion ETL API to pass query parameters to the underlying Redshift query. A query parameter variable for States is created using concatenation from the dropdown list and passed in the ...
Gaurav S • 1,009
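The concatenation approach in the question can be sketched as a small helper. This assumes the values come from a trusted, fixed dropdown list (as described above); for free-form user input, parameterized queries are the safer route:

```python
def build_in_clause(values):
    """Build a quoted IN-list string from dropdown selections.
    Single quotes inside values are doubled, the standard SQL escape."""
    quoted = ", ".join("'{}'".format(v.replace("'", "''")) for v in values)
    return "({})".format(quoted)

clause = build_in_clause(["NY", "CA", "TX"])
# -> "('NY', 'CA', 'TX')", ready for: SELECT ... WHERE state IN ('NY', 'CA', 'TX')
```

The resulting string can then be passed as the query-parameter variable that Matillion interpolates into the underlying Redshift query.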
0 votes
0 answers
202 views

Dumping from NiFi to Redshift using the PutDatabaseRecord processor is really slow

I am new to both Redshift and Apache NiFi. I am trying to dump a CSV record (~3000 rows per record) every 15 minutes into a Redshift table over a JDBC connection. The PutDatabaseRecord processor is ...
Amith J Madathil
0 votes
1 answer
122 views

ETL with Talend + AWS Redshift?

Background: I want to build the ETL with the drag-and-drop function in Talend. Question: Can Talend compose an ETL built with drag and drop into Redshift SQL, and run ...
SamMeow • 374
0 votes
2 answers
2k views

AWS S3: How to plug in a dynamic file name in the S3 directory in COPY command

I have a job in Redshift that is responsible for pulling 6 files every month from S3. File names follow a standard naming convention, "file_label_MonthNameYYYY_Batch01.CSV". I'd like to ...
SusanD • 141
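Since COPY takes a literal S3 path, the dynamic part is usually built in the orchestration layer and interpolated into the command. A sketch with Python's strftime, using the naming convention quoted in the question (the label and batch suffix are placeholders):

```python
from datetime import date

def monthly_key(run_date, label="file_label", batch="Batch01"):
    """Build the monthly file name, e.g. file_label_January2023_Batch01.CSV.
    %B gives the full English month name under the default C locale."""
    return "{}_{}_{}.CSV".format(label, run_date.strftime("%B%Y"), batch)

key = monthly_key(date(2023, 1, 15))
# The key can then be spliced into: COPY ... FROM 's3://my-bucket/<key>' ...
```

COPY also matches S3 keys by prefix, so pointing it at everything up to the month token is another option when all matching files should load together.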
1 vote
1 answer
353 views

Redshift Alter table command returns `Target table and source table attributes don't match.`

I have an Airflow pipeline which creates a staging table from an existing table, loads data into it from a CSV, and then executes the following ALTER command: ALTER TABLE "schema"."...
Madhur Kerni
3 votes
1 answer
207 views

Using sortkeys and compression for a staging table with Redshift

Does it make sense to add sortkeys and compression to staging tables which are truncated daily in Redshift if target tables already have them? Does it make any difference when you're performing ...
deepvalue
0 votes
0 answers
170 views

Best Practice to load single fact table from multiple sources in Redshift

Is there a best practice for loading fact data from multiple sources in Redshift? Redshift doesn't have a partitioning concept; it does, however, have distribution. With distribution, one ...
Shaounak Nasikkar
1 vote
1 answer
944 views

How many temporary/staging tables to use during the transform step of ETL?

My first thought is to first load data from S3 to a temporary table, apply the necessary transformations and then INSERT INTO target, final table. All the tables would have the same columns and are in ...
deepvalue
0 votes
0 answers
457 views

Export data from RocksDB to S3 for ETL purposes

My new project uses RocksDB as storage. I would like to run an analysis query on the data in the RocksDB table daily, and it will need to scan the entire table. I would like to export the data from ...
Iris Li • 11
3 votes
1 answer
647 views

Invalid syntax: Create table sortkey auto with initial sortkeys

I'm trying to use target-redshift to push data to AWS Redshift (https://pypi.org/project/target-redshift/). I am using Airflow to monitor ETL status. This is the error log and I have no clue what it means. ...
rojer_1 • 55
5 votes
1 answer
5k views

How to create a Redshift table using Glue Data Catalog

I'm developing ETL pipeline using AWS Glue. So I have a csv file that is transformed in many ways using PySpark, such as duplicate column, change data types, add new columns, etc. I ran a crawler with ...
tikirin tukurun
2 votes
1 answer
2k views

Upload data to Redshift with PySpark

I have a script written in PySpark. What I'm trying to do is read a *.csv file from an S3 bucket in AWS using PySpark. I create a DataFrame with all the data, select all the columns I need, and cast them to the types my ...
4d61726b • 487
1 vote
2 answers
2k views

ETL + sync data between Redshift and DynamoDB

I need to aggregate data coming from DynamoDB into AWS Redshift, and I need to be accurate and in-sync. For the ETL I'm planning to use DynamoDB Streams, a Lambda transform, and Kinesis Firehose to, finally, ...
Courier • 63
2 votes
2 answers
934 views

ELT pipeline for Mongo

I am trying to get my data into Amazon Redshift using Fivetran, but have some questions in general about the ELT/ETL process. My source database is Mongo but I want to perform deep analysis on the ...
joethemow • 1,773
0 votes
0 answers
181 views

SQL to UNION history table with current table (with date range aligned)

Hi, I have the 2 tables below. table_history has columns (version, from_date, to_date, ID, Place): 1, 1900-01-01 00:00:00, 2020-07-08 10:00:49, 123, Delhi; 2, 2020-07-08 10:...
user14464843
0 votes
1 answer
679 views

Building OLTP DB from Datalake?

I'm confused and having trouble finding examples and reference architectures where someone wants to extract data from an existing data lake (S3/Lake Formation in my case) and build an OLTP datastore that ...
wishiwasabigdataguy
0 votes
1 answer
275 views

How do I avoid memory error in EC2 when loading a huge table in Pandas dataframe?

I tried to connect to Redshift and load my huge fact table into a pandas dataframe like below, and I always encounter a memory error when I execute the script. I am thinking either loading by chunk ...
Henry • 23
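For out-of-memory loads like the one above, fetching in fixed-size batches keeps only one chunk resident at a time (the same idea as pandas' chunksize parameter). A sketch using cursor.fetchmany, demonstrated against an in-memory SQLite table standing in for the Redshift fact table:

```python
import sqlite3

def fetch_in_chunks(cursor, query, chunk_size=50000):
    """Generator yielding rows in fixed-size batches so only one
    batch is held in memory at a time."""
    cursor.execute(query)
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        yield rows

# Demo: a tiny stand-in table; in the real case the cursor would come
# from a Redshift connection (psycopg2 uses the same cursor API).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE fact (id INTEGER)")
cur.executemany("INSERT INTO fact VALUES (?)", [(i,) for i in range(10)])
batches = list(fetch_in_chunks(cur, "SELECT id FROM fact ORDER BY id", chunk_size=4))
# 10 rows arrive as batches of 4, 4, and 2
```

Each batch can be processed and discarded before the next is fetched, so peak memory is bounded by chunk_size rather than table size.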
1 vote
0 answers
430 views

KeyError: '_id' CRITICAL ('Exception writing records', KeyError('_id')) Singer.io

I got this error while doing ELT from a Mongo tap to Redshift. When I simply extract data from MongoDB to CSV it works fine, but when piping the output to Redshift I get this _id error. Not sure ...
Waleed Arshad
0 votes
1 answer
4k views

Redshift loading data issue: Specified types or functions (one per INFO message) not supported on Redshift tables

SELECT s.store_id AS store_key, s.store_id, a.address, a.address2, a.district, c.city, co.country, a.postal_code, st.first_name AS ...
wyn • 141
1 vote
1 answer
211 views

How do you design an OLAP system to power a dashboard of hourly (or even more granular) API usage stats?

For background, I collect API usage logs (request, response, latency, userId, etc) for an application. A typical day will accumulate 200-300 million records. This data is currently stored on s3 in ...
Matt H • 409
1 vote
0 answers
7k views

How to transform a debezium message in JSON format such that it can be loaded into Redshift

I need help to achieve a few things. I have created a data pipeline as below: MySQL --> Debezium --> Kafka --> Kafka Connect --> AWS S3. Now S3 will have a Debezium event message in JSON format. ...
user2322440
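A common first step for this pipeline is flattening the Debezium envelope so each record becomes a plain row plus change metadata that Redshift can load. A sketch assuming the standard Debezium JSON envelope fields (payload.after, payload.op, payload.ts_ms):

```python
import json

def flatten_debezium(message):
    """Extract the post-image from a Debezium change event and attach
    the operation type and timestamp as metadata columns."""
    payload = json.loads(message)["payload"]
    row = dict(payload["after"] or {})  # deletes have after=None
    row["_op"] = payload["op"]          # c=create, u=update, d=delete
    row["_ts_ms"] = payload["ts_ms"]
    return row

msg = json.dumps({"payload": {"before": None,
                              "after": {"id": 1, "name": "alice"},
                              "op": "c", "ts_ms": 1700000000000}})
row = flatten_debezium(msg)
```

The flattened rows can then be written out as JSON lines or CSV for COPY; the `_op` column lets a downstream merge step apply updates and deletes.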
0 votes
1 answer
529 views

What would be a good approach to migrate Aurora data into a Redshift DWH?

We need to move and consolidate data from various Aurora databases into a Redshift one. Since our endpoints are AWS services we are learning about Glue, Pipeline and also Matillion. Is it Glue ...
Aleix • 451
0 votes
1 answer
510 views

AWS Glue check file contents correctness

I have a project in AWS to insert data from some files, which will be in S3, to Redshift. The point is that the ETL has to be scheduled each day to find new files in S3 and then check if those files ...
user2728349
0 votes
1 answer
50 views

Select Date1, Date2 From Table while in Redshift

What is the right syntax for creating a transaction based on a date range? For example, this is my data set. Table: DocID, Date1, Date2; row: 0001, 2020-01-01, 2020-01-03. And this is what I ...
Ren • 13
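In Redshift this kind of range expansion is usually done by joining a calendar (date dimension) table on BETWEEN Date1 AND Date2. The intended output shape, one row per day in the range, can be sketched in Python:

```python
from datetime import date, timedelta

def expand_range(doc_id, start, end):
    """One (doc_id, day) row per calendar day between start and end, inclusive."""
    days = (end - start).days
    return [(doc_id, start + timedelta(days=i)) for i in range(days + 1)]

rows = expand_range("0001", date(2020, 1, 1), date(2020, 1, 3))
# three rows: 2020-01-01, 2020-01-02, 2020-01-03
```

The equivalent SQL join against a calendar table produces the same rows set-wise, which scales better in Redshift than row-by-row generation.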
2 votes
1 answer
656 views

Inserting data into one table from second table on multiple join condition

I am trying to insert data in one table (trgt_tbl) from a second table (src_tbl) using join on key fields. The query seems to work fine but it's extremely slow. There are around 16 mil records in ...
Prashant Rai
3 votes
2 answers
2k views

How to transform data from S3 bucket before writing to Redshift DW?

I'm creating a (modern) data warehouse in redshift. All of our infrastructure is hosted at Amazon. So far, I have setup DMS to ingest data (including changed data) from some tables of our business ...
Henrique Miranda
0 votes
1 answer
3k views

Invalid timestamp format in Redshift COPY command

I have tried almost every solution from SO but still have the same issue. I have a CSV file in S3 and a table in Redshift. The table structure is as below: like_id => integer p_id => integer c_id => ...
Muhammad Hashir Anwaar
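COPY rejects rows whose timestamp strings don't match the table's expectations (adjustable with COPY's TIMEFORMAT option, including TIMEFORMAT 'auto'). Pre-validating the file against the format you intend to declare, sketched here with strptime, shows which values would fail; the sample values are hypothetical:

```python
from datetime import datetime

def bad_timestamps(values, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the values that don't parse under the given format,
    i.e. the rows a load with a matching TIMEFORMAT would reject."""
    bad = []
    for v in values:
        try:
            datetime.strptime(v, fmt)
        except ValueError:
            bad.append(v)
    return bad

rejects = bad_timestamps(["2020-01-01 10:00:00", "01/02/2020 10:00"])
# only the second value fails the default ISO-like format
```

If the rejects all share one alternative pattern, declaring that pattern in TIMEFORMAT is usually simpler than rewriting the file.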
0 votes
1 answer
1k views

How to deal with line breaks in a Redshift load?

I have a CSV which has line breaks in one of the columns. I get the error Delimiter not found. If I rewrite the text as continuous, without line breaks, then it works. But how do I deal with line breaks ...
karthikeya akula
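"Delimiter not found" typically means the file is being loaded with the plain DELIMITER format, which treats every newline as a row boundary; COPY's CSV format instead accepts line breaks inside quoted fields. Python's csv module follows the same convention, which makes for a quick local illustration (hypothetical data):

```python
import csv
import io

# A quoted field may legally contain a line break; the parser keeps
# it inside the field instead of starting a new row.
data = 'id,notes\n1,"first line\nsecond line"\n2,plain\n'
rows = list(csv.reader(io.StringIO(data)))
# 3 rows total: header, the multi-line record, and the plain record
```

So the usual fix is to ensure the multi-line fields are quoted and to add the CSV option to the COPY command, rather than stripping the line breaks.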
0 votes
1 answer
587 views

How does the OVERWRITE_EXISTING insert mode work in RedshiftCopyActivity for AWS Data Pipeline?

I am new to AWS Data Pipeline. We have a use case where we copy updated data into Redshift. I wanted to know whether I can use the OVERWRITE_EXISTING insert mode for RedshiftCopyActivity. Also, please ...
SIMRAN MEHTA
3 votes
1 answer
980 views

Does Node-Redshift support the COPY command (query) to load data from S3 to Redshift?

I wanted to know whether the "node-redshift" module supports the COPY FROM query to bulk-load data from an S3 bucket into Redshift. If not, what other options can I use to connect to Redshift and use ...
RST234 • 31
3 votes
1 answer
2k views

Migrating Data From Amazon Redshift into Aurora DB

There are tons of examples to migrate data from Aurora DB to Redshift, but I couldn't find any example or documentation for migrating data from Redshift to Aurora DB. Any suggestion/example/doc for ...
The CoBe
1 vote
1 answer
221 views

How to model S3 storage for query using AWS RedShift Spectrum

There is a users table in a MySQL database. We want to migrate the data into Amazon S3 for further analysis using Amazon Redshift. Day 1 - Export 10 rows of data from the users table (Total Rows: ...
r123 • 618
-1 votes
2 answers
337 views

Is 100-200 upserts and inserts in a 10-second window into a 3-node Redshift cluster a realistic architecture?

On a 3-node Redshift cluster we plan on doing 50-100 inserts every 10 seconds. Within that 10-second window we will also try to do the equivalent of a Redshift upsert as documented here: https://docs....
Brian Yeh • 3,227
0 votes
1 answer
135 views

Query giving wrong results. Any idea why this is happening?

The query is not returning any rows despite running successfully in ETL Manager. Any idea why this is happening? In case you think the first two temp tables might contain zero results, I can assure ...
RaphX • 103
1 vote
0 answers
3k views

Redshift: run multiple queries in parallel

I have 20 ETL scripts with multiple statements, and I have to run them all in one go (that is, in parallel) in Redshift. I'm doing the following: I have created a main.sql file with ...
Aneela Saleem Ramzan
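A single database session runs statements serially, so the parallelism has to come from the client side, with one connection per worker and subject to the cluster's WLM concurrency limits. A sketch with ThreadPoolExecutor; run_script is a placeholder for code that would open a connection and execute one file's statements:

```python
from concurrent.futures import ThreadPoolExecutor

def run_script(path):
    """Placeholder: in the real job this opens its own Redshift
    connection and executes the statements in the given SQL file."""
    return f"ran {path}"

scripts = [f"etl_{i}.sql" for i in range(20)]

# Cap workers below the cluster's WLM query-slot limit so the
# scripts run concurrently instead of queueing.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(run_script, scripts))
```

pool.map preserves input order in the results, so failures can be traced back to the originating script file.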
3 votes
3 answers
10k views

Ingesting Google Analytics data into S3 or Redshift [closed]

I am looking for options to ingest Google Analytics data (historical data as well) into Redshift. Any suggestions regarding tools or APIs are welcome. I searched online and found Stitch as one of ...
Prajakta Yerpude