All Questions

0 votes
1 answer
1k views

Can Confluent's S3 Sink Connector for Kafka Connect write topics to a nested (not a top-level) folder in an S3 bucket using `topics.dir`?

Can Confluent's S3 Sink Connector for Kafka Connect write topics to a nested (not a top-level) folder in an S3 bucket using topics.dir? For example, if I set topics.dir to the value thisistoplevel/...
benjaminedwardwebb's user avatar
2 votes
2 answers
1k views

Can Kafka Connect be made rack aware so that my connector reads all partitions from one broker?

We have a Kafka cluster in Amazon MSK that has 3 brokers in different availability zones of the same region. I want to set up a Kafka Connect connector that backs up all data from our Kafka brokers to ...
GMA's user avatar
  • 6,066
0 votes
2 answers
941 views

Compressing Avro data in Confluent S3 Kafka Connector

I have a Confluent sink connector which is taking data from a Kafka topic. It is then ingesting into an S3 bucket. The ingest works fine and all was well, however now I am required to compress the ...
Zack Amin's user avatar
  • 536
0 votes
1 answer
430 views

Kafka-s3-connect killed instantly after start

I want to connect AWS Kafka with S3 using the Confluent connector on my EC2 server. I tried to configure everything as in the tutorials. When I run connect-standalone or connect-distributed, at first ...
Adam Gwóźdź's user avatar
1 vote
1 answer
5k views

Kafka S3 Source Connector for PARQUET format in S3

I have topic events being produced using Protobuf. I could successfully sink my topic events into an S3 bucket using the S3 Sink connector in Parquet format. Now I have in my S3 bucket objects of type ...
crileroro's user avatar
0 votes
1 answer
1k views

S3 sink connector creating too many small files when there is a lag

I am using s3 sink connector to ingest data in s3 with below settings. "rotate.interval.ms": "3600000" "flush.size": "2147483647" It works fine, when there is ...
Bhumi's user avatar
  • 73
0 votes
0 answers
143 views

Missing a complete nested object while using the Confluent Kafka S3 connector

I am using the Confluent Kafka S3 connector to index my files to an S3 bucket. I have my customisation included. Recently I noticed that, seemingly at random, a partial nested field is missing for a few records. This is ...
Ashit_Kumar's user avatar
0 votes
1 answer
778 views

Config settings for S3 sink connector

I am new to the S3 sink connector and am trying to set it up for my project. I have a few doubts: What is the use of flush.size in the config? What if I give a very large number (2147483647) for it, ...
Bhumi's user avatar
  • 73
0 votes
1 answer
694 views

Confluent Kafka-to-S3 sink custom s3 naming for easy partitioning

I'm backing up my kafka topics to s3 using confluent's kafka-connect-s3 https://www.confluent.io/hub/confluentinc/kafka-connect-s3. I want to be able to easily query this data using Athena and have it ...
Daniel Epstein's user avatar
1 vote
1 answer
527 views

how does the confluent s3 source connector know which files it has already ingested and which ones are new?

https://docs.confluent.io/kafka-connect-s3-source/current/ I think this connector polls s3 for a list of files -- but does it keep state about which ones it has processed and which ones are new? If it ...
andryushka x's user avatar
1 vote
1 answer
1k views

Kafka-connect without schema registry

I have a kafka-topic and I would like to feed it with AVRO data (currently in JSON). I know the "proper" way to do it is to use schema-registry but for testing purposes I would like to make ...
fricadelle's user avatar
3 votes
2 answers
3k views

Kafka S3 Source Connector

I have a requirement where sources outside of our application will drop a file in an S3 bucket that we have to load in a kafka topic. I am looking at Confluent's S3 Source connector and currently ...
adbdkb's user avatar
  • 2,151
0 votes
0 answers
453 views

What would be better performance-wise for a Kafka to S3 connector?

Google didn't help me, so I want to ask you. I have a lot of Kafka topics. I have seen many articles about Java heap memory and so on, but I need some guidance. I have a lot of Kafka topics that I ...
twintorrent's user avatar
1 vote
0 answers
89 views

Kafka Connector: S3OutputStream can not be cast to S3ParquetOutputStream

I am building a Kafka connector for S3 where I want to store output as parquet files. I know there is already a Confluent connector doing it but we need an avro registry. This is something we do not ...
Yassir S's user avatar
  • 1,042
0 votes
2 answers
1k views

How does Kafka S3 Connector Once Delivery Guarantee Work

I have read their blog and understand their examples. https://www.confluent.io/blog/apache-kafka-to-amazon-s3-exactly-once/ But I am trying to wrap my head around this scenario I have. My current ...
collarblind's user avatar
  • 4,739
2 votes
3 answers
3k views

Kafka Connect fails to write to S3 with - ERROR Multipart upload

We have Kafka connect running on Kube pods. We are seeing the following errors in the worker logs. We tried restarting the pods. [2020-04-02 14:40:13,237] WARN Aborting multi-part upload with id '...
user8072466's user avatar
2 votes
1 answer
3k views

How to create partition using FieldPartitioner from Kafka S3 Sink Connector

I am looking for the correct format to be applied to s3-sink.properties for Kafka S3 Sink Connector, using partition.class FieldPartitioner. I need to create 3 partitions (one sub-partition of the ...
Julia Bel's user avatar
  • 347
0 votes
0 answers
632 views

Confluent Platform S3SinkConnector not working

I've tried to connect to an S3 bucket using the S3SinkConnector config below, but the status shows Degraded/failed { "name": "mytestkafkatopic1", "connector.class": "io.confluent.connect.s3.S3SinkConnector", "...
Elankeeran's user avatar
  • 6,184
2 votes
1 answer
3k views

Tuning S3 file sizes for Kafka

I am trying to understand flush.size and rotate.interval.ms configuration for S3 connector in depth. I deployed S3 connector and I seem to have file sizes ranging from 6 kb all the way to 30 mb ...
DataJanitor's user avatar
0 votes
0 answers
160 views

Kafka To S3 Connector

Let's assume we are using the Kafka S3 Sink Connector in standalone mode. As written on the Confluent page, it has an exactly-once delivery guarantee. I don't understand how it works... If for ...
Narek  Karapetyan's user avatar
0 votes
2 answers
940 views

Kafka connect s3 bucket folder

Can I create my own directory in S3 using the Confluent S3SinkConnector? I know it creates a folder structure; unfortunately, we need a new directory structure.
Sudhakar Ganapathy's user avatar
0 votes
0 answers
427 views

Kafka Connect S3 sink - how to utilize no partitioning in s3-sink.properties

For legitimate reasons I am looking to dump all Kafka messages into an AWS S3 bucket without partitioning. I have a requirement to use JSON. My initial thought was to use the FieldPartitioner and ...
Liquidgenius's user avatar
2 votes
0 answers
1k views

Kafka - Confluent s3 connect - Connector fails to connect to s3

I'm trying to build a demo application where I read into a kafka topic from a public source and write this data into s3. Exactly as indicated in this link - https://www.confluent.fr/blog/apache-kafka-...
Prabhudev Prakash's user avatar
3 votes
1 answer
994 views

Kafka S3 connect: timed rotation based on wall clock doesn't seem to write

We are using Confluent's Kafka S3 Connector, version 5.2.1. Running with one node in a Distributed Worker setting. Per documentation we should be able to set the flush to S3 both on a size as well ...
gregsilin's user avatar
  • 287
1 vote
1 answer
2k views

Kafka Connect S3 Dynamic S3 Folder Structure Creation?

I have manually installed Confluent Kafka Connect S3 using the standalone method and not through Confluent's process or as part of the whole platform. I can successfully launch the connector from the ...
Liquidgenius's user avatar
3 votes
0 answers
2k views

Data Loss in Kafka S3 Connector

We are using Kafka S3 Connector for log pipeline, as it guarantees exactly-once semantics. However, we've experienced two data loss events on different topics. We found a suspicious error message in ...
Double Infinity's user avatar
1 vote
2 answers
5k views

Kafka Connect S3 sink - how to use the timestamp from the message itself [timestamp extractor]

I've been struggling with a problem using kafka connect and the S3 sink. First the structure: { Partition: number Offset: number Key: string Message: json string Timestamp: timestamp }...
Hespen's user avatar
  • 1,454
1 vote
0 answers
798 views

Kafka Connect Sink to S3: `AmazonS3Exception: We encountered an internal error`

I have a Kafka Connect S3 Sink writing records to Amazon S3. This particular sink is writing about 4k rec/sec. Every few days, one of the Kafka Connect worker tasks fails with the following error. A ...
clay's user avatar
  • 20.3k
0 votes
1 answer
2k views

kafka connect transforms RegExRouter exiting with unrecoverable exception

I have made a Kafka pipeline to copy a SQL Server table to S3. During the sink, I'm trying to transform topic names by dropping a prefix with the RegexRouter function: "transforms":"dropPrefix", ...
Ftagn's user avatar
  • 315
1 vote
1 answer
55 views

Is there a way I can define a function to decide S3 path based on topic message in kafka connect

This question is related to Kafka to S3. Requirement: One of the Kafka topics we are interested in has some particular information, i.e., timestamp, table, etc. We can use this data to decide which S3 ...
Xiaohe Dong's user avatar
  • 5,023
0 votes
0 answers
593 views

Kafka Connect S3 - get the header from the message

Use case: consume the messages sent to a topic and store them in AWS S3. I'm using the Kafka S3 connector to achieve this. It's working perfectly fine, but each file is encoded as ...
Chaitanya Muthyala's user avatar
3 votes
1 answer
903 views

How to configure kafka s3 sink connector for json using its fields AND time based partitioning?

I have a json coming in like this: { "app" : "hw", "content" : "hello world", "time" : "2018-05-06 12:53:04" } I wish to push to S3 in the following file format: /upper-directory/$...
progsync's user avatar
0 votes
1 answer
674 views

Error creating hive table from avro schema

I am trying to create a hive table by extracting the schema from Avro data which is stored in s3. Data is stored in s3 using the s3 Kafka connector. I am publishing a simple POJO to the producer. ...
Ayush Chauhan's user avatar
5 votes
1 answer
3k views

Force Confluent s3 sink to flush

I set up a Kafka Connect S3 sink with the duration set to 1 hour, and I also set a rather big flush count, say 10,000. Now if there are not many messages in the Kafka channel, the S3 sink will try to buffer them in ...
Xiang Zhang's user avatar
  • 2,953
5 votes
1 answer
7k views

How to properly restart a kafka s3 sink connect?

I started a Kafka S3 sink connector (the bundled connector from the Confluent package) on 1 May. It worked fine until 8 May. Checking the status, it says that some AWS exception crashed this connector. This ...
Xiang Zhang's user avatar
  • 2,953
2 votes
1 answer
1k views

where does confluent s3 sink put the key?

I set up a Confluent S3 sink connector, and it stores .avro files in S3. I dumped those files and found that they contain just the message itself. I don't know where I can find the message key. Any idea? The ...
Xiang Zhang's user avatar
  • 2,953
0 votes
1 answer
924 views

Handle lags in Kafka S3 Connector

We're using Kafka Connect [distributed, Confluent 4.0]. It works very well, except that there always remain some uncommitted messages in the topic that the connector listens to. The behavior probably ...
Arkadiy Verman's user avatar
1 vote
1 answer
360 views

confluent kafka to s3 connection failed with ERROR Unexpected exception in Thread[KafkaBasedLog Work Thread -

I set up on EC2 the confluent (4.0) connector that reads from kafka and writes to S3. The standalone try goes well: bin/connect-standalone etc/standalone/example-connect-worker.properties etc/...
Boyu Wang's user avatar
9 votes
2 answers
8k views

Properly Configuring Kafka Connect S3 Sink TimeBasedPartitioner

I am trying to use the TimeBasedPartitioner of the Confluent S3 sink. Here is my config: { "name":"s3-sink", "config":{ "connector.class":"io.confluent.connect.s3.S3SinkConnector", "tasks....
Daniel's user avatar
  • 1,600
7 votes
2 answers
4k views

Restarting Kafka Connect S3 Sink Task Loses Position, Completely Rewrites everything

After restarting a Kafka Connect S3 sink task, it restarted writing all the way from the beginning of the topic and wrote duplicate copies of older records. In other words, Kafka Connect seemed to ...
clay's user avatar
  • 20.3k
1 vote
0 answers
1k views

Confluent Kafka S3 Connector doesn't upload to S3

I'm trying to follow the tutorial in this link http://docs.confluent.io/current/connect/connect-storage-cloud/kafka-connect-s3/docs/s3_connector.html#quickstart I managed to create the topic and ...
Kelvin Low's user avatar
4 votes
2 answers
3k views

Kafka Confluent S3 Connector "Failed to find class"

I'm trying a simple quickstart example and I get: Caused by: org.apache.kafka.connect.errors.ConnectException: Failed to find any class that implements Connector and which name matches io....
clay's user avatar
  • 20.3k
36 votes
2 answers
24k views

Ideal value for Kafka Connect Distributed tasks.max configuration setting?

I am looking to productionize and deploy my Kafka Connect application. However, there are two questions I have about the tasks.max setting which is required and of high importance but details are ...
Phillip Mann's user avatar