All Questions
Tagged with confluent-platform amazon-s3
43 questions
0
votes
1
answer
1k
views
Can Confluent's S3 Sink Connector for Kafka Connect write topics to a nested (not a top-level) folder in an S3 bucket using `topics.dir`?
For example, if I set topics.dir to the value thisistoplevel/...
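For reference, a minimal sketch of what such a sink config fragment might look like in an s3-sink.properties file; the bucket name and nested prefix below are placeholders, not the asker's actual values:

# Hypothetical fragment of s3-sink.properties; bucket and prefix are placeholders
s3.bucket.name=my-bucket
# Nested prefix instead of the default top-level "topics" directory
topics.dir=team-x/kafka-backups/raw
# Objects would then typically land under s3://my-bucket/team-x/kafka-backups/raw/<topic>/...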
2
votes
2
answers
1k
views
Can Kafka Connect be made rack aware so that my connector reads all partitions from one broker?
We have a Kafka cluster in Amazon MSK that has 3 brokers in different availability zones of the same region. I want to set up a Kafka Connect connector that backs up all data from our Kafka brokers to ...
0
votes
2
answers
941
views
Compressing Avro data in Confluent S3 Kafka Connector
I have a Confluent sink connector which is taking data from a Kafka topic and ingesting it into an S3 bucket.
The ingest works fine and all was well; however, now I am required to compress the ...
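As a hedged sketch of the compression-related settings (exact options and availability depend on the connector version in use):

# Avro output: avro.codec controls compression of the .avro files
format.class=io.confluent.connect.s3.format.avro.AvroFormat
avro.codec=deflate
# For JSON or raw-bytes output, s3.compression.type=gzip is the analogous setting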
0
votes
1
answer
430
views
Kafka-s3-connect killed instantly after start
I want to connect AWS Kafka with S3 using the Confluent connector on my EC2 server. I tried to configure everything as in the tutorials. When I run connect-standalone or connect-distributed, at first ...
1
vote
1
answer
5k
views
Kafka S3 Source Connector for PARQUET format in S3
I have topic events being produced using Protobuf. I could successfully sink my topic events into an S3 bucket using the S3 Sink connector in Parquet format. Now I have in my S3 bucket objects of type ...
0
votes
1
answer
1k
views
S3 sink connector creating too many small files when there is a lag
I am using the S3 sink connector to ingest data into S3 with the settings below.
"rotate.interval.ms": "3600000"
"flush.size": "2147483647"
It works fine when there is ...
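For context, the settings in question plus a scheduled-rotation option, sketched as properties (values are illustrative only):

# Size-based flushing effectively disabled, hourly rotation driven by timestamps
flush.size=2147483647
rotate.interval.ms=3600000
# rotate.interval.ms follows the partitioner's timestamp extractor; with Record/RecordField
# timestamps, catching up on a lag can span many hourly windows and each can open its own small file.
# rotate.schedule.interval.ms rotates on wall-clock time instead.
rotate.schedule.interval.ms=3600000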
0
votes
0
answers
143
views
Missing a complete nested object while using the Confluent Kafka S3 connector
I am using the Confluent Kafka S3 connector to index my files into an S3 bucket, with my own customisation included. Recently I noticed that, for a few records at random, part of a nested field is missing. This is ...
0
votes
1
answer
778
views
Config settings for S3 sink connector
I am new to the S3 sink connector and am trying to set it up for my project.
I have a few doubts:
What is the use of flush.size in the config? What if I give a very large number (2147483647) for it, ...
0
votes
1
answer
694
views
Confluent Kafka-to-S3 sink custom s3 naming for easy partitioning
I'm backing up my kafka topics to s3 using confluent's kafka-connect-s3 https://www.confluent.io/hub/confluentinc/kafka-connect-s3. I want to be able to easily query this data using Athena and have it ...
1
vote
1
answer
527
views
how does the confluent s3 source connector know which files it has already ingested and which ones are new?
https://docs.confluent.io/kafka-connect-s3-source/current/
I think this connector polls s3 for a list of files -- but does it keep state about which ones it has processed and which ones are new? If it ...
1
vote
1
answer
1k
views
Kafka-connect without schema registry
I have a kafka-topic and I would like to feed it with AVRO data (currently in JSON). I know the "proper" way to do it is to use schema-registry but for testing purposes I would like to make ...
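One common workaround for testing without a Schema Registry is to stay in plain JSON at the Connect layer; a sketch of the converter settings, assuming JSON is acceptable for the test:

# Worker- or connector-level converters that do not require a Schema Registry
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false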
3
votes
2
answers
3k
views
Kafka S3 Source Connector
I have a requirement where sources outside of our application will drop a file in an S3 bucket that we have to load in a kafka topic. I am looking at Confluent's S3 Source connector and currently ...
0
votes
0
answers
453
views
What would be better performance-wise for a Kafka to S3 connector?
Google didn't help me, so I want to ask you. I have seen many articles about Java heap memory and so on, but I need some guidance.
I have a lot of kafka topics that I ...
1
vote
0
answers
89
views
Kafka Connector: S3OutputStream can not be cast to S3ParquetOutputStream
I am building a Kafka connector for S3 where I want to store the output as parquet files. I know there is already a Confluent connector doing this, but it needs an Avro registry. This is something we do not ...
0
votes
2
answers
1k
views
How does the Kafka S3 Connector's exactly-once delivery guarantee work?
I have read their blog and understand their examples.
https://www.confluent.io/blog/apache-kafka-to-amazon-s3-exactly-once/
But I am trying to wrap my head around this scenario I have. My current ...
2
votes
3
answers
3k
views
kafka connect fails to write to S3 with - ERROR Multipart upload
We have Kafka connect running on Kube pods. We are seeing the following errors in the worker logs. We tried restarting the pods.
[2020-04-02 14:40:13,237] WARN Aborting multi-part upload with id '...
2
votes
1
answer
3k
views
How to create partition using FieldPartitioner from Kafka S3 Sink Connector
I am looking for the correct format to be applied to s3-sink.properties for Kafka S3 Sink Connector, using partition.class FieldPartitioner.
I need to create 3 partitions (one sub-partition of the ...
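A minimal sketch of the FieldPartitioner settings, using placeholder field names; whether a comma-separated list of fields is accepted depends on the connector version:

# s3-sink.properties fragment; field names below are placeholders
partitioner.class=io.confluent.connect.storage.partitioner.FieldPartitioner
# Each listed field becomes one directory level, e.g. country=US/state=CA/city=SF/
partition.field.name=country,state,city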
0
votes
0
answers
632
views
Confluent Platform S3SinkConnector not working
I've tried to connect to an S3 bucket using the S3SinkConnector config below; the status shows Degraded/failed
{
"name": "mytestkafkatopic1",
"connector.class": "io.confluent.connect.s3.S3SinkConnector",
"...
2
votes
1
answer
3k
views
Tuning S3 file sizes for Kafka
I am trying to understand flush.size and rotate.interval.ms configuration for S3 connector in depth. I deployed S3 connector and I seem to have file sizes ranging from 6 kb all the way to 30 mb ...
0
votes
0
answers
160
views
Kafka To S3 Connector
Let's assume we are using the Kafka S3 Sink Connector in standalone mode.
As written on the Confluent page, it has an exactly-once delivery guarantee.
I don't understand how it works...
If for ...
0
votes
2
answers
940
views
Kafka connect s3 bucket folder
Can I create my own directory in S3 using the Confluent S3SinkConnector?
I know it creates a folder structure; unfortunately, we need a new directory structure.
0
votes
0
answers
427
views
Kafka Connect S3 sink - how to utilize no partitioning in s3-sink.properties
For legitimate reasons I am looking to dump all Kafka messages into an AWS S3 bucket without partitioning. I have a requirement to use JSON.
My initial thought was to use the FieldPartitioner and ...
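A sketch of the closest thing to "no partitioning" with the stock connector, assuming JSON output is the requirement:

# DefaultPartitioner only encodes the Kafka partition number (partition=N directories)
partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
format.class=io.confluent.connect.s3.format.json.JsonFormat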
2
votes
0
answers
1k
views
Kafka - Confluent s3 connect - Connector fails to connect to s3
I'm trying to build a demo application where I read from a public source into a kafka topic and write this data into s3, exactly as indicated in this link - https://www.confluent.fr/blog/apache-kafka-...
3
votes
1
answer
994
views
Kafka S3 connect: timed rotation based on wall clock doesn't seem to write
We are using Confluent's Kafka S3 Connector, version 5.2.1. Running with one node in a Distributed Worker setting.
Per documentation we should be able to set the flush to S3 both on a size as well ...
1
vote
1
answer
2k
views
Kafka Connect S3 Dynamic S3 Folder Structure Creation?
I have manually installed Confluent Kafka Connect S3 using the standalone method and not through Confluent's process or as part of the whole platform.
I can successfully launch the connector from the ...
3
votes
0
answers
2k
views
Data Loss in Kafka S3 Connector
We are using Kafka S3 Connector for log pipeline, as it guarantees exactly-once semantics. However, we've experienced two data loss events on different topics. We found a suspicious error message in ...
1
vote
2
answers
5k
views
Kafka Connect S3 sink - how to use the timestamp from the message itself [timestamp extractor]
I've been struggling with a problem using kafka connect and the S3 sink.
First the structure:
{
Partition: number
Offset: number
Key: string
Message: json string
Timestamp: timestamp
}...
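A hedged sketch of partitioning on a timestamp carried by the record rather than the wall clock; which extractor applies depends on whether the wanted timestamp is the Kafka record timestamp or a field inside the value:

partitioner.class=io.confluent.connect.storage.partitioner.TimeBasedPartitioner
# "Record" uses the Kafka record timestamp; "RecordField" reads a named field from the value
timestamp.extractor=Record
# timestamp.extractor=RecordField
# timestamp.field=some_time_field   # placeholder field name, only used with RecordField
partition.duration.ms=3600000
path.format='year'=YYYY/'month'=MM/'day'=dd/'hour'=HH
locale=en-US
timezone=UTC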
1
vote
0
answers
798
views
Kafka Connect Sink to S3: `AmazonS3Exception: We encountered an internal error`
I have a Kafka Connect S3 Sink writing records to Amazon S3. This particular sink is writing about 4k rec/sec. Every few days, one of the Kafka Connect worker tasks fails with the following error. A ...
0
votes
1
answer
2k
views
kafka connect transforms RegexRouter exiting with unrecoverable exception
I have made a kafka pipeline to copy a sqlserver table to s3
During the sink, I'm trying to transform topic names, dropping the prefix with the RegexRouter transform:
"transforms":"dropPrefix",
...
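A sketch of what the full transform block typically looks like; the regex and replacement below are placeholders, not the asker's values:

transforms=dropPrefix
transforms.dropPrefix.type=org.apache.kafka.connect.transforms.RegexRouter
# Strip a fixed prefix such as "sqlserver-" from the topic name
transforms.dropPrefix.regex=sqlserver-(.*)
transforms.dropPrefix.replacement=$1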
1
vote
1
answer
55
views
Is there a way I can define a function to decide S3 path based on topic message in kafka connect
This question is related to Kafka to S3.
Requirement: One of the kafka topics we are interested in has some particular information, e.g., timestamp, table, etc. We can use this data to decide which S3 ...
0
votes
0
answers
593
views
Kafka Connect S3 - get the header from the message
use-case : Consume the messages sent to a topic and store in AWS S3
I'm using the Kafka S3 connector to achieve this; it's working perfectly fine, but
Each file is encoded as
...
3
votes
1
answer
903
views
How to configure kafka s3 sink connector for json using its fields AND time based partitioning?
I have a json coming in like this:
{
"app" : "hw",
"content" : "hello world",
"time" : "2018-05-06 12:53:04"
}
I wish to push to S3 in the following file format:
/upper-directory/$...
0
votes
1
answer
674
views
Error creating hive table from avro schema
I am trying to create a hive table by extracting the schema from Avro data which is stored in s3. Data is stored in s3 using the s3 Kafka connector. I am publishing a simple POJO to the producer.
...
5
votes
1
answer
3k
views
Force Confluent s3 sink to flush
I set up the Kafka Connect S3 sink with the duration set to 1 hour, and also a rather big flush count, say 10,000. Now if there are not many messages in the Kafka channel, the S3 sink will try to buffer them in ...
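A hedged sketch of the wall-clock rotation setting usually used to force low-traffic topics out of the buffer; the interval value is illustrative:

# rotate.schedule.interval.ms rotates on system time (requires timezone), so files are written
# even when flush.size is never reached
rotate.schedule.interval.ms=600000
timezone=UTC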
5
votes
1
answer
7k
views
How to properly restart a kafka s3 sink connect?
I started a kafka s3 sink connector (bundled connector from the confluent package) on 1 May. It worked fine until 8 May. Checking the status, it says that some AWS exception crashed this connector. This ...
2
votes
1
answer
1k
views
where does confluent s3 sink put the key?
I set up a Confluent S3 sink connector; it stores .avro files in S3.
I dumped those files and found out that they are just the message itself; I don't know where I can find the message key. Any idea?
The ...
0
votes
1
answer
924
views
Handle lags in Kafka S3 Connector
We're using Kafka Connect [distributed, Confluent 4.0].
It works very well, except that there always remain uncommitted messages in the topic that the connector listens to. The behavior probably ...
1
vote
1
answer
360
views
confluent kafka to s3 connection failed with ERROR Unexpected exception in Thread[KafkaBasedLog Work Thread -
I set up the Confluent (4.0) connector on EC2 that reads from Kafka and writes to S3.
The standalone run goes well:
bin/connect-standalone etc/standalone/example-connect-worker.properties etc/...
9
votes
2
answers
8k
views
Properly Configuring Kafka Connect S3 Sink TimeBasedPartitioner
I am trying to use the TimeBasedPartitioner of the Confluent S3 sink. Here is my config:
{
"name":"s3-sink",
"config":{
"connector.class":"io.confluent.connect.s3.S3SinkConnector",
"tasks....
7
votes
2
answers
4k
views
Restarting Kafka Connect S3 Sink Task Loses Position, Completely Rewrites everything
After restarting a Kafka Connect S3 sink task, it restarted writing all the way from the beginning of the topic and wrote duplicate copies of older records. In other words, Kafka Connect seemed to ...
1
vote
0
answers
1k
views
Confluent Kafka S3 Connector doesn't upload to S3
I'm trying to follow the tutorial in this link http://docs.confluent.io/current/connect/connect-storage-cloud/kafka-connect-s3/docs/s3_connector.html#quickstart
I managed to create the topic and ...
4
votes
2
answers
3k
views
Kafka Confluent S3 Connector "Failed to find class"
I'm trying a simple quickstart example and I get:
Caused by: org.apache.kafka.connect.errors.ConnectException: Failed to
find any class that implements Connector and which name matches
io....
36
votes
2
answers
24k
views
Ideal value for Kafka Connect Distributed tasks.max configuration setting?
I am looking to productionize and deploy my Kafka Connect application. However, there are two questions I have about the tasks.max setting which is required and of high importance but details are ...
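As a rough sketch of the reasoning usually applied: for a sink connector, tasks beyond the total number of consumed topic partitions sit idle, so tasks.max is normally set at or below that count:

# Illustrative value: at most one task per consumed topic partition can do useful work
tasks.max=10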