I have the following combination of tap/target in Meltano: tap-marketo and target-s3-parquet.

I want to extract data with tap-marketo from date A to date B in the past.

I saw that we can only define start_date and max_export_days.

I tried starting with start_date A and stopping the run once it reached B, but this does not work.

The loader only emits the state once its work is completely done, and the target is not called, so no load was done.

I also saw in the logs that the export is being done:

{'run_id': '46ba5256-7019-48c7-890a-28746bb5272a', 'state_id': '2023-02-09T152428--tap-marketo--target-s3-parquet', 'stdio': 'stderr', 'cmd_type': 'extractor', 'name': 'tap-marketo', 'event': 'INFO GET: https://XXXXXXX/bulk/v1/activities/export/6636daf1-ad1e-41e1-b8d5-cdd31de5d4e0/file.json', 'level': 'info', 'timestamp': '2023-02-09T17:35:02.098016Z'}

But where do I find this file in my container?

I want to invoke the target separately but need to give the --input.

# meltano invoke target-s3-parquet --help
Environment 'dev' is active
Usage: target-s3-parquet [OPTIONS]

  Execute the Singer target.

Options:
  --input FILENAME          A path to read messages from instead of from
                            standard in.
  --config TEXT             Configuration file location or 'ENV' to use
                            environment variables.
  --format [json|markdown]  Specify output style for --about
  --about                   Display package metadata and settings.
  --version                 Display the package version.
  --help                    Show this message and exit.
  • I've provided an answer below that should help with an isolated replay/retry of just the target. Hopefully this helps! Commented Feb 9, 2023 at 19:59

1 Answer

To invoke the tap and target separately

meltano invoke tap-marketo > ./outfile.singer.jsonl
cat ./outfile.singer.jsonl | meltano invoke target-s3-parquet

Which is equivalent to:

meltano invoke tap-marketo > ./outfile.singer.jsonl
meltano invoke target-s3-parquet --input=./outfile.singer.jsonl

In both of the above cases, you can retry just the second step.
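
As a rough sketch of what retrying just the second step could look like, the function below re-runs only the load against a file that was already captured. The name replay_target, its arguments, and the retry count are hypothetical and chosen for illustration; the only Meltano call it relies on is the meltano invoke ... --input form shown above.

```shell
#!/bin/sh
# Sketch: retry only the load step against an already-captured Singer file.
# `replay_target` and its arguments are hypothetical names for this example.
replay_target() {
  target="$1"              # e.g. target-s3-parquet
  input_file="$2"          # e.g. ./outfile.singer.jsonl
  max_attempts="${3:-3}"   # assumption: three attempts by default

  # Refuse to run against a missing or empty capture file.
  if [ ! -s "$input_file" ]; then
    echo "no captured messages in $input_file" >&2
    return 2
  fi

  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Same command as in the answer: the target reads from the file.
    if meltano invoke "$target" --input="$input_file"; then
      echo "load succeeded on attempt $attempt"
      return 0
    fi
    echo "load failed on attempt $attempt" >&2
    attempt=$((attempt + 1))
  done
  return 1
}
```

It would be called as replay_target target-s3-parquet ./outfile.singer.jsonl after the tap output has been written to that file. The tap is never re-run, so a failed load does not trigger another Marketo export.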

However, if you invoke both together, using meltano run tap-marketo target-s3-parquet or similar, the intermediate file will not be stored on disk, and you would not be able to replay just the target-side processing.

Why these files aren't stored on disk by default

The stream of messages you'll see in the examples above will necessarily contain potentially secret or confidential data, and the volume contained within the stream can be extremely large, since it contains the records themselves as well as metadata used for coordinating between the tap and target. For this reason, this stream of messages from tap to target is not stored to disk during a normal sync operation.
