All Questions

1 vote
0 answers
637 views

EMR Cluster: AutoScaling Policy For Instance Group Could Not Attach And Failed

I am trying to automate the EMR cluster creation through boto3. Unfortunately, I'm getting the following warning: The Auto Scaling policy for instance group ig-MI0ANZ0C3WNN in Amazon EMR cluster j-...
Riley Hun
  • 2,757
0 votes
1 answer
333 views

Arguments for jar file incorrect - spinning up EMR cluster using Boto3

I am writing Python code using the Boto3 library to spin up an EMR cluster. During the Steps part, I have my jar file listed. This jar file is a Scala script that takes arguments like this: -l '...
Hana
  • 1,470
0 votes
1 answer
3k views

Understanding EMR autoscaling

I have the following code, which is working fine: def emr_client(): config = get_aws_config() return boto3.client( 'emr', region_name=config['aws_region'], ...
Alon
  • 11.8k
1 vote
2 answers
2k views

How to get the list of instances for AWS EMR?

Why is the list for EC2 different from the EMR list? EC2: https://aws.amazon.com/ec2/spot/pricing/ EMR: https://aws.amazon.com/emr/pricing/ Why are not all the types of instances from the EC2 ...
Keepun
  • 87
7 votes
4 answers
9k views

How to wait for a step completion in AWS EMR cluster using Boto3

Given a step id I want to wait for that AWS EMR step to finish. How can I achieve this? Is there a built-in function? At the time of writing, the Boto3 Waiters for EMR allow waiting for Cluster ...
fuggy_yama
5 votes
1 answer
4k views

Failing to find script-runner.jar

Here's the code to install and run hive over EMR args = ['s3://' + zone_name + '.elasticmapreduce/libs/hive/hive-script', '--base-path', 's3://' + zone_name + '.elasticmapreduce/libs/hive/', '...
vks
  • 67.9k
0 votes
1 answer
219 views

BOTO throwing UnboundLocalError while doing get_bucket

I am trying to upload a big file of ~46Gb file to S3 from EMR using boto. The code I wrote is >>> import math, os >>> import boto >>> from filechunkio import FileChunkIO #...
UnderWood
  • 873
0 votes
1 answer
2k views

Create EMR Cluster and Terminate after running Python script from S3 using boto3

Is it possible to use boto3 to create an EMR cluster, read a Python script from S3, and then terminate? I know this could be done by creating a cluster and then manually copying the script from S3 to ...
horatio1701d
  • 9,149
4 votes
2 answers
494 views

How can I launch an EMR using SPOT Block using boto?

How can I launch an EMR cluster using spot blocks (AWS) with boto? I am trying to launch it using boto, but I cannot find any parameter --block-duration-minutes in boto. I am unable to find how to do this ...
Rahul Arora
7 votes
2 answers
4k views

Boto3 EMR - Hive step

Is it possible to carry out hive steps using boto 3? I have been doing so using AWS CLI, but from the docs (http://boto3.readthedocs.org/en/latest/reference/services/emr.html#EMR.Client....
intl
  • 2,773
0 votes
2 answers
1k views

Error while connecting to a region under a profile

As per the documentation, region is also a parameter of the boto.emr.EmrConnection class; however, I get the following error while making the connection: conn = boto.emr.EmrConnection(profile_name='profile_name', ...
ankit tyagi
10 votes
1 answer
6k views

ClusterID vs JobFlowID on AWS EMR

I am a bit confused about the APIs available and the two identifiers. I am using boto, but don't think that is the problem here : my question regards any api (but not cli). I start a JobFlow with ...
user2123288
  • 1,119
2 votes
0 answers
612 views

Elastic MapReduce with boto - InstanceProfile is required for creating cluster

I'm trying to run an Elastic MapReduce job with the code below, but when I try this I get an error: InstanceProfile is required for creating cluster. Does anyone know why I'm getting this error? def createmrjob(...
techman
  • 433
1 vote
1 answer
2k views

EMR/boto - How to get cluster id and step id using boto?

There are some describe_* functions in boto.emr that need a step_id. But the documentation does not describe very clearly how to obtain the step_id after submitting steps. How can I get these step_ids after ...
Fred Pym
  • 2,329
3 votes
1 answer
10k views

How to get data from s3 and do some work on it? python and boto

I have a project task to use some output data I have already produced on s3 in an EMR task. So previously I have run an EMR job that produced some output in one of my s3 buckets in the form of ...
Nino Dorić
2 votes
1 answer
254 views

Creating EMR using Boto fails

I am trying to create an EMR cluster from Python using the boto library. I tried a few things, but the end result is "Shut down as step failed". I tried running example code that Amazon supplies about ...
ohad edelstain
0 votes
1 answer
235 views

boto-emr job error: python broken pipeline error and java.lang.OutOfMemoryError

I've prepared a streaming boto jobflow on AWS/EMR that runs perfectly well using the familiar test pipe: sed -n '0~10000p' Big.csv | ./map.py | sort -t$'\t' -k1 | ./reduce.py The boto emr job run ...
user2105469
  • 1,433
1 vote
0 answers
181 views

EmrResponseError: 505 HTTP Version Not Supported

I got the following error when I ran the python file from ec2 machine. Hadoop version: 2.4.0 ami version : 3.5.0 Boto Version : 2.32.0 Traceback (most recent call last): File "/home/ec2-user/...
Brisi
  • 1,811
0 votes
1 answer
66 views

AWS EMR: how to get the first element out of describe_jobflows() API call result

I cannot figure out how to get the first element of the result from calling one of the boto emr APIs: describe_jobflows(). I know it returns a list of jobflows, but when I'm trying to access it by ...
Fisher Coder
  • 3,566
0 votes
0 answers
69 views

AWS: sort all emr (Elastic MapReduce) jobflows based on its terminated time in python boto

One easy question for gurus, but I just cannot figure it out: I'm using boto python API, the relevant code is: terminatedjobflows = emr_connection.decribe_jobflows(states=["TERMINATED"], ...
Fisher Coder
  • 3,566
0 votes
1 answer
656 views

boto does not like EMR BootstrapAction parameter

I'm trying to launch an AWS EMR cluster using the boto library, and everything works well. Because I need to install required Python libraries, I tried to add a bootstrap action step using boto.emr....
code_ada
  • 884
1 vote
1 answer
931 views

Hue install bootstrap error using AWS EMR with Boto

With the release of AMI 3.3.0, AWS supports Hue as an installable "app" in EMR, like Hive/Pig. Using the EMR web UI, creating a cluster with Hue works fine for me, however when adding a Hue ...
erikreed
  • 1,539
0 votes
1 answer
42 views

ElasticMapReduce streaming compressed output

I'm running streaming jobs, with Python scripts for the map and reduce. I create the job flow with the boto library. I'm using gzip input files. How can I create gzip output files, though?
eran
  • 15.1k
17 votes
2 answers
3k views

AWS EMR perform "bootstrap" script on all the already running machines in cluster

I have one EMR cluster which is running 24/7. I can't turn it off and launch a new one. What I would like to do is to perform something like a bootstrap action on the already running cluster, ...
ziky90
  • 2,697
23 votes
4 answers
25k views

How to launch and configure an EMR cluster using boto

I'm trying to launch a cluster and run a job all using boto. I find lots of examples of creating job_flows. But I can't, for the life of me, find an example that shows: How to define the cluster to ...
eran
  • 15.1k
2 votes
3 answers
4k views

How to access EMR master private ip address using pure python / boto

I've searched on this site and Google but have not been able to get an answer for this. I have code running from an EC2 instance which creates and manages EMR clusters using boto. I can use this ...
krack krackerz
1 vote
1 answer
2k views

Boto EMR creation gives "Log Uri is not in the required format" error

The following code gives the error message EmrResponseError: EmrResponseError: 400 Bad Request <ErrorResponse xmlns="http://elasticmapreduce.amazonaws.com/doc/2009-03-31"> <Error> &...
gallamine
  • 875
5 votes
1 answer
6k views

how to install custom packages on amazon EMR bootstrap action in code?

I need to install some packages and binaries in an Amazon EMR bootstrap action, but I can't find any example that does this. Basically, I want to install a Python package, and specify each hadoop node to ...
KJW
  • 15.3k
1 vote
2 answers
2k views

Unable to paginate EMR cluster using boto

I have about 55 EMR clusters (all of them were terminated) and have been trying to retrieve the entire 55 EMR clusters using the list_clusters method in boto. I've been searching for examples about ...
Nicholas Key
  • 1,429
1 vote
1 answer
483 views

EMR Job Failing

Folks, The following python script is terminating with job state = FAILED and Last State Change: Access denied checking streaming input path: s3n://elasticmapreduce/samples/wordcount/input/ ...
Cmag
  • 15.7k
1 vote
1 answer
2k views

Copying/using Python files from S3 to Amazon Elastic MapReduce at bootstrap time

I've figured out how to install python packages (numpy and such) at the bootstrapping step using boto, as well as copying files from S3 to my EC2 instances, still with boto. What I haven't figured ...
user2105469
  • 1,433
4 votes
1 answer
1k views

How can I use s3 object names as inputs to an MRJob mapper, but not the s3 objects themselves?

I'm missing something obvious about Yelp's mrjob job library. Setting up an MRJob class is almost trivially easy. Running it over a file or stdin is just as easy. But how can I change the input to the job ...
Christopher
  • 44.2k
1 vote
1 answer
1k views

Map Reduce multiple outputs in python boto

I am trying to partition an input file using AWS EMR. I use a streaming step to read from stdin. I want to split this file into 2 files based on the values of specific fields from each line of stdin ...
Zihs
  • 347
0 votes
1 answer
2k views

Splitting a file using Map Reduce

I would like to split the content of a text file into 2 different files using EMR. The input file, as well as the mapper and reducer scripts are all stored in AWS' S3. Currently, my mapper reformats ...
Zihs
  • 347
0 votes
1 answer
1k views

Backup DynamoDB Table with dynamic columns to S3

I have read several other posts about this and in particular this question with an answer by greg about how to do it in Hive. I would like to know how to account for DynamoDB tables with variable ...
user1601333
0 votes
0 answers
338 views

AWS Elastic Mapreduce optimizing Pig job

I am using boto 2.8.0 to create EMR jobflows over large log file stored in S3. I am relatively new to Elastic Mapreduce and am getting the feel for how to properly handle jobflows from this issue. ...
Mark Grey
  • 10.2k
9 votes
1 answer
6k views

Elastic Map Reduce: difference between CANCEL_AND_WAIT and CONTINUE?

I just found that using Amazon's Elastic Map Reduce, I can specify a step to have one of three ActionOnFailure choices: TERMINATE_JOB_FLOW CANCEL_AND_WAIT CONTINUE TERMINATE_JOB_FLOW is the default ...
Suman
  • 9,561
8 votes
1 answer
2k views

Setting hadoop parameters with boto?

I am trying to enable bad input skipping on my Amazon Elastic MapReduce jobs. I am following the wonderful recipe described here: http://devblog.factual.com/practical-hadoop-streaming-dealing-with-...
slavi
  • 401
2 votes
1 answer
1k views

EMR + DynamoDB workflow setup throws Hive.createTable NoSuchMethodError JsonErrorResponseHandler

I am trying to set up an EMR workflow (with DynamoDB and Hive) using the boto Python API. I could run the script manually using the Amazon EMR Console. However, with boto it fails at creating tables. Here's the ...
Taher Saeed
2 votes
3 answers
3k views

Boto: how to keep EMR job flow running after completion/failure?

How can I add steps to a waiting Amazon EMR job flow using boto without the job flow terminating once complete? I've created an interactive job flow on Amazon's Elastic Map Reduce and loaded some ...
Matt Hampel
  • 5,227
8 votes
1 answer
3k views

Python client support for running Hive on top of Amazon EMR

I've noticed that neither mrjob nor boto supports a Python interface to submit and run Hive jobs on Amazon Elastic MapReduce (EMR). Are there any other Python client libraries that support running ...
poiuy
  • 500
4 votes
1 answer
4k views

boto ElasticMapReduce throttling and rate limiting

I've run into rate limiting from Amazon EMR a few times via the boto API with the following: boto.exception.EmrResponseError: EmrResponseError: 400 Bad Request <ErrorResponse xmlns="http://...
poiuy
  • 500
2 votes
1 answer
250 views

Get the number of completed steps in an Amazon Elastic MapReduce jobflow via boto

To avoid the overhead of setting up instances every time I submit a job, I use a jobflow that's always in waiting mode after each job completion. However, according to this page, "a maximum of 256 ...
poiuy
  • 500
2 votes
3 answers
1k views

What is wrong with my boto elastic mapreduce jar jobflow parameters?

I am using the boto library to create a job flow in Amazon's Elastic MapReduce web service (EMR). The following code should create a step: step2 = JarStep(name='Find similiar items', jar='...
Thomas
  • 10.6k