All Questions
Tagged with boto amazon-emr
44 questions
1
vote
0
answers
637
views
EMR Cluster: AutoScaling Policy For Instance Group Could Not Attach And Failed
I am trying to automate the EMR cluster creation through boto3. Unfortunately, I'm getting the following warning:
The Auto Scaling policy for instance group ig-MI0ANZ0C3WNN in Amazon EMR cluster j-...
0
votes
1
answer
333
views
Arguments for jar file incorrect - spinning up EMR cluster using Boto3
I am writing Python code using the Boto3 library to spin up an EMR cluster. In the Steps section, I have my jar file listed. This jar file is a Scala script that takes arguments like this:
-l '...
0
votes
1
answer
3k
views
Understanding EMR autoscaling
I have the following code, which is working fine:
def emr_client():
    config = get_aws_config()
    return boto3.client(
        'emr',
        region_name=config['aws_region'],
...
1
vote
2
answers
2k
views
How to get the list of instances for AWS EMR?
Why is the list for EC2 different from the EMR list?
EC2: https://aws.amazon.com/ec2/spot/pricing/
EMR: https://aws.amazon.com/emr/pricing/
Why are not all the types of instances from the EC2 ...
7
votes
4
answers
9k
views
How to wait for a step completion in AWS EMR cluster using Boto3
Given a step ID, I want to wait for that AWS EMR step to finish. How can I achieve this? Is there a built-in function?
At the time of writing, the Boto3 Waiters for EMR allow waiting for Cluster ...
5
votes
1
answer
4k
views
Failing to find script-runner.jar
Here's the code to install and run Hive on EMR:
args = ['s3://' + zone_name + '.elasticmapreduce/libs/hive/hive-script',
'--base-path', 's3://' + zone_name + '.elasticmapreduce/libs/hive/',
'...
0
votes
1
answer
219
views
BOTO throwing UnboundLocalError while doing get_bucket
I am trying to upload a large file (~46 GB) to S3 from EMR using boto.
The code I wrote is
>>> import math, os
>>> import boto
>>> from filechunkio import FileChunkIO
#...
0
votes
1
answer
2k
views
Create EMR Cluster and Terminate after running Python script from S3 using boto3
Is it possible to use boto3 to create an EMR cluster, read a Python script in S3, and then terminate? I know this could be done by creating a cluster and then manually copying the script from S3 to ...
4
votes
2
answers
494
views
How can I launch an EMR using SPOT Block using boto?
How can I launch an EMR cluster using a Spot block (AWS) with boto? I am trying to launch it using boto, but I cannot find any parameter like --block-duration-minutes in boto, and I am unable to find how to do this ...
7
votes
2
answers
4k
views
Boto3 EMR - Hive step
Is it possible to carry out Hive steps using boto3? I have been doing so using the AWS CLI, but from the docs (http://boto3.readthedocs.org/en/latest/reference/services/emr.html#EMR.Client....
0
votes
2
answers
1k
views
Error while connecting to a region under a profile
According to the documentation, region is also a parameter of the boto.emr.EmrConnection class. However, I get the following error while making the connection:
conn = boto.emr.EmrConnection(profile_name='profile_name', ...
10
votes
1
answer
6k
views
ClusterID vs JobFlowID on AWS EMR
I am a bit confused about the APIs available and the two identifiers.
I am using boto, but I don't think that is the problem here: my question concerns any API (but not the CLI).
I start a JobFlow with ...
2
votes
0
answers
612
views
Elastic MapReduce with boto - InstanceProfile is required for creating cluster
I'm trying to run an Elastic MapReduce job with the code below, but when I try this I get an error: InstanceProfile is required for creating cluster
Does anyone know why I'm getting this error?
def createmrjob(...
1
vote
1
answer
2k
views
EMR/boto - How to get cluster id and step id using boto?
Some describe_* functions in boto.emr need a step_id, but the documentation does not clearly describe how to obtain the step_id after submitting steps.
How can I get these step_ids after ...
3
votes
1
answer
10k
views
How to get data from s3 and do some work on it? python and boto
I have a project task to use some output data I have already produced on S3 in an EMR task. Previously, I ran an EMR job that produced some output in one of my S3 buckets in the form of ...
2
votes
1
answer
254
views
Creating EMR using Boto fails
I am trying to create an EMR cluster from Python using the boto library.
I tried a few things, but the end result is "Shut down as step failed".
I tried running example code that Amazon supplies about ...
0
votes
1
answer
235
views
boto-emr job error: python broken pipeline error and java.lang.OutOfMemoryError
I've prepared a streaming boto jobflow on AWS/EMR that runs perfectly well using the familiar test pipe:
sed -n '0~10000p' Big.csv | ./map.py | sort -t$'\t' -k1 | ./reduce.py
The boto emr job run ...
1
vote
0
answers
181
views
EmrResponseError: 505 HTTP Version Not Supported
I got the following error when I ran the Python file from an EC2 machine.
Hadoop version: 2.4.0
AMI version: 3.5.0
Boto version: 2.32.0
Traceback (most recent call last):
File "/home/ec2-user/...
0
votes
1
answer
66
views
AWS EMR: how to get the first element out of describe_jobflows() API call result
I cannot figure out how to get the first element of the result from calling one of the boto emr APIs:
describe_jobflows()
I know it returns a list of jobflows, but when I'm trying to access it by ...
0
votes
0
answers
69
views
AWS: sort all emr (Elastic MapReduce) jobflows based on its terminated time in python boto
One easy question for gurus, but I just cannot figure it out:
I'm using the boto Python API; the relevant code is:
terminatedjobflows = emr_connection.describe_jobflows(states=["TERMINATED"], ...
0
votes
1
answer
656
views
boto does not like EMR BootstrapAction parameter
I'm trying to launch an AWS EMR cluster using the boto library, and everything works well.
I also need to install the required Python libraries, so I tried to add a bootstrap action step using boto.emr....
1
vote
1
answer
931
views
Hue install bootstrap error using AWS EMR with Boto
With the release of AMI 3.3.0, AWS supports Hue as an installable "app" in EMR, like Hive/Pig. Using the EMR web UI, creating a cluster with Hue works fine for me; however, when adding a Hue ...
0
votes
1
answer
42
views
ElasticMapReduce streaming compressed output
I'm running streaming jobs, with Python scripts for the map and reduce. I create the job flow with the boto library.
I'm using gzip input files. How can I create gzip output files, though?
17
votes
2
answers
3k
views
AWS EMR perform "bootstrap" script on all the already running machines in cluster
I have one EMR cluster which is running 24/7. I can't turn it off and launch a new one.
What I would like to do is to perform something like a bootstrap action on the already running cluster, ...
23
votes
4
answers
25k
views
How to launch and configure an EMR cluster using boto
I'm trying to launch a cluster and run a job, all using boto.
I find lots of examples of creating job_flows, but I can't, for the life of me, find an example that shows:
How to define the cluster to ...
2
votes
3
answers
4k
views
How to access EMR master private ip address using pure python / boto
I've searched on this site and Google but have not been able to get an answer for this.
I have code running from an EC2 instance which creates and manages EMR clusters using boto.
I can use this ...
1
vote
1
answer
2k
views
Boto EMR creation gives "Log Uri is not in the required format" error
The following code gives the error message
EmrResponseError: EmrResponseError: 400 Bad Request <ErrorResponse xmlns="http://elasticmapreduce.amazonaws.com/doc/2009-03-31"> <Error>
&...
5
votes
1
answer
6k
views
how to install custom packages on amazon EMR bootstrap action in code?
I need to install some packages and binaries in an Amazon EMR bootstrap action, but I can't find any example that does this.
Basically, I want to install a Python package and specify each Hadoop node to ...
1
vote
2
answers
2k
views
Unable to paginate EMR cluster using boto
I have about 55 EMR clusters (all of them terminated) and have been trying to retrieve all 55 using the list_clusters method in boto. I've been searching for examples about ...
1
vote
1
answer
483
views
EMR Job Failing
Folks,
The following python script is terminating with
job state = FAILED
and
Last State Change: Access denied checking streaming input path: s3n://elasticmapreduce/samples/wordcount/input/
...
1
vote
1
answer
2k
views
Copying/using Python files from S3 to Amazon Elastic MapReduce at bootstrap time
I've figured out how to install Python packages (numpy and such) at the bootstrapping step using boto, as well as how to copy files from S3 to my EC2 instances, also with boto.
What I haven't figured ...
4
votes
1
answer
1k
views
How can I use s3 object names as inputs to an MRJob mapper, but not the s3 objects themselves?
I'm missing something obvious about Yelp's mrjob library. Setting up an MRJob class is almost trivially easy. Running it over a file or stdin is just as easy. But how can I change the input to the job ...
1
vote
1
answer
1k
views
Map Reduce multiple outputs in python boto
I am trying to partition an input file using AWS EMR.
I use a streaming step to read from stdin.
I want to split this file into 2 files based on the values of specific fields from each line of stdin ...
0
votes
1
answer
2k
views
Splitting a file using Map Reduce
I would like to split the content of a text file into 2 different files using EMR.
The input file, as well as the mapper and reducer scripts, are all stored in AWS S3.
Currently, my mapper reformats ...
0
votes
1
answer
1k
views
Backup DynamoDB Table with dynamic columns to S3
I have read several other posts about this and in particular this question with an answer by greg about how to do it in Hive. I would like to know how to account for DynamoDB tables with variable ...
0
votes
0
answers
338
views
AWS Elastic Mapreduce optimizing Pig job
I am using boto 2.8.0 to create EMR jobflows over large log files stored in S3. I am relatively new to Elastic MapReduce and am getting a feel for how to properly handle jobflows from this issue.
...
9
votes
1
answer
6k
views
Elastic Map Reduce: difference between CANCEL_AND_WAIT and CONTINUE?
I just found that using Amazon's Elastic Map Reduce, I can specify a step to have one of three ActionOnFailure choices:
TERMINATE_JOB_FLOW
CANCEL_AND_WAIT
CONTINUE
TERMINATE_JOB_FLOW is the default ...
8
votes
1
answer
2k
views
Setting hadoop parameters with boto?
I am trying to enable bad input skipping on my Amazon Elastic MapReduce jobs. I am following the wonderful recipe described here:
http://devblog.factual.com/practical-hadoop-streaming-dealing-with-...
2
votes
1
answer
1k
views
EMR + DynamoDB workflow setup throws Hive.createTable NoSuchMethodError JsonErrorResponseHandler
I am trying to set up an EMR workflow (with DynamoDB and Hive) using the boto Python API.
I could run the script manually using the Amazon EMR Console. However, with boto it fails at creating tables.
Here's the ...
2
votes
3
answers
3k
views
Boto: how to keep EMR job flow running after completion/failure?
How can I add steps to a waiting Amazon EMR job flow using boto without the job flow terminating once complete?
I've created an interactive job flow on Amazon's Elastic Map Reduce and loaded some ...
8
votes
1
answer
3k
views
Python client support for running Hive on top of Amazon EMR
I've noticed that neither mrjob nor boto supports a Python interface to submit and run Hive jobs on Amazon Elastic MapReduce (EMR). Are there any other Python client libraries that support running ...
4
votes
1
answer
4k
views
boto ElasticMapReduce throttling and rate limiting
I've run into rate limiting from Amazon EMR a few times via the boto API with the following:
boto.exception.EmrResponseError: EmrResponseError: 400 Bad Request
<ErrorResponse xmlns="http://...
2
votes
1
answer
250
views
Get the number of completed steps in an Amazon Elastic MapReduce jobflow via boto
To avoid the overhead of setting up instances every time I submit a job, I use a jobflow that's always in waiting mode after each job completion. However, according to this page, "a maximum of 256 ...
2
votes
3
answers
1k
views
What is wrong with my boto elastic mapreduce jar jobflow parameters?
I am using the boto library to create a job flow in Amazon's Elastic MapReduce web service (EMR). The following code should create a step:
step2 = JarStep(name='Find similiar items',
jar='...