Distributed Computing with Python
About this ebook
- You'll learn to write data processing programs in Python that are highly available, reliable, and fault tolerant
- Make use of Amazon Web Services along with Python to establish a powerful remote computation system
- Learn to use Python for data-intensive and resource-hungry applications
This book is for Python developers who have developed Python programs for data processing and now want to learn how to write fast, efficient programs that perform CPU-intensive data processing tasks.
Distributed Computing with Python - Francesco Pierfederici
Table of Contents
Distributed Computing with Python
Credits
About the Author
About the Reviewer
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. An Introduction to Parallel and Distributed Computing
Parallel computing
Distributed computing
Shared memory versus distributed memory
Amdahl's law
The mixed paradigm
Summary
2. Asynchronous Programming
Coroutines
An asynchronous example
Summary
3. Parallelism in Python
Multiple threads
Multiple processes
Multiprocess queues
Closing thoughts
Summary
4. Distributed Applications – with Celery
Establishing a multimachine environment
Installing Celery
Testing the installation
A tour of Celery
More complex Celery applications
Celery in production
Celery alternatives – Python-RQ
Celery alternatives – Pyro
Summary
5. Python in the Cloud
Cloud computing and AWS
Creating an AWS account
Creating an EC2 instance
Storing data in Amazon S3
Amazon elastic beanstalk
Creating a private cloud
Summary
6. Python on an HPC Cluster
Your typical HPC cluster
Job schedulers
Running a Python job using HTCondor
Running a Python job using PBS
Debugging
Summary
7. Testing and Debugging Distributed Applications
The big picture
Common problems – clocks and time
Common problems – software environments
Common problems – permissions and environments
Common problems – the availability of hardware resources
Challenges – the development environment
A useful strategy – logging everything
A useful strategy – simulating components
Summary
8. The Road Ahead
The first two chapters
The tools
The cloud and the HPC world
Debugging and monitoring
Where to go next
Index
Distributed Computing with Python
Copyright © 2016 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing, nor its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: April 2016
Production reference: 1060416
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78588-969-1
www.packtpub.com
Credits
Author
Francesco Pierfederici
Reviewer
James King
Commissioning Editor
Veena Pagare
Acquisition Editor
Aaron Lazar
Content Development Editor
Parshva Sheth
Technical Editor
Abhishek R. Kotian
Copy Editor
Neha Vyas
Project Coordinator
Nikhil Nair
Proofreader
Safis Editing
Indexer
Rekha Nair
Graphics
Disha Haria
Production Coordinator
Melwyn Dsa
Cover Work
Melwyn Dsa
About the Author
Francesco Pierfederici is a software engineer who loves Python. He has been working in the fields of astronomy, biology, and numerical weather forecasting for the last 20 years.
He has built large distributed systems that make use of tens of thousands of cores at a time and run on some of the fastest supercomputers in the world. He has also written a lot of applications of dubious usefulness that are nevertheless great fun. Mostly, he just likes to build things.
I would like to thank my wife, Alicia, for her unreasonable patience during the gestation of this book. I would also like to thank Parshva Sheth and Aaron Lazar at Packt Publishing and the technical reviewer, James King, who were all instrumental in making this a better book.
About the Reviewer
James King is a software developer with a broad range of experience in distributed systems. He is a contributor to many open source projects including OpenStack and Mozilla Firefox. He enjoys mathematics, horsing around with his kids, games, and art.
www.PacktPub.com
eBooks, discount offers, and more
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Preface
Parallel and distributed computing is a fascinating subject that, only a few years ago, was accessible only to developers at a handful of large companies and national labs. Things have changed dramatically in the last decade or so, and now anyone can build small- and medium-scale distributed applications in a variety of programming languages, including, of course, our favorite: Python.
This book is a very practical guide for Python programmers who are starting to build their own distributed systems. It starts off by illustrating the bare minimum theoretical concepts needed to understand parallel and distributed computing in order to lay the basic foundations required for the rest of the (more practical) chapters.
It then looks at some initial examples of parallelism using nothing more than modules from the Python standard library. The next step is to move beyond the confines of a single computer and use more and more nodes. This is accomplished using a number of third-party libraries, including Celery and Pyro.
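As a rough illustration of standard-library parallelism, here is a minimal sketch using `concurrent.futures`. It is not taken from the book's own examples; `is_prime` and `count_primes` are hypothetical helpers. It fans a naive CPU-bound function out over a pool of local workers, the same fan-out idea that later moves beyond a single computer.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def is_prime(n):
    """Deliberately naive, CPU-bound primality test (hypothetical helper)."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def count_primes(numbers, executor_cls=ProcessPoolExecutor, workers=4):
    """Spread is_prime over a pool of workers and add up the True results."""
    with executor_cls(max_workers=workers) as executor:
        # Executor.map yields results in input order, like the built-in map().
        return sum(executor.map(is_prime, numbers))


if __name__ == "__main__":
    # The __main__ guard matters for process pools on platforms that
    # start workers by re-importing this module (for example, macOS, Windows).
    print(count_primes(range(1000000, 1000500)))
```

In standard CPython builds, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so a process pool is usually the better fit for CPU-bound work like this; `ThreadPoolExecutor` offers the same interface and suits I/O-bound tasks.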
The remaining chapters investigate a few deployment options for our distributed applications. The cloud and classic High Performance Computing (HPC) clusters, together with their strengths and challenges, take center stage.
Finally, the thorny issues of monitoring, logging, profiling, and debugging are touched upon.
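Logging becomes far more useful across many machines when every line says where it came from. The following is a minimal sketch with the standard `logging` module, assuming that the host name and process ID are the identifiers worth recording; `make_logger` and `LOG_FORMAT` are hypothetical names, not an API from the book.

```python
import logging
import socket

# %(process)d is a built-in LogRecord attribute; %(hostname)s is supplied below.
LOG_FORMAT = ("%(asctime)s %(hostname)s pid=%(process)d "
              "%(levelname)s %(name)s: %(message)s")


def make_logger(name="worker", level=logging.INFO, stream=None):
    """Return a logger adapter that stamps each record with host and PID."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        # Attach a handler only once, even if make_logger() is called again.
        handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    logger.setLevel(level)
    # LoggerAdapter injects its `extra` dict into every record it emits.
    return logging.LoggerAdapter(logger, {"hostname": socket.gethostname()})


if __name__ == "__main__":
    log = make_logger()
    log.info("task %s finished", 42)
```

Records logged directly through the underlying logger, bypassing the adapter, carry no `hostname` attribute and will fail to format with this handler, so route calls through the adapter.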
All in all, this is very much a hands-on book, teaching you how to use some of the most common frameworks and methodologies to build parallel and distributed systems in Python.
What this book covers
Chapter 1, An Introduction to Parallel and Distributed Computing, takes you through the basic theoretical foundations of parallel and distributed computing.
Chapter 2, Asynchronous Programming, describes the two main programming styles used in distributed applications: synchronous and asynchronous programming.
Chapter 3, Parallelism in Python, shows you how to do more than one thing at the same time in your Python code, using nothing more than the Python standard library.
Chapter 4, Distributed Applications – with Celery, teaches you how to build simple distributed applications using Celery and some of its competitors: Python-RQ and Pyro.
Chapter 5, Python in the Cloud, shows how you can deploy your Python applications on the cloud using Amazon Web Services.
Chapter 6, Python on an HPC Cluster, shows how to deploy your Python applications on a classic HPC cluster, typical of many universities and national labs.
Chapter 7, Testing and Debugging Distributed Applications, talks about the challenges of testing, profiling, and debugging distributed applications in Python.
Chapter 8, The Road Ahead, looks at what you have learned so far and which directions interested readers could take to push their development of distributed systems further.
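The synchronous versus asynchronous distinction described for Chapter 2 can be sketched with the standard `asyncio` module. This is an illustration, not necessarily the approach the book takes: `fake_io` is a hypothetical stand-in for a real network or disk call, and `asyncio.run` requires Python 3.7 or later, slightly newer than the 3.5 baseline listed below.

```python
import asyncio
import time


async def fake_io(name, delay=0.2):
    # asyncio.sleep stands in for a network or disk wait; awaiting it hands
    # control back to the event loop instead of blocking the thread.
    await asyncio.sleep(delay)
    return name


async def run_sequentially():
    # Sequential style: the second call starts only after the first ends.
    first = await fake_io("a")
    second = await fake_io("b")
    return [first, second]


async def run_concurrently():
    # Concurrent style: gather() lets both waits overlap and returns
    # their results in argument order.
    return await asyncio.gather(fake_io("a"), fake_io("b"))


def timed(coroutine_function):
    """Run a coroutine function to completion and measure wall-clock time."""
    start = time.perf_counter()
    result = asyncio.run(coroutine_function())  # asyncio.run: Python 3.7+
    return result, time.perf_counter() - start


if __name__ == "__main__":
    print(timed(run_sequentially))  # elapsed time roughly 0.4 s
    print(timed(run_concurrently))  # elapsed time roughly 0.2 s
```

Both versions run on a single thread; the concurrent one is faster only because its waits overlap, which is why asynchronous code helps with I/O-bound rather than CPU-bound work.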
What you need for this book
The following software and hardware are recommended:
Python 3.5 or later
A laptop or desktop computer running Linux or Mac OS X
Ideally, some extra computers or some extra virtual machines to test your distributed applications
All software mentioned in this book is free of charge and can be downloaded from the Internet with the exception of PBS Pro, which is commercial. Most of the PBS Pro functionality, however, is available in its close sibling Torque, which is open source.
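A quick way to check a machine against these recommendations is a few lines of standard-library Python. This is a hypothetical helper, not part of the book's code; note that `platform.system()` reports macOS as `Darwin`. Since most examples should also work on Windows, treat `system_ok` as informational rather than a hard requirement.

```python
import platform
import sys

REQUIRED_PYTHON = (3, 5)
SUPPORTED_SYSTEMS = {"Linux", "Darwin"}  # "Darwin" is how macOS reports itself


def environment_report():
    """Compare the running interpreter and OS with the recommended setup."""
    major_minor = tuple(sys.version_info[:2])
    system = platform.system()
    return {
        "python": "{}.{}".format(*major_minor),
        "python_ok": major_minor >= REQUIRED_PYTHON,
        "system": system,
        "system_ok": system in SUPPORTED_SYSTEMS,
    }


if __name__ == "__main__":
    for key, value in sorted(environment_report().items()):
        print("{:10} {}".format(key, value))
```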
Who this book is for
This book is for developers who already know Python and want to learn how to parallelize their code and/or write distributed systems. While a Unix environment is assumed, most if not all of the examples should also work on Windows systems.
Conventions
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: Import the concurrent.futures module.
A block of code is set as follows:
class Foo:
    def __init__(self):
        """Docstring"""
        self.bar = 42
        # A comment
        return
Any command-line input or output is written as follows:
bookuser@hostname$ python3.5 ./foo.py
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: Clicking the Next button moves you to the next screen.
Note
Warnings or important notes appear in a box like this.
Tip
Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
Downloading the example code
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
You can download the code files by following these steps:
Log in or register on our website using your e-mail address and password.
Hover the mouse pointer over the SUPPORT tab at the top.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box.
Select the book for which you're looking to download the code files.
From the drop-down menu, choose where you purchased this book.
Click on Code Download.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
Errata
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
Questions
If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.
Chapter 1. An Introduction to Parallel and Distributed Computing
The first modern digital computers were invented in the late 1930s and early 1940s (arguably starting with Konrad Zuse's Z1 in 1936), probably before most of the readers of this book, let alone