Real World Django
Jacob Kaplan-Moss
OSCON 2009
http://jacobian.org/TN
Jacob Kaplan-Moss
http://jacobian.org / [email protected] / @jacobian
2
Shameless plug:
http://revsys.com/
3
Hat tip:
James Bennett (http://b-list.org)
4
So you’ve written a
Django site…
5
… now what?
6
• API Metering
• Backups & Snapshots
• Counters
• Cloud/Cluster Management Tools
• Instrumentation/Monitoring
• Failover
• Node addition/removal and hashing
• Auto-scaling for cloud resources
• CSRF/XSS Protection
• Data Retention/Archival
• Deployment Tools
• Multiple Devs, Staging, Prod
• Data model upgrades
• Rolling deployments
• Multiple versions (selective beta)
• Bucket Testing
• Rollbacks
• CDN Management
• Distributed File Storage
• Distributed Log storage, analysis
• Graphing
• HTTP Caching
• Input/Output Filtering
• Memory Caching
• Non-relational Key Stores
• Rate Limiting
• Relational Storage
• Queues
• Rate Limiting
• Real-time messaging (XMPP)
• Search
• Ranging
• Geo
• Sharding
• Smart Caching
• Dirty-table management
http://randomfoo.net/2009/01/28/infrastructure-for-modern-web-sites
7
The bare minimum:
• Test.
• Structure for deployment.
• Use deployment tools.
• Design a production environment.
• Monitor.
• Tune.
8
Testing
9
“Tests are the Programmer’s stone, transmuting fear into boredom.”
— Kent Beck
10
Hardcore TDD
11
“I don’t do test driven development. I do stupidity driven testing… I wait until I do something stupid, and then write tests to avoid doing it again.”
— Titus Brown
12
Whatever happens, don’t let
your test suite break thinking,
“I’ll go back and fix this later.”
13
Unit testing: unittest, doctest
Functional/behavior testing: django.test.Client, Twill
15
Testing Django
16
Unit tests
• “Whitebox” testing
• Verify the small functional units of your
app
• Very fine-grained
• Familiar to most programmers (JUnit,
NUnit, etc.)
• Provided in Python by unittest
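A minimal sketch of a fine-grained unit test written with unittest. The slugify helper is hypothetical, invented here only so there is a small functional unit to verify; it is not from the talk.

```python
import re
import unittest


def slugify(title):
    """Hypothetical helper: lowercase a title and join its words with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


class SlugifyTests(unittest.TestCase):
    def test_basic_title(self):
        self.assertEqual(slugify("Hungry cat is hungry"), "hungry-cat-is-hungry")

    def test_punctuation_is_dropped(self):
        self.assertEqual(slugify("Now what?!"), "now-what")

    def test_empty_title(self):
        self.assertEqual(slugify(""), "")


def run_tests():
    """Run the suite programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test checks one small behavior, so a failure points at a specific unit rather than at the whole app.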
17
django.test.TestCase
• Fixtures.
• Test client.
• Email capture.
• Database management.
• Slower than unittest.TestCase.
18
from django.test import TestCase

class StoryAddViewTests(TestCase):
    fixtures = ['authtestdata', 'newsbudget_test_data']
    urls = 'newsbudget.urls'

    def test_story_add_get(self):
        # A GET should render the "add story" form.
        r = self.client.get('/budget/stories/add/')
        self.assertEqual(r.status_code, 200)
        # …

    def test_story_add_post(self):
        # A valid POST should save the story and redirect.
        data = {
            'title': 'Hungry cat is hungry',
            'date': '2009-01-01',
        }
        r = self.client.post('/budget/stories/add/', data)
        self.assertEqual(r.status_code, 302)
        # …
19
Doctests
• Easy to write & read.
• Produces self-documenting code.
• Great for cases that only use assertEquals.
• Somewhere between unit tests and
functional tests.
• Difficult to debug.
• Don’t always provide useful test failures.
20
class Choices(object):
    """
    Easy declarative "choices" tool::

        >>> STATUSES = Choices("Live", "Draft")

        # Acts like a choices list:
        >>> list(STATUSES)
        [(1, 'Live'), (2, 'Draft')]

        # Easily convert from code to verbose:
        >>> STATUSES.verbose(1)
        'Live'

        # ... and vice versa:
        >>> STATUSES.code("Draft")
        2
    """
    # …
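The slide elides the body of Choices. As an illustration only, here is one hypothetical implementation that would satisfy the doctest; it is an assumption made for this sketch, not the author's code.

```python
class Choices(object):
    """Hypothetical implementation of the declarative "choices" helper."""

    def __init__(self, *labels):
        # Codes are assigned 1, 2, 3, ... in the order the labels are given.
        self._pairs = list(enumerate(labels, start=1))

    def __iter__(self):
        # Iterating yields (code, label) pairs, like a Django choices list.
        return iter(self._pairs)

    def verbose(self, code):
        """Return the label for a numeric code."""
        return dict(self._pairs)[code]

    def code(self, label):
        """Return the numeric code for a label."""
        return dict((lbl, c) for c, lbl in self._pairs)[label]
```

With a body like this in place, running the module through doctest would exercise the examples in the docstring directly.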
21
****************************************************
File "utils.py", line 150, in __main__.Choices
Failed example:
STATUSES.verbose(1)
Expected:
'Live'
Got:
'Draft'
****************************************************
22
Functional tests
• a.k.a. “Behavior Driven Development.”
• “Blackbox,” holistic testing.
• All the hardcore TDD folks look down on
functional tests.
• But they keep your boss happy.
• Easy to find problems; harder to find the
actual bug.
23
Functional testing
tools
• django.test.Client
• webunit
• Twill
• ...
24
django.test.Client
25
from django.test import TestCase

class StoryAddViewTests(TestCase):
    fixtures = ['authtestdata', 'newsbudget_test_data']
    urls = 'newsbudget.urls'

    def test_story_add_get(self):
        # self.client is the django.test.Client instance TestCase provides.
        r = self.client.get('/budget/stories/add/')
        self.assertEqual(r.status_code, 200)
        # …

    def test_story_add_post(self):
        data = {
            'title': 'Hungry cat is hungry',
            'date': '2009-01-01',
        }
        r = self.client.post('/budget/stories/add/', data)
        self.assertEqual(r.status_code, 302)
        # …
26
Web browser testing
27
Browser testing tools
• Selenium
• Windmill
28
“Exotic” testing
29
30
Further resources
31
Structuring
applications for reuse
32
Designing for reuse
33
1.
Do one thing, and do it well.
34
Application == encapsulation
35
Focus
36
Good focus
37
Bad focus
38
Warning signs
• Lots of files.
• Lots of modules.
• Lots of models.
• Lots of code.
39
Small is good
40
Approach features skeptically
41
2.
Don’t be afraid of many apps.
42
The monolith anti-pattern
43
(I blame Rails)
44
The Django mindset
45
Django encourages this
• INSTALLED_APPS
• Applications are just Python packages,
not some Django-specific “app” or
“plugin.”
• Abstractions like django.contrib.sites
make you think about this as you develop.
46
Spin off a new app?
47
The ideal:
48
I need a contact form
49
from django.conf.urls.defaults import *

urlpatterns = patterns('',
    # …
    (r'^contact/', include('contact_form.urls')),
    # …
)
50
Done.
(http://bitbucket.org/ubernostrum/django-contact-form/)
51
But… what about…
52
53
3.
Write for flexibility.
54
Common sense
• Sane defaults.
• Easy overrides.
• Don’t set anything in stone.
55
Forms
56
Templates
57
Form processing
58
def edit_entry(request, entry_id):
    form = EntryForm(request.POST or None)
    if form.is_valid():
        form.save()
        return redirect('entry_detail', entry_id)
    return render_to_response('entry/form.html', {…})
59
def edit_entry(request, entry_id,
               form_class=EntryForm,
               template_name='entry/form.html',
               post_save_redirect=None):
    form = form_class(request.POST or None)
    if form.is_valid():
        form.save()
        if post_save_redirect:
            return redirect(post_save_redirect)
        else:
            return redirect('entry_detail', entry_id)
    return render_to_response([template_name, 'entry/form.html'], {…})
60
URLs
61
4.
Build to distribute (even private code).
62
What the tutorial teaches
myproject/
    settings.py
    urls.py
    myapp/
        models.py
    mysecondapp/
        views.py
    …
63
from myproject.myapp.models import …
myproject.settings
myproject.urls
64
Project coupling
kills re-use
65
Projects in real life.
• A settings module.
• A root URLConf.
• Maybe a manage.py (but…)
• And that’s it.
66
Advantages
67
You don’t even need a project
68
ljworld.com:
• worldonline.settings.ljworld
• worldonline.urls.ljworld
• And a whole bunch of apps.
69
Where apps really live
• Single module directly on Python path
(registration, tagging, etc.).
• Related modules under a top-level
package (ellington.events,
ellington.podcasts, etc.)
• No projects (ellington.settings
doesn’t exist).
70
Want to distribute?
71
General best practices
• Establish dependency rules.
• Establish a minimum Python version
(suggestion: Python 2.5).
• Establish a minimum Django version
(suggestion: Django 1.0).
• Test frequently against new versions
of dependencies.
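One way to make a minimum-version rule enforceable is to check it in code. The sketch below is an assumption-laden illustration: the function names are invented here, and it only handles purely numeric dotted versions (a string such as "1.1rc1" would raise ValueError).

```python
import sys


def parse_version(text, width=3):
    """Turn '1.0.2' into (1, 0, 2), padding short versions with zeros."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def python_is_supported(minimum=(2, 5)):
    """Compare the running interpreter against a (major, minor) floor."""
    return sys.version_info[:2] >= minimum
```

For example, meets_minimum("0.96", "1.0") is False, so an app could refuse to load against a Django older than its stated floor.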
72
Document obsessively.
73
5.
Embrace and extend.
74
Don’t touch!
75
But this application
wasn’t meant to be
extended!
76
Python Power!
77
Extending a view
78
Extending a model
79
Extending a form
• Subclass it.
• There is no step 2.
80
Other tricks
81
If you must make
changes to
external code…
82
Keep changes to a minimum
83
Stay up-to-date
84
Use a good VCS
• Subversion vendor branches don’t cut it.
• DVCSes are perfect for this:
• Mercurial queues.
• Git rebasing.
• At the very least, maintain a patch queue
by hand.
85
Be a good citizen
86
Further reading
87
Deployment
88
Deployment should...
• Be automated.
• Automatically manage dependencies.
• Be isolated.
• Be repeatable.
• Be identical in staging and in production.
• Work the same for everyone.
89
Dependency management, isolation, automation
Tools shown: pip, zc.buildout, Puppet/Chef/…
90
Dependency management
91
Vendor packages
• APT, Yum, …
• The good: familiar tools; stability; handles
dependencies not on PyPI.
• The bad: small selection; not (very)
portable; hard to supply user packages.
• The ugly: installs packages system-wide.
92
easy_install
93
pip
http://pip.openplans.org/
94
zc.buildout
http://buildout.org/
95
Package isolation
• Why?
• Site A requires Foo v1.0; site B requires
Foo v2.0.
• You need to develop against multiple
versions of dependencies.
96
Package isolation tools
• Virtual machines (Xen, VMWare, EC2, …)
• Multiple Python installations.
• “Virtual” Python installations.
• virtualenv
http://pypi.python.org/pypi/virtualenv
• zc.buildout
http://buildout.org/
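A small sketch of how code can tell whether it is running inside a "virtual" Python installation. The function name is made up for this example; the attributes it inspects are the ones set by classic virtualenv (sys.real_prefix) and by the standard-library venv module (sys.base_prefix).

```python
import sys


def in_isolated_environment():
    """Return True when running inside a virtualenv or venv.

    Classic virtualenv sets sys.real_prefix; the standard-library venv
    module makes sys.prefix differ from sys.base_prefix.
    """
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix
```

A deployment script might call this before installing packages, to avoid installing system-wide by accident.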
97
Why automate?
• “I can’t push this fix to the servers until
Alex gets back from lunch.”
• “Sorry, I can’t fix that. I’m new here.”
• “Oops, I just made the wrong version of
our site live.”
• “It’s broken! What’d you do!?”
98
Automation basics
99
Capistrano
http://capify.org/
100
Fabric
http://fabfile.org/
101
Configuration management
102
Recommendations
Pip, Virtualenv, and Fabric
103
Production
environments
104
LiveJournal Backend: Today (Roughly.)
[Diagram: BIG-IP load balancers (bigip1, bigip2); perlbal httpd/proxy nodes (proxy1 … proxy5); mod_perl web servers (web1 … webN); Memcached nodes (mc1 … mcN); global database (master_a, master_b, slave1 … slave5); user DB clusters (uc1a/uc1b … ucNa/ucNb); Mogile trackers, storage nodes (sto1 … sto8), and MogileFS database (mog_a, mog_b); gearmand (gearmand1 … gearmandN); djabberd; job queues (jqNa, jqNb); “workers” (gearwrkN, theschwkN).]
Brad Fitzpatrick, http://danga.com/words/2007_06_usenix/
105
[Diagram: one server running Django, the database, and media]
106
Application servers
• Apache + mod_python
• Apache + mod_wsgi
• Apache/lighttpd + FastCGI
• SCGI, AJP, nginx/mod_wsgi, ...
107
Use mod_wsgi
108
WSGIScriptAlias / /home/mysite/mysite.wsgi
109
import os, sys
# Add to PYTHONPATH whatever you need
sys.path.append('/usr/local/django')
# Set DJANGO_SETTINGS_MODULE
os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings'
# Create the application for mod_wsgi
import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()
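For context, the `application` object that mod_wsgi looks for is just a WSGI callable. The sketch below shows a bare one, with no Django involved, and a hypothetical call_app helper that invokes it in-process roughly the way a WSGI server would.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A bare WSGI application: takes the request environ, returns body chunks."""
    body = ("Hello from %s\n" % environ.get("PATH_INFO", "/")).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app, path="/"):
    """Invoke a WSGI app in-process and collect status, headers, and body."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal, valid environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Django's WSGIHandler fills the same role as `application` here: the server calls it once per request.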
110
“Scale”
111
Does this scale?
[Diagram: one server running Django, the database, and media]
Maybe!
112
Things per second
Number of things
113
Real-world example
Database A
175 req/s
Database B
75 req/s
114
Real-world example
http://tweakers.net/reviews/657/6
115
[Diagram: web server (Django, media); database server (database)]
116
Why separate hardware?
• Resource contention
• Separate performance concerns
• 0 → 1 is much harder than 1 → N
117
DATABASE_HOST = '10.0.0.100'
FAIL 118
Connection middleware
• Proxy between web and database layers
• Most implement hot failover and
connection pooling
• Some also provide replication, load
balancing, parallel queries, connection
limiting, &c
• DATABASE_HOST = '127.0.0.1'
119
Connection middleware
• PostgreSQL: pgpool
• MySQL: MySQL Proxy
• Database-agnostic: sqlrelay
• Oracle: ?
120
[Diagram: web server (Django); media server (media); database server (database)]
121
Media server traits
• Fast
• Lightweight
• Optimized for high concurrency
• Low memory overhead
• Good HTTP citizen
122
Media servers
• Apache?
• lighttpd
• nginx
• S3
123
The absolute minimum
[Diagram: web server (Django); media server (media); database server (database)]
124
The absolute minimum
[Diagram: a single web server holding Django, media, and the database]
125
proxy media
database
database server
126
Why load balancers?
127
Load balancer traits
128
Load balancers
• Apache + mod_proxy
• perlbal
• nginx
• Varnish
• Squid
129
CREATE POOL mypool
  POOL mypool ADD 10.0.0.100
  POOL mypool ADD 10.0.0.101

CREATE SERVICE mysite
  SET listen = my.public.ip
  SET role = reverse_proxy
  SET pool = mypool
  SET verify_backend = on
  SET buffer_size = 120k
ENABLE mysite
130
you@yourserver:~$ telnet localhost 60000
pool mypool add 10.0.0.102
OK
nodes 10.0.0.101
10.0.0.101 lastresponse 1237987449
10.0.0.101 requests 97554563
10.0.0.101 connects 129242435
10.0.0.101 lastconnect 1237987449
10.0.0.101 attempts 129244743
10.0.0.101 responsecodes 200 358
10.0.0.101 responsecodes 302 14
10.0.0.101 responsecodes 207 99
10.0.0.101 responsecodes 301 11
10.0.0.101 responsecodes 404 18
10.0.0.101 lastattempt 1237987449
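The session above adds a backend to a running pool. As a rough illustration of that idea only, here is a hypothetical round-robin pool whose membership can change between requests. It is a sketch invented for this example and makes no claim about how Perlbal chooses backends.

```python
class BackendPool(object):
    """Hypothetical round-robin pool whose membership can change at runtime."""

    def __init__(self, backends=()):
        self._backends = list(backends)
        self._next = 0  # running count of picks made so far

    def add(self, address):
        """Add a backend; it joins the rotation immediately."""
        if address not in self._backends:
            self._backends.append(address)

    def remove(self, address):
        """Take a backend out of rotation."""
        self._backends.remove(address)

    def pick(self):
        """Return the next backend in rotation."""
        if not self._backends:
            raise LookupError("pool is empty")
        address = self._backends[self._next % len(self._backends)]
        self._next += 1
        return address
```

Because nodes can be added or removed without restarting anything, growing from 1 to N web servers becomes an operational step rather than a code change.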
131
[Diagram: three proxy servers and two media servers]
132
“Shared nothing”
133
BALANCE = None

def balance_sheet(request):
    global BALANCE
    if not BALANCE:
        bank = Bank.objects.get(...)
        BALANCE = bank.total_balance()
    ...
FAIL 134
Global variables are
right out
135
from django.core.cache import cache

def balance_sheet(request):
    balance = cache.get('bank_balance')
    if balance is None:  # cache miss; a cached balance of 0 is still a hit
        bank = Bank.objects.get(...)
        balance = bank.total_balance()
        cache.set('bank_balance', balance)
    ...
WIN 136
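The get-or-compute pattern in that view can be sketched without Django. FakeSharedCache below is a stand-in invented for this example; in production the cache would be a shared service such as memcached, since an in-process dict is exactly the kind of per-server state that "shared nothing" rules out.

```python
class FakeSharedCache(object):
    """Stand-in for a shared cache such as memcached (demo only)."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


def cached_call(cache, key, compute):
    """Return the cached value for key, computing and storing it on a miss."""
    value = cache.get(key)
    if value is None:  # a miss; 0 or '' are treated as valid cached values
        value = compute()
        cache.set(key, value)
    return value
```

Any web server in the pool can serve the request, because the only state lives in the cache and the database.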
def generate_report(request):
    report = get_the_report()
    open('/tmp/report.txt', 'w').write(report)
    return redirect(view_report)

def view_report(request):
    report = open('/tmp/report.txt').read()
    return HttpResponse(report)
FAIL 137
Filesystem?
What filesystem?
138
Further reading
139
Monitoring
140
Goals
• When the site goes down, know it immediately.
• Automatically handle common sources of
downtime.
• Ideally, handle downtime before it even happens.
• Monitor hardware usage to identify hotspots and
plan for future growth.
• Aid in postmortem analysis.
• Generate pretty graphs.
141
Availability monitoring
principles
• Check services for availability.
• More than just “ping yoursite.com.”
• Have some understanding of dependencies.
• Notify the “right” people using the “right”
methods, and don’t stop until it’s fixed.
• Minimize false positives.
• Automatically take action against common
sources of downtime.
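A sketch of an availability check that goes beyond ping: it fetches a page and verifies both the status code and some expected content. The function name and the injectable `opener` parameter are assumptions made for this example so the check can be exercised without a network.

```python
from urllib.request import urlopen


def check_site(url, expected_text, timeout=5, opener=urlopen):
    """Return (ok, detail) for one availability check of url."""
    try:
        response = opener(url, timeout=timeout)
        try:
            status = response.getcode()
            body = response.read().decode("utf-8", "replace")
        finally:
            response.close()
    except Exception as exc:  # connection refused, timeout, HTTP error, ...
        return False, "request failed: %s" % exc
    if status != 200:
        return False, "unexpected status %s" % status
    if expected_text not in body:
        # The server answered, but not with the page we expected.
        return False, "expected text missing"
    return True, "ok"
```

A monitoring tool would run a check like this on a schedule and notify someone when it returns False.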
142
Availability monitoring tools
• Internal tools
• Nagios
• Monit
• Zenoss
• ...
• External monitoring tools
143
Usage monitoring
144
Usage monitoring tools
• RRDTool
• Munin
• Cacti
• Graphite
145
146
147
Logging
148
Logging tools
• print
• Python’s logging module
• syslogd
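A minimal sketch of moving from print to Python's logging module. The build_logger helper and the logger name used in the usage note are assumptions made for this example.

```python
import logging


def build_logger(name, stream, level=logging.INFO):
    """Return a logger that writes 'LEVEL name: message' lines to stream."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False  # keep records out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.handlers = [handler]  # replace any handlers from earlier calls
    return logger
```

Calling build_logger("newsbudget.views", sys.stderr).info("story %s saved", 42) writes one formatted line, while debug() calls are dropped at the INFO level. Swapping the StreamHandler for logging.handlers.SysLogHandler would send the same records to syslogd.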
149
Log analysis
• grep | sort | uniq -c | sort -rn
• Load log data into relational databases,
then slice & dice.
• OLAP/OLTP engines.
• Splunk.
• Analog, AWStats, ...
• Google Analytics, Mint, ...
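The grep | sort | uniq -c | sort -rn pipeline has a direct Python equivalent. This sketch assumes whitespace-separated log lines; the function name and field positions are illustrative, not a real log format.

```python
from collections import Counter


def top_values(lines, field_index, limit=10):
    """Count one whitespace-separated field across lines, most common first."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > field_index:  # skip short or blank lines
            counts[fields[field_index]] += 1
    return counts.most_common(limit)
```

Pointing it at the status-code column of an access log gives a quick view of how many 404s or 500s the site is serving.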
150
What to monitor?
• Everything possible.
• The answer to “should I monitor this?” is
always “yes.”
151
Performance
And when you should care.
152
Ignore performance
Step 1: write your app.
Step 2: make it work.
Step 3: get it live.
Step 4: get some users.
…
Step 94,211: tune.
153
Ignore performance
154
Low-hanging fruit
• Lots of DB queries.
• Rule of thumb: O(1) queries per view.
• Very complex queries.
• Read-heavy vs. write-heavy.
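To illustrate the O(1)-queries rule of thumb, the sketch below uses sqlite3 with a made-up two-table schema and counts the queries issued by two versions of the same listing: one that looks up each author separately, and one that uses a single JOIN. Every name here is hypothetical.

```python
import sqlite3


class CountingConnection(object):
    """Thin wrapper that counts how many queries are run."""

    def __init__(self, db):
        self.db = db
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.db.execute(sql, params).fetchall()


def make_db():
    """Build a small in-memory database with hypothetical tables."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute("CREATE TABLE story (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER)")
    db.executemany("INSERT INTO author VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
    db.executemany("INSERT INTO story VALUES (?, ?, ?)",
                   [(1, "A", 1), (2, "B", 2), (3, "C", 1)])
    return db


def stories_n_plus_one(conn):
    """One query for the stories, then one more per story: O(N) queries."""
    result = []
    for title, author_id in conn.query("SELECT title, author_id FROM story ORDER BY id"):
        name = conn.query("SELECT name FROM author WHERE id = ?", (author_id,))[0][0]
        result.append((title, name))
    return result


def stories_joined(conn):
    """A single JOIN returns the same data: O(1) queries."""
    return conn.query(
        "SELECT story.title, author.name FROM story "
        "JOIN author ON author.id = story.author_id ORDER BY story.id"
    )
```

With three stories the first version runs four queries and the second runs one; the gap grows with the number of rows. In Django's ORM, select_related() plays a similar role to the JOIN here.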
155
Anticipate bottlenecks
156
“It’s slow!”
157
Define “slow”
158
159
YSlow
http://developer.yahoo.com/yslow/
160
Server-side
performance tuning
161
Tuning in a nutshell
• Cache.
• Cache some more.
• Improve your caching strategy.
• Add more cache layers.
• Then, maybe, tune your code.
162
Caching is magic
163
Caching is about
trade-offs
164
Caching questions
• Cache for everybody? Only logged-in users?
Only non-paying users?
• Long timeouts/stale data? Short timeouts/
worse performance?
• Invalidation: time-based? Data based? Both?
• Just cache everything? Or just some views?
Or just the expensive parts?
• Django’s cache layer? Proxy caches?
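The timeout and invalidation questions can be made concrete with a small sketch. TTLCache is a hypothetical in-memory class written for this example (not Django's cache API); it supports both time-based expiry and explicit, data-based invalidation.

```python
import time


class TTLCache(object):
    """Hypothetical in-memory cache with time-based and explicit invalidation."""

    def __init__(self, timeout, clock=time.time):
        self.timeout = timeout  # seconds an entry stays fresh
        self.clock = clock      # injectable so expiry can be simulated
        self._data = {}         # key -> (expires_at, value)

    def set(self, key, value):
        self._data[key] = (self.clock() + self.timeout, value)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]  # time-based invalidation
            return default
        return value

    def invalidate(self, key):
        """Data-based invalidation: call this when the underlying data changes."""
        self._data.pop(key, None)
```

A long timeout means fewer recomputations but staler data; calling invalidate() on writes keeps data fresh at the cost of more bookkeeping.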
165
Common caching strategies
• Are most of your users anonymous? Use
CACHE_MIDDLEWARE_ANONYMOUS_ONLY
• Are there just a couple of slow views? Use
@cache_page.
• Need to cache everything? Use a site wide
cache.
• Everything except a few views? Use
@never_cache.
166
Site-wide caches
167
External caches
168
Conditional view
processing
169
GET / HTTP/1.1
Host: www2.ljworld.com

HTTP/1.1 200 OK
Server: Apache
Expires: Wed, 17 Jun 2009 18:17:18 GMT
ETag: "93431744c9097d4a3edd4580bf1204c4"
…

GET / HTTP/1.1
Host: www2.ljworld.com
If-None-Match: "93431744c9097d4a3edd4580bf1204c4"

HTTP/1.1 304 NOT MODIFIED
…

GET / HTTP/1.1
Host: www2.ljworld.com
If-Modified-Since: Wed, 17 Jun 2009 18:00:00 GMT

HTTP/1.1 304 NOT MODIFIED
…
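The ETag exchange above can be sketched in a few lines. This is an illustration under stated assumptions: the ETag is taken to be an MD5 digest of the body, weak validators (W/"...") are ignored, and the function names are invented for this example.

```python
import hashlib


def make_etag(body):
    """Build a quoted ETag from the MD5 hex digest of the response body."""
    return '"%s"' % hashlib.md5(body).hexdigest()


def conditional_response(request_headers, body):
    """Return (status, headers, body), answering 304 when the ETag matches."""
    etag = make_etag(body)
    header = request_headers.get("If-None-Match", "")
    client_tags = [tag.strip() for tag in header.split(",")]
    if etag in client_tags or "*" in client_tags:
        # The client's copy is current: send headers only, no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

The saving is in bandwidth and client rendering; the server still has to produce the body to compute the digest unless the ETag is derived some cheaper way.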
170
ETags
171
When caching fails…
172
“I think I need a bigger box.”
173
Where to spend money
174
No money?
175
Web server
improvements
• Start with simple improvements: turn off
Keep-Alive, tweak MaxConnections, etc.
• Use a better application server
(mod_wsgi).
• Investigate light-weight web servers
(nginx, lighttpd).
176
Database tuning
177
Build a toolkit
• profile, cProfile
• strace, SystemTap, dtrace.
• Django debug toolbar
http://bit.ly/django-debug-toolbar
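A short sketch of using cProfile from code rather than from the command line. slow_sum is a hypothetical hot spot and profile_call is a helper invented for this example.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical hot spot to profile."""
    return sum(i * i for i in range(n))


def profile_call(func, *args):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time and show the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()
```

The report lists call counts and timings per function, which helps confirm where time actually goes before any tuning starts.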
178
More…
http://jacobian.org/r/django-cache
http://jacobian.org/r/django-conditional-views
179
Final thoughts
• Writing the code is the easy part.
• Making it work in the Real World is the
part that’ll make you lose sleep.
• Don’t worry too much: performance
problems are good problems to have.
• But worry a little bit: “an ounce of
prevention is worth a pound of cure.”
180
Fin.
Contact me: [email protected] / @jacobian
181