Skip to content
This repository has been archived by the owner on Jul 13, 2023. It is now read-only.

:hash interpolation for generic obfuscated URLs and :timestamp bugfix #416

Merged
1 commit merged into from
Feb 16, 2011

Conversation

jmileham
Copy link
Contributor

Hi there,

I'm building a site that needs to securely obfuscate attachments in a public S3 bucket. The solution I arrived at seems like it might be useful to the community as a general-purpose tool that remains customizable enough to allow developers with different requirements to implement it differently without having to hack into Paperclip core. It is a set of storage-agnostic extensions to Attachment and Interpolations and supporting tests.

It features:

  • No extra data model required
  • Choice of HMAC digest algorithm on a per-attachment basis via openssl (modeled after ActiveSupport::MessageVerifier, default algo is SHA1)
  • Choice of fields to include in the hash on a per-attachment basis, which permits many developer-selected tradeoffs between security and flexibility -- some examples:
    • :hash_data => ":class/:attachment/:id" - Unique per asset, long-lived filename with deducible style URLs (e.g. you want a 3rd party site to be able to generate style URLs from a single URL trunk that you provide once and always get the latest attachment)
    • :hash_data => ":class/:attachment/:id/:style" - Adds non-deducible style URLs (e.g. you want to store uncompressed :originals but don't want end users to be able to find them and kill your bandwidth bill)
    • :hash_data => ":class/:attachment/:id/:style/:updated_at" - Adds non-deducible versions (e.g. if a user's private profile photo URL makes it out into the wild via a malicious friend, they can unfriend, upload a new photo and the leak is plugged) -- this is the default

Since the hashes are extremely unlikely to collide, you can also remove other elements of the :path that were previously required for global uniqueness (like :id), which can give your users plausible deniability of ownership, or even completely obliterate the directory structure (:path => ":hash" -- or less controversially :path => ":hash.:extension") such that an attacker can't even prove what kind of attachment he/she has gained access to. Paired with a large :hash_secret and HTTPS, your users are left with a similar security guarantee to the analog hole -- attackers would need an authorized-user mole, or would need to compromise either your server-side secret or an authorized user's machine in order to gain access to protected attachments, and then would be able to freely distribute them (sadly, using your hosting infrastructure as an accomplice).

The :timestamp bug

While building this, I ran across what I think is a bug: The :timestamp interpolation is non-deterministic in the presence of per-thread time_zones, e.g. displaying time zones per user location, which would affect any such site that uses :timestamp in an attachment :path or :url (or if I had used :timestamp in :hash). While using :timestamp in paths and URLs is probably a little-used feature given the general wordiness of Time#to_s, and the intersection of sites that attempt to use it as well as implement per-user time zones is a vanishingly small slice of Paperclip users, I think it's worth taking a look at.

I took a stab at dealing with it by adding a new attachment config parameter :use_default_time_zone (defaulted to true) that explicitly churns out :timestamp interpolations in the server-wide default time zone. This is not perfect, because it's possible that a site owner would want to change the default time zone via config.time_zone=, which would in turn invalidate all of their existing attachment URLs. One option available to such an implementor would be to redefine my new method Attachment#time_zone explicitly to meet their requirements. Or I could add a :time_zone option on Attachment, defaulted to null, and falling through to the present Attachment#time_zone implementation. Such an option could even optionally take a block a la options[:url] and options[:path].

While the ideal path for :timestamp is unclear to me as I'm not deeply familiar with the design philosophy behind Paperclip or the direction it wants to head in (aggressively fixing holes like this or supporting users who may rely on legacy behavior?), it seemed natural to me to expose the integer Attachment#updated_at value as an interpolation as an alternative to :timestamp. Epoch seconds don't carry any time zone information and are therefor immune to this issue, plus they're presumably faster to convert to than text, and fewer characters to hash to boot.

So my present implementation has a provisionally "improved" behavior for :timestamp (which should be backward compatible for everybody who didn't run into the bug personally), and a new :updated_at interpolation that merely exposes the existing Attachment instance method. I'd be eager for feedback on what if anything should be done with that stuff.

The caveats:

  • I'm pretty new to testing, and this was definitely my first time writing tests using shoulda and test/unit. Feel free to slap me for transgressions in style and abuse of mocks/stubs/fakes/dummies.
  • I tried to follow house style to the extent I could derive it, but I'm sure I got it wrong in places.
  • I'm not sure what went on with the appraisal gemfile.locks reverting to an earlier version dependency on Rails -- perhaps this is expected? Again, never used appraisal before either. Tips would be more than welcome.

I'd be very excited if you see some or all of these changes as valuable to the Paperclip mainstream, and am happy to iterate this into something worthy of pulling.

Thanks a lot!

-john

@jyurek
Copy link

jyurek commented Feb 16, 2011

Thanks for this patch! I've pulled it into master.

@eyvoro
Copy link

eyvoro commented Mar 31, 2011

Very good!!!

@owahab
Copy link

owahab commented Jul 6, 2011

Amazing feature with no tangible documentation. Any chance you can help the community using this amazing feature?

@pjungwir
Copy link

This is a great feature, and I'm glad to see it pulled into master. But I'm having trouble with including :updated_at in the :hash parameter. My image gets stored on S3 at one path, but then when I say user.photo(:original) I get a different path. I guess this is because Paperclip is storing the file /before/ the model is saved, so that the path is immediately wrong because further requests for the path will depend on a new updated_at. But I find this hard to believe, because I guess the feature works for other folks?

Incidentally, it seems that another tangle about using updated_at is that if you alter anything else in the model, you'll need to remember to move the file(s) on S3. Is that right?

@jmileham
Copy link
Contributor Author

The updated at uses the attachment's updated at column so unrelated model changes shouldn't affect it. We've been using this code in production at ImpulseSave for about as long as the pull request has been in the wild and it manages file names without breaking references without problems as long as you don't go behind paperclip's back when updating your models.

I would make sure that the model instance you're asking the URL of is fresh and aware of the latest save and that you're storing an :original. If your hash_data includes style, that means that each style will have a distinct hash too of course, so different hashes don't necessarily mean out-of-date, possibly just for a different attachment style.

If you'd like to throw up a gist of what you're working with I'd be happy to take a quick look.

@firecall
Copy link

firecall commented Jan 6, 2013

Can this work with existing models? I tried it, but it obfuscates the existing normal file name and breaks any existing paths.

Any way to just apply it to new uploads?

@jmileham
Copy link
Contributor Author

jmileham commented Jan 6, 2013

Migrating will probably take a little bit of creativity... Your best net might be to start with a new paperclip attachment on the same model that uses hashes and either migrate the assets across in bulk (maybe via a rake task?) or use the new attachment moving forward and fall back to the old on a read-only basis.

@firecall
Copy link

firecall commented Jan 6, 2013

Thanks for the quick response :-) I'll work something up in the morning when I'm freshly caffeinated.

Copy link

I have a problem with the file name in the _file_name DB field. I do this:

has_attached_file :video,
:storage => :s3,
:s3_credentials => "#{Rails.root}/config/s3.yml",
:path => "/videos/:hash.:extension",
:hash_secret => "[my secret string]"

When a file is uploaded the _file_name field get set to the original name of the file and not the hashed one. Am I doing something wrong?

@jmileham
Copy link
Contributor Author

The generated URL is never persisted to the database. my_model.video.url should return what you're looking for.

Copy link

Thanks - but this gives the whole URL. How about adding something like this:

module Paperclip
class Attachment
def file_name
File.basename(url)
end
end
end

@jmileham
Copy link
Contributor Author

The filename is not really meaningful to Paperclip without the rest of its URL -- depending on how you configure the :path, the filename might be identical for every instance, and might only be distinct based on the rest of the path. If presenting the filename on its own makes sense in your application, you probably just want to put that in a helper method.

@guyisra
Copy link

guyisra commented May 13, 2013

anyone figured out how to easily move attachments with old path to new obfuscated path?

@quentindemetz
Copy link

quentindemetz commented Apr 14, 2017

I am having a problem with this feature and am losing my images.

Consider the following scenario for a likely explanation:
I have in my initializer:

Paperclip::Attachment.default_options[:hash_data]    = ':class/:attachment/:id/:style/:updated_at'

and my Person model:

class Person < ApplicationRecord
  has_attached_file :avatar,
      styles: { tiny: ['70x70#', :jpg] },
      hash_secret: ENV['PAPERCLIP_HASH_SECRET'], 
  [...]
end

Everything works fine! I'm happy, my image urls are obfuscated, looks cool and shiny.

Tomorrow I add a new style:

styles: { tiny: ['70x70#', :jpg], medium: ['100x100#', :jpg }

I follow the paperclip documentation to process all my existing images for the new medium style. I create the relevant .yml file with the following contents:

---
:Person:
  :avatar:
    - :tiny

And I run the following rake task (defined in https://github.com/thoughtbot/paperclip/blob/master/lib/tasks/paperclip.rake)

rake paperclip:refresh:missing_styles

According to the rake task definition, it only generates the styles that are missing.
In my case, it generates only the ':medium' style by calling person.avatar.reprocess!(:medium)

However, this operation in turn updates the avatar_updated_at column. This "invalidates" all my previous images (including the original one) as the hash data has changed but the images have not been renamed. Life sucks.

tl, dr: don't use the updated_at in hash_data

@jmileham
Copy link
Contributor Author

Howdy, just FYI, this feature predates refresh:missing_styles by 6 months and at the time of its addition to Paperclip, the best practice (and in fact only way to get new styles rendered) would have been to refresh all the styles when new style definitions were added, which would work fine. Anyway, as described, it does sound like a gnarly bug and should definitely be addressed.

While it's tempting to include a tldr with simple fix that seems like a no-brainer (why was it even designed this way to begin with?), there's a tradeoff here that's worth acknowledging. What you suggest might be the right move all-in, but it's worth noting that when you remove updated_at from the mix, the filename will not change if the file is replaced with a new file.

For example, if you're on a social network with profile photos backed by the hash feature and a stalker manages to get the URL of your profile photo (say through a one-time compromise of a laptop of a friend), the stalker will forever be able to get your latest profile photo. That's a big difference in contract, and probably a surprising one to an app developer.

Another alternative that wouldn't change the security properties here would be simply to disable the refresh:missing_styles rake task when using hash_data that includes avatar_updated_at, or possibly suppress touching avatar_updated_at when refreshing missing styles (not sure this is possible or desirable, but it might actually be more semantically correct).

Or a third path might be to add an optional self-incrementing file version number column to the hash that indicates that a new file has been uploaded. It would provide you with the new secret URL for each distinct file uploaded. It would keep from unnecessarily changing the hash during operations that are fundamentally no-ops on the avatar itself. This would require the app developer to carry around more columns in their schema, though, which seems painful.

I'm sure the debate and a pull request would be welcome to help fix it. Sadly I think these two features came in from completely different contributors who weren't aware of the potential interplay. Hashing, being a lossy operation, can end up in these kinds of knife-edge situations, and it's too bad that an unrelated feature ended up having such a scary edge case.

@jmileham
Copy link
Contributor Author

A quick dive reveals that refresh:missing_styles calls refresh:thumbnails which calls reprocess! which calls assign which calls assign_attributes which calls assign_timestamps which blows away updated_at, which is only a problem if you're not updating all the styles at once.

It seems like reprocessing isn't really an assignment and probably shouldn't be exercising that complete codepath unmodified. That'd by my recommendation for resolution.

@NeoElit
Copy link

NeoElit commented Apr 21, 2017

I'm sorry to comment on this closed issue. But isn't this issue worth mentioning in the Readme? that refresh:missing_styles will break the urls if updated_at is included in the hash_data which is the case by default

@jmileham
Copy link
Contributor Author

I'd be in favor of some resolution, for sure. A readme comment seems like potentially admitting defeat rather than making the behavior less surprising, but could be a good stopgap. @jyurek any thoughts?

@FinnWoelm
Copy link

Would a simple fix be to just "skip" updating updated_at when refresh:missing_styles is called? Given that updated_at should ideally tell us the last time the user updated their profile picture/.../..., we might not want updated_at to change anyway!

I do not understand the code layout well enough to check if this would be an easy fix or not. Maybe one of you who is better informed, can have a look :)

You would want to use (Rails 5)

 attachment.save(:touch => false)

or (Rails pre-5, I think)

 attachment.save(:updated_at => attachment.updated_at)

See: http://blog.bigbinary.com/2016/05/09/rails-5-allows-updating-without-updating-timestamps.html

@FinnWoelm
Copy link

Or what if the value for :hash_data defaulted to :class/:attachment/:id/:style/:file_size. There's a chance that you'll have exactly the same file size for two images -- but if I'm not mistaken, the odds are rather low?

This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.