Add a file_digest() function in hashlib #89313
Comments
I am proposing the addition of a very simple helper to return the hash of a file.
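For readers who want a concrete picture of the kind of helper being proposed, here is a minimal sketch in pure Python. The name `file_digest_sketch`, its parameters, and the 64 KiB chunk size are assumptions made for illustration; they are not taken from the issue or from any PR.

```python
import hashlib
import io


def file_digest_sketch(fileobj, algorithm="sha256", chunk_size=64 * 1024):
    """Hypothetical helper: hash a binary file object in fixed-size chunks.

    Returns the hash object, so the caller can pick digest() or hexdigest().
    """
    digest = hashlib.new(algorithm)
    # read() returns b"" at end of file, which terminates the iterator.
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest


# Usage with an in-memory file; a real caller would pass open(path, "rb").
example = io.BytesIO(b"some file contents")
print(file_digest_sketch(example).hexdigest())
```

Reading in chunks keeps memory use bounded regardless of file size, which is the main reason such a helper is preferable to `hashlib.sha256(f.read())`.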
Hey Tarek, long time no see!
In a perfect world, the hash and hmac objects should get an "update_file" method. The OpenSSL-based hashes could even release the GIL and utilize OpenSSL's BIO layer to avoid any Python overhead.
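The "update_file" idea can be approximated at the Python level with a free function that feeds a file into any object exposing `update()`. This is only a sketch: `update_from_file` and its `bufsize` default are hypothetical names, and it does none of the GIL-releasing or BIO-based work suggested above.

```python
import hashlib
import hmac
import io


def update_from_file(hashobj, fileobj, bufsize=2**18):
    """Hypothetical stand-in for an update_file() method.

    Works with hashlib and hmac objects alike, since both expose update().
    """
    buf = bytearray(bufsize)
    view = memoryview(buf)
    while True:
        # readinto() reuses one buffer instead of allocating a bytes object per chunk.
        n = fileobj.readinto(buf)
        if not n:  # 0 at end of file
            break
        hashobj.update(view[:n])
    return hashobj


payload = io.BytesIO(b"example payload")
plain = update_from_file(hashlib.sha256(), payload)
payload.seek(0)
mac = update_from_file(hmac.new(b"secret-key", digestmod="sha256"), payload)
print(plain.hexdigest(), mac.hexdigest())
```

Because the function only relies on `update()`, supporting HMAC costs nothing extra in this form.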
Hey Christian, I hope things are well for you!
Tarek, are you still working on this? Would you like me to take over? Aur
@aur, go for it. I started to implement it and got lost in the details for each backend.
OK, I'll give it a go.
The PR contains a draft implementation. I would appreciate some review before I implement the same interface on all builtin hashes as well as the OpenSSL hashes.
The rationale behind If you want to read from a buffered file object, sure, just call
Forgot an important warning: this is the first time I have written C code against the Python API, and I didn't thoroughly read the guide (or at all, to be honest). I think I did a good job, but please suspect my code of noob errors. I'm especially not confident that it's OK to skip special handling of signals. Can read() return 0 if it was interrupted by a signal? That would stop the hash calculation midway and behave as if it had succeeded, which sounds suspiciously like something we don't want. Also, I should probably support signals, because such a long operation is something the user might well want to interrupt. May I have some guidance, please? Would it be enough to copy the code from fileutils.c _Py_Read() and add an outer loop, so we can do many reads with the GIL released and still call PyErr_CheckSignals when needed with the GIL held?
Added an attempt to handle signals. I don't think it's working, because when I press Ctrl+C while hashing a long file, it only raises KeyboardInterrupt after waiting the amount of time it usually takes the C code to return, but maybe that's not a good test?
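On the read() question: on POSIX, a read() interrupted before transferring any data fails with -1 and errno set to EINTR rather than returning 0, so a return value of 0 still means end of file. The sketch below shows the same loop shape at the Python level, not the C code from the PR. The function name `hash_fd` is hypothetical. It relies on PEP 475 behaviour (Python 3.5+), where `os.read()` is retried after EINTR unless the signal handler raises, in which case the exception propagates out of the loop.

```python
import hashlib
import os


def hash_fd(fd, algorithm="sha256", chunk_size=1 << 16):
    """Hypothetical sketch: hash everything readable from a file descriptor."""
    digest = hashlib.new(algorithm)
    while True:
        # An interrupted read is retried by Python itself (PEP 475); if the
        # handler raises (e.g. KeyboardInterrupt), the exception surfaces here.
        chunk = os.read(fd, chunk_size)
        if not chunk:  # b"" is returned only at end of file
            break
        digest.update(chunk)
    return digest.hexdigest()


# Usage with a pipe standing in for a file.
read_end, write_end = os.pipe()
os.write(write_end, b"hello world")
os.close(write_end)
print(hash_fd(read_end))
os.close(read_end)
```

Because control returns to the interpreter between chunks, a pending Ctrl+C is acted on after at most one chunk. A single long-running C call only gets that property if it periodically reacquires the GIL and checks for signals itself.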
Before we continue hacking on an implementation, let's discuss some API design.
I don't think HMAC of a file is a common enough use case to support, but I have absolutely no problem conceding this point; the cost of supporting it is very low. I/O in C is a world of pain in general. In the specific case of Now, we could just be happy with In all other cases you just call For the same reason I think the fast path should only support hash names and not constructors, functions, etc., which would complicate it because new-object-can-be-accessed-without-GIL wouldn't necessarily apply. Does this make sense?
@tiran Can you make a PR adding the
Automerge-Triggered-By: GH:pablogsal
Automerge-Triggered-By: GH:pablogsal (cherry picked from commit 0b329f4) Co-authored-by: Christian Heimes <[email protected]>
Hey folks. I know this is closed and perhaps I should simply file a new request, but would you consider extending the interface of that function to (efficiently) calculate a file's hashsum for multiple algorithms (i.e. without reading the file once for every algorithm)? One could perhaps do so by accepting some array for Cheers,
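Until such an interface exists, the single-pass idea is easy to express in user code. The sketch below is an assumption about what the request means, not a proposed signature: `multi_digest` and its parameters are hypothetical.

```python
import hashlib
import io


def multi_digest(fileobj, algorithms, chunk_size=1 << 16):
    """Hypothetical helper: compute several digests while reading the file once."""
    digests = {name: hashlib.new(name) for name in algorithms}
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        # Each chunk is read from the file once and fed to every hash object.
        for digest in digests.values():
            digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}


# Usage with an in-memory file; a real caller would pass open(path, "rb").
sums = multi_digest(io.BytesIO(b"release tarball bytes"), ["sha256", "sha512"])
for name, value in sums.items():
    print(name, value)
```

This saves the repeated I/O, though the hashing itself still runs sequentially per algorithm.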
Please file a new feature request issue here for that.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.