-
-
Notifications
You must be signed in to change notification settings - Fork 31.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dbmmodule bool function is performance prohibitive #46412
Comments
We've been using Python 2.4 to build the new package management software It turns out that we were continually invoking a function in the Looking at dbmmodule.c, the source for dbm.so, is instructive: This is dbm_length, the function that we're _always_ calling. static int
dbm_length(dbmobject *dp)
{
if (dp->di_dbm == NULL) {
PyErr_SetString(DbmError, "DBM object has already been
closed");
return -1;
}
if ( dp->di_size < 0 ) {
datum key;
int size;
size = 0;
for ( key=dbm_firstkey(dp->di_dbm); key.dptr;
key = dbm_nextkey(dp->di_dbm))
size++;
dp->di_size = size;
}
return dp->di_size;
} It's a knock-off of function shown in ndbm(3C) that traverses the Further examination of dbmmodule shows that dbm_length has been assigned static PyMappingMethods dbm_as_mapping = { It looks like dbm_length stashes the size of the database, so it doesn't One of the problem parts of the code is this line in catalog.py: if fmri_list:
if not self.searchdb:
self.searchdb = \
dbm.open(self.searchdb_file, "c") This if not triggers the PyObject_IsTrue that invokes the inquiry operator if fmri_list:
if self.searchdb is None:
self.searchdb = \
dbm.open(self.searchdb_file, "c") We were able to work around the problem by using the is None check, |
Proposed patch is inline. |
I think that "-1" is a sanity check. If the count is updated in the Anybody using "Berkeley DB" related databases knows that "length" is Checking for empty databases should be fast, nevertheless (just iterate |
Somebody posted a similar issue in pybsddb. The patch I proposed would Opinions? See http://mailman.argo.es/pipermail/pybsddb/2008-April/000028.html |
Assigning anything not related to Py3k design to me is a mistake; I |
johansen, could you be happy returning True of False according to |
Yes, True/False should be sufficient for our purposes. IIRC, we were |
I haven't been able to find any of the patches listed in the comments, |
The performance is still an issue in python 3. Measuring ndbm time for a falsey check on an open db with 100 entries. gdbm performance is similar. Also, added a couple tests to verify bool(db) returns the correct value. |
Uploading patch with minor test changes (dbm_bool_b.patch). Backwards compatibility note: Result of running bool(db) on a db that has been closed: I think this is desireable, but we could have bool(db) raise an exception when the db is closed if that is preferred for backwards compatibility. |
Make the changes backward compatible after getting input on possible problems from r.david.murray Now, the only change should be faster performance for bool(db). |
New patch with PEP-7 fix - no c++ // style comments. -Thanks johansen. |
First, Python 2.4 has been out of support for a really long time. Deleting. Eric, let me clarify the situation, because this report is old and I forgot the details. I think current situation is this, when doing something like "if db : DO_SOMETHING": a) If the database is closed, raise an exception. b) If database is empty, returns False. c) If database has any entry, returns True. Takes time proportional to database length, because it is going to go thru it. The patch you are proposing: a) If the database is closed, raise an exception. b) If database is empty, returns 0. c) If database has any entry, returns 1. Fast and simply checking if the database has at least a single record. Why 0/1 instead of True/False?. This is going to be a 3.5 patch, True/False are really there, they are not aliases. PS: When done, I will probably port this patch to current "pybsddb" work, although I am not sure that Berkeley DB has "firstkey" for all database types it support. Would you allow this porting, Eric? https://pypi.python.org/pypi/bsddb3/6.0.1 |
Eric, would you mind to clarify the points I raised in the last message?. Lets move this forward. |
Thank you for the feedback. Sorry I didn't see your previous response until today. I will take a look and respond tonight. |
Hi Jesús, I believe the patch should have this behavior: The 0/1 appears to be converted to Py_TRUE/Py_FALSE in PyObject_IsTrue() in boolobject.c. For background, here's the code path I'm seeing: bool_new(...) (in Objects/boolobject.c) Looking at PyObject_IsTrue is informative. It shows that since tp_as_number->nb_bool is defined (as dbm_bool), it is used instead of the old default of tp_as_mapping->mp_length. I'm still learning CPython, and I'm not sure if everything goes through bool_new(), but I think this makes sense. If you see more places I can/should look for details here, let me know. |
Also, I'm happy to allow the code to be ported to pybsddb. As long as it doesn't cause any problems with CPython licensing - and I can't think of any way it could. |
I would like a bit more comfortable if you return True/False. Maybe I am missing something, I am not familiar with this either, but looks like more... sensible, instead of counting on implicit and magical 0/1 -> False/True conversion. What do you think?. Maybe I am missing something, I am not familiar with the details either. |
BTW, would you mind to sign a contributor form?. |
Oh, I see that your already did. Sorry for the noise. |
I did try the suggestion to return Py_False, but that gives the wrong result since Py_False is not 0 and gets returned as Py_True. I looked for similar code, and this looks like the convention for handling "if obj". PyObject_IsTrue() is called on the object. What it returns can be affected by an nb_bool function that returns 0 or 1 (or -1). Here are a few similar examples that need to implement nb_bool to handle converting an obj to a bool: For many types, there is no nb_bool, and other things are tried such as getting the length. |
OK, you did your homework. I checked "PyObject_Is_True()" function and I agree. This actually looks like a "leak" when True/False were added to Python. Python3 is inheriting it :-). OK. I see three issues in the code: a) You are getting a key from the database, and you are not freeing it. So, this code leaks memory. b) You should check database errors too. Documentation says "When the end of the database is reached, the dptr member of the key is a null pointer. If an error is detected, the dptr member of the key shall be a null pointer and the error condition of the database shall be set.". Raise an exception with the proper error. Would be nice to test that too, but it is probably nos possible. c) Please, use NULL instead of "0". Thanks for your effort, Eric. |
Hi, a) Good find, I added the free() for gdbm. ndbm doesn't need free(). Let me know if you see anything else. |
This looks worthy. I propose to make a new patch that works with 3.11. Also Python code that uses “if db:” should probably use “if db is not None:”, else an empty db will be reopened over and over. |
Based on patches uploaded to BPO by BPO user eolson (Eric Olson). No known email or GitHub username for attribution. (The bug was opened in 2008, not quite a record but still impressive.)
I have a new PR (#96692). It's based on eolson's (Eric Olson's) last patch, from 2014. Alas, I don't have an email or GH username to attribute. |
This will be in 3.12. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: