-
-
Notifications
You must be signed in to change notification settings - Fork 30.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the close method for ElementTree.iterparse() object #69893
Comments
If ElementTree.iterparse() is called with file names, it opens a file. When resulting iterator is not exhausted, the file lefts not closed. >>> import xml.etree.ElementTree as ET
>>> import gc
>>> ET.iterparse('/dev/null')
<xml.etree.ElementTree._IterParseIterator object at 0xb6f9e38c>
>>> gc.collect()
__main__:1: ResourceWarning: unclosed file <_io.BufferedReader name='/dev/null'>
34 Martin Panter proposed in bpo-25688 to add an explicit way to clean it up, like a generator.close() method. |
I am unable to reproduce the issue on Windows 7 with 3.5.0; I have tried opening a small (non-empty) text. Here's the result: >>> import xml.etree.ElementTree as ET
>>> import gc
>>> ET.iterparse("E:/New.txt")
<xml.etree.ElementTree._IterParseIterator object at 0x0023ABB0>
>>> gc.collect()
59 |
You have to enable deprecation warnings. Run the interpreter with the -Wa option. |
Oh, my bad. Ignore my last message, behaviour is identical then. Thanks for clearing that up. |
I don't think there is a need for a close() method. Instead, the iterator should close the file first thing when it's done with it, but only if it owns it. Therefore, the fix in bpo-25688 seems correct. Closing can also be done explicitly in a finaliser of the iterator, if implicit closing via decref is too lax. |
Implicit closing an exhausted iterator helps only the iterator is iterated to the end. If the iteration has been stopped before the end, we get a leak of the file descriptor. Closing the file descriptor in the finalizer can be deferred to undefined term, especially in implementations without reference counting. Since file descriptors are limited resource, this can cause troubles in real programs. Reasons for close() in iterparse objects are the same as for close in files and generators. Maybe we will need to implement the full generator protocol (send() and throw()) in the iterparse objects, but currently I do not know use cases for this. |
Ok, I think it's reasonable to make the resource management explicit for the specific case of letting iterparse() open the file. That suggests that there should also be context manager support, given that safe usages would often involve a try-finally. Since it might not always be obvious for users when they need to close the iterator or not, I would also suggest to not let it raise an error on a double-close, i.e. if .close() was already called or the iterator was already exhausted (and the file closed automatically), calling .close() should just do nothing. |
Python 3.8.2 (default, Apr 8 2020, 14:31:25)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xml.etree.ElementTree as ET
>>> import gc
>>> ET.iterparse('/dev/null')
<xml.etree.ElementTree.iterparse.<locals>.IterParseIterator object at 0x7fb96f679d00>
>>> gc.collect()
34 The warning(main:1: ResourceWarning: unclosed file <_io.BufferedReader name='/dev/null'>) is no longer available in python3.8.2 |
I recently did a fully compatible implementation that solves an issue discussed here: class iterparse:
__slots__ = '_source', '_opened', 'root', '_next'
def __init__(self, /, source, events = None, parser = None):
# If source cannot be opened,
# an error is emitted and object is moved to gc,
# gc calls __del__ which calls close().
# opened flag must be set to false before opening,
# so close() won't emit AttributeError.
self._opened = False
if hasattr(source, 'read'):
self._source = source
else:
self._source = open(source, 'rb')
self._opened = True
self.root = None
self._next = self._iterator(XMLPullParser(events=events, _parser=parser)).__next__
def __iter__(self, /):
return self
def _iterator(self, parser: XMLPullParser, /):
source = self._source
try:
data = source.read(16 * 1024)
while data:
parser.feed(data)
yield from parser.read_events()
data = source.read(16 * 1024)
root = parser._close_and_return_root()
yield from parser.read_events() # is it necessary?
self.root = root
finally:
self.close()
def __next__(self, /):
return self._next()
def close(self, /):
if self._opened:
self._source.close()
self._opened = False
def __del__(self, /):
self.close() Features:
Benchmarks: def test_creation(file: str, impls: list[type], /):
code = f'iterparse({file!r})'
return tuple(
repeat(
code,
repeat=50,
number=1000,
globals=dict(iterparse=t),
)
for t in impls
)
def test_traverse(file: str, impls: list[type], /):
setup = f'it = iterparse({file!r})'
code = f'for _ in it: pass'
return tuple(
repeat(
code,
setup=setup,
repeat=50,
number=10,
globals=dict(iterparse=t),
)
for t in impls
)
def main():
file_path = '20220628-FULL-1_1(xsd).xml'
impls = [iterparse_new, iterparse_old]
creation = test_creation(file_path, impls)
for i, time in enumerate(creation):
print(f'Creation time of {impls[i].__name__}:', min(time))
print()
traverse = test_traverse(file_path, impls)
for i, time in enumerate(traverse):
print(f'Traverse time of {impls[i].__name__}:', min(time))
if __name__ == '__main__':
main()
|
Recent improvements (#101438) made explicit close() not needed in CPython. But it can still be useful in alternative implementations like PyPy, and maybe in future Python versions, so it is worth to add it. |
ET.iterparse()
when not exhausted #31696Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
Linked PRs
The text was updated successfully, but these errors were encountered: