
Memory leak using async generators via session.stream() #8887

Closed
Henkhogan opened this issue Nov 28, 2022 · 5 comments
Labels
cant reproduce · memory issues (related to memory use) · upstream dependency issue (issue is with one of SQLAlchemy's direct dependencies, such as greenlet)

Comments


Henkhogan commented Nov 28, 2022

Describe the bug

Running the code snippet provided, I observe growing memory consumption that does not seem reasonable to me, since I keep no references to the AsyncResult object or its returned records.

To Reproduce

import asyncio
import gc
from sqlalchemy.future import select
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.ext.asyncio.base import ReversibleProxy
from sqlalchemy import Column, Identity, PrimaryKeyConstraint, SmallInteger, String, UniqueConstraint
from sqlalchemy.orm import declarative_base

from memory_profiler import profile

Base = declarative_base()
metadata = Base.metadata
class TUser(Base):
    __tablename__ = 't_user'
    __table_args__ = (
        PrimaryKeyConstraint('id', name='t_user_pkey'),
        UniqueConstraint('email', name='t_user_email_key'),
        {'schema': 'admin'}
    )

    id = Column(SmallInteger, Identity(always=True, start=1, increment=1, minvalue=1, maxvalue=32767, cycle=False, cache=1))
    email = Column(String(128), nullable=False)


@profile
def main():
 
    async def amain(engine):
        for _ in range(100):
            async with AsyncSession(bind=engine) as session:        
                s = await session.stream(select(None).where(TUser.id==1))
                [None async for __ in s]
                del s # just to show that the AsyncResult itself is not leaking
                await session.close()  #just to show the session is not leaking

    engine = create_async_engine('postgresql+asyncpg://predictive:password@localhost:5432/predictive')

    asyncio.run(amain(engine)) 
    gc.collect()

    del engine #just to show the engine is not leaking
    gc.collect()

    print(f'{ReversibleProxy._proxy_objects=}') #I suspected there might be leaked _proxy_objects

if __name__ == '__main__':
    main()

Error

python3 -m memory_profiler memory_leak.py
{}
Filename: memory_leak.py

Line # Mem usage Increment Occurrences Line Contents

41     59.4 MiB     59.4 MiB           1   @profile
42                                         def main():
43                                          
44     63.1 MiB      0.0 MiB           2       async def amain(engine):
45     66.6 MiB      0.0 MiB         101           for _ in range(100):
46     66.6 MiB      0.0 MiB         100               _t = False
47     66.6 MiB      0.3 MiB         300               async with AsyncSession(bind=engine) as session:        
48     66.6 MiB      1.7 MiB         419                   s = await session.stream(select(None).where(TUser.id==1))
49     66.6 MiB      1.5 MiB         400                   [None async for __ in s]
50     66.6 MiB      0.0 MiB         100                   del s # just to show that the AsyncResult itself is not leaking
51                                                     #await session.close() #just to show the session is not leaking
52     66.6 MiB      0.0 MiB         200                   await session.close()  #just to show the session is not leaking
53     66.6 MiB      0.0 MiB         100               continue
54                                         
55     63.1 MiB      3.7 MiB           1       engine = create_async_engine('postgresql+asyncpg://predictive:password@localhost:5432/predictive')
56                                         
57     66.7 MiB      0.1 MiB           1       asyncio.run(amain(engine)) 
58     66.7 MiB      0.0 MiB           1       gc.collect()
59                                         
60     66.7 MiB      0.0 MiB           1       del engine #just to show the engine is not leaking
61     66.7 MiB      0.0 MiB           1       gc.collect()
62                                         
63     66.7 MiB      0.0 MiB           1       print(ReversibleProxy._proxy_objects)

Versions

  • OS: Ubuntu 22.10
  • Python: 3.11.0
  • SQLAlchemy: 1.4.44
  • Database: postgres:15.0
  • DBAPI: asyncpg 0.27.0, psycopg2-binary 2.9.5

Additional context

No response
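An aside for anyone reproducing this: a tracemalloc snapshot diff can help localize where Python-level allocations grow across iterations. A minimal sketch, assuming the same engine and TUser model as in the script above; note that tracemalloc only sees Python allocations, not memory held by C extensions such as greenlet:

import tracemalloc
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

async def diff_allocations(engine):
    # Warm up once so first-run caches don't dominate the diff
    async with AsyncSession(bind=engine) as session:
        s = await session.stream(select(TUser).where(TUser.id == 1))
        [None async for __ in s]

    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for _ in range(100):
        async with AsyncSession(bind=engine) as session:
            s = await session.stream(select(TUser).where(TUser.id == 1))
            [None async for __ in s]
    after = tracemalloc.take_snapshot()

    # Top allocation sites that grew between the two snapshots
    for stat in after.compare_to(before, 'lineno')[:10]:
        print(stat)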

Henkhogan added the requires triage label Nov 28, 2022

zzzeek commented Nov 28, 2022

hi -

First off, it's hard for me to understand what you are claiming is a "leak", since your program runs just 100 iterations and exits. A "memory leak" is not simple growth of memory after running some operations; that's completely normal, as SQLAlchemy has lots of internal structures that get built up when operations are first run, as well as an internal statement cache. So "run some operations for the first time, memory is now bigger" is perfectly normal.
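For context: the internal statement cache mentioned here is, in SQLAlchemy 1.4, an LRU cache of compiled statements capped at 500 entries per engine by default, so its memory use plateaus rather than growing without bound. A minimal sketch of tuning it at engine creation:

from sqlalchemy.ext.asyncio import create_async_engine

# query_cache_size bounds the per-engine LRU cache of compiled
# statements (default 500); smaller caps trade re-compilation for memory.
engine = create_async_engine(
    "postgresql+asyncpg://user:password@localhost:5432/dbname",
    query_cache_size=100,
)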

This is how you show a "leak":

@profile
def main():

    async def amain(engine):
        async with engine.begin() as conn:
            await conn.run_sync(metadata.create_all)
        while True:
            async with AsyncSession(bind=engine) as session:
                s = await session.stream(select(None).where(TUser.id==1))
                [None async for __ in s]
                del s # just to show that the AsyncResult itself is not leaking
                await session.close()  #just to show the session is not leaking

Note the "while True". Then you go into top and look at the memory use of the process overall. On this end it's fixed, not moving from the values you see below (CPU 98% due to the tight loop):

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                          
2724788 classic   20   0  363464  61496  11980 R  98.3   0.8   1:02.61 python                                                                                                           

So I don't see this program indicating any kind of leak.
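An alternative to eyeballing top is to log the process RSS from inside the loop; a minimal sketch assuming the optional psutil package and the same TUser model and engine setup as in the original script:

import os
import psutil
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

async def amain(engine):
    proc = psutil.Process(os.getpid())
    i = 0
    while True:
        async with AsyncSession(bind=engine) as session:
            s = await session.stream(select(TUser).where(TUser.id == 1))
            [None async for __ in s]
        i += 1
        if i % 1000 == 0:
            # RSS should plateau after warm-up; unbounded growth
            # across many iterations is what indicates a leak.
            print(i, proc.memory_info().rss // (1024 * 1024), "MiB")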

Beyond that, there is a real memory leak in greenlet with Python 3.11 that was fixed in Greenlet 2.0.1. Make sure you upgrade greenlet since you are using Python 3.11.
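A quick way to confirm which greenlet is actually in the environment, sketched here rather than taken from the thread:

import sys
import greenlet

# On Python 3.11, anything below greenlet 2.0.1 is affected;
# upgrade with e.g.: pip install --upgrade "greenlet>=2.0.1"
print("python", sys.version_info[:3], "greenlet", greenlet.__version__)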

zzzeek added the cant reproduce, memory issues, and expected behavior labels and removed the requires triage label Nov 28, 2022
CaselIT added the awaiting info label Nov 28, 2022
Henkhogan commented Nov 28, 2022

@zzzeek

"So "run some operations for the first time, memory is now bigger" is perfectly normal."

that's absolutely clear, but here I ran the exact same operation multiple times

"note the "while True". Then you go into top and look at the memory use of the process overall. On this end, it's fixed and not moving at the values you see below (CPU 98% due to the tight loop):"

on my machine the memory consumption was growing constantly until it crashed. Furthermore I would expect that at least some memory is released at latest when the session is closed, but that was not the case

Beyond that, there is a real memory leak in python-greenlet/greenlet#328 that was fixed in Greenlet 2.0.1. Make sure you upgrade greenlet since you are using Python 3.11.

Thanks. Here you saved me! After the update it works just fine

zzzeek added the external driver issues and upstream dependency issue labels and removed the external driver issues, expected behavior, and awaiting info labels Nov 28, 2022

Henkhogan commented Nov 28, 2022

FYI I found I was using

greenlet==1.1.3.post0

I am sure it was installed as a dependency of some other library I installed.

However, I think at least greenlet!=2.0 should also be included here:

greenlet != 0.4.17;(platform_machine=='aarch64' or (platform_machine=='ppc64le' or (platform_machine=='x86_64' or (platform_machine=='amd64' or (platform_machine=='AMD64' or (platform_machine=='win32' or platform_machine=='WIN32'))))))


zzzeek commented Nov 28, 2022

All greenlet versions prior to 2.0.1 leak memory in Python 3.11, whereas on earlier Python versions, all greenlet versions work fine. The above line would have to be further qualified for Python 3.11, but this gets into the question of whether a downstream package should block libraries only for API incompatibilities (greenlet 0.4.17 in our case) or additionally for behavioral bugs (all greenlets prior to 2.0.1, on py311 only).


zzzeek commented Nov 28, 2022

I think it would have to look like this:

install_requires =
    greenlet >= 2.0.1;python_version>="3.11" and (platform_machine=='aarch64' or (platform_machine=='ppc64le' or (platform_machine=='x86_64' or (platform_machine=='amd64' or (platform_machine=='AMD64' or (platform_machine=='win32' or platform_machine=='WIN32'))))))
    greenlet != 0.4.17;python_version<"3.11" and (platform_machine=='aarch64' or (platform_machine=='ppc64le' or (platform_machine=='x86_64' or (platform_machine=='amd64' or (platform_machine=='AMD64' or (platform_machine=='win32' or platform_machine=='WIN32'))))))

which is not great
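One way to sanity-check such environment markers before shipping them is to evaluate them with the packaging library; a minimal sketch, with packaging as an assumed extra dependency and a shortened marker for readability:

from packaging.markers import Marker

m = Marker(
    'python_version >= "3.11" and '
    '(platform_machine == "x86_64" or platform_machine == "aarch64")'
)

# Evaluates against the current interpreter by default; pass an
# explicit environment dict to test other platform combinations.
print(m.evaluate())
print(m.evaluate({"python_version": "3.10", "platform_machine": "x86_64"}))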
