
I'm currently writing a simple board game in Python, and I just realized that garbage collection doesn't purge the discarded bitmap data from memory when images are reloaded. Reloading happens only when the game is started or loaded or when the resolution changes, but it multiplies the memory consumed, so I can't leave this problem unsolved.

When images are reloaded, all references are transferred to the new image data, since it is bound to the same variable the original image data was bound to. I tried to force garbage collection by calling collect(), but it didn't help.

I wrote a small sample to demonstrate my problem.

from tkinter import Button, DISABLED, Frame, Label, NORMAL, Tk
from PIL.Image import open
from PIL.ImageTk import PhotoImage

class App(Tk):
    def __init__(self):
        Tk.__init__(self)
        self.text = Label(self, text = "Please check the memory usage. Then push button #1.")
        self.text.pack()
        self.btn = Button(text = "#1", command = lambda : self.buttonPushed(1))
        self.btn.pack()

    def buttonPushed(self, n):
        "Cycle to open the Tab module n times."
        self.btn.configure(state = DISABLED) # disable to prevent paralell cycles
        if n == 100:
            self.text.configure(text = "Overwriting the bitmap with itself 100 times...\n\nCheck the memory usage!\n\nUI may seem to hang but it will finish soon.")
            self.update_idletasks()
        for i in range(n):      # creates the Tab frame with the img, destroys it, then recreates them so the new Frame and img overwrite the previous ones
            b = Tab(self)
            b.destroy()
            if n == 100:
                print(i+1,"percent of processing finished.")
        if n == 1:
            self.text.configure(text = "Please check the memory usage now.\nMost of the difference is caused by the bitmap opened.\nNow push button #100.")
            self.btn.configure(text = "#100", command = lambda : self.buttonPushed(100))
        self.btn.configure(state = NORMAL)  # starting cycles is enabled again       

class Tab(Frame):
    """Creates a frame with a picture in it."""
    def __init__(self, master):
        Frame.__init__(self, master = master)
        self.a = PhotoImage(open("map.png"))    # img opened, change this to a valid one to test it
        self.b = Label(self, image = self.a)
        self.b.pack()                           # Label with img appears in Frame
        self.pack()                             # Frame appears

if __name__ == '__main__':
    a = App()
    a.mainloop()    # start the Tk event loop so the window stays open when run as a script

To run the code above you will need a PNG image file. My map.png's dimensions are 1062×1062; as a PNG it is 1.51 MB, and decoded to bitmap data it is about 3-3.5 MB. Use a large image so the memory growth is easy to see.

Expected result when you run my code: Python's process eats up memory cycle by cycle. When it consumes approximately 500 MB, the usage collapses, but then it starts to eat up memory again.

Please give me some advice on how to solve this issue. I'm grateful for any help. Thanks in advance.

  • First, is it really a problem to consume 500MB? For that matter, is that 500MB just virtual memory, or physical/resident memory? Python generally doesn't return memory to the OS; it keeps it around to reuse when you need it later. And that usually makes things faster; allocating, freeing, and reallocating dozens of MB over and over again takes a lot of time. Also, what platform are you on? For example, on 64-bit OS X, most processes end up with hundreds of MB of VM, while on 32-bit Linux, that's much less common.
    – abarnert
    Commented Jun 27, 2013 at 8:21
  • I do not know whether it was physical or virtual memory. I am quite new to programming and I don't know any tool to check that. Could you recommend one? I used tasklist from the command line and Task Manager to track the memory consumption. My OS is Win7 x64. So you're saying it's not a problem as long as it collapses sometimes? That would be quite a relief.
    – bardosd
    Commented Jun 27, 2013 at 8:41
  • Task Manager shows separate numbers for physical and virtual memory, but I can't remember exactly what they're called. Anyway, if you think there might be an actual problem, you have to learn how memory management works in Windows before you'll be able to even investigate. If you don't have any actual problem and you're just worrying and not sure why, just stop worrying.
    – abarnert
    Commented Jun 27, 2013 at 8:48

2 Answers


First, you definitely do not have a memory leak. If it "collapses" whenever it gets near 500MB and never crosses it, it can't possibly be leaking.


And my guess is that you don't have any problem at all.

When Python's garbage collector cleans things up (which generally happens immediately when you're done with it in CPython), it generally doesn't actually release the memory to the OS. Instead, it keeps it around in case you need it later. This is intentional—unless you're thrashing swap, it's a whole lot faster to reuse memory than to keep freeing and reallocating it.

Also, if 500MB is virtual memory, that's nothing on a modern 64-bit platform. If it's not mapped to physical/resident memory (or is mapped if the computer is idle, but quickly tossed otherwise), it's not a problem; it's just the OS being nice with resources that are effectively free.
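If you want to check this from inside Python rather than from Task Manager, here's a minimal sketch; it assumes the third-party psutil package (pip install psutil), which is not part of the question's code:

import psutil

proc = psutil.Process()   # the current Python process

def report(label):
    info = proc.memory_info()
    # rss is physical/resident memory, vms is virtual memory, both in bytes
    print(label, "rss =", info.rss // 2**20, "MB, vms =", info.vms // 2**20, "MB")

report("before")
data = bytearray(100 * 2**20)   # allocate roughly 100 MB
report("allocated")
del data              # the bytearray is freed immediately (its refcount hits zero)...
report("after del")   # ...but rss may not drop back; the allocator keeps pages for reuse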

More importantly: what makes you think there's a problem? Is there any actual symptom, or just something in Program Manager/Activity Monitor/top/whatever that scares you? (If the latter, take a look at the memory use of the other programs. On my Mac, I've got 28 programs currently running using over 400MB of virtual memory, and I'm using 11 out of 16GB, even though less than 3GB is actually wired. If I, say, fire up Logic, that memory will be reclaimed faster than Logic can use it; until then, why should the OS waste effort unmapping memory, especially when it has no way to be sure some process won't go ask for that memory again later?)


But if there is a real problem, there are two ways to solve it.


The first trick is to do everything memory-intensive in a child process that you can kill and restart to recover the temporary memory (e.g., by using multiprocessing.Process or concurrent.futures.ProcessPoolExecutor).

This usually makes things slower rather than faster. And it's obviously not easy to do when the temporary memory is mostly things that go right into the GUI, and therefore have to live in the main process.
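For work that doesn't have to live in the main process, a minimal sketch of the idea might look like this (the thumbnail function and sizes are just illustrative; the point is that the heavy decode happens in a worker process whose memory goes back to the OS when the pool shuts down):

from concurrent.futures import ProcessPoolExecutor
from PIL import Image

def make_thumbnail(path, size=(128, 128)):
    # The full-size bitmap only ever exists in the worker process.
    img = Image.open(path)
    img.thumbnail(size)
    return img.tobytes(), img.size, img.mode   # only the small result crosses back

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=1) as pool:
        data, size, mode = pool.submit(make_thumbnail, "map.png").result()
    # The worker has exited by this point, and the OS has its memory back.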


The other option is to figure out where the memory's being used and not keep so many objects around at the same time. Basically, there are two parts to this:

First, release everything possible before the end of each event handler. This means calling close on files, either del-ing objects or setting all references to them to None, calling destroy on GUI objects that aren't visible, and, most of all, not storing references to things you don't need. (Do you actually need to keep the PhotoImage around after you use it? If you do, is there any way you can load the images on demand?)
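Applied to the question's Tab class, the on-demand idea might look something like this (show_map and hide_map are illustrative names, not an existing API):

from tkinter import Frame, Label
from PIL.Image import open as open_image   # aliased to avoid shadowing the builtin
from PIL.ImageTk import PhotoImage

class Tab(Frame):
    """Like the Tab in the question, but loads the bitmap on demand."""
    def __init__(self, master):
        Frame.__init__(self, master = master)
        self.photo = None
        self.label = Label(self)
        self.label.pack()
        self.pack()

    def show_map(self):
        # Load on demand, keeping exactly one reference while the Label shows it.
        self.photo = PhotoImage(open_image("map.png"))
        self.label.configure(image = self.photo)

    def hide_map(self):
        self.label.configure(image = "")
        self.photo = None   # drop the only reference; CPython frees the pixel data right away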

Next, make sure you have no reference cycles. In CPython, garbage is cleaned up immediately as long as there are no cycles—but if there are, they sit around until the cycle checker runs. You can use the gc module to investigate this. One really quick thing to do is try this every so often:

import gc

print(gc.get_count())
gc.collect()
print(gc.get_count())

If you see huge drops, you've got cycles. You'll have to look inside gc.get_objects() and gc.garbage, or attach callbacks, or just reason about your code to find exactly where the cycles are. For each one, if you don't really need references in both directions, get rid of one; if you do, change one of them into a weakref.
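A minimal sketch of that last point, using an illustrative parent/child pair rather than anything from the question:

import weakref

class Node:
    def __init__(self, parent=None):
        self.children = []
        # Store only a weak reference back to the parent, so there is no cycle.
        self._parent = weakref.ref(parent) if parent is not None else None

    @property
    def parent(self):
        # Calling the weakref returns the parent, or None if it has been collected.
        return self._parent() if self._parent is not None else None

root = Node()
child = Node(parent=root)
root.children.append(child)
print(child.parent is root)   # True, and no reference cycle was created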

  • Thank you for your answer. I worried because I try to use as few resources as I can, but the usage just grew and grew. Now I see it is OK. I am grateful for your problem-solving hints too. They will come in handy if I ever have real memory problems. :)
    – bardosd
    Commented Jun 27, 2013 at 9:38

Saving 500MB is worthwhile, saving 100MB is worthwhile, saving 10MB is worthwhile. Memory is worth its weight in gold, yet many suggest wasting it. It is your decision, of course; if you want to waste it on your Mac, do it. But it is very sad advice that teaches how to write very poor software.

Use https://pypi.org/project/memory-profiler/ to track your Python memory allocations, and release references explicitly:

import gc

x = someRamConsumingObject()   # placeholder for whatever is eating the RAM
# do the stuff here ...

# remove the reference (either del the name or rebind it to None)
del x
gc.collect()   # ask the garbage collector to collect now
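For reference, a minimal sketch of memory-profiler in use (the function body is just an illustrative allocation):

from memory_profiler import profile

@profile
def load_and_drop():
    data = bytearray(50 * 2**20)   # allocate roughly 50 MB
    del data                       # released back to the allocator here

if __name__ == '__main__':
    load_and_drop()   # prints a line-by-line memory report to stdout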

Philosophical discussions aside, real examples from industrial edge computing give us concrete reasons why this should be improved. If you run Python in containers, you will soon hit the wall, especially with multiple containers running on the edge under heavy production load.

And even if the edge device has 16GiB, you will hit the wall soon, especially using data analytics tools like Pandas.

Then, my friend, you will recognise the hell of garbage collectors and what "not having memory under control" means.

C++ rocks!!!
