Ticket #214 (closed: fixed)
too many inline images can't be handled
| Reported by: | volker | Owned by: | volker |
|---|---|---|---|
| Priority: | immediate | Component: | mwlib.rl |
| Severity: | | Keywords: | |
| Cc: | | | |
Description
reportlab does not seem to close the file handles of inline images while constructing the document. Rendering can therefore fail if the per-user limit on open files is exceeded (check the limit with ulimit -n).
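The same limit can be read from inside Python. This is a minimal sketch, assuming a Unix system where the standard `resource` module is available; the helper name `open_file_limit` is made up for illustration:

```python
import resource


def open_file_limit():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    # RLIMIT_NOFILE is the limit that `ulimit -n` reports; the soft value is
    # the one a process actually runs into.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


if __name__ == "__main__":
    soft, hard = open_file_limit()
    print("soft limit: %s, hard limit: %s" % (soft, hard))
```

A document with more inline images than the soft limit would be expected to hit the failure described above, since each image keeps one descriptor open.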
Attachments
Change History
comment:3 Changed 2 years ago by heiko
- Status changed from new to assigned
please check with the reportlab folks if there is any way to solve this issue.
comment:4 Changed 2 years ago by ralf
since we distribute reportlab in mwlib.ext we could also fix this ourselves.
comment:5 Changed 2 years ago by heiko
example:
mw-render -w rl -o test.pdf --keep-zip test.zip -c http://en.labs.wikimedia.org/w/ --collectionpage 'User:He!ko/Collections/This quantum world'
comment:6 Changed 2 years ago by ralf
If I add a return None in mathutils._renderMathTexvc:
{{
def _renderMathTexvc(latex, output_path, output_mode='png'):
    """only render mode is png"""
    return None
}}
I can render with ulimit -n 32 (i.e. 32 open files). Note that I don't have texvc installed, so this function returns None anyway on my system.
Why did you think this has something to do with reportlab? (It could still be a second bug..)
comment:7 Changed 2 years ago by ralf
I've added the zip file from heiko's collection and double checked with python 2.5 (i'm using 2.6).
returning None makes it work with ulimit -n 32. Without, it fails with my standard limit of 1024 file descriptors.
comment:8 Changed 2 years ago by ralf
"Popen pipe file descriptor leak on OSError in init": http://bugs.python.org/issue1751245
comment:9 Changed 2 years ago by ralf
here's another bug affecting our use of subprocess.Popen: http://bugs.python.org/issue2791
comment:10 Changed 2 years ago by ralf
I'm back to other work, you should have enough info to fix it.
comment:11 follow-up: ↓ 12 Changed 2 years ago by volker
I am pretty sure this is a reportlab issue. The number of open file handles before and after rendering the formula with renderMath does not change - there does not seem to be an fd leak here.
With the following sample code the problem can be reduced to reportlab inline images:
from reportlab.platypus.paragraph import Paragraph
from reportlab.platypus.doctemplate import SimpleDocTemplate
from reportlab.lib.styles import ParagraphStyle
import os

doc = SimpleDocTemplate('test.pdf')
elements = []
for count in range(1024):
    print len(os.listdir("/proc/%d/fd" % os.getpid()))
    p = Paragraph('text <img src="math.png" width="2cm" height="0.5cm" />', ParagraphStyle('Normal'))
    elements.append(p)
doc.build(elements)
output:
...
1021
1022
1023
1024
Traceback (most recent call last):
  File "test_fd.py", line 13, in <module>
OSError: [Errno 24] Too many open files: '/proc/32376/fd'
I submitted this issue to the reportlab mailing list.
comment:12 in reply to: ↑ 11 Changed 2 years ago by ralf
Replying to volker:
I am pretty sure this is a reportlab issue. The number of open file handles before and after rendering the formula with renderMath does not change - there does not seem to be an fd leak here.
then we have two issues here. it leaks file descriptors when texvc is not installed.
comment:13 Changed 2 years ago by ralf
btw. os.listdir("/proc/self/fd")
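The `/proc/self/fd` shortcut can be wrapped in a small helper to compare the descriptor count before and after a suspect call. A sketch, assuming Linux with `/proc` mounted; `count_open_fds` is a hypothetical name, and `os.devnull` only stands in for a real image file:

```python
import os


def count_open_fds():
    """Count this process's open file descriptors via /proc (Linux only)."""
    # os.listdir opens the directory while reading it, so the number can
    # include that one temporary descriptor. The difference between two
    # calls is what matters, and it is unaffected by this.
    return len(os.listdir("/proc/self/fd"))


before = count_open_fds()
fh = open(os.devnull)      # deliberately hold one extra descriptor
during = count_open_fds()
fh.close()
after = count_open_fds()
print(before, during, after)
```

If a call leaks, the count taken after it stays higher than the count taken before it.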
comment:14 Changed 2 years ago by ralf
please try and test the above patch. I can render fd-leak.zip with ulimit -n 64 with it.
comment:15 Changed 2 years ago by volker
- Status changed from assigned to closed
- Resolution set to fixed
works now


Looks like reportlab.lib.utils.ImageReader never closes the file handle that it stores as self.fp in __init__().
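One general way to avoid holding a descriptor per image is to read the file into memory and close the handle right away. The following only sketches that pattern; it is not reportlab's code and not the patch applied for this ticket, and `load_image_bytes` is a hypothetical helper:

```python
import io


def load_image_bytes(path):
    """Read an image file fully into memory and release its descriptor."""
    # The with-block closes the file as soon as the data has been read, so
    # no descriptor stays open however many images are loaded.
    with open(path, "rb") as fh:
        data = fh.read()
    # Code that expects a file-like object (for example an attribute such
    # as self.fp) can be given an in-memory buffer instead of the real file.
    return io.BytesIO(data)
```

The trade-off is memory: every image is held in RAM for as long as its buffer is referenced, which is usually acceptable for small formula images such as math.png.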