Apache OpenOffice (AOO) Bugzilla – Issue 120124
optimize graphic cache to improve odt saving performance
Last modified: 2012-08-01 11:33:57 UTC
From profiling data, a document(with many graphics) is to be found that graphics will be swapped out and swapped in so frequently when saving, GraphicCache can be used to optimized this problem, graphics can be cached a while when haven't exceed the cache size.
Created attachment 78607 [details] patch for optimizing graphic cache
I have several questions and remarks: - How much faster does saving get with this cache? - There is not much documentation of the new code. Only unnecessary comments that state the issue id, information that is available via svn annotate. Can we get rid of these? - The mnMaxCacheSize is set to an arbitrary magic number of 2800000. Why this number? - The mnMaxCacheSize value is not coordinated with other caches or configurable via Tools->Options - There are two new methods GetUsedCacheSize2() and GetMaxCacheSize2() that just forward the results from their counterparts without the ...2 suffix. Why? - The actual caching logic seems to be the lines in the last hunk of the diff (in sw/source/filter/xml/xmltexte.cxx, lines 229 to 233). They seem to prevent caching of the first created objects that fit into the 2.8MB cache. They will never get swapped in or out. Every object created once the cache is full will always be swapped in or out. That does not look like a good caching strategy and should be improved.
1.following is test result for sample file sw_complex_100p.odt(manual test), about 18% improvement. old: 3.88 3.71 3.79 3.85 3.87 3.78 3.85 3.81 3.91 3.9 avg:3.84 new: 3.15 3.06 3.06 3.35 3.03 3.34 3.12 3.03 3.1 3.03 avg:3.13 2.We can get rid of these comments if unnecessary. 3.We set graphic cache to 0x2800000, that is 40M. Choose this value because we found quanties of .odt sample files don't have graphics exceed 40M. 4.The mnMaxCacheSize can't be configured via Tools->Options. 5.We need to get used cache size and max cache size in( sw/source/filter/xml/xmltexte.cxx, lines 229 to 233) and the two values can accessed by GraphicManager(pGrfNd->GetGrfObj().GetGraphicManager()). The two values are stored in GraphicCache, but its instantiation is a private member in GraphicManager, so add two member functions to access them. 6.The cache is 40M, but not 2.8M. When one graphic object is swapped in, its size is added to mnUsedCacheSize. When .odt document is saved, we need not to swap not every graphic, we only swap out graphic when mnUsedCacheSize exceed mnMaxCacheSize. for more information, you can refer to http://wiki.services.openoffice.org/wiki/ODT_saving_performance_improvement. (In reply to comment #2) > I have several questions and remarks: - How much faster does saving get > with this cache? - There is not much documentation of the new code. Only > unnecessary comments that state the issue id, information that is available > via svn annotate. Can we get rid of these? - The mnMaxCacheSize is set to > an arbitrary magic number of 2800000. Why this number? - The > mnMaxCacheSize value is not coordinated with other caches or configurable > via Tools->Options - There are two new methods GetUsedCacheSize2() and > GetMaxCacheSize2() that just forward the results from their counterparts > without the ...2 suffix. Why? - The actual caching logic seems to be the > lines in the last hunk of the diff (in sw/source/filter/xml/xmltexte.cxx, > lines 229 to 233). They seem to prevent caching of the first created > objects that fit into the 2.8MB cache. They will never get swapped in or > out. Every object created once the cache is full will always be swapped in > or out. That does not look like a good caching strategy and should be > improved.
ALG: There is already a graphic cache used in the GraphicManager/Graphic combination. For that cache, the user can change the to be used cache size in tools/optioions/memory, default is 20MB. I see no reason to use a hard-coded additional cache size of 40MB which is uncontrollable and hidden from the user. It may be discussed to change that default to 40MB, though. The current mechanism for swapping out graphics controlled can be foud by grepping for 'SwapOut' in the code. This is mainly used for cases where it is known that the graphic was swapped in for a single reason, .e.g. to print or export it. There are several places which use SwapOut in that situations to controlled free this again. This of course may have already influenced the cache content, so it does seen from the cache content not necessarily lead back to the situation before swapping in the graphic. For speeding up saving it should be enough to just remove that hard SwapOut again and just let the existing cache do it's work. This can from my POV also be done for printing or other usages. Places for this are: sw\source\core\doc\notxtfrm.cxx(1096) sw\source\core\graphic\ndgrf.cxx(689) sw\source\filter\ww8\writerhelper.cxx(742) sw\source\filter\ww8\wrtww8gr.cxx(744) sw\source\filter\xml\xmltexte.cxx(229) For this task it is enough from my POV to remove SwapOut() in xmltexte.cxx, maybe increase default graphic cache size from 20 to 40MB (or even more) with the argument that memory (and graphic sizes, think about modern user cameras and jpegs), and maybe also handle other places when at it.
Comment on attachment 78607 [details] patch for optimizing graphic cache No answer to questions after more than a week. Rejecting patch until the questions are answered.
ALG: Testwise commented out sw\source\filter\xml\xmltexte.cxx(229), savetime of sw_complex_100p.odt went from 6s to 5s...