Closed Bug 113354 Opened 23 years ago Closed 23 years ago

mork memory growth when accessing global history

Categories

(Core Graveyard :: History: Global, defect)

Platform: x86 Linux
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

Status: VERIFIED FIXED
Target Milestone: mozilla0.9.8

People

(Reporter: dbaron, Assigned: Bienvenu)

References

Details

(Keywords: memory-footprint, memory-leak)

Attachments

(2 files)

I did a comparison of our memory usage before and after a run of jrgm's tests to find the sources of memory growth. The second biggest source of increase, at 830K out of the 7MB total increase (behind memory cache growth but ahead of font cache growth and wallet loading its tables), was structures allocated through orkinHeap::Alloc while looking up links in global history to determine whether they were visited. I'll attach a perl-script-generated tree showing the differences in allocations by mork. This tree is the nonzero branches of a tree showing allocation size by stack, with the "after" log counted as positive numbers and the "before" log counted as negative numbers. The before and after logs contain the stack traces for the calls to malloc for all the live allocations. Note that any static functions won't have their symbols looked up correctly. I can also generate line numbers from the dll/offset data, although I'd have to regenerate the log since I did it using a build from a few days ago.
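As an aside, here's a rough C++ sketch of the bookkeeping the perl script does when diffing the two logs; the Allocation struct and the miniature logs in main() are invented for illustration and are not the real log format:

    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    // One live allocation from a malloc log: the call stack it was made
    // under (flattened into a single string key) and its size in bytes.
    // (Hypothetical structure; the real logs are stack traces per malloc.)
    struct Allocation {
      std::string stack;
      long bytes;
    };

    // Net growth per stack: "after" entries count positive and "before"
    // entries count negative, so stacks whose allocations balanced out
    // fall to zero and only the nonzero branches remain.
    std::map<std::string, long> DiffLogs(const std::vector<Allocation>& before,
                                         const std::vector<Allocation>& after) {
      std::map<std::string, long> net;
      for (const auto& a : after)  net[a.stack] += a.bytes;
      for (const auto& b : before) net[b.stack] -= b.bytes;
      return net;
    }

    int main() {
      // Miniature stand-in logs, invented for the example.
      std::vector<Allocation> before = {{"main>orkinHeap::Alloc", 100000}};
      std::vector<Allocation> after  = {{"main>orkinHeap::Alloc", 930000},
                                        {"main>FontCache::Load",  50000}};
      for (const auto& [stack, bytes] : DiffLogs(before, after))
        if (bytes != 0)
          std::cout << stack << ": " << bytes << " bytes\n";
    }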
Keywords: footprint
The calls to open the database and to add rows are expected to add memory. The calls to FindRow will also allocate memory, though that memory should get freed if the history code releases the row.
When does the history code release the row? (And who owns the backend part of global history, anyway?)
Well, it looks like nsGlobalHistory::IsVisited releases the row immediately.
Never mind that -- I misunderstood your comment. It looks like most of the increase is due to looking up URLs to see whether they are visited. Why does that allocate memory?
Not sure who owns the backend of the history code. I would think the history code would release the row right away, but I'd have to look at it. Does this 830K include all the memory it takes to load the history db, or have you factored that out? I'm not sure why FindRow would allocate more memory than just the memory to load the db - I'd have to look at the code.
I have to confess I can't read that tree very well, at least the numbers associated with the function calls. I understand the stack traces and there very well could be some sort of bloat going on...
We should fix this within the next 2 milestones. :-) Setting to 0.9.8.
Target Milestone: --- → mozilla0.9.8
Blocks: 92580
Aside from the allocations for loading the history db and adding pages (which I assume are the vast majority), the only other allocations are for tokenizing the strings the history code passes into FindRow. I doubt that those are a significant percentage of the memory allocated, but I'd need to know how big the history.dat file is, how many times we call FindRow, etc. Also, the first time you call FindRow, Mork creates an index on that column so that subsequent lookups will be fast, which I would think we'd want to keep.
We don't even do any work - i.e. tokenizing any strings - to add a page to the database, or call FindRow. We do stuff the current date/time into mork, however. About the only real work we do is notifying RDF in AddPage(). My one thought is that NotifyFindAssertion should really be bailing early if there are no observers - it does some excess work that really isn't necessary. However, all of NotifyFindAssertion's work is done on the stack; the only possible allocation is from nsAutoString overflowing its buffer, and it probably only does that once or twice per call to AddPage().
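A minimal sketch of that early-bail idea, with invented Observer/History names standing in for the real RDF observer plumbing:

    #include <vector>

    // Hypothetical stand-ins for the real RDF observer interfaces.
    class Observer {
     public:
      virtual ~Observer() = default;
      virtual void OnAssert(int rowId) = 0;
    };

    class History {
      std::vector<Observer*> mObservers;

     public:
      void NotifyFindAssertion(int rowId) {
        // Bail before doing any of the per-call setup work when nobody is
        // listening; the rest of the routine only matters to observers.
        if (mObservers.empty())
          return;
        // ... build the assertion (stack-allocated nsAutoStrings in the
        // real code) and notify each observer ...
        for (Observer* o : mObservers)
          o->OnAssert(rowId);
      }
    };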
If I'm reading the attachment correctly, most of the work is from calling FindRow - 567k of allocations. My theory is that this is the mork index, and it's bloaty.
We call FindRow for every link in the page to see if we've visited it or not - what's happening is that all of those links are getting tokenized and stuck in the token map by mork. If you visit a ton of unique pages with tons of unique links, this will bloat. I looked at changing mork so that FindRow didn't tokenize its input, but it's not easy - the tokens get put into a memory structure that things can't be freed from (basically, a pool), so that there's no per-object overhead. And the key to the map is the token id, so you need to tokenize to find atoms in the map. If I'm understanding what's going on inside mork correctly, I might be able to short-circuit this by checking whether the string exists in the atom table at all; if it doesn't, we can assume there's no row that could possibly have that string. The other possibility is for history to occasionally close and reopen the history db - this would get rid of all these temporary objects (I'm assuming that these temporary tokens aren't getting written out to the db, because no one refers to them, but that's something to check as well).
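To make the pool point concrete, here's a sketch of a bump-allocated string pool of the sort described above; the names and block size are invented, not mork's actual allocator:

    #include <cassert>
    #include <cstddef>
    #include <cstring>
    #include <vector>

    // A bump-allocated string pool: interned strings carry no per-object
    // overhead, so no individual entry can ever be freed. The only way to
    // reclaim memory is to destroy the whole pool (i.e. close the db).
    class StringPool {
      static constexpr std::size_t kBlockSize = 4096;
      std::vector<char*> mBlocks;
      char* mCursor = nullptr;
      std::size_t mLeft = 0;

     public:
      const char* Intern(const char* s) {
        std::size_t len = std::strlen(s) + 1;
        assert(len <= kBlockSize);  // sketch assumes strings fit one block
        if (len > mLeft) {
          // Start a fresh block; earlier blocks stay live until the pool dies.
          mBlocks.push_back(new char[kBlockSize]);
          mCursor = mBlocks.back();
          mLeft = kBlockSize;
        }
        char* out = mCursor;
        std::memcpy(out, s, len);
        mCursor += len;
        mLeft -= len;
        return out;  // valid for the pool's lifetime; never freed individually
      }

      ~StringPool() {
        for (char* b : mBlocks) delete[] b;
      }
    };

Every unique link URL probed through FindRow ends up interned in a structure like this, which is why the growth doesn't come back until the db is closed.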
I really like that first option (short-circuiting) - and it seems like we can make an argument that it's the correct behavior: if mork MUST use a tokenized string in order to find a row in the db with a given cell value, then we can assume that each cell has to have been tokenized beforehand. If each cell has been tokenized, then its value is in the table. Therefore, if the value is NOT in the table, then the value has NOT been tokenized. Since the value has not been tokenized, it is not in a cell, which means there is no such row. Q.E.D. :) I'll bet this will significantly reduce bloat on sites that use unique, session-specific URLs like amazon.com and cnet.com.
Thanks for the proof, Alec :-) That's what I was trying to say. I've coded it up and I'm trying it now.
Attached patch proposed fix (deleted) — Splinter Review
I added a parameter to YarnToAtom which says whether to create the atom if it's missing. By default, it's true (the current behaviour), and I've changed FindRow to pass in false for the create parameter. Coloring of links seems to work fine, and I haven't seen any other problems. I can't swear that it'll fix the bloat, but it's worth trying. Requesting r/sr from dbaron and alecf.
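For readers following along, here's a simplified sketch of the pattern the patch describes; the AtomTable/Yarn types and signatures are illustrative stand-ins, not mork's real ones:

    #include <string>
    #include <unordered_map>

    using Yarn = std::string;  // stand-in for mork's mdbYarn string value
    using AtomId = int;        // stand-in for an interned token id; 0 = none

    class AtomTable {
      std::unordered_map<Yarn, AtomId> mAtoms;
      AtomId mNextId = 1;

     public:
      // With inCreate true (the old behavior and the default), a missing
      // yarn is interned, permanently growing the pool. With inCreate
      // false, a miss allocates nothing and returns 0.
      AtomId YarnToAtom(const Yarn& inYarn, bool inCreate = true) {
        auto it = mAtoms.find(inYarn);
        if (it != mAtoms.end())
          return it->second;
        if (!inCreate)
          return 0;
        AtomId id = mNextId++;
        mAtoms.emplace(inYarn, id);
        return id;
      }
    };

    // FindRow can now probe without interning: if the value was never
    // tokenized, no cell can hold it, so there is no row to find.
    bool FindRow(AtomTable& table, const Yarn& cellValue) {
      AtomId id = table.YarnToAtom(cellValue, /* inCreate = */ false);
      if (id == 0)
        return false;  // not interned => no matching row, and no bloat
      // ... otherwise consult the column index keyed by atom id ...
      return true;  // placeholder for the real index lookup
    }

The design point is that the defaulted parameter preserves existing callers' behaviour, while lookup-only callers like FindRow can opt out of interning.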
Comment on attachment 60333 (proposed fix): damn, that was easy. sr=alecf
Comment on attachment 60333 (proposed fix): r=sspitzer, sr=alecf
Attachment #60333 - Flags: superreview+
Attachment #60333 - Flags: review+
Calling FindRow for a URL that's not in history shouldn't bloat now.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Mass-verifying claudius' Fixed bugs which haven't changed since 2001.12.31. If you think this particular bug is not fixed, please make sure of the following before reopening:
a. Retest with a *recent* trunk build.
b. Query Bugzilla to see if there's an existing, open bug (new, reopened, assigned) that covers your issue.
c. If this does need to be reopened, make sure there are specific steps to reproduce (unless already provided and up-to-date).
Thanks! [Set your search string in mail to "AmbassadorKoshNaranek" to filter out these messages.]
Status: RESOLVED → VERIFIED
Keywords: mlk
Product: Core → Core Graveyard
