Closed Bug 453199 Opened 16 years ago Closed 16 years ago

_cairo_xlib_remove_close_display_hook is O(n) slower than it should be

Categories

(Core :: Graphics, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: karlt, Assigned: karlt)

References

Details

(Keywords: memory-footprint, perf)

Attachments

(1 file, 1 obsolete file)

cairo_xlib_display_t uses a linked list for close_display_hooks, and so _cairo_xlib_remove_close_display_hook searches through each link in the list for the hook to remove. This was apparently designed for a small number of hooks, because, when the number of links in the list becomes large, this becomes expensive. However the number of hooks does become large because there is a hook for each xlib surface (and sometimes a second when there is a scaled font on the surface). imgContainers and nsWindows hold references to many surfaces. During talos Tp tests, the number of xlib surfaces can exceed 2000. The proportion of CPU_CLK_UNHALTED events for the Firefox process occurring in _cairo_xlib_remove_close_display_hooks is 7.5% during talos's repeat iterations. The number of xlib surfaces may also be large when a session contains a large number of tabs. Further, removed hooks are kept in a freelist and the memory is not freed until the cairo_xlib_display_t is destroyed.
Depends on: 453200
Keywords: footprint, perf
Attached patch use a hash table instead of a linked list (obsolete) (deleted) — Splinter Review
Callers provide the memory for the hook, thus avoiding additional memory allocations. For this change to be effective, lookup of entries in cairo_hash_tables should be O(1) (bug 453200). Tp savings are about 5% with an O(1) hash table lookup (if we can believe the tryserver).
Attachment #336396 - Flags: review?(vladimir)
That is horrible, nice find :( Let me get behdad on ickle to review that.
Karl, if you embed the hook struct in the object itself, with prev and next pointers (that is, kernel-style linked lists), you don't need any additional structure and free is O(1).
Comment on attachment 336396 [details] [diff] [review] use a hash table instead of a linked list Behdad's suggestion is better.
Attachment #336396 - Flags: review?(vladimir)
The changes to _cairo_xlib_call_close_display_hooks are to ensure that close_display_hooks remains a valid list even after the memory for a hook is released in hook->func.
Attachment #336396 - Attachment is obsolete: true
Attachment #336434 - Flags: review?(mozilla)
Comment on attachment 336434 [details] [diff] [review] use a doubly-linked list of hooks provided by the caller I like this! Do you think we should do the same thing about the workqueue?
Attachment #336434 - Flags: review?(mozilla) → review+
(In reply to comment #6) > Do you think we should do the same thing about the workqueue? It would probably be nice from a memory allocation POV, but it looks like there may not always be an existing object in which to embed the job struct. It looks like random access (removal or otherwise) is not required with the workqueue. I haven't studied workqueue usage patterns to know whether there is ever a large number of jobs in the queue.
No longer depends on: 453200
Thankyou, that's a good improvement. The only alteration I made to the patch as I rebased it against master (939b836bfa95df759aca96936bb9a6d89d3130b8) was to drop the data member from the hook, and use the container_of macro to recover the original pointer to the structure. (And be explicit about the associated data for cairo_xlib_surface_font_private_t!) (In reply to comment #7) > (In reply to comment #6) > > Do you think we should do the same thing about the workqueue? > > It would probably be nice from a memory allocation POV, but it looks like there > may not always be an existing object in which to embed the job struct. Aye, this is the case as the workqueue is, typically, associated with an XID that needs to be freed by some random function. Having said that, this is now only true for the XRenderPictures freed during surface finalization, so we could keep the cairo_xlib_surface_t alive until we run the workqueue... Except at the moment it would take some reorganisation of the higher levels to stop cairo_surface_destroy() from freeing the surface. > It looks like random access (removal or otherwise) is not required with the > workqueue. Correct, once put onto the workqueue the resource is considered finalized by the xlib backend, and so the list is only consumed upon notification. > I haven't studied workqueue usage patterns to know whether there is ever a > large number of jobs in the queue. From my tests (which admittedly have completely different usage patterns from yours! - it would be handy if you can capture profiling data using cairo-trace so that we have some realistic benchmarks upon which to work...) frequently there are no items in the list, and we can save a measurable amount of time by checking whether the list if empty before taking the lock - I think I can justify why that is safe! (However since it does require justification I've considered it too dirty to include without someone pointing to a profile and complaining.)
(In reply to comment #8) > Thankyou, that's a good improvement. The only alteration I made to the patch as > I rebased it against master (939b836bfa95df759aca96936bb9a6d89d3130b8) Just in case anyone wonders, that's cairo master!
(In reply to comment #8) Thanks, Chris. > drop the data member from the hook, and use the container_of macro to recover > the original pointer to the structure. Nice. > From my tests (which admittedly have completely different usage patterns from > yours! - it would be handy if you can capture profiling data using cairo-trace > so that we have some realistic benchmarks upon which to work...) frequently > there are no items in the list, and we can save a measurable amount of time by > checking whether the list if empty before taking the lock. cairo-trace looks fancy. FWIW _cairo_xlib_display_notify consumes < 0.01% of CPU resources in profiles that I've seen, though I think we might make the lock a noop in Mozilla.
(In reply to comment #8) > there are no items in the list, and we can save a measurable amount of time by > checking whether the list if empty before taking the lock - I think I can > justify why that is safe! (However since it does require justification I've > considered it too dirty to include without someone pointing to a profile and > complaining.) It'll be safe if you do an atomic read of the pointer. But lets not fix something that is not currently a problem.
(In reply to comment #10) > FWIW _cairo_xlib_display_notify consumes < 0.01% of CPU resources in profiles > that I've seen, though I think we might make the lock a noop in Mozilla. Note that Firefox compiles cairo with no mutex support. Would be interesting to compile firefox with system cairo and see how much time is spent in various mutexes.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Depends on: 453765
Resolution: --- → FIXED
Chris landed the modified fix upstream (comment 8) and bug 453765 landed the fix in Mozilla.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: