Closed Bug 883253 Opened 11 years ago Closed 3 years ago

track the merging of volatile ranges in the Linux Kernel

Categories

(Core :: General, defect)

Type: defect
Priority: Not set
Severity: normal

RESOLVED INCOMPLETE

People

(Reporter: dhaval.giani, Unassigned)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:22.0) Gecko/20100101 Firefox/22.0 (Beta/Release)
Build ID: 20130605070403
Steps to reproduce:
This is a bug to track the progress of the volatile ranges feature in the Linux kernel. John Stultz and Minchan Kim are leading the effort on the LKML, with patches currently floating about.
Latest update:
Patches posted on the LKML: https://lwn.net/Articles/554098/
git tree: https://git.linaro.org/gitweb?p=people/jstultz/android-dev.git;a=shortlog;h=refs/heads/dev/vrange-poc
https://wiki.linaro.org/WorkingGroups/Kernel/AndroidUpstreaming also provides some background on the work done on this feature.
It would be useful to collect the ways Firefox wants to use this feature (maybe in another bug that this one depends on?).
OS: Mac OS X → All
Hardware: x86 → All
Assignee: nobody → dhaval.giani
IRC discussion with jlebar and mwu.
Concerns about how volatile ranges would work with LMK (the Android low memory killer). Major concern: vranges should not help us only with the OOM killer; they should also work with LMK. This might mean there needs to be a way to purge memory on demand; it might also mean making watermarks available.
taras says: It might be a good idea to bring the OOM killer (dhaval: maybe LMK priorities instead) into the purging logic. (dhaval doesn't like this idea one bit)
Dhaval, can you post your test cases and other related work to GitHub?
taras, I am polishing the scripts around the test case to make it easier to run, and I will push them out immediately after that.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Use case: Firefox tab switching
Multiple tabs, image-heavy workload.
Today: A tab switch causes all the decoded images in the previous tab to be expired and all the images in the new tab to be decoded. This *always* happens.
With vranges: A tab switch will mark all decoded images in the previous tab as volatile and all the decoded images in the new tab as non-volatile. The worst case occurs when a purge has taken place, which forces all the images to be decoded again; otherwise we carry on with the data that is still there. This is a non-trivial benefit.
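To make the use case above concrete, here is a minimal sketch of the tab switch logic as a pure simulation. Nothing here is Firefox or kernel code: `DecodedImage`, `Tab`, `switch_tab` and `simulate_purge` are hypothetical names, and the "purge" is faked by setting a flag.

```python
from dataclasses import dataclass


@dataclass
class DecodedImage:
    """Hypothetical stand-in for a decoded image buffer."""
    name: str
    volatile: bool = False
    purged: bool = False
    decode_count: int = 1  # decoded once when first shown

    def mark_volatile(self) -> None:
        self.volatile = True

    def mark_non_volatile(self) -> bool:
        """Pin the buffer; return True if its contents were lost meanwhile."""
        self.volatile = False
        was_purged, self.purged = self.purged, False
        return was_purged


@dataclass
class Tab:
    images: list


def switch_tab(old: Tab, new: Tab) -> int:
    """Switch tabs; return how many images had to be decoded again."""
    for img in old.images:
        img.mark_volatile()          # cheap: no data is freed here
    redecoded = 0
    for img in new.images:
        if img.mark_non_volatile():  # only purged buffers cost a decode
            img.decode_count += 1
            redecoded += 1
    return redecoded


def simulate_purge(tab: Tab) -> None:
    """Pretend the kernel reclaimed every volatile buffer in this tab."""
    for img in tab.images:
        if img.volatile:
            img.purged = True
```

In this model, switching back and forth without memory pressure never triggers a decode, while a purge of the background tab costs one decode per image on the next switch, which is the worst case described above.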
Interface complaints:
As of the last posting, the interface looks like
sys_vrange(void *p, long size, int mode, int *purged)
where p is the starting address, size is the size of the range, and purged is valid only when we are marking a range as NONVOLATILE; it lets the user know whether the range was purged while it was marked volatile. mode takes only two values:
0 -> Mark as Volatile
1 -> Mark as Non Volatile
When a volatile range is purged, the entire range is purged. This behaviour is just fine for cases such as that of the decoded images in the switch tab use case. However, if you take a case like xul.so, you want to mark the whole library as volatile (mapped in), but only want to purge the cold pages. This means we probably want a way to tell the kernel to "purge a page at a time" as opposed to "purge a range at a time".
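The semantics described in this comment can be sketched as a toy model. This is not the kernel implementation and does not call any real syscall: `VRangeModel` and `memory_pressure` are hypothetical names, ranges are tracked by exact (start, size) pairs with no splitting or merging, and purging is modelled as range-at-a-time, as this comment assumes.

```python
VRANGE_VOLATILE = 0      # mode values as described in the comment above
VRANGE_NONVOLATILE = 1


class VRangeModel:
    """Toy model of the described semantics; not the kernel implementation."""

    def __init__(self) -> None:
        # (start, size) -> True if the range was purged while volatile
        self._volatile = {}

    def vrange(self, start: int, size: int, mode: int):
        """Return the purged flag for NONVOLATILE calls, else None."""
        key = (start, size)
        if mode == VRANGE_VOLATILE:
            self._volatile.setdefault(key, False)
            return None                      # purged is not meaningful here
        if mode == VRANGE_NONVOLATILE:
            return self._volatile.pop(key, False)
        raise ValueError(f"unknown mode {mode}")

    def memory_pressure(self) -> None:
        """Hypothetical hook: purge every volatile range, whole range at a time."""
        for key in self._volatile:
            self._volatile[key] = True
```

A caller would mark a buffer volatile when it stops using it, then check the flag returned by the NONVOLATILE call and regenerate the contents only if it is set.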
(In reply to Dhaval Giani from comment #6)
> When a volatile range is purged, the entire range is purged. This behaviour
I don't think this is the case with the current code. Since we use the LRU page eviction, we purge page by page. With the swapless approach, there may be cases where we have to use the shrinker interfaces to trigger purging, and there I don't think we'll purge entire ranges at a time (although that code is being reworked). In discussions with Minchan I think we both agreed that the entire-range-at-a-time behavior doesn't benefit the SIGBUS usage much, so I think we'll try to keep it a page-by-page thing.
That said (and maybe I'm misunderstanding your point here), if any page is purged in a range, the entire range will be considered purged when it is marked non-volatile (since some data has been lost and we don't have a way to say exactly which page).
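The distinction drawn here, page-by-page purging with range-level reporting, can be illustrated with a small simulation. This is only a sketch under stated assumptions: `VolatileRange`, `touch` and `purge_coldest` are hypothetical names, and the logical clock stands in for whatever the kernel's LRU actually tracks.

```python
PAGE_SIZE = 4096  # assumed page size for the sketch


class VolatileRange:
    """Toy range: pages are purged one by one, but loss is reported per range."""

    def __init__(self, num_pages: int) -> None:
        # Lower last_access means colder; all pages start resident.
        self.last_access = {page: 0 for page in range(num_pages)}
        self.resident = set(range(num_pages))
        self._clock = 0

    def touch(self, page: int) -> None:
        """Record an access so the page counts as hot."""
        self._clock += 1
        self.last_access[page] = self._clock

    def purge_coldest(self, count: int) -> list:
        """Reclaim up to `count` resident pages, coldest first (LRU-like)."""
        victims = sorted(self.resident, key=lambda p: (self.last_access[p], p))[:count]
        self.resident.difference_update(victims)
        return victims

    def mark_non_volatile(self) -> bool:
        """Range-level answer: True if any page at all was purged."""
        return len(self.resident) < len(self.last_access)
```

In this model a single purged cold page is enough for `mark_non_volatile` to report the whole range as purged, even though the hot pages are still resident.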
(In reply to john.stultz from comment #7)
> (In reply to Dhaval Giani from comment #6)
> > When a volatile range is purged, the entire range is purged. This behaviour
>
> That said (and maybe I'm misunderstanding your point here), if any page is
> purged in a range, the entire range will be considered purged when it is
> marked non-volatile (since some data has been lost and we don't have a way
> to say exactly which page).
It's inefficient (e.g. we are not maximizing memory savings) to only purge some pages if the whole range will be flagged as purged.
(In reply to Taras Glek (:taras) from comment #8)
> It's inefficient (e.g. we are not maximizing memory savings) to only purge some
> pages if the whole range will be flagged as purged.
I don't think I agree. The kernel reclaims via purging only what is needed. If more memory is needed, more can be purged.
Consider the case where we mark a lot of memory as volatile and then rely on the SIGBUS notification to inform us of purged pages, instead of marking it non-volatile before access. This allows us to continue to traverse hot volatile pages without a SIGBUS, while allowing the kernel to reclaim cold ones in the same range.
Even so, the purging behavior is a kernel-internal mechanism (much like paging), which may be tweaked and tuned in the future.
> That said (and maybe I'm misunderstanding your point here), if any page is
> purged in a range, the entire range will be considered purged when it is
> marked non-volatile (since some data has been lost and we don't have a way
> to say exactly which page).
Right, so essentially what happens is that the application has to regenerate "everything" in that range as opposed to only the bits that were lost. I am just thinking about a worst case where you keep losing just one page in the range and keep regenerating everything. Is that a good thing? This might not fit into the interface discussion, though.
(In reply to john.stultz from comment #9)
> (In reply to Taras Glek (:taras) from comment #8)
> > It's inefficient (e.g. we are not maximizing memory savings) to only purge some
> > pages if the whole range will be flagged as purged.
>
> I don't think I agree. The kernel reclaims via purging only what is needed.
> If more memory is needed, more can be purged.
>
> In the case where we mark a lot of memory as volatile, and then use the
> SIGBUS notification to inform us of purged pages, instead of marking it
> non-volatile before access. This allows for us to be able to continue to
> traverse over hot volatile pages without a SIGBUS, while allowing the kernel
> to reclaim cold ones in the same range.
>
We can't use SIGBUS everywhere. There are a few cases where the application cannot fix up its behaviour from the signal handler. (The image cache is an example: once it starts drawing it can't back out, and having the page lost doesn't help. It can fix everything up before it starts drawing, but once it starts, the memory has to be non-volatile.)
> Even so, the purging behavior is a kernel internal mechanism (much like
> paging), which may be tweaked and tuned in the future.
(In reply to Dhaval Giani from comment #6)
> When a volatile range is purged, the entire range is purged. This behaviour
> is just fine for cases such as that of the decoded images in the switch tab
> use case. However if you take a case like xul.so, you want to mark the whole
> library as volatile (mapped in), but only want to purge the cold pages.
> Which then means, we probably want to let the kernel know somehow about
> "Purge page at a time" as opposed to "Purge range at a time"
In the xul.so case, you also don't want to fill the entire volatile range until you're compelled to. In fact, the way it works currently, it starts with nothing in there and starts filling on the first SIGSEGV (since we're using SIGSEGV until we can actually use volatile ranges).
With the help of Joe, I now have a build of Firefox that marks decoded images as volatile instead of freeing them on expiry. It does not crash. Tested by confining Firefox to a 256MB cgroup; Firefox is seen to consume around 290-300MB RSS without OOMing. I am adding some telemetry probes to measure the cost of marking pages as (non-)volatile.
Latest version of volatile ranges, which includes swapless behaviour:
1. Doesn't work very well with memory cgroups
2. On a low-memory system, Firefox, which was OOMing previously, is able to run (tested with up to 4 tabs)
2.1 However, lost image data isn't being redecoded. Still to be debugged.
Minchan's git tree is available at
git URL: git://git.kernel.org/pub/scm/linux/kernel/git/minchan/linux.git
branch: vrange-working

The bug assignee didn't login in Bugzilla in the last 7 months, so the assignee is being reset.

Assignee: dhaval.giani → nobody
Status: ASSIGNED → NEW

This looks like a proposal for some kind of experiment. I think we can close this bug now, and somebody can refile if they have a more concrete plan.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INCOMPLETE