Closed
Bug 1342397
Opened 8 years ago
Closed 8 years ago
Crash in InvalidArrayIndex_CRASH | mozilla::safebrowsing::Classifier::CopyInUseLookupCacheForUpdate
Categories
(Toolkit :: Safe Browsing, defect)
Tracking
()
RESOLVED
FIXED
mozilla54
Tracking | Status | |
---|---|---|
firefox52 | --- | unaffected |
firefox53 | --- | unaffected |
firefox54 | --- | fixed |
People
(Reporter: calixte, Assigned: hchang)
References
(Blocks 1 open bug)
Details
(Keywords: crash, Whiteboard: [clouseau])
Crash Data
Attachments
(1 file)
This bug was filed from the Socorro interface and is
report bp-56ecd061-96d7-408d-80f8-b49642170223.
=============================================================
There are 4 crashes in nightly 54 with buildid 20170222030329 and 20170223030204.
In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1338970.
[1] https://hg.mozilla.org/mozilla-central/rev?node=2c1fc455e81e0c7744e38a36b6df8657f462a369
Flags: needinfo?(hchang)
Reporter | ||
Comment 1•8 years ago
|
||
There are 2 crashes with signature "nsTArray_Impl<T>::RemoveElementsAt | nsUrlClassifierPrefixSet::~nsUrlClassifierPrefixSet". The backtrace shows that the files modified to fix bug 1338970 are involved too, it's why I'm adding this signature.
Crash Signature: [@ InvalidArrayIndex_CRASH | mozilla::safebrowsing::Classifier::CopyInUseLookupCacheForUpdate] → [@ InvalidArrayIndex_CRASH | mozilla::safebrowsing::Classifier::CopyInUseLookupCacheForUpdate]
[@ nsTArray_Impl<T>::RemoveElementsAt | nsUrlClassifierPrefixSet::~nsUrlClassifierPrefixSet ]
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → hchang
Flags: needinfo?(hchang)
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 4•8 years ago
|
||
I can think of one case which may lead to the crash signature [@ InvalidArrayIndex_CRASH | mozilla::safebrowsing::Classifier::CopyInUseLookupCacheForUpdate]:
In [1] we iterate mLookupCache and create new LookupCache for update accordingly.
However, creating new LookupCache (i.e. GetLookupCacheForUpdate()) may cause
mLookupCache to be cleared [2].
My solution is to only remove update intermediary when GetLookupCacheForUpdate() is failed
(no matter it's due to file corruption or not. For example, if it's due to OOM, we still
have to remove every thing temporary for update.)
So, after Bug 1339760,
- GetLookupCacheForUpdate() will only on update thread and may only remove "mNewLookupCaches" and "safebrowsing-update";
- GetLookupCache() will only on worker thread and may only remove "mLookupCache" and "safebrowsing".
[1] http://searchfox.org/mozilla-central/rev/60ae6514e4c559c0c234f0e7aefccb101b8beb2e/toolkit/components/url-classifier/Classifier.cpp#1012
[2] http://searchfox.org/mozilla-central/rev/60ae6514e4c559c0c234f0e7aefccb101b8beb2e/toolkit/components/url-classifier/Classifier.cpp#1352
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Updated•8 years ago
|
Attachment #8840939 -
Flags: review?(gpascutto)
Comment 7•8 years ago
|
||
The line you point to in the above comment, and the added comments in this patch point to Reset() "Not including any intermediary for update.". Was that meant as a comment on its (existent) behavior, or is something missing from this patch?
Assignee | ||
Comment 8•8 years ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #7)
> The line you point to in the above comment, and the added comments in this
> patch point to Reset() "Not including any intermediary for update.". Was
> that meant as a comment on its (existent) behavior, or is something missing
> from this patch?
That's the existent behavior and I just wanted to make Reset() more clear.
I have ever considered to extend Reset() to cover update intermediaries
but then realized that would be a mess when we introduce more concurrency
between update and lookup.
Assignee | ||
Comment 9•8 years ago
|
||
Note that my patch can only explain the crash inside CopyInUseLookupCacheForUpdate().
What the patch cannot explain are:
1) Why the copied files will be corrupted (and only on window and android)?
If the source is corrupted, it should have been removed. Or does it imply
"copy then open too soon" is problematic on windows and android?
2) The double delete issue of the other crash signature (from CloseDb())
Comment 10•8 years ago
|
||
1) Can have many reasons. There are many differences in how Windows handles basic file access, varying from not updating access/modification times in the same manner (this can be observed in safebrowsing directory time), things like not being able to open files multiple times (I forgot exact conditions), etc. Also, there are much more users on Windows so they will trigger more problems. Android by design will kill Firefox randomly so it's much more likely to surface any unsoundness in the update procedure.
Comment 11•8 years ago
|
||
mozreview-review |
Comment on attachment 8840939 [details]
Bug 1342397 - Properly reset stuff when failing to create LookupCache for update.
https://reviewboard.mozilla.org/r/115326/#review116838
It's late and I'm tired, but given the
::: toolkit/components/url-classifier/Classifier.cpp:1327
(Diff revision 4)
> LookupCache *
> Classifier::GetLookupCache(const nsACString& aTable, bool aForUpdate)
> {
> - if (aForUpdate) {
> - return GetLookupCacheFrom(aTable, mNewLookupCaches, mUpdatingDirectory);
> - }
> + nsTArray<LookupCache*>& lookupCaches = aForUpdate ? mNewLookupCaches
> + : mLookupCaches;
> + nsIFile* rootStoreDirectory = aForUpdate ? mUpdatingDirectory
If it cannot be nullptr one can try to make it a reference.
::: toolkit/components/url-classifier/Classifier.cpp:1364
(Diff revision 4)
> + // GetLookupCache for update and for other usage will run on update thread
> + // and worker thread respectively (Bug 1339760). Removing stuff only in
> + // their own realms potentially increases the concurrency.
> +
> + if (aForUpdate) {
> + // Remove intermediaries no matter it's due to file corruption or not.
nit: no matter if it's due
Attachment #8840939 -
Flags: review?(gpascutto) → review+
Comment 12•8 years ago
|
||
mozreview-review-reply |
Comment on attachment 8840939 [details]
Bug 1342397 - Properly reset stuff when failing to create LookupCache for update.
https://reviewboard.mozilla.org/r/115326/#review116838
Oops, Mozreview saved this from last week :) I asked the question I was going to ask here in a separate comment.
Assignee | ||
Comment 13•8 years ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #10)
> 1) Can have many reasons. There are many differences in how Windows handles
> basic file access, varying from not updating access/modification times in
> the same manner (this can be observed in safebrowsing directory time),
> things like not being able to open files multiple times (I forgot exact
> conditions), etc. Also, there are much more users on Windows so they will
> trigger more problems. Android by design will kill Firefox randomly so it's
> much more likely to surface any unsoundness in the update procedure.
Looking again the crash reports, all |InvalidArrayIndex_CRASH| are on Windows.
(I slightly remember there were some on Android. Maybe it was also too late for me...)
From the current reports I cannot tell the corruption rate is high or low.
If it is high, even if my patch fixed the crash, the update will still fail:
If the corruption is due to the source file corruption, then
1) If the file is never, (usually in the very early update), we have nothing to do.
2) If the file is opened, (which means we already have a LookupCache for it), we can get around
the "copied file corruption" issue by copying LookupCache internal. (what we are doing right
now is to re-open the copied file.) See Bug 1341183.
As for the other crash signature, it seems to be a long-standing issue: We happened to
have two pointers for one LookupCache instance, which then leads to double delete issue.
Bug 1338970 changes the call sequence in regard to "removing LookupCaches".
Comment hidden (mozreview-request) |
Comment 15•8 years ago
|
||
Pushed by hchang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/313bfccbbd9f
Properly reset stuff when failing to create LookupCache for update. r=gcp
Comment 16•8 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Updated•8 years ago
|
Assignee | ||
Comment 17•8 years ago
|
||
The patch has landed for almost one week and the crash signature
"InvalidArrayIndex_CRASH | mozilla::safebrowsing::Classifier::CopyInUseLookupCacheForUpdate"
never occurs on nightly which has this patch. Good news :)
Updated•8 years ago
|
Whiteboard: [clouseau]
Comment 18•8 years ago
|
||
(In reply to Henry Chang [:henry][:hchang] from comment #17)
> The patch has landed for almost one week and the crash signature
> "InvalidArrayIndex_CRASH |
> mozilla::safebrowsing::Classifier::CopyInUseLookupCacheForUpdate"
> never occurs on nightly which has this patch. Good news :)
Good job! Thanks, Henry!
You need to log in
before you can comment on or make changes to this bug.
Description
•