Closed
Bug 1247982
Opened 9 years ago
Closed 9 years ago
crash in nsTArray_Impl<T>::DestructRange | nsTArray_Impl<T>::RemoveElementsAt | mozilla::net::nsHttpRequestHead::~nsHttpRequestHead regressing in Firefox 43
Categories
(Core :: Networking: HTTP, defect)
Tracking
()
People
(Reporter: philipp, Assigned: dragana)
References
Details
(Keywords: crash, regression, Whiteboard: [necko-active])
Crash Data
Attachments
(2 files, 3 obsolete files)
(deleted),
patch
|
dragana
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Sylvestre
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
[Tracking Requested - why for this release]:
this is a recent regression & a mid-level crasher
This bug was filed from the Socorro interface and is
report bp-ab1a8ec1-0a63-4e60-9868-513812160127.
=============================================================
Crashing Thread (0)
Frame Module Signature Source
0 xul.dll std::_Atomic_fetch_add_4(unsigned long volatile*, unsigned long, std::memory_order) c:/tools/vs2013/vc/include/xatomic.h:1674
1 xul.dll nsTArray_Impl<mozilla::net::nsHttpHeaderArray::nsEntry, nsTArrayInfallibleAllocator>::DestructRange(unsigned int, unsigned int) xpcom/glue/nsTArray.h
2 xul.dll nsTArray_Impl<mozilla::net::nsHttpHeaderArray::nsEntry, nsTArrayInfallibleAllocator>::RemoveElementsAt(unsigned int, unsigned int) xpcom/glue/nsTArray.h
3 xul.dll mozilla::net::nsHttpRequestHead::~nsHttpRequestHead() netwerk/protocol/http/nsHttpRequestHead.cpp
4 xul.dll mozilla::net::HttpBaseChannel::~HttpBaseChannel() netwerk/protocol/http/HttpBaseChannel.cpp
5 xul.dll mozilla::net::nsHttpChannel::`scalar deleting destructor'(unsigned int)
6 xul.dll nsInputStreamChannel::Release() netwerk/base/nsInputStreamChannel.cpp
7 xul.dll NS_ProxyRelease(nsIEventTarget*, nsISupports*, bool) xpcom/glue/nsProxyRelease.cpp
8 xul.dll nsTransportEventSinkProxy::~nsTransportEventSinkProxy() netwerk/base/nsTransportUtils.cpp
9 xul.dll nsTransportEventSinkProxy::`scalar deleting destructor'(unsigned int)
10 xul.dll nsTransportEventSinkProxy::Release() netwerk/base/nsTransportUtils.cpp
11 xul.dll nsCOMPtr_base::~nsCOMPtr_base() xpcom/glue/nsCOMPtr.h
12 xul.dll mozilla::net::nsHttpTransaction::~nsHttpTransaction() netwerk/protocol/http/nsHttpTransaction.cpp
13 xul.dll mozilla::net::nsHttpTransaction::`scalar deleting destructor'(unsigned int)
14 xul.dll mozilla::net::DeleteHttpTransaction::Run() netwerk/protocol/http/nsHttpTransaction.cpp
15 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp
16 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/glue/nsThreadUtils.cpp
17 xul.dll mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp
18 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc
19 xul.dll nsThreadManager::nsThreadManager() xpcom/threads/nsThreadManager.h
20 xul.dll nsBaseAppShell::Run() widget/nsBaseAppShell.cpp
21 xul.dll nsAppShell::Run() widget/windows/nsAppShell.cpp
22 xul.dll nsAppStartup::Run() toolkit/components/startup/nsAppStartup.cpp
23 xul.dll XREMain::XRE_mainRun() toolkit/xre/nsAppRunner.cpp
24 xul.dll XREMain::XRE_main(int, char** const, nsXREAppData const*) toolkit/xre/nsAppRunner.cpp
25 xul.dll XRE_main toolkit/xre/nsAppRunner.cpp
this windows crash signature and regressing since firefox 43, and there are no obvious correlations to any addons/modules causing it.
on release it is currently the #28 crasher making up 0.40% of all crashes.
Updated•9 years ago
|
Component: General → Networking: HTTP
Comment 1•9 years ago
|
||
Daniel, can you help here?
This is top crash #23 (windows only)
thanks
(I don't see a trace of a single crash in 47)
Comment 2•9 years ago
|
||
Really tricky thing, this. Thee chain nsHttpRequestHead, HttpBaseChannel and nsInputStreamChannel have all remained unmodified for a while even before this so I'd say this crash is more likely to be caused by a change in behavior in some foundation code.
Is there a way for us to do some range checks, as in can we figure out when this problem first showed up?
It is so very few version 46 crashes too so I fear the lack of 47 crashes could just be just bad/good luck so far.
Flags: needinfo?(daniel)
Comment 3•9 years ago
|
||
Thanks Daniel!
Kairo, could you help Daniel with the range checks? Danke!
Flags: needinfo?(kairo)
Comment 4•9 years ago
|
||
(In reply to Daniel Stenberg [:bagder] from comment #2)
> Is there a way for us to do some range checks, as in can we figure out when
> this problem first showed up?
As the crash signature happens back to at least Firefox 39 and mid-last-year, I don't know what to look for there.
Flags: needinfo?(kairo)
Comment 5•9 years ago
|
||
fixing this would move the needle on crashes
Assignee: nobody → daniel
Whiteboard: [necko-active]
Assignee | ||
Updated•9 years ago
|
Comment 7•9 years ago
|
||
I think I might have an idea for this, but I'm not sure if would work yet.
As I mentioned in bug 1250816 comment 4, it seems the crash occurs when because mHeaders in nsHttpRequestHead is corrupted.
So we create a class to act as a memory guard, on which we set PROT_NONE, which we place around mHeaders or mRequestHead.
This way, whenever the memory around mHeaders is corrupted we get a nice stack trace, so we can find the cause.
Then we need to send the build to all the reporters who mentioned this happens often - I assume we can do that.
Comment 8•9 years ago
|
||
Added signature from dupe. No ideas personally but know it is one of the top 5 signatures pushing up e10s crashes.
Starting with std::_Atomic_fetch_add_4 reminds me of semi recent fix https://hg.mozilla.org/mozilla-central/rev/3b0dafa67477 (Bug most definitely not related.)
Control group from e10s b45.2 experiment [1]. the experiment failed to filter out accessibility, about 10% more users were added extra that e10s group didn't contain.
18/31 are with accessibility. (~nsHttpRequestHead signature)
Experiment group [2]. No accessibility.
20 (~nsHttpRequestHead signature)
110 (OnStartRequest signature)
[1] https://crash-stats.mozilla.com/search/?build_id=20160215141016&process_type=browser&process_type=content&ActiveExperiment=%3De10s-beta45-withoutaddons%40experiments.mozilla.org&date=%3E%3D2016-02-16&date=%3C2016-02-23&ActiveExperimentBranch=%3Dcontrol-no-addons&signature=~nsTArray_Impl%3CT%3E%3A%3ADestructRange&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
[2] https://crash-stats.mozilla.com/search/?build_id=20160215141016&process_type=browser&process_type=content&ActiveExperiment=%3De10s-beta45-withoutaddons%40experiments.mozilla.org&date=%3E%3D2016-02-16&date=%3C2016-02-23&ActiveExperimentBranch=%3Dexperiment-no-addons&signature=~nsTArray_Impl%3CT%3E%3A%3ADestructRange&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
Crash Signature: [@ nsTArray_Impl<T>::DestructRange | nsTArray_Impl<T>::RemoveElementsAt | mozilla::net::nsHttpRequestHead::~nsHttpRequestHead] → [@ nsTArray_Impl<T>::DestructRange | nsTArray_Impl<T>::RemoveElementsAt | mozilla::net::nsHttpRequestHead::~nsHttpRequestHead]
[@ nsTArray_Impl<T>::DestructRange | nsTArray_Impl<T>::ReplaceElementsAt<T> | mozilla::net::HttpChannelChild::OnStartRequest]
tracking-e10s:
--- → ?
Updated•9 years ago
|
Blocks: e10s-crashes
Comment 9•9 years ago
|
||
Just to note, it seems there are a bunch of crashes with a similar signature (nsTArray_Impl<T>::RemoveElementsAt) that are not in necko. There is a high probability that the problem is somewhere in nsTArray.
https://crash-stats.mozilla.com/search/?signature=~nsTArray_Impl%3CT%3E%3A%3ADestructRange&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
Comment 10•9 years ago
|
||
(In reply to Valentin Gosu [:valentin] from comment #9)
> Just to note, it seems there are a bunch of crashes with a similar signature
> (nsTArray_Impl<T>::RemoveElementsAt) that are not in necko. There is a high
> probability that the problem is somewhere in nsTArray.
>
> https://crash-stats.mozilla.com/search/
> ?signature=~nsTArray_Impl%3CT%3E%3A%3ADestructRange&_facets=signature&_column
> s=date&_columns=signature&_columns=product&_columns=version&_columns=build_id
> &_columns=platform#facet-signature
Hm, three out of the top four crashes in that list are Necko crashes:
https://crash-stats.mozilla.com/signature/?signature=nsTArray_Impl%3CT%3E%3A%3ADestructRange+|+nsTArray_Impl%3CT%3E%3A%3ARemoveElementsAt+|+mozilla%3A%3Anet%3A%3AnsHttpResponseHead%3A%3A~nsHttpResponseHead&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1#reports
https://crash-stats.mozilla.com/signature/?signature=nsTArray_Impl%3CT%3E%3A%3ADestructRange+|+nsTArray_Impl%3CT%3E%3A%3AReplaceElementsAt%3CT%3E+|+mozilla%3A%3Anet%3A%3AHttpChannelChild%3A%3AOnStartRequest&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1#reports
https://crash-stats.mozilla.com/signature/?signature=nsTArray_Impl%3CT%3E%3A%3ADestructRange+|+nsTArray_Impl%3CT%3E%3A%3ARemoveElementsAt+|+mozilla%3A%3Anet%3A%3AnsHttpRequestHead%3A%3A~nsHttpRequestHead&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1#reports
Virtually all of the crashes are at address (size_t)-8, which indicates we have a null pointer somewhere.
Let's look at one of those crashes:
https://crash-stats.mozilla.com/report/index/5f286bee-a18a-4bb3-9a08-a6ab32160229
Downloading the dump file and disassembling the DestructRange function:
// This block is setting up pointers for the current element and the next element.
// 14h == 0x14 == 20 is sizeof(nsHttpHeaderArray::nsEntry).
// esi is the current element, ebx the pointer just beyond the last element.
xul!nsTArray_Impl<mozilla::net::nsHttpHeaderArray::nsEntry,nsTArrayInfallibleAllocator>::DestructRange:
2006 6bce359e 6b54240414 imul edx,dword ptr [esp+4],14h
2007 6bce35a3 53 push ebx
2008 6bce35a4 6b5c240c14 imul ebx,dword ptr [esp+0Ch],14h
2008 6bce35a9 56 push esi
2008 6bce35aa 8b31 mov esi,dword ptr [ecx]
2008 6bce35ac 83c208 add edx,8
2008 6bce35af 03f2 add esi,edx
2008 6bce35b1 03de add ebx,esi
2009 6bce35b3 3bf3 cmp esi,ebx
2009 6bce35b5 742d je (6bce35e4) Branch
// Save ebp because we're going to use it in the code below. Note that
// we're only going to save it if there's actually elements to destroy,
// which is a nice demonstration of shrink-wrapping by the compiler.
DestructRange+0x19:
2009 6bce35b7 55 push ebp
// This block and subsequent bits is is nsSubstring.cpp:ReleaseData, inlined.
// The bare 4 is nsSubstring::F_SHARED. The layout of nsTSubstring is:
//
// char* mData (offset 0)
// unsigned mLength (offset 4)
// unsigned mFlags (offset 8)
//
// But esi isn't pointing an an nsTSubstring; it's pointing at an nsEntry,
// and the string we're interested in is at offset 4 inside an nsEntry structure.
// Hence, given an nsEntry pointer, |mData| of |nsEntry::value| would be at
// offset 4, |mLength| at offset 8, and so forth.
//
// The |test| instruction is checking |mFlags|; the |mov| instruction is
// loading the |mData| pointer.
DestructRange+0x1a:
2010 6bce35b8 f6460c04 test byte ptr [esi+0Ch],4
2010 6bce35bc 8b4604 mov eax,dword ptr [esi+4]
2010 6bce35bf 7428 je (6bce35e9) Branch
// This is the F_SHARED case of ReleaseData. We need to do an atomic
// decrement of nsStringBuffer's reference count. So we convert the |mData|
// pointer into an nsStringBuffer (nsStringBuffer::FromData); this is the
// |lea| instruction below. We push (in program order) the memory ordering,
// the amount we're going to add (-1), and the pointer to the refcount, which
// is identical to the pointer to the string buffer itself.
DestructRange+0x23
2010 6bce35c1 6a05 push 5
2010 6bce35c3 8d68f8 lea ebp,[eax-8]
2010 6bce35c6 6aff push 0FFFFFFFFh
2010 6bce35c8 55 push ebp
2010 6bce35c9 e8c884d7ff call xul!std::_Atomic_fetch_add_4 (6ba5ba96)
2010 6bce35ce 83c40c add esp,0Ch
2010 6bce35d1 48 dec eax
2010 6bce35d2 7508 jne (6bce35dc) Branch
There's other bits of that function that I'm not going to annotate, because we have enough to see what's going on: some |value| member of an nsEntry had a nullptr |mData| field, which got translated into an |(size_t)-8| address to pass to _Atomic_fetch_add_4.
How did mData get to be nullptr? That is not at all clear to me. We have a couple cases in the string code where mData starts out as nullptr, but those appear to be transient conditions. We guard against assigning nullptr to an nTSubstring in nsTSubstring::Assign...Is it possible that somehow we have a null entry in the array? I don't see any way that could happen from the nsHttpHeaderArray code, but perhaps I'm not looking hard enough (nsHttpHeaderArray::SetEmptyHeader *does* initialize the new entry's |value|, fwiw).
I don't see how this is an nsTArray problem, but perhaps I'm not being imaginative enough. ;)
Flags: needinfo?(valentin.gosu)
Comment 11•9 years ago
|
||
That means the initial assumption of memory corruption is correct, or somehow we end up nulling out the pointer backing nsEntry::value via one of the string APIs - which is less likely.
Flags: needinfo?(valentin.gosu)
Comment 12•9 years ago
|
||
(In reply to Valentin Gosu [:valentin] from comment #11)
> That means the initial assumption of memory corruption is correct, or
> somehow we end up nulling out the pointer backing nsEntry::value via one of
> the string APIs - which is less likely.
Memory corruption where every instance is writing zero in a single field is pretty convenient (note that it's not the entire string getting zero'd out, or |mFlags| would be zero too). Are these header arrays used on multiple threads?
Comment 13•9 years ago
|
||
It seems that the nsHttpHeaderArray methods are called on multiple threads.
Comment 14•9 years ago
|
||
[Tracking Requested - why for this release]:
Too late for 45 but we have enough crashes on 46 to discuss about a tracking.
tracking-firefox46:
--- → ?
Comment 15•9 years ago
|
||
(In reply to Valentin Gosu [:valentin] from comment #13)
> It seems that the nsHttpHeaderArray methods are called on multiple threads.
Is that with appropriate locking? I admit that I haven't seen nsHttpHeaderArray come up in TSan races, but maybe circumstances are different out in the wild...
Comment 17•9 years ago
|
||
IIRC our plan here it to try adding locking to these calls and see if the issue goes away. Is that right Daniel?
Flags: needinfo?(daniel)
Comment 18•9 years ago
|
||
I was hoping Valentin would clarify after Nathan's question. What (unlocked?) accesses from multiple threads were you talking about?
Flags: needinfo?(daniel) → needinfo?(valentin.gosu)
Comment 19•9 years ago
|
||
In nsHttpHeaderArray I see the methods getting called from different threads, and there's no locking as far as I can tell.
From what I understand, the crash is due to the pointer backing nsEntry::value is null.
One way I see this happening due to threading would be if one thread calls mHeaders.AppendElement(); and the other is iterating through the array before the constructor for that element gets called to initialize the objects. I can't verify if this correct though, and it would be a bit odd if we were still appending to the array while another array was destroying it.
Flags: needinfo?(valentin.gosu)
I don't see any crashes on 46 for these signatures or for Nathan's search. But I could be missing something here. KaiRo can you tell if this affects 46? Thanks!
Comment 21•9 years ago
|
||
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #20)
> KaiRo can you tell if this affects 46? Thanks!
According to https://mozilla.github.io/bug-signatures-status/#/bug/1247982 it was seen in 46 Dev Edition but not (yet) in 46.0b1
Comment 23•9 years ago
|
||
nsTArray_Impl<T>::DestructRange | nsTArray_Impl<T>::ReplaceElementsAt<T> | mozilla::net::HttpChannelChild::OnStartRequest
has probably been replaces by
mozalloc_abort | NS_DebugBreak | mozilla::ipc::FatalError | mozilla::net::PHttpChannelChild::FatalError | mozilla::net::PHttpChannelChild::OnMessageReceived in 46 beta.
Comment 24•9 years ago
|
||
(In reply to Jonathan Howard from comment #23)
> nsTArray_Impl<T>::DestructRange | nsTArray_Impl<T>::ReplaceElementsAt<T> |
> mozilla::net::HttpChannelChild::OnStartRequest
> has probably been replaces by
> mozalloc_abort | NS_DebugBreak | mozilla::ipc::FatalError |
> mozilla::net::PHttpChannelChild::FatalError |
> mozilla::net::PHttpChannelChild::OnMessageReceived in 46 beta.
The signature looks very different.
Comment 25•9 years ago
|
||
Maybe the wrong one, (that was only in 46b1 and gone from b2.)
ReleaseData signature possible more likely bug 1145613 comment 16
I'm not seeing this crash for 46, so I don't think we need to track it.
Assignee | ||
Comment 27•9 years ago
|
||
I have look a bit into this:
We have a function that returns a raw pointer to the RequestHeaders, a bit scary. But it is not done anything wrong with it. It is only use to access function like GetHeader, GetMethod... so it is safe.
I little bit scary is that is used over nsHttpTransaction which holds a raw pointer to nsHttpRequestHeader and
nsHttpTransaction changes request headers too.
e.g:
https://crash-stats.mozilla.com/report/index/ab1a8ec1-0a63-4e60-9868-513812160127
in this crash nsHttpTransaction is being destroyed so the only possibility is that nsHttpTransaction is used in the same time as it is being destroyed.
There are other crashes where:
https://crash-stats.mozilla.com/report/index/b3a152f8-bb8a-4533-b7eb-c53892160405
here only nsHttpChannel is being destroyed.
Anyway nsHttpRequestHead is not kept as a pointer in nsHttpChannel so it will be gone when channel is gone:
http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/HttpBaseChannel.h#375
So we are using nsHttpTransction when it is being freed and/or we still have pointer to nsHttpRequestHead in nsHttpTransaction when the channel is being destroyed. Putting a lock will not solve the problem.
It is possible that I have missed something in this analysis :)
Assignee | ||
Comment 28•9 years ago
|
||
Something to add:
One more possibility - at some point, we add elements to the array at the same time from two different threads and that is why array is corrupted. (I am not sure if this is possible)
Comment 29•9 years ago
|
||
this is a weird structure - it is one of the few pieces of data shared between the socket and main threads. It has never had locking and it has never had any kind of memory barrier to ensure consistent views. Its model is really point in time - the main thread owns it before setting up the http transaction, then the http transaction owns it until onstartrequest() is called and then the main thread owns it again.
its been sketchy since the days before phoenix afaict. and its certainly prone to regression.
Dragana, it would be ok to lock all these methods. It would be interesting to implement a wrapper around AutoLock for it that contains an Atomic Bool with the state of the lock and DIAGNOSTIC_ASSERT (?) it to be free just before obtaining it. It would miss finding the offender in one race condition, but it wouldn't have a false positive problem.
Assignee | ||
Comment 30•9 years ago
|
||
There are crashes in e10s :)
https://crash-stats.mozilla.com/report/index/c49f0b5b-2dfe-4d9c-abfa-dd2e82160331
Updated•9 years ago
|
Comment 31•9 years ago
|
||
This may be the same as e10s crash bug 1145613, which was our #4 top crasher in content for our beta 46 experiment. That crash always seems to happen near the startup of the content process. There's some additional debug info and crashstats links in that bug, although you can ignore everything prior to bug 1145613, comment 15.
Assignee | ||
Comment 32•9 years ago
|
||
Daniel, do you want to continue working on this, it is assigned to you? or should I take it?
We could add also a check if the right thread is accessing it at specific time - from comment 29 "the main thread owns it before setting up the http transaction, then the http transaction owns it until onstartrequest() is called and then the main thread owns it again"
e10s cases - I m not sure it is the same issue.
Flags: needinfo?(daniel)
Comment 33•9 years ago
|
||
Please have a go at it, Dragana. I've got stuck elsewhere.
Assignee: daniel → dd.mozilla
Status: NEW → ASSIGNED
Flags: needinfo?(daniel)
Comment 34•9 years ago
|
||
Dragana, curious if I can get an update on this bug. Have you had a chance to look at it? If not, when do you think you'll be able to?
Flags: needinfo?(dd.mozilla)
Assignee | ||
Comment 35•9 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #34)
> Dragana, curious if I can get an update on this bug. Have you had a chance
> to look at it? If not, when do you think you'll be able to?
I am working on it. I decided not to just add a lock, but to figure out what went wrong and that takes longer since nsHttpRequestHead is used from 4 different threads :)
Flags: needinfo?(dd.mozilla)
Assignee | ||
Comment 36•9 years ago
|
||
I tried to make control from which thread headers are access, but it got complicated so going for an easier way.
I still think that nsHttpTransaction::Restart() is the problem, but I do not have a prove. Just before Restart is called, mActivityDistributor->ObserveActivity is called.
So I added lock on header array and a check hopefully it will be triggered.
To add a lock on nsHttpRequestHead will need rewrite a big part of nsHttpRequestHead so lets try without that.
Assignee | ||
Comment 37•9 years ago
|
||
mWriteInProgress is check out of the lock, so it is possible the assert will not be triggered always when it should be, but this is just for a diagnostic so it is enough to be triggered once.
The lock is to be sure of proper writing into the array and hopefully no more crashes.
Attachment #8745263 -
Flags: review?(mcmanus)
Assignee | ||
Comment 38•9 years ago
|
||
Attachment #8745263 -
Attachment is obsolete: true
Attachment #8745263 -
Flags: review?(mcmanus)
Attachment #8745264 -
Flags: review?(mcmanus)
Assignee | ||
Comment 39•9 years ago
|
||
Comment 40•9 years ago
|
||
Comment on attachment 8745264 [details] [diff] [review]
bug_1247982_lockheaderArray.patch
Review of attachment 8745264 [details] [diff] [review]:
-----------------------------------------------------------------
I think we ought to lock all methods of this class based on the data we have.. a read with a write is also bad news and it isn't meant to ever be used simultaneously on different threads so we won't be serializing multiple readers
Attachment #8745264 -
Flags: review?(mcmanus) → feedback+
Assignee | ||
Comment 41•9 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #40)
> Comment on attachment 8745264 [details] [diff] [review]
> bug_1247982_lockheaderArray.patch
>
> Review of attachment 8745264 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> I think we ought to lock all methods of this class based on the data we
> have.. a read with a write is also bad news and it isn't meant to ever be
> used simultaneously on different threads so we won't be serializing multiple
> readers
I tried to make a control from which thread mHttpRequestHead is being accessed:
It starts with main thread until asyncOpen is called, then it can be cache2I/O, cacheI/O or mainthread but only a single function (actually 2, but one is onle cache(2)I/O) is called so it is possible to control it. The cache part can be skipped.
Then it is socketThread and this is a bit problematic because we call mActivityDistributor->ObserveActivity couple of times and there mHttpRequestHead is access from the main thread. It is fine to assume socketThread until headers are written out but nsHttpTransaction::Restart is call after that and it changes header array. So if I put MOZ_DIAGNOSTIC_ASSERT(!mHeaders->mWriteInProgress); to all functions we will crash very often.
I will rewrite part where nsHttpTransaction::Restart is called. And try to make a patch that will return error if the thread is not the right one, e.g. calling GetHeader from js after asyncopen is called. (I think we do not have guards like that)
Assignee | ||
Comment 42•9 years ago
|
||
One note:
there is a bit not nice way how I am passing parameters to SendOnStartRequest. I do not want to do an extra copying of headers. To avoid copying there is an option to pass nsHttpRequetHead as a parameter and to lock in
static void ParamTraits<mozilla::net::nsHttpRequestHead>::Write(Message* aMsg, const mozilla::net::nsHttpRequestHead& aParam)
but here headers are past as a const so only option is a hack or rewriting ParamTraits. (I am not sure it is worth it.)
Attachment #8745264 -
Attachment is obsolete: true
Attachment #8747034 -
Flags: review?(mcmanus)
Assignee | ||
Comment 43•9 years ago
|
||
Comment 44•9 years ago
|
||
Comment on attachment 8747034 [details] [diff] [review]
bug_1247982.patch
Review of attachment 8747034 [details] [diff] [review]:
-----------------------------------------------------------------
this is tremendous. Thank you. We might need to rename you bulldog.
the review is pretty much nits.. and I have a general feeling we can get some of those consts back.. but if not we can live with it.
::: netwerk/protocol/http/nsHttpRequestHead.cpp
@@ +30,5 @@
> {
> MOZ_COUNT_DTOR(nsHttpRequestHead);
> }
>
> +// Don't use this function. It is only used by HttpChannelarent to avoid
typo (Parent)
@@ +74,5 @@
> + return mHeaders.Count();
> +}
> +
> +nsresult
> +nsHttpRequestHead::VisitHeaders(nsIHttpHeaderVisitor *visitor,
I think this is worth documenting in the .h file that it lets you iterate all the headers atomically under one lock
::: netwerk/protocol/http/nsHttpRequestHead.h
@@ +46,5 @@
> + void Method(nsACString &aMethod);
> + nsHttpVersion Version();
> + void RequestURI(nsACString &RequestURI);
> + void Path(nsACString &aPath);
> + void SetHTTPS(bool val);
tailing whitespace
@@ +56,3 @@
>
> + nsresult SetHeader(nsHttpAtom h, const nsACString &v, bool m=false);
> + nsresult SetHeader(nsHttpAtom h, const nsACString &v, bool m,
trailing whitespace
@@ +57,5 @@
> + nsresult SetHeader(nsHttpAtom h, const nsACString &v, bool m=false);
> + nsresult SetHeader(nsHttpAtom h, const nsACString &v, bool m,
> + nsHttpHeaderArray::HeaderVariety variety);
> + nsresult SetEmptyHeader(nsHttpAtom h);
> + nsresult GetHeader(nsHttpAtom h, nsACString &v);
this lost a const.. on purpose? findheadervalue() and HasHeaderValue() and IsSafeMethod() too..
@@ +93,3 @@
>
> + bool IsOptions();
> + bool IsConnect();
so these is*() things that just call EqualsMethod can stay in the .h file, right?
Attachment #8747034 -
Flags: review?(mcmanus) → review+
Assignee | ||
Comment 45•9 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #44)
> Comment on attachment 8747034 [details] [diff] [review]
> bug_1247982.patch
>
> Review of attachment 8747034 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> @@ +57,5 @@
> > + nsresult SetHeader(nsHttpAtom h, const nsACString &v, bool m=false);
> > + nsresult SetHeader(nsHttpAtom h, const nsACString &v, bool m,
> > + nsHttpHeaderArray::HeaderVariety variety);
> > + nsresult SetEmptyHeader(nsHttpAtom h);
> > + nsresult GetHeader(nsHttpAtom h, nsACString &v);
>
> this lost a const.. on purpose? findheadervalue() and HasHeaderValue() and
> IsSafeMethod() too..
>
We want to lock this functions too, in const function we cannot call mLock.Lock();
> @@ +93,3 @@
> >
> > + bool IsOptions();
> > + bool IsConnect();
>
> so these is*() things that just call EqualsMethod can stay in the .h file,
> right?
I moved everything at first, and forgot to move some back :)
Assignee | ||
Comment 46•9 years ago
|
||
typos fix.
Attachment #8747034 -
Attachment is obsolete: true
Attachment #8748068 -
Flags: review+
Assignee | ||
Updated•9 years ago
|
Keywords: checkin-needed
Comment 47•9 years ago
|
||
Keywords: checkin-needed
Comment 48•9 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-firefox49:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Reporter | ||
Updated•8 years ago
|
status-firefox48:
--- → ?
Comment 50•8 years ago
|
||
This is marked as a regression in 47; can we uplift to beta?
Flags: needinfo?(rkothari)
Updated•8 years ago
|
Comment 51•8 years ago
|
||
andrew, I don't agree with the regression tag. This is a day one core race condition problem and the crash signatures have probably shifted along the way depending on what else was going on. It was originally filed with 43, but is probably an issue longer than that.
it's also a fairly complicated patch. If dragana felt confident in an uplift to 48 I would support that, but I would not support 47 at this point.
(In reply to Andrew Overholt [:overholt] from comment #50)
> This is marked as a regression in 47; can we uplift to beta?
Hi Andrew, based on Patrick's assessment, this may be too risk to uplift at this stage, RC1 gtb today. We do want to fix recent regressions but we may have lost the runway to do that.
I also looked at the crash volume on this one and it doesn't seem high enough to warrant uplifting at this stage. What do you think?
Flags: needinfo?(rkothari)
This is not a new regression and Patrick suggested we avoid uplifting to 47 as the patch is fairly risky to take this late in 47 cycle (RC week). This is now a wontfix for Fx47.
Assignee | ||
Comment 54•8 years ago
|
||
(In reply to Patrick McManus [:mcmanus] from comment #51)
> andrew, I don't agree with the regression tag. This is a day one core race
> condition problem and the crash signatures have probably shifted along the
> way depending on what else was going on. It was originally filed with 43,
> but is probably an issue longer than that.
>
> it's also a fairly complicated patch. If dragana felt confident in an uplift
> to 48 I would support that, but I would not support 47 at this point.
Sorry for a delay, 47 is really risky (and not in question any more).
About 48, we could uplift it. It does touch a lot of header handling code, but I think it is fine to uplift.
Assignee | ||
Comment 55•8 years ago
|
||
We will need to uplift also bug 1269738 (a very small patch) and bug 1274509 (this bug changes an idl - setting header during VisitRequestHeader was an unsafe operation, really hope no one did it, now it is going to return an error)
Comment 56•8 years ago
|
||
Uplifting bug 507571 as the same time as this is probably worth doing IMO. (I'm assuming without verifying it has no dependencies.)
Comment 57•8 years ago
|
||
Dragana, can you request and coordinate the necessary 48 uplifts? It would be excellent if we got it done while that was still dev-channel
Flags: needinfo?(dd.mozilla)
Assignee | ||
Comment 58•8 years ago
|
||
Comment on attachment 8748068 [details] [diff] [review]
bug_1247982.patch
Approval Request Comment
This is approval request for 48 (in case it changes channels)
[Feature/regressing bug #]: A long existing bug
[User impact if declined]:Crash, but not very high volume.
[Describe test coverage new/current, TreeHerder]: This code is heavily used, but this patch touches almost every aspect of request header handling (it does not change header parsing which is good). It is hard to test all corner cases
[Risks and why]: It is hard to test all corner cases.
[String/UUID change made/needed]:
We will need to uplift also bug 1269738 (a very small patch) and bug 1274509 (it is a low risk, the bug changes an idl - setting header during VisitRequestHeader was an unsafe operation, really hope no one did it, now it is going to return an error).
Order of uplift:
this patch,
bug 1269738
bug 1274509
Flags: needinfo?(dd.mozilla)
Attachment #8748068 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 59•8 years ago
|
||
Comment on attachment 8748068 [details] [diff] [review]
bug_1247982.patch
needed a rebase :)
Attachment #8748068 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 60•8 years ago
|
||
Approval Request Comment
This is approval request for 48 (in case it changes channels)
[Feature/regressing bug #]: A long existing bug
[User impact if declined]:Crash, but not very high volume.
[Describe test coverage new/current, TreeHerder]: This code is heavily used, but this patch touches almost every aspect of request header handling (it does not change header parsing which is good). It is hard to test all corner cases
[Risks and why]: It is hard to test all corner cases.
[String/UUID change made/needed]:
We will need to uplift also bug 1269738 (a very small patch) and bug 1274509 (it is a low risk, the bug changes an idl - setting header during VisitRequestHeader was an unsafe operation, really hope no one did it, now it is going to return an error).
Order of uplift:
this patch,
bug 1269738
bug 1274509
Attachment #8759933 -
Flags: approval-mozilla-aurora?
Comment 61•8 years ago
|
||
Comment on attachment 8759933 [details] [diff] [review]
bug_1247982_aurora.patch
Well, it is a very patch just before merge day, not taking it right now at least.
Comment on attachment 8759933 [details] [diff] [review]
bug_1247982_aurora.patch
Switching the approval flag so that it's for 48
Attachment #8759933 -
Flags: approval-mozilla-aurora? → approval-mozilla-beta?
Updated•8 years ago
|
Crash Signature: [@ nsTArray_Impl<T>::DestructRange | nsTArray_Impl<T>::RemoveElementsAt | mozilla::net::nsHttpRequestHead::~nsHttpRequestHead]
[@ nsTArray_Impl<T>::DestructRange | nsTArray_Impl<T>::ReplaceElementsAt<T> | mozilla::net::HttpChannelChild::OnStartRequest] → [@ nsTArray_Impl<T>::DestructRange | nsTArray_Impl<T>::RemoveElementsAt | mozilla::net::nsHttpRequestHead::~nsHttpRequestHead]
[@ nsTArray_Impl<T>::DestructRange | nsTArray_Impl<T>::ReplaceElementsAt<T> | mozilla::net::HttpChannelChild::OnStartRequest]
…
Comment 63•8 years ago
|
||
Comment on attachment 8759933 [details] [diff] [review]
bug_1247982_aurora.patch
Has been in m-c for a month, also now on aurora, fix an interesting volume of crashes on release, taking it.
Attachment #8759933 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
This got uplifted earlier, never made it to the bug, apparently: https://hg.mozilla.org/releases/mozilla-beta/rev/810fa61b0d7347ab50e7dc930d3f0d95a8f8bda4
You need to log in
before you can comment on or make changes to this bug.
Description
•