Closed Bug 1038854 Opened 10 years ago Closed 10 years ago

App does not always start

Categories

(Core :: DOM: Content Processes, defect, P1)

32 Branch
ARM
Gonk (Firefox OS)
defect

Tracking

RESOLVED FIXED
mozilla34
blocking-b2g 2.0+
Tracking Status
firefox32 --- wontfix
firefox33 --- wontfix
firefox34 --- fixed
b2g-v1.2 --- unaffected
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed
b2g-v2.0 --- fixed
b2g-v2.1 --- fixed

People

(Reporter: sotaro, Assigned: sinker)

References

Details

(Whiteboard: [caf priority: p2][CR 698221])

Attachments

(7 files, 10 obsolete files)

(deleted), text/plain
Details
(deleted), image/png
Details
(deleted), text/plain
Details
(deleted), text/plain
Details
(deleted), patch
ting
: review+
Details | Diff | Splinter Review
(deleted), patch
sinker
: review+
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
Attached file test.py (deleted) —
On a 273MB RAM Flame, there seems to be a case where an application fails to start. I ran into this problem while investigating Bug 1037360, but it seems independent of Bug 1037360.

STR
[1] Run attached script on v2.0 flame with 273MB RAM.

Expected result
- The script launches and kills the app repeatedly, forever

Actual result
- The app fails to start up; only the app's icon is shown on the screen.

Prerequisite
- v2.0 flame with 273MB RAM.
- It seems better to apply the fix for Bug 1037360.
Attached image screen shot when the problem happens (deleted) —
Depends on: 1037360
Nominate to "b2g-v2.0+".
blocking-b2g: --- → 2.0?
This problem might be related to Bug 1036670.
Attached file logcat (deleted) —
Logcat log up to when the problem happened.
The b2g-ps command result was strange: there were two preallocated processes.

----------------------------
sotaro@eideticker-tor-3:/mnt/sdb/sotaro/b2g_v20_flame/B2G$ adb shell b2g-ps
APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC         NAME
b2g              0 root      279   1     190668 47740 ffffffff b6ea367c S /system/b2g/b2g
(Nuwa)           0 root      854   279   55004  3080  ffffffff b6eba67c S /system/b2g/plugin-container
Homescreen       2 u0_a936   936   279   108280 19676 ffffffff b6e5867c S /system/b2g/plugin-container
(Preallocated a  0 u0_a11234 11234 854   59164  8584  ffffffff b6eba848 S /system/b2g/plugin-container
(Preallocated a  2 u0_a11247 11247 854   61156  15672 ffffffff b6eba67c S /system/b2g/plugin-container
Hi Sotaro... looks like you are working on this?  If so could you please assign it to yourself?
Passing this on to you, sotaro, for now. Feel free to re-assign as needed.
(Just a note from CAF's triage they are seeing this on their side as well, Inder is going to add more details here)
Assignee: nobody → sotaro.ikeda.g
blocking-b2g: 2.0? → 2.0+
I am un-assigning myself. I just reported the bug; I am not working on it. I have other bugs, and there is no evidence so far that this is a graphics bug.
Assignee: sotaro.ikeda.g → nobody
(In reply to Sotaro Ikeda [:sotaro] from comment #3)
> This problem might be related to Bug 1036670.

NI on ting here to help investigate given it might be related to 1036670
Flags: needinfo?(tchou)
Component: General → Performance
In bug 1036670 the device hasn't entered the homescreen yet, so no app could be launched; it should be different from this one.
Flags: needinfo?(tchou)
Sotaro, could you please show me how to run the test script on Flame? I couldn't make it work:

ting@sweet:~/w/fx/os/flame$ ~/w/pyenv/bin/python test.py
Opening Phone
Traceback (most recent call last):
  File "test.py", line 15, in <module>
    app = a.launch("phone")
  File "/home/ting/w/fx/os/gaia/tests/python/gaia-ui-tests/gaiatest/gaia_test.py", line 49, in launch
    assert result, "Failed to launch app with name '%s'" % name
AssertionError: Failed to launch app with name 'phone'
NI for comment 11.
Flags: needinfo?(sotaro.ikeda.g)
Passing this on to ting, as he is helping investigate based on our offline discussion.
Assignee: nobody → tchou
Thanks to :askeing's help: Marionette didn't work because I had a previously installed gaia-ui-tests package that didn't match the version of the client.
Flags: needinfo?(sotaro.ikeda.g)
The test script has iterated over 200 times but hasn't reproduced it. I am not sure whether this relates to bug 1033618. I am using:

  gecko: db6779a27019c0d1ec63fc501afdf250639fd468 (v32.0a2)
  gaia:  404edc9230fb925a31845622a99de84fd35d1c7b

Sotaro, what's your reproduce rate and commit hash?
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Ting-Yu Chou [:ting] from comment #15)
> The test script iterates over 200 times, but hasn't reproduced it. I am not
> sure does this relate to bug 1033618. I am using:
> 
>   gecko: db6779a27019c0d1ec63fc501afdf250639fd468 (v32.0a2)
>   gaia:  404edc9230fb925a31845622a99de84fd35d1c7b
> 
> Sotaro, what's your reproduce rate and commit hash?

I ran the script for several hours.

 gecko: 323e7d8e8884be8da8341c0e2687e81962a774f4
 gaia: 2c6c413ed729d465c52d6c2d5d458e2eee79e956
Flags: needinfo?(sotaro.ikeda.g)
In my case, the problem happened reliably, though it takes some time before it occurs.
Severity: normal → blocker
Keywords: perf
Priority: -- → P1
Whiteboard: [c=memory p= s= u=2.0]
Status: NEW → ASSIGNED
Attached file preallocated processes backtrace (deleted) —
Just reproduced it; there were 2 preallocated processes as in comment 5. One was blocked in HelperThreadState().lock() from js::CancelOffThreadIonCompile(), the other was waiting in epoll in nsAppShell::ProcessNextNativeEvent().

Will check why the former one blocks there.
There are 2 HelperThreads (Analysis Helper), both waiting on the condition variable |producerWakeup| associated with GlobalHelperThreadState's mutex. The mutex should be unlocked, but its value is 2, which blocks the AutoLockHelperThreadState in CancelOffThreadIonCompile().

I am checking how this could happen.
Keywords: footprint
Whiteboard: [c=memory p= s= u=2.0] → [c=memory p= s= u=2.0][MemShrink]
So this is a Nuwa issue.

When one frozen HelperThread has been resumed, the second one to be resumed may get EBUSY from pthread_mutex_trylock() in __wrap_pthread_cond_wait(). It then raises the flag |reacquireMutex| and lets the main thread call pthread_mutex_lock() on the mutex. But there is no corresponding unlock call, and the mutex is not recursive, so a later call to pthread_mutex_lock() on the same mutex will block, no matter whether it comes from the main thread or any other one.

I am not sure why the mutex needs to be locked by the main thread; per the spec of pthread_cond_wait() and pthread_cond_timedwait(), the valid state can only be maintained if the recreated thread owns the lock.

I am verifying my patch now and will post it later.
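To make the failure mode concrete, here is a minimal, self-contained C demonstration of why a single unbalanced pthread_mutex_lock() wedges a non-recursive mutex for good. This illustrates plain pthread semantics only; it is not Nuwa code:

#include <errno.h>
#include <pthread.h>
#include <stdio.h>

int main(void) {
  pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;  /* default type: non-recursive */

  /* Stands in for the unbalanced pthread_mutex_lock() the main thread
     performs on behalf of the recreated thread; nothing ever unlocks it. */
  pthread_mutex_lock(&m);

  /* From now on, every attempt to take the mutex fails or blocks,
     regardless of which thread makes it. */
  if (pthread_mutex_trylock(&m) == EBUSY)
    printf("mutex is stuck; a plain pthread_mutex_lock() here would block forever\n");
  return 0;
}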
Component: Performance → DOM: Content Processes
Product: Firefox OS → Core
Version: unspecified → 32 Branch
(In reply to Ting-Yu Chou [:ting] from comment #20)
> I am not sure why the mutex needs to be locked by main thread, per spec of
> pthread_cond_wait() and pthread_cond_timedwait(), the valid state can only
> be maintained if the recreated thread owns the lock.

Cervantes just told me the reason: it covers the case where thread A owns mutex M while thread B is recreated, is not interrupted by a context switch, and goes into pthread_cond_wait(), which unlocks M. In that case the state of M as seen by thread A would become incorrect, so thread B asks the main thread to lock it instead.

The implementation also assumes there is no undefined behavior when a mutex is locked/unlocked by a thread that does not own it.

I need to investigate this further for the patch.
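As an illustration of the hazard the implementation accepts here, this runnable C example shows that a default pthread mutex lets a non-owner unlock it (undefined behavior that happens to "work"), while an error-checking mutex rejects it. The thread names are mine, not from Nuwa:

#include <errno.h>
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t m;

static void *thread_b(void *arg) {
  /* This thread does not own m, like a recreated thread unlocking a mutex
     that thread A holds; an error-checking mutex reports the violation. */
  int rv = pthread_mutex_unlock(&m);
  printf("unlock from non-owner: %s\n",
         rv == EPERM ? "rejected with EPERM" : "silently allowed");
  return NULL;
}

int main(void) {
  pthread_mutexattr_t attr;
  pthread_mutexattr_init(&attr);
  pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
  pthread_mutex_init(&m, &attr);

  pthread_mutex_lock(&m);  /* main plays the role of thread A, the owner */

  pthread_t t;
  pthread_create(&t, NULL, thread_b, NULL);
  pthread_join(t, NULL);

  pthread_mutex_unlock(&m);
  return 0;
}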
Whiteboard: [c=memory p= s= u=2.0][MemShrink] → [c=memory p= s= u=2.0][MemShrink] [CR 698221]
If this is a bug in Nuwa then it's not a MemShrink issue per se.
Whiteboard: [c=memory p= s= u=2.0][MemShrink] [CR 698221] → [c=memory p= s= u=2.0][CR 698221]
Whiteboard: [c=memory p= s= u=2.0][CR 698221] → [caf priority: p2][c=memory p= s= u=2.0][CR 698221]
After double-checking, the RECREATE_GATE_VIP() in __wrap_pthread_cond_wait() should prevent a recreated thread from locking any mutex before all the threads are recreated. So comment 20 is not entirely correct:

> When one freezed HelperThread has been resumed, the second one going to be
> resumed may get EBUSY from pthread_mutex_trylock() in
> __wrap_pthread_cond_wait().

It is not the first recreated HelperThread locking the mutex that causes EBUSY for the second one.

The first recreated HelperThread actually enters REAL(pthread_cond_wait) and blocks there, so the mutex value should be decreased to 0. But later, when the second HelperThread is being recreated, pthread_mutex_trylock() returns EBUSY, which it shouldn't. I thought some other thread might have locked the mutex, but I didn't see that in the logging.

I added this code to pthread.c:

diff --git a/libc/bionic/pthread.c b/libc/bionic/pthread.c
index e30fa9d..f18d2f9 100644
--- a/libc/bionic/pthread.c
+++ b/libc/bionic/pthread.c
@@ -1122,8 +1122,11 @@ int __pthread_cond_timedwait_relative(pthread_cond_t *con
 {
     int  status;
     int  oldvalue = cond->value;
+    int  oldmvalue = mutex->value;
 
     pthread_mutex_unlock(mutex);
+    if (mutex->value >= oldmvalue)
+        raise(SIGTRAP);
     status = __futex_wait_ex(&cond->value, COND_IS_SHARED(cond), oldvalue, relt
     pthread_mutex_lock(mutex);

and I can hit the SIGTRAP; will check this further.
This is a 2.0 blocker but I don't see much happening in the last 5 days.  Can I please get an update?
I am sorry, but I still haven't figured out how it behaves incorrectly; I am still working on it.
Flags: needinfo?(tchou)
a. The mutex value of helperLock is not synchronized between the main thread, HelperThread-A, and HelperThread-B, which causes HelperThread-B to fail pthread_mutex_trylock(). This probably relates to the memory barrier usage of the pthread implementation.
b. The main thread will block when locking reacquireMutex if it is owned by another thread, which may cause a deadlock. This is a different issue; I will file a bug later.

When a) happens, there are 6 possible sequences for the main thread (M) recreating the 2 HelperThreads (A, B) from __wrap_pthread_cond_wait(), where +/- means mutex lock/unlock:

  1. A+ A- B- M+,
  2. A+ A- M+ B-,
  3. A+ B- A- M+,
  4. A+ B- M+ A-,
  5. A+ M+ A- B-,
  6. A+ M+ B- A-

For cases 1 and 3, the mutex ends up locked uncontended after M+, which causes this issue. To fix it, we can either:

  1. make sure the mutex value is synchronized so that a) does not happen, or
  2. make sure pthread_mutex_lock() is called before __real_pthread_cond_wait(), which then performs the pthread_mutex_unlock()

I have tried adding a dsb before returning from _normal_unlock(), but it doesn't make solution 1 work. For now I have two ideas for solution 2:

  replace pthread_mutex_trylock() with pthread_mutex_lock() and get rid of reacquireMutex, or
  delay __real_pthread_cond_wait() until the main thread owns reacquireMutex
(In reply to Ting-Yu Chou [:ting] from comment #26)
> doesn't work for solution 1. And by now, I have two ideas for solution 2:
> 
>   replace pthread_mutex_trylock() by pthread_mutex_lock() and get rid of
> reacquireMutex, or
>   delay __real_pthread_cond_wait() until main thread owns reacquireMutex

Talked to Thinker and Cervantes; actually neither is OK, as both cause b) of comment 26.

Cervantes came up with a better solution: recreate HelperThread-B only after HelperThread-A enters __real_pthread_cond_wait(). I am verifying it now.
Does this bug still repro when you test it on a Flame set to 319 MB of memory?
Flags: needinfo?(tchou)
Yes, unfortunately :(
Flags: needinfo?(tchou)
Corrections:

(In reply to Ting-Yu Chou [:ting] from comment #26)
> a. The mutex value of helperLock is not synchronized between main thread,
> HelperThread-A, and HelperThread-B, which causes HelperThread-B failing
> pthread_mutex_trylock(). This probably relates to memory barrier usage of
> pthread implementation.

Actually, HelperThread-B fails pthread_mutex_trylock() because HelperThread-B is recreated even though HelperThread-A hasn't yet entered the wrapped pthread condition wait, so HelperThread-A still owns the mutex. Note that a context switch may happen after HelperThread-A passes RECREATE_CONTINUE() and before __real_pthread_cond_wait().

> b. Main thread will be blocked for locking reacquireMutex if it is owned by
> another thread, and may cause deadlock. This is a different issue, will file
> a bug later.

I was wrong; the deadlock does not exist with the current implementation. It happens only if I replace trylock with lock. No bug needs to be filed.
Comment on attachment 8464458 [details] [diff] [review]
patch

The patch guarantees that if a thread is being recreated from the pthread condition wait wrapper, it will enter the wrapped function and unlock the mutex before the main thread creates the next one, so there won't be sequences like:

  A+ A- B- M+,
  A+ B- A- M+

Unlocking and then locking an unlocked mutex leaves it locked uncontended, which is not what we want.

A+ = HelperThread-A calls pthread_mutex_lock(m)
A- = HelperThread-A calls pthread_mutex_unlock(m)
M+ = Main thread calls pthread_mutex_lock(m)
Attachment #8464458 - Flags: review?(khuey)
Retitling, since this is unrelated to the memory configuration of the device.
Summary: App does not always start on 273MB flame → App does not always start
Comment on attachment 8464458 [details] [diff] [review]
patch

Review of attachment 8464458 [details] [diff] [review]:
-----------------------------------------------------------------

This is subtle.  Excellent catch.

::: mozglue/build/Nuwa.cpp
@@ +139,5 @@
>    void *recrArg;
>  
>    TLSInfoList tlsInfo;
>  
> +  bool isCondMutexBalanced;

I think condMutexNeedsBalancing might be a more accurate name for this.

@@ +140,5 @@
>  
>    TLSInfoList tlsInfo;
>  
> +  bool isCondMutexBalanced;
> +  pthread_mutex_t *condMutex;

Reverse the order of these and place the following comment above them (based on your comment in the last hunk).

We must ensure that the recreated thread has entered pthread_cond_wait() or similar functions before proceeding to recreate the next one. Otherwise, if the next thread depends on the same mutex, it may be used in an incorrect state.  To do this, the main thread must unconditionally acquire the mutex.  The mutex is unconditionally released when the recreated thread enters pthread_cond_wait().  The recreated thread may have locked the mutex itself (if the pthread_mutex_trylock succeeded) or another thread may have already held the lock.  If the recreated thread did lock the mutex we must balance that with another unlock on the main thread, which is signaled by isCondMutexBalanced/condMutexNeedsBalancing/whatever you name it.

@@ +1416,5 @@
>        RECREATE_WAIT();
> +      if (tinfo->condMutex) {
> +        // Ensure the recreating thread has entered wrapped pthread condition
> +        // wait function before recreating next one. Otherwise if next thread
> +        // depends on the same mutex, it may be used with an incorrect state.

And then here just write

Synchronize with the recreated thread in pthread_cond_wait().
Attachment #8464458 - Flags: review?(khuey) → review+
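In outline, the mechanism under review looks roughly like the following hedged sketch; the field names follow the review above, but the type and function are stand-ins, not the actual Nuwa.cpp code:

#include <pthread.h>
#include <stdbool.h>

/* Minimal stand-in for the per-thread bookkeeping discussed above. */
typedef struct {
  pthread_mutex_t *condMutex;    /* mutex of the condvar the thread froze on */
  bool condMutexNeedsBalancing;  /* the recreated thread's trylock succeeded */
} thread_info_sketch;

/* Main thread, after RECREATE_WAIT() for a thread frozen in a condition
   wait: the lock blocks until the recreated thread has entered
   __real_pthread_cond_wait() and released the mutex, ruling out the
   "A+ A- B- M+" and "A+ B- A- M+" interleavings. */
static void sync_with_recreated(thread_info_sketch *tinfo) {
  if (tinfo->condMutex) {
    pthread_mutex_lock(tinfo->condMutex);
    if (tinfo->condMutexNeedsBalancing) {
      /* The recreated thread locked the mutex itself, so balance that
         with an unlock here on the main thread. */
      pthread_mutex_unlock(tinfo->condMutex);
    }
  }
}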
Attached patch patch (deleted) — Splinter Review
Carry r+ from comment 34.

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #34)
> Comment on attachment 8464458 [details] [diff] [review]
> patch
> 
> Review of attachment 8464458 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This is subtle.  Excellent catch.

Yes, it's neat, credit goes to Cervantes.

>
> ::: mozglue/build/Nuwa.cpp
> @@ +139,5 @@
> >    void *recrArg;
> >  
> >    TLSInfoList tlsInfo;
> >  
> > +  bool isCondMutexBalanced;
> 
> I think condMutexNeedsBalancing might be a more accurate name for this.

Changed.

>
> @@ +140,5 @@
> >  
> >    TLSInfoList tlsInfo;
> >  
> > +  bool isCondMutexBalanced;
> > +  pthread_mutex_t *condMutex;
> 
> Reverse the order of these and place the following comment above them (based
> on your comment in the last hunk).
> 
> We must ensure that the recreated thread has entered pthread_cond_wait() or
> similar functions before proceeding to recreate the next one. Otherwise, if
> the next thread depends on the same mutex, it may be used in an incorrect
> state.  To do this, the main thread must unconditionally acquire the mutex. 
> The mutex is unconditionally released when the recreated thread enters
> pthread_cond_wait().  The recreated thread may have locked the mutex itself
> (if the pthread_mutex_trylock succeeded) or another thread may have already
> held the lock.  If the recreated thread did lock the mutex we must balance
> that with another unlock on the main thread, which is signaled by
> isCondMutexBalanced/condMutexNeedsBalancing/whatever you name it.

Done.

> 
> @@ +1416,5 @@
> >        RECREATE_WAIT();
> > +      if (tinfo->condMutex) {
> > +        // Ensure the recreating thread has entered wrapped pthread condition
> > +        // wait function before recreating next one. Otherwise if next thread
> > +        // depends on the same mutex, it may be used with an incorrect state.
> 
> And then here just write
> 
> Synchronize with the recreated thread in pthread_cond_wait().

Done.
Attachment #8464458 - Attachment is obsolete: true
Attachment #8465155 - Flags: review+
(In reply to Ting-Yu Chou [:ting] from comment #35)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #34)
> > This is subtle.  Excellent catch.
> Yes, it's neat, credit goes to Cervantes.

I misread the catch as patch, thanks :)
Out of curiosity do you know which lock this was happening to?  That would be useful information for deciding whether we need to backport this to 1.4/1.3t/etc.
https://hg.mozilla.org/integration/mozilla-inbound/rev/f73cd738c1fe
Whiteboard: [caf priority: p2][c=memory p= s= u=2.0][CR 698221] → [caf priority: p2][CR 698221]
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #37)
> Out of curiosity do you know which lock this was happening to?  That would
> be useful information for deciding whether we need to backport this to
> 1.4/1.3t/etc.

From comment 19:

> There are 2 HelperThread (Analysis Helper), both are waiting on the
> condition |producerWakeup| associated to GlobalHelperThreadState's mutex.
> The mutex should be unlocked, but its value is 2 and blocks the
> AutoLockHelperThreadState of CancelOffThreadIonCompile().

GlobalHelperThreadState's mutex is the PRLock |helperLock|.
The problem happens in worker threads, and we create worker threads based on the number of CPU cores, so I think this can happen on previous branches that run on dual-core CPUs. So I think the answer is: yes, we need to backport this patch.
https://hg.mozilla.org/mozilla-central/rev/f73cd738c1fe
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Backed out per IRC.

https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/rev/2631725af366
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla34 → ---
Depends on: 1046997
I suspect the problem here is that if pthread_cond_wait returns before the main thread attempts to lock the mutex in question the recreated thread will end up in RECREATE_GATE_VIP holding the mutex and the main thread will be blocked attempting to lock the mutex.  I should have caught that during review.
What if the main thread sets condMutex to null after successfully acquiring the lock? Then in the RECREATE_GATE_VIP branch on the recreated thread we can check whether condMutex is null. If it is *not*, we unlock the mutex in question to allow the main thread to acquire it, and we set condMutexNeedsBalancing to false (not in this order!) unconditionally, so that the main thread locks but does not unlock the mutex.
Sorry about causing trouble; comment 45 should be the case, but please allow me some time to think about comment 46.
Flags: needinfo?(tchou)
Don't worry about it.  I've done worse :P
Status: REOPENED → ASSIGNED
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #45)
> I suspect the problem here is that if pthread_cond_wait returns before the
> main thread attempts to lock the mutex in question the recreated thread will
> end up in RECREATE_GATE_VIP holding the mutex and the main thread will be
> blocked attempting to lock the mutex.  I should have caught that during
> review.

It sounds like we tried to serialize some actions between the recreated thread and the main thread, but introduced another bug that also needs serialization. I can't think of any condition under which pthread_cond_wait() could return before the main thread attempts to lock the mutex. After all, the main thread is the one recreating threads, and the recreated threads are all gated, waiting for the main thread's command to open the gate. It's probably pthread_cond_timedwait(), where the forked process returned from it immediately without really waiting.
Yes, it's probably pthread_cond_timedwait().  But waiting on a condvar is allowed to wake up spuriously.  See http://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_cond_wait.html.  This is why correct usage of a condvar is always:

Thread 1

while (!signalled) {
  condvar.Wait();
}

Thread 2

signalled = true;
condvar.Notify();

So I think whatever solution we use for pthread_cond_timedwait() we should use for all of the functions.
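Concretely, that pattern in runnable C (standard pthreads; illustration only, not Nuwa code):

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool signalled = false;

static void *waiter(void *arg) {
  pthread_mutex_lock(&lock);
  while (!signalled)               /* re-check: the wait may wake spuriously */
    pthread_cond_wait(&cond, &lock);
  pthread_mutex_unlock(&lock);
  printf("woken with the predicate set\n");
  return NULL;
}

int main(void) {
  pthread_t t;
  pthread_create(&t, NULL, waiter, NULL);

  pthread_mutex_lock(&lock);
  signalled = true;                /* set the predicate under the lock ... */
  pthread_cond_signal(&cond);      /* ... then notify */
  pthread_mutex_unlock(&lock);

  pthread_join(t, NULL);
  return 0;
}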
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #46)
> What if the main thread sets condMutex to null after successfully acquiring
> the lock.  Then in the RECREATE_GATE_VIP branch on the recreated thread we
> can check to see if condMutex is null.  If it is *not*, we unlock the mutex
> in question to allow the main thread to acquire it and we set
> condMutexNeedsBalancing to false (not in this order!) unconditionally so the
> main thread does locks but does not unlock the mutex.

I think this can't fix the problem. The key to this deadlock is that once pthread_cond_wait() wakes up too quickly, the main thread has no chance to unblock from its attempt to acquire condMutex; the deadlock still happens. Moreover, we already break the ground rule of pthread mutexes by locking/unlocking in different threads, and I don't want to introduce yet another violation to fix this.

How about this: the main thread uses pthread_mutex_trylock() on condMutex and checks for the case in comment #45. If that case happens, we break from the loop and continue to run.
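What the main-thread side of that might look like, as a hedged sketch (the detection of the comment-45 case is elided; the actual forward-fix patch may differ):

#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Main thread, while recreating a thread that froze in a condition wait.
   If the recreated thread woke early (timedwait timeout or spurious
   wakeup) it still owns condMutex, and a blocking lock here would
   deadlock; trylock lets us detect that and back off. */
static void acquire_cond_mutex(pthread_mutex_t *condMutex) {
  while (pthread_mutex_trylock(condMutex) == EBUSY) {
    /* Check for the comment-45 case here (the recreated thread is already
       past the gate, holding the mutex) and break out instead of
       spinning; that check is elided in this sketch. */
    sched_yield();
  }
}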
Attached patch Forward fix to the patch (obsolete) (deleted) — Splinter Review
Attached patch Alternative fix (obsolete) (deleted) — Splinter Review
This is an alternative to patch parts 1 and 2, provided by Thinker: we just treat freeze point 2 as if it were freeze point 1 for pthread_cond_wait() and similar functions. Because we don't have __real_pthread_cond_wait() unlock the mutex, we won't mess up the state of the mutex.
Comment on attachment 8466139 [details] [diff] [review]
Alternative fix

This doesn't work; it deadlocks in the recreated process.
Attachment #8466139 - Attachment is obsolete: true
Attachment #8466132 - Flags: review?(khuey)
Thinker, your approach of moving RECREATE_GATE_VIP() up before the real pthread_cond_wait() doesn't work, because the main thread will deadlock acquiring the mutex associated with the condvar. This is why we originally needed to distinguish freeze points 1 and 2 and introduce tinfo->reacquireMutex: the mutex will be in different states.

If we can't follow the route of moving RECREATE_GATE_VIP() up and pretending it were freeze point 1, we need to take the forward fix, although it makes things more complicated.
Flags: needinfo?(tlee)
I am wondering whether it is possible to let the main thread lock condMutex before the recreated thread is running. That way we wouldn't need trylock() in the recreated thread, and the main thread would not need to balance the mutex afterward.
Comment on attachment 8466139 [details] [diff] [review]
Alternative fix

Review of attachment 8466139 [details] [diff] [review]:
-----------------------------------------------------------------

You need to roll back this side effect.

::: mozglue/build/Nuwa.cpp
@@ +1030,1 @@
>    }

if (freezeCountChg) {
    freezePoint1 = true;
}

@@ +1030,2 @@
>    }
>    rv = REAL(pthread_cond_wait)(cond, mtx);

if (freezeCountChg) {
    freezePoint1 = false;
}
(In reply to Ting-Yu Chou [:ting] from comment #57)
> I am thinking is it possible to let main thread lock the condMutex before
> recreated thread is running, this way we don't need trylock() in recreated
> thread, also main thread does not need to balance the mutex afterward.

You mean explicit checkpoints. I don't encourage using them unless there is no better solution.
Flags: needinfo?(tlee)
(In reply to Thinker Li [:sinker] from comment #58)
> Comment on attachment 8466139 [details] [diff] [review]
> Alternative fix
> 
> Review of attachment 8466139 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> You need to rollback this side-effect.
Sorry!  I missed something.
Attached patch fix-pthread_cond_wait.diff (obsolete) (deleted) — Splinter Review
With this patch, we could even remove reacquireMutex.
Attached patch fix-pthread_cond_wait.diff (obsolete) (deleted) — Splinter Review
Attachment #8466255 - Attachment is obsolete: true
Attached patch fix-pthread_cond_wait.diff (obsolete) (deleted) — Splinter Review
Attachment #8466261 - Attachment is obsolete: true
Attached patch Proposed fix (deleted) — Splinter Review
This is my proposal.  I think this is nicer than the trylock/yield loop in Cervantes's patch.  And I have no clue how Thinker's patch is supposed to work.
Attachment #8466296 - Flags: feedback?(tlee)
Attachment #8466296 - Flags: feedback?(cyu)
Comment on attachment 8466263 [details] [diff] [review]
fix-pthread_cond_wait.diff

Review of attachment 8466263 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/build/Nuwa.cpp
@@ +1025,5 @@
> +
> +      rv = REAL(pthread_cond_wait)(cond, &_mtx);
> +      // Acquire the mutex that should be acquired by the above line but
> +      // not.
> +      REAL(pthread_mutex_lock)(mtx);

The idea is that the call to pthread_cond_wait(), before the thread was frozen, had already released |mtx|, so we don't need to release it again. But pthread_cond_wait() requires a locked mutex to be passed in, so this patch gives pthread_cond_wait() another locked mutex, |_mtx|: pthread_cond_wait() releases |_mtx| on entering and acquires |_mtx| back before returning.

For the wrapper's callers, |mtx| must be acquired again before returning from pthread_cond_wait(); that is why we lock |mtx| here.

With this model, we don't need to play the reacquire trick any more.
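A hedged sketch of that model, simplified from the attachment (the real code goes through the REAL() wrappers in mozglue/build/Nuwa.cpp; note that POSIX regards pairing a second mutex with the same condvar as undefined behavior, as is discussed later in this bug):

#include <pthread.h>

/* Recreated-thread path of the wrapper.  The frozen call had already
   released |mtx| inside the real pthread_cond_wait(), so we must not
   unlock it again; instead the real wait is handed a private locked
   mutex, |_mtx|. */
static int cond_wait_after_recreate(pthread_cond_t *cond,
                                    pthread_mutex_t *mtx) {
  pthread_mutex_t _mtx = PTHREAD_MUTEX_INITIALIZER;

  pthread_mutex_lock(&_mtx);
  int rv = pthread_cond_wait(cond, &_mtx);  /* releases and reacquires _mtx */
  pthread_mutex_unlock(&_mtx);

  /* Callers expect to own |mtx| when pthread_cond_wait() returns. */
  pthread_mutex_lock(mtx);
  return rv;
}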
Comment on attachment 8466296 [details] [diff] [review]
Proposed fix

Review of attachment 8466296 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/build/Nuwa.cpp
@@ +1456,1 @@
>          if (tinfo->condMutexNeedsBalancing) {

Since modifying condMutex and testing condMutexNeedsBalancing are not atomic, there is a race.
Attachment #8466296 - Flags: feedback?(tlee)
(In reply to Thinker Li [:sinker] from comment #66)
> Comment on attachment 8466296 [details] [diff] [review]
> Proposed fix
> 
> Review of attachment 8466296 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: mozglue/build/Nuwa.cpp
> @@ +1456,1 @@
> >          if (tinfo->condMutexNeedsBalancing) {
> 
> Since modifying condMutex and testing condMutexNeedsBalancing are not
> atomic, there is racing.

After passing RECREATE_CONTINUE/RECREATE_WAIT, condMutexNeedsBalancing is never read or written without holding condMutex. Similarly, after RECREATE_CONTINUE/RECREATE_WAIT we do *read* condMutex without any synchronization on the main thread, but the main thread is also the only thread that writes to condMutex, which it only does with condMutex held. And the other thread that reads condMutex, the recreated thread, must hold the lock to do so. So I don't see how this races.
(In reply to Thinker Li [:sinker] from comment #65)
> Comment on attachment 8466263 [details] [diff] [review]
> fix-pthread_cond_wait.diff
> 
> Review of attachment 8466263 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: mozglue/build/Nuwa.cpp
> @@ +1025,5 @@
> > +
> > +      rv = REAL(pthread_cond_wait)(cond, &_mtx);
> > +      // Acquire the mutex that should be acquired by the above line but
> > +      // not.
> > +      REAL(pthread_mutex_lock)(mtx);
> 
> The idea is the calling, before being frozen, on pthread_cond_wait() had
> already released |mtx|.  So, we don't need to release it again.  But,
> pthread_cond_wait() request for passing a locked mutex.  So, this patch
> gives pthread_cond_wait() another locked mutex, |_mtx|.  So,
> pthread_cond_wait() would release |_mtx| for entering, and acquire |_mtx|
> back for returning.
> 
> For the callers of the wrapper, |mtx| should be acquired back for returning
> from pthread_cond_wait().  That is why to lock |mtx| here.
> 
> With this model, we don't need to play the reacquire trick any more.

Ok, I understand what you are saying.  There's no need for _mtx; you can pass nullptr to pthread_cond_wait.  That would make it much easier to see what's going on.

I haven't completely convinced myself it is correct, but it looks good.  It's 3 AM here though so I'll think about it some more tomorrow.
Comment on attachment 8466132 [details] [diff] [review]
Forward fix to the patch

I think we want either Thinker's patch or my patch.
Attachment #8466132 - Flags: review?(khuey) → review-
Comment on attachment 8466296 [details] [diff] [review]
Proposed fix

Review of attachment 8466296 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/build/Nuwa.cpp
@@ +1133,5 @@
>    }
>    rv = REAL(__pthread_cond_timedwait)(cond, mtx, abstime, clock);
>    if (recreated && mtx) {
> +    if (tinfo->condMutex) {
> +      tinfo->condMutexNeedsBalancing = false;

Here 1.1) reads |condMutex| and 1.2) writes |condMutexNeedsBalancing|.  It is possible to preempt the thread between reading and writing.

In RecreateThreads(), it 2.1) writes |condMutex|, then 2.2) reads |condMutexNeedsBalancing|.

The following case would cause problems.

  ... -> 1.1 -> 2.1 -> 2.2 -> 1.2 -> ...

pthread_mutex_unlock() would be called twice on the mutex given by |mtx|.

Do I miss any point?
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #67)
> And the other thread that reads condMutex, the recreated thread, must hold
> the lock to do so.  So I don't see how this races.
Ok! I get the point, you are right. pthread_cond_timedwait() needs to contend for the mutex with the main thread.
(In reply to Thinker Li [:sinker] from comment #70)
> Comment on attachment 8466296 [details] [diff] [review]
> Proposed fix
> 
> Review of attachment 8466296 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: mozglue/build/Nuwa.cpp
> @@ +1133,5 @@
> >    }
> >    rv = REAL(__pthread_cond_timedwait)(cond, mtx, abstime, clock);
> >    if (recreated && mtx) {
> > +    if (tinfo->condMutex) {
> > +      tinfo->condMutexNeedsBalancing = false;
> 
> Here 1.1) reads |condMutex| and 1.2) writes |condMutexNeedsBalancing|.  It
> is possible to preempt the thread between reading and writing.
> 
> In RecreateThreads(), it 2.1) writes |condMutex|, then 2.2) reads
> |condMutexNeedsBalancing|.
> 
> The following case would cause problems.
> 
>   ... -> 1.1 -> 2.1 -> 2.2 -> 1.2 -> ...
> 
> pthread_mutex_unlock() would be called twice on the mutex given by |mtx|.
> 
> Do I miss any point?

You're missing that both 1.1 and 1.2 happen after pthread_cond_wait returns, so the lock is held, and both 2.1 and 2.2 happen after calling pthread_mutex_lock(tinfo->condMutex), so the lock is held.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #68)
> Ok, I understand what you are saying.  There's no need for _mtx; you can
> pass nullptr to pthread_cond_wait.
I am not sure we can pass a nullptr here. At the beginning I thought so too, but after checking the man page I am not sure we can do that, since it says nothing about passing a nullptr.
(In reply to Thinker Li [:sinker] from comment #73)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #68)
> > Ok, I understand what you are saying.  There's no need for _mtx; you can
> > pass nullptr to pthread_cond_wait.
> I am not sure we could pass a nullptr here.  At beginning, I also think so,
> but after checking the man page I am not sure we could do that, since I
> don't see any about passing a nullptr in the man page.

The man page also states that using more than one mutex with the condvar results in undefined behavior (the trick to reacquire condMutex is also undefined behavior though).
Comment on attachment 8466296 [details] [diff] [review]
Proposed fix

Review of attachment 8466296 [details] [diff] [review]:
-----------------------------------------------------------------

I think this patch should work; it always passes information between threads with condMutex locked.
Attachment #8466296 - Flags: feedback?(cyu) → feedback+
So after thinking about it some more, I am pretty convinced that your patch works. The only potential problem I see is that we will no longer correctly handle the case where the thread had passed freeze point 1 but had not yet released the mutex and blocked in pthread_cond_wait when fork() was called (because we will try to acquire the mutex twice). But we have similar issues elsewhere (e.g. if pthread_mutex_lock has returned and acquired the lock but we have not reached freeze point 2), so I'm not concerned about it. I do wonder if this is the only case where we lock/unlock mutexes from the "wrong" thread. If so, I guess this doesn't change the amount of undefined behavior we rely on.

As far as I am concerned you own this code, so you can decide if you want to use your approach or Ting's patch plus mine.  If you want to use yours, you need to do at least the following:

1. If pthread_cond_wait requires a mutex, lock and unlock _mtx around the added pthread_cond_wait call.  We should also file a bug on removing our null-checking for mtx.
2. If pthread_cond_wait does not require a mutex, remove _mtx.
3. Remove the existing infrastructure to support reacquireMutex and the comments about it.
4. Implement this in all of the pthread_cond_wait functions.

I would r+ a patch with that (and good comments explaining what we're doing).
Flags: needinfo?(tlee)
Whatever we end up doing here we're going to want to backport all the way to 1.3.
Reassigning to Thinker, as he is creating another patch, different from attachment 8466263 [details] [diff] [review].
Assignee: tchou → tlee
Attachment #8466967 - Attachment is obsolete: true
I am thinking about a simple alternative. All the hassle with trylock and reacquiring the mutex comes from the fact that pthread_cond_wait() releases the lock before entering the sleeping state, and that the mutex associated with the condvar could later be acquired by another thread.

What if we just maintain the invariant that pthread_cond_wait() is called with the mutex locked if the thread was frozen in pthread_cond_wait()? As long as thread recreation is never blocked, all threads that depend on the mutex will be resynced using the mutex. The effect will be just like a small hiccup in thread scheduling.
Flags: needinfo?(tlee)
(In reply to Cervantes Yu from comment #82)
> Created attachment 8467012 [details]
> The patch to resync before pthread_cond_wait()

OK, the problem with it is that we have no control over which thread acquires the mutex first. If another thread acquires the mutex and pthread_cond_signal()s the thread that should pthread_cond_wait() but hasn't yet, we'll lose the signal. For this to work, we would need to further wrap pthread_cond_signal() and pthread_cond_broadcast(), which seems to complicate the design even further.
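For reference, condvar signals are not queued; this runnable C demo shows a signal delivered before anyone waits being lost (plain pthread semantics, not Nuwa code):

#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

int main(void) {
  pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
  pthread_cond_t c = PTHREAD_COND_INITIALIZER;
  struct timespec deadline;

  pthread_cond_signal(&c);  /* nobody is waiting yet: the signal is lost */

  clock_gettime(CLOCK_REALTIME, &deadline);
  deadline.tv_sec += 1;

  pthread_mutex_lock(&m);
  int rv = pthread_cond_timedwait(&c, &m, &deadline);
  pthread_mutex_unlock(&m);

  printf("timedwait: %s\n", rv == ETIMEDOUT
             ? "timed out; the earlier signal did not count"
             : "woken");
  return 0;
}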
Attachment #8467012 - Attachment is obsolete: true
Attachment #8467008 - Attachment is obsolete: true
Comment on attachment 8467083 [details] [diff] [review]
Make mutex acquiring being managed and in ordered, v3

This proposal seems less hacky and simpler in logic, since it avoids the problem caused by reacquiring in a different thread. I added an explanation of the proposal just before the class |mutex_lock_request| in the patch.
Attachment #8467083 - Flags: feedback?(khuey)
Attachment #8467083 - Flags: feedback?(cyu)
Comment on attachment 8467083 [details] [diff] [review]
Make mutex acquiring being managed and in ordered, v3

Review of attachment 8467083 [details] [diff] [review]:
-----------------------------------------------------------------

This is too risky to take for 2.0. The patch has obvious syntax errors, which tells me it's totally untested. If you want to make these changes, let's do it in a bug that is not a blocker.
Attachment #8467083 - Flags: feedback?(khuey) → feedback-
Comment on attachment 8466296 [details] [diff] [review]
Proposed fix

Let's do the simple thing here.
Attachment #8466296 - Flags: review?(tlee)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #87)
> Comment on attachment 8466296 [details] [diff] [review]
> Proposed fix
> 
> Let's do the simple thing here.

Could you please rebase your patch for v2.0?
Attached patch Proposed patch for 2.0 (deleted) — Splinter Review
This is a combination of Ting-Yu's patch and my patch, rebased for 2.0.
Flags: needinfo?(khuey)
Comment on attachment 8467083 [details] [diff] [review]
Make mutex acquiring being managed and in ordered, v3

The approach is too risky for 2.0 branch.

Since the patch provides a new way to gate the threads that are recreated, it would be better to revisit the original gating macros, including the VIP and non-VIP versions. Maybe we can remove the gating macros and just use the newer functions. But we need to do that in a separate bug.
Attachment #8467083 - Flags: feedback?(cyu) → feedback-
Comment on attachment 8466296 [details] [diff] [review]
Proposed fix

Review of attachment 8466296 [details] [diff] [review]:
-----------------------------------------------------------------

There is a good clarification of this patch here.
Attachment #8466296 - Flags: review?(tlee) → review+
We should let this bake on trunk for a couple days before uplifting.
https://hg.mozilla.org/mozilla-central/rev/a1c2de94aef4
Status: ASSIGNED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Whiteboard: [caf priority: p2][CR 698221] → [caf priority: p2][CR 698221][NO_UPLIFT]
Nobody has yelled at me for blowing up the phone, so let's uplift.
Whiteboard: [caf priority: p2][CR 698221][NO_UPLIFT] → [caf priority: p2][CR 698221]
Comment on attachment 8467382 [details] [diff] [review]
Proposed patch for 2.0

NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Nuwa
User impact if declined: We might fail to start apps if we lose a race condition.
Testing completed: Baked on m-c for a day or two, I've been using it locally on my phone for 5 days with no issues.  About to go to 2.0 too.
Risk to taking this patch (and alternatives if risky): Medium risk, only because this is scary code.  I'm pretty confident in this patch.  The alternative is to ignore this and potentially deadlock when launching apps.
String or UUID changes made by this patch: None

We should take this on 1.4 and 1.3 for anyone who wants to do builds off of those branches to pick up this fix for a regression introduced in 1.3.
Attachment #8467382 - Flags: approval-mozilla-b2g30?
Attachment #8467382 - Flags: approval-mozilla-b2g28?
SPRD probably wants this for 1.4 and 1.3t.
Attachment #8467382 - Flags: approval-mozilla-b2g30?
Attachment #8467382 - Flags: approval-mozilla-b2g30+
Attachment #8467382 - Flags: approval-mozilla-b2g28?
Attachment #8467382 - Flags: approval-mozilla-b2g28+