Closed Bug 1690570 Opened 4 years ago Closed 4 years ago

Testing on Android: Delegate test runner to run on Android

Status:

RESOLVED FIXED

Milestone:

88 Branch

Tracking Flags:

firefox88: tracking ---, status fixed

People

(Reporter: nbp, Assigned: nbp)


Attachments

(2 files)

Bug 1690570 - Fix python style in gdb/run-tests.py . 4 years ago Nicolas B. Pierron [:nbp] (deleted), text/x-phabricator-request		Details
Bug 1690570 - Batch JS Shell tests on Android to run a single ADB command. 4 years ago Nicolas B. Pierron [:nbp] (deleted), text/x-phabricator-request		Details

Nicolas B. Pierron [:nbp]

Assignee

Description

•

4 years ago

Recently, we landed an optimization that runs the test suite faster by sharing the XDR-encoded version of the self-hosted code. This optimization had little to no impact on Android test suite execution time.

One hypothesis is that the ADB round-trip for spawning each test is too costly, which made the improvement from the self-hosted code negligible compared to the ADB overhead.

While this bug has the potential to speed up the test suite, it is also an interesting data point on whether or not we should pursue on-disk caching of the self-hosted code as XDR.

Nicolas B. Pierron [:nbp]

Assignee

Updated

•

4 years ago

Blocks: 1458339

Nicolas B. Pierron [:nbp]

Assignee

Updated

•

4 years ago

Blocks: 1692096

Nicolas B. Pierron [:nbp]

Assignee

Comment 1

•

4 years ago

A first shot at this suggests that we might be able to save ~24 minutes per Android JIT job!
That would be roughly 24h (= 24min * 10 * 6) in total per full test run.
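
The estimate in this comment can be checked with a few lines of Python. The comment does not say what the factors 10 and 6 stand for, so the sketch keeps them as unnamed multipliers rather than guessing.

```python
# Figures taken from the comment; the meaning of the factors 10 and 6
# is not stated there, so they are kept as plain multipliers.
minutes_saved_per_job = 24
factor_a = 10
factor_b = 6

total_minutes = minutes_saved_per_job * factor_a * factor_b
total_hours = total_minutes / 60

print(f"{total_minutes} minutes = {total_hours:g} hours")
```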

Nicolas B. Pierron [:nbp]

Assignee

Updated

•

4 years ago

Depends on: 1692253

Nicolas B. Pierron [:nbp]

Assignee

Comment 2

•

4 years ago

Attached file Bug 1690570 - Fix python style in gdb/run-tests.py . (deleted) — Details

Nicolas B. Pierron [:nbp]

Assignee

Comment 3

•

4 years ago

Attached file Bug 1690570 - Batch JS Shell tests on Android to run a single ADB command. (deleted) — Details

Nicolas B. Pierron [:nbp]

Assignee

Updated

•

4 years ago

Blocks: 1531175

Nicolas B. Pierron [:nbp]

Assignee

Comment 4

•

4 years ago

OK, after applying these changes, I see that random tests which are supposed to return immediately, even skipped tests, are timing out.
I did not see any of these before, and I wonder if this could be some form of shutdown hang in the JS Shell.

I will instrument the shell to dump the stack on SIGTERM.

Lars T Hansen [:lth]

Comment 5

•

4 years ago

FWIW, I've experienced adb just terminating apparently at random when running tests locally.

Nicolas B. Pierron [:nbp]

Assignee

Comment 6

•

4 years ago

(In reply to Lars T Hansen [:lth] from comment #5)

FWIW, I've experienced adb just terminating apparently at random when running tests locally.

With the patch all the tests are run by a single shell script which is executed with a single adb command.
The JS shell is wrapped by a timeout command provided by busybox.

Honestly, I am tempted to land the changes even if these timeout issues exist, as this would be a huge time saving even if we have to re-run a few tests that fail because of these timeouts.
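
The batching approach described in this comment could look roughly like the sketch below. It is not the landed patch: the function name `build_batch_script`, the `TEST-END` marker, and the example paths are hypothetical, and the `timeout SECONDS PROGRAM` argument order is an assumption, since some busybox builds expect a different syntax.

```python
import shlex


def build_batch_script(tests, js_path, timeout_s):
    """Return one shell script that runs every test under `timeout`.

    Hypothetical helper: the real harness may lay out its script differently.
    """
    lines = ["#!/bin/sh"]
    for index, test in enumerate(tests):
        # Assumed syntax: `timeout SECONDS PROGRAM ARGS...`.
        parts = ["timeout", str(timeout_s), js_path, test]
        lines.append(" ".join(shlex.quote(part) for part in parts))
        # Hypothetical marker so the host can split the combined output
        # and recover each test's exit status.
        lines.append(f'echo "TEST-END {index} rc=$?"')
    return "\n".join(lines) + "\n"


script = build_batch_script(
    ["tests/a.js", "tests/b c.js"],  # hypothetical test paths
    "/data/local/tmp/test_root/bin/js",
    10,
)
print(script)
```

The generated script would then be pushed to the device and started with a single adb invocation, so that the ADB round-trip is paid once per batch rather than once per test.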

Nicolas B. Pierron [:nbp]

Assignee

Comment 7

•

4 years ago

(In reply to Nicolas B. Pierron [:nbp] from comment #4)

I see that random tests which are supposed to return immediately, even skipped tests are doing timeout.

I managed to instrument a JS Shell in such a way that it prints the stack¹. One of the failures reported the following stack, which unfortunately lacks any JS shell symbols:

#01: ???[/data/local/tmp/test_root/bin/js +0x5e904]
#02: ???[/system/lib/libc.so +0x18aa8]
#03: syscall[/system/lib/libc.so +0x18e08]
#04: pthread_join[/system/lib/libc.so +0x47cf0]

This seems to indicate that one of the helper threads did not join back after the completion of the JS Shell.

¹ I learnt that mozglue has MozStack* functions, which use various backend implementations and are able to recover this information.

Nicolas B. Pierron [:nbp]

Assignee

Comment 8

•

4 years ago

Extra printf debugging suggests that this could be an issue within the code which joins the helper threads.
However, I do not understand:

Why aren't the processes killed after receiving a SIGTERM, which is supposed to be forwarded?
Why does the hang seem to get resolved as soon as the signal is received?
What would be the issue in the HelperThread waiting system?

This try link shows different tests failing, where the stack is printed when a SIGTERM is received and where fprintf calls are added next to all pthread_join calls.

The output seems to suggest that we start joining the HelperThreads, but suddenly fail to join more threads, and wait until the signal handler is called to resume joining.

Nicolas B. Pierron [:nbp]

Assignee

Comment 9

•

4 years ago

The following try link suggests that the failure is not restricted to HelperThread but can also happen with WorkerThread.

After some discussion with the team, it was decided that the best alternative, given the low likelihood and the randomness of the failure, would be to re-run the tests which fail due to a timeout, and only report the last execution.

This would reduce other timeouts, but would most likely mask pthread_join hanging issues.
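
A minimal sketch of the retry policy described in this comment, assuming a hypothetical `run_test` callable that returns a dict with a boolean `timed_out` key. The real harness's result type and retry count may differ.

```python
def run_with_timeout_retry(run_test, test, max_attempts=2):
    """Re-run `test` while it times out and return only the last result.

    `run_test` and the "timed_out" key are assumptions made for this
    sketch, not the harness's actual API.
    """
    result = None
    for _ in range(max_attempts):
        result = run_test(test)
        if not result["timed_out"]:
            break  # a real pass or failure is reported as-is
    return result


# Fake runner for illustration: the first attempt times out, the second passes.
attempts = iter([
    {"timed_out": True, "passed": False},
    {"timed_out": False, "passed": True},
])
final = run_with_timeout_retry(lambda test: next(attempts), "example.js")
print(final)
```

Because only the last attempt is reported, a test that hangs once in pthread_join and then passes shows up as a pass. That is the trade-off the comment points out.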

Pulsebot

Comment 10

•

4 years ago

Pushed by npierron@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6ce356a95473
Fix python style in gdb/run-tests.py . r=tcampbell
https://hg.mozilla.org/integration/autoland/rev/e400022129be
Batch JS Shell tests on Android to run a single ADB command. r=tcampbell

Cosmin Sabou [:CosminS]

Comment 11

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/6ce356a95473
https://hg.mozilla.org/mozilla-central/rev/e400022129be

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

status-firefox88: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 88 Branch


Categories

(Core :: JavaScript Engine, enhancement, P1)
