Closed
Bug 516497
Opened 15 years ago
Closed 15 years ago
Talos swaps reporting of privatebytes and RSS on Mac
Categories
(Release Engineering :: General, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jrmuizel, Assigned: coop)
Details
Attachments
(2 files, 4 obsolete files)
(deleted), patch | anodelman: review+ | catlee: checked-in+
(deleted), text/plain
Talos currently reports VSZ as RSS and RSS as PrivateBytes because it uses
the wrong indexes into the psData array.
We should fix the reporting and perhaps change the code so that indexing happens during parsing and not in a separate function. This would have avoided this error in the first place.
Something like the following perhaps:
(pid, vss, rss) = line.split() ?
Also, private bytes is not a very good name for virtual size.
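A minimal sketch of the parse-time unpacking suggested above, assuming each line of ps output carries exactly three whitespace-separated columns in the order pid, virtual size, resident size. The names `ProcMem` and `parse_ps_line` are hypothetical and are not taken from cmanager_mac.py.

```python
from typing import NamedTuple


class ProcMem(NamedTuple):
    """One parsed line of ps output (hypothetical container)."""
    pid: int
    vsz: int  # virtual size
    rss: int  # resident set size


def parse_ps_line(line: str) -> ProcMem:
    """Name the fields while parsing, so no later code indexes by position."""
    pid, vsz, rss = line.split()  # raises ValueError if the column count is wrong
    return ProcMem(int(pid), int(vsz), int(rss))


sample = "  1234  2456789  98765"
mem = parse_ps_line(sample)
print(mem.vsz, mem.rss)
```

Because callers read `mem.rss` rather than something like `psData[2]`, an off-by-one index cannot silently swap the two metrics.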
Assignee | ||
Comment 1•15 years ago
|
||
Not sure how keen Alice is about disturbing this code more deeply than this. Most of it seems to be Annie-era. The patch gets the indices right at least.
Comment 2•15 years ago
|
||
This'll be a bit of a flag-day event if it lands, right? Data from before the fix will be backwards? Is it feasible to contemplate a graph server DB operation to "fix history" as well?
Assignee | ||
Comment 3•15 years ago
|
||
(In reply to comment #2)
> This'll be a bit of a flag-day event if it lands, right? Data from before the
> fix will be backwards? Is it feasible to contemplate a graph server DB
> operation to "fix history" as well?
Should be simple to do a db query to transpose the two numbers in all historical Mac data. Real downtime would make it easier.
Comment 4•15 years ago
|
||
I don't think that this should be landed without a) a downtime and b) a graph server update to repair old data.
Updated•15 years ago
|
Attachment #401037 -
Flags: review?(anodelman) → review+
Assignee | ||
Comment 5•15 years ago
|
||
I didn't actually see how long this took to run to completion... I killed my screen session by accident this morning. It was still running after 4 hours on the staging db last night, though. :(
I'm not really familiar with mysqldb in python, but there were example scripts in the repo already and this seemed like a straightforward application. I'm more familiar with prepared statements in perl, so if there's a better way to cache the statement in python for re-use, please let me know.
The meat of the script is simple:
* find all the private bytes (VSS) results on mac machines;
* iterate over them one at a time, looking for a matching RSS result set (matched based on date and machine)
* in the executemany, change the VSS result ids to 1, change the RSS result ids to the original VSS result id, change results with id 1 to the original RSS result id
I use 1 as the placeholder ID for the swap here because I assume that older data with that ID will have been cleared out long ago. I can change it pretty trivially if that's not the case in production.
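The three-step swap through a placeholder ID can be sketched as below. This is an illustration only: it uses an in-memory sqlite3 database rather than MySQLdb, a made-up `results` table, and made-up IDs (10 for VSS, 20 for RSS). It also swaps a single pair, whereas the attached script matches pairs by date and machine.

```python
import sqlite3

PLACEHOLDER_ID = 1         # assumed to be unused by any live data
VSS_ID, RSS_ID = 10, 20    # hypothetical result ids, for illustration only

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (row_id INTEGER PRIMARY KEY, result_id INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO results (result_id, value) VALUES (?, ?)",
    [(VSS_ID, 100.0), (RSS_ID, 900.0)],
)

# Order matters: VSS -> placeholder, then RSS -> VSS, then placeholder -> RSS.
steps = [
    (PLACEHOLDER_ID, VSS_ID),
    (VSS_ID, RSS_ID),
    (RSS_ID, PLACEHOLDER_ID),
]
with conn:  # one transaction, so a failure part-way leaves nothing half-swapped
    conn.executemany("UPDATE results SET result_id = ? WHERE result_id = ?", steps)

swapped = dict(conn.execute("SELECT result_id, value FROM results"))
print(swapped)
```

If any existing row already carried the placeholder ID, the third step would relabel it as RSS, which is why the assumption about ID 1 matters.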
One possible bright spot: justdave gave me 115,385,147 as the number of rows in the values table in production. The staging db has 264,134,321 in the same table.
Attachment #402080 -
Flags: review?(catlee)
Assignee | ||
Updated•15 years ago
|
Attachment #402080 -
Attachment mime type: application/octet-stream → text/plain
Assignee | ||
Comment 6•15 years ago
|
||
Comment on attachment 402080 [details]
Python script to swap the existing private bytes (VSS) and RSS values in the graphserver db
I'm not writing to the right schema, apparently. :(
Attachment #402080 -
Attachment is obsolete: true
Attachment #402080 -
Flags: review?(catlee)
Assignee | ||
Comment 7•15 years ago
|
||
OK, think I'm targeting the correct schema now.
I did a test run in staging with the script limited to 10000 rows. Here's the tail of the output from that run:
# VSS rows: 10000
# dupes: 41
# unmatched: 1
# swaps: 9958
total # rows: 3550000
real 2m29.331s
user 0m2.885s
sys 0m6.892s
The "total # rows" is for all 3 updates (VSS->1, RSS->VSS, 1->RSS). Some rough math based on dividing that row count by 3 gives me just over 4 hours to process all 115 million rows in the production db using the script in its current incarnation.
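One reading of that rough math, spelled out. The comment does not show its working, so the formula here is an assumption: treat a third of the reported row count as the rows handled per pass, and scale the test run's wall-clock time up to the production row count.

```python
total_rows_updated = 3_550_000   # "total # rows" from the 10000-row test run
elapsed_s = 2 * 60 + 29.331      # "real 2m29.331s"
production_rows = 115_385_147    # production row count quoted in comment 5

rows_per_pass = total_rows_updated / 3           # three updates per swap
scale = production_rows / rows_per_pass          # how many test runs' worth of rows
estimate_h = scale * elapsed_s / 3600

print(f"{estimate_h:.2f} hours")
```

Under these assumptions the estimate lands just over 4 hours, matching the figure in the comment.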
This script also doesn't do anything about re-mapping tests where there is no corresponding RSS test (unmatched) or where there are multiple corresponding RSS tests (dupes). Not sure how aggressively we want to target those.
Attachment #402117 -
Flags: review?(catlee)
Assignee | ||
Comment 8•15 years ago
|
||
OK, now that I'm writing queries against the correct schema, things are much easier. We leave the data in place and just swap the test_id in the test_runs table instead, which makes for a much speedier process.
The new script swapped all relevant rows (47376) in the staging test_runs table in 9 seconds, so I don't foresee any problems for the downtime on Thursday.
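A sketch of relabelling runs instead of moving data, assuming a `test_runs` table with a `test_id` column as named above. Everything else is hypothetical: the test IDs, the in-memory sqlite3 database, and the single-statement CASE form, which may differ from what the attached script does.

```python
import sqlite3

VSS_TEST_ID, RSS_TEST_ID = 10, 20  # hypothetical test ids

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_runs (id INTEGER PRIMARY KEY, test_id INTEGER)")
conn.executemany(
    "INSERT INTO test_runs (id, test_id) VALUES (?, ?)",
    [(1, VSS_TEST_ID), (2, RSS_TEST_ID), (3, 99)],  # run 3 is an unrelated test
)

with conn:
    # Each row is relabelled once, so no placeholder id is needed.
    conn.execute(
        """
        UPDATE test_runs
        SET test_id = CASE test_id WHEN :vss THEN :rss ELSE :vss END
        WHERE test_id IN (:vss, :rss)
        """,
        {"vss": VSS_TEST_ID, "rss": RSS_TEST_ID},
    )

after = dict(conn.execute("SELECT id, test_id FROM test_runs"))
print(after)
```

Only one small column per run changes, which fits the large drop in runtime compared with rewriting rows in the values table.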
Attachment #402117 -
Attachment is obsolete: true
Attachment #402262 -
Flags: review?(catlee)
Attachment #402117 -
Flags: review?(catlee)
Comment 9•15 years ago
|
||
Comment on attachment 402262 [details]
Python script to swap the existing private bytes (VSS) and RSS test runs in the graphserver db
Looks good.
We should back up (or have IT back up) the database right before running this.
Attachment #402262 -
Flags: review?(catlee) → review+
Assignee | ||
Comment 10•15 years ago
|
||
Comment on attachment 402262 [details]
Python script to swap the existing private bytes (VSS) and RSS test runs in the graphserver db
Dave: we'll need someone from IT to run this script against the production graph server database during our planned releng downtime tomorrow (thurs sept 24).
We'll also want IT to perform a back-up of the db prior to running the script, just in case.
Can you give the script a quick once-over, and also let me know who is likely to be handling the IT side of things tomorrow AM during the downtime? Thanks.
Attachment #402262 -
Flags: review?(justdave)
Comment 11•15 years ago
|
||
(In reply to comment #10)
> (From update of attachment 402262 [details])
> Dave: we'll need someone from IT to run this script against the production
> graph server database during our planned releng downtime tomorrow (thurs sept
> 24).
> We'll also want IT to perform a back-up of the db prior to running the script,
> just in case.
>
> Can you give the script a quick once-over, and also let me know who is likely
> to be handling the IT side of things tomorrow AM during the downtime? Thanks.
We're looking at 8am-11am EDT currently for our downtime.
Updated•15 years ago
|
Attachment #401037 -
Flags: checked-in+
Comment 12•15 years ago
|
||
Comment on attachment 401037 [details] [diff] [review]
Use correct indices for virtual size and resident size.
Checking in cmanager_mac.py;
/cvsroot/mozilla/testing/performance/talos/cmanager_mac.py,v <-- cmanager_mac.py
new revision: 1.6; previous revision: 1.5
done
Comment 13•15 years ago
|
||
Ran a backup dump of the DB followed by the script after catlee gave the go-ahead.
Comment 14•15 years ago
|
||
Attachment #402262 -
Attachment is obsolete: true
Attachment #402262 -
Flags: review?(justdave)
Updated•15 years ago
|
Attachment #402582 -
Attachment mime type: text/x-python → text/plain
Assignee | ||
Comment 15•15 years ago
|
||
Production graphs are correctly showing the swapped values now.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Comment 16•15 years ago
|
||
Managed to switch rss and pbytes for all platforms instead of just mac. Db corruption needs to be fixed.
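A dry-run check of the kind that could surface this before an update runs: count the rows a WHERE clause would touch, grouped by platform, with and without a sub-select that limits the scope to Mac machines. The `machines` table, its `os` column, and the test IDs are assumptions made for illustration, not the real graph server schema.

```python
import sqlite3

VSS_TEST_ID, RSS_TEST_ID = 10, 20  # hypothetical test ids

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE machines (id INTEGER PRIMARY KEY, os TEXT);
    CREATE TABLE test_runs (id INTEGER PRIMARY KEY, machine_id INTEGER, test_id INTEGER);
    INSERT INTO machines VALUES (1, 'mac'), (2, 'linux'), (3, 'win32');
    INSERT INTO test_runs VALUES (1, 1, 10), (2, 1, 20), (3, 2, 10), (4, 3, 20);
    """
)

COUNT_BY_OS = """
    SELECT m.os, COUNT(*)
    FROM test_runs AS t JOIN machines AS m ON m.id = t.machine_id
    WHERE t.test_id IN (?, ?) {scope}
    GROUP BY m.os
"""
MAC_ONLY = "AND t.machine_id IN (SELECT id FROM machines WHERE os = 'mac')"
ids = (VSS_TEST_ID, RSS_TEST_ID)

unscoped = dict(conn.execute(COUNT_BY_OS.format(scope=""), ids))
scoped = dict(conn.execute(COUNT_BY_OS.format(scope=MAC_ONLY), ids))
print("unscoped:", unscoped)
print("scoped:  ", scoped)
```

If the unscoped count lists platforms other than mac, an update using that WHERE clause would relabel their runs too.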
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Updated•15 years ago
|
Status: REOPENED → ASSIGNED
Priority: -- → P1
Assignee | ||
Comment 17•15 years ago
|
||
Attachment #402582 -
Attachment is obsolete: true
Attachment #402705 -
Flags: review?(anodelman)
Comment 18•15 years ago
|
||
Comment on attachment 402705 [details]
Updated catlee's script to use a sub-select, more verbose output
I'm willing to give this a try.
Attachment #402705 -
Flags: review?(anodelman) → review+
Comment 19•15 years ago
|
||
Comment on attachment 402705 [details]
Updated catlee's script to use a sub-select, more verbose output
This looks ok to me.
Attachment #402705 -
Flags: review+ → review?
Assignee | ||
Comment 20•15 years ago
|
||
Aravind ran the updated script and it seems to have worked. Spikes are gone from the linux and win32 graphs, and Mac remains the same.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago → 15 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•15 years ago
|
Attachment #402705 -
Flags: review?
Updated•11 years ago
|
Product: mozilla.org → Release Engineering