Closed Bug 516497 Opened 15 years ago Closed 15 years ago

Talos swaps reporting of privatebytes and RSS on Mac

Categories

(Release Engineering :: General, defect, P1)

x86
macOS
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jrmuizel, Assigned: coop)

Details

Attachments

(2 files, 4 obsolete files)

Talos currently reports VSZ as RSS and RSS as PrivateBytes because it uses the wrong indexes into the psData array. We should fix the reporting and perhaps change the code so that indexing happens during parsing and not in a separate function. This would have avoided this error in the first place. Something like the following perhaps: (pid, vss, rss) = line.split() ? Also, private bytes is not a very good name for virtual size.
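A minimal sketch of the parse-time unpacking suggested above, assuming each input line holds exactly three whitespace-separated integer columns in the order pid, virtual size, resident size (as `ps -o pid,vsz,rss` would print them). The names `PsSample` and `parse_ps_line` are hypothetical and are not taken from cmanager_mac.py.

```python
from collections import namedtuple

# Hypothetical record type; field names follow the ps(1) column names.
PsSample = namedtuple("PsSample", ["pid", "vsz", "rss"])


def parse_ps_line(line):
    """Parse one 'pid vsz rss' line, naming each field at parse time."""
    # Unpacking raises ValueError if the column count is not exactly 3,
    # so a changed ps format fails loudly instead of shifting indexes.
    pid, vsz, rss = line.split()
    return PsSample(int(pid), int(vsz), int(rss))


sample = parse_ps_line("  1234  2456789  123456\n")
```

Callers then read `sample.vsz` or `sample.rss` by name, so there are no numeric indexes left to get wrong.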
Not sure how keen Alice is about disturbing this code more deeply than this. Most of it seems to be Annie-era. The patch gets the indices right at least.
Assignee: nobody → ccooper
Status: NEW → ASSIGNED
Attachment #401037 - Flags: review?(anodelman)
This'll be a bit of a flag-day event if it lands, right? Data from before the fix will be backwards? Is it feasible to contemplate a graph server DB operation to "fix history" as well?
(In reply to comment #2)
> This'll be a bit of a flag-day event if it lands, right? Data from before the
> fix will be backwards? Is it feasible to contemplate a graph server DB
> operation to "fix history" as well?
Should be simple to do a db query to transpose the two numbers in all historical Mac data. Real downtime would make it easier.
I don't think that this should be landed without a) a downtime and b) a graph server update to repair old data.
Attachment #401037 - Flags: review?(anodelman) → review+
I didn't actually see how long this took to run to completion... killed my screen session by accident this morning. It was still running after 4 hours on the staging db last night though. :(
I'm not really familiar with mysqldb in python, but there were example scripts in the repo already and this seemed like a straightforward application. I'm more familiar with prepared statements in perl, so if there's a better way to cache the statement in python for re-use, please let me know.
The meat of the script is simple:
* find all the private bytes (VSS) results on mac machines;
* iterate over them one at a time, looking for a matching RSS result set (matched based on date and machine);
* in the executemany, change the VSS result ids to 1, change the RSS result ids to the original VSS result id, change results with id 1 to the original RSS result id.
I use 1 as the placeholder ID for the swap here because I assume that older data with that ID will have been cleared out long ago. I can change it pretty trivially if that's not the case in production.
One possible bright spot: justdave gave me 115,385,147 as the number of rows in the values table in production. The staging db has 264,134,321 in the same table.
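The three-step placeholder swap described above can be sketched roughly as follows. This is an illustration only, not the attached script: it uses an in-memory sqlite3 database rather than MySQLdb so that it runs anywhere, and the table `vals`, its columns, and the ids 10 and 20 are hypothetical.

```python
import sqlite3

PLACEHOLDER = 1  # assumed to be unused by any live result, as in the comment above


def swap_ids(conn, vss_id, rss_id):
    """Swap two result ids using three UPDATEs routed through a placeholder."""
    conn.executemany(
        "UPDATE vals SET result_id = ? WHERE result_id = ?",
        [
            (PLACEHOLDER, vss_id),  # VSS -> placeholder
            (vss_id, rss_id),       # RSS -> original VSS id
            (rss_id, PLACEHOLDER),  # placeholder -> original RSS id
        ],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vals (result_id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO vals VALUES (?, ?)",
    [(10, 500.0), (10, 510.0), (20, 90.0)],
)
swap_ids(conn, vss_id=10, rss_id=20)
rows = sorted(conn.execute("SELECT result_id, value FROM vals").fetchall())
```

The placeholder is needed because the two ids are rewritten by separate statements; without it the first UPDATE would merge both sets of rows under one id.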
Attachment #402080 - Flags: review?(catlee)
Attachment #402080 - Attachment mime type: application/octet-stream → text/plain
Comment on attachment 402080 [details] Python script to swap the existing private bytes (VSS) and RSS values in the graphserver db
I'm not writing to the right schema, apparently. :(
Attachment #402080 - Attachment is obsolete: true
Attachment #402080 - Flags: review?(catlee)
OK, think I'm targeting the correct schema now. I did a test run in staging with the script limited to 10000 rows. Here's the tail of the output from that run:
# VSS rows: 10000
# dupes: 41
# unmatched: 1
# swaps: 9958
total # rows: 3550000
real 2m29.331s
user 0m2.885s
sys 0m6.892s
The "total # rows" is for all 3 updates (VSS->1, RSS->VSS, 1->RSS). Some rough math, based on dividing that row count by 3, gives me just over 4 hours to process all 115 million rows in the production db using the script in its current incarnation. This script also doesn't do anything about re-mapping tests where there is no corresponding RSS test (unmatched) or where there are multiple corresponding RSS tests (dupes). Not sure how aggressively we want to target those.
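One way to reproduce the "just over 4 hours" figure from the numbers quoted above. This is a reading of the rough math, not the author's actual calculation: it assumes the per-update row rate of the test run carries over unchanged to production.

```python
total_rows_touched = 3550000   # "total # rows" from the test run, all 3 updates
elapsed_s = 2 * 60 + 29.331    # "real 2m29.331s"
production_rows = 115385147    # row count quoted for the production values table

rows_per_update = total_rows_touched / 3   # rows handled by one of the 3 updates
rate = rows_per_update / elapsed_s         # rows per second in the test run
estimate_hours = production_rows / rate / 3600

print("estimated hours: %.2f" % estimate_hours)
```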
Attachment #402117 - Flags: review?(catlee)
OK, now that I'm writing queries against the correct schema, things are much easier. We leave the data in place, and just swap the test_id in the test_runs table instead which makes for a much speedier process. The new script swapped all relevant rows (47376) in the staging test_runs table in 9 seconds, so I don't foresee any problems for the downtime on Thursday.
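A rough sketch of swapping test_id values in test_runs without touching the underlying data rows. It is not the attached script: sqlite3 stands in for the MySQL graph server db, the other columns are hypothetical, and the test ids 7 and 8 are made up for the demo.

```python
import sqlite3


def swap_test_ids(conn, id_a, id_b):
    """Swap two test_id values in test_runs with a single UPDATE."""
    cur = conn.execute(
        "UPDATE test_runs "
        "SET test_id = CASE test_id WHEN ? THEN ? ELSE ? END "
        "WHERE test_id IN (?, ?)",
        (id_a, id_b, id_a, id_a, id_b),
    )
    conn.commit()
    return cur.rowcount  # number of test_runs rows rewritten


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_runs (id INTEGER PRIMARY KEY, test_id INTEGER)")
conn.executemany("INSERT INTO test_runs VALUES (?, ?)", [(1, 7), (2, 8), (3, 9)])
changed = swap_test_ids(conn, 7, 8)
runs = conn.execute("SELECT id, test_id FROM test_runs ORDER BY id").fetchall()
```

Because the CASE expression is evaluated per row against the row's original value, no placeholder id is needed, and only the comparatively small test_runs table is written.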
Attachment #402117 - Attachment is obsolete: true
Attachment #402262 - Flags: review?(catlee)
Attachment #402117 - Flags: review?(catlee)
Comment on attachment 402262 [details] Python script to swap the existing private bytes (VSS) and RSS test runs in the graphserver db
Looks good. We should back up (or have IT back up) the database right before running this.
Attachment #402262 - Flags: review?(catlee) → review+
Comment on attachment 402262 [details] Python script to swap the existing private bytes (VSS) and RSS test runs in the graphserver db
Dave: we'll need someone from IT to run this script against the production graph server database during our planned releng downtime tomorrow (Thurs Sept 24). We'll also want IT to perform a back-up of the db prior to running the script, just in case.
Can you give the script a quick once-over, and also let me know who is likely to be handling the IT side of things tomorrow AM during the downtime? Thanks.
Attachment #402262 - Flags: review?(justdave)
(In reply to comment #10)
> (From update of attachment 402262 [details])
> Dave: we'll need someone from IT to run this script against the production
> graph server database during our planned releng downtime tomorrow (thurs sept
> 24).
> We'll also want IT to perform a back-up of the db prior to running the script,
> just in case.
>
> Can you give the script a quick once-over, and also let me know who is likely
> to be handling the IT side of things tomorrow AM during the downtime? Thanks.
We're looking at 8am-11am EDT currently for our downtime.
Attachment #401037 - Flags: checked-in+
Comment on attachment 401037 [details] [diff] [review] Use correct indices for virtual size and resident size.
Checking in cmanager_mac.py;
/cvsroot/mozilla/testing/performance/talos/cmanager_mac.py,v  <--  cmanager_mac.py
new revision: 1.6; previous revision: 1.5
done
Ran a backup dump of the DB followed by the script after catlee gave a go ahead.
Attached file Updated script (obsolete) (deleted)
Attachment #402262 - Attachment is obsolete: true
Attachment #402262 - Flags: review?(justdave)
Attachment #402582 - Attachment mime type: text/x-python → text/plain
Production graphs are correctly showing the swapped values now.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Managed to switch rss and pbytes for all platforms instead of just mac. Db corruption needs to be fixed.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → ASSIGNED
Priority: -- → P1
Attachment #402582 - Attachment is obsolete: true
Attachment #402705 - Flags: review?(anodelman)
Comment on attachment 402705 [details] Updated catlee's script to use a sub-select, more verbose output
I'm willing to give this a try.
Attachment #402705 - Flags: review?(anodelman) → review+
Comment on attachment 402705 [details] Updated catlee's script to use a sub-select, more verbose output
This looks ok to me.
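For illustration, a sub-select can confine the test_id swap to one platform along these lines. This is a sketch under assumptions, not the reviewed attachment: sqlite3 stands in for the MySQL graph server db, and the `machines` table, the `os_name` and `machine_id` columns, and the test ids 7 and 8 are all hypothetical.

```python
import sqlite3

# The sub-select limits the swap to runs from machines on one platform.
SWAP_SQL = """
UPDATE test_runs
SET test_id = CASE test_id WHEN :a THEN :b ELSE :a END
WHERE test_id IN (:a, :b)
  AND machine_id IN (SELECT id FROM machines WHERE os_name = :os)
"""


def swap_for_platform(conn, id_a, id_b, os_name):
    """Swap two test_id values, but only for runs on the given platform."""
    cur = conn.execute(SWAP_SQL, {"a": id_a, "b": id_b, "os": os_name})
    conn.commit()
    return cur.rowcount  # number of test_runs rows rewritten


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE machines (id INTEGER PRIMARY KEY, os_name TEXT)")
conn.execute(
    "CREATE TABLE test_runs (id INTEGER PRIMARY KEY, machine_id INTEGER, test_id INTEGER)"
)
conn.executemany("INSERT INTO machines VALUES (?, ?)", [(1, "mac"), (2, "linux")])
conn.executemany(
    "INSERT INTO test_runs VALUES (?, ?, ?)",
    [(1, 1, 7), (2, 1, 8), (3, 2, 7), (4, 2, 8)],
)
changed = swap_for_platform(conn, 7, 8, "mac")
runs = conn.execute(
    "SELECT id, machine_id, test_id FROM test_runs ORDER BY id"
).fetchall()
```

Returning the affected row count gives a cheap sanity check: comparing it against the expected number of runs for that platform would flag an unscoped update before anyone looks at the graphs.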
Attachment #402705 - Flags: review+ → review?
Aravind ran the updated script and it seems to have worked. Spikes are gone from the linux and win32 graphs, and Mac remains the same.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Attachment #402705 - Flags: review?
Product: mozilla.org → Release Engineering