Closed Bug 387141 Opened 17 years ago Closed 17 years ago

Port Graph Server to mysql

Categories: Webtools Graveyard :: Graph Server, defect
Priority: Not set
Severity: normal
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: mtschrep; Assigned: oremj
Attachments: 2 files, 1 obsolete file

Currently the Graph Server uses sqlite as the backing store. For scalability and data backup reasons we should port it to use mysql (or postgres) and leverage the MySQL clusters we have set up for backups and performance. We are definitely seeing slowdowns on the production graphs.mozilla.org.
Assignee: nobody → oremj
Blocks: 386671
Attached patch Allows getdata.cgi to use MySQL (obsolete)
This patch includes a sqlite to mysql migration script. We will still need to alter and verify annotate.cgi, bonsaibouncer.cgi, bulk.cgi, and collect.cgi. I'm not sure where these scripts are receiving input, so I'm not quite sure how I am going to test them.
Attachment #272070 - Flags: review?(vladimir)
Attachment #272070 - Flags: review?(anodelman)
Comment on attachment 272070 [details] [diff] [review]
Allows getdata.cgi to use MySQL

I'm not familiar with the mysql execute interface -- will %s do the same thing as ? did? That is, do the binding such that the db engine sees a single data argument in that spot (hopefully without needing to explicitly quote it and the like)? I thought that the python dbapi was portable across databases?

It would be nice if that was resolved so that we could use a single query string for both, since the ability to use sqlite in a quick standalone setup is very helpful.

>diff --git a/getdata.cgi b/getdata.cgi
>index 5deffd8..897fb4c 100755
>--- a/getdata.cgi
>+++ b/getdata.cgi
>@@ -103,7 +100,7 @@ def doListTests(fo, type, datelimit, branch, machine, testname):
>                      "test_type": row[3],
>                      "date": row[4],
>                      "extra_data": row[5],
>-                     "branch": row[6]})
>+                     "branch": float(row[6])})

Do we really want to coerce branch to a float? It can be a full string; right now we're only using 1.9 and 1.8, but it might be a full symbolic name...

>diff --git a/scripts/migration.py b/scripts/migration.py
>new file mode 100755
>index 0000000..38bb73e
>--- /dev/null
>+++ b/scripts/migration.py
>@@ -0,0 +1,27 @@
>+from pysqlite2 import dbapi2 as sqlite
>+import MySQLdb, sys, os
>+
>+DIRNAME= os.path.dirname(sys.argv[0])
>+if not DIRNAME:
>+    DBPATH="../db/data.sqlite"
>+else:
>+    DBPATH=DIRNAME + "/../db/data.sqlite"
>+
>+sqlite_db = sqlite.connect(DBPATH)
>+
>+mysql_db = MySQLdb.connect("localhost","o","o","o_graphs")
>+mysql_cur = mysql_db.cursor()
>+
>+def migrate_table(table, select, insert):
>+    print "Migrating: " + table
>+    sqlite_cur = sqlite_db.cursor()
>+    res = sqlite_cur.execute(select)
>+    for row in res:
>+        mysql_cur.execute(insert % row)

This is kinda scary -- doing a % format with just %s will cause this to break on at least any ' characters that are embedded in any field. Assuming that mysql.execute will do the right quoting with %s, would execute(insert, row) work?

>+
>+migrate_table('annotations',"SELECT dataset_id,time,value FROM annotations", "INSERT INTO annotations (`dataset_id`, `time`, `value` ) VALUES ('%s','%s','%s')")
>+migrate_table('dataset_branchinfo',"SELECT `dataset_id`, `time`, `branchid` FROM dataset_branchinfo", "INSERT INTO dataset_branchinfo (`dataset_id`, `time`, `branchid` ) VALUES ('%s','%s','%s')")
>+migrate_table('dataset_extra_data',"SELECT `dataset_id`, `time`, `data` FROM dataset_extra_data", "INSERT INTO dataset_extra_data (`dataset_id`, `time`, `data` ) VALUES ('%s','%s','%s')")
>+migrate_table('dataset_info',"SELECT `id`, `type`, `machine`,`test`, `test_type`, `extra_data`, `branch`, `date` FROM dataset_info", "INSERT INTO dataset_info (`id`, `type`, `machine`,`test`, `test_type`, `extra_data`, `branch`, `date` ) VALUES ('%s', '%s', '%s','%s', '%s', '%s', '%s', '%s')")
>+migrate_table('dataset_values',"SELECT `dataset_id`, `time`, `value` FROM dataset_values", "INSERT INTO dataset_values (`dataset_id`, `time`, `value` ) VALUES ('%s','%s','%s')")

>diff --git a/sql/schema.sql b/sql/schema.sql
>new file mode 100644
>index 0000000..ff32fe8
>--- /dev/null
>+++ b/sql/schema.sql

Tables look ok; is int(11) enough for a timestamp? I guess that's 11 numeric chars, so that's enough for an unsigned 32-bit number, so that's ok.
Forgot to mention -- annotate.cgi isn't used anywhere yet; it's a feature that never got implemented (the ability to add markers to graphs, so that e.g. you can mark a line on a timestamp and say "new textframe turn-on").

bonsaibouncer.cgi shouldn't rely on the database; it's a helper to avoid running into cross-site xmlhttprequest issues when doing a bonsai query from the graph scripts (which is disabled for now -- this would be a good thing to add, basically "run bonsai query for selected range" that would just take you to bonsai; my code tried to be fancy and graph the results but I never got around to finishing it).

collect.cgi is in theory supposed to be called by the tinderboxes; rhelmer and I kept talking about making tinderbox do two calls, or redirecting a call from the old script (for the old graphs) to the new one, or somesuch. But so far nothing calls it live.

I'm not sure what bulk.cgi is; I'd guess it's to retrieve bulk data, not sure who calls it :)
(In reply to comment #2)
> (From update of attachment 272070 [details] [diff] [review])
>
> I'm not familiar with the mysql execute interface -- will %s do the same thing
> as ? did? That is, do the binding such that the db engine sees a single data
> argument in that spot (hopefully without needing to explicitly quote it and the
> like)? I thought that the python dbapi was portable across databases?

Unfortunately, MySQLdb uses %s instead of ?. Basically, when execute is called with (query, args) it escapes all the args with mysql's escape function and inserts them into the %s placeholders.

> It would be nice if that was resolved so that we could use a single query
> string for both, since the ability to use sqlite in a quick standalone setup is
> very helpful.

I'll work on a simple wrapper.

> >diff --git a/getdata.cgi b/getdata.cgi
> >index 5deffd8..897fb4c 100755
> >--- a/getdata.cgi
> >+++ b/getdata.cgi
> >@@ -103,7 +100,7 @@ def doListTests(fo, type, datelimit, branch, machine, testname):
> >                      "test_type": row[3],
> >                      "date": row[4],
> >                      "extra_data": row[5],
> >-                     "branch": row[6]})
> >+                     "branch": float(row[6])})
>
> Do we really want to coerce branch to a float? It can be a full string; right
> now we're only using 1.9 and 1.8, but it might be a full symbolic name...

I accidentally left that in from when I was debugging :-(

> >diff --git a/scripts/migration.py b/scripts/migration.py
> >new file mode 100755
> >index 0000000..38bb73e
> >--- /dev/null
> >+++ b/scripts/migration.py
> >@@ -0,0 +1,27 @@
> >+from pysqlite2 import dbapi2 as sqlite
> >+import MySQLdb, sys, os
> >+
> >+DIRNAME= os.path.dirname(sys.argv[0])
> >+if not DIRNAME:
> >+    DBPATH="../db/data.sqlite"
> >+else:
> >+    DBPATH=DIRNAME + "/../db/data.sqlite"
> >+
> >+sqlite_db = sqlite.connect(DBPATH)
> >+
> >+mysql_db = MySQLdb.connect("localhost","o","o","o_graphs")
> >+mysql_cur = mysql_db.cursor()
> >+
> >+def migrate_table(table, select, insert):
> >+    print "Migrating: " + table
> >+    sqlite_cur = sqlite_db.cursor()
> >+    res = sqlite_cur.execute(select)
> >+    for row in res:
> >+        mysql_cur.execute(insert % row)
>
> This is kinda scary -- doing a % format with just %s will cause this to break
> on at least any ' characters that are embedded in any field. Assuming that
> mysql.execute will do the right quoting with %s, would execute(insert, row)
> work?

I was running into some strange problems using bind variables in this case. For some reason it was converting branches like 1.9 to 1.899999999... There is probably a better solution for this as well. Either way, this script should just be a one-off.

> >+
> >+migrate_table('annotations',"SELECT dataset_id,time,value FROM annotations", "INSERT INTO annotations (`dataset_id`, `time`, `value` ) VALUES ('%s','%s','%s')")
> >+migrate_table('dataset_branchinfo',"SELECT `dataset_id`, `time`, `branchid` FROM dataset_branchinfo", "INSERT INTO dataset_branchinfo (`dataset_id`, `time`, `branchid` ) VALUES ('%s','%s','%s')")
> >+migrate_table('dataset_extra_data',"SELECT `dataset_id`, `time`, `data` FROM dataset_extra_data", "INSERT INTO dataset_extra_data (`dataset_id`, `time`, `data` ) VALUES ('%s','%s','%s')")
> >+migrate_table('dataset_info',"SELECT `id`, `type`, `machine`,`test`, `test_type`, `extra_data`, `branch`, `date` FROM dataset_info", "INSERT INTO dataset_info (`id`, `type`, `machine`,`test`, `test_type`, `extra_data`, `branch`, `date` ) VALUES ('%s', '%s', '%s','%s', '%s', '%s', '%s', '%s')")
> >+migrate_table('dataset_values',"SELECT `dataset_id`, `time`, `value` FROM dataset_values", "INSERT INTO dataset_values (`dataset_id`, `time`, `value` ) VALUES ('%s','%s','%s')")
> >
> >diff --git a/sql/schema.sql b/sql/schema.sql
> >new file mode 100644
> >index 0000000..ff32fe8
> >--- /dev/null
> >+++ b/sql/schema.sql
>
> Tables look ok; is int(11) enough for a timestamp? I guess that's 11 numeric
> chars, so that's enough for an unsigned 32-bit number, so that's ok.
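As an illustrative aside (not part of the patch): a minimal sketch of the two call styles being discussed, assuming the MySQLdb driver. The connection arguments are the ones from the quoted migration.py; the table is from the attached schema and the sample row values are made up.

import MySQLdb

db = MySQLdb.connect("localhost", "o", "o", "o_graphs")  # as in migration.py
cur = db.cursor()

row = (42, 1184000000, "it's a value with a quote")  # made-up sample row

insert = ("INSERT INTO dataset_values (`dataset_id`, `time`, `value`) "
          "VALUES (%s, %s, %s)")

# Unsafe: plain string interpolation. The driver never sees the values;
# they are pasted into the SQL as raw text, so string values (especially
# ones containing quotes) break the statement and open an injection hole:
#     cur.execute(insert % row)

# Safe: pass the values as a second argument. MySQLdb escapes each value
# itself and substitutes it for the bare %s placeholders:
cur.execute(insert, row)
db.commit()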
I was thinking more about being able to use both MySQL and sqlite. This is going to be somewhat hard/limiting unless we use a db abstraction layer. We could either use a pre-existing abstraction such as http://www.sqlalchemy.org/ or we could do something like make two db libraries and maintain queries for both. There is also the simple solution of running a translation on the queries (%s -> ? and vice versa), but that sounds very error prone since not all queries will be compatible. What do you all think?
(In reply to comment #5)
> I was thinking more about being able to use both MySQL and sqlite. This is
> going to be somewhat hard/limiting unless we use a db abstraction layer. We
> could either use a pre-existing abstraction such as http://www.sqlalchemy.org/
> or we could do something like make two db libraries and maintain queries for
> both. There is also the simple solution of running a translation on the
> queries (%s -> ? and vice versa), but that sounds very error prone since
> not all queries will be compatible.
>
> What do you all think?

CCing Morgamic, who has had experience with sqlalchemy for Breakpad. I'd advocate it only if you can use it to simply wrap the db connections and easily pass it fairly raw SQL, with it just doing the DB-specific mappings. For these types of queries the O/R mapping stuff will generally get in the way. Morgamic, is this easy with sqlalchemy?
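For reference, a rough sketch of the "wrap the connection and pass it fairly raw SQL" style with SQLAlchemy. This uses SQLAlchemy's current textual-SQL API (which postdates this discussion), and the query is only an example against the dataset_info table from the attached schema, not actual graph-server code.

from sqlalchemy import create_engine, text

# Switching backends is a URL change; the textual query stays the same and
# uses one named-parameter style regardless of the underlying driver.
engine = create_engine("sqlite:///db/data.sqlite")
# engine = create_engine("mysql+mysqldb://o:o@localhost/o_graphs")

with engine.connect() as conn:
    result = conn.execute(
        text("SELECT id, machine, test FROM dataset_info WHERE branch = :branch"),
        {"branch": "1.9"},
    )
    for row in result:
        print(row.id, row.machine, row.test)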
I'd say just do a simple sed %s -> ? for now -- it looked like all the queries were directly compatible. If that changes in the future then we should look into an abstraction layer, but I don't foresee the queries getting too complex.
(In reply to comment #7)
> I'd say just do a simple sed %s -> ? for now -- it looked like all the queries
> were directly compatible. If that changes in the future then we should look
> into an abstraction layer, but I don't foresee the queries getting too complex.

Yep, ok -- let's bias towards getting something working, then we can improve :-)
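A minimal sketch of what such a placeholder-translating wrapper could look like, assuming the approach agreed on above. The class and function names are hypothetical; this is not necessarily what the graphsdb.py / databases/mysql.py files checked in below actually contain.

# Hypothetical sketch only. Queries are written once, MySQL-style, with %s
# placeholders; for sqlite the placeholders are rewritten to ? before
# reaching the driver.

class Cursor(object):
    def __init__(self, cursor, is_sqlite):
        self._cursor = cursor
        self._is_sqlite = is_sqlite

    def execute(self, query, args=()):
        if self._is_sqlite:
            # Naive, like the proposed sed: would also rewrite a literal %s
            # inside a string constant, which the current queries don't have.
            query = query.replace("%s", "?")
        return self._cursor.execute(query, args)

    def __getattr__(self, name):
        # fetchone/fetchall/rowcount etc. pass straight through.
        return getattr(self._cursor, name)


def connect(use_mysql):
    if use_mysql:
        import MySQLdb
        conn = MySQLdb.connect("localhost", "o", "o", "o_graphs")  # as in migration.py
        return conn, Cursor(conn.cursor(), is_sqlite=False)
    import sqlite3
    conn = sqlite3.connect("db/data.sqlite")
    return conn, Cursor(conn.cursor(), is_sqlite=True)


# The same query string then works against either backend:
#   conn, cur = connect(use_mysql=False)
#   cur.execute("SELECT id, test FROM dataset_info WHERE branch = %s", ("1.9",))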
This should resolve most of the questionable issues.
Attachment #272070 - Attachment is obsolete: true
Attachment #272507 - Flags: review?(vladimir)
Attachment #272070 - Flags: review?(vladimir)
Attachment #272070 - Flags: review?(anodelman)
Comment on attachment 272507 [details] [diff] [review]
Converts application for use with MySQL

Looks fine. I don't think sqlite does the <=> stuff, but we can deal with that separately.
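(Aside: <=> is MySQL's NULL-safe equality operator; sqlite has no <=>, but its IS comparison behaves the same way. If it ever matters, a hypothetical helper, not from the patch, could pick the spelling per backend:)

def null_safe_eq(column, placeholder, using_sqlite):
    # MySQL: "a <=> b" is true when both sides are NULL or both are equal.
    # sqlite has no <=>, but "a IS b" gives the same NULL-safe behaviour.
    if using_sqlite:
        return column + " IS " + placeholder
    return column + " <=> " + placeholder

# e.g. a WHERE clause that also matches rows whose extra_data is NULL:
#   "SELECT id FROM dataset_info WHERE " + null_safe_eq("extra_data", "%s", False)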
Attachment #272507 - Flags: review?(vladimir) → review+
Checking in bulk.cgi;
/cvsroot/mozilla/webtools/new-graph/bulk.cgi,v  <--  bulk.cgi
new revision: 1.3; previous revision: 1.2
done
Checking in collect.cgi;
/cvsroot/mozilla/webtools/new-graph/collect.cgi,v  <--  collect.cgi
new revision: 1.8; previous revision: 1.7
done
Checking in dumpdata.cgi;
/cvsroot/mozilla/webtools/new-graph/dumpdata.cgi,v  <--  dumpdata.cgi
new revision: 1.2; previous revision: 1.1
done
Checking in getdata.cgi;
/cvsroot/mozilla/webtools/new-graph/getdata.cgi,v  <--  getdata.cgi
new revision: 1.8; previous revision: 1.7
done
RCS file: /cvsroot/mozilla/webtools/new-graph/graphsdb.py,v
done
Checking in graphsdb.py;
/cvsroot/mozilla/webtools/new-graph/graphsdb.py,v  <--  graphsdb.py
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/webtools/new-graph/databases/__init__.py,v
done
Checking in databases/__init__.py;
/cvsroot/mozilla/webtools/new-graph/databases/__init__.py,v  <--  __init__.py
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/webtools/new-graph/databases/mysql.py,v
done
Checking in databases/mysql.py;
/cvsroot/mozilla/webtools/new-graph/databases/mysql.py,v  <--  mysql.py
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/webtools/new-graph/scripts/migration.py,v
done
Checking in scripts/migration.py;
/cvsroot/mozilla/webtools/new-graph/scripts/migration.py,v  <--  migration.py
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/webtools/new-graph/sql/schema.sql,v
done
Checking in sql/schema.sql;
/cvsroot/mozilla/webtools/new-graph/sql/schema.sql,v  <--  schema.sql
initial revision: 1.1
done
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Is this deployed on production?
It is deployed at graphs-stage.mozilla.com.
Looking over the deployed code on graphs-stage.m.o I realized that the contents of utils/ had not been ported. The code in utils is for grabbing data from the old build-graphs.m.o for plotting on the new graph server - currently the switch to mysql will break this code.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch Converts import.py to MySQL
This should fix the problem, but I am not quite sure how to test it.
Attachment #275802 - Flags: review?(anodelman)
That code can be tested by first creating a directory called 'old' in the utils/ directory. Then, ensure that the paths in pull.sh and import.py are correct (pointing at the old/ directory and the appropriate locations in the graph server code). Then run ./pull.sh. You should find at the end of the pull that there are now graphs in the given database representing the data from build-graphs.
What do I put in the 'old' directory?
The old directory will be filled with pulled in build-graphs data - so you just need to provide the empty directory and it will fill itself. Voila!
After running the script on an empty database I get this: http://oremj.khan-vm.mozilla.org/graph.html Does that look right?
Attachment #275802 - Flags: review?(anodelman) → review+
Checking in utils/import.py;
/cvsroot/mozilla/webtools/new-graph/utils/import.py,v  <--  import.py
new revision: 1.12; previous revision: 1.11
done
Status: REOPENED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
No longer blocks: 386669
Product: Webtools → Webtools Graveyard