Closed Bug 146679 (bz-dbschema) Opened 23 years ago Closed 20 years ago

Reusable, structured, database-independent schema

Categories

(Bugzilla :: Bugzilla-General, enhancement, P2)

2.15
enhancement

Tracking

()

RESOLVED FIXED
Bugzilla 2.20

People

(Reporter: CodeMachine, Assigned: edwardjsabol)

References

(Blocks 3 open bugs)

Details

Attachments

(4 files, 9 obsolete files)

The current schema in checksetup.pl would be useful for other things. For example, foreign key references could be used by sanity check, as well as bug deletion, to automatically work out subsidiary tables. To do this we need to include foreign key references. That should not be difficult, you add "PRIMARY KEY(fieldname)" to the tables, and then elsewhere REFERENCES tablename to the end of the field line. Now they may not be enforced in MySQL (that is bug #109473), but it would mean these references would be centralised.for the various purposes they can be used for. Now to get this information back it would be necessary to parse the schema, and that's too complicated. Therefore it would be better for the schema to be structured and deparsed when used in checksetup. An additional benefit would be you wouldn't need to specify the field type information when adding a field during an upgrade, just the field name, and you look up the field type from the schema (and avoid potential bugs if it was inconsistent between the schema and the upgrade code).
We can't do that with forieng keys, because then if we ever change them, how would we specifiy that? What if we add them to the schema, then someone updates toa version of mysql which supports them. How do we 'know' which one of those the db knows about? I don't want to add foreign key stuff until we support a db which handles them. Note that postgres can't modify foreign keys (its alter table stuff sucks), so we don't really want these to start with.
To my knowledge MySQL will silently allow but ignore referential constraints that have been specified. I have used foreign key specifications in MySQL before. The point is the presence of foreign key specifications is useful for sanity check and so on without touching the database and therefore caring whether the database enforces referential integrity. At the moment the foreign key information is already in sanity check and also the bug table references in bug deletion (although incomplete there), and eventually will also be in the schema once we support referential integrity. So why not just have them in the schema?
Okay, I thought I'd bring some ideas from bug 98304 over here, and expand the scope of this bug just a little to fulfill the needs of that other bug. The idea is that as we add more and more databases, one of the great hurdles is creating the actual tables in checksetup. We don't want to have three checksetups for three different databases, and we don't want checksetup to grow by thousands of lines for every new DB. The current idea is to create a system of schema-generation that is database-independent. Then we'd have a structured schema that was useful as an object inside of Bugzilla, as well as having a way to create schemas across databases. I'll move some attachments and some blockers over here in the next few comments. -M
Summary: Reusable (structured?) schema. → Reusable (structured?) database-independent schema.
Jeroen Ruigrok van der Werven was the original poster of this attachment in bug 98304 comment 67. This is a pgsetup.pl file, similar to checksetup.pl, but it supports PgSQL. The differences between at least these two databases are seen more easily by comparing this file to checksetup.pl as it exists.
This is the best example we have, so far, of what we're moving toward. The primary difference between this and what would be best to implement is that the implementation should be more object-oriented, so that any part of bugzilla can get information about the schema.
Depends on: bz-postgres
This bug has certain dependencies that were originally part of bug 98304. -M
Depends on: bz-enums
No longer depends on: bz-postgres
(In reply to comment #5) > This is the best example we have, so far, of what we're moving toward. > > The primary difference between this and what would be best to implement is that > the implementation should be more object-oriented, so that any part of bugzilla > can get information about the schema. I agree that a more object-oriented approach would be a good idea. I suggest someone take a look at the CPAN module Dimedis::Ddl. It's quite unfortunate that the documentation is written in German, but, if you look at the code and the examples, it sure seems a lot like what we want here. It's object-oriented and even supports Sybase, Oracle, and Mysql out of the box. Even if Dimedis::Ddl doesn't offer the exact functionality or have the exact interface that we want, it should provide a lot of inspiration for someone. http://search.cpan.org/~jred/Dimedis-Ddl-0.35/
Blocks: bz-sybase
(In reply to comment #5) > Created an attachment (id=140047) > Proof of Concept code for DB independent schema generation > > This is the best example we have, so far, of what we're moving toward. > I'm glad you like it :-) > The primary difference between this and what would be best to implement is that > the implementation should be more object-oriented, so that any part of bugzilla > can get information about the schema. I agree that it should be OOified. However, I don't think that exposing schema specifics to the whole of the Bugzilla code is the best approach, architecturally. I would rather see a move in the reverse direction - to remove all knowledge of the database from most of the code and simply present a procedural interface. That interface code would know about schema specifics, but the rest of the code would not. Perhaps that's a bit too big a change, though.
(In reply to comment #7) > > I suggest someone take a look at the CPAN module Dimedis::Ddl. It's quite unfortunate that the > documentation is written in German, but, if you look at the code and the examples, it sure seems a > lot like what we want here. It's object-oriented and even supports Sybase, Oracle, and Mysql out of > the box. Even if Dimedis::Ddl doesn't offer the exact functionality or have the exact interface that > we want, it should provide a lot of inspiration for someone. > It doesn't appear to have a PostgreSQL driver.
(In reply to comment #9) > It doesn't appear to have a PostgreSQL driver. Based on perusing the DDL drivers for Sybase, Oracle, Mysql, MS SQL, I would hazard to say that it wouldn't be difficult for someone to make a PostgreSQL driver for it, utilizing much of what has already been learned with your proof-of-concept schema generation code and DBCompat.pm as a knowledge base. Besides, I was more suggesting that the Dimedis::Ddl module be used as inspiration for the Bugzilla implementation. It's got a nice OO design. And for the record, support for Oracle and Sybase is equally important as PostgreSQL in my mind and none of the other proposed solutions have even begun to address those.
I would be open to the Ddl library as a basis for something that we could ship with Bugzilla, provided that its license is amenable to that. (We couldn't just use the CPAN version, since we'd have to depend on our changes getting there, something that isn't good for speed of release or ease of support.) Also, I agree with Andrew from comment 8 -- the ideal is that BZ should not even know that it's running on a database, just that it has some sort of "persistence layer." However, that's in the future -- what I think would be good now is PgSQL support ASAP, and if that comes with Oracle + Sybase support, all the better. :-) -M
(In reply to comment #11) > I would be open to the Ddl library as a basis for something that we could ship > with Bugzilla, provided that its license is amenable to that. The Dimedis::Ddl README says: This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. Since Perl is distributed jointly under either the Artistic or GPL licenses, I think that means we're safe to use it as a basis for Bugzilla. > (We couldn't just use the CPAN version, since we'd have to depend on our changes > getting there, something that isn't good for speed of release or ease of support.) I agree totally. Plus the module's namespace just isn't very aesthetically pleasing... :-) Bugzilla::DDL anyone? > Also, I agree with Andrew from comment 8 -- the ideal is that BZ should not even > know that it's running on a database, just that it has some sort of "persistence > layer." This discussion is probably for a different bug, but has anyone looked at Class::DBI? You'd practically have to rewrite most of Bugzilla, but, man, that would be sweet. Getting back on topic to database-independent schema gneration (somewhat tangentially), someone recently released a CPAN module called Class::DBI::DDL. Unfortunately, it doesn't abstract the column types and attributes in a way that is needed for true database-independence, so it would need a fair amount of work (more than Dimedis::Ddl), but someone might want to take a glance at that too. It can be used independently of Class::DBI, FWIW. > what I think would be good now is PgSQL support ASAP, and if > that comes with Oracle + Sybase support, all the better. :-) I can live with that approach.
Attached file OOified proof of concept code (deleted) —
I spent a little while to make the proof of conept code I wrote more object-oriented. Extra methods should be able to be added easily for schema discovery etc. if required. Right now the only significant method is the generation of all the DLL required for a particular DB - the 2 examples being MySQL and PostgreSQL. Demo code use at the bottom.
(In reply to comment #13) > Created an attachment (id=140325) > OOified proof of concept code It's barely OO. I still think the Dimedis::Ddl object design is more impressive, but this is admittedly simpler and may be easier to maintain and extend to other DBMS's. Make decimal(5,2) a DB-specific data type. Might want to add INT2 and INT4 types as well. Maybe change package name from Bugzilla::DBSchema to Bugzilla::DB::Schema? Anyone have any ideas on what methods this object class should support? Besides the method for creating the schema, of course. Someone mentioned methods for introspection, but I find it hard to believe
Attached file A slightly more OO schema class design (obsolete) (deleted) —
I took Andrew's OO proof-of-concept and fleshed it out a little more, made it a little more object-oriented, IMHO.
I'm OK with the changes Ed made, although I'm not certain they add much. It just occurred to me that we might be able to reduce the number of type fixups required by using type aliases, via Postgres's (SQL standard) CREATE DOMAIN. e.g. CREATE DOMAIN longblob AS bytea; Just a thought.
If you feel that this is ready and useful, I would suggest that you ask bbaetz to review it. I think it looks like a good idea, and is generally in the direction of where we were going with discussion on bug 98304.
Blocks: bz-postgres
We have several bugs discussing this sort of thing. I'd prefer that we just had one... The other one has more discussion
(In reply to comment #18) > We have several bugs discussing this sort of thing. I'd prefer that we just had > one... The other one has more discussion Could you be more specific? Nothing comes up when searching for either of the keywords "schema" or "DDL".
Blocks: bz-oracle
Comment on attachment 142116 [details] A slightly more OO schema class design Oooh, I like this. :) > UNIQUE => [qw(user_id group_id isderived isbless)], Unique indexes should probably have names, just like normal indexes do. I would define these the same way you do regular indexes. Regular indexes can also be multi-column, and you can have more than one unique index, also. I think it would be nice to get references included in here somehow, too... > longdescs => > { > FIELDS => [ > bug_id => {TYPE => 'integer', NOTNULL => 1, REFERENCES => ['bugs','bug_id'] }, > who => {TYPE => 'integer', NOTNULL => 1, REFERENCES => ['profiles','userid'] }, etc. ? > bug_when => {TYPE => 'DATETIME', NOTNULL => 1 }, > work_time => {TYPE => 'decimal(5,2)', > NOTNULL => 1, DEFAULT => '0'}, > thetext => {TYPE => 'text'}, > isprivate => {TYPE => 'smallint', > NOTNULL => 1, DEFAULT => '0' } MySQL has no booleans... a lot of these "smallint"s should be booleans. (probably every field that starts with "is"). > INDEXES => { > longdescs_bugid_idx => ['bug_id'], > longdescs_who_idx => ['who'], > longdescs_bugwhen_idx => ['bug_when'], > }, Here's a good example of your indexes, for comparison with the uniques... Unique is just a special kind of index. :) >sub ddl_statements { > $crst .= "\n\tUNIQUE \(" . join(", ",@$unique) . "\)\n" > if ($unique); This should be like the CREATE INDEX below: > while (my($iname,$ifields) = each %$indexes) { > push(@ddl, > "CREATE INDEX $iname ON $tname \(" . > join(", ",@$ifields) . "\);\n"); > } while (my($iname,$ifields) = each %$unique) { push(@ddl, "CREATE UNIQUE INDEX $iname ON $tname \(" . join(", ",@$ifields) . "\);\n"); }
(In reply to comment #20) > (From update of attachment 142116 [details]) > > Oooh, I like this. :) Glad to hear it! > > UNIQUE => [qw(user_id group_id isderived isbless)], > > Unique indexes should probably have names, just like normal indexes do. > I would define these the same way you do regular indexes. Regular indexes can > also be multi-column, and you can have more than one unique index, also. I agree unique indexes should have names. However, can we formalize the naming of the unique indexes to be tablename_unique_idx? All of the current indexes have a name of the form tablename_column(s)_idx, so that seems natural. > I think it would be nice to get references included in here somehow, too... There's already support for it, and, as you saw, a few tables have REFERENCES specified. What more would you like to see or what you like to see changed with regard to this? > > isprivate => {TYPE => 'smallint', > > NOTNULL => 1, DEFAULT => '0' } > > MySQL has no booleans... a lot of these "smallint"s should be booleans. > (probably every field that starts with "is"). OK, I've changed isprivate to a BOOLEAN. All the other fields that start with "is" were already declared that way. I also went through all of the smallint fields, and I think that was the only one that should have been BOOLEAN. > > INDEXES => { > > longdescs_bugid_idx => ['bug_id'], > > longdescs_who_idx => ['who'], > > longdescs_bugwhen_idx => ['bug_when'], > > }, > > Here's a good example of your indexes, for comparison with the uniques... > Unique is just a special kind of index. :) > > >sub ddl_statements { > > > $crst .= "\n\tUNIQUE \(" . join(", ",@$unique) . "\)\n" > > if ($unique); > > This should be like the CREATE INDEX below: > > > while (my($iname,$ifields) = each %$indexes) { > > push(@ddl, > > "CREATE INDEX $iname ON $tname \(" . > > join(", ",@$ifields) . "\);\n"); > > } > > while (my($iname,$ifields) = each %$unique) { > push(@ddl, > "CREATE UNIQUE INDEX $iname ON $tname \(" . > join(", ",@$ifields) . "\);\n"); > } Yeah, I'm not really clear why Andrew implemented UNIQUE in that way. Perhaps he can comment. I assumed it was a MySQL thing. I'm not actually familiar with MySQL. (My experience is mostly with Sybase and INGRES/OpenINGRES, plus some Oracle and PostgreSQL.) So MySQL does support "CREATE UNIQUE INDEX"? If so, I'll change it and upload a new patch.
The choice of column types between this bug and bug 248001 seem to be on a collision course. If booleans become booleans, bug 248001 ceases to be the right thing to do.
(In reply to comment #22) > The choice of column types between this bug and bug 248001 seem to be on a > collision course. If booleans become booleans, bug 248001 ceases to be the > right thing to do. Well, there's no collision at the moment. All current subclasses of the Schema module implement the BOOLEAN "database-specific data-type alias" (for lack of better terminology) using the smallest integer type supported by each database. But that is worth noting for any databases supported in the future.
(In reply to comment #21) > > Yeah, I'm not really clear why Andrew implemented UNIQUE in that way. Perhaps he > can comment. I assumed it was a MySQL thing. I'm not actually familiar with > MySQL. (My experience is mostly with Sybase and INGRES/OpenINGRES, plus some > Oracle and PostgreSQL.) So MySQL does support "CREATE UNIQUE INDEX"? If so, I'll > change it and upload a new patch. I don't recall - probably no very deep reason. I'm OK with whatever works. I agree the indexes should be explicitly named.
(In reply to comment #20 > Unique is just a special kind of index. :) I think what you're suggesting is to change the schema structure to look like this: UNIQUE => { components_unique_idx => [qw(product_id name)], INDEXES => { components_name_idx => ['name'] }, However, one could also take the "unique is just a special kind of index" statement further and just have INDEXES and flag an index to be UNIQUE, like this: INDEXES => { components_unique_idx => {FIELDS => [qw(product_id name)], UNIQUE => 1}, components_name_idx => {FIELDS => ['name']}, }, This form has the added advantage in that you could define multiple unique indexes per table. Is there a need for that? Personally, I think I prefer the way it is now and just formalize the naming of all unique indexes tablename_unique_idx: UNIQUE => [qw(product_id name)], INDEXES => { components_name_idx => ['name'] }, Anyone want to state their preference?
(In reply to comment #23) > (In reply to comment #22) > > The choice of column types between this bug and bug 248001 seem to be on a > > collision course. If booleans become booleans, bug 248001 ceases to be the > > right thing to do. > > Well, there's no collision at the moment. All current subclasses of the Schema > module implement the BOOLEAN "database-specific data-type alias" (for lack of > better terminology) using the smallest integer type supported by each database. > But that is worth noting for any databases supported in the future. Sorry for the noise but I was wrong. The current PostgreSQL Schema subclass does indeed implement BOOLEAN as boolean and TRUE as true and FALSE as false. Does bug 248001 represent the current thinking of those leading the efforts to port Bugzilla to PostgreSQL?
(In reply to comment #26) > (In reply to comment #23) > > (In reply to comment #22) > > > The choice of column types between this bug and bug 248001 seem to be on a > > > collision course. If booleans become booleans, bug 248001 ceases to be the > > > right thing to do. > > > > Well, there's no collision at the moment. All current subclasses of the Schema > > module implement the BOOLEAN "database-specific data-type alias" (for lack of > > better terminology) using the smallest integer type supported by each database. > > But that is worth noting for any databases supported in the future. > > Sorry for the noise but I was wrong. The current PostgreSQL Schema subclass does > indeed implement BOOLEAN as boolean and TRUE as true and FALSE as false. Does > bug 248001 represent the current thinking of those leading the efforts to port > Bugzilla to PostgreSQL? Postgresql doesnt like it when they are used within the WHERE clause of the SQL but they can be returned just fine as a column value. DBD::Pg will convert the 't' and 'f' values that PostgreSQL stores the values as to 1 and 0 respectively so Perl will not complain. But smallints work as well for most of these. Is there any reason not to use smallints to represent simple 1 or 0 values?
(In reply to comment #27) > But smallints work as well for most of these. Is > there any reason not to use smallints to represent simple 1 or 0 values? Perhaps space? Obviously, a boolean only needs one bit for storage, although I would guess most DBMSes probably use at least one byte for speed reasons. A smallint is typically two bytes.
(In reply to comment #25) > INDEXES => { > components_unique_idx => {FIELDS => [qw(product_id name)], > UNIQUE => 1}, > components_name_idx => {FIELDS => ['name']}, > }, I much prefer this form, as it allows us to also specify other index options as needed in the future.
Attached patch Revised schema modules (obsolete) (deleted) — Splinter Review
Attachment #142116 - Attachment is obsolete: true
(In reply to comment #29) > (In reply to comment #25) > > INDEXES => { > > components_unique_idx => {FIELDS => [qw(product_id name)], > > UNIQUE => 1}, > > components_name_idx => {FIELDS => ['name']}, > > }, > > I much prefer this form, as it allows us to also specify other index options > as needed in the future. Unfortunately, I already went with Dave's suggestion in the latest patch. That form was a bit "wordy" for my tastes, and I couldn't think of any scenario where one would want multiple separate unique indexes. But if the consensus agrees with you, I'll certainly switch. Unrelated: What's the deal with the lastused column of the logincookies table? It's MySQL type is timestamp. What should it be for PostgreSQL? Other MySQL timestamp fields were translated to interval fields for PostgreSQL except for this one.
Also, I suspect that many of the parties involved in this bug will have comments on bug 248175.
> Yeah, I'm not really clear why Andrew implemented UNIQUE in that way. Perhaps he > can comment. I assumed it was a MySQL thing. I'm not actually familiar with > MySQL. (My experience is mostly with Sybase and INGRES/OpenINGRES, plus some > Oracle and PostgreSQL.) So MySQL does support "CREATE UNIQUE INDEX"? If so, I'll > change it and upload a new patch. Please see http://www.in-nomine.org/~asmodai/mysql-to-pgsql.txt for some porting notes I was/am documenting for the PostgreSQL project. I also have, in the pipeline, plans to document MS-SQL to PostgreSQL and Oracle to PostgreSQL. Perhaps Sybase some day too.
All indexes must have names, so that we can more portably drop them later, if required. That includes primary keys.
(In reply to comment #31) > Unrelated: What's the deal with the lastused column of the logincookies table? It's MySQL type is > timestamp. What should it be for PostgreSQL? Other MySQL timestamp fields were translated to interval > fields for PostgreSQL except for this one. This is stretching my recollection a bit. Postgresql doesn't have MySQL's autoupdate timestamp gadget, so you have to do that with triggers. I think this is the only field that actually uses the autoupdate feature. (Not sure that this answers the question, though). I don't vouch for the accuracy of what I did - there could be some mistakes, which is why I labeled it as a proof of concept.
Blocks: 247560
In light of bug 248001 being resolved, shall I change the PostgreSQL schema contained herein to implement BOOLEANs as a smallint?
Why? Would we gain something?
(In reply to comment #37) > Why? Would we gain something? I just want to keep abreast of the changes to the schema. Will booleans still work in light of the changes committed in bug 248001?
(In reply to comment #38) > I just want to keep abreast of the changes to the schema. Will booleans still > work in light of the changes committed in bug 248001? Yeah. AFAIK, DBI handles all the conversions back and forth from "Y" to "1", etc. And I think that where we can, it's best to stick to the type most like our "conceptual" type, if it exists in the backend DB.
Comment on attachment 151500 [details] [diff] [review] Revised schema modules I'm going to take a crack at this, since I'm getting very familiar with checksetup and what it needs for DB schema. >diff -rNu Bugzilla.HEAD/Schema.pm Bugzilla/Schema.pm In the future, unless there's some reason not to, it's a good idea to use "cvs diff -u" instead of "diff -Nru." >+package Bugzilla::Schema; I think this should be DBSchema, to be consistent with DBCompat. Although maybe we should move all of those, and make it DB::Schema and DB::Compat. >+use Carp qw(croak); We use Bugzilla::Error::ThrowCodeError for code errors, not Carp, usually. >+ user_group_map => >+ { Is there a reason that the schema is initialized inside of a function instead of inside of an "our" or "my" variable? Or even read in from an entirely separate file? I think maybe Bugzilla::DB::Schema::Constants might be a good place, or just Bugzilla::Constants or Bugzilla::DB::Schema::ANSI. >+ UNIQUE => { user_group_map_unique_idx => >+ [qw(user_id group_id isderived isbless)] }, OK, I do like this method. There could be reasons to have more than one unique index per table, though, but I think this allows it. However, some day we might want to also be able to set other index options. We need a form where we can set *any* index option, where "unique" is just one option. >+ work_time => {TYPE => 'decimal(5,2)', While we're at it, wouldn't it be better to have TYPE => 'decimal', PRECISION => [5,2]? Or OPTIONS => '(5,2)'? >+ thetext => {TYPE => 'text'}, We definitely need documentation for some of these types, to say what they *need* to do in order for Bugzilla to work. For example, not all DBs have a "text," type, but they all have something that would work exactly the same, for Bugzilla's purposes. They all have different limitations, though. >+ id => {TYPE => 'SMALLSERIAL', NOTNULL => 1, I'm not the ANSI master... is "smallserial" actually an ANSI type? I'd like to stick with ANSI types in the Schema module. >+ name => {TYPE => 'varchar(64)', NOTNULL => 1 }, Similar to the above, we could do TYPE => 'varchar', LENGTH => 64 (or OPTIONS => '(64)'). >+ dup => {TYPE => 'integer', NOTNULL => 1 , >+ DISPLAYWIDTH => 9, PRIMARYKEY => 1}, Would it make more sense to have a PRIMARY => 'dup', since there can only be one Primary Key anyway? >+ id => {TYPE => 'INTSERIAL', NOTNULL => 1, Same question again about the type. Also, all the ids should be the same type, shouldn't they? I can imagine there existing a DB where there might be problems if we didn't have the same type for the different ids in our JOIN/WHEREs. >+ userregexp => {TYPE => 'TINYTEXT', NOTNULL => 1}, 'tinytext' is a weird thing. I mean, maybe we use it now, but... is it ANSI? (It might be, I just don't know.) >+ # These next six are the "enumerated" fields used in bugs. >+ # We might need another column for sort order so that could be >+ # played around with... These are all now actually in existence in bug 17453's patch. >+ REFERENCES => >+ 'bug_severity_values(value)'}, I like this. :-) I forget... are there any DBs that take important options to making a Foreign Key (besides just the field name)? >+ resolution => {TYPE => 'varchar(255)', NOTNULL => 1, >+ REFERENCES => >+ 'bug_resoltion_values(value)'}, Typo. >+ INDEXES => { >+ bugs_assigned_to_idx => ['assigned_to'], >+ bugs_creation_ts_idx => ['creation_ts'], >+ bugs_delta_ts_idx => ['delta_ts'], >+ bugs_bug_severity_idx => ['bug_severity'], >+ bugs_bug_status_idx => ['bug_status'], >+ bugs_op_sys_idx => ['op_sys'], >+ bugs_priority_idx => ['priority'], >+ bugs_product_id_idx => ['product_id'], >+ bugs_reporter_idx => ['reporter'], >+ bugs_version_idx => ['version'], >+ bugs_component_id_idx => ['component_id'], >+ bugs_resolution_idx => ['resolution'], >+ bugs_target_milestone_idx => ['target_milestone'], >+ bugs_qa_contact_idx => ['qa_contact'], >+ bugs_votes_idx => ['votes'], >+ }, I'm almost certain that some of the indexes on the bug table need to be multi-column... are they not, now? >+ thedata => {TYPE => 'LONGBLOB', NOTNULL => 1}, Hrm, don't LongBlobs need some options for some DBs in order to work the same way they do for Bugzilla in MySQL? (I'm not certain here, once again, I just have this vague memory...) >+sub adjust_schema { >+ # PRIVATE method to alter the abstract schema to be DB-specific. WHOA! This whole function needs major commenting; it's pretty mysterious, at this point. :-) >+sub ddl_statements { >+ # PUBLIC method to generate DDL from the concrete schema. >+ # Returns an array of DDL strings. You might want say "SQL statements to create the table/indexes" instead of DDL for us Mere Mortal Perl Coders. :-) >+ my $self = shift; ?+ my @ddl = (); I think that ddl should be a hash, so that we could just do $ddl{'create'}. Also, you might want to have a separate function to just generate and return the CREATE TABLE statement on the fly, just returning a string. I think that would be much simpler for the Installation script. >+ $create_table .= "\t$fname\t$type"; >+ $create_table .= " NOT NULL" if ($finfo->{NOTNULL}); >+ $create_table .= " DEFAULT $default" if (defined($default)); >+ $create_table .= " PRIMARY KEY" if ($finfo->{PRIMARYKEY}); I think that some of this may need to go into the DB-specific stuff, wouldn't you think? >+ "CREATE UNIQUE INDEX $iname ON $tname \(" . This command also could possibly differ across DBs. >+#------------------------------------------------------------------------------ >+1; OK, here's what I *really* need. :-) A function that reads in the *current* schema, and compares it to the required schema. Of course, that's the ENTIRE purpose of checksetup at this point, so that change could possibly go into bug 248175. >+ BOOLEAN => 'boolean', >+ FALSE => 'false', >+ TRUE => 'true', Hrm, to save DBI the work, wouldn't it be better to do 0 and 1? >+ LONGBLOB => 'bytea', Are these actually equivalent? I recall having to do something strange to get them to work equivalently in the RH Bugzilla fork. >+ $self->{db_extras} = >+ [ >+ " CREATE RULE logincookies_insert AS\n" . >+ " ON INSERT TO logincookies WHERE NEW.lastused IS NULL\n" . >+ " DO INSTEAD INSERT INTO logincookies VALUES \(\n" . >+ " NEW.cookie, NEW.userid, NEW.ipaddr, localtimestamp\);\n", Oh, this is somewhat "bad," because it's not abstractly tied to the schema. That is, if I update the schema and it requires one of these, then I'd have to go and know to update each subclass that requires this, which defeats the purpose of subclassing. We should be able to modify the schema somehow that we automatically know to make these from certain options passed in the schema. Those are my comments to start with. :-)
Reassigning to patch writer.
Assignee: zach → edwardjsabol
I'm pretty sure that this is a P2, since it blocks a P2 and a P1.
Priority: -- → P2
Target Milestone: --- → Bugzilla 2.22
(In reply to comment #40) > In the future, unless there's some reason not to, it's a good idea to use > "cvs diff -u" instead of "diff -Nru." Clarification: I just meant that you should use "cvs diff" instead of "diff" when possible. :-) The -Nru is of course always very important if you're using a normal "diff" command.
(In reply to comment #40) > I think this should be DBSchema, to be consistent with DBCompat. Although > maybe we should move all of those, and make it DB::Schema and DB::Compat. Fine with me. I'll make it DBSchema in the next patch. If DBCompat is renamed, I'll follow suit. > We use Bugzilla::Error::ThrowCodeError for code errors, not Carp, usually. OK, I've switched to ThrowCodeError(). > Is there a reason that the schema is initialized inside of a function instead > of inside of an "our" or "my" variable? The schema is modified at object instantiation time. If the schema structure is class data instead of instance data, you might have a scenario in which multiple object instances are modifying the same structure with conflicting results. It's just safer if the schema is instance-based data, I think. I suppose you could "my" the abstract schema and then copy it to instance data at instantiation time, but then you have an extra copy of the schema structure taking up memory. > Or even read in from an entirely separate file? > I think maybe Bugzilla::DB::Schema::Constants might be a good > place, or just Bugzilla::Constants or Bugzilla::DB::Schema::ANSI. I have no strong feelings on this and whatever the consensus is is fine with me, but I think it's simpler if the schema is actually in the schema module. > >+ work_time => {TYPE => 'decimal(5,2)', > > While we're at it, wouldn't it be better to have TYPE => 'decimal', PRECISION > => [5,2]? Or OPTIONS => '(5,2)'? Seems like overkill, especially since decimal is an ANSI type supported by all the DBMSes that Bugzilla will likely ever support. > >+ thetext => {TYPE => 'text'}, > > We definitely need documentation for some of these types, to say what they > *need* to do in order for Bugzilla to work. For example, not all DBs have a > "text," type, but they all have something that would work exactly the same, for > Bugzilla's purposes. They all have different limitations, though. Contributions welcome. Unfortunately, I don't feel qualified to do that myself. My experience with the Bugzilla code outside of this module is minimal. > >+ id => {TYPE => 'SMALLSERIAL', NOTNULL => 1, > > I'm not the ANSI master... is "smallserial" actually an ANSI type? I'd like > to stick with ANSI types in the Schema module. The uppercased types like INTSERIAL are "abstract" types. It's a requirement that the database-specific subclasses define them. (Yes, this needs to be documented...) I don't think there is an ANSI auto- incrementing, serialized integer type, is there? > >+ name => {TYPE => 'varchar(64)', NOTNULL => 1 }, > > Similar to the above, we could do TYPE => 'varchar', LENGTH => 64 (or OPTIONS > => '(64)'). I don't see how that improves things. It seems like an unnecessary complication. > >+ dup => {TYPE => 'integer', NOTNULL => 1 , > >+ DISPLAYWIDTH => 9, PRIMARYKEY => 1}, > > Would it make more sense to have a PRIMARY => 'dup', since there can only be > one Primary Key anyway? Maybe. Multi-column primary keys are possible, though perhaps arguably indicative of a poor database design. > >+ id => {TYPE => 'INTSERIAL', NOTNULL => 1, > > Same question again about the type. Also, all the ids should be the same > type, shouldn't they? I can imagine there existing a DB where there might be > problems if we didn't have the same type for the different ids in our > JOIN/WHEREs. I believe the current MySQL schema has smallint auto_increment ids, mediumint auto_increment ids, and regular integer auto_increment ids. The goal was to reproduce the current schema used for MySQL, so the abstract base schema needs to differentiate between these different auto_increment types in order to do that. > >+ # These next six are the "enumerated" fields used in bugs. > >+ # We might need another column for sort order so that could be > >+ # played around with... > > These are all now actually in existence in bug 17453's patch. So what needs to be changed? > I forget... are there any DBs that take important options to > making a Foreign Key (besides just the field name)? I'm not aware of any. > >+ resolution => {TYPE => 'varchar(255)', NOTNULL => 1, > >+ REFERENCES => > >+ 'bug_resoltion_values(value)'}, > > Typo. Fixed in the forthcoming patch. > I'm almost certain that some of the indexes on the bug table need to be > multi-column... are they not, now? I just checked checksetup.pl and it doesn't look like there are any multi-column indexes on the bugs table to me, but maybe I'm reading the code wrong? > I think that ddl should be a hash, so that we could just do $ddl{'create'}. What other keys would you have in such a hash? An abbreviated example of the structure you are suggesting that the ddl_statements() method return would be useful. > Also, you might want to have a separate function to just generate and return > the CREATE TABLE statement on the fly, just returning a string. I think that > would be much simpler for the Installation script. Um, that's what it does now, doesn't it? > >+ $create_table .= "\t$fname\t$type"; > >+ $create_table .= " NOT NULL" if ($finfo->{NOTNULL}); > >+ $create_table .= " DEFAULT $default" if (defined($default)); > >+ $create_table .= " PRIMARY KEY" if ($finfo->{PRIMARYKEY}); > > I think that some of this may need to go into the DB-specific stuff, wouldn't > you think? You mean the primary key part? Yeah, I guess some refactoring here would probably be appropriate... > OK, here's what I *really* need. :-) A function that reads in the *current* > schema, and compares it to the required schema. Of course, that's the ENTIRE > purpose of checksetup at this point, so that change could possibly go into bug > 248175. Well, what can be done in this class to facilitate that goal? > >+ BOOLEAN => 'boolean', > >+ FALSE => 'false', > >+ TRUE => 'true', > > Hrm, to save DBI the work, wouldn't it be better to do 0 and 1? I don't know. I'll change it if that's the consensus. > >+ LONGBLOB => 'bytea', > > Are these actually equivalent? I recall having to do something strange to get > them to work equivalently in the RH Bugzilla fork. Beats me. I don't use blobs. > >+ $self->{db_extras} = > >+ [ > >+ " CREATE RULE logincookies_insert AS\n" . > >+ " ON INSERT TO logincookies WHERE NEW.lastused IS NULL\n" . > >+ " DO INSTEAD INSERT INTO logincookies VALUES \(\n" . > >+ " NEW.cookie, NEW.userid, NEW.ipaddr, localtimestamp\);\n", > > Oh, this is somewhat "bad," because it's not abstractly tied to the schema. > That is, if I update the schema and it requires one of these, then I'd have to > go and know to update each subclass that requires this, which defeats the > purpose of subclassing. We should be able to modify the schema somehow that we > automatically know to make these from certain options passed in the schema. How exactly do you suggest generalizing this? This "CREATE RULE" is rather DBMS-specific, isn't it?
(In reply to comment #43) > (In reply to comment #40) > > In the future, unless there's some reason not to, it's a good idea to use > > "cvs diff -u" instead of "diff -Nru." > > Clarification: I just meant that you should use "cvs diff" instead of "diff" > when possible. :-) The -Nru is of course always very important if you're using a > normal "diff" command. Sorry, I can't seem to figure out how to use "cvs diff" with files that haven't yet been added to the CVS repository. Could you elaborate? "cvs diff -uN" doesn't work, at least not with the version of cvs I have installed.
Attached patch Latest DBSchema patch (obsolete) (deleted) — Splinter Review
Partially addresses comment#40.
Attachment #151500 - Attachment is obsolete: true
(In reply to comment #44) > (In reply to comment #40) > > >+ $self->{db_extras} = > > >+ [ > > >+ " CREATE RULE logincookies_insert AS\n" . > > >+ " ON INSERT TO logincookies WHERE NEW.lastused IS NULL\n" . > > >+ " DO INSTEAD INSERT INTO logincookies VALUES \(\n" . > > >+ " NEW.cookie, NEW.userid, NEW.ipaddr, localtimestamp\);\n", > > > > Oh, this is somewhat "bad," because it's not abstractly tied to the schema. > > That is, if I update the schema and it requires one of these, then I'd have to > > go and know to update each subclass that requires this, which defeats the > > purpose of subclassing. We should be able to modify the schema somehow that we > > automatically know to make these from certain options passed in the schema. > > How exactly do you suggest generalizing this? This "CREATE RULE" is rather DBMS-specific, isn't it? Hmmm, if I understand correctly what the query does, it's to cope with timestamp type of MySQL, which does not exists in PostgreSQL. If yes, then it's not neccessary to do this for logincookies since bug 257303 has been checked in. It's still neccessary for bug table (but I can't find it in this patch). If someone can review patch at bug 257315, then this 'hack' can be dropped completely, which would be IMHO a good thing.
(In reply to comment #44) > The schema is modified at object instantiation time. [snip] OK. I agree. The variable is good where it is. > > While we're at it, wouldn't it be better to have TYPE => 'decimal', PRECISION > > => [5,2]? Or OPTIONS => '(5,2)'? > > Seems like overkill, especially since decimal is an ANSI type supported by all > the DBMSes that Bugzilla will likely ever support. It just seemed like a good idea as far as abstraction went. If it turns out to be easier to code this way, then that's all good. :-) > > We definitely need documentation for some of these types, [snip] > > Contributions welcome. Unfortunately, I don't feel qualified to do that > myself. My experience with the Bugzilla code outside of this module is > minimal. OK. I *might* be able to be of help, but I think that only myk, bbaetz, or justdave could really say for certain what the requirements of each DB field are. > > I'm not the ANSI master... is "smallserial" actually an ANSI type? I'd > > like to stick with ANSI types in the Schema module. > > The uppercased types like INTSERIAL are "abstract" types. It's a requirement > that the database-specific subclasses define them. (Yes, this needs to be > documented...) I don't think there is an ANSI auto-incrementing, serialized > integer type, is there? Yeah, I'm pretty sure there's no ANSI auto-increment type. As long as this is documented that classes need to implement this, that's fine. :-) (Is there no way in perl to programmatically enforce that requirement?) > > >+ dup => {TYPE => 'integer', NOTNULL => 1 , > > >+ DISPLAYWIDTH => 9, PRIMARYKEY => 1}, > > > > Would it make more sense to have a PRIMARY => 'dup', since there can only > > be one Primary Key anyway? > > Maybe. Multi-column primary keys are possible, though perhaps arguably > indicative of a poor database design. Oh, that's true. OK, maybe we'd better keep it the way it is, then. > I believe the current MySQL schema has smallint auto_increment ids, mediumint > auto_increment ids, and regular integer auto_increment ids. The goal was to > reproduce the current schema used for MySQL, so the abstract base schema needs > to differentiate between these different auto_increment types in order to do > that. OK. Let's keep those as they are now, then. We don't want to change the current schema and abstract it at the same time. > > >+ # These next six are the "enumerated" fields used in bugs. > > >+ # We might need another column for sort order so that could be > > >+ # played around with... > > > > These are all now actually in existence in bug 17453's patch. > > So what needs to be changed? I'll send you the schema changes in a direct email, since they're sort of long. Others who are concerned can look at the checksetup.pl changes in the bug 17453 patch. > > I'm almost certain that some of the indexes on the bug table need to be > > multi-column... are they not, now? > > I just checked checksetup.pl and it doesn't look like there are any > multi-column indexes on the bugs table to me, but maybe I'm reading the code > wrong? Oh yeah, you are indeed correct. No multi-column indexes at all. > > Also, you might want to have a separate function to just generate and return > > the CREATE TABLE statement on the fly, just returning a string. I think that > > would be much simpler for the Installation script. > > Um, that's what it does now, doesn't it? Oh. I didn't realize that. :-) Just document what it returns more exactly, and then nobody will ever complain again. :-) > > > >+ $create_table .= "\t$fname\t$type"; > > >+ $create_table .= " NOT NULL" if ($finfo->{NOTNULL}); > > >+ $create_table .= " DEFAULT $default" if (defined($default)); > > >+ $create_table .= " PRIMARY KEY" if ($finfo->{PRIMARYKEY}); > > > > I think that some of this may need to go into the DB-specific stuff, > wouldn't you think? > > You mean the primary key part? Yeah, I guess some refactoring here would probably be appropriate... Yeah. I'm not certain that every DB does primary keys the same way. Of course, at the worst, we could just refactor it later. All I'm concerned about to start with is Pg and MySQL support. > > OK, here's what I *really* need. :-) A function that reads in the > > *current* schema, and compares it to the required schema. Of course, that's > > the ENTIRE purpose of checksetup at this point, so that change could > > possibly go into bug 248175. > > Well, what can be done in this class to facilitate that goal? A function that could read the current, actual schema into a DBSchema object. :-) > > >+ LONGBLOB => 'bytea', > > > > Are these actually equivalent? I recall having to do something strange to > > get them to work equivalently in the RH Bugzilla fork. > > Beats me. I don't use blobs. byeta: http://www.postgresql.org/docs/7.3/static/datatype-binary.html "The SQL standard defines a different binary string type, called BLOB or BINARY LARGE OBJECT. The input format is different compared to bytea, but the provided functions and operators are mostly the same." longblob: http://dev.mysql.com/doc/mysql/en/String_type_overview.html#IDX1144 So I think they're pretty much the same, for our purposes, and DBI should handle any differences. > > >+ $self->{db_extras} = > > >+ [ > > >+ " CREATE RULE logincookies_insert AS\n" . > > >+ " ON INSERT TO logincookies WHERE NEW.lastused IS NULL\n" . > > >+ " DO INSTEAD INSERT INTO logincookies VALUES \(\n" . > > >+ " NEW.cookie, NEW.userid, NEW.ipaddr, localtimestamp\);\n", > > > > Oh, this is somewhat "bad," because it's not abstractly tied to the > > schema. That is, if I update the schema and it requires one of these, then > > I'd have to go and know to update each subclass that requires this, which > > defeats the purpose of subclassing. We should be able to modify the schema > > somehow that we automatically know to make these from certain options passed > > in the schema. > > How exactly do you suggest generalizing this? This "CREATE RULE" is rather > DBMS-specific, isn't it? Yes. We would need a TIMESTAMP abstract field type, that let us know that we had to do something like this. If timestamps go away before this patch is checked-in, then we're OK and don't need it, but in the mean time it might be a good idea.
Oh, also, we need a way to specify that certain indexes are FULLTEXT, like bugs.short_desc. :-(
Blocks: 249400
OK, I've now started to actually work on cross-DB installation. I have discovered that *all* I actually need, in order to make Bugzilla cross-DB compatible for installation, is something that maps field types to SQL. So that part of this patch is actually all I need. :-) I just need to be able to put in a variable in all of the "ChangeFieldType" calls and so forth in checksetup.pl. Could you re-submit a patch that does just that? :-) (Also, I'm now a real reviewer, so I can review your patch on this.)
Status: NEW → ASSIGNED
Oh, and the schema should fit inside the structure of bug 237862's new Bugzilla::DB heirarchy. You can just put the schema information directly into the Mysql.pm and Pg.pm.
Depends on: bz-dbcompat
Target Milestone: Bugzilla 2.22 → Bugzilla 2.20
(In reply to comment #50) > I have discovered that *all* I actually need, in order to make Bugzilla cross-DB > compatible for installation, is something that maps field types to SQL. Honestly, that doesn't sound right to me. There's something wrong with the code design if that's the case. The SQL that is used to create a table should be coming from this class. > So that part of this patch is actually all I need. :-) I just need to be able to > put in a variable in all of the "ChangeFieldType" calls and so forth in > checksetup.pl. > > Could you re-submit a patch that does just that? :-) (Also, I'm now a real > reviewer, so I can review your patch on this.) Hunh? Give an example of the arguments you would pass to such a method and what you think the return values of the method should be for those arguments. Lately, I've been thinking about redoing this whole class around DBIx::DBSchema. I'm not happy with this code at all really. If you read this whole bug, you'll see I've been advocating a more object-oriented approach from the beginning. A database schema object should be a collection of table objects which should be a collection of column objects and index objects, and that's exactly what DBIx::DBSchema does. Is it feasible to make DBIx:DBSchema a prequisite for Bugzilla? http://search.cpan.org/~ivan/DBIx-DBSchema-0.23/DBSchema.pm
(In reply to comment #51) > Oh, and the schema should fit inside the structure of bug 237862's new > Bugzilla::DB heirarchy. > > You can just put the schema information directly into the Mysql.pm and Pg.pm. Sorry, I don't agree at all with that.
Oh. :-) Actually, thinking about it, you're right! :-) The alter table syntax and create table syntax should come from this. :-) Look at my work on bug 277617 -- those methods that I moved are the various methods of Bugzilla::DB that need to be made cross-platform. As far as where the code goes, I think I may have not been totally clear. :-) There is a new structure in Bugzilla: Bugzilla::DB::Mysql and Bugzilla::DB::Pg -- these contain the Bugzilla-specific cross-db compatibility code. They work really well, they're very cool. They are created in bug 237862. The schema stuff should be a part of those modules, because that way we can get the information out of $dbh.
(In reply to comment #52) > Is it feasible to make DBIx:DBSchema a prequisite for Bugzilla? > > http://search.cpan.org/~ivan/DBIx-DBSchema-0.23/DBSchema.pm Well, the only problem with that is it limits us to supporting, for all of time, only the databases that DBIx::DBSchema supports. At this time, that's Mysql and Postgres. So that's great. :-) But I think we do want to support Oracle at some point, too, and I know that there is a small camp who want Sybase support. Really, I'm OK with adding more prerequisites to Bugzilla. And I'm happy to have code that other people have written and tested. My most important goal with this abstract schema is to support cross-database installation. If you'll look at things like Bugzilla::DB::bz_change_field_def (in bug 277617), that's pretty much all that has to work. Any elegant, future-proof way of making that work is good by me. :-)
(In reply to comment #54) > As far as where the code goes, I think I may have not been totally clear. :-) > There is a new structure in Bugzilla: > > Bugzilla::DB::Mysql and Bugzilla::DB::Pg -- these contain the Bugzilla-specific > cross-db compatibility code. They work really well, they're very cool. They are > created in bug 237862. > > The schema stuff should be a part of those modules, because that way we can get > the information out of $dbh. Yeah, I've been following that bug, but I haven't looked at the code yet. I believe the Schema should be its own object class. I have no problems calling the class Bugzilla::DB::Schema though. But initializing and carrying around the whole database schema is a waste of cycles and memory. The only time the schema is used is at installation time really. I suppose we could add a method to Bugzilla::DB which instantiates a schema object and returns it, but I firmly believe the schema should be a separate object class.
(In reply to comment #56) > > The schema stuff should be a part of those modules, because that way we can get > > the information out of $dbh. > > Yeah, I've been following that bug, but I haven't looked at the code yet. I > believe the Schema should be its own object class. I have no problems calling > the class Bugzilla::DB::Schema though. But initializing and carrying around the > whole database schema is a waste of cycles and memory. The only time the schema > is used is at installation time really. I suppose we could add a method to > Bugzilla::DB which instantiates a schema object and returns it, but I firmly > believe the schema should be a separate object class. I second that. We need the schema when creating/upgrading the database (checksetup.pl) and we will need it for consistency checks (sanity_check.pl). It should be separate module, instatiated on request. Also, we should try to put to the DB specific modules as little information as possible, otherwise it will be a nightmare to keep it in sync. The idea of a module with DB schema description in some generic, DB agnostic way, and then DB specific functions mapping this schema to particular database is so far the best I have seen here. What I am missing there is just how to work with things like upgrades - do you still need all the conversion code we have in checksetup.pl now, or can we somehow automagically detect that the DB schema and actual DB are different and fix the differences? Maybe a DB version number stored somewhere in the database would help here?
(In reply to comment #56) > But initializing and carrying around the > whole database schema is a waste of cycles and memory. The only time the schema > is used is at installation time really. Yeah, OK, I agree with that. > I suppose we could add a method to > Bugzilla::DB which instantiates a schema object and returns it, but I firmly > believe the schema should be a separate object class. Hrm. Well, the reason that I like the idea of having it in the Bugzilla::DB subclasses is that's already our structure for determining which database we're running on. That is, it's a very elegant OO solution that "just works." Basically, it picks the right module based on the $db_driver in localconfig. I'd hate to duplicate that code. I think it's definitely not a good idea to init the schema unless we ask for it, though. So perhaps users could get at the schema with $dbh->bz_schema->method, and then we'd only init the schema if we were using it, and we'd already know (even if the schema code was in another class) what database we were using. Oh, and another small note re: DBIx::DBSchema: it's a bit concerning that the most recent version of DBIx::DBSchema on CPAN is from 23 October 2003.
(In reply to comment #57) > do you still need all the conversion code we have in > checksetup.pl now, or can we somehow automagically detect that the DB schema and > actual DB are different and fix the differences? Maybe a DB version number > stored somewhere in the database would help here? I've thought about this for quite some time, myself. :-) Unfortunately, such a tool is not possible. We need checksetup to do various different things when the schema changes, not just change the schema itself. Sometimes we have to do migration and then a schema change, a schema change and then a migration, or a mixture of migration code and schema changes in complex sequences. Although some schema changes are simple and don't involve migration code, most changes do.
(In reply to comment #59) > (In reply to comment #57) > > do you still need all the conversion code we have in > > checksetup.pl now, or can we somehow automagically detect that the DB schema and > > actual DB are different and fix the differences? Maybe a DB version number > > stored somewhere in the database would help here? > > I've thought about this for quite some time, myself. :-) Unfortunately, such a > tool is not possible. We need checksetup to do various different things when the > schema changes, not just change the schema itself. Sometimes we have to do > migration and then a schema change, a schema change and then a migration, or a > mixture of migration code and schema changes in complex sequences. > > Although some schema changes are simple and don't involve migration code, most > changes do. Yep, I was already afraid that's the case, I was just hoping someone would have that magical wand :-). Which leaves us with a schema usable for new installations and (maybe) consistency checks. Upgrade code will be hardcoded in checksetup as it is now and will be another place with DB schema information. Although I don't like duplicating information all around the code, this seems to be acceptable (as it correspond to current implementation :-) ). So what we basically needs is: 1) set of DB independent functions for inspecting and modifying existing database exposed to checksetup.pl - bug 277617 2) DB independent function for creating a new database, which will take DB schema as a parameter (in DB agnostic format), or set of functions and then do the DB structure parsing in checksetup, based on how complex and DB dependant it is. 3) modify sanity_check to use the schema module (low priority ATM). Any takers to implement 2? :-) (considering the patch at this bug has already pretty good DB agnostic description).
(In reply to comment #58) > Hrm. Well, the reason that I like the idea of having it in the Bugzilla::DB > subclasses is that's already our structure for determining which database we're > running on. That is, it's a very elegant OO solution that "just works." > Basically, it picks the right module based on the $db_driver in localconfig. I'd > hate to duplicate that code. It's hardly rocket science. The current schema class constructor already does mostly that, except for the $db_driver part. But if we just use Bugzilla::DB to instantiate the Schema object, then that simplifies things further. > I think it's definitely not a good idea to init the schema unless we ask for > it, though. So perhaps users could get at the schema with > $dbh->bz_schema->method, and then we'd only init the schema if we were using it, > and we'd already know (even if the schema code was in another class) what > database we were using. Yes, I like that idea very much. On the first call to the bz_schema() method, the schema object would be instantiated and stored as class data. Any subsequent call would simply return the schema object. Personally, I would move all those bz_* methods you've added in bug 277617 to the Schema object class. Instead of $dbh->bz_add_field(), it would be $dbh->bz_schema->add_field(). I suppose that's a matter of taste though and there are pros and cons either way. > Oh, and another small note re: DBIx::DBSchema: it's a bit concerning that the > most recent version of DBIx::DBSchema on CPAN is from 23 October 2003. That's version 0.22. The current version is 0.23 which was released on 16 February 2004. Not much better, I suppose. Oh, it mostly already supports Sybase. But I'll probably stick with the current implementation, as I've suddenly started to feel kind of lazy. :-) If we were starting over from scratch though, I'd definitely go with DBIx::DBSchema or something similar.
(In reply to comment #60) > So what we basically needs is: > 1) set of DB independent functions for inspecting and modifying existing > database exposed to checksetup.pl - bug 277617 > 2) DB independent function for creating a new database, which will take DB > schema as a parameter (in DB agnostic format), or set of functions and then do > the DB structure parsing in checksetup, based on how complex and DB dependant it is. > 3) modify sanity_check to use the schema module (low priority ATM). > > Any takers to implement 2? :-) (considering the patch at this bug has already > pretty good DB agnostic description). We already mostly have 2. The ddl_statements() method returns the SQL statements used to create the database. All we need is a method that loops over those SQL statements, passes them to $dbh->do(), and checks for errors. So the question is should that go in Bugzilla::DB or Bugzilla::DB::Schema? I think it should go in the latter (along with bug 277617's methods for altering the database), but, like I wrote in a previous comment, there are pros and cons to doing it either way. The inspecting portion of 1 is a lot harder, I think. DBI has some methods for doing that stuff, but the level of support varies from DBMS to DBMS.
> Personally, I would move all those bz_* methods you've added in bug 277617 to > the Schema object class. Instead of $dbh->bz_add_field(), it would be > $dbh->bz_schema->add_field(). I suppose that's a matter of taste though and > there are pros and cons either way. Yeah, I was thinking about that, too. I was sort of waffling as to whether the Schema classes should just be a hidden implementation detail of Bugzilla::DB, or actually used by callers. Also, it's kind of nice to have all the functions that a caller needs to know about for the database in any way inside of one module. But if we can come up with some pros of having the functions in the Schema object, I definitely wouldn't object. > That's version 0.22. The current version is 0.23 which was released on 16 > February 2004. Not much better, I suppose. Oh, it mostly already supports > Sybase. But I'll probably stick with the current implementation, as I've > suddenly started to feel kind of lazy. :-) Hahaha, OK. :-) > If we were starting over from scratch though, I'd definitely go with > DBIx::DBSchema or something similar. Interesting. Yeah, I hear that there's a lot of interesting code in this area, like AnyDBD or things like that. But I agree, I think it's better to go with what we have. The Schema stuff that you've already designed is pretty cool. (In reply to comment #62) > We already mostly have 2. The ddl_statements() method returns the SQL > statements used to create the database. All we need is a method that loops > over those SQL statements, passes them to $dbh->do(), and checks for errors. Yeah. Which is totally great for creating the database, the first time. So I wouldn't object to using that method for that. Right now, most of checksetup is upgrade code, though. And I would be slightly concerned to have a database created using one code path, but upgraded using another code path. Of course, as long as they were all modular in some way, and it worked, I wouldn't object. :-) My goal right now is working PostgreSQL support in the installer, and anything that gets me that makes me happy. :-) > The inspecting portion of 1 is a lot harder, I think. DBI has some methods for > doing that stuff, but the level of support varies from DBMS to DBMS. Well, we can sort of create a generic interface for it. We can just parse the results of some custom commands (or use the DBI stuff, as you mention), and return a hash/array that gives us certain information about a table/column. We do need some way of generating cross-platform versions of all of the bug 277617 functions, though -- re-writing checksetup entirely to use a different method would delay cross-platform support for possibly six months or a year, and only provided that I could actually get enough time at my company to work that out, and then actually find somebody to review it. :-) The functions that we have now work, and they're all we need to implement it. If we have to fudge it for PostgreSQL support, I won't object. If we can get it elegant and perfect for PostgreSQL support, I also won't object. :-)
(In reply to comment #62) > We already mostly have 2. The ddl_statements() method returns the SQL statements > used to create the database. All we need is a method that loops over those SQL > statements, passes them to $dbh->do(), and checks for errors. I know, I have looked at the code (looks very good, btw :-) ). It just needs to be finished and brought up to date with the latest development on bug 237862 :-). > > So the question is should that go in Bugzilla::DB or Bugzilla::DB::Schema? I > think it should go in the latter (along with bug 277617's methods for altering > the database), but, like I wrote in a previous comment, there are pros and cons > to doing it either way. I would personally vote for the schema plus DB independent support functions to go to DB::Schema, DB independent code which may be used without the schema to Bugzilla::DB, and DB specific functions to Bugzilla::DB overrides (Mysql and Pg ATM). The idea behind is that if you need the schema (and methods to work with it), you include the DB::Schema module, but not otherwise (I think right now, only checksetup.pl will need to include the Schema module, maybe sanity_check later). > > The inspecting portion of 1 is a lot harder, I think. DBI has some methods for > doing that stuff, but the level of support varies from DBMS to DBMS. Yes, but as we can implement it separatelly on every DB through the DB specific modules, and do not have to rely on DBI (although we should use it whenewer possible), it should be doable.
You dont have to rewrite much of checksetup at all. Old installations will always be on MySQL. It is only new installations and upgrades from this point forward that need a compatibility layer.
Yeah. Actually (this is for another bug, though), I'm going to re-write checksetup to migrate as much as possible starting with 2.17.1, which was when Red Hat started their PostgreSQL fork. But I might also just write a separate script to handle those migrations.
(In reply to comment #63) > Also, it's kind of nice to have all the functions that a caller needs to know > about for the database in any way inside of one module. > On one hand, it's nice to have them together, on the other, it may be tempting then to use functions meant to be used for DB setup by runtime as well (I can see a code checking for column type somewhere in search.pm and it scares me). If we separate "DB management" and "DB use" into two modules, it's more obvious how they were meant to be used. > And I would be slightly concerned to have a database created using one code > path, but upgraded using another code path. Of course, as long as they were all > modular in some way, and it worked, I wouldn't object. :-) That worries me too, but is basically what we have now, only it's on two places in one module (checksetup.pl, if we forget about sanity_check for now). We are proposing to put it to two separate modules. Maybe if we move DB upgrade code from checksetup to schema module as well? That would be closer to what we have now. It would also mean that checksetup will be involved only in OS and perl checks, file permisions and stuff like that, which I personally sort of like :-). We can also add one sanity check at the end of the setup phase, which would compare the current database after upgrade with our schema, ensuring that we got by the upgrade what we need. > We do need some way of generating cross-platform versions of all of the bug > 277617 functions, though -- re-writing checksetup entirely to use a different > method would delay cross-platform support for possibly six months or a year, and > only provided that I could actually get enough time at my company to work that > out, and then actually find somebody to review it. :-) > > The functions that we have now work, and they're all we need to implement it. > If we have to fudge it for PostgreSQL support, I won't object. If we can get it > elegant and perfect for PostgreSQL support, I also won't object. :-) The major rule here is "small steps". We can easily move the functions from checksetup to the DB layer (that's what your current patch does), then we will have to make them DB agnostic, and then we can continue improving the interface. But redesigning it from scratch is very seldom a way to success...
(In reply to comment #63) > Yeah, I was thinking about that, too. > > I was sort of waffling as to whether the Schema classes should just be a > hidden implementation detail of Bugzilla::DB, or actually used by callers. > > Also, it's kind of nice to have all the functions that a caller needs to know > about for the database in any way inside of one module. Yes, that's the most compelling reason to keep those methods in Bugzilla::DB. > But if we can come up with some pros of having the functions in the Schema > object, I definitely wouldn't object. Well, the only argument I can come up with is one based on object design principles, but it's not holding water upon further thought. I was thinking that the methods that are schema-related obviously should be in the schema class. But while those methods for altering the database *utilize* the schema object, but they don't actually *act* on the schema object. So I think I've convinced myself that they belong in Bugzilla::DB. But whatever. > > We already mostly have 2. The ddl_statements() method returns the SQL > > statements used to create the database. All we need is a method that loops > > over those SQL statements, passes them to $dbh->do(), and checks for errors. > > Yeah. Which is totally great for creating the database, the first time. So I > wouldn't object to using that method for that. OK, so do we need a create_db() method in Bugzilla::DB or would just looping over the $dbh->bz_schema->ddl_statements in checksetup be sufficient? Since you're working on bug 27716, I'll leave that up to you, Max. > Right now, most of checksetup is upgrade code, though. > > And I would be slightly concerned to have a database created using one code > path, but upgraded using another code path. Of course, as long as they were all > modular in some way, and it worked, I wouldn't object. :-) I think that's always going to be a danger. The only solution that is 100% effective and perfectly cross-DBMS is to dump the database to text files, drop the database, recreate the database from scratch, and populate the new database using the database dump from before the upgrade. I suppose that wouldn't be a very popular method though.
Max, that brings us back to here, I think: (In reply to comment #52) > (In reply to comment #50) > > I have discovered that *all* I actually need, in order to make Bugzilla cross-DB > > compatible for installation, is something that maps field types to SQL. > > > So that part of this patch is actually all I need. :-) I just need to be able to > > put in a variable in all of the "ChangeFieldType" calls and so forth in > > checksetup.pl. > > > > Could you re-submit a patch that does just that? :-) (Also, I'm now a real > > reviewer, so I can review your patch on this.) > > Give an example of the arguments you would pass to such a method and what you > think the return values of the method should be for those arguments. How would you be using this method exactly? A usage example would be very helpful in implementing what you want.
(In reply to comment #68) > So I think I've convinced myself that they belong in Bugzilla::DB. But > whatever. OK. They are there now (with my bug), so that sounds good to me. :-) > OK, so do we need a create_db() method in Bugzilla::DB or would just looping > over the $dbh->bz_schema->ddl_statements in checksetup be sufficient? Since > you're working on bug 27716, I'll leave that up to you, Max. I think a create_db() method would be great. The more code that I can move out of checksetup, the happier I am. > [snip] I suppose that wouldn't be a very popular method though. Hahahaha, yeah. That certainly wouldn't be very popular. :-) (In reply to comment #69) > Max, that brings us back to here, I think: > [snip] > How would you be using this method exactly? A usage example would be very > helpful in implementing what you want. OK, here's some checksetup code that exists, now (in bug 277617 form): > $dbh->bz_add_field('products', 'disallownew', 'tinyint not null'); That's the simplest -- me adding a field to the products table, called disallownew, with that type. Everything but the type is already cross-DB, there. > if ($dbh->bz_get_field_def('products', 'product')) { > $dbh->bz_change_field_type ('bugs', 'product', 'varchar(64) not null'); There's something a little more complicated -- we check for the existence of a field, and if it exists, we redefine some fields. Basically, we see if the schema is old, and we change it to the new schema. When it gets more complicated than this, we: (1) Check for the existence of a field (2) Do some migration code (3) Change a field definition (4) Possibly repeat 2/4 a few times That's the general idea. In the same way, I need to be able to add indexes, drop indexes, and check if tables exist. For the most part, functions for all of these things already exist, and are handled by bug 277617. If, as I work on checksetup, I discover I need another function, that can definitely be arranged. :-)
Oh, and as far as return values, look at the pod docs for bz_get_field_def and bz_get_index_def in bug 277617.
> OK, here's some checksetup code that exists, now (in bug 277617 form): > > > $dbh->bz_add_field('products', 'disallownew', 'tinyint not null'); <snip> What I would also like to see here, and that seems to fit into the scope of this bug, is some abstraction of DB types, probably using constants. So the line above would become something like: $dbh->bz_add_field('products', 'disallownew', 'BOOLEAN NOT NULL'); or something in that sense. As we don't want to use DB specific type for DB schema, we do not want to use them for upgrade code also...
(In reply to comment #72) > > OK, here's some checksetup code that exists, now (in bug 277617 form): > > > > > $dbh->bz_add_field('products', 'disallownew', 'tinyint not null'); > > <snip> > > What I would also like to see here, and that seems to fit into the scope of this > bug, is some abstraction of DB types, probably using constants. So the line > above would become something like: > > $dbh->bz_add_field('products', 'disallownew', 'BOOLEAN NOT NULL'); > > or something in that sense. As we don't want to use DB specific type for DB > schema, we do not want to use them for upgrade code also... Just to clarify this further, that means that the we need something like _adjust_schema method in this patch which will be generalised to be able to map abstract DB type to DB specific type on demand, and it will be used internally by all these functions which takes types as arguments.
(In reply to comment #73) > (In reply to comment #72) > > > OK, here's some checksetup code that exists, now (in bug 277617 form): > > > > > > > $dbh->bz_add_field('products', 'disallownew', 'tinyint not null'); Max, how about if we make it $dbh->bz_add_field('products','disallownew') and then change bz_add_field() to look up the type via the Schema object? Otherwise, the code will end up looking like this: $dbh->bz_add_field('products','disallownew', $dbh->bz_schema- >get_table_column_type('products','disallownew')) which seems awfully redundant. Everyone happy with the method name get_table_column_type()? It's kind of verbose... If so, a patch will be forthcoming which adds this to the DB::Schema class.
(In reply to comment #74) > (In reply to comment #73) > > (In reply to comment #72) > > > > OK, here's some checksetup code that exists, now (in bug 277617 form): > > > > > > > > > $dbh->bz_add_field('products', 'disallownew', 'tinyint not null'); > > Max, how about if we make it > > $dbh->bz_add_field('products','disallownew') > > and then change bz_add_field() to look up the type via the Schema object? No, unfortunatelly you can't do that. If you happen to be upgrading from really old BZ installation to a new one, and it happens that some column changed type twice (e.g. text->varchar->blob), you would immediatelly convert to the last type via the first call, possibly breaking the upgrade path. > > Otherwise, the code will end up looking like this: > > $dbh->bz_add_field('products','disallownew', $dbh->bz_schema- > >get_table_column_type('products','disallownew')) > > which seems awfully redundant. > > Everyone happy with the method name get_table_column_type()? It's kind of verbose... If so, a patch will > be forthcoming which adds this to the DB::Schema class. Which is exactly what I was targeting by requesting the functions to take abstract types and doing the conversion internally. You will call it e.g.: $dbh->bz_add_field('products', 'disallownew', BOOLEAN); and the bz_add_field function will look up what type it should use for BOOLEAN on e.g. MySQL. Sorry, but as we already agreed, the schema knowledge just has to be on two places, one for upgrade, and one for new install.
(In reply to comment #74) > $dbh->bz_add_field('products','disallownew') Believe it or not, sometimes I may need to change a field to something that isn't its current definition, and then change it again. Migration is nuts.
(In reply to comment #75) > No, unfortunatelly you can't do that. If you happen to be upgrading from really > old BZ installation to a new one, and it happens that some column changed type > twice (e.g. text->varchar->blob), you would immediatelly convert to the last > type via the first call, possibly breaking the upgrade path. Tom, I think you're getting bz_add_field() mixed up with bz_change_field_type(). What you are describing is not a problem with bz_add_field(), but it is definitely a problem with how bz_change_field_type() is utilized. I agree there needs to be a Schema method for converting the DB- generic types like BOOLEAN to its corresponding DB-specific type. I'll implement that as well. Any suggestions for what to name that method?
(In reply to comment #77) Just to clarify: I think I may also need to sometimes add fields that don't have their modern definition, and then change them later in the code.
Alias: bz-dbschema
(In reply to comment #77) > Any suggestions for what to name that method? sql_column_type, maybe? That would be consistent.
(In reply to comment #77) > Tom, I think you're getting bz_add_field() mixed up with bz_change_field_type(). What you are > describing is not a problem with bz_add_field(), but it is definitely a problem with how > bz_change_field_type() is utilized. Yes, I was talking more about change_field_type than add_field, but as Max already pointed out, the same holds true for add_field as well. You can have cases where you add a field with certain type and change it later (and you can't just add the field with the new type). Yes, DB upgrading sucks :-). (In reply to comment #79) > sql_column_type, maybe? That would be consistent. Where are we going to use this? I would almost say it should be private, as it should be called only by other Bugzilla::DB functions, am I right here?
(In reply to comment #78) > Just to clarify: I think I may also need to sometimes add fields that don't > have their modern definition, and then change them later in the code. Ugh. That seems kind of nuts, if you ask me. Kind of defeats the whole purpose of the Schema class! I suppose that's the only feasible way to implement upgrading the database from really old versions to the current version. Well, I'll make one last suggestion: If bz_add_field() is passed only two arguments, I think it should still look up the (current) type via Schema. If bz_add_field() is passed three arguments, then it could use that third argument as the type of the column to be added. What do you think?
Alias: bz-dbschema
(In reply to comment #81) > Well, I'll make one last suggestion: If bz_add_field() is passed only two > arguments, I think it should still look up the (current) type via Schema. If > bz_add_field() is passed three arguments, then it could use that third argument > as the type of the column to be added. What do you think? It's a fine idea, but checksetup is "time-based" code. That is, at a certain time in the history of Bugzilla I want to change a field to a specific type, and then later I want to change it to another specific type. checksetup code is supposed to be "write once, ignore forever." :-)
(In reply to comment #81) > (In reply to comment #78) > > Just to clarify: I think I may also need to sometimes add fields that don't > > have their modern definition, and then change them later in the code. > > Ugh. That seems kind of nuts, if you ask me. Kind of defeats the whole purpose of the Schema class! I > suppose that's the only feasible way to implement upgrading the database from really old versions to > the current version. Yep, it's nuts, but that's life :-). As it is now, schema will be used ONLY for new installs (and maybe consistency checks etc.), but not for upgrades. > Well, I'll make one last suggestion: If bz_add_field() is passed only two arguments, I > think it should still look up the (current) type via Schema. If bz_add_field() is passed three arguments, > then it could use that third argument as the type of the column to be added. What do you think? No. bz_add_field() should be used only by checksetup, nothing else. If we encourage (or even allow) use with two parameters, it will cause trouble later on when you find you really need to change that type. Think about the whole upgrade process as a journal, or DB transaction - you are always only adding new conversion statements, but never modify or remove what's already there. That's the only way to guarantee upgrade from anything to latest version. If you allow to use Schema for some of the calls, you will effectively change it's parameters when the schema changes, which means breaking the transaction record and (possibly) breaking the upgrade...
Attached patch V3 (obsolete) (deleted) — Splinter Review
New patch which adds sql_type() method, which I think addresses our recent discussion with regards to supporting abstract data types in checksetup.pl. This patch also addresses comment #49 by assuming that Mysql will always need to create a FULLTEXT index whenever one of the fields being indexed is of type text. I just noticed that checksetup.pl and the Mysql schema generated by this object class exhibit a fair number of differences. Most of the Mysql integer data types are "mediumint", but the Schema object has "integer". Some of the "text" fields are "mediumtext" in checksetup.pl. Do these differences need to be addressed?
Attachment #168834 - Attachment is obsolete: true
(In reply to comment #84) > I just noticed that checksetup.pl and the Mysql schema generated by this object > class exhibit a fair number of differences. Most of the Mysql integer data > types are "mediumint", but the Schema object has "integer". Some of the "text" > fields are "mediumtext" in checksetup.pl. Do these differences need to be > addressed? Yeah. Unfortunately, MySQL will either refuse to do, or fail to use correct indexes for, or at least spit out a warning about, comparing integer fields of different sizes. So what is now a mediumint needs to stay a mediumint. If I could go back in time, yes, I'd almost definitely change all the "integer" fields to something more sane. But for now, if it's "mediumint" we need a way to keep it "mediumint." For PostgreSQL, I don't care, as long as it's all consistent. :-) If we can keep all the integer types consistent for Pg, then I'm really happy. :-) Perhaps later, we can put in a checksetup change to standardize all integer fields to the same type. In fact, if somebody would file a bug for that and assign it to me, I'd be happy to do it (after this patch, here).
Comment on attachment 173714 [details] [diff] [review] V3 As always, this is very well-designed. :-) >+sub initialize { We usually call this sort of thing _create, in Bugzilla modules. I suppose it's not technically private, here, but "protected" instead. Still, if it's not meant to be used by callers, I'd give it an underscore. Or some other convention. >+ my($self,$tname,$iname,$ifields,$itype) = @_; Bugzilla style is usually to put a space between those. Also, use more descriptive variable names. We don't object to having longer_names, for Bugzilla. :-) That would make it a lot clearer to people who aren't you and I what this code is doing. :-) >diff -rNu Bugzilla.HEAD/DB/Schema/Pg.pm Bugzilla/DB/Schema/Pg.pm > [snip] >+ $self->{db_extras} = As I've mentioned before, I would prefer if somehow this could be done by the create_db() function (or whatever it will be called) when it runs into a "timestamp" type. Of course, we're also trying to eliminate all timestamp fields, so we'll probably get to that before we actually go live with this code. (Still, I'd rather that not block this patch.) It's only a nit, though, and since this *works*, I'd be happy to check this part in as-is. >diff -rNu Bugzilla.HEAD/DB/Schema/Sybase.pm Bugzilla/DB/Schema/Sybase.pm Wow! :-) Cool. :-) Let's keep Sybase out of this patch for now, though. We can add it in later, for the Sybase bug, or in some other bug. >+ $self->{end_of_statement} = "\ngo\n"; Oh, I kept forgetting to mention this -- DBI is supposed to handle this. We don't need to handle this. At the least, this should be in Bugzilla::DB, not here. >--- Bugzilla.HEAD/DB/Schema.pm Wed Dec 31 19:00:00 1969 >+++ Bugzilla/DB/Schema.pm Tue Feb 8 02:31:30 2005 >+use strict; >+use Bugzilla::Config qw(:db); You shouldn't need that information here, technically, because Bugzilla::DB will always be your caller, and will supply you with something useful. Perhaps new() should just take an $dbh handle created by Bugzilla::DB, or something (just idle thoughts). >+ my $this = shift; >+ my $class = ref($this) || $this; >+ my $driver = @_ ? shift : $db_driver; Yeah, this is what we shouldn't be doing here. (It has to do with the fact that every time we call $db_driver, we have to do error-checking to output sensible error messages to the user, and I'd like to keep that all in the Buzilla::DB framework.) I think that we can get rid of the mechanism, and expect the caller to always know what type of DB they will be instantiating on (since the caller will always be Bugzilla::DB, in our case). >+sub _adjust_schema { >+ # PRIVATE method to alter the abstract schema at instantiation-time to be >+ # DB-specific. It's a generic enough routine that it can be defined here in >+ # the base class. Subclasses may override or extend this method if needed. >+ >+ my $self = shift; >+ my $db_specific = $self->{db_specific}; >+ foreach my $table (keys %{ $self->{schema} }) { >+ my %fields = (@{ $self->{schema}{$table}{FIELDS} }); >+ foreach my $field_def (values %fields) { >+ if (exists($db_specific->{$field_def->{TYPE}})) { >+ $field_def->{TYPE} = $db_specific->{$field_def->{TYPE}}; >+ } >+ if (exists($field_def->{DEFAULT}) >+ && exists($db_specific->{$field_def->{DEFAULT}})) { >+ $field_def->{DEFAULT} = $db_specific->{$field_def->{DEFAULT}}; As I mentioned before, this needs extensive line-by-line comments. >+sub ddl_statements { >+ # PUBLIC method to generate the SQL statements needed to create the >+ # the Bugzilla database's tables and indexes. Subclasses may override or >+ # extend this method if needed. >+ # Returns an array of strings containing the SQL statements. Hrm... I'd rather that this were private, and a create_db() call was public. Then we could just call create_db() in checksetup, and eliminate a lot of nonsense from that file. :-) Look at the part of checksetup.pl that starts with: >######################################################################## ># Create tables >######################################################################## A lot of that can be moved to inside this class, in a create_db() function. So generally, instead, what should happen is a lot of looping $dbh->do statements. If that shows up as Bugzilla::DB::bz_create_db(), I'm totally fine with that. If it shows up here, because it requires intimate knowledge of the internals of the Schema, I'm also OK with that. >+ my $self = shift; >+ my @ddl = (); >+ >+ foreach my $tname (keys %{ $self->{schema} }) { >+ my $thash = $self->{schema}{$tname}; Another point where longer_variable_names would help. >+ ThrowCodeError("According to the database schema, the table $table does not contain a field named $field.") unless ($finfo); This line is a bit too long, so it'll just need to be cut up into some appended strings. >+ my $type = $finfo->{TYPE}; >+ my $default = $finfo->{DEFAULT}; >+ my $fkref = $self->{no_references} ? undef : $finfo->{REFERENCES}; I could use a comment for that line. :-) >+ my $sql = $type; >+ $sql .= " NOT NULL" if ($finfo->{NOTNULL}); >+ $sql .= " DEFAULT $default" if (defined($default)); >+ $sql .= " PRIMARY KEY" if ($finfo->{PRIMARYKEY}); >+ $sql .= "\n\t\t\t\tREFERENCES $fkref" if $fkref; That REFERENCES has to not be in the code for MySQL 3. Do we handle that? >+sub sql_type ($) { >+ # PUBLIC method to convert abstract (database-generic) field types to >+ # database-specific field types. >+ >+ my($self,$type) = @_; >+ >+ return($self->{db_specific}{$type} || $type); And this looks perfect. :-) I wouldn't object to having it in Bugzilla::DB, too. OK, here's what we'll actually need in order for this to get checked-in: (1) Important comments addressed (2) Actual use in the codebase inside of this patch. That is, the "create tables" section of checksetup should be replaced with a call to bz_create_db() or bz_create_tables() or whatever you want to call it. :-) We will then need to do regression testing -- I can do this, because I have access to landfill, where we have a lot of old Bugzilla DBs. (3) Full integration into the Bugzilla::DB structure (which will be required for 2 above, anyway).
Attachment #173714 - Flags: review-
Attached patch V4 (obsolete) (deleted) — Splinter Review
(In reply to comment #86) > >+sub initialize { > > We usually call this sort of thing _create, in Bugzilla modules. I suppose > it's not technically private, here, but "protected" instead. Still, if it's not > meant to be used by callers, I'd give it an underscore. Or some other > convention. Right. Renamed method to _initialize. > >+ my($self,$tname,$iname,$ifields,$itype) = @_; > > Bugzilla style is usually to put a space between those. OK. > Also, use more descriptive variable names. We don't object to having > longer_names, for Bugzilla. :-) That would make it a lot clearer to people who > aren't you and I what this code is doing. :-) OK. > >diff -rNu Bugzilla.HEAD/DB/Schema/Pg.pm Bugzilla/DB/Schema/Pg.pm > > [snip] > >+ $self->{db_extras} = > > As I've mentioned before, I would prefer if somehow this could be done by the > create_db() function (or whatever it will be called) when it runs into a > "timestamp" type. Of course, we're also trying to eliminate all timestamp > fields, so we'll probably get to that before we actually go live with this > code. (Still, I'd rather that not block this patch.) > > It's only a nit, though, and since this *works*, I'd be happy to check this > part in as-is. I don't really understand what you want here. If you want me to delete both of these "CREATE RULE" statements from the Pg class, just say the word. Or is the whole db_extras thing that you don't like? > Let's keep Sybase out of this patch for now, though. We can add it in later, > for the Sybase bug, or in some other bug. Well, I expected that. > >+ $self->{end_of_statement} = "\ngo\n"; > > Oh, I kept forgetting to mention this -- DBI is supposed to handle this. We > don't need to handle this. At the least, this should be in Bugzilla::DB, not > here. Yes, you're absolutely right. This has been removed. > >+use strict; > >+use Bugzilla::Config qw(:db); > > You shouldn't need that information here, technically, because Bugzilla::DB > will always be your caller, and will supply you with something useful. > > Perhaps new() should just take an $dbh handle created by Bugzilla::DB, or > something (just idle thoughts). > > >+ my $this = shift; > >+ my $class = ref($this) || $this; > >+ my $driver = @_ ? shift : $db_driver; > > Yeah, this is what we shouldn't be doing here. (It has to do with the fact > that every time we call $db_driver, we have to do error-checking to output > sensible error messages to the user, and I'd like to keep that all in the > Buzilla::DB framework.) It doesn't make the constructor any shorter or simpler to remove it, but OK. > I think that we can get rid of the mechanism, and expect the caller to always > know what type of DB they will be instantiating on (since the caller will > always be Bugzilla::DB, in our case). Yeah, sure. > >+sub _adjust_schema { > As I mentioned before, this needs extensive line-by-line comments. Done. > >+sub ddl_statements { > >+ # PUBLIC method to generate the SQL statements needed to create the > >+ # the Bugzilla database's tables and indexes. Subclasses may override or > >+ # extend this method if needed. > >+ # Returns an array of strings containing the SQL statements. > > Hrm... I'd rather that this were private, and a create_db() call was public. > Then we could just call create_db() in checksetup, and eliminate a lot of > nonsense from that file. :-) I thought we decided to put create_db() in Bugzilla::DB? Aside: Looking at checksetup.pl, I see it's creating the tables with "TYPE = MYISAM" appended to all of the "CREATE TABLE" statements. The current implmentation of the Schema class does not do this. Should it? > >+ ThrowCodeError("According to the database schema, the table $table does not contain a field named $field.") unless ($finfo); > > This line is a bit too long, so it'll just need to be cut up into some > appended strings. Done. > >+ my $type = $finfo->{TYPE}; > >+ my $default = $finfo->{DEFAULT}; > >+ my $fkref = $self->{no_references} ? undef : $finfo->{REFERENCES}; > > I could use a comment for that line. :-) I basically just put that in there so that subclasses can eliminate references without overriding the whole method. I'm still thinking it over. > That REFERENCES has to not be in the code for MySQL 3. Do we handle that? Not currently. I could add a Mysql3.pm subclass which ISA Mysql with the only difference being that it's _initialize method sets $self->{no_references} to true after calling SUPER::_initialize. It would be up to Bugzilla::DB to instantiate either the Mysql or Mysql3 schema. What do you think? > OK, here's what we'll actually need in order for this to get checked-in: > > (1) Important comments addressed Mostly done?
Attachment #173714 - Attachment is obsolete: true
(In reply to comment #87) > I don't really understand what you want here. If you want me to delete both of > these "CREATE RULE" statements from the Pg class, just say the word. Nah. Don't change anything for now. The CREATE RULE statements will eventually disappear anyway, when the timestamp fields disappear. > I thought we decided to put create_db() in Bugzilla::DB? Oh yeah, we did. You can add it there, though, as a part of this patch. (At least, after the required patch is checked-in.) > Aside: Looking at checksetup.pl, I see it's creating the tables with "TYPE = > MYISAM" appended to all of the "CREATE TABLE" statements. The current > implmentation of the Schema class does not do this. Should it? Yes, it definitely should. That was a bugfix; I forget the bug number. > I basically just put that in there so that subclasses can eliminate references > without overriding the whole method. I'm still thinking it over. OK. It needs to be documented somewhere, then, if you keep it. > > That REFERENCES has to not be in the code for MySQL 3. Do we handle that? > > Not currently. I could add a Mysql3.pm subclass which ISA Mysql with the only > difference being that it's _initialize method sets $self->{no_references} to > true after calling SUPER::_initialize. It would be up to Bugzilla::DB to > instantiate either the Mysql or Mysql3 schema. What do you think? Instead, there should be a server_version sub in Bugzilla::DB that will tell you the information you need inside of Schema::Mysql to disable references or not. > Mostly done? I'll check the new patch itself over more, later. :-) It does sound like it's mostly done, though. :-)
(In reply to comment #88) > Nah. Don't change anything for now. The CREATE RULE statements will eventually > disappear anyway, when the timestamp fields disappear. > They already did, last patch was checked in today. There shouldn't be any column with timestamp type anymore.
Oh, I just saw this: Take a look at DBI::type_info_all and DBI::type_info. I think it could be really useful for this module, perhaps... Perhaps we could override type_info for types that it doesn't support or that we want to change.
Attached patch V5 (obsolete) (deleted) — Splinter Review
I went through all the table definitions in checksetup.pl and fixed any discrepancies with the abstract schema. There were a lot of changes made to the schema since Andrew first created the abstract schema! Please look over these changes. This version should generate mediumint and mediumtext fields in all the same places that the current checksetup.pl schema does. I also appended "TYPE = MYISAM" to all MySQL CREATE TABLE statements. I removed the CREATE RULE statements from the Pg schema, as per comment #89.
Attachment #173835 - Attachment is obsolete: true
Comment on attachment 174459 [details] [diff] [review] V5 Great! This actually basically looks good. It needs some POD docs, though. And before I can actually review+ it, I need to see some use of it or other. Perhaps a create_db function in the new Bugzilla::DB structure. And also patch the new Bugzilla::DB structure to know about the Schema objects. But this part of it looks fundamentally good.
Attachment #174459 - Flags: review-
And note that there have been some schema checkins since this patch was posted, too: http://bonsai.mozilla.org/cvslog.cgi?file=mozilla/webtools/bugzilla/checksetup.pl cryptpassword varchar(128) is the only one that I see, but there might have been others before that, that I'm not catching.
FWIW, when you are at that :-), I would also like to see some more integration with the db compat modules. I think we don't need new modules for DB specific code, you can now add it to the existing ones. We can also remove the driver dependent instantiation from here to avoid duplicating effort. I would probably go for a function in the DB.pm module to return/create the db, which would get the generic schema from either DB.pm, or, preferably, from Schema.pm, and then call DB specific (abstract) method to adjust it with DB specific rules. How does that sound? :-) Otherwise I really like the direction this patch is going...
(In reply to comment #94) > FWIW, when you are at that :-), I would also like to see some more integration > with the db compat modules. There will be. However, the Schema objects should stay as they are. They will be objects privately held by the Bugzilla::DB object, and accessed that way. Nobody outside of Bugzilla::DB will ever know about Schema. > I would probably go for a function in the DB.pm module to return/create the db, Already planned. :-) That's what I just mentioned in my review. :-)
Comment on attachment 174459 [details] [diff] [review] V5 >+} #eosub--new >+#------------------------------------------------------------------------------ Oh, and don't do those -- it's not standard Bugzilla style.
(In reply to comment #96) > (From update of attachment 174459 [details] [diff] [review] [edit]) > >+} #eosub--new > >+#------------------------------------------------------------------------------ > > Oh, and don't do those -- it's not standard Bugzilla style. > It should be.
(In reply to comment #97) > > Oh, and don't do those -- it's not standard Bugzilla style. > > > > It should be. It won't be, though. It clutters the space and nobody will ever remember to always do it. :-) It's a nice thing, particularly when reading in a highlighting editor, but it's just not what we do at this point. I prefer to be as consistent as possible with the current standards.
Actually, I thought about it, and it does make things more readable. But make the lines 70 characters, not 80, so we can quote them more easily.
(In reply to comment #99) > Actually, I thought about it, and it does make things more readable. > But make the lines 70 characters, not 80, so we can quote them more easily. I should clarify. :-) I've decided I like the #---- things. But make them 70 characters long instead of 80 characters long. :-)
As stated at bug 283076, we need to specify precision of dates for Postgres to be zero to get rid of the fraction. Also, you will need to add "without time zone" to Postgres date types. See the (outdated) pgsetup.pl at bug 98304 (https://bugzilla.mozilla.org/attachment.cgi?id=130925) for details.
Ed: All I need for this patch to get review+ is to have a section that adds a "bz_schema" object to the Bugzilla::DB instance. Then I can code the actual installation code.
Alias: bz-dbschema
Blocks: 284348
(In reply to comment #102) > Ed: All I need for this patch to get review+ is to have a section that adds a > "bz_schema" object to the Bugzilla::DB instance. > > Then I can code the actual installation code. I'll post a new patch in a few days, Max.
OK. Feature freeze for 2.20 is March 15, so I've been hacking away at checksetup pretty rapidly. I might get around to it first, so we'll see.
OK, now that I'm actually implementing this for bug 284348, I've realized that I need get_table_ddl, and I don't need ddl_statements. It needs to be the Bugzilla::DB class that runs the check for the tables -- because I don't always have to create *all* the tables; I just need to create some of them, sometimes. So get_table_ddl should become public, and ddl_statements should be removed.
To give you an idea, here's what I'm working on, now: sub bz_setup_database { my ($self) = @_; # Get a list of the existing tables (if any) in the database my $table_sth = $self->table_info(undef, undef, undef, "TABLE"); my @tables = @{$self->selectcol_arrayref($table_sth, { Columns => [3] })}; # go through our %table hash and create missing tables while (my ($table_name) = each %table) { next if grep /^$table_name$/, @tables; print "Creating table $tabname ...\n"; $self->do($self->_bz_schema->get_table_ddl($table_name)) or die "Could not create table '$tabname'. " . "Please check your " . $self->PROGRAM_NAME . " access rights.\n"; }
(In reply to comment #105) > OK, now that I'm actually implementing this for bug 284348, I've realized that I > need get_table_ddl, and I don't need ddl_statements. > > It needs to be the Bugzilla::DB class that runs the check for the tables -- > because I don't always have to create *all* the tables; I just need to create > some of them, sometimes. > > So get_table_ddl should become public, and ddl_statements should be removed. Surely you need both? I'm thinking of the case where the database is created from scratch (fresh installation). Or at least a method to get a list of all tables in the schema. The latter sounds like it could be useful in any case.
Well, actually, when I create all the tables in the database, I do it in the loop above. So it's Bugzilla::DB that needs to do the loop. There's actually never a time when I say "create this database from scratch" -- I just say, "create for me any tables that don't already exist."
Also, I just realized that the patch has a bunch of tabs in it. Use four spaces instead of tabs.
(In reply to comment #108) > Well, actually, when I create all the tables in the database, I do it in the > loop above. So it's Bugzilla::DB that needs to do the loop. > > There's actually never a time when I say "create this database from scratch" -- > I just say, "create for me any tables that don't already exist." OK, but what is %table and where does it come from in the code snippet above? I maintain that the list of tables should come from Bugzilla::DB::Schema, so I think there should be a get_tables() method (or get_table_list() method?) to supplement the requested get_table_ddl() method.
Attached patch v6 (Up-to-date schema, 2005-03-01) (obsolete) (deleted) — Splinter Review
OK, this schema is up-to-date as of today with exactly what checksetup creates. This is the file I'm using to work on bug 284348, where I've tested it and I know that it works. There are a few things that need to be addressed, though, even with this patch: + The table definitions need to be in a consistent order. Why are they ordered like they are now? If there's some reason, put a comment in the file. In any case, put a comment in the file noting how the table definitions should be ordered, so they don't become chaotic. + There are tabs in the file, and there's also strange spacing. All indents should be four characters, except where something else would be clearer. The table definitions should try to have all their "=>" aligned in each table. + I've left a few XXX comments -- they should be addressed. + Index names: Right now, we technically *do* have index names in MySQL. They are named after the first column in the index. I've changed a few of the existing ones to match that, just so that NO changes whatsoever happen to the schema. If we do decide to go this way, then the other ones also need to be changed. When they are created by _get_index_ddl, they can have their name changed to table_name_$name_idx, if we want. I'm not sure if we should do that only for non-MySQL databases, or for all databases. If Bugzilla really does NOT currently depend on index names, then we can change them. However, we should also then put code in checksetup to drop all the current indexes, or to rename them to the proper names, so that in the future we *can* use the index names if we want.
Attachment #174459 - Attachment is obsolete: true
Oh, and yes, I'd love a get_table_list method. :-) That would be great! :-)
(In reply to comment #111) > + The table definitions need to be in a consistent order. Why are they > ordered like they are now? There's no reason for the order as far as I know. It's in the same order as when Andrew Dunstan created the abstract schema in his initial prototype. What order do you prefer? > + There are tabs in the file, and there's also strange spacing. All indents > should be four characters, except where something else would be clearer. I'll get rid of the tabs. > The table definitions should try to have all their "=>" aligned in each table. Fine. > + I've left a few XXX comments -- they should be addressed. The FULLTEXT issue is already addressed in the Mysql Schema module, dude. Nothing needs to be done here. The DISPLAYWIDTH attributes are not used. I think Andrew intended them for the future, but they can be removed. > + Index names: Right now, we technically *do* have index names in MySQL. They > are named after the first column in the index. I've changed a few of the > existing ones to match that, just so that NO changes whatsoever happen to the > schema. If we do decide to go this way, then the other ones also need to be > changed. > When they are created by _get_index_ddl, they can have their name changed > to table_name_$name_idx, if we want. I'm not sure if we should do that only for > non-MySQL databases, or for all databases. > If Bugzilla really does NOT currently depend on index names, then we can > change them. However, we should also then put code in checksetup to drop all > the current indexes, or to rename them to the proper names, so that in the > future we *can* use the index names if we want. I disagree with this change. First off, I am extremely doubtful that Bugzilla depends on the index names. Secondly, the index naming scheme was discussed nearly a year ago and all comments indicate tacit approval of the naming scheme that has been employed. It was implicitly approved by Dave, I believe. If you want a specific index naming scheme for Mysql, then it can addressed in the Mysql module (perhaps in the _create_index_ddl method) or you rename existing indexes in checksetup.
Status: ASSIGNED → NEW
Hrm, I'm not sure what order I prefer. :-) I think that related tables should be grouped together. So all the groups tables should be in one place, and all the flags tables should be in another, and so on. Eventually I want to have POD docs for the schema, for each table. That's why I moved all the checksetup comments into the module, so that we can eventually convert them to POD docs. So the tables should be in some order where if we make the POD docs inline then related tables will be grouped together in the docs. As far as the index naming, that's OK, then. I talked to Dave, and he agrees that we should be using named indexes. So you can change back the things I changed -- my apologies on that, then! :-) I *thought* that we were doing something about the FULLTEXT stuff in the MySQL module. :-) My apologies. :-) I would like to *somehow* note that certain fields *ought* to have fulltext indexes, though, in the primary schema. The "Simple Bug Search" page depends on fulltext indexes, so even though we're not going to implement them in PostgreSQL for Bugzilla 2.20, it would be nice to note that they *ought* to be there *if possible* in every DB.
Status: NEW → ASSIGNED
Oh, by the way, you might want to check out my work so far in bug 284348, where I'm using (and very much enjoying) your module. I created the hooks in Bugzilla::DB, so you don't need to do that for this patch if you don't want. :-)
(In reply to comment #114) > I *thought* that we were doing something about the FULLTEXT stuff in the MySQL > module. :-) My apologies. :-) I would like to *somehow* note that certain fields > *ought* to have fulltext indexes, though, in the primary schema. The "Simple Bug > Search" page depends on fulltext indexes, so even though we're not going to > implement them in PostgreSQL for Bugzilla 2.20, it would be nice to note that > they *ought* to be there *if possible* in every DB. Yep, I concur to that. I would prefer to add some flag or field to the generic schema stating that we want full text index on specific column. DB which support that would use that flag, other DBs can ignore it, but it should be specified in the schema explicitly. Do I understand correctly, that we are currently adding fulltext index to every text field, which has index associated with it? If yes, it's different from what we have in checksetup - see e.g. table components, column name - it's indexed, but not fulltext. Having fulltext on every indexed text column is a bit waste, isn't it?
(In reply to comment #116) > Do I understand correctly, that we are currently adding fulltext index to every > text field, which has index associated with it? If yes, it's different from what > we have in checksetup - see e.g. table components, column name - it's indexed, > but not fulltext. Having fulltext on every indexed text column is a bit waste, > isn't it? Sorry, my mistake, you are not checking varchar, but only tiny/medium/text, so it will work correctly. But anyway, with this approach we are losing the flexibility to have text column as index without full text. I don't know if it makes sense to create indexed text without fulltext index, but I just wanted to point that out :-).
(In reply to comment #117) > But anyway, with this approach we are losing the > flexibility to have text column as index without full text. I don't know if it > makes sense to create indexed text without fulltext index, but I just wanted to > point that out :-). I'm aware of that and that was my belief. I don't think it makes sense to ever create a non-fulltext index on a text field. There are two additional reasons why I am resistant to the idea of an index having an attributes other than UNIQUE: 1. It's not ANSI/ISO-standard SQL, as far as I know. Philosophically, anything non-standard should be implemented in the DB-specific subclasses. 2. The schema data structure would have to be restructured (refer to comment #25), and I'm lazy. I think I can overcome these with sufficient persuasion. Just out of curiosity, what DBMSs besides MySQL support FULLTEXT indexes?
(In reply to comment #118) > I'm aware of that and that was my belief. I don't think it makes sense to ever > create a non-fulltext index on a text field. It does (any index is better than no index), but FULLTEXT indexes are usually more useful. > There are two additional reasons why I am resistant to the idea of an index > having an attributes other than UNIQUE: > 1. It's not ANSI/ISO-standard SQL, as far as I know. Philosophically, > anything non-standard should be implemented in the DB-specific subclasses. Yeah. But we need the functionality in Bugzilla, so I'd like to at least have the option. We should just document in the Schema class that the FULLTEXT indexes may be implemented by subclasses as normal indexes, so Caveat Emptor. > 2. The schema data structure would have to be restructured (refer to > comment #25), and I'm lazy. Hahahaha. :-) You could also create a FULLTEXT section. It's up to you. > I think I can overcome these with sufficient persuasion. /me makes persuasive hand motions. :-) > Just out of curiosity, what DBMSs besides MySQL support FULLTEXT indexes? I know that there's an add-on module for PostgreSQL called tsearch that does it. We may use it in Bugzilla in the future. I think that there are a few other DBMSes that do, but I don't know off the top of my head. I'll bet MS-SQL does in some way or another, but I haven't used it in a while. Does Oracle not have any fulltext support? (Oracle is likely our next implementation after PostgreSQL.)
(In reply to comment #114) > Hrm, I'm not sure what order I prefer. :-) I think that related tables should be > grouped together. So all the groups tables should be in one place, and all the > flags tables should be in another, and so on. I think you'll need to be more specific. I could do alphabetical. I could mimic the order in checksetup.pl. But if you want a specific order, maybe you should e-mail it to me.
OK, I emailed the proposed order. Oh, and I forgot to mention that I removed all the REFERENCES stuff -- let's leave that for bug 109473, because it would actually change the schema at this point, and I'm trying to avoid any functionality changes while we re-arch this.
Attached patch V7 (obsolete) (deleted) — Splinter Review
Changes: 1. Removed tabs. 2. Re-ordered tables. 3. Removed ddl_statements() method. 4. Added get_table_ddl() and get_table_list() methods. 5. Renamed some protected methods. 6. Added some pod, but more pod needed, I'm sure. 7. Reverted index names. 8. Merged in Max's schema changes. Didn't address recent FULLTEXT comments yet. Will probably restructure the schema data structure a la comment #25 in the next version.
Comment on attachment 176022 [details] [diff] [review] V7 This is awesome. The POD is great. :-) And the spacing is way better. This makes me happy. Try to put the POD inline in the file -- that's the new style that we're moving toward. That way the function docs end up above the functions, and the module description ends up at the top of the file. It's nice for people who are reading the code. You can use =cut to end sections. Note somewhere in the module that the entire Schema module is package-private. That is, only Bugzilla::DB should ever interact with the Schema modules -- CGI callers should only interact with Bugzilla::DB. Oh, and instead of sql_type, how about a method that takes the standard Schema format for a field, and returns me some SQL? That is, {TYPE => 'INT1', NOTNULL => 1}. I think that would be the best. Then we could use that format in checksetup for the future, and have another function that translates the raw SQL definitions into that format for backwards-compatibility functions. >diff -rNu Bugzilla.HEAD/DB/Schema/Mysql.pm Bugzilla/DB/Schema/Mysql.pm > [snip] >+ BOOLEAN => 'tinyint', I'm not sure if we've discussed this, but should we do tinyint(1)? I think it probably doesn't matter, actually. >+ INTERVAL => 'timestamp', I think we don't need that type anymore. :-) >+ $self->{create_table_postamble} = ' TYPE = MYISAM'; I think it would be better to override _get_create_table_ddl and append that to the end of the returned data from $self->SUPER::_get_create_table_ddl. I look forward to the new FULLTEXT thing! :-)
(In reply to comment #123) > Oh, and instead of sql_type, how about a method that takes the standard Schema > format for a field, and returns me some SQL? That is, {TYPE => 'INT1', NOTNULL > => 1}. I think that would be the best. Then we could use that format in > checksetup for the future, and have another function that translates the raw > SQL definitions into that format for backwards-compatibility functions. Sounds like _get_field_ddl(), but it's not clear to me what exactly return value(s) you want. > >+ $self->{create_table_postamble} = ' TYPE = MYISAM'; > > I think it would be better to override _get_create_table_ddl and append that > to the end of the returned data from $self->SUPER::_get_create_table_ddl. Perhaps. I'll consider it.
(In reply to comment #124) > Sounds like _get_field_ddl(), but it's not clear to me what exactly return > value(s) you want. Yeah, it does sound like that. (I haven't looked over the existing functions in detail. I want it to preferably return SQL that I can pass to ALTER TABLE as the column definition. Or a function that would generate an ALTER TABLE statement to change a column definition (it would be the backend for bz_change_field_def). That would also be acceptable. But I think the thing to generate the SQL for the column definition would be more immediately useable.
(In reply to comment #125) >> Sounds like _get_field_ddl(), but it's not clear to me what exactly return >> value(s) you want. > Yeah, it does sound like that. (I haven't looked over the existing functions > in detail. I want it to preferably return SQL that I can pass to ALTER TABLE as > the column definition. I'm just dense when it comes to these things... If I'm understanding you correctly, you pass {TYPE => 'INT1', NOTNULL => 1} to some method and you want it to return the string "tinyint not null" for MySQL. Presumably it should grok DEFAULT and PRIMARYKEY as well. OK, no problem. I'll probably call this method get_type_ddl().
OK, a few things that I have discovered: (1) We should use smallint for booleans. DBD::Pg only converts 0 to f and 1 to t if you use placeholders. Bugzilla uses 0 and 1 outside of placeholders ALL OVER. So it's to smallint we go. HOWEVER: From http://www.postgresql.org/docs/7.3/interactive/datatype.html#DATATYPE-INT ------------------------------------------------------------------------------ Note: If you have a column of type smallint or bigint with an index, you may encounter problems getting the system to use that index. For instance, a clause of the form ... WHERE smallint_column = 42 will not use an index, because the system assigns type integer to the constant 42, and PostgreSQL currently cannot use an index when two different data types are involved. A workaround is to single-quote the constant, thus: ... WHERE smallint_column = '42' ------------------------------------------------------------------------------- Which means that all our integer types for PostgreSQL should be "integer," I think.
We could also use bit(1) for the booleans. That might be better, but I'm not certain.
(In reply to comment #127) > HOWEVER: > > From http://www.postgresql.org/docs/7.3/interactive/datatype.html#DATATYPE-INT > ------------------------------------------------------------------------------ > Note: If you have a column of type smallint or bigint with an index, you may > encounter problems getting the system to use that index. For instance, a clause > of the form > > ... WHERE smallint_column = 42 > > will not use an index, because the system assigns type integer to the constant > 42, and PostgreSQL currently cannot use an index when two different data types > are involved. A workaround is to single-quote the constant, thus: > > ... WHERE smallint_column = '42' > ------------------------------------------------------------------------------- > > Which means that all our integer types for PostgreSQL should be "integer," I think. You should use the latest version of the docs. http://www.postgresql.org/docs/8.0/interactive/datatype.html#DATATYPE-INT shows that issue is gone in 8.x. As for booleans/bits, it's possible that these types will eventually support combining multiple boolean/bit fields into a single word, which would provide far more efficient storage than a smallint would. I don't believe this is currently done, though.
(In reply to comment #129) > You should use the latest version of the docs. > http://www.postgresql.org/docs/8.0/interactive/datatype.html#DATATYPE-INT > shows that issue is gone in 8.x. We definitely won't be requiring PostgreSQL 8.x. Having PostgreSQL not use indexes when we used a raw int would kill us. For the booleans, I've done some testing, and I think that we should use integer.
Blocks: 284529
Well, another possibility is to add "::text" at the end of the parameter. I'm not sure what caused this issue in pre-8.0, but it's likely it could be fixed by adding a few default casts to the bugzilla database as well. Of course this is extra work; I don't know how important small/big ints are to bugzilla. Did you test boolean v. int or boolean v. bit? BTW, you could also get booleans to cast automatically to ints just by adding a few casts to the database schema. You could also change the default boolean->text casting so it returned 1 or 0.
No longer blocks: 284529
(In reply to comment #129) > You should use the latest version of the docs. > http://www.postgresql.org/docs/8.0/interactive/datatype.html#DATATYPE-INT shows > that issue is gone in 8.x. > > As for booleans/bits, it's possible that these types will eventually support > combining multiple boolean/bit fields into a single word, which would provide > far more efficient storage than a smallint would. I don't believe this is > currently done, though. That's very cool and sexy :-). But we are currently aiming at Pg 7.3, we can't force people to use bleeding-edge if they don't want to :-). I would like to use booleans too, it's almost always better to use correct type :-), but it would currently break too many things and we don't have time to fix them all. Let's get this working with ints first (the same as we are using with MySQL), and then we can work out if we can switch to bools later (and if we gain anything with that ;-) ).
Blocks: 284529
(In reply to comment #132) If you want to avoid 'bleeding edge' (which isn't really available in released versions of PostgreSQL), then you should target 7.4.x. 7.3 is about 2 years old, and 7.4 has substantial performance improvements (among other things).
(In reply to comment #133) > (In reply to comment #132) > > If you want to avoid 'bleeding edge' (which isn't really available in released > versions of PostgreSQL), then you should target 7.4.x. 7.3 is about 2 years old, > and 7.4 has substantial performance improvements (among other things). We know that 7.2 is not working without hacks. We thing that 7.3 should work. So that's the minimal version required. We are not encouraging people to run old versions, but we are trying to run on as much setups as possible. If we hit a problem which would be solved by bumping up the version, we'll consider that, but AFAIK there is no reason to do it yet.
There's a field called 'oneemailperbug' in the attached schema that needs to be called 'onemailperbug'. (One Mail Per Bug, not One Email Per Bug).
Attached patch V8 (obsolete) (deleted) — Splinter Review
Changes: 1. Many corrections to the abstract schema. 2. Integrated POD into the code. 3. Eliminated create_table_postamble cruft. Mysql.pm overrides _get_create_table_ddl() instead to specify the MYISAM storage engine. 4. Eliminated the sql_type() and _get_field_ddl() methods. 5. Added the following public methods: get_type_ddl(), get_column_info(), and get_table_columns(). Max, you asked for get_type_ddl(). The last one is for bug #247560. The second one has no current usefulness, but it just seemed like a good idea. I just noticed that there are similar methods in Bugzilla::DB ... Maybe these latter two new methods should be called get_field_info() and get_table_fields() instead? I'm once again tempted to create a "column" object class... The schema data structure re-structuring will have to wait another version. I wanted to get the schema fixes and API changes out there ASAP.
Attachment #176022 - Attachment is obsolete: true
Yeah, the new API looks good. Don't worry about the similar functions -- the functions in Bugzilla::DB return the current state of the DB on the disk, while the functions in Bugzilla::DB::Schema only return the "ideal" state. And yeah, I can't currently imagine a need for get_column_info. If you have any suggestions about the patch on bug 284529, though, that would be appreciated. Right now DBI's type_info is somewhat unreliable. column_info is also not good for figuring out if a field is a serial/auto-increment field. However, they are the only methods that I have, at the moment, and they work fairly decently other than those two problems.
OK, as far as comment 137 goes, I'll handle that. I have a lot of experience now with DBI's type_info, and I know how to handle this. If you just post a new version with the updated index stuff, it's review+, and it goes in.
Blocks: 284845
Blocks: 284850
Blocks: 285111
Blocks: 285113
OK, based on my work in bug 285113, it would be really nice if the abstract schema was a Class variable using our or use constant (use constant would probably be the best, so that it's definitely read-only), and then was just copied into the $self instance.
Attached patch V9 (deleted) — Splinter Review
The constructor now dies instead of ThrowCodeError(). Max, please clean up the _bz_schema() instantiation in Bugzilla::DB as we discussed. The abstract schema is now a constant, and it has been documented. I threw in Max's SCHEMA_VERSION constant as well. Indexes are defined slightly differently in this version, allowing you to specify an index TYPE for those fans of FULLTEXT indexes. And since unique indexes are simply a type of index as well, the UNIQUEs are gone. The table hash contains only FIELDS and INDEXES. Storable's dclone() routine is used to copy the abstract schema before it is made database-specific. A version of the abstract schema structure is retained as object data as well. All indentation nits should be addressed now, I hope.
Attachment #176112 - Attachment is obsolete: true
Comment on attachment 176707 [details] [diff] [review] V9 Excellent. :-) However, you changed a few fields from NULL to "NOT NULL default 0", apparently: bugs_activity.attach_id components.initialowner components.initialqacontact I can fix that on checkin. Also, after we check-in this patch, we should have another one to make the current MySQL fully compliant with this one -- the only change (besides renaming the indexes, which I've already filed a bug for) is to fix all the tinyint(1) fields to be normally-sized tinyint fields.
Attachment #176707 - Flags: review+
Flags: approval?
Attachment #175992 - Attachment is obsolete: true
Blocks: 285345
Flags: approval? → approval+
(In reply to comment #141) > (From update of attachment 176707 [details] [diff] [review] [edit]) > However, you changed a few fields from NULL to "NOT NULL default 0", > apparently: > > bugs_activity.attach_id > components.initialowner > components.initialqacontact I don't remember changing them. I suspect they've been like that for at least a couple versions.... > I can fix that on checkin. OK, fine with me.
Comment on attachment 176707 [details] [diff] [review] V9 >diff -rNu Bugzilla.HEAD/DB/Schema.pm Bugzilla/DB/Schema.pm >--- Bugzilla.HEAD/DB/Schema.pm Wed Dec 31 19:00:00 1969 >+++ Bugzilla/DB/Schema.pm Tue Mar 8 02:46:03 2005 >+=item C<ABSTRACT_SCHEMA> >+ >+The abstract database schema structure consists is a hash reference >+in which each key is the name of a table in the Bugzilla database. s/constists is/consists of/ >+ target_milestone => {TYPE => 'varchar(20)', >+ NOTNULL => 1, DEFAULT => "'---'"}, I'm worried with this because it's possible that not every database engine is going to use the same quoting style (more a problem if we put something in a default value that needs to be escaped). But don't let that stop this... we can cross that bridge when we come to it I guess.
Thank you so much, Ed. :-) This is a wonderful enhancement to Bugzilla. I addressed the checkin comments. RCS file: /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema.pm,v done Checking in Bugzilla/DB/Schema.pm; /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema.pm,v <-- Schema.pm initial revision: 1.1 done RCS file: /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema/Mysql.pm,v done Checking in Bugzilla/DB/Schema/Mysql.pm; /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema/Mysql.pm,v <-- Mysql.pm initial revision: 1.1 done RCS file: /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema/Pg.pm,v done Checking in Bugzilla/DB/Schema/Pg.pm; /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema/Pg.pm,v <-- Pg.pm initial revision: 1.1 done
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Summary: Reusable (structured?) database-independent schema. → Reusable, structured, database-independent schema
Blocks: 285380
Blocks: 287483
QA Contact: matty_is_a_geek → default-qa
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: