Closed Bug 285113 Opened 20 years ago Closed 20 years ago

Bugzilla::DB::Schema needs a way to serialize and store its abstract schema

Categories

(Bugzilla :: Installation & Upgrading, enhancement, P2)

2.19.2
enhancement

Tracking

()

RESOLVED FIXED
Bugzilla 2.20

People

(Reporter: mkanat, Assigned: mkanat)

References

Details

Attachments

(1 file, 2 obsolete files)

We need a way to serialize the "abstract" schema in Bugzilla::DB::Schema. That is, we need to be able to get at it before _adjust_schema is called (or copy it to some other variable before that happens -- that's probably the best solution), so that I can write that to the database and use it as the authoritative guide to what fields are currently what type, etc. I think that we also ought to keep a record of some "version" number of the schema. I think that we should probably start at 1.00 and go on to 1.01, etc. Either that, or we could use the CVS version number of the file, though I think that's less reliable when we're writing patches.
Oh, I think we also ought to be able to serialize the specific schema (that is, the schema translated into real stuff for this database, after _adjust_schema), because it might come in handy some day. That's not as big of a priority, though.
Depends on: bz-dbschema
Severity: normal → enhancement
Target Milestone: --- → Bugzilla 2.20
OK, I think we should use Storable for this, since that's its purpose -- serializing and deserializing data structures.
Honestly, I think you've jumped the gun on opening this bug. I'm not sure I even agree that there's a problem, let alone with your proposed solution.
Well, feel free to respond on the developers list or on the parent bug, if you'd like. There are definitely various problems that I've run into when actually implementing things.
Summary: Bugzilla::DB::Schema needs a way to serialize its *abstract* schema and its specifc schema → Bugzilla::DB::Schema needs a way to serialize and store its abstract schema
OK, here's a way to serialize and deserialize a Schema, so that we can read off the "current Schema" into Bugzilla::DB. This will make the rest of dbinstall MUCH easier.
Attachment #176686 - Flags: review?(bugzilla)
This is, as you can imagine, a work-in-progress, somewhat. store_abstract_schema() should not be in Bugzilla::DB::Schema, anymore. :-)
Status: NEW → ASSIGNED
Comment on attachment 176686 [details] [diff] [review] Methods to serialize, store, and deserialize a schema So, this is a little less dense than some of my other patches, Tomas, if you want to look at it... :-D
Attachment #176686 - Flags: review?(Tomas.Kopal)
Comment on attachment 176686 [details] [diff] [review] Methods to serialize, store, and deserialize a schema Yeah, I love this stuff :-). I was thinking about including some version number in the database for a long time now, but I never thought about storing the whole schema :-). Great idea. >+=item C<SCHEMA_VERSION> >+ >+The 'version' of the internal schema structure. This version number >+is incremented every time the the fundamental structure of Schema >+internals changes. >+ >+This is NOT changed every time a table or a column is added. This >+number is incremented only if the internal structures of this >+Schema would be incompatible with the internal structures of a >+previous Schema version. >+ >+=begin undocumented >+ >+As a guideline for whether or not to change this version number, >+you should think, "Will my changes make this structure fundamentally >+incompatible with the old structure?" Think about serialization >+of the data structures, because that's what this version number >+is used for. >+ >+You should RARELY need to increment this version number. >+ >+=end undocumented It looks you really tried to explain it, but I am still not clear when I need to update the version number and when not. This needs to be crystal clear to everyone (including me ;-) ), otherwise we are in trouble. Please, try a bit harder :-). Maybe some examples would help? > my $self = shift; >+ my $schema = shift; > > # Define the Bugzilla abstract database schema. >- $self->{schema} = { >+ $schema ||= { > > # BUG-RELATED TABLES > # ------------------ Nit: Indentation. >+sub deserialize_abstract { >+ my ($serialized, $version) = @_; >+ >+ my %thawed_hash = thaw($serialized); >+ >+ # At this point, we have no backwards-compatibility >+ # code to write, so $version is ignored. What do you intend to do here? Do you want to move the DB upgrade code here? If yes, we need BIG delimiters and comments around :-). How do you plan to use the version number? >+=item C<store_abstract_schema()> >+ >+ Description: Stores an abstract database schema in the database >+ for later recovery. >+ Params: $schema (optional) - hashref. The schema that we should >+ store. If omitted, we store the current schema. >+ Returns: nothing >+ >+=cut >+ >+sub store_abstract_schema { >+ my ($self, $schema) = @_; >+ >+ $schema ||= $self->{_abstract_schema}; >+ >+ my $store_me = _serialize_abstract( >+} Something missing here? ;-) >+=begin undocumented >+ >+=item C<_bz_real_schema()> >+ >+ Description: Returns the current Schema for the real database >+ on the disk, as opposed to the "ideal" schema >+ returned by _bz_schema. Note that only the "abstract" >+ parts of _bz_real_schema are guaranteed to be accurate -- >+ the "database-specific" parts may not work properly. I do not like this comment. What other database we have than a real one on the disk :-)? Maybe something like "curently used database we are going to upgrade" or something in that direction? Better yet, why not say directly that this will recover the abstract schema of the database which was used for the current installation? >+=item C<_bz_store_real_schema()> >+ >+ Description: Stores the _bz_real_schema structures in the database >+ for later recovery. Call this function whenever you make >+ a change to the _bz_real_schema. Again, I don't understand who and when should be calling this function. Wouldn't it be better to say something like "Call this function whenewer you do a change to the database schema (both in the actual database and the abstract schema structure)"? I am probably just missing the big picture of how all this fits together. Maybe doing some short top-level description of the overall design would be helpful? (Maybe on devel mailing list for broader audience and review). >+sub _bz_store_real_schema { >+ my ($self) = @_; >+ >+ # We want to store the current object, not one >+ # that we read from the database. So we use the actual hash >+ # member instead of the subroutine call. >+ my $store_me = $self->{_bz_real_schema}->serialize_abstract(); >+ $self->do("UPDATE bz_schema SET version = ?, schema_data = ?", >+ undef, Bugzilla::DB::Schema::SCHEMA_VERSION, $store_me); If the schema does not exists yet, this will not update anything. Better would be DELETE followed by INSERT.
Attachment #176686 - Flags: review?(Tomas.Kopal) → review-
OK, now that I've explained the whole plan in bug 285111 comment 5, I'll explain here how serialization works: Basically, we store the $self->{schema} hash as a binary object into a blob field. Along with it, we store a version number. We store this number because one should ALWAYS store a version number along with anything you serialize -- you'll need it in case you need to deserialize in a strange new world where everything is different. An example of the use of the version number: Today, we store all individual columns like this: column_name => { TYPE => 'TYPE', NOTNULL => 1 } Imagine that someday we decide that NOTNULL is bad, and we want to change it to NULL => 0. But we have a bunch of Bugzilla installations around the world with a serialized schema that has NOTNULL in it! When we deserialize that structure, it just WILL NOT WORK properly inside of our new Schema object. So, immediately after deserializing, we need to go through the hash and change all NOTNULLs to NULLs and so on. We know that we need to do that because we know that version 1.00 used NOTNULL. Having made the change to NULL, we would now be version 1.01. To serialize and deserialize, we use Storable, the mechanism in perl intended for exactly this purpose.
(In reply to comment #8) Yeah, thanks for the feedback on my docs. That's really what I needed, was to write them and then have somebody explain to me how they needed to be clarified. It was difficult for me, with all this serialization magic in my head, to clearly explain to somebody unless they told me what about it they didn't understand. > >+ $self->do("UPDATE bz_schema SET version = ?, schema_data = ?", > >+ undef, Bugzilla::DB::Schema::SCHEMA_VERSION, $store_me); > > If the schema does not exists yet, this will not update anything. Better would > be DELETE followed by INSERT. Yeah, I've thought about that. I was just worried as to what would happen if for some reason we could DELETE but we couldn't INSERT; then we'd lose the whole thing. Too bad that we can't do it inside of a transaction... :-) I suppose that I could go for DELETE/INSERT now, with an XXX comment saying that we should use transactions when we support them.
(In reply to comment #10) > Yeah, I've thought about that. I was just worried as to what would happen if > for some reason we could DELETE but we couldn't INSERT; then we'd lose the whole > thing. Too bad that we can't do it inside of a transaction... :-) > > I suppose that I could go for DELETE/INSERT now, with an XXX comment saying > that we should use transactions when we support them. If you add some unique column (either version number, but then you need to increment it on every update, which is probably not what you want, or you can add something like serial number, last modified date etc.), then you can do it in the opposite order - first insert the new record, and if and only if successful, you delete the old one. Then if the delete fails, you can make the code reading from the table to get always the latest, newest record (ORDER BY + LIMIT 1). That should be bullet proof :-). Which comes to the bigger picture - what if you modify the actual database structure and then the serailisation fails (e.g. can't insert)? Will you somehow "undo" the change? Again, no transactions :-(. I think it's very difficult make this completely safe...
Attachment #176686 - Flags: review?(bugzilla)
Attached patch v2 (obsolete) (deleted) — Splinter Review
OK, how about this one? I've also fixed some of the internal comments in the Schema that got checked-in, because they need it, and this is an ideal time to do it.
Attachment #176686 - Attachment is obsolete: true
Attachment #176827 - Flags: review?(Tomas.Kopal)
Priority: -- → P2
Depends on: 285443
By the way, I thought I'd point out that Bugzilla::DB::Schema apparently no longer needs to "use Bugzilla::Error", so you might want to nix that in the patch you are working on here even though it has nothing to do with serialization, I guess.
Comment on attachment 176827 [details] [diff] [review] v2 >Index: Bugzilla/DB.pm >=================================================================== >RCS file: /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB.pm,v >retrieving revision 1.27 >diff -u -r1.27 DB.pm >--- Bugzilla/DB.pm 8 Mar 2005 23:23:30 -0000 1.27 >+++ Bugzilla/DB.pm 9 Mar 2005 04:01:17 -0000 >@@ -521,9 +521,69 @@ > return $self; > } > >+##################################################################### >+# Private Methods >+##################################################################### >+ >+=begin private >+ >+=head1 PRIVATE METHODS >+ >+These methods really are private. Do not override them in subclasses. >+ >+=over 4 Nit: I haven't found any matching =back for this. >+sub _bz_store_real_schema { >+ my ($self) = @_; >+ >+ # We want to store the current object, not one >+ # that we read from the database. So we use the actual hash >+ # member instead of the subroutine call. If the hash >+ # member is not defined, we will (and should) fail. >+ my $store_me = $self->{_bz_real_schema}->serialize_abstract(); >+ $self->do("UPDATE bz_schema SET schema_data = ?, version = ?", >+ undef, $store_me, Bugzilla::DB::Schema::SCHEMA_VERSION); >+} We really need to change this to work on new installations, with this table empty. This is a showstopper for this patch, sorry. >Index: Bugzilla/DB/Schema.pm >=================================================================== >RCS file: /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema.pm,v >retrieving revision 1.1 >diff -u -r1.1 Schema.pm >--- Bugzilla/DB/Schema.pm 9 Mar 2005 01:46:29 -0000 1.1 >+++ Bugzilla/DB/Schema.pm 9 Mar 2005 04:01:17 -0000 >@@ -978,15 +1012,25 @@ > define the database-specific implementation of the all > abstract data types), and then call the C<_adjust_schema> > method. >- Parameters: none >+ Parameters: $abstract_schema (optional) - A reference to a hash. If >+ provided, this hash will be used as the internal >+ representation of the abstract schema instead of our >+ default abstract schema. This is intended for internal >+ use only by deserialize_abstract. > Returns: the instance of the Schema class > > =cut > > my $self = shift; >+ my $abstract_schema = shift; Nit: I would prefer syntax "my ($self, $abstract_schema) = @_;" here. >+A field hash reference should must contain the key C<TYPE>. Optional field >+keys include C<PRIMARYKEY>, C<NOTNULL>, and C<DEFAULT>. Should, or must? :-) >+ Description: Takes any format ever returned by any version >+ of serialize_abstract and returns a Schema object >+ representing the serialized data instead >+ of the modern data. Do not talk about modern data here, the function does not have anything to do with modern data. What about "...serialized data as stored for the schema for current installation." or something in that sense? Otherwise, it looks good :-).
Attachment #176827 - Flags: review?(Tomas.Kopal) → review-
Attached patch v3 (deleted) — Splinter Review
Attachment #176827 - Attachment is obsolete: true
Attachment #177112 - Flags: review?(Tomas.Kopal)
Attachment #177112 - Flags: review?(edwardjsabol)
Blocks: 285713
Blocks: 285723
Depends on: 285740
Comment on attachment 177112 [details] [diff] [review] v3 >+ die "Attempted to initialize the schema but there are already " >+ . " $table_size copies of it stored.\nThis should never happen.\n" >+ . " Compare the two rows of the bz_schema table and delete the " >+ . "newer one."; Nit: what if there are three ;-)? "...newer one(s)."? >+=head1 SERIALIZATION/DESERIALIZATION >+ >+=item C<serialize_abstract()> Nit: =over 4 between these two lines (and =back later)? Otherwise, it's perfect :-).
Attachment #177112 - Flags: review?(Tomas.Kopal) → review+
Blocks: 286527
Blocks: 285748
OK, here's what I want to do. I want to check in this code, but I want to effectively disable it by commenting-out the call to _bz_init_schema_storage(). The reason that I want to check it in now is that I'm getting REALLY far into this branch, and I'm starting to worry a lot about bitrot. This code *has* to go in for 2.20 anyhow.
Flags: approval?
Attachment #177112 - Flags: review?(edwardjsabol)
This appears to have a dependency on Storable that we're not checking for in checksetup.pl. We still support Perl 5.6.1, and it doesn't ship with Storable as part of core.
Flags: approval? → approval+
er, that approval was accidental, but this is okay I guess, since that dependency was already there before this patch. Let's get a new bug to make sure checksetup.pl is checking for Storable.
Storable issue filed as bug 286669. Checking in Bugzilla/DB.pm; /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB.pm,v <-- DB.pm new revision: 1.34; previous revision: 1.33 done Checking in Bugzilla/DB/Schema.pm; /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema.pm,v <-- Schema.pm new revision: 1.9; previous revision: 1.8 done Checking in Bugzilla/DB/Schema/Mysql.pm; /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema/Mysql.pm,v <-- Mysql.pm new revision: 1.2; previous revision: 1.1 done Checking in Bugzilla/DB/Schema/Pg.pm; /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema/Pg.pm,v <-- Pg.pm new revision: 1.4; previous revision: 1.3 done
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Oh, I forgot to fix the nits. Did it: Checking in Bugzilla/DB.pm; /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB.pm,v <-- DB.pm new revision: 1.35; previous revision: 1.34 done Checking in Bugzilla/DB/Schema.pm; /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema.pm,v <-- Schema.pm new revision: 1.10; previous revision: 1.9 done
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: