Closed Bug 211769 Opened 21 years ago Closed 20 years ago

Create a database abstraction layer

Categories

(Bugzilla :: Bugzilla-General, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 237862

People

(Reporter: anthony, Assigned: mkanat)

Details

I've been giving some thought to this, because I'm developing a new web-database application (in which, of course, I'm wildly stealing ideas from Bugzilla :-). The summary says it all. We currently have people wanting to run Bugzilla on PostgreSQL (bug 98304), Sybase (bug 173130), Oracle (bug 189947), and of course MySQL. So here's a bunch of extra-crazy ideas which raise some issues, but which I think are worth considering because they could be worked into something good. Essentially the idea is that instead of SendSQL("INSERT INTO mytab (cola, colb, colc) VALUES ($vala, $valb, $valc)"); you'd write db_insert('mytable', ['cola', 'colb', 'colc'], [vala, valb, valc]); Now of course this is the basic idea, and of course it is really much harder than that. But I think that it might be not as hard as it initially seems. OK, db_insert needs to know the data types (so that it quotes the values, or converts them if they are dates). You can give it a third list, with the data types. But this might not be needed; db_insert might already know the data types. How? By having learnt about the database structure at the beginning of execution. This is explained in bug 146679, so I'll only give a coding example here. Here's the definition of the users table from checksetup.pl: $table{profiles} = 'userid mediumint not null auto_increment primary key, login_name varchar(255) not null, cryptpassword varchar(34), realname varchar(255), disabledtext mediumtext not null, mybugslink tinyint not null default 1, emailflags mediumtext, refreshed_when datetime not null, unique(login_name)'; Here, by contrast, is the definition of the users table in my application, sorry it's Python: tables = [ ... db.DbTable("users", "Registered users", (("id", "integer", NOTNULL, PK, NOFK, SEQ), ("login", "varchar(16)", NOTNULL), ("password", "varchar(28)", NOTNULL), ("full_name", "varchar(64)", WITHNULL), ("user_type", "smallint", NOTNULL, NOPK, "user_types(id)"), ("remarks", "text", WITHNULL))), ... ] We need not explain all details; you get the idea: instead of writing SQL, we make a table describing the table, and our abstraction mechanism will then have to convert that into SQL. The tables definition goes out of checksetup.pl into a module which is imported by checksetup.pl and by the rest of Bugzilla. Result: Bugzilla is always aware of the database structure. So far so good. We can insert rows. We can create tables. We can update the database on version upgrades more elegantly: foreach my $table (@tables) { if ($table->exists) { $table->check_columns; } else { $table->create; } } But what about selects? Selecting is the hardest part. Let's give it a try: db_select(list_of_expressions, list_of_inner_joined_tables, description_of_outer_joins, # this mustn't be too hard conditions # (esp when compared to this one :-) ); The conditions is what must be converted to a where clause. A where clause is simple expressions ANDed and ORed together. The conditions could be specified as a tree, the leaves of which would be the expressions and the way of connecting them would show whether they should be ANDed or ORed. Alternatively, since in all SQLs I guess that the syntax of the ANDs, ORs, and brackets is identical, we could manually join the expressions together in a string. The tree is probably more complex, but it is a generalization of the @specialchart used in Bugzilla::Search, and might lead to better query construction code. OK, but how do I specify an expression (either for the SELECT list of expressions or for the conditions)? I'd guess that defining another data structure might really start becoming overkill, so expressions could be constructed as strings, the way proposed by David Miller with DBCompat.pm (bug 173130). (We can forget about subselects for the time being, since we don't use them.) This was ideas on the interface. What about the implementation? This can be done either as in DBCompat.pm, which is simple but has support for all RDBMS's in the same file in the form of "if ... elsif ... elsif ...", or it could be done using ideas such as DBIx::AnyDBD, which keeps RDBMS interfaces separate but adds yet another abstraction layer. Oh, and I just found out I forgot about GROUP BY and ORDER BY :-), but if we can do the rest, we can do these as well. See also bug 104682, bug 131136, bug 174295, bug 204217, bug 98446, and the DBIx::Abstract module.
4 months later... The disillusionment :-) What has happened during the last 4 months is that I've been laying the foundation to the Python web application I was talking about above. I mostly did database design. I also read Fabian Pascal's book ("Practical Issues in Database Management: A Reference for the Thinking Practitioner"). First, describing the schema in a hash or other Perl or Python data structure results, as I found out, in code that is much more difficult for your eye to parse than are SQL DDL statements. Rather than doing this, it may be better to write a database creation script using CREATE TABLE statements, and write a Perl script to parse the SQL and create the data structures. Second, it is obvious that the whole idea is, to a certain extent, absurd: there exists, already, an abstraction layer, namely SQL, but sadly each RDBMS has a different SQL, which makes us want to build another abstraction layer on top of what should be the abstraction layer. Third, if you care about enforcing integrity at the database level, you have to write not only the primary key, unique, foreign key, and check constraints; you have to write quite a few triggers. In addition, if you use object-oriented concepts in the database design, you are likely to have supertypes and subtypes, and you'll need views. These are things that you either can't do in our abstraction layer, or that are extremely hard to do. I can't see any way to abstract database functions; you'd have to write them for each database. Fourth, RDBMS's differ not only in the interface, but in the functionality as well. If an application is to be portable across a number of RDBMS's, it has to use the lowest common denominator of these RDBMS's, and if your database is simple enough to use that, you might well use BerkeleyDB or something, to make it neat and easy to install and administer. If you really need an RDBMS, it's probably more reasonable to choose one that fits your needs and stick to that. I did considerable work in order to write a partly functioning abstraction layer in Python, but I abandoned it a week ago as I came to realize all the above. The bottom line is that if Bugzilla's database is simple enough to be made portable, it might be easier and more reasonable to use BerkeleyDB and write the extra code needed to do all required queries, than to create an abstraction layer, which would require, I estimate, several weeks of full time work, the result of which would be of doubtful theoretical and practical value. So, in the short term, I propose that we implement RDBMS support with less extreme solutions, such as bug 131136. Suggest WONTFIX.
Assignee: justdave → mkanat
QA Contact: mattyt-bugzilla → default-qa
Yeah, I considered doing something like this, too. In the end, what we ended up doing was the current Bugzilla::DB structure, along with Bugzilla::DB::Schema. I think it works pretty well. *** This bug has been marked as a duplicate of 237862 ***
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.