Open Bug 173130 (bz-sybase) Opened 22 years ago Updated 16 years ago

Allow Bugzilla to work with Sybase

Categories

(Bugzilla :: Bugzilla-General, enhancement, P4)

enhancement

Tracking

()

People

(Reporter: myk, Unassigned)

References

(Depends on 3 open bugs)

Details

Attachments

(3 files)

Bugzilla should support Sybase databases.
See also bug 98304 (PostgreSQL support).
Blocks: bz-zippy
Depends on: 156834
Alias: bz-sybase
Depends on: 180086
Depends on: 153129
Depends on: 182136
Depends on: 174295
Summary: support Sybase → Allow Bugzilla to work with Sybase
Depends on: 190432
See also Bug# 189947 (Oracle support)
Depends on: 196101
Attached patch Search.pm modifications (deleted) — Splinter Review
This may be worth moving to another bug and checking in separately, but we can probably decide that later. Wanted to put this somewhere that people could look at it. This attachment is the changes I had to make to Search.pm to get buglists to work correctly on Sybase. And this code happens to work on MySQL, too :) BBaetz tells me this SQL will suck on Postgres prior to 7.4 though. The primary changes here is that Sybase performs explicit joins (those declared using the word JOIN) first before it performs implicit joins (those with commas). This causes any LEFT JOIN that includes the bugs table to fail, because bugs is listed first, and there's a comma between bugs and the next mentioned table in better than 90% of the queries. By changing all of the comma joins to be INNER JOIN, the joins are performed in order, and the bugs.bug_id column will already exist in the prepared table when the following joins against it happen. Also, ANSI SQL says if we use GROUP BY, that we have to include every column mentioned in the SELECT part in the GROUP BY clause. Sybase won't let you put more than 31 columns in a GROUP BY, and also has temporary table size limitations with GROUP BY that are pretty darn restrictive. So the "GROUP BY bugs.bug_id" has been replaced with "DISTINCT bugs.bug_id" at the beginning of the SELECT part. The join ordering may have other side effects (like if a LEFT JOIN happens before a later INNER JOIN perhaps?) but we haven't run into any problems yet on Sybase at least. All of the columns come back correctly, and the queries seem to product the same results as the old version. I'd love it if anyone could prove whether those possible side effects would really happen or not.
CCing other people that are working on cross-DB support for comment
actually, forget what I said about the order mattering - I don't think it does, since the USING/ON constrains what we're joining against - $query .= " LEFT JOIN cc ON cc.bug_id = bugs.bug_id AND cc.who = $::userid "; + $query .= " LEFT JOIN cc ON bugs.bug_id = cc.bug_id AND cc.who = $::userid "; Why did you have to change that ordering arround?
>- $query .= " LEFT JOIN cc ON cc.bug_id = bugs.bug_id AND cc.who = $::userid "; >+ $query .= " LEFT JOIN cc ON bugs.bug_id = cc.bug_id AND cc.who = $::userid "; > >Why did you have to change that ordering arround? Because I've had experience in some places where it'll choke if the column references on the left side of the comparison isn't already in the prepared table.
Attached file original Sybase schema conversion (deleted) —
This is the *original* conversion of the Schema to Sybase syntax that we did back in December. There's been a lot of changes to it since then, but getting the current schema coerced back into something I can post publicly would take a while, and I'm mainly posting this for the benefit of someone asking a question about it on the developers mailing list.
Attached file Bugzilla/DBCompat.pm (deleted) —
This attachment is the DB Compatibility module we did. There are a couple usage examples at the bottom, but I suck at POD docs, so the documentation is pretty incomplete. :)
See bug 98304, attachment 130979 [details] for an updated DBCompat.pm with more documentation.
Depends on: bz-dbschema
Depends on: 201030
Depends on: bz-dbinstall
Depends on: 248001
Depends on: 250832
Depends on: 250547
Also see bug 251960 with very similar patch, working on Postgres.
I have compared the first patch (Search.pm) against the attachment 153568 [details] [diff] [review] in bug 251960, and they are different only in: - attachment 153568 [details] [diff] [review] is against a bit newer sources, making it easier to apply - attachment 153568 [details] [diff] [review] is nicer formatted ;-) - attachment 153568 [details] [diff] [review] does not address the GROUP BY vs DISTINCT problem. IMHO the best move would be to go for attachment 153568 [details] [diff] [review] in bug 251960 and to move the GROUP BY vs DISTINCT discussion to other bug, probably bug 174295.
I have just looked at the DISTINCT vs GROUP BY problem from the PostgreSQL point of view and, unfortunatelly, this does not work here. As soon as you do JOIN, you have to have all columns in GROUP BY, except for columns used only in aggregate function call (if I understand correctly ANSI SQL 92, which PostgreSQL implements in this case). So we need GROUP BY for all columns for PostgreSQL. If Sybase can't do it, we need special case either for Sybase or PostgreSQL, probably in DBCompat. Suggestions?
I think we are going to need an SQL object that lets us add selected fields, add selected aggregate fields, add group-by fields that we mean, and then let it use its knowledge of the database backend to decide which of the other fields need to be added to the group-by. Essentially, SQL code generation will need to have an abstraction layer that knows which of the selected fields need to be auto-appended to the group-by condition. Similarly, this will have to understand where it can use aliases and where it needs to look bak to the original definition of the alias and inline it. etc...
Depends on: 253357
Depends on: 253360
Actually, ignore my previous post (comment #12), I was wrong. If I use SELECT DISTINCT and remove the GROUP BY, it mostly works even with Postgres. Problems starts when HAVING kicks in, e.g. if you specify boolean operation "flag is not xxxx". Does this work on Sybase? I still hope we can come up with a query which just works, without the need for another abstraction object, which would be probably pretty complex and error prone.
I can't find any way how to get rid of the HAVING clause, and with it Postgres refuses to run the query without all columns in the GROUP BY. Solution would be to use subselects (I got it actually working with subselect). What is current opinion on this? Are we going to require MySQL 4.1 for 2.20? Can we start using subselects?
2.20 will still support mysql3 but large sites will be expected to be on mysql4 for performance. SQL that mysql3 does not optimize well will be permitted. 2.22 will probably require mysql4
(In reply to comment #14) > Actually, ignore my previous post (comment #12), I was wrong. If I use SELECT > DISTINCT and remove the GROUP BY, it mostly works even with Postgres. Problems > starts when HAVING kicks in, e.g. if you specify boolean operation "flag is not > xxxx". Does this work on Sybase? I'm having difficulty following your train of thought. What are you asking about Sybase? If you're asking if it's legal in Sybase to specify a HAVING clause without a GROUP BY clause, then, yes, it is, but, it's not very common. > I still hope we can come up with a query which just works, without the need for > another abstraction object, which would be probably pretty complex and error prone. Me too.
The way to do this sort of thing is with subselects. Depending on exactly what you're trying to do, you may need to use an 'inline view' (ie SELECT * from a, (SELECT * from b ....) AS c WHERE ....), or have an IN or EXISTs subselect. Or possibly a correlated subquery. Exactly which one is better to use depends on exactly what you're trying to do in a particular situation. Which is another reason why DB abstraction layers inherently suck. :)
I'm not suggesting a universal abstraction. However, as we assemble sql statements if we build the parts up using something slightly more insightful than text concatenation, when we get to assembling the whole statement, we can adjust the final assembly of that particular type of statement to suit the various databases. For the example at hand, this may be a matter of pushing select fields into a list and group fields (that we really mean) onto a list and then having a common function that takes the 2 lists and returns the GROUP BY clause. Incidentally, if I'm not mistaken, HAVING is death on query performance, though optimizers have gotten a lot better since I first adopted that dogma.
HAVING isn't always the best thing to do, but its better than doing it yourself in perl. My point is that abstracting SQL away to that level of detail lose flexability. Can't we just drop mysql3 support, and get rid of 90% of this.
Another option would be to have two paths, one doing it "the right way (tm)", i.e. with subselects, and having some fallback for DBs which does not support it yet (MySQL3). Once MySQL4 is widely deployed and we stop supporting it, we can remove this "legacy" stuff. In Search.pm, it would affect just two places which currently use HAVING, search for "flag is not" and search for "percent complete". I know it's close to hack and can complicate maintenance, but it's probably still better than abstraction layer. BTW, MySQL4.0 is AFAIK still not enough for subselects, 4.1 is required. So we still probably need a version of Search.pm without subselects for a while :-(.
(In reply to comment #17) > (In reply to comment #14) > > Actually, ignore my previous post (comment #12), I was wrong. If I use SELECT > > DISTINCT and remove the GROUP BY, it mostly works even with Postgres. Problems > > starts when HAVING kicks in, e.g. if you specify boolean operation "flag is not > > xxxx". Does this work on Sybase? > > I'm having difficulty following your train of thought. What are you asking about Sybase? If you're asking > if it's legal in Sybase to specify a HAVING clause without a GROUP BY clause, then, yes, it is, but, it's not > very common. I was actually asking whether anyone tried doing search for a bug not having specified flag on Sybase with this change applied, i.e. to excercise the query with HAVING part, and with GROUP BY replaced by DISTINCT. But if Sybase supports HAVING without GROUP BY, it will probably work...
(In reply to comment #15) > I can't find any way how to get rid of the HAVING clause, and with it Postgres > refuses to run the query without all columns in the GROUP BY. I'm sorry... Could you post a complete example of such a query?
(In reply to comment #23) > (In reply to comment #15) > > I can't find any way how to get rid of the HAVING clause, and with it Postgres > > refuses to run the query without all columns in the GROUP BY. > > I'm sorry... Could you post a complete example of such a query? Sure. Here is relevant part of the query Search.pm generate when searching for all bugs which do not have review flag set: SELECT bugs.bug_id, SUM(CASE WHEN (flagtypes_0.name || flags_0.status) IS NOT NULL THEN 1 ELSE 0 END) AS allflags_0, SUM(CASE WHEN (flagtypes_0.name || flags_0.status) != 'review' THEN 1 ELSE 0 END) AS matchingflags_0 FROM bugs LEFT JOIN flags AS flags_0 ON (bugs.bug_id = flags_0.bug_id AND flags_0.is_active = 1) LEFT JOIN flagtypes AS flagtypes_0 ON (flags_0.type_id = flagtypes_0.id) GROUP BY bugs.bug_id HAVING SUM(CASE WHEN (flagtypes_0.name || flags_0.status) IS NOT NULL THEN 1 ELSE 0 END) = SUM(CASE WHEN (flagtypes_0.name || flags_0.status) != 'review' THEN 1 ELSE 0 END); Trouble is with the "GROUP BY bugs.bug_id". If you specify it, postgres wants to have all other columns which appear in the query in the GROUP BY too. Which means some twenty or thirty plus columns for full query. That seems to upset sybase, which have limit on number of GROUP BY parameters. If you leave it out completely, postgres complains that it's not there. It seems that the HAVING is not the only problem, if I remove it and leave just the select, I still have to provide GROUP BY... I have very vague idea why is it so, and no idea how to fix it without having special code for different DBs :-(.
Use a subselect instead. It becomes much clearer, cleaner, and quicker.
(In reply to comment #25) > Use a subselect instead. It becomes much clearer, cleaner, and quicker. If MySQL 4.0+ is now required (which I assume this is the case now with 2.19+) then you can do that otherwise, if will be another if ($db_driver eq 'mysql') {} elsif ($db_driver eq 'EverythingElse') {}.
(In reply to comment #26) > (In reply to comment #25) > > Use a subselect instead. It becomes much clearer, cleaner, and quicker. > > If MySQL 4.0+ is now required (which I assume this is the case now with 2.19+) > then you can do that otherwise, if will be another if ($db_driver eq 'mysql') {} > elsif ($db_driver eq 'EverythingElse') {}. > > I would love to use them, but, unfortunatelly, subselects starts at MySQL 4.1. So we can't use them yet, at least no without the mentioned if{}else{} (which make everything even worse).
For 2.20, the current plan is that we can use SQL that is not optimal for MySQL3 but still works. Anybody who complains about performance can be pushed to MySQL4, but we have not agreed to break compatibility with MySQL3.
Yes, it will need ot be conditional, and also mysql4 needs to be tested for speed - the curent query was very much tuned for it and its one-index-per-table-per-query restriction.
(In reply to comment #28) > For 2.20, the current plan is that we can use SQL that is not optimal for MySQL3 > but still works. Anybody who complains about performance can be pushed to > MySQL4, but we have not agreed to break compatibility with MySQL3. Bug 204217 is about requiring MySQL 4.0, which I think we should do in the 2.19 timeframe.
(In reply to comment #30) > Bug 204217 is about requiring MySQL 4.0, which I think we should do in the 2.19 > timeframe. You realize we're freezing for 2.20 in 2 weeks, right? Might be better to hold off on that for 2.21/2.22. I doubt the freeze period for 2.20 will be anywhere near as long as 2.18 took.
>You realize we're freezing for 2.20 in 2 weeks, right? Might be better to hold >off on that for 2.21/2.22. I doubt the freeze period for 2.20 will be anywhere >near as long as 2.18 took. Sure, makes sense.
Depends on: 280493
Depends on: 283076
Depends on: 285547
Reassigning bugs that I'm not actively working on to the default component owner in order to try to make some sanity out of my personal buglist. This doesn't mean the bug isn't being dealt with, just that I'm not the one doing it. If you are dealing with this bug, please assign it to yourself.
Assignee: justdave → general
QA Contact: mattyt-bugzilla → default-qa
Severity: normal → enhancement
Priority: -- → P4
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: