173130 - (bz-sybase) Allow Bugzilla to work with Sybase

This may be worth moving to another bug and checking in separately, but we can probably decide that later. Wanted to put this somewhere that people could look at it. This attachment is the changes I had to make to Search.pm to get buglists to work correctly on Sybase. And this code happens to work on MySQL, too :) BBaetz tells me this SQL will suck on Postgres prior to 7.4 though. The primary changes here is that Sybase performs explicit joins (those declared using the word JOIN) first before it performs implicit joins (those with commas). This causes any LEFT JOIN that includes the bugs table to fail, because bugs is listed first, and there's a comma between bugs and the next mentioned table in better than 90% of the queries. By changing all of the comma joins to be INNER JOIN, the joins are performed in order, and the bugs.bug_id column will already exist in the prepared table when the following joins against it happen. Also, ANSI SQL says if we use GROUP BY, that we have to include every column mentioned in the SELECT part in the GROUP BY clause. Sybase won't let you put more than 31 columns in a GROUP BY, and also has temporary table size limitations with GROUP BY that are pretty darn restrictive. So the "GROUP BY bugs.bug_id" has been replaced with "DISTINCT bugs.bug_id" at the beginning of the SELECT part. The join ordering may have other side effects (like if a LEFT JOIN happens before a later INNER JOIN perhaps?) but we haven't run into any problems yet on Sybase at least. All of the columns come back correctly, and the queries seem to product the same results as the old version. I'd love it if anyone could prove whether those possible side effects would really happen or not.

Dave Miller [:justdave]

Comment 4

•

22 years ago

CCing other people that are working on cross-DB support for comment

Bradley Baetz (:bbaetz)

Comment 5

•

22 years ago

actually, forget what I said about the order mattering - I don't think it does, since the USING/ON constrains what we're joining against - $query .= " LEFT JOIN cc ON cc.bug_id = bugs.bug_id AND cc.who = $::userid "; + $query .= " LEFT JOIN cc ON bugs.bug_id = cc.bug_id AND cc.who = $::userid "; Why did you have to change that ordering arround?

Dave Miller [:justdave]

Comment 6

•

22 years ago

>- $query .= " LEFT JOIN cc ON cc.bug_id = bugs.bug_id AND cc.who = $::userid "; >+ $query .= " LEFT JOIN cc ON bugs.bug_id = cc.bug_id AND cc.who = $::userid "; > >Why did you have to change that ordering arround? Because I've had experience in some places where it'll choke if the column references on the left side of the comparison isn't already in the prepared table.

Dave Miller [:justdave]

Comment 7

•

21 years ago

Attached file original Sybase schema conversion (deleted) — Details

This is the *original* conversion of the Schema to Sybase syntax that we did back in December. There's been a lot of changes to it since then, but getting the current schema coerced back into something I can post publicly would take a while, and I'm mainly posting this for the benefit of someone asking a question about it on the developers mailing list.

Dave Miller [:justdave]

Comment 8

•

21 years ago

Attached file Bugzilla/DBCompat.pm (deleted) — Details

This attachment is the DB Compatibility module we did. There are a couple usage examples at the bottom, but I suck at POD docs, so the documentation is pretty incomplete. :)

Jeroen Ruigrok van der Werven

Comment 9

•

21 years ago

See bug 98304, attachment 130979 [details] for an updated DBCompat.pm with more documentation.

Ed Sabol

Updated

•

21 years ago

Depends on: bz-dbschema

Ed Sabol

Updated

•

20 years ago

Depends on: 201030

Joel Peshkin

Updated

•

20 years ago

Depends on: bz-dbinstall

Dave Miller [:justdave]

Updated

•

20 years ago

Depends on: 248001

Tomas Kopal

Updated

•

20 years ago

Depends on: 250832

Tomas Kopal

Updated

•

20 years ago

Depends on: 250547

Tomas Kopal

Comment 10

•

20 years ago

Also see bug 251960 with very similar patch, working on Postgres.

Tomas Kopal

Comment 11

•

20 years ago

I have compared the first patch (Search.pm) against the attachment 153568 [details] [diff] [review] in bug 251960, and they are different only in: - attachment 153568 [details] [diff] [review] is against a bit newer sources, making it easier to apply - attachment 153568 [details] [diff] [review] is nicer formatted ;-) - attachment 153568 [details] [diff] [review] does not address the GROUP BY vs DISTINCT problem. IMHO the best move would be to go for attachment 153568 [details] [diff] [review] in bug 251960 and to move the GROUP BY vs DISTINCT discussion to other bug, probably bug 174295.

Tomas Kopal

Comment 12

•

20 years ago

I have just looked at the DISTINCT vs GROUP BY problem from the PostgreSQL point of view and, unfortunatelly, this does not work here. As soon as you do JOIN, you have to have all columns in GROUP BY, except for columns used only in aggregate function call (if I understand correctly ANSI SQL 92, which PostgreSQL implements in this case). So we need GROUP BY for all columns for PostgreSQL. If Sybase can't do it, we need special case either for Sybase or PostgreSQL, probably in DBCompat. Suggestions?

Joel Peshkin

Comment 13

•

20 years ago

I think we are going to need an SQL object that lets us add selected fields, add selected aggregate fields, add group-by fields that we mean, and then let it use its knowledge of the database backend to decide which of the other fields need to be added to the group-by. Essentially, SQL code generation will need to have an abstraction layer that knows which of the selected fields need to be auto-appended to the group-by condition. Similarly, this will have to understand where it can use aliases and where it needs to look bak to the original definition of the alias and inline it. etc...

Tomas Kopal

Updated

•

20 years ago

Depends on: 253357

Tomas Kopal

Updated

•

20 years ago

Depends on: 253360

Tomas Kopal

Comment 14

•

20 years ago

Actually, ignore my previous post (comment #12), I was wrong. If I use SELECT DISTINCT and remove the GROUP BY, it mostly works even with Postgres. Problems starts when HAVING kicks in, e.g. if you specify boolean operation "flag is not xxxx". Does this work on Sybase? I still hope we can come up with a query which just works, without the need for another abstraction object, which would be probably pretty complex and error prone.

Tomas Kopal

Comment 15

•

20 years ago

I can't find any way how to get rid of the HAVING clause, and with it Postgres refuses to run the query without all columns in the GROUP BY. Solution would be to use subselects (I got it actually working with subselect). What is current opinion on this? Are we going to require MySQL 4.1 for 2.20? Can we start using subselects?

Joel Peshkin

Comment 16

•

20 years ago

2.20 will still support mysql3 but large sites will be expected to be on mysql4 for performance. SQL that mysql3 does not optimize well will be permitted. 2.22 will probably require mysql4

Ed Sabol

Comment 17

•

20 years ago

(In reply to comment #14) > Actually, ignore my previous post (comment #12), I was wrong. If I use SELECT > DISTINCT and remove the GROUP BY, it mostly works even with Postgres. Problems > starts when HAVING kicks in, e.g. if you specify boolean operation "flag is not > xxxx". Does this work on Sybase? I'm having difficulty following your train of thought. What are you asking about Sybase? If you're asking if it's legal in Sybase to specify a HAVING clause without a GROUP BY clause, then, yes, it is, but, it's not very common. > I still hope we can come up with a query which just works, without the need for > another abstraction object, which would be probably pretty complex and error prone. Me too.

Bradley Baetz (:bbaetz)

Comment 18

•

20 years ago

The way to do this sort of thing is with subselects. Depending on exactly what you're trying to do, you may need to use an 'inline view' (ie SELECT * from a, (SELECT * from b ....) AS c WHERE ....), or have an IN or EXISTs subselect. Or possibly a correlated subquery. Exactly which one is better to use depends on exactly what you're trying to do in a particular situation. Which is another reason why DB abstraction layers inherently suck. :)

Joel Peshkin

Comment 19

•

20 years ago

I'm not suggesting a universal abstraction. However, as we assemble sql statements if we build the parts up using something slightly more insightful than text concatenation, when we get to assembling the whole statement, we can adjust the final assembly of that particular type of statement to suit the various databases. For the example at hand, this may be a matter of pushing select fields into a list and group fields (that we really mean) onto a list and then having a common function that takes the 2 lists and returns the GROUP BY clause. Incidentally, if I'm not mistaken, HAVING is death on query performance, though optimizers have gotten a lot better since I first adopted that dogma.

Bradley Baetz (:bbaetz)

Comment 20

•

20 years ago

HAVING isn't always the best thing to do, but its better than doing it yourself in perl. My point is that abstracting SQL away to that level of detail lose flexability. Can't we just drop mysql3 support, and get rid of 90% of this.

Tomas Kopal

Comment 21

•

20 years ago

Another option would be to have two paths, one doing it "the right way (tm)", i.e. with subselects, and having some fallback for DBs which does not support it yet (MySQL3). Once MySQL4 is widely deployed and we stop supporting it, we can remove this "legacy" stuff. In Search.pm, it would affect just two places which currently use HAVING, search for "flag is not" and search for "percent complete". I know it's close to hack and can complicate maintenance, but it's probably still better than abstraction layer. BTW, MySQL4.0 is AFAIK still not enough for subselects, 4.1 is required. So we still probably need a version of Search.pm without subselects for a while :-(.

Tomas Kopal

Comment 22

•

20 years ago

(In reply to comment #17) > (In reply to comment #14) > > Actually, ignore my previous post (comment #12), I was wrong. If I use SELECT > > DISTINCT and remove the GROUP BY, it mostly works even with Postgres. Problems > > starts when HAVING kicks in, e.g. if you specify boolean operation "flag is not > > xxxx". Does this work on Sybase? > > I'm having difficulty following your train of thought. What are you asking about Sybase? If you're asking > if it's legal in Sybase to specify a HAVING clause without a GROUP BY clause, then, yes, it is, but, it's not > very common. I was actually asking whether anyone tried doing search for a bug not having specified flag on Sybase with this change applied, i.e. to excercise the query with HAVING part, and with GROUP BY replaced by DISTINCT. But if Sybase supports HAVING without GROUP BY, it will probably work...

Ed Sabol

Comment 23

•

20 years ago

(In reply to comment #15) > I can't find any way how to get rid of the HAVING clause, and with it Postgres > refuses to run the query without all columns in the GROUP BY. I'm sorry... Could you post a complete example of such a query?

Tomas Kopal

Comment 24

•

20 years ago

(In reply to comment #23) > (In reply to comment #15) > > I can't find any way how to get rid of the HAVING clause, and with it Postgres > > refuses to run the query without all columns in the GROUP BY. > > I'm sorry... Could you post a complete example of such a query? Sure. Here is relevant part of the query Search.pm generate when searching for all bugs which do not have review flag set: SELECT bugs.bug_id, SUM(CASE WHEN (flagtypes_0.name || flags_0.status) IS NOT NULL THEN 1 ELSE 0 END) AS allflags_0, SUM(CASE WHEN (flagtypes_0.name || flags_0.status) != 'review' THEN 1 ELSE 0 END) AS matchingflags_0 FROM bugs LEFT JOIN flags AS flags_0 ON (bugs.bug_id = flags_0.bug_id AND flags_0.is_active = 1) LEFT JOIN flagtypes AS flagtypes_0 ON (flags_0.type_id = flagtypes_0.id) GROUP BY bugs.bug_id HAVING SUM(CASE WHEN (flagtypes_0.name || flags_0.status) IS NOT NULL THEN 1 ELSE 0 END) = SUM(CASE WHEN (flagtypes_0.name || flags_0.status) != 'review' THEN 1 ELSE 0 END); Trouble is with the "GROUP BY bugs.bug_id". If you specify it, postgres wants to have all other columns which appear in the query in the GROUP BY too. Which means some twenty or thirty plus columns for full query. That seems to upset sybase, which have limit on number of GROUP BY parameters. If you leave it out completely, postgres complains that it's not there. It seems that the HAVING is not the only problem, if I remove it and leave just the select, I still have to provide GROUP BY... I have very vague idea why is it so, and no idea how to fix it without having special code for different DBs :-(.

Bradley Baetz (:bbaetz)

Comment 25

•

20 years ago

Use a subselect instead. It becomes much clearer, cleaner, and quicker.

David Lawrence [:dkl]

Comment 26

•

20 years ago

(In reply to comment #25) > Use a subselect instead. It becomes much clearer, cleaner, and quicker. If MySQL 4.0+ is now required (which I assume this is the case now with 2.19+) then you can do that otherwise, if will be another if ($db_driver eq 'mysql') {} elsif ($db_driver eq 'EverythingElse') {}.

Tomas Kopal

Comment 27

•

20 years ago

(In reply to comment #26) > (In reply to comment #25) > > Use a subselect instead. It becomes much clearer, cleaner, and quicker. > > If MySQL 4.0+ is now required (which I assume this is the case now with 2.19+) > then you can do that otherwise, if will be another if ($db_driver eq 'mysql') {} > elsif ($db_driver eq 'EverythingElse') {}. > > I would love to use them, but, unfortunatelly, subselects starts at MySQL 4.1. So we can't use them yet, at least no without the mentioned if{}else{} (which make everything even worse).

Joel Peshkin

Comment 28

•

20 years ago

For 2.20, the current plan is that we can use SQL that is not optimal for MySQL3 but still works. Anybody who complains about performance can be pushed to MySQL4, but we have not agreed to break compatibility with MySQL3.

Bradley Baetz (:bbaetz)

Comment 29

•

20 years ago

Yes, it will need ot be conditional, and also mysql4 needs to be tested for speed - the curent query was very much tuned for it and its one-index-per-table-per-query restriction.

Myk Melez [:myk] [@mykmelez]

Reporter

Comment 30

•

20 years ago

(In reply to comment #28) > For 2.20, the current plan is that we can use SQL that is not optimal for MySQL3 > but still works. Anybody who complains about performance can be pushed to > MySQL4, but we have not agreed to break compatibility with MySQL3. Bug 204217 is about requiring MySQL 4.0, which I think we should do in the 2.19 timeframe.

Dave Miller [:justdave]

Comment 31

•

20 years ago

(In reply to comment #30) > Bug 204217 is about requiring MySQL 4.0, which I think we should do in the 2.19 > timeframe. You realize we're freezing for 2.20 in 2 weeks, right? Might be better to hold off on that for 2.21/2.22. I doubt the freeze period for 2.20 will be anywhere near as long as 2.18 took.

Myk Melez [:myk] [@mykmelez]

Reporter

Comment 32

•

20 years ago

>You realize we're freezing for 2.20 in 2 weeks, right? Might be better to hold >off on that for 2.21/2.22. I doubt the freeze period for 2.20 will be anywhere >near as long as 2.18 took. Sure, makes sense.

Shane H. W. Travis

Updated

•

20 years ago

Depends on: 280493

Tomas Kopal

Updated

•

20 years ago

Depends on: 283076

Max Kanat-Alexander

Updated

•

20 years ago

Depends on: 285547

Dave Miller [:justdave]

Comment 33

•

20 years ago

Reassigning bugs that I'm not actively working on to the default component owner in order to try to make some sanity out of my personal buglist. This doesn't mean the bug isn't being dealt with, just that I'm not the one doing it. If you are dealing with this bug, please assign it to yourself.

Assignee: justdave → general

QA Contact: mattyt-bugzilla → default-qa

Vlad Dascalu

Updated

•

19 years ago

Severity: normal → enhancement

Max Kanat-Alexander

Updated

•

18 years ago

Priority: -- → P4

Search.pm modifications 22 years ago Dave Miller [:justdave] (deleted), patch		Details \| Diff \| Splinter Review
original Sybase schema conversion 21 years ago Dave Miller [:justdave] (deleted), text/plain		Details
Bugzilla/DBCompat.pm 21 years ago Dave Miller [:justdave] (deleted), text/plain		Details