Closed Bug 281582 Opened 20 years ago Closed 20 years ago

Replace CONCAT and MATCH with Bugzilla::DB function call

Categories

(Bugzilla :: Bugzilla-General, enhancement, P1)

2.19.2
enhancement

Tracking

()

RESOLVED FIXED
Bugzilla 2.20

People

(Reporter: mkanat, Assigned: Tomas.Kopal)

References

()

Details

Attachments

(1 file, 7 obsolete files)

There is one place we use CONCAT in Bugzilla, which is not ANSI SQL, and not supported by PostgreSQL. From the MySQL manual: MySQL Server doesn't support the standard SQL || operator for string concatenation; use CONCAT() instead. Because CONCAT() takes any number of arguments, it's easy to convert use of the || operator to MySQL Server.
By the way, it's this line in Bugzilla::Search: $ff = "CONCAT($flagtypes.name, $flags.status)";
Note that you must explicitly cast both arguments to varchar for the PostgreSQL version -- PostgreSQL cannot CONCAT a varchar and a char, whereas MySQL can.
Let's also throw MATCH into this patch, since they both only exist in one place. For now, we'll just use a LIKE operator for MATCH, I think.
Summary: Replace CONCAT with Bugzilla::DB function call → Replace CONCAT and MATCH with Bugzilla::DB function call
Priority: -- → P1
Target Milestone: --- → Bugzilla 2.20
Attached patch V1 (obsolete) (deleted) — Splinter Review
Here we go...
Attachment #176118 - Flags: review?
(In reply to comment #2) > Note that you must explicitly cast both arguments to varchar for the PostgreSQL > version -- PostgreSQL cannot CONCAT a varchar and a char, whereas MySQL can. Hmmm, weird. It "works for me"(TM) :-). varchar(50) and char(1), Postgres 7.4.7. Can you test it again, please? And I forgot to mention: the patch is well tested on MySQL, but I have done just very basic testing on Postgres.
Status: NEW → ASSIGNED
I'm using PostgreSQL 7.3, so that's probably the difference.
(In reply to comment #6) > I'm using PostgreSQL 7.3, so that's probably the difference. Ok, so... Should we add the type cast, or should we bump the version up to 7.4? Adding the type cast is not that simple, IMHO. If we cast it in the Pg module to varchar, the function is no longer generic - what if you pass in e.g. text? And what limit will we put on the varchar? As it is used at exactly one place ATM, it does not matter so much, but we can hit a problem if someone use it somewhere else. Maybe casting both arguments to text would help?
Yeah, both arguments should be casted to the same thing, probably text. As you'll see in the PostgreSQL manual, the CONCAT operation always returns a "text" type. So if we cast both of the arguments to text, there's no problem.
(In reply to comment #8) > Yeah, both arguments should be casted to the same thing, probably text. > > As you'll see in the PostgreSQL manual, the CONCAT operation always returns a > "text" type. So if we cast both of the arguments to text, there's no problem. Ok, sounds good, I will fix the patch.
Attachment #176118 - Flags: review?
Attached patch V1.1 (obsolete) (deleted) — Splinter Review
The only change from the previous version is that Pg specific implementation was added. It's similar to ANSI version, it just casts both parameters to text prior to joining them.
Attachment #176118 - Attachment is obsolete: true
Attachment #176269 - Flags: review?
Comment on attachment 176269 [details] [diff] [review] V1.1 >+our @_abstract_methods = qw(new REQUIRED_VERSION PROGRAM_NAME I'd rather the constants were on their own line. That's why I put them on their own line. I plan to add more later. >+sub sql_string_concat { >+ my ($self, @params) = @_; >+ >+ return join(' || ', @params); >+} >+ Are there no existing sql_ methods in this thing? I forget if this is where this ought to go -- see my patch on bug 284352; I'm working on adding a little bit of organization to Bugzilla::DB. >- push(@supptables, "INNER JOIN longdescs $table ON bugs.bug_id " . >- "= $table.bug_id"); >+ push(@supptables, "INNER JOIN longdescs AS $table " . >+ "ON bugs.bug_id = $table.bug_id"); It looks like this change has nothing to do with this patch...? >- my $term1 = "MATCH($table.thetext) AGAINST(".&::SqlQuote($v).")"; >- my $term2 = "MATCH(bugs.short_desc) AGAINST(".&::SqlQuote($v).")"; >+ my $term1 = $dbh->sql_fulltext_search("$table.thetext", >+ ::SqlQuote($v)); Nit: Put ${table}.thetext so it's clearer that "thetext" isn't part of the variable. :-) >+ # Generate the condition by running the operator-specific I think you mean "database-specific"? >+ $ff = $dbh->sql_string_concat("$flagtypes.name", "$flags.status"); Nit: Same thing there, about the {}. >+ # This is a quick hack to get at least some searching with postgres >+ # working. We need to revisit this and implement proper full text >+ # search (probably employing the tsearch2 contrib module). >+ return "$column ~* $text"; That should be LIKE "%$text%" -- it's much faster, usually. Or maybe ILIKE, if the match is supposed to be case-insensitive. >+ return 'CAST(' . join(' AS text) || CAST(', @params) . ' AS text)'; Can you not use ::text for some reason, instead of CAST? Also, you didn't join @params, so that will just give us something like ARRAY. We probably need to foreach over @params there, I'd think. -Max
Attachment #176269 - Flags: review? → review-
Attached patch V1.2 (obsolete) (deleted) — Splinter Review
Attachment #176269 - Attachment is obsolete: true
Attachment #176331 - Flags: review?(mkanat)
Comment on attachment 176331 [details] [diff] [review] V1.2 The regex search thing really does need to be changed to LIKE for PostgreSQL.
Attachment #176331 - Flags: review?(mkanat) → review-
You are too fast, Max, let me at least comment on your points ;-). (In reply to comment #11) > (From update of attachment 176269 [details] [diff] [review] [edit]) > >+our @_abstract_methods = qw(new REQUIRED_VERSION PROGRAM_NAME > > I'd rather the constants were on their own line. That's why I put them on > their own line. I plan to add more later. Agreed, sorry, I overlooked the new. I am used to have constants first and methods next :-), and I would prefer to keep them in the same order in the list as in the modules, so that's the reason I changed this. > > >+sub sql_string_concat { > >+ my ($self, @params) = @_; > >+ > >+ return join(' || ', @params); > >+} > >+ > > Are there no existing sql_ methods in this thing? I forget if this is where > this ought to go -- see my patch on bug 284352; I'm working on adding a little > bit of organization to Bugzilla::DB. Currently, all sql_ methods are abstract, these are the first. My intention was to have sql_ first, and bz_ next, but I didn't quite suceed :-). Fixed. It's hard to keep them in order when there are several patches modifying the same file :-). > > >- push(@supptables, "INNER JOIN longdescs $table ON bugs.bug_id " . > >- "= $table.bug_id"); > >+ push(@supptables, "INNER JOIN longdescs AS $table " . > >+ "ON bugs.bug_id = $table.bug_id"); > > It looks like this change has nothing to do with this patch...? Well, yes, you are right. It's just few lines above the change I was doing and I think this is more readable. I will undo it if you insist :-). > > >- my $term1 = "MATCH($table.thetext) AGAINST(".&::SqlQuote($v).")"; > >- my $term2 = "MATCH(bugs.short_desc) AGAINST(".&::SqlQuote($v).")"; > >+ my $term1 = $dbh->sql_fulltext_search("$table.thetext", > >+ ::SqlQuote($v)); > > Nit: Put ${table}.thetext so it's clearer that "thetext" isn't part of the > variable. :-) This was just copy of the old code, but fixed anyway :-). > > >+ # Generate the condition by running the operator-specific > > I think you mean "database-specific"? I don't know what the original author meant, I was just re-wrapping the lines to make them reasonably short :-). Again, will undo if you want. > > >+ $ff = $dbh->sql_string_concat("$flagtypes.name", "$flags.status"); > > Nit: Same thing there, about the {}. Done. > > >+ # This is a quick hack to get at least some searching with postgres > >+ # working. We need to revisit this and implement proper full text > >+ # search (probably employing the tsearch2 contrib module). > >+ return "$column ~* $text"; > > That should be LIKE "%$text%" -- it's much faster, usually. Or maybe ILIKE, > if the match is supposed to be case-insensitive. Well, that's not so simple. The $text is already quoted, so you will need to use string concatenation to add the % marks. I think regexp is simpler, and we need to put proper full text there as soon as posible anyway. But I will change it if you insist hard enough :-). > > >+ return 'CAST(' . join(' AS text) || CAST(', @params) . ' AS text)'; > > Can you not use ::text for some reason, instead of CAST? > Yes, I can, but according to Postgres doco, CAST is the standart and prefered way of diong it, ::text is there just for historical reasons. I prefer not to use obsoleted stuff :-). At the end it will produce the same result and I am sure there is no performance difference :-). > Also, you didn't join @params, so that will just give us something like > ARRAY. We probably need to foreach over @params there, I'd think. I did, look again. It's tested and it works :-).
OK. Yes, please change it from a regex to an ILIKE, even if it takes string contatenation.
Attached patch V1.3 (obsolete) (deleted) — Splinter Review
Ok, the Pg full text search now splits the phrase we are looking for up to separate words and searches for all of them using ILIKE. It's probably the closest we can get to full blown full text search without actually using one :-). Do not expect this to be fast, though.
Attachment #176331 - Attachment is obsolete: true
Attachment #176337 - Flags: review?(mkanat)
Comment on attachment 176337 [details] [diff] [review] V1.3 $text is already sql-quoted. :-)
Attachment #176337 - Flags: review?(mkanat) → review-
Attached patch V2 (obsolete) (deleted) — Splinter Review
Another iteration. The simulated full text search is now fully ANSI SQL to provide fall back for any DB not supporting full text search natively, and as such has been moved to DB.pm. Problem with quoting fixed.
Attachment #176337 - Attachment is obsolete: true
Attachment #176342 - Flags: review?
Comment on attachment 176342 [details] [diff] [review] V2 >+ # convert all words to lowercase >+ my @lower_words = map(lc, @words); I think that has to be \&lc, doesn't it? Otherwise, this looks fine. :-)
Attachment #176342 - Flags: review? → review+
(In reply to comment #19) > (From update of attachment 176342 [details] [diff] [review] [edit]) > >+ # convert all words to lowercase > >+ my @lower_words = map(lc, @words); > > I think that has to be \&lc, doesn't it? > > Otherwise, this looks fine. :-) > No, it's ok as it is. See http://perldoc.perldrunks.org/functions/map.html for examples. But as you suggested, it will be simpler to do lc on the whole text before the split. I will change that...
Attached patch V2.1 (obsolete) (deleted) — Splinter Review
Fixed the lower case conversion on the text, no other changes.
Attachment #176342 - Attachment is obsolete: true
Attachment #176352 - Flags: review?(mkanat)
Comment on attachment 176352 [details] [diff] [review] V2.1 Looks great! :-)
Attachment #176352 - Flags: review?(mkanat) → review+
Flags: approval?
Bugger, just realized another thing: this generic implementation of full text search using LIKE will do the search, but the relevance computation which is done later is broken. Search.pm is using the feature of MySQL which returns relevance when you actually select the MATCH function (opposed to using it in the WHERE clause). But if you do select on "text LIKE expr", you will get boolean. The best solution I can think of right now is to enclose the generic implementation in CASE statement returning 1 on match and 0 on no match. That way we'll get relevance 0 or 1 - not perfect, but better than nothing.
Can we just return relevance 0 or 1 just for the generic function, but keep everything else the same for MySQL?
(In reply to comment #24) > Can we just return relevance 0 or 1 just for the generic function, but keep > everything else the same for MySQL? Yes, MySQL is not affected by this at all. What I am just not sure is on Postgres (and other DBs), what will happen when the integer result gets in the WHERE clause - will something like "SELECT * FROM bugs WHERE (CASE WHEN id > 100 THEN 1 ELSE 0)" work? Does WHERE accept integer and treat is boolean, or will it break? I will have to test it tonight.
Attached patch V2.2 (obsolete) (deleted) — Splinter Review
Ok, here is a version I believe should work correctly - always. I modified the generic full text search function to return 1 or 0 as a result (MySQL is returning floating point relevance, so the meaning is similar), and the part in Search.pm which is going to WHERE clause is changed to result in boolean (as Posgtres requires). I believe that even the relevance as computed over all comments will make some sense now :-). No other changes were done (except for un-bitrotting).
Attachment #176352 - Attachment is obsolete: true
Attachment #176558 - Flags: review?(mkanat)
Comment on attachment 176558 [details] [diff] [review] V2.2 OK, this looks good to me. If it causes any real trouble on PostgreSQL, we can work on that when we have a running PostgreSQL Bugzilla. :-)
Attachment #176558 - Flags: review?(mkanat) → review+
Attached patch V2.3 (deleted) — Splinter Review
Ooops, typo in the postgres full text search. But now it's actually tested on both postgres and MySQL :-).
Attachment #176558 - Attachment is obsolete: true
Attachment #176706 - Flags: review?(mkanat)
Attachment #176706 - Flags: review?(mkanat) → review+
Flags: approval? → approval+
Checking in Bugzilla/DB.pm; /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB.pm,v <-- DB.pm new revision: 1.27; previous revision: 1.26 done Checking in Bugzilla/Search.pm; /cvsroot/mozilla/webtools/bugzilla/Bugzilla/Search.pm,v <-- Search.pm new revision: 1.88; previous revision: 1.87 done Checking in Bugzilla/DB/Mysql.pm; /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Mysql.pm,v <-- Mysql.pm new revision: 1.4; previous revision: 1.3 done Checking in Bugzilla/DB/Pg.pm; /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Pg.pm,v <-- Pg.pm new revision: 1.3; previous revision: 1.2 done
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: