Closed Bug 78148 Opened 24 years ago Closed 23 years ago

clean up directory listing stream converters

Categories

(Core :: Networking, defect, P2)

x86
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla0.9.5

People

(Reporter: bbaetz, Assigned: bbaetz)

References

(Blocks 2 open bugs)

Details

Attachments

(3 files, 7 obsolete files)

Currently the stream converters take one ascii representation, parse it, convert it into another, and pass it off to something else which parses that, and then spits out the result. I got sick of hacking arround this while getting gopher to output html directory listings. So I'm fixing it. Doug, do we need to support nsFTPDirListingConverter::Convert? Nothing actually uses it AFAICS, and the indexedToHTML stuff doesn't support it either (ditto for gopher - I don't think that the gopher code has even been tested....). Currently, I have an nsIDirIndex interface which contains attributes for the various things we want to support, and an nsIDirIndexListener, which inherits off nsIRequestObserver, and provides the addition method: void onIndexAvailable(in nsIRequest aRequest, in nsISupports aCtxt, in nsIDirIndex aIndex); I haven't finished hooking this all up yet (I still have to connect the new FTPDirListingConverter to nsIndexedToHTML), but: [bbaetz@banana netwerk]$ cvs diff streamconv/converters/nsFTPDirListingConv.cpp | diffstat nsFTPDirListingConv.cpp | 307 +++++++------------------- 1 file changed, 93 insertions, 214 deletions [bbaetz@banana netwerk]$ cvs diff streamconv/converters/nsIndexedToHTML.cpp | diffstat nsIndexedToHTML.cpp | 242 ++++++++---------------------- 1 files change, 72 insertions, 170 deletions (I also NS_LITERAL_STRING'd nsIndexedToHTML while I was at it) Of course, I've added the new interfaces and implementation files, (but those just consists of getters and setters, + the license blurb). So I come out adding 23 more lines than I remove. I haven't pulled all the parsing stuff out of nsDirectoryViewer yet, though, or touched gopher. Or done anything more than verify that it all compiles - I know it won't work as it currently stands. Even if we do decide to scrap the XUL directory viewer, this still: a) Cleans stuff up b) Provides a common interface between ftp and gopher. c) Avoids uneeded parsing/escaping/unescaping (although I'm still escaping - I decided to try and check that it works, and change as little as possible, first) d) May give us a way to easily produce html listings for file:///, although the nsString vs nsCString stuff would have to be cleaned up - currently I've just done what the directory viewer did. The disadvantage of doing this is that it is theoretically now possible (since my directoryViewer changes from a few weeks ago) for a web server to send us application/http-index-format data, and have mozilla display the data as the XUL tree. I haven't tested that though yet. These changes would remove the parser, and so remove that feature - I don't know if anyone cares though. I still have a couple of streamconverter/nsIInputStream questions though. I'll code a bit more to try and get something actually displaying, and see what I discover, or I'll find some necko people on IRC and nag them :)
supporting Convert() allows for SYNC stream converting. Is anyone doing this: no. Will anyone want to? Don't know. Attach some diffs so we can see some of this great sounding work. :-)
Is there some sort of wrapper class which will make an async converter synchronous? I can imagine one, but I don't know if it already exists. I'll attach diffs once I get something displaying - probably later today, after I finish some other stuff.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.1
I believe one of the test stream converters I provide wraps a sync converter in async callbacks. In real-world situations though this defeats the whole purpose of asynchronous function calls because they just wind up blocking. It's best to provide async parsing. bradley writes: "Currently the stream converters take one ascii representation, parse it, convert it into another, and pass it off to something else which parses that, and then spits out the result. I got sick of hacking arround this while getting gopher to output html directory listings. So I'm fixing it." what's wrong w/ the stream converter model? and what/how were you hacking around whatever you found to be wrong w/ it?
> I believe one of the test stream converters I provide wraps a sync converter > in async callbacks. I meant the other way arround - given an async stream converter, wrap it so that it becomes a sync converter. I know how to do it, I was just wondering if there was something in the tree already. Nothings wrong with the stream converter model. However, it only passes text streams arround, and so we end up parsing and unparsing the same data multiple times. To expand on my summary: What currently happens (for ftp) is that nsFtpDirListingConv takes a text/ftp-dir-<servertype> stream, and parses it, putting it into an indexEntry (a local class defined in nsFtpDirListingConv.cpp). At the end of each line, it takes that structure, and puts out a application/http-index-format entry. Depending on the settings in the prefs, the ftp protocol may have arranged to get a chain of converters, so this data will be passed to nsIndexedToHTML, and html produced, or it will be left as application/http-index-format, where nsDirectoryViewer.cpp will parse it. That parser is seveal hundred lines long, and makes several assumptions about gopher and ftp. It also puts everything into a structure which looks remarkably like the indexEntry structure we started out with. The nsIndexedToHTML parser isn't really a parser for the entire format - it makes assumptions about the layout of the data lines. I could make it understand gopher as well, using a few ifs, and hardcode more stuff in. Thats what I meant by "hacking arround" - not the stream converter model, but the duplication of code, and the needless parsing/unparsing/etc. The onIndexAvailable is there because, AFAIK, there isn't an nsIInputStream which I can attach nsISupport objects to, rather than char*'s. I'm not sure that something like that would really fit the model of the input stream stuff. I'll attach some code later tonight.
Attached patch work-in-progress patch (obsolete) (deleted) — Splinter Review
I've attached a work in progress patch. With that patch gopher and ftp will use the html directory index. I haven't updated the XUL viewer yet - I'll do that tomorrow. The API changes are straight forward, and mainly involve removing lots of code. Any design/API comments?
hmmm, I'm a bit concerned here because http dir listing format is sort of standard internally (HTTP can use it), and I see some benefit to cononicalizing various listing formats into it as it gives us some common ground. have we isolated this extra parsing step for ftp as a performance hit?
Well, mozilla can only use the application/http-index stuff in the XUL viewer, and, until my last lot of cleanups for nsDirectoryViewer.cpp went in a few weeks ago, only ftp and gopher entries could be identified as directories - all other protocols would have just been files. I'm still not sure how generic it is, though, and it probably still needs some changes to handle that. Do you know of any sites which use this format? Despite the name, the current code doesn't actually handle http directory listings, and ns6 only handles ftp. I doubt that its a performance hit, although I haven't measured. The problem is that the parsing code would have had to be reproduced twice - once for the html listing, and once for the XUL one. If you feel that its an issue, then I suppose it could be possible to just move the current parser over, and get it to produce nsIDirIndex's. Another alternative would be to have nsDirectoryViewer.cpp use a different xul file for html listings (and then we only need one output formatter). (BTW, do directory listings give the time in GMT for a reason?)
i don't know about current gmt reasoning. Netscape enterprise http servers can emit http index format if you tell them to. I just want us to be aware that we're lopping off one of our stream converters. part of the stream converter model was to have the pool actually grow so you could convert from type A to Z w/ out knowing whether or not you had layered conversion going on. if you pull it out of mozilla by default, please leave the converter lying around in cvs (and in the build) so people can use it if the need arises.
OK, so I've just discovered nsDirectoryIndexStream, which is Yet Another producer of directory listings. Can I ask what this is used for? nsDirectoryViewer gets the data for files from rdf:file, and mentions (line 1290) that while it may be a good idea to combine them at one point, "rdf:file is less expensive in terms of memory and speed", so it isn't used for that (although it does get the data for the initial url, which is then probably discarded, although that may not be the case anymore. It also assumes that all filenames are ascii, calling GetLeafName, and I thought we supported non-ascii files) Its only used when someone tries to get an input stream for a directory - does anyone apart from the directory viewer do this? If my changes allowed file:/// to be able to be displayed in any format, would the objections to this be less?
Does that feature work with ns6? If thats the case, then I'll leave it in - I'll have the parser convert to nsIDirIndex, instead (and move it into netwerk). This will still have the advantage that the 4 directory input methods (application/http-index-format, file, ftp, and gopher) will be able to use either output format without problems.
after thinking this over some more, and talking this over w/ rpotts, I'm a bit concerned over this goal. again, the whole idea of http_index is that it is a cononical dir listing format that *any* protocol can generate and subsequently we'll be able to convert this connonicial format into another. this adds a nice layer between protocol writers and content delivery. it also allows for new converters, say text/xml, to come in and take *all* dir listings and turn them into some new format. I'd really prefer we maintain this std format.
I agree, but the problem is that as soon as you have more than one output format, you end up with either needing to duplicate the parser, or have a parser which will take an http-index-format input stream, and output into a data structure which the front ends can then use. Or you can skip the intermediate data format, which is what my (unfinished) patches do, which then avoids the need to escape, convert times to a standard representation, etc. The other option may be to always convert to rdf (using the parser now in nsDirectoryViewer.cpp), and then use a xul template to output xhtml (thus getting rid of nsIndexedToHTML.cpp) Thinking about it a bit more, that solution would probably be the cleanest. Would that be acceptable from the embedding POV?
"or have a parser which will take an http-index-format input stream, and output into a data structure which the front ends can then use. " that's what index->html already does. skipping the intermediary format kills the output format flexibility. I'm not speaking from an embedding POV, just a mozilla contributor POV. IMO, we already have the intermediary format, httpindex.
"that's what index->html already does." But the problem is that it doesn't do it properly. It handles a subset of the format which works for ftp, and only ftp - it assumes the ordering of the data lines, so it won't handle gopher, file, data from the web, etc. I could fix this by copying the parser from nsDirectoryViewer.cpp to nsIndexedToHTML, but that strikes me as a waste of effort. Would embedding (which I believe was part of the reason for creating the html interface in the first place) be happy with a XUL template being used to output html? Or is there another way to format rdf?
OK, so I've though about this a bit more, and now agree with valeski. Does anyone have any objections to just killing the XUL/rdf directory viewer, and moving everything to the HTML output? This would lose the ability to do the rdf ftp bookmarks, but that doesn't really bother me. dougt? Any objections? The bookmarks URI is really sucky, and the directory viewer is slow. According to waterson: <waterson> it was a research project gone bad I can then hook the application/http-index-format datatype directly up to that convertor, and remove the pref-testing code from the ftp directory listing convertor. If there's only one output format, then theres no need for all this abstraction. valeski and/or dougt - would a patch to do that meet your approval?
lemmee ask some embedding customer's what they think about losing the XUL listing. the last time I checked, there were some that actually liked it :-).
----------- PLEASE IGNORE ALL COMMENTS ABOVE THIS LINE ------------ OK, so I've redone this, and the only change is the abstracting of the parser. If you have the network.dir.generate_html pref set to true (as it is by default now), then ftp, gopher, and file will all appear as html. The dates are also reported using the current locale, as well, and with the current time (as opposed to GMT). file:// has the wrong dates - nsDirectoryIndexStream is generating dates in 1970. I'm also not sure how well it behaves with non-ASCII file names. Because the intermediate application/http-index-format is still generated, the code isn't any smaller - if you ignore the copyright lines in the new files, its about the same size. All protocols generate application/http-index-format, and then the viewer factory decides whether to generate html or XUL based on the pref. I want to fix the file:/// stuff first, but are there any obbjections to the new patch (which I'll attach now)?
Attached patch new patch (obsolete) (deleted) — Splinter Review
patch seems like a good direction to head in to me.
Blocks: 72724
Blocks: 78474
Blocks: 68651
I'm about to attach a patch which works for me. I don't know what it does on non-ascii file listings. As well, file listings aren't sorted. (well, none of them are, but it only matters for file listings). The dates are now correct as well (why does nsIFile return the time in milliseconds instead of microseconds)? The HTML listing hardcodes english strings - dougt said that he has a bug on that. So I'd like to check this in, possibly disabled for file uris (by just changing the check in the factory). Can I get someone who has access to a machine with non ascii filenames please check this? And possibly fix it? The functionality is the same, except that gopher will now use html instead of the XUL interface. I did replace the "directory"/"file" bits with pictures (the ones from ns4 that are still in the tree). Or at least, I did, if they actually worked. I know pavlov had them working at one stage - I'll go ask him about them. I've also just noticed that I took out the .. listings as well. I'll put them back in when my tree finishes rebuilding. So can I get an r= on this?
Attached patch new patch (obsolete) (deleted) — Splinter Review
the usage of the http-index only converter in FTP seems fine to me, as do the makefile changes (who's going to do the mac build changes?), and factory registration mods. however, I'm going to have to defer on the index/parser changes; dougt maybe?
Attached patch patch against current tree (obsolete) (deleted) — Splinter Review
pushing off - this is too late for 0.9.1
Target Milestone: mozilla0.9.1 → mozilla0.9.2
mass move, v2. qa to me.
QA Contact: tever → benc
Target Milestone: mozilla0.9.2 → mozilla1.0
Blocks: 83881
Blocks: 97358
I'm going to try to get this in for 0.9.5, but it depends on when I get net access at home - it will almost certainly end up as 0.9.6, I think. I've discovered that all the i18n file listing worries I had don't really matter, since the current viewer displays accented chars in ftp listings as ?, while as of last night my tree displays the real char (after hacking nsParser.cpp to look for a meta charset tag in application/http-dir-index as well. Yes, this is ugly)
Assignee: bbaetz → bbaetz
Status: ASSIGNED → NEW
Priority: -- → P2
Target Milestone: mozilla1.0 → mozilla0.9.5
OK, here goes another patch. Its not perfect, but it works. Try it out (in both xul and html mode) and let me know what you think. Ideally there shouldn't be any noticable differences - this is a backend patch, although the html view did change slightly. I also implemented view-source for the xul viewer, by just displaying the html source :) Theres some stuff in nsDirectoryIndexStream.cpp to do collation-based sorting which is #ifdef'd out. See the comments for why. Also see bug 99478, and bug 85836 for why I'm reasonably sure that everything is OK. I'm going to post to npm.i18n to check that I haven't regressed anything from that side of things, (except for what I note above) I'd like to get this in soonish, so can I get some review on the assumption that there are not i18n regressions? I think I have everything; let me know if it doesn't build :) dougt & valeski: I haven't gotten a response to my mail yet, either.
Status: NEW → ASSIGNED
Attached patch patch v0.9 (obsolete) (deleted) — Splinter Review
CC'ing some i18n people.
nsServiceManager::ReleaseService is deprecated. Just use NS_RELEASE + if (gTextToSubURI) { + nsServiceManager::ReleaseService(NS_ITEXTTOSUBURI_CONTRACTID, gTextToSubURI); + gTextToSubURI = nsnull; + } You can optimize this: + *aListener = mListener.get(); + NS_IF_ADDREF(*aListener); by just doing something like NS_IF_ADDREF(*aListener = mListener.get()) You should probably check against null here: + delete[] mFormat; What does this comment mean: + + // So? - bbaetz For whatever reason, I can not apply this diff. Patch/Diff just doesn't like me today. I would like to try it out before giving you a r/sr. Darin, can you superreview?
Blocks: 77969
Blocks: 38014
Blocks: 61235
Blocks: 100691
Comment on attachment 50024 [details] [diff] [review] patch v0.9 some nits: nsDirectoryIndexStream.h: 1- how about using for mArray? nsDirectoryIndexStream.cpp: 2- to bad we don't have a nsVoidArray::InsertSorted method... glib does. 3- bug 56354 has been fixed.. so nsIFile::GetURL should be ok, right? 4- fortunately RemoveElementAt doesn't do much when the element is removed from the end of the array, but if it did, your algorithm in ~nsDirectoryIndexStream might want to change. 5- can the #if 0'd code be removed? nsNetModule.cpp: 6- why not "Directory Index" instead of "nsDirIndex"? nsDirIndex.h: 7- wouldn't nsXPIDL{C}String's be a better choice for storing those strings? aren't ns{C}String's large in size by comparison? nsIndexedToHTML.cpp: 8- please keep a consistent opening brace style for function definitions. streamconv/public/makefile.win 9- check your indentation nsIDirIndex.idl: 10- how about moving the CID definition into nsNetCID.h? nsIDirIndexListener.idl: 11- why include nsIRequestObserver.h? 12- please use a consistent opening brace style. 13- how about including nsIStreamListener.idl at the top of the file. 14- how about moving the CID and ContractID into nsNetCID.h? nsDirectoryViewer.h 15- looks like the member variables could be lined up better. how about a new patch with these things + dougt's comments resolved?
by the way, thank you very much for cleaning up this code... it really has needed some good lovin ;-)
A lot of this code was just moved arround. That said: > 1- how about using for mArray? you're missing a word there, I think. I can't use an ISupportsArray because I can't sort one of those. > 3- bug 56354 has been fixed.. so nsIFile::GetURL should be ok, right? Looks that way. Fixed. > 4- fortunately RemoveElementAt doesn't do much when the element > is removed from the end of the array, but if it did, your algorithm in > ~nsDirectoryIndexStream might want to change. I'm not sure what you mean by this. I remove from the end on purpose, so that we don't constantly have to move every element in the array. > 5 #if 0 code I forgot about that. I was thinking only doing so on unix, or something, to match the default mac + windows behaviour. This was checked in as part of bug 22244. I have trouble understanding how this could have fixed that bug, though. Changed to #ifndef XP_UNIX in my build, which is probably the correct thing to do anyway. 6: Fixed - I was just copying the ones above 7: fixed 8: It was inconsistent to start with, but fixed 9: fixed10: None of the streamconverters do 11: Bogus left over code. Removed 12. fixed 13. fixed 14. See 10. 15. Fixed. New patch coming, this time diffed against cvs-mirror
1- sorry... i meant: should mArray be of type nsAutoVoidArray? 4- i should have been more precise: imagine if RemoveElementAt were to shrink the allocated array size every so often. 10,14- no problem... maybe someday we'll go and make the shift to nsNetCID.h
1 - no, there will usually be > 4 (or 8, can't remember) entries. 4 - Its making it smaller, so I hope that the overhead is minimal. I could iterate through without deleting, and call NS_RELEASE, then just free the array. In fact, I will. New patch will arrive after my tree rebuilds all the license foo from yesterday.
Attached patch new patch (obsolete) (deleted) — Splinter Review
jbetak, please help to test it.
I'm going to attach another patch. Only differences are added REQUIRES lines for uconv. For testing instructions, see news://news.mozilla.org/slrn9qim6d.p41.bbaetz@banana.home
Attached patch updated for cvs conflicts/build system changes (obsolete) (deleted) — Splinter Review
Attachment #32728 - Attachment is obsolete: true
Attachment #32728 - Flags: needs-work+
Attachment #33251 - Attachment is obsolete: true
Attachment #33941 - Attachment is obsolete: true
Attachment #34585 - Attachment is obsolete: true
Attachment #50024 - Attachment is obsolete: true
Attachment #50202 - Attachment is obsolete: true
Attachment #51277 - Attachment description: updated for cvs conflicts/buils system changes → updated for cvs conflicts/build system changes
Comment on attachment 51277 [details] [diff] [review] updated for cvs conflicts/build system changes r=darin
Attachment #51277 - Flags: review+
Attachment #51277 - Attachment is obsolete: true
Comment on attachment 51334 [details] [diff] [review] ...and update for license changes, too! this just resolves cvs conflicts with the license landing - still has r=darin
Attachment #51334 - Flags: review+
Attachment #51334 - Flags: superreview+
I really want to land this before the tree closes on Tuesday, so I'd appreciate the i18n feedback before then. If there are problems with file:/// urls and the html viewer, then I will check this in with xul remaining the default for file until those can be resolved, assuming that the xul viewer still ends up working correctly.
Bradley, I'm using a 094 commercial Netscape build. While looking at an internal ftp site (ftp://kaze/pub), the directory and file names appear either escaped or garbled (treated as Latin-1 and rendered using incorrect fonts) in both HTML and XUL view. I'll doublecheck with i18n QA as this seems to be broken w/o the patch already. I'll apply the patch next, repeat the test and look for any adverse effects.
jbetak: ftp will definately be broken, and theres a bug on that - I need file tested, mainly.
thanks for the clarification - I'll test a local directory next.
I went through the test scenarion w/o the patch and local directory listings work great. However, I was not able to switch from XUL to hmtl via the pref setting - am I missing something? Also, you wouldn't happen to have an non-debug version of necko.dll, would you? It might take me a while to compile an optimized build plus the patch seems to require some manual interaction (half a dozen of hunks fail on recent branch/trunk trees).
You can't change it on the trunk - thats (part of) what this patch does. The latest patch is against a current trunk tree. Note that you have to apply it _without_ the usual posix environment variable, since it creates files. Theres a small possiblity of a makefile conflict with requires stuff, but cvs merged those into my tree. They should just be one liners, and easy to resolve. I only have a linux build environment, unfortunately, otherwise I'd make a test build available.
alecf checked in a change this afternoon which broke this. To fix, just add necko2 to the list of REQUIRES in xpfe/components/build/makefile.win after applying. (and Makefile.in for unix builds)
Bradley, my apologies for all the delays - I finally got an optimized build with your patch running on Japanese W2K. Seems like your patch breaks the HTML view for non-Latin-1 file and directory names. The functionality is still preserved, however file and folder names appear in the wrong encoding, which renders them unreadable. File and directory hierarchies can still be navigated, but the usability is severely impacted - the extent of the problem resembles what I have seen for ftp the other day. Hope this helps a bit - perhaps you could still get the XUL view checked in before the 095 freeze? I and I'm sure the rest of the i18n team would be happy to help to address the remaining issues with the patch, although it might not happen in the M095 time frame.
OK, great. What I'll do is check this in, but disable the html stuff for file urls, and file a new bug. If it ends up being a simple fix, then I'll still try to get it in for 0.9.5. jbetak: one more test, and if it doesn't work you don't need to attach a screenshot. In netwerk/base/src/nsDirectoryIndexStream.cpp arround line 76 with my patch applied is a #define for THREADSAFE_I18N. What happens if you uncomment that line and then test in the html mode? If that works, then I know what the problem is, and can probably work arround the other problem. Also, what happens if you change the charset encoding via the View menu? I am surprised that it worked if it didn't display the names correctly, though, but I won't complain :) Thank you very much for doing this testing for me.
Mass change: Bug 78148 has been checked in, and so it and these dependancies should have been fixed. If you disagree, please reopen. Note that the xul view is still enabled for file because of i18n problems. The bug to fix this is bug 102812.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: