126782 - [FIX]Binary file with unknown type displayed as text/plain rather than saved

Reporter

Description

•

23 years ago

Going to only this particular URL, Mozilla will load the file in the browser window, so it never makes it to my hard disk, and there is no workaround. I finally pasted the URL into IE to download the file. All the other files on this page downloaded as expected.

gavin long

Comment 1

•

23 years ago

Win98SE, 2002022203, this file loads into the window for me too. Reporter: You can then save the file with File->Save page as... Observing the output from wget, there doesn't seem to be any indication in the server headers of what this file actually is (i.e. no mime type). Given that the extension ".w02" is hardly well-known, and given that making assumptions based on the extension is A Bad Thing, what _should_ moz do with it?

Severity: major → normal

shwag

Reporter

Comment 2

•

23 years ago

I thought the File-->Save As might mess up the contents of the file, regarding text to binary conversion. If you view the directory on the site that the file is stored in, you will see that there are files named .W01 .W02 .W03, which are compressed files. All of the other files load properly. There are even many other .W02 files which download fine! It is just that one link that loads wrong. Weird, huh?

gavin long

Comment 3

•

23 years ago

> I thought the File-->Save As might mess up the contents of the file, > regarding text to binary conversion. I've done it many times with several formats (notably .asf, .wmv, both binary formats) without a hitch. However, you're correct. Other files with the same extension elsewhere on the site immediately pop up a "save file" dialog, but this one loads straight to the browser window. The page info dialog shows that moz thinks the file is text/plain. So, a question for the developers: how does moz decide what to do with these files? And how come it's doing different things with similar files from the same site?

Assignee: bbaetz → law

Status: UNCONFIRMED → NEW

Component: Networking: FTP → File Handling

Ever confirmed: true

QA Contact: benc → sairuh

Summary: MIME bug maybe ? → Binary file with unknown type displayed as text/plain rather than saved

Bradley Baetz (:bbaetz)

Comment 4

•

23 years ago

ftp doesn't have content type, so we guess. bz, is the mime service getting this wrong here?

Boris Zbarsky [:bzbarsky]

Assignee

Comment 5

•

23 years ago

This is sort of funny, actually... The mime service says nothing about this file (since it has no useful extension), so it gets passed on to the unknown content decoder. The way the unknown content decoder tells text/plain apart from application/octet-stream is by looking for null bytes. The first null byte in this file is the 1168th byte. The unknown content decoder only looks at the first 1024 bytes of the file (since 99% of the time that's enough to determine what needs to be determined). In fact we're considering decreasing that 1024 to something like 512 or 256 so it won't be so eager to decide things are HTML... Over to rpotts... I'm not sure what a good solution is here, exactly. No matter what we do, unless we sniff the entire file there is no way to tell whether it's text or binary data (one can always come up with a more pathological case). Maybe we should special-case FTP somehow or something?
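A minimal sketch of the null-byte check described in this comment, assuming nothing beyond what the comment states (a fixed 1024-byte window, and "contains a null" meaning binary). The function name and the sample data are hypothetical and are not taken from nsUnknownDecoder.cpp.

```python
SNIFF_LIMIT = 1024  # window size quoted in the comment above


def guess_type_by_nulls(data: bytes, limit: int = SNIFF_LIMIT) -> str:
    """Guess a MIME type by looking for a null byte in the first `limit` bytes."""
    window = data[:limit]
    if b"\x00" in window:
        return "application/octet-stream"
    return "text/plain"


# Hypothetical stand-in for the file in this bug: the first null byte
# is the 1168th byte (index 1167), which lies outside the 1024-byte window.
sample = b"A" * 1167 + b"\x00" + b"B" * 100

print(guess_type_by_nulls(sample))        # null is past the window
print(guess_type_by_nulls(sample, 2048))  # null is inside the larger window
```

With the default window the sample is misjudged as text, which is the behaviour reported here; any fixed window can be defeated the same way by a file whose first null byte falls just past it.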

Assignee: law → rpotts

Component: File Handling → Networking

gavin long

Comment 6

•

23 years ago

> The unknown content decoder only looks at the first 1024 bytes of the file > (since 99% of the time that's enough to determine what needs to be determined) I'm going to be nitpicky here, and say that unless I've completely forgotten what I was taught about statistics, looking at the first 1024 bytes is only going to work about 98% of the time, or 49 times out of every 50. Which isn't actually very certain. Cutting down to the first 256 bytes will cut that to 63%, less than 2 times out of three. Not good. I'm assuming a totally random distribution of the individual bytes in the range 0-255 for the purposes of this calculation. This isn't always going to be the case of course, and if a file format has a bias AGAINST null characters, things are going to get worse.
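The figures in this comment can be reproduced with a short calculation. This is only a sketch under the comment's own assumption (every byte is uniformly random over 0-255 and independent of the others); the function name is hypothetical.

```python
def null_hit_probability(buffer_size: int) -> float:
    """Chance that at least one of `buffer_size` uniformly random bytes is a null."""
    # Each byte is non-null with probability 255/256, so the whole
    # buffer is null-free with probability (255/256) ** buffer_size.
    return 1.0 - (255.0 / 256.0) ** buffer_size


for size in (256, 512, 1024):
    print(f"{size:5d} bytes: {null_hit_probability(size):.1%}")
```

Real binary formats are not uniformly random, so these numbers are a rough guide rather than a measurement.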

Boris Zbarsky [:bzbarsky]

Assignee

Comment 7

•

23 years ago

suggest a better approach, given that we have to make the decision before we have all the data and the decision is irreversible...

gavin long

Comment 8

•

23 years ago

I ain't got one. I agree, there isn't much that can realistically be done in these circumstances, since we're essentially blindfolded in a dark room and someone's stolen our torch batteries.

rpotts (gone)

Comment 9

•

23 years ago

I think that the bottom line is that ALL the unknown decoder can do is *guess*!! By the time we get to the unknown decoder, we've exhausted ALL other (more accurate) options for determining the content-type of the data... So, all we have left is a collection of heuristics that we use to 'guess' the content-type... Sometimes we guess wrong :-( Is there some way that we can modify these heuristics to guess better? Currently, I believe that our least reliable heuristic is that for detecting 'text/plain'... Initially, I chose to *only* key off of embedded NULLs because various character set encodings use the 8th bit... Maybe, this isn't an issue?? Since we have NO character encoding information available, we can't deal with these characters very well anyways (all we can use is the 'default' encoding)... So, maybe we should modify the code to disallow *anything* in the 8th bit... The argument for doing so is that it would limit false positive 'text/plain' hits. It may very well reject streams that 'could' be rendered as text/plain using the default character encoding... I guess the question is which is more desirable: 1. occasionally rendering binary data in a window... or 2. occasionally bringing up the 'Save As' dialog box for text files... Once we decide which is the desired behavior, we can fine tune our heuristics... -- rick

gavin long

Comment 10

•

23 years ago

I _may_ be able to help come up with something better, but I'm gonna need to clarify a few things first: 1) what groups are we categorising files into? From the comments above, we're after at least text/html, text/plain and [everything else] - any more? If it's a fairly short list, then we can see about knocking up a list of conditions for each of them. 2) presumably we have to worry about every language/alphabet under the sun, which is where the 8-bit stuff comes from. I can only claim to know anything about languages that use the latin alphabet, so to be really thorough we'll need some input from the i18n guys. > I guess the question is which is more desirable: > 1. occasionally rendering binary data in a window, or > 2. occasionally bringing up the 'Save As' dialog box for text files... Personally, I'd prefer 2. But then I'm not a typical user. Can we get anyone to go out into the world and mercilessly interrogate a couple of thousand typical users? :) Seriously, users without any technical knowledge are just going to run away screaming when they see "garbage" in the browser window, and many slightly-technically-savvy users "know" that opening a binary file in a text viewer, then saving it, is a quick way to break the binary file, and won't bother trying. In many cases, they're right. Moz is unusual here. At least if we offer to save to disc, the user can save it with a .txt extension (or whatever) and open it in their favourite text editor. Random thought: at the point where this code gets invoked, _we_ have *absolutely no idea* what the incoming file is. How likely is it that the user is as clueless as we are? Assuming that there's a short(ish) list of file categories to worry about, a few ideas: - text/plain. What about whitespace? How many text files are going to have no whitespace (space/tab/cr/lf) characters _at all_ in the first 256 bytes, let alone the first 1024? 
If it's got less than about one whitespace character per 60 bytes in the first 1024 bytes, it almost certainly isn't plain text (probably not HTML, either). That'll stand for just about every latin-alphabet language, I think. If it isn't a human language (e.g. base 64 encoded, or whatever), then the user is probably going to want to save it anyway, since mozilla's not going to be able to do much useful with it. Course, if it's an ASCII-art kitchen sink, we're in trouble :-D. - text/html. It's gonna have tags in it, surely? Can't we go looking for "<html", "<head", "<body", or even <...>...<...>...<...> patterns? On the other hand, how often does this code actually get handed an HTML file? To get here, it's got to be coming in without any content headers (which I believe means it's probably not coming via HTTP[S]?), and it's not got any kind of recognised HTML file name extension. It'd be really nice if we could get some kind of data on what files actually hit this code. Not likely, I know, but it would be really nice. > Initially, I chose to *only* key off of embedded NULLs because various > character set encoding use the 8th bit... Why only nulls? What about the other control characters, ascii 01-31? OK, there will be CR, LF, and TAB floating around, but what about some of the others? 05 (enquiry), 06 (Acknowledge), 07 (bell), and several others I don't even know the purpose of, are going to be frightfully rare in text files, aren't they? OK, enough wibbling from me.
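The whitespace idea in this comment could be sketched roughly as follows. The threshold (one whitespace byte per 60 bytes, measured over the first 1024 bytes) comes from the comment; the function name, the parameter names, and the decision to treat an empty buffer as "not text" are assumptions made for illustration.

```python
WHITESPACE = b" \t\r\n"  # space, tab, CR, LF, as listed in the comment


def has_enough_whitespace(data: bytes, limit: int = 1024,
                          bytes_per_ws: int = 60) -> bool:
    """True if the window averages at least one whitespace byte per `bytes_per_ws` bytes."""
    window = data[:limit]
    if not window:
        return False  # assumption: an empty buffer gives no evidence of text
    ws_count = sum(1 for byte in window if byte in WHITESPACE)
    return ws_count * bytes_per_ws >= len(window)


prose = b"The quick brown fox jumps over the lazy dog. " * 20
blob = b"QUJDREVGR0hJSktMTU5PUA" * 50  # base64-like, no whitespace at all

print(has_enough_whitespace(prose))
print(has_enough_whitespace(blob))
```

As the comment itself concedes, dense ASCII art or very long unbroken lines would fail a check like this even though they are text.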

Boris Zbarsky [:bzbarsky]

Assignee

Comment 11

•

23 years ago

> 1) what groups are we categorising files into? At the moment we detect: application/pdf, application/postscript, text/html, all the image types Mozilla supports, text/plain, application/octet-stream > Can't we go looking for "<html", "<head", "<body", or even > <...>...<...>...<...> patterns? We do. http://lxr.mozilla.org/seamonkey/source/netwerk/streamconv/converters/nsUnknownDecoder.cpp#333 > On the other hand, how often does this code actually get handed an HTML file? A lot. 90% of the ad servers out there don't send any content-type. More to the point, every single ebay URL goes through this code (ebay seems to feel it's above sending content-type headers). I think I agree that I'd rather err on the side of letting the user save than on the side of showing in browser. Especially if we ever get a "view as text" option hooked up for the helper app dialog. :)

sairuh (rarely reading bugmail)

Updated

•

23 years ago

QA Contact: sairuh → benc

gavin long

Comment 12

•

23 years ago

>> 1) what groups are we categorising files into? > > At the moment we detect: application/pdf, application/postscript, text/html, > all the image types Mozilla supports, text/plain, application/octet-stream OK. Most of those have headers that are being explicitly sniffed, which makes life easier. From personal experience, I'd say it's probably worth adding .asf (http://www.microsoft.com/windows/windowsmedia/WM7/format/asfspec11300e.asp) and .wmv (which has the same internal format as .asf, according to http://support.microsoft.com/default.aspx?scid=kb;EN-US;q284094). Yes, they're MS-proprietary, but they're out there in substantial numbers, and they're the formats that give me the most grief. The spec linked on the above page appears to be in office 2000 format, so I can't read it, but I'd be surprised if there wasn't a sniffable header in there. I know there's a limit to how many types we can reasonably be expected to sniff, but presumably PDF/PS get in because they're common on the net? What about other things that are common? Can we get data[1] on what file types are out there? [1] data that's more meaningful than me going "i wanna .asf and a .wmv and a .exe and a .zip and a .tar and a ....." >> Can't we go looking for "<html", "<head", "<body", or even >> <...>...<...>...<...> patterns? > > We do. [snip] And a few more I hadn't thought of. Jolly good. Looking at the code, it's basically: 1. [PDF or Postscript headers] -> appropriate types 2. [local file] -> go to step 4 for security reasons 3. [html tags?] -> HTML 4. [known image headers] -> appropriate types 5. [No nulls in it?] -> plain text 6. [everything else] -> octet-stream Apart from quibbles about other explicitly sniffable types, I've little to add beyond the possible improvements to plain text sniffing I listed above. > I think I agree that I'd rather err on the side of letting the user save > than on the side of showing in browser. Any chance we can ping some usability gurus on this?
> Especially if we ever get a "view as text" option hooked up for the helper app > dialog. :) Yeah, that would help. The more I think about it, the more I think that if a file makes it down to step 5, the user probably has a better idea of what it is[2] than we do, so the best solution might be to just ask them. [2] not least because we're completely clueless at this point.
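The six-step order summarised in this comment can be sketched as a single function. This is an illustration of the ordering only, not the logic in nsUnknownDecoder.cpp: the function name, the `is_local_file` flag, the simple substring test for HTML, and the choice of three image signatures (GIF, PNG, JPEG) are all assumptions.

```python
def sniff(data: bytes, is_local_file: bool = False) -> str:
    """Walk the steps listed above, in order, over the first 1024 bytes."""
    head = data[:1024]

    # Step 1: PDF and PostScript headers.
    if head.startswith(b"%PDF-"):
        return "application/pdf"
    if head.startswith(b"%!PS"):
        return "application/postscript"

    # Steps 2 and 3: HTML sniffing is skipped for local files.
    if not is_local_file:
        lowered = head.lower()
        if any(tag in lowered for tag in (b"<html", b"<head", b"<body")):
            return "text/html"

    # Step 4: known image headers (only three shown here).
    if head.startswith(b"GIF8"):
        return "image/gif"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if head.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"

    # Step 5: no nulls means plain text.
    if b"\x00" not in head:
        return "text/plain"

    # Step 6: everything else.
    return "application/octet-stream"


print(sniff(b"<HTML><body>hi</body></HTML>"))
print(sniff(b"<HTML><body>hi</body></HTML>", is_local_file=True))
```

Steps 1 to 4 match positive evidence, while step 5 treats the absence of a null byte as evidence of text; that asymmetry is what the rest of this bug is about.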

Boris Zbarsky [:bzbarsky]

Assignee

Comment 13

•

23 years ago

mpt, what do you think about comment #9?

gavin long

Comment 14

•

23 years ago

*** Bug 129918 has been marked as a duplicate of this bug. ***

gavin long

Comment 15

•

23 years ago

From comment 5, above: > The mime service says nothing about this file (since it has no useful > extension) Hang on a minute. Does this mean that if the file has a recognised file extension, moz should figure out whether the file can be displayed or not? So the unknown decoder only kicks in if the extension isn't recognised? If so, it looks like .asf and .wmv aren't on that list. Adding them to that list would best be filed as a different bug, since this one is rapidly heading in the direction of "what we should do with files in the unknown content decoder", which is a different issue from preventing them hitting the decoder in the first place. If someone can confirm the above, give me a shout and I'll spin off a separate bug for that. [sorry, brain go slow, should have spotted this earlier]

Boris Zbarsky [:bzbarsky]

Assignee

Comment 16

•

23 years ago

> So the unknown decoder only kicks in if the extension isn't recognised? Correct. If nothing else ever uses those extensions then we can just add them to our "extensions we know" list at http://lxr.mozilla.org/seamonkey/source/uriloader/exthandler/nsExternalHelperAppService.cpp#124

gavin long

Comment 17

•

23 years ago

>> Adding [.asf, .wmv] to [the list of known extensions] would best be filed as >> a different bug. > > If nothing else ever uses those extensions then we can just add them to our > "extensions we know" list [...] Well, www.wotsit.org doesn't know any other uses of .asf. And it's not even heard of .wmv or .wma (.wmv's audio cousin). Dunno if that's a good sign or a bad sign :-/ Anyhow, logged as bug 129982.

rpotts (gone)

Comment 18

•

23 years ago

Ok... so it sounds like tightening up our text/plain detection is desirable. Let me summarize what i'm hearing... 1. In addition to NUL, check for other 'low ascii' control characters to reject text/plain. 2. Add a whitespace heuristic... Some amount of <SP> and/or <TAB> should be present (ideally one or more per line :-) ) any other suggestions to sniff out text/plain ?? I suppose we could add explicit detection of base64 encoding to limit the number of text/plain misses because of this encoding too.. -- rick

gavin long

Comment 19

•

23 years ago

That's my best shot for now. The ASF/WMV thing should be covered by bug 129982. The only other thing is the suggestion to switch from: if (known binary) [octet stream] else [plaintext] to: if (known plaintext) [plaintext] else [octet stream] So that the "unknown" cases get saved to disc rather than loaded into the browser window. That's my preferred behaviour and Boris's, too, I believe. But, of course, Boris and I aren't typical users, so PDT and MPT might have different ideas.
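The difference between the two defaults can be shown with a small sketch. Both predicates are hypothetical stand-ins: `known_binary` is the null-byte test discussed earlier in this bug, and `known_plaintext` is one possible positive test (no disallowed control characters, and at least one whitespace byte) assembled from suggestions in the thread.

```python
TEXT_CONTROLS = {9, 10, 11, 12, 13}  # tab, LF, VT, FF, CR


def known_binary(data: bytes) -> bool:
    return b"\x00" in data[:1024]


def known_plaintext(data: bytes) -> bool:
    window = data[:1024]
    has_space = any(byte in (9, 10, 13, 32) for byte in window)
    clean = all(byte >= 32 or byte in TEXT_CONTROLS for byte in window)
    return has_space and clean


def current_policy(data: bytes) -> str:
    # if (known binary) [octet stream] else [plaintext]
    return "application/octet-stream" if known_binary(data) else "text/plain"


def proposed_policy(data: bytes) -> str:
    # if (known plaintext) [plaintext] else [octet stream]
    return "text/plain" if known_plaintext(data) else "application/octet-stream"


ambiguous = b"QUJDREVGR0g" * 50  # no nulls, but no whitespace either
print(current_policy(ambiguous), proposed_policy(ambiguous))
```

Data that matches neither predicate is displayed under the current ordering and offered for saving under the proposed one; data that clearly looks like text is handled identically by both.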

Frederic Bezies

Comment 20

•

23 years ago

According to comment #11 and others, it would be great to add these extensions to Mozilla: .ace -> ACE archive files (http://www.winace.com/) .rar -> RAR archive files (http://www.rarsoft.com/) Is this possible? Not all the archives we can download are .zip :-)

shwag

Reporter

Comment 21

•

23 years ago

Files that are .ISO always end up in my window. I don't know if already discussed solutions will fix this too.

gavin long

Comment 22

•

23 years ago

Frederic, shwag, those issues are probably best covered by logging separate bugs for those extensions (similar to my bug 129982 for windows media), since this bug is covering what happens once moz decides it's got no idea what it's dealing with.

Frederic Bezies

Updated

•

23 years ago

Blocks: 138000

Andrew Hagen

Comment 23

•

23 years ago

Would the following work as a fix for this bug? First, add several known binary file extensions, including ISO, bz2, and others to the mime service. Second, set the unknown content decoder to look at 0.05% of any file it gets for null characters. For a one million byte file, it would look at 50,000 bytes.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 24

•

23 years ago

Removing bogus dependency that was added by a non-driver.

No longer blocks: 138000

Boris Zbarsky [:bzbarsky]

Assignee

Comment 25

•

23 years ago

In response to comment #23 -- yes, that could be doable.... Rick, what do you think? We probably want to do PR_MAX(512, something*datasize) (otherwise for a small file we'd only look at a few chars)... I'm assuming you meant 5%, not 0.05%, since 0.05% of 10^6 is 500, not 50000.... I think 5% is a little big. That would be on the order of 500000 bytes (that would need to be allocated in memory!) for downloading Mozilla, and would be on the order of 20-30 megabytes (that would need to be allocated in memory) for ISO images.... But the general approach could certainly be tried; I'd like to see whether that approach has any more success with the various file types listed in this bug. Perhaps something more like: PR_MIN(PR_MAX(512, something*datasize), 20000) would be a thought? That way ridiculously huge files are capped....
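The clamped expression PR_MIN(PR_MAX(512, something*datasize), 20000) can be tried out with a short sketch. The floor of 512 and the cap of 20000 come from the comment; the "something" factor is left open there, so the divisor of 2000 used below (that is, 0.05% of the file) is purely an assumed value, and the function name is hypothetical.

```python
def sniff_buffer_size(content_length: int, divisor: int = 2000,
                      floor: int = 512, cap: int = 20000) -> int:
    """Clamp a size-proportional sniff buffer between `floor` and `cap` bytes."""
    proportional = content_length // divisor  # assumed factor: 1/2000 = 0.05%
    return min(max(floor, proportional), cap)


for length in (10_000, 1_000_000, 10_000_000, 650_000_000):
    print(f"{length:>11,d} bytes -> sniff {sniff_buffer_size(length):,d}")
```

Integer division is used so the result does not depend on floating-point rounding. Small files fall back to the floor, and ISO-sized files hit the cap instead of requiring megabytes of buffer.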

Bradley Baetz (:bbaetz)

Comment 26

•

23 years ago

That would only be valid for ftp, or the unknown content type. We have to trust the server; if it lies, it's a server issue, and not our problem.

Andrew Hagen

Comment 27

•

23 years ago

That sounds good. Once implemented, we could fine tune it, if necessary.

rpotts (gone)

Comment 28

•

23 years ago

Having a variable length buffer based on the content-length (that is clamped as Boris suggests) sounds fine to me. However, this is exactly the opposite of what bug #119942 is all about :-) It suggests that a *smaller* buffer be used ;-) Let's decide on a strategy... and mark bug #119942 as either a dup of this bug... or invalid... -- rick

gavin long

Comment 29

•

23 years ago

Buffer size: Firstly, let's keep things sane for those on slow connections. In Europe, most people are still on dial-up. If they're downloading things from a slow server, on the other side of the world, even 1024 bytes can take a few seconds. A 20,000 byte buffer could mean clicking the "save this link" option, then waiting *15-20 seconds* for the filename dialog to come up. Even from a fast server, with a fast modem, they're gonna be waiting 4-5 seconds with no sign that their click did anything. That's too long. It'll confuse users, and make them think moz is glacially slow at downloading. Ideally, we could use something like the "getting file information" intermediate dialog IE6/Win has, but that's probably gonna be loads of work, and best covered by another bug. Conversely, the buffer's got to be big enough so that, statistically, it is going to correctly figure out binary/text _most_ of the time by whatever method is being used. Obviously, 100% would be good, but that ain't gonna happen. The present method is good for about 98% with a 1024 byte buffer, but a 256 byte buffer will cut that to under 70%, which is terrible. If we improve the detection method, as discussed above, we can probably get better detection, with a smaller buffer than is currently being used, especially if we can catch some of the common culprits via other methods (e.g. windows media, bug 129982). So, summary of what I think needs doing: 1. Improve plain text detection heuristics as discussed here. 2. Consider adding other sniffable headers to those checked. 3. Amend default to [save] rather than [display] (i.e. if we can't figure it out, treat it as binary, not as text). 4. Reconsider buffer size given improved heuristics.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 30

•

23 years ago

> A 20,000 byte buffer could mean clicking the "save this link" option This code is never called for that option. The _only_ time this code is called is when you actually load a url (click on a link, type in URL bar, submit form, etc). Any "save link", "mail link", etc. options do not use it.

gavin long

Comment 31

•

23 years ago

>> A 20,000 byte buffer could mean clicking the "save this link" option > > This code is never called for that option. Doh! Of course, at that point, they're ASKING to save it, aren't they? So much for that objection. [mental note to self: WAKE UP!] If the heuristics are improved, however, would we really _need_ a bigger buffer? If we remove a couple of the worst-offending filetypes by checking for headers and/or extensions, add a whitespace check, and add a check for half-a-dozen different ascii 0-31 characters, we could get our accuracy better than 99.99%, all with a 1024 character buffer. We could probably even get better than 99.7% with only the 256-character buffer proposed in bug 119942 - which is to say, a quarter of the error rate of the current system with a 1024-byte buffer. I think that extending the null check to cover other characters may be the best single improvement, if we can do so. Even extending it to check for 2/3 characters, rather than just the one, out of the 8-bit ascii range, will make a huge difference to our accuracy. *all my statistics are assuming random distribution of characters, yada, yada.
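The effect of rejecting more than one byte value can be estimated the same way as the single-null case. This sketch keeps the comment's assumption of uniformly random, independent bytes; the function name is hypothetical.

```python
def miss_probability(buffer_size: int, forbidden_values: int) -> float:
    """Chance that random binary data shows none of the forbidden byte values."""
    allowed_fraction = (256 - forbidden_values) / 256
    return allowed_fraction ** buffer_size


for size in (256, 1024):
    for forbidden in (1, 3, 6):
        rate = 1.0 - miss_probability(size, forbidden)
        print(f"{size:5d}-byte buffer, {forbidden} forbidden values: "
              f"{rate:.4%} detected")
```

Under this model each extra forbidden value multiplies the miss rate down quickly, which supports the comment's point that widening the character check matters more than enlarging the buffer.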

Frederic Bezies

Comment 32

•

23 years ago

I don't know if it is related, but every file with an unknown extension (from groups.yahoo.com) is saved as an .exe file (in the 2002043010 nightly trunk build). Strange?!

Boris Zbarsky [:bzbarsky]

Assignee

Comment 33

•

23 years ago

Totally unrelated bug (bug 120327)

Andy Lyttle

Comment 34

•

23 years ago

Of ASCII 0-31, which characters are valid in text/plain files? 9 = \t (tab) 10 = \n (linefeed aka newline) 12 = \f (formfeed, is this actually used in text files?) 13 = \r (carriage return) Did I miss any? A file should only be considered text if there are no characters in the 0-31 range other than these. IIRC 127 isn't printable either, so should also identify a binary file. So, we should check for 0-8,11,14-31,127 (adjust as needed) and only if none of those characters are present, AND there are spaces or \t or \n or \r scattered appropriately, then it's text, otherwise it's binary. Right? Re: Comment #18, rpotts: are you saying base64-encoded files *should* be displayed as text? Why? Seems to me that displaying them as text is useless; I can't read base64, but if I save the file I can extract it with StuffIt Expander or whatever.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 35

•

23 years ago

> 12 = \f (formfeed, is this actually used in text files?) It sure is. Newsgroup posts, for example. You forgot 11 -- Vertical Tab (\v) Comment 18 meant that we will currently detect base64 as plaintext (since it's 7-bit-clean printable ascii). We should therefore attempt to detect it as non-text/plain, for best results. :) Any idea what the magic numbers that identify a base64-encoded file are?
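Putting comment 34 and the correction above together, the proposed rule could be sketched like this. It follows the thread's wording (allow 9-13, reject every other value in 0-31 and also 127, and require some ordinary whitespace); the function name and the 1024-byte window are assumptions, and this is not the code that was eventually checked in.

```python
ALLOWED_CONTROLS = {9, 10, 11, 12, 13}  # \t, \n, \v, \f, \r
WHITESPACE = {9, 10, 13, 32}            # \t, \n, \r, space


def is_plain_text(data: bytes, limit: int = 1024) -> bool:
    """Apply the control-character rule plus the whitespace requirement."""
    window = data[:limit]
    saw_whitespace = False
    for byte in window:
        if byte in WHITESPACE:
            saw_whitespace = True
        elif byte in ALLOWED_CONTROLS:
            continue  # vertical tab and formfeed are fine but prove nothing
        elif byte < 32 or byte == 127:
            return False  # control character that text should not contain
    return saw_whitespace


print(is_plain_text(b"page one\x0cpage two\n"))  # formfeed is allowed
print(is_plain_text(b"hello\x07world\n"))        # bell is not
```

A whitespace-free base64 body fails the final check, which ties in with the point about base64 raised in this comment.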

Cameron Simpson

Comment 36

•

23 years ago

Let me add a voice for user control. Specifically: Let the user specify a preferred handler for an unknown type. (BTW, _is_ there a MIME type for "unknown"?) Once loaded (or loading), let the user hand the URL to a specific handler. For example, bring the URL in as app/octet-stream (my conservative preference) and in the Save dialogue, offer a "recast to type and handler" option. Also, can the .ext -> mime/type mapping be exposed and manually extensible? To be used only in the guess-this-type code of course, since the server's MIME claims should be respected. I'd also like to second the vote for Gavin Long's comment #19, to change: if (known binary) [octet stream] else [plaintext] to: if (known plaintext) [plaintext] else [octet stream] It seems much safer and saner to me.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 37

•

23 years ago

> BTW, _is_ there a MIME type for "unknown" application/octet-stream is it. The definition is "unknown data of some sort". The rest of what you suggest is already covered in 3 or 4 different RFEs. The extension to type mapping is extensible through helper app preferences already.

Andrew Hagen

Comment 38

•

23 years ago

*** Bug 119942 has been marked as a duplicate of this bug. ***

Andrew Hagen

Comment 39

•

23 years ago

Proposed relnote: Mozilla will sometimes not detect that an opened file is binary, and will attempt to display it as a web page. To download such a file, right-click on the link and select "Save Link Target As."

Keywords: relnote

Fabian Lau

Comment 40

•

23 years ago

Can't you also take the filesize into account? I mean, if a file is larger than 1 or 2 MB, I'm pretty sure users want to save that file (or open it with another application) rather than read it in the browser window. And I doubt there are that many large textfiles around...

Boris Zbarsky [:bzbarsky]

Assignee

Comment 41

•

23 years ago

We could, but large logfiles or message archives are actually very common.. Easily multi-megabyte.

Markus Hübner

Updated

•

23 years ago

Keywords: mozilla1.0

Andrew Hagen

Updated

•

22 years ago

Keywords: mozilla1.1

Peter Lubczynski

Comment 42

•

22 years ago

Plugins also have this problem on Win32, for example: http://slip.mcom.com/shrir/edittext4.swf Should we not be looking at the extensions? Nominating nsbeta1.

Keywords: nsbeta1

benc

Comment 43

•

22 years ago

-> ftp (may end up in File Handling) peter: In FTP, yes. For the example you give, what does that extension map to?

Component: Networking → Networking: FTP

Peter Lubczynski

Comment 44

•

22 years ago

My testcase works in FTP mode. It does not work in HTTP. That extension is only mapped to a mime type in plugin code. Calling |nsIPluginHost::IsPluginEnabledForExtension| will check for a mapping.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 45

•

22 years ago

For HTTP, if the server tells us it's text/plain then we should not be looking at extension.

benc

Comment 46

•

22 years ago

okay ->file handling, if I'm reading this correctly.

Component: Networking: FTP → File Handling

QA Contact: benc → sairuh

Matthias Versen [:Matti]

Comment 47

•

22 years ago

*** Bug 152203 has been marked as a duplicate of this bug. ***

Andrew Schultz

Comment 48

•

22 years ago

*** Bug 156020 has been marked as a duplicate of this bug. ***

Andrew Schultz

Comment 49

•

22 years ago

this occurs on Linux as well OS=>All

OS: Windows XP → All

Spider

Comment 50

•

22 years ago

according to bug #156020 this is true on Mac (OS X and 9 ) as well. ( http ) Now, referring to that bug as well, this happens on the .gz format as well, and that format is well recognized with a .gz file ending, And has a very applyable header. Though this seems to barf quite hard with things like this as well: spider@Darkmere spider $ wget http://www.mitzpettel.com/download/IcyJuice0.9d2.dmg.gz --18:02:37-- http://www.mitzpettel.com/download/IcyJuice0.9d2.dmg.gz => `IcyJuice0.9d2.dmg.gz' Resolving www.mitzpettel.com... done. Connecting to www.mitzpettel.com[161.58.237.23]:80... connected. HTTP request sent, awaiting response... 200 OK Length: 606,600 [text/plain] The text/plain would suggest a malconfigured (unconfigured?) http server, but how come it gets attached as text/plain with mozilla? why do we trust the server in this case?

Boris Zbarsky [:bzbarsky]

Assignee

Comment 51

•

22 years ago

We trust the server because that's what the HTTP specification says we MUST do. Let's keep this bug focused on the issue at hand, please...

Simon Fraser [no longer active]

Comment 52

•

22 years ago

I see this in Chimera too, so it hits embedding apps as well. Yet another testcase: <http://ftp.mozilla.org/pub/chimera/nightly/2002-07-22-05/Chimera.dmg.gz>

Hardware: PC → All

Matthias Versen [:Matti]

Comment 53

•

22 years ago

Simon: Mozilla/Chimera use the HTTP protocol for this URL and the server sends: text/plain....

sairuh (rarely reading bugmail)

Updated

•

22 years ago

Blocks: 150046

Steve Dagley

Updated

•

22 years ago

No longer blocks: 150046

sairuh (rarely reading bugmail)

Updated

•

22 years ago

Blocks: 127253

jlarsen

Comment 54

•

22 years ago

This bug hasn't been touched in months? It's marked mozilla1.0? Anyone care to make a patch for assuming save as and providing view as text in the save as options?

Boris Zbarsky [:bzbarsky]

Assignee

Comment 55

•

22 years ago

> make a patch for assuming save as What does that have to do with this bug? > providing view as text This part is a large piece of work... (trust me, I've tried two or three times). Is there a comment after comment 18 that actually has a useful suggestion other than the banter about buffer sizes?

Daniel Hyde

Comment 56

•

22 years ago

>> I think I agree that I'd rather err on the side of letting the user save >> than on the side of showing in browser. >Any chance we can ping some usability gurus on this? I wouldn't claim to be a usability guru, but I'm certainly a user. Why not simply add an option to force-save the file in raw format, regardless of the mime type sent, to the "save as type" menu. That way, if mozilla incorrectly identifies a binary file as text, or the server erroneously sends a text mime type for a binary file (like with those RAR archives), the user has some control over how the data is saved -- if they know the file is binary, they have a means of safely saving it as binary data that doesn't involve pasting the address into IE. The same could be added in reverse: on the off-chance that mozilla, for whatever reason, interprets a text file as binary data, the user can force-save as text if s/he so desires.

Christian :Biesinger (don't email me, ping me on IRC)

Comment 57

•

22 years ago

>Why not simply add an option to force-save the file in raw format imho, saving a file should ALWAYS save it in raw format (unless "web page complete" is chosen, of course)

Boris Zbarsky [:bzbarsky]

Assignee

Comment 58

•

22 years ago

In case you all missed it, saving in Mozilla _is_ in raw format. we don't even do newline conversion (though we should, imo, in some cases).

sairuh (rarely reading bugmail)

Updated

•

22 years ago

QA Contact: sairuh → petersen

shwag

Reporter

Comment 59

•

22 years ago

Here is another file that does the same ol' thing we've all seen for months. http://205.122.23.229/peng/linusq-a.ogg

Boris Zbarsky [:bzbarsky]

Assignee

Comment 60

•

22 years ago

Bad example -- that one the server claims to be text/plain. Fix the buggy server, please.

shwag

Reporter

Comment 61

•

22 years ago

It's not my server to fix, and since there are other servers out there that are also likely misconfigured, it would be foolish to say that it is not worth looking at a way to have Mozilla detect files by extension. Workaround: open the URL up in IE.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 62

•

22 years ago

No, you do not understand. Doing what you suggest would be a gross and blatant violation of the spec that _no_ browser other than IE does (I've tested Mozilla, Opera, Konqueror, Netscape 4, Mosaic, lynx, links, w3m). We _can_ detect these files by extension or even data sniffing. However we will _not_ be doing it. Please stop spamming this bug with rehashes of discussions that have happened in the newsgroups many times over.

Cameron Simpson

Comment 63

•

22 years ago

Workaround is to save it with File->Save or Ctrl-S in the window.

Adam Hauner

Updated

•

22 years ago

Keywords: mozilla1.0, mozilla1.1

Paul Wyskoczka

Comment 64

•

22 years ago

adt: nsbeta1-

Keywords: nsbeta1 → nsbeta1-

Jon Henry

Comment 65

•

21 years ago

*** Bug 210973 has been marked as a duplicate of this bug. ***

Boris Zbarsky [:bzbarsky]

Assignee

Comment 66

•

21 years ago

OK, taking. We've talked a lot, and lots of good ideas here, and I'm going to implement the simplest one -- filtering out known-not-text chars.

Assignee: rpotts → bz-vacation

Boris Zbarsky [:bzbarsky]

Assignee

Comment 67

•

21 years ago

Attached patch Proposed patch (deleted) — Details — Splinter Review

For the curious, with this patch we detect the file in the URL field as binary three bytes in.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 68

•

21 years ago

Comment on attachment 135573 [details] [diff] [review] Proposed patch Er, ignore that first hunk; I've not updated this tree to tip in a few days... ;) IS_TEXT_CHAR treats 127 and 8-bit chars as text for now, because various codepages may use them (though they probably should not be using 127, I can't guarantee that they are not). Thoughts?
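The attachment itself is no longer available, so the following is only a guess at the shape of the check, built from what the comments say: values 0-31 other than the 9-13 range count as "known-not-text", while 127 and 8-bit values are still accepted. The function name and the 1024-byte window are assumptions, and the real IS_TEXT_CHAR macro may carve out further exceptions.

```python
from typing import Optional


def first_non_text_offset(data: bytes, limit: int = 1024) -> Optional[int]:
    """Return the offset of the first known-not-text byte, or None if there is none."""
    for offset, byte in enumerate(data[:limit]):
        # 127 and anything with the 8th bit set are deliberately let through.
        if byte < 32 and not 9 <= byte <= 13:
            return offset
    return None


print(first_non_text_offset(b"W2\x01 more binary data"))  # hypothetical header
print(first_non_text_offset(b"caf\xe9 au lait\n"))        # Latin-1 text
```

A check of this kind can reject a binary file within its first few bytes, instead of depending on a null byte turning up somewhere in the window.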

Attachment #135573 - Flags: superreview?(darin)

Attachment #135573 - Flags: review?(darin)

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

21 years ago

Priority: -- → P1

Summary: Binary file with unknown type displayed as text/plain rather than saved → [FIX]Binary file with unknown type displayed as text/plain rather than saved

Target Milestone: --- → mozilla1.6beta

Darin Fisher

Comment 69

•

21 years ago

Comment on attachment 135573 [details] [diff] [review] Proposed patch this is better than nothing. i agree that matching 127 here might be risky. i think this is a good heuristic that should help catch a lot of cases. r+sr=darin

Attachment #135573 - Flags: superreview?(darin)

Attachment #135573 - Flags: superreview+

Attachment #135573 - Flags: review?(darin)

Attachment #135573 - Flags: review+

Boris Zbarsky [:bzbarsky]

Assignee

Comment 70

•

21 years ago

Checked in. The next step is to add sniffers for common formats, per comment 29 (which I think has a good summary of the situation). Please file bugs on those and assign them to me? So far we have base64 on the list, right?

Status: NEW → RESOLVED

Closed: 21 years ago

Resolution: --- → FIXED

Asa Dotzler [:asa]

Updated

•

21 years ago

Keywords: relnote

Nobody; OK to take it and work on it

Updated

•

8 years ago

Product: Core → Core Graveyard