Closed
Bug 119942
Opened 23 years ago
Closed 23 years ago
Unknown decoder should not sniff 1024 bytes of the file
Categories
(Core Graveyard :: File Handling, defect, P4)
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 126782
mozilla1.2alpha
People
(Reporter: bzbarsky, Assigned: bzbarsky)
Details
UnknownDecoder uses a 1024 byte buffer. This leads to recognizing some text
files as HTML (incorrectly).
We should consider using a smaller buffer. 128 or 256 are probably the best
possibilities. Do many web pages have over 128 chars of whitespace at the
beginning?
You have to read past the first newline or 128 bytes to handle Unix #!
interpreter lines (bug 110767). I have seen web pages with >70 blank
lines at the beginning in a simple-minded attempt to hide code. With
CRLF newlines, that's > 128 bytes. That would indicate at least 256
bytes. Maybe the decoder should be smarter.
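The arithmetic behind the 128-versus-256 argument can be checked with a short sketch. The helper names here are hypothetical and are not part of the unknown decoder; the only assumption is that each blank line costs exactly one newline sequence.

```python
def leading_blank_bytes(blank_lines: int, newline: bytes = b"\r\n") -> int:
    """Bytes consumed by a run of empty lines at the start of a file."""
    return blank_lines * len(newline)


def content_visible(blank_lines: int, buffer_size: int,
                    newline: bytes = b"\r\n") -> bool:
    """True if at least one byte of real content falls inside the sniff buffer."""
    return leading_blank_bytes(blank_lines, newline) < buffer_size


# 70 blank CRLF lines use 140 bytes, so a 128-byte sniff sees only whitespace,
# while a 256-byte sniff still reaches the markup that follows.
print(leading_blank_bytes(70))    # 140
print(content_visible(70, 128))   # False
print(content_visible(70, 256))   # True
```

With bare LF newlines the same 70 lines take only 70 bytes, which is why the CRLF case is the one that pushes the estimate past 128.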
Comment 2 • 23 years ago (Assignee)
Ok... Does 256 sound like a reasonable compromise?
Any hints on making the decoder smarter are much appreciated, btw. :)
Priority: -- → P4
Target Milestone: --- → mozilla1.0
Comment 3 • 23 years ago (Assignee)
A thought. Perhaps we should look for "<tagname" as the first non-whitespace
text instead of just anywhere in the 1024/256/whatever bytes?
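A minimal sketch of that idea, assuming "tagname" means an ASCII letter right after the "<" and that only the first `limit` bytes are examined. The function name and the 256 default are illustrative and are not taken from the decoder.

```python
import re

# Optional ASCII whitespace, then "<" immediately followed by a letter.
# In a bytes pattern, \s matches space, tab, CR, LF, form feed and vertical tab.
_LEADING_TAG = re.compile(rb"\s*<[A-Za-z]")


def starts_with_tag(data: bytes, limit: int = 256) -> bool:
    """True if the first non-whitespace text within `limit` bytes opens a tag."""
    return _LEADING_TAG.match(data[:limit]) is not None


print(starts_with_tag(b"\r\n\r\n  <html><body>"))   # True
print(starts_with_tag(b"Use <b> for bold text"))    # False: tag is not first
```

Because the match is anchored at the start of the buffer, a "<b>" that appears in the middle of a plain-text file no longer triggers the HTML guess. The trade-off is that a page with more leading whitespace than `limit` is missed.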
I would think 256 is probably fine unless some Unix reads 256 bytes for
the shebang hack. In that case, I would go for 512.
Testing for <tag may be good but beware of perversities.
#!/bin/sh
<foo cat >bar
...
is obviously a shell script but without the first line it becomes
<foo cat >bar
...
which could be hard to detect.
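The ambiguity can be reproduced with a small sketch. The `guess` function and its labels are hypothetical; it checks for a "#!" line first and then for a leading "<" plus letter, which is an assumption about how such a test might be ordered.

```python
import re

LEADING_TAG = re.compile(rb"\s*<[A-Za-z]")


def guess(data: bytes) -> str:
    """Very rough guess based only on how the buffer starts."""
    if data.startswith(b"#!"):
        return "script"
    if LEADING_TAG.match(data):
        return "html?"
    return "unknown"


with_shebang = b"#!/bin/sh\n<foo cat >bar\n"
without_shebang = b"<foo cat >bar\n"

print(guess(with_shebang))      # script
print(guess(without_shebang))   # html?  (a shell redirection, not a tag)
```

Nothing in the first bytes separates the second script from markup, so a leading-tag test alone cannot settle this case.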
I think the only proper solution is to look for a text-like distribution
in the first n bytes, but that may be harder still. That is what the
unknown decoder is trying to do now. It's just applying a simplistic
statistical model.
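For illustration only, one very simple statistical model of "text-like" is the fraction of sniffed bytes that are printable ASCII or common whitespace. This is a sketch under stated assumptions, not the decoder's actual model: the byte set, the 0.9 threshold and the function names are all hypothetical, and bytes at or above 0x80 are counted as non-text, which penalizes UTF-8 and other non-ASCII text.

```python
# Printable ASCII plus tab, LF, FF and CR.
TEXT_BYTES = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0C, 0x0D}


def text_fraction(data: bytes, limit: int = 256) -> float:
    """Share of the first `limit` bytes that fall in TEXT_BYTES."""
    sample = data[:limit]
    if not sample:
        return 0.0
    return sum(b in TEXT_BYTES for b in sample) / len(sample)


def looks_like_text(data: bytes, limit: int = 256,
                    threshold: float = 0.9) -> bool:
    """Treat a NUL byte as binary; otherwise compare against the threshold."""
    sample = data[:limit]
    if b"\x00" in sample:
        return False
    return text_fraction(sample, limit) >= threshold


print(looks_like_text(b"#!/bin/sh\n<foo cat >bar\n"))   # True
print(looks_like_text(bytes(range(256))))                # False
```

A model like this can separate text from binary, but it says nothing about whether the text is HTML, a shell script or plain prose, so it complements the tag test and does not replace it.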
Comment 6 • 23 years ago
Resolving as a duplicate. While this bug calls for changes that are different
than those in bug 126782, the changes affect the exact same thing. The technical
discussion should occur in just one bug. If my thinking is off-base in resolving
this as a duplicate, please reopen.
*** This bug has been marked as a duplicate of 126782 ***
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → DUPLICATE
-> file handling
Component: Networking → File Handling
QA Contact: benc → sairuh
Updated • 22 years ago
QA Contact: sairuh → petersen
Updated • 8 years ago
Product: Core → Core Graveyard