Open Bug 1209390 Opened 9 years ago Updated 1 years ago

Use standard lz4 file format instead of the non-standard jsonlz4/mozlz4

Categories

(Toolkit :: General, defect, P5)

41 Branch

People

(Reporter: 1utyu4+43h7ypa654vuo, Unassigned)

Details

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:41.0) Gecko/20100101 Firefox/41.0
Build ID: 20150918100310

Steps to reproduce:

Use Firefox.

Actual results:

Bookmark backup files (in "bookmarkbackups/") and other files (such as things in "crashes/") are lz4-compressed files, but they use a non-standard format. Result: users cannot avail themselves of standard, commonly available tools to inspect these files, which contain *their* data. Instead they have to resort to Firefox-specific (or Mozilla-specific, same point) hacks to access their data. [1]

[1] Such as using the Library GUI in Firefox to export bookmarks, or using Mozilla's lz4 interfaces through XPCOM.

Expected results:

Mozilla should use standard file formats. You promised to switch to a standard format once one was defined [2]. One was defined a while ago [3]. Standard tools have been available for some time [4]. Why are you still delaying?

[2] https://dxr.mozilla.org/mozilla-central/source/toolkit/components/workerlz4/lz4.js#49
[3] https://github.com/Cyan4973/lz4/blob/master/lz4_Block_format.md
    https://github.com/Cyan4973/lz4/blob/master/lz4_Frame_format.md
[4] For example: https://packages.debian.org/jessie/liblz4-tool
Component: General → Places
Product: Core → Toolkit
The lz4.js file has been moved to /toolkit/components/lz4/lz4.js [1].

[1] http://mxr.mozilla.org/mozilla-central/source/toolkit/components/lz4/lz4.js
Blocks: 818587
Places will use a different format when the platform (toolkit) moves to a different format, and I think the same will hold for all the consumers. So this is a more general Toolkit bug.
Component: Places → General
Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to Marco Bonardo [::mak] from comment #2)
> Places will use a different format, when the platform (toolkit) will move to
> a different format.

To clarify: if the compressor starts creating files in the more widely supported format, and the decompressor can still support the old format (I don't see why not, since there's a header it can detect), then fixing the lz4 component in toolkit will be enough to automatically move all the consumers to the new format.
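
The header detection mentioned above could look roughly like the following. This is only a minimal sketch, not the toolkit code: `detectFormat` and `startsWith` are hypothetical helper names, and it assumes the Mozilla header begins with the 8 bytes `mozLz40\0` while a standard LZ4 frame begins with the magic number 0x184D2204 stored little-endian.

```javascript
// Hypothetical helper names; byte values are assumptions stated in the lead-in.
const MOZLZ4_MAGIC = [0x6d, 0x6f, 0x7a, 0x4c, 0x7a, 0x34, 0x30, 0x00]; // "mozLz40\0"
const LZ4_FRAME_MAGIC = [0x04, 0x22, 0x4d, 0x18]; // 0x184D2204, little-endian

// True if `bytes` begins with every byte of `magic`.
function startsWith(bytes, magic) {
  if (bytes.length < magic.length) return false;
  return magic.every((value, index) => bytes[index] === value);
}

// Classify a buffer by its leading bytes so a decoder can pick the right path.
function detectFormat(bytes) {
  if (startsWith(bytes, MOZLZ4_MAGIC)) return "mozlz4";
  if (startsWith(bytes, LZ4_FRAME_MAGIC)) return "lz4-frame";
  return "unknown";
}
```

A decoder built this way could keep reading old files through the "mozlz4" branch while the encoder writes only the standard frame format.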
I've created an unofficial stand-alone decompressor for `.jsonlz4` files. The project, with source code, is hosted here: https://github.com/avih/dejsonlz4 . The initial v0.1 release can be found at https://github.com/avih/dejsonlz4/releases and includes a Windows executable, `dejsonlz4.exe`. It should hopefully compile easily elsewhere too. Please take any discussions regarding this project to the project page on GitHub.
(In reply to Anthony Thyssen from comment #5)
> Could you include in your source code "README.md" file ...

(In reply to Avi Halachmi (:avih) from comment #4)
> Please take any discussions regarding this project to the project page on github.
(In reply to S from comment #7)
> (In reply to Avi Halachmi (:avih) from comment #6)
> > (In reply to Avi Halachmi (:avih) from comment #4)
> > > Please take any discussions regarding this project to the project page on github.
>
> Thanks for the piece of code to decompress. But do you have any idea to go
> the other way and compress?

Please take this to GitHub; this can't be answered here.
The current obfuscation by changing the magic field in the lz4 compressed files doesn't provide any security enhancements. For advanced users it is now more complicated to "patch" or "synchronise" configuration files. So +1 for migrating to standard lz4-compressed or uncompressed files.
(In reply to H.-Dirk Schmitt from comment #9)
> The current obfuscation by changing the magic field in the lz4 compressed
> files doesn't provide any security enhancements.

There is no intent to obfuscate here, nor any security enhancement. At the time lz4 was added there wasn't a standard format, so a very simple header was created in front of the payload. Now a standard exists, but nobody internally has had the time to convert the encoder/decoder to it, nor has anyone volunteered to do that yet.
I would like to support this change with regard to search engines. Currently, I'm forced to use side tools, and export/import is severely limited. Also, I don't see how compression/signing really prevents search hijacking when (from my experience) most of the time it's done by side software with all the needed tools for that.
Is there any news on this? Is Mozilla planning to replace the Firefox-specific lz4 format with a standard format? I definitely want to vote for a standard format, because it is then much easier to work with in other tools.

For instance, I've written a Scala/Java library that reads the state of the Firefox session file. I've chosen the JVM to be platform-independent. With a standard lz4 format there would not be any problem, since there are multiple lz4 libraries for Java. However, this Firefox-specific format makes the whole decoding much more complicated. I can either try to reimplement the Firefox-specific implementation in Java, or I have to ship the library with several platform-dependent tools like dejsonlz4 which do the decompression for the specific platform. Both solutions are cumbersome, and therefore I really hope for a format change.

Are there any workarounds for tool developers like me? For instance, is it possible to decompress the Firefox-specific format with a standard lz4 decompressor by slightly changing the format or something like that?
OK, forget my last question. Fortunately, the Firefox-specific format is actually the same as lz4, except that there is a 12-byte prefix which can simply be skipped when decompressing the file. So there is a workaround for now. :)
Yes, the only non-standard thing is the header.
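
For tool developers, the workaround described above can be sketched as follows. This is a hedged illustration, not Mozilla's implementation: `decodeMozLz4` and `decodeLz4Block` are hypothetical names, and it assumes the 12-byte prefix consists of the 8-byte magic `mozLz40\0` followed by a 4-byte little-endian uncompressed size, with a single raw LZ4 block after it. The block decoder is a plain reading of the published LZ4 block format with only basic error checks.

```javascript
// Decode one raw LZ4 block starting at `start` into a buffer of `outSize` bytes.
function decodeLz4Block(src, start, outSize) {
  const out = new Uint8Array(outSize);
  let ip = start; // read position in src
  let op = 0;     // write position in out
  while (ip < src.length) {
    const token = src[ip++];
    // High nibble: literal length, extended by extra bytes while they equal 255.
    let literalLength = token >> 4;
    if (literalLength === 15) {
      let extra;
      do { extra = src[ip++]; literalLength += extra; } while (extra === 255);
    }
    out.set(src.subarray(ip, ip + literalLength), op);
    ip += literalLength;
    op += literalLength;
    if (ip >= src.length) break; // the last sequence carries literals only
    // Two-byte little-endian offset back into the already decoded output.
    const offset = src[ip] | (src[ip + 1] << 8);
    ip += 2;
    if (offset === 0 || offset > op) throw new Error("invalid match offset");
    // Low nibble: match length minus 4, extended the same way as literals.
    let matchLength = token & 0x0f;
    if (matchLength === 15) {
      let extra;
      do { extra = src[ip++]; matchLength += extra; } while (extra === 255);
    }
    matchLength += 4;
    // Copy byte by byte because the match may overlap the bytes being written.
    for (let i = 0; i < matchLength; i++) { out[op] = out[op - offset]; op++; }
  }
  if (op !== outSize) throw new Error("decoded size does not match header");
  return out;
}

// Check the assumed magic, read the size field, then decode from byte 12 onward.
function decodeMozLz4(bytes) {
  const magic = "mozLz40\0";
  for (let i = 0; i < magic.length; i++) {
    if (bytes[i] !== magic.charCodeAt(i)) throw new Error("not a mozlz4 file");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const uncompressedSize = view.getUint32(8, true);
  return decodeLz4Block(bytes, 12, uncompressedSize);
}
```

Given the file contents as a `Uint8Array`, `new TextDecoder().decode(decodeMozLz4(bytes))` would yield the JSON text. With a real lz4 library, the equivalent step is to pass everything after the first 12 bytes to its raw block decoder rather than its frame decoder.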
Priority: -- → P5
FWIW, I have been using a PHP script for several years: https://gist.github.com/vlakoff/3139e310664285c6c83b Also, note that it's not the latest LZ4, but v1.3, which is not upward compatible.
(In reply to Stefan Endrullis from comment #12)
> … workarounds …

mozlz4-edit – Add-ons for Firefox <https://addons.mozilla.org/addon/mozlz4-edit/>

… open, edit and save …

Another utility, for people trying to work around this: https://gist.github.com/Tblue/62ff47bef7f894e92ed5

What I'm wondering now - why compress at all? For me, and I guess for most other users, this file doesn't even reach a Megabyte. Yes, lz4 can compress it to around 10%, but who cares at that size?

It does less I/O when storing and retrieving on disk.

OK, from a quick benchmark, I'd suggest switching to zstd? It seems to compress/decompress slightly faster than LZ4 with triple the compression ratio...
(I unpacked a ~4MiB upgrade.jsonlz4 I had in my profile and duplicated its contents a random number of times, then ran multitime with the mozlz4 (implemented in python3, which may skew the results), zstd and pigz commands, packing and unpacking the test file on a tmpfs, with BOINC paused on my hexacore machine.)

652M test.json
123M test.json.gz
144M test.json.lz4
 43M test.json.zst

Compressing:

++ multitime -v zstd -kf test.json
===> Executing zstd -kf test.json
===> multitime results
1: zstd -kf test.json
            Mean        Std.Dev.    Min         Median      Max
real        1.266       0.000       1.266       1.266       1.266       
user        1.363       0.000       1.363       1.363       1.363       
sys         0.178       0.000       0.178       0.178       0.178       

++ multitime -v sh -c 'mozlz4 -c < '\''test.json'\'' > '\''test.json'\''.lz4'
===> Executing sh -c "mozlz4 -c < 'test.json' > 'test.json'.lz4"
===> multitime results
1: sh -c "mozlz4 -c < 'test.json' > 'test.json'.lz4"
            Mean        Std.Dev.    Min         Median      Max
real        1.353       0.000       1.353       1.353       1.353       
user        0.852       0.000       0.852       0.852       0.852       
sys         0.497       0.000       0.497       0.497       0.497       

++ multitime -v pigz -kf test.json
===> Executing pigz -kf test.json
===> multitime results
1: pigz -kf test.json
            Mean        Std.Dev.    Min         Median      Max
real        2.010       0.000       2.010       2.010       2.010       
user        22.482      0.000       22.482      22.482      22.482      
sys         0.454       0.000       0.454       0.454       0.454       

Decompression:

++ multitime -v zstd -dc test.json.zst
===> Executing zstd -dc test.json.zst
===> multitime results
1: zstd -dc test.json.zst
            Mean        Std.Dev.    Min         Median      Max
real        0.368       0.000       0.368       0.368       0.368       
user        0.363       0.000       0.363       0.363       0.363       
sys         0.004       0.000       0.004       0.004       0.004       

++ multitime -v sh -c 'mozlz4 -d < test.json.lz4 > /dev/null'
===> Executing sh -c "mozlz4 -d < test.json.lz4 > /dev/null"
===> multitime results
1: sh -c "mozlz4 -d < test.json.lz4 > /dev/null"
            Mean        Std.Dev.    Min         Median      Max
real        0.863       0.000       0.863       0.863       0.863       
user        0.408       0.000       0.408       0.408       0.408       
sys         0.455       0.000       0.455       0.455       0.455       

++ multitime -v pigz -dc test.json.gz
===> Executing pigz -dc test.json.gz
===> multitime results
1: pigz -dc test.json.gz
            Mean        Std.Dev.    Min         Median      Max
real        1.467       0.000       1.467       1.467       1.467       
user        2.184       0.000       2.184       2.184       2.184       
sys         0.174       0.000       0.174       0.174       0.174       
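
To make the comparison above easier to read, the ratios and throughputs can be derived from the posted figures. This is just arithmetic on the numbers in the listing; the helper names are made up for illustration, sizes are taken in the "M" units as printed, and each timing appears to come from a single run (all standard deviations are 0.000), so the results are rough.

```javascript
// Sizes in "M" units and wall-clock ("real") seconds, copied from the listing above.
const sizes = { json: 652, gz: 123, lz4: 144, zst: 43 };
const compressSeconds = { zst: 1.266, lz4: 1.353, gz: 2.010 };
const decompressSeconds = { zst: 0.368, lz4: 0.863, gz: 1.467 };

// How many times smaller the compressed file is than the original.
function compressionRatio(originalSize, compressedSize) {
  return originalSize / compressedSize;
}

// Uncompressed data handled per second of wall-clock time.
function throughput(originalSize, seconds) {
  return originalSize / seconds;
}

for (const format of ["gz", "lz4", "zst"]) {
  console.log(
    format,
    "ratio:", compressionRatio(sizes.json, sizes[format]).toFixed(2),
    "compress M/s:", throughput(sizes.json, compressSeconds[format]).toFixed(0),
    "decompress M/s:", throughput(sizes.json, decompressSeconds[format]).toFixed(0)
  );
}
```

On these figures the zstd output is about 3.3 times smaller than the mozlz4 output (144 / 43), which matches the "triple the compression ratio" remark.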

For anyone coming across this needing to decompress these files:

/* 
NOTE: BEFORE RUNNING THIS SCRIPT, CHECK THIS SETTING:
Type or paste about:config into the address bar and press Enter
Click the button promising to be careful
In the search box type devt and pause while Firefox filters the list
If devtools.chrome.enabled is false, double-click it to toggle to true

Paste this entire script into the command line at the bottom of the Browser Console (Windows: Ctrl+Shift+j)
Then press Enter to run the script. A file picker should promptly open.
*/

async function convert() {
  // Set up the file chooser
  var fp = Components.classes["@mozilla.org/filepicker;1"]
    .createInstance(Components.interfaces.nsIFilePicker);
  fp.init(window, "Open File", Components.interfaces.nsIFilePicker.modeOpen);
  fp.appendFilter("Bookmark Backup Files", "*.jsonlz4");
  // Show the file chooser and wait for the user's choice
  var result = await new Promise(resolve => fp.open(resolve));
  // Proceed only if a file was chosen
  if (result == Components.interfaces.nsIFilePicker.returnOK) {
    var file = fp.file;
    // Check that the file can be used
    if (file.exists() && file.isFile() && file.isReadable()) {
      var oldfile = file.path;
      // Construct the output file name
      var newfile = oldfile.replace(".jsonlz4", "_converted.json");
      // See: http://forums.mozillazine.org/viewtopic.php?p=14111285#p14111285
      var {utils:Cu} = Components;
      Cu.import("resource://gre/modules/osfile.jsm");
      // Read the file and decompress its contents
      var jsonData = await OS.File.read(oldfile, { compression: "lz4" });
      // console.log(jsonData);
      // Wait for the write to finish so that a failure is not silently dropped
      await OS.File.writeAtomic(newfile, jsonData);
    }
  }
}
convert();

Severity: normal → S3

The severity field for this bug is relatively low, S3. However, the bug has 25 votes and 52 CCs.
:mossop, could you consider increasing the bug severity?

For more information, please visit auto_nag documentation.

Flags: needinfo?(dtownsend)

The last needinfo from me was triggered in error by recent activity on the bug. I'm clearing the needinfo since this is a very old bug and I don't know if it's still relevant.

Flags: needinfo?(dtownsend)

I don't think there is any value in posting further comments in support of this. We're on board with doing it; we just don't currently have the resources available. If someone wants to work on implementing it, we would review a patch.
