Javascript Decompressor Roundup

by Miguel de Icaza

Thanks to everyone that emailed me answers to the Javascript decompressor issue. This is a reply in case other people are looking at ways of de-obfuscating or have to debug some compressed Javascript code.

I included the names of the nice folks that emailed me, and some comments for those that I actually tried out.

Some annotations:

  • [CMD] works from the command line, my favorite kind.
  • [WEB] Provides a web UI.
  • [GUI] GUI
  • [SOURCE] comes with source code.
  • [WINDOWS] Windows-only

Here they go:

  • [WEB, CMD, SOURCE] Beautify is a PHP-based, server-side decompressor. This was the one I used to debug some of the problems we were having, and the results are very good. Source code is available for you to run on your local server, or you can reuse it from the command line (some assembly required for command-line use).
  • [CMD, SOURCE]Qooxdoo's pretty printer written in Python (Fabian Jakobs, Sebastian Werner), to use:
    $ INSTALL_PATH/frontend/framework/tool/modules/compiler.py file.js
    		

    Alternatively, you can get only the pretty printer from SVN:

    $ svn co https://qooxdoo.svn.sourceforge.net/svnroot/qooxdoo/trunk/qooxdoo/frontend/framework/tool/
    $ modules/compiler.py -w originalFile.js
    		
  • [WEB, SOURCE] Beautify.aspx. Get the source code as the link from their web page is broken (Jokin Cuadrado, Steven Coffman).
  • [GUI, WINDOWS] A Plug-in for Fiddler: Fiddler is a Windows HTTP debugger, if you are using Windows and Fiddler this plugin might be for you (Jokin Cuadrado, Steinar Herland).

If you are a VIM user, this VIM script provides Javascript indentation. but it seems like a lot of work for general-purpose decompression of javascript (Kjartan Maraas).

If you feel that none of the above is good for you and you want to prepare for your interview at Google, Jeff Walden suggests a hard-core approach:

One of the less well-known aspects of SpiderMonkey, Mozilla's C JavaScript engine, is that it includes a decompiler which translates from SpiderMonkey bytecode to JavaScript (most people only use it the other way around). You can see it at work any time you convert a function to a string. Most JavaScript engines, when asked to convert a function to a string, do one of two things: return the exact source text (I believe IE does this, but I haven't double-checked), or return a string provides the minimum ECMAScript requires -- that the string have the syntax of a function declaration, i.e. that it be be evaluable to create a function (I think this is what Safari does). SpiderMonkey's choice to eliminate the overhead of storing source text after converting means that it can't do the former, and the latter is unpalatable from a developer standpoint. Instead, it decompiles the bytecode back to a JavaScript string representing the function as exactly as possible, while at the s ame time formatting the decompiled source to be reasonably readable. How would you use SpiderMonkey to reformat obfuscated source? First, you get a copy of SpiderMonkey:
  export CVSROOT=:pserver:[email protected]:/cvsroot
  cvs co mozilla/js/src
  cd mozilla/js/src
  make -f Makefile.ref clean && make -f Makefile.ref  # work around broken dependency system
  .obj/js # to run the interpreter

Next, you dump the JS code you want to reformat into a function, and you have SpiderMonkey pretty-print it:

  echo "function container() {" > obfuscated.js
  cat file-to-clean-up.js >> obfuscated.js
  echo "} print(container.toString());" >> obfuscated.js
  path/to/js -f obfuscated.js

SpiderMonkey will then print the container function's string representation, adjusting indentation and such to create a readable, if still name-obfuscated, version.

A couple things to know about this: first, SpiderMonkey doesn't pretty-print functions found in expression context:

  (function() {
     print("this won't get cleaned up");
   })();
  call_method(function() {
    print("this will probably be crunched to one line");
    print("not pretty-printed");
  });

These examples are converted (once stripped of the containing function) to:

  (function () {print("this won't get cleaned up");}());
  call_method(function () {print("this will probably be crunched to one line");print("not pretty-printed");});

The former pattern's become fairly common for reducing namespace collisions (unfortunately for the decompiler), and the latter's become more popular as the functional aspects of JavaScript have been more played up recently in libraries. For now at least I think you just have to tweak the original source file to fix these problems. The decompiler could do a better job on these given some changes, but I don't see this happening any time soon. The decompiler is generally agreed to be one of the hairiest and least-well-understood pieces of code in SpiderMonkey, and people don't touch it that often.

Incidentally, the decompiler is also what allows SpiderMonkey to give the informative error messages it gives when your code throws an uncaught exception; the error messages I've seen in any other JavaScript interpreter are woefully less useful than the ones SpiderMonkey gives you using the decompiler.

Posted on 16 Nov 2007