Attaboy Media

Musings on assorted geekery by Luke Andrews when he's not writing at attaboy.ca or on Twitter
30 November 2007
Permalink

Beautify your JavaScript in BBEdit

BBEdit has nice tools built-in for formatting HTML and CSS into nice, hierarchically-indented blocks. It doesn’t do JavaScript though, which I often lament when I’m copying and pasting other people’s source code (or just my own lazily-written code).

Enter the JavaScript beautifier written by Einars “elfz” Lielmanis. It works very well. But I’m lazy, and I don’t want to visit a web site every time I want to format some code. I want to press a shortcut key and have some highlighted code be instantly formatted. elfz has thoughtfully made the source code available, so the question is: how do you get BBEdit to talk to PHP?

Luckily, OS X can run PHP as a shell script language. With a couple of extra lines of code tacked on the end, I was able to turn elfz’s script into a BBEdit-friendly “filter” which I can pass JavaScript through.

#!/usr/bin/php
<?php

[...the beautifier code...]

$BBEditInput = file_get_contents($argv[1]) or die($php_errormsg);
print(js_beautify($BBEditInput));
?>

Install said PHP file into ~/Library/Application Support/BBEdit/Unix Support/Unix Filters/ and then give it a keyboard shortcut (I chose Cmd-Ctrl-J since it wasn’t doing anything else yet).

Yay! Nicely-formatted JavaScript at the stroke of a key.

Update: the developer removed the PHP version in favour of a JavaScript version, which doesn’t work as a shell script without a lot of extra effort. Here’s the original PHP source code:

#!/usr/bin/php
<?php // $Id: beautify.php 37 2007-11-30 22:00:12Z einars $

/*

JS Beautifier

(c) 2007, Einars "elfz" Lielmanis

http://elfz.laacz.lv/beautify/


You are free to use this in any way you want, in case you find this useful or working for you.

Usage: 
    require('beautify.php'); 
    echo js_beautify($js_source_text);

*/

error_reporting(E_ALL);
ini_set('display_errors', 'on');


$n = 1;
define('IN_EXPR',          ++$n);
define('IN_BLOCK',         ++$n);


define('TK_UNKNOWN',       ++$n);
define('TK_WORD',          ++$n);
define('TK_START_EXPR',    ++$n);
define('TK_END_EXPR',      ++$n);
define('TK_START_BLOCK',   ++$n);
define('TK_END_BLOCK',     ++$n);
define('TK_END_COMMAND',   ++$n);
define('TK_EOF',           ++$n);
define('TK_STRING',        ++$n);

define('TK_BLOCK_COMMENT', ++$n);
define('TK_COMMENT',       ++$n);

define('TK_OPERATOR',      ++$n);

// internal flags
define('PRINT_NONE',       ++$n);
define('PRINT_SPACE',      ++$n);
define('PRINT_NL',         ++$n);


function js_beautify($js_source_text, $tab_size = 4)
{
    global $output, $token_text, $last_type, $last_text, $in, $ins, $indent, $tab_string, $is_last_nl;

    global $input, $input_length;

//    $tab_string = str_repeat(' ', $tab_size); 
    $tab_string = "\t";

    $input = $js_source_text;
    $input_length = strlen($input);

    $last_word = '';     // last TK_WORD passed
    $last_type = TK_START_EXPR; // last token type
    $last_text = '';     // last token text
    $output    = '';

    $is_last_nl = true;  // was the last character written a newline?

    // words which should always start on new line. 
    // simple hack for cases when lines aren't ending with semicolon.
    // feel free to tell me about the ones that need to be added. 
    $line_starters = explode(',', 'continue,try,throw,return,var,if,switch,case,default,for,while,break,function');

    // states showing if we are currently in expression (i.e. "if" case) - IN_EXPR, or in usual block (like, procedure), IN_BLOCK.
    // some formatting depends on that.
    $in       = IN_BLOCK;
    $ins      = array($in);


    $indent   = 0;
    $pos      = 0; // parser position
    $in_case  = false; // flag for parser that case/default has been processed, and next colon needs special attention

    while (true) {
        list($token_text, $token_type) = get_next_token($pos);
        if ($token_type == TK_EOF) {
            break;
        }

        // $output .= " [$token_type:$last_type]";

        switch($token_type) {

        case TK_START_EXPR:

            in(IN_EXPR);
            if ($last_type == TK_END_EXPR or $last_type == TK_START_EXPR) {
                // do nothing on (( and )( and ][ and ]( .. 
            } elseif ($last_type != TK_WORD and $last_type != TK_OPERATOR) {
                space();
            } elseif (in_array($last_word, $line_starters) and $last_word != 'function') { 
                space();
            }
            token();
            break;

        case TK_END_EXPR:

            token();
            in_pop();
            break;

        case TK_START_BLOCK:

            in(IN_BLOCK);
            if ($last_type != TK_OPERATOR and $last_type != TK_START_EXPR) {
                if ($last_type == TK_START_BLOCK) {
                    nl();
                } else {
                    space();
                }
            }
            token();
            indent();
            break;

        case TK_END_BLOCK:

            if ($last_type == TK_END_EXPR) {
                unindent();
                nl(false);
            } elseif ($last_type == TK_END_BLOCK) {
                unindent();
                nl(false);
            } elseif ($last_type == TK_START_BLOCK) {
                // nothing
                unindent();
            } else {
                unindent();
                nl(false);
            }
            token();
            in_pop();
            break;

        case TK_WORD:

            if ($token_text == 'case' or $token_text == 'default') {
                if ($last_text == ':') {
                    // switch cases following one another
                    remove_indent();
                } else {
                    $indent--;
                    nl();
                    $indent++;
                }
                token();
                $in_case = true;
                break;
            }

            $prefix = PRINT_NONE;
            if ($last_type == TK_END_BLOCK) {
                if (!in_array(strtolower($token_text), array('else', 'catch', 'finally'))) {
                    $prefix = PRINT_NL;
                } else {
                    $prefix = PRINT_SPACE;
                    space();
                }
            } elseif ($last_type == TK_END_COMMAND && $in == IN_BLOCK) {
                $prefix = PRINT_NL;
            } elseif ($last_type == TK_END_COMMAND && $in == IN_EXPR) {
                $prefix = PRINT_SPACE;
            } elseif ($last_type == TK_WORD) {
                if ($last_word == 'else') { // else if
                    $prefix = PRINT_SPACE;
                } else {
                    $prefix = PRINT_SPACE; 
                }
            } elseif ($last_type == TK_START_BLOCK) {
                $prefix = PRINT_NL;
            } elseif ($last_type == TK_END_EXPR) {
                space();
            }

            if (in_array($token_text, $line_starters) or $prefix == PRINT_NL) {

                if ($last_text == 'else') {
                    // no need to force newline on else break
                    // DONOTHING
                    space();
                } elseif (($last_type == TK_START_EXPR or $last_text == '=') and $token_text == 'function') {
                    // no need to force newline on 'function': (function
                    // DONOTHING
                } elseif ($last_type == TK_WORD and ($last_text == 'return' or $last_text == 'throw')) {
                    // no newline between 'return nnn'
                    space();
                } else
                    if ($last_type != TK_END_EXPR) {
                        if (($last_type != TK_START_EXPR or $token_text != 'var') and $last_text != ':') { 
                            // no need to force newline on 'var': for (var x = 0...)
                            if ($token_text == 'if' and $last_type == TK_WORD and $last_word == 'else') {
                                // no newline for } else if {
                                space();
                            } else {
                                nl();
                            }
                        }
                    }
            } elseif ($prefix == PRINT_SPACE) {
                space();
            }
            token();
            $last_word = strtolower($token_text);
            break;

        case TK_END_COMMAND:

            token();
            break;

        case TK_STRING:

            if ($last_type == TK_START_BLOCK or $last_type == TK_END_BLOCK) {
                nl();
            } elseif ($last_type == TK_WORD) {
                space();
            }
            token();
            break;

        case TK_OPERATOR:
            $start_delim = true;
            $end_delim   = true;

            if ($token_text == ':' and $in_case) {
                token(); // colon really asks for separate treatment
                nl();
                $expecting_case = false;
                break;
            }

            $in_case = false;

            
            
            if ($token_text == ',') {
                if ($last_type == TK_END_BLOCK) {
                    token();
                    nl();
                } else {
                    if ($in == IN_BLOCK) {
                        token();
                        nl();
                    } else {
                        token();
                        space();
                    }
                }
                break;
            } elseif ($token_text == '--' or $token_text == '++') { // unary operators special case
                if ($last_text == ';') {
                    // space for (;; ++i)  
                    $start_delim = true;
                    $end_delim = false;
                } else {                
                    $start_delim = false;
                    $end_delim = false;
                }
            } elseif ($token_text == '!' and $last_type == TK_START_EXPR) {
                // special case handling: if (!a)
                $start_delim = false;
                $end_delim = false;
            } elseif ($last_type == TK_OPERATOR) {
                $start_delim = false;
                $end_delim = false;
            } elseif ($last_type == TK_END_EXPR) {
                $start_delim = true;
                $end_delim = true;
            } elseif ($token_text == '.') {
                // decimal digits or object.property
                $start_delim = false;
                $end_delim   = false;

            } elseif ($token_text == ':') {
                // zz: xx
                // can't differentiate ternary op, so for now it's a ? b: c; without space before colon
                $start_delim = false;
            }
            if ($start_delim) {
                space();
            }

            token();
            
            if ($end_delim) {
                space();
            }
            break;

        case TK_BLOCK_COMMENT:

            nl();
            token();
            nl();
            break;

        case TK_COMMENT:

            //if ($last_type != TK_COMMENT) {
            nl();
            //}
            token();
            nl();
            break;

        case TK_UNKNOWN:
            token();
            break;
        }

        if ($token_type != TK_COMMENT) {
            $last_type = $token_type;
            $last_text = $token_text;
        }
    }

    // clean empty lines from redundant spaces
    $output = preg_replace('/^ +$/m', '', $output);

    return $output;
}




function nl($ignore_repeated = true)
{
    global $indent, $output, $tab_string, $is_last_nl;

    if ($output == '') return; // no newline on start of file

    if ($ignore_repeated and $is_last_nl) {
        return;
    }

    $is_last_nl = true;

    $output .= "\n" . str_repeat($tab_string, $indent);
}


// hack for correct multiple newline handling 
function safe_nl($newlines = 1)
{
    global $output;
    if ($newlines) {
        $output .= str_repeat("\n", $newlines - 1);
    }
    nl();
}


function space()
{
    global $output, $is_last_nl;
    
    $is_last_nl = false;

    if ($output and substr($output, -1) != ' ') { // prevent occassional duplicate space
        $output .= ' ';
    }
}


function token()
{
    global $token_text, $output, $is_last_nl;
    $output .= $token_text;
    $is_last_nl = false;
}

function indent()
{
    global $indent;
    $indent ++;
}


function unindent()
{
    global $indent;
    if ($indent) {
        $indent --;
    }
}


function remove_indent()
{
    global $tab_string, $output;
    $tab_string_len = strlen($tab_string);
    if (substr($output, -$tab_string_len) == $tab_string) {
        $output = substr($output, 0, -$tab_string_len);
    }
}


function in($where)
{
    global $ins, $in;
    array_push($ins, $in);
    $in = $where;
}


function in_pop()
{
    global $ins, $in;
    $in = array_pop($ins);
}



function make_array($str)
{
    $res = array();
    for ($i = 0; $i < strlen($str); $i++) {
        $res[] = $str[$i];
    }
    return $res;
}



function get_next_token(&$pos)
{
    global $last_type, $last_text;
    global $whitespace, $wordchar, $punct;
    global $input, $input_length, $is_last_nl;


    if (!$whitespace) $whitespace = make_array("\n\r\t ");
    if (!$wordchar)   $wordchar   = make_array('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_$');
    if (!$punct)      $punct  = explode(' ', '+ - * / % & ++ -- = += -= *= /= %= == === != !== > < >= <= >> << >>> >>>= >>= <<= && &= | || ! !! , : ? ^ ^= |='); 

    $n_newlines = 0;
    do {
        if ($pos >= $input_length) {
            return array('', TK_EOF);
        }
        $c = $input[$pos];
        $pos += 1;
        if ($c == "\n") {
            $n_newlines += 1;
        }
    } while (in_array($c, $whitespace));
    
    if ($n_newlines) {
        safe_nl($n_newlines);
    }

    if (in_array($c, $wordchar)) {
        if ($pos < $input_length) {
            while (in_array($input[$pos], $wordchar)) {
                $c .= $input[$pos];
                $pos += 1;
                if ($pos == $input_length) break;
            }
        }
        if ($c == 'in') { // hack for 'in' operator
            return array($c, TK_OPERATOR);
        }
        return array($c, TK_WORD);
    }

    if ($c == '(' || $c == '[') {
        return array($c, TK_START_EXPR);
    }

    if ($c == ')' || $c == ']') {
        return array($c, TK_END_EXPR);
    }

    if ($c == '{') {
        return array($c, TK_START_BLOCK);
    }

    if ($c == '}') {
        return array($c, TK_END_BLOCK);
    }

    if ($c == ';') {
        return array($c, TK_END_COMMAND);
    }

    if ($c == '/') {
        // peek for comment /* ... */
        if ($input[$pos] == '*') {
            $comment = '';
            $pos += 1;
            if ($pos < $input_length){
                while (!($input[$pos] == '*' && isset($input[$pos + 1]) && $input[$pos + 1] == '/') && $pos < $input_length) {
                    $comment .= $input[$pos];
                    $pos += 1;
                    if ($pos >= $input_length) break;
                }
            }
            $pos +=2;
            return array("/*$comment*/", TK_BLOCK_COMMENT);
        }
        // peek for comment // ...
        if ($input[$pos] == '/') {
            $comment = $c;
            while ($input[$pos] != "\x0d" && $input[$pos] != "\x0a") {
                $comment .= $input[$pos];
                $pos += 1;
                if ($pos >= $input_length) break;
            }
            $pos += 1;
            return array($comment, TK_COMMENT);
        }

    }

    if ($c == "'" || // string
        $c == '"' || // string
        ($c == '/' && 
            (($last_type == TK_WORD and $last_text == 'return') or ($last_type == TK_START_EXPR || $last_type == TK_END_BLOCK || $last_type == TK_OPERATOR || $last_type == TK_EOF || $last_type == TK_END_COMMAND)))) { // regexp
        $sep = $c;
        $c   = '';
        $esc = false;

        if ($pos < $input_length) {

            while ($esc || $input[$pos] != $sep) {
                $c .= $input[$pos];
                if (!$esc) {
                    $esc = $input[$pos] == '\\';
                } else {
                    $esc = false;
                }
                $pos += 1;
                if ($pos >= $input_length) break;
            }

        }

        $pos += 1;
        if ($last_type == TK_END_COMMAND) {
            nl();
        }
        return array($sep . $c . $sep, TK_STRING);
    }

    if (in_array($c, $punct)) {
        while (in_array($c . $input[$pos], $punct)) {
            $c .= $input[$pos];
            $pos += 1;
            if ($pos >= $input_length) break;
        }
        return array($c, TK_OPERATOR);
    }

    return array($c, TK_UNKNOWN);
}

$BBEditInput = file_get_contents($argv[1]) or die($php_errormsg);
print(js_beautify($BBEditInput));

?>