How a Vim user converts to Emacs

June 21, 2007

For many years I thought that the heretics of the Emacs clan [1] were clearly insane – the followers of Vim were obviously following the True Path. And then I was snared by the dark side.

My primary motivation for starting to use Emacs was so I could use the Slime[2] Lisp debugger. One part of Slime resides on the Lisp side, and a larger part resides on the Emacs side as an interface. I worked on a project called Slim-Vim that tried to emulate the interface side of Slime in Vim. We started by integrating a Common Lisp system, ECL[3], into Vim and exposing some of Vim’s functions. Then we could code in Lisp. We managed to get a fair way there before running into painful problems. Vim is not designed to be easily extended. There is a lot of code in there that works in a strange manner (likely for performance/portability reasons), and we basically got stuck.

While working on Slim-Vim I needed to get familiar with what I was trying to provide, so I used Slime on Emacs. To get by while in Emacs, I enabled Viper mode – an Emacs mode that emulates Vi keys. There were a couple of things that I didn’t like about Viper, so I added a couple of functions. It was at this point that I had unknowingly set my path.

Let’s look at my programming philosophy over this period of time (6+ months). At the start of the Slim-Vim project, I was very new to Common Lisp and still thinking very much in a C-style mindset. The only way of working that I knew was the C way: compile/link/run. As I learned Lisp & tried to make Slim-Vim work, I read a lot about software design using Lisp, and about how open, extensible and malleable systems[4] are the best way to work. And I agreed. I couldn’t wait to start working with a Lisp system, with a proper editor (Vim) and a good debugger (Slim-Vim) – if only I could get Slim-Vim working.

As I grew more disheartened about ever getting Vim to work properly[5], I tinkered with my Viper changes some more. I finally started to grok what these Lispers were talking about. I could edit my Viper changes and instantly test them, instantly see the effects. The open system that I wanted to work with was already right in front of me in the form of Emacs, but I hadn’t seen it! I was looking for something better than C, but it took me 6+ months to really see it. It’s hard to describe, and in fact I’m not sure that you can really convince people that open systems are better – it is something that you really need to find for yourself.

The second factor in my choosing Emacs is that I want to work with open and dynamic applications. This means you should be able to inspect how the app works, tinker with it, and see your results right away. You should be able to live in it. And now that I actually grok that, it seems crazy to me to choose to use closed systems.

I used to be a Vim bigot, so I’m sure somebody will say that Vim can do all the things that Emacs can do. Maybe it even can. But it doesn’t have the same feel. If you want to know what an open, malleable, dynamic system is like – use Emacs & write some Emacs Lisp. Trust me, you won’t want to go back.

[1] – http://www.dina.kvl.dk/~abraham/religion/
[2] – http://common-lisp.net/project/slime/
[3] – http://ecls.sourceforge.net/
[4] – I’ll post about open systems in the future.
[5] – If you’ve ever had the misfortune to mess around with Vim’s internals, you’ll probably understand why it is hard to change.

Grep cache

June 7, 2007

I mentioned in my last post that I’m working for a games company now. I’m also working with the largest code base that I’ve ever seen, or really heard of. To give you an idea, the mainline code base is over 85MB of text in 14,000 files. If you include the 3rd party packages that we link against, then we have 600+MB of code text in something like 85k files. I think that this is large by any reasonable standard. Oh, I just did a line count: 18 million lines of text, though I am counting multiple versions of some packages.

Most developers at work use Visual Studio and Visual Assist to edit and navigate their code. I can’t really stand Visual Studio as an editor, so I use Emacs with Vimpulse mode (this is basically Vim key bindings). For a while I was using Visual Assist to navigate the code. On a code base this large I think that the single most useful tool VA provides is Alt+g, which jumps to the definition of the symbol you are on. The context info is quite smart: for instance, if the cursor is on something like someVar->SomeFunction() then VA will figure out the type of someVar and put the cursor on typeof_someVar::SomeFunction. Not too shabby, and generating the VA database is not too painful really. But that leaves two problems. How do I find every place that SomeFunction is used? And besides, I use Emacs.

So I do what any Unix guy in a Windows world would do – install Cygwin and the Unix tools for Win32 & try the standard code indexing tools: ctags, cscope, GNU Global, and Source Navigator. All of them choke; there are just too many files. I did manage to build a subset database for ctags, which resulted in a 70MB tags file, but Emacs just took too long looking up tags. Using find and grep also takes far too long, something on the order of 5+ minutes. None of this works in a satisfactory manner, and I end up using Visual Studio to navigate the code and Emacs to edit. After much grumbling at my situation I looked at the raw code size numbers and thought to myself “how long does it actually take to grep a 500MB file?” I did some tests, and it turns out not that long. On my machine, grepping a single 500MB file takes less than 10 seconds. Grepping an 80MB file is so fast as to be instant.

Enter Grep Cache

The basic idea of grep cache is that you concatenate all of your code data into one huge file and grep that file instead. Grepping a single file, you get back the line number within the big file and the matching line of text. The information that you don’t get directly back is the original file that contained that grep hit. My solution to this is to create a dumb index file, which simply has text in the format

<number-of-lines-in-file> <filename>\n

Then a Perl script does a linear search of the index file, keeping a running total of the line counts. The file we want is the last one that starts before the input line number, and the line within that file is the input line number minus the total for all the files before it. A little bit of text massage to get the Perl script output into the grep format that Emacs expects & we’re done.

How fast is this then? I can grep the smaller (85MB) code base that I usually work with pretty much instantly. The bigger database takes 10-20 seconds. Most of the time is spent in the Perl script doing the dumb-as-a-rock linear search. If I wanted to grep the bigger file faster I’d write a proper index file where I could use a binary search. But for right now, I’m happy with my new tool. Super fast grepping actually lets me do pretty much everything that a proper tags-based system does.
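For the curious, the binary search idea could look roughly like the following. This is only a sketch in Python rather than the Perl the tool is actually written in, and it assumes the index has already been parsed into (line-count, filename) pairs. The names build_starts and find_file are made up for illustration.

```python
from bisect import bisect_left
from itertools import accumulate

def build_starts(index):
    """index: list of (num_lines, filename) pairs in concatenation order.
    Returns, for each file, how many database lines come before it."""
    counts = [n for n, _ in index]
    return [0] + list(accumulate(counts))[:-1]

def find_file(index, starts, line_no):
    """Map a 1-based database line number to (filename, line within file).
    Assumes line_no is not past the end of the database."""
    if line_no < 1:
        raise ValueError("grep line numbers start at 1")
    # The owning file is the last one whose start is strictly below line_no.
    i = bisect_left(starts, line_no) - 1
    return index[i][1], line_no - starts[i]
```

The starts list is built once per run, so each grep hit costs a logarithmic lookup instead of a walk over every file in the index.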

Code

Here’s the code. It runs on Windows with the Unix tools installed. Thanks to Dean for the Perl code.

Bat file to replace grep (fast-grep.bat):
grep -n %* database.txt | perl lookup-file-name.pl

Generate database (gendatabase.bat):

set ROOT=C:\dev\project

del %ROOT%\files.lst
del index.txt
del database.txt
ufind %ROOT% -iname *.cpp -or -iname *.h -or -iname *.c | sed s_\\_/_g > %ROOT%\files.lst
for /f %%i in (%ROOT%\files.lst) do @wc -l %%i >> index.txt
for /f %%i in (%ROOT%\files.lst) do @cat %%i >> database.txt
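If you don’t have the Unix tools handy, the same generation step can be sketched in one pass. This is a rough Python equivalent, not what I actually run; the function name build_cache and its arguments are made up for illustration, and it assumes the output files live outside the set of files being indexed.

```python
import os

EXTENSIONS = (".cpp", ".h", ".c")  # same file types as the batch file

def build_cache(root, database_path, index_path):
    """Concatenate every matching file under root into database_path and
    write one '<line-count> <filename>' entry per file to index_path.
    Returns the number of files indexed."""
    count = 0
    with open(database_path, "wb") as db, open(index_path, "w") as idx:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # make the traversal order repeatable
            for name in sorted(filenames):
                if not name.lower().endswith(EXTENSIONS):
                    continue
                path = os.path.join(dirpath, name)
                with open(path, "rb") as src:
                    data = src.read()
                db.write(data)
                # Count newlines, as wc -l does, so the index stays in step
                # with the line numbers grep reports for the big file.
                num_lines = data.count(b"\n")
                idx.write("%d %s\n" % (num_lines, path.replace(os.sep, "/")))
                count += 1
    return count
```

Writing the database and the index in the same loop guarantees that the two files list the sources in the same order, which the lookup depends on.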

Perl code (lookup-file-name.pl):

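#! perl.exe -w

use strict;

# Index written by gendatabase.bat: one "<line-count> <filename>" entry per line.
my $indexName = "index.txt";
open my $index, '<', $indexName or die "Can't open index file $indexName\n";

my $total = 0;   # number of database lines before the current file
my @files;       # [lines before this file, file path], in database order

sub addItem {
    my ($numLines, $filepath) = @_;

    push @files, [$total, $filepath];
    $total += $numLines;
}

# Map a 1-based database line number to (line within file, file path).
sub findItem {
    my $num = shift;
    my $filename;
    my $offset;
    # Dumb linear search: the last file that starts before $num wins.
    foreach my $item (@files) {
        if ( $item->[0] < $num ) { $filename = $item->[1]; $offset = $num - $item->[0]; }
    }

    ($offset, $filename);
}

# Read the index; wc -l may or may not pad the count with leading spaces.
LINE: while ( <$index> ) {
    next LINE unless ( /^\s*(\d+)\s+(.*)$/ );

    my $numLines = $1;
    my $filepath = $2;

    #print "$numLines $filepath\n";
    addItem( $numLines, $filepath );
}

#map { print "@{$_}\n"; } @files;
#print findItem( 1000 );

# Rewrite grep's "line:text" output on stdin as "file:line:text".
FILE: while ( <STDIN> ) {
    next FILE unless ( /^(\d+):(.*)/ );

    my $num = $1;
    my $line = $2;

    my ($offset, $filepath) = findItem($num);

    print "${filepath}:${offset}:$line\n";
}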

First Post

June 4, 2007

I think that’s the obligatory title of the first post to any blog, right?

So I guess that I’m going to try and have another go at this blogging thing. My previous attempt was at http://bradbev.livejournal.com/ which didn’t last that long or get that much attention.  I think that my problem is that I am inherently against much of my personal information getting out onto the web, it just creeps me out.  So this particular blog will not even attempt to have personal information.  This is a tech blog where I can write whatever crap that I like.

Let’s get my personal info out of the way. I was born in 1979, and I’m from New Zealand. I have a bachelor’s degree in computer science & I’ve been working since 2000. Until 2007 I worked exclusively in the embedded systems industry. In March 2007 I changed industries and I’m working for a games company now as a systems programmer. I guess the theory goes that embedded systems and games have something in common. That’s about all you’ll get out of me.

What am I going to talk about here? Anything that strikes my fancy, which will probably be tech related. I’m an ex-Vim user who switched to Emacs so I could use Slime with Lisp programming. For those Vim fans out there, I still use Vim keybindings 🙂 Lisp is my current favourite language, and I use SBCL on OS X with Aquamacs + Slime. C++ is the language that I use at work. Ick is probably the word there. Once you’ve grokked a language like Lisp (Smalltalk and other ‘whole environment’ systems probably have the same feel), then it is really hard to live in the C world of edit/compile/link/run. Maybe I should rant about that at some stage.

Anyhow, let’s get on with the show.