A better vim 'go to declaration' with ASTs.

There are loads of little commands in vim for navigating around your files. One of the ones I like the most is gd, for "go to local declaration". What that does - or is supposed to do - is find the place at which the identifier under the cursor is declared or defined, and go to it.

However, as the documentation states, this is "not guaranteed to work" - and in fact rarely does. The issue is that vim does not parse the code and find the actual definition: what it does it to find the beginning of the current function (which rarely works in Python anyway) or the top of the file, and search forward from there. Even if that did work reliably, it still wouldn't cope with definitions at class or module level, and doesn't distinguish between variables defined at different scopes.

No doubt the proper fix here is to use something like ctags, that processes your file and outputs a tags list that can be used by vim's tag matching. But that still wouldn't really understand scope. Anyway, I wanted to see if this could be done dynamically, by using Python itself to try and extract some meaning from the code, and determine where the variable was actually defined using its understanding of the scope. So, in my insanity, I turned to ASTs.

Abstract Syntax Trees are a way of representing code as a series of objects, arranged in a tree, which represent the language's grammar in an abstract way. Given this simple code:

def foo(bar):
  baz = bar * 2
  return baz

and calling ast.parse() on it (as a string), you get this series of objects (reformatted for clarity):

        args=[Name(id='bar', ctx=Param())],
        vararg=None, kwarg=None, defaults=[]),
            targets=[Name(id='baz', ctx=Store())],
                left=Name(id='bar', ctx=Load()),
        Return(value=Name(id='baz', ctx=Load()))
    ], decorator_list=[]

That's not particularly easy to read, but you can at least see that it's represented the various identifiers - foo, bar and baz - as objects with attributes defining things like their names, and that the objects form a tree with the Module at the top, a FunctionDef inside that, and then the Assign and Return statements inside the body attribute.

So, it looks like we can use this tree to find the definition of the variable we're on. The process will look like this: first, we need to scan through the tree to find our current file position and locate the AST representation of our current identifier. Then, we need to progressively widen our scope until we find the object that first assigns that identifier. Finally, we can move vim's cursor to that object's position.

The code for doing this is borrowed mainly from the excellent pyflakes project, which uses AST parsing to detect syntax or style errors. (The vim interface code is inspired by the much-lamented pyflakes.vim - I've never got on with Syntastic in quite the same way I did with that). Pyflakes already does a great job of implementing a Visitor class to step through the tree and call relevant methods for each node, so all we need to do is override the relevant methods to search for our identifier, rather than check for style.

Pyflakes also very helpfully keeps track of an object's scope. It uses this to check for errors like identifiers being referenced before they have been defined, or style issues like objects that are defined but never referenced. So, every time the visitor encounters a new definition, it stores it in a dictionary representing the current scope (module, function or class), and whenever it encounters a further use of that identifier, it looks the object up in the scope stack and annotates it with the scope where it was found.

Clearly, that is exactly what we need to implement our 'go to definition' functionality. We just need to override the Visitor methods such that we compare each node with our identifier, and if they match we take that node, find its scope (which has already been recorded by the visitor) and then extract the definition's line/column number from there. Note that when matching, we compare both name and line number - clearly, names can be defined multiple times in a file, and we want the correct node for our target position.

Things are complicated slightly by the fact that our target identifier might not be a Node at all, but an attribute of another node. This happens with dot notation: for example, the attribute reference foo.quux is represented in an AST like this:

Attribute(value=Name(id='baz', ctx=Load()), attr='quux', ctx=Load())

If you were searching for the original definition of the quux attribute, you would need to start from the Attribute node and check for matches to the attr attribute.

We deal with this by overriding both handleChildren and NAME methods from Pyflake's Checker class. The first is called for attributes, and the second for Name nodes, ie everything else. In each one we check the line number, as well as either the id or the attr atribute.

def handleChildren(self, tree):
    if (hasattr(tree, 'lineno') and tree.lineno == self.lineno
        and hasattr(tree, 'attr') and tree.attr == self.name):
            self.target = tree
            scope = self.getScope(tree.attr)
            self.targetScope = scope

    for node in checker.iter_child_nodes(tree):
        self.handleNode(node, tree)

def NAME(self, node):
    super(MyChecker, self).NAME(node)
    if node.lineno == self.lineno and node.id == self.name:
      self.target = node
      self.targetScope = self.getScope(node.id)

The getScope method is a simple check upwards through the relevant scope objects:

def getScope(self, target):
    if target in self.scope:
        return self.scope[target]
    for scope in self.scopeStack[-2::-1]:
        if target in scope:
            return scope[target]
    print 'Sorry, no definition found.'

All that's left is the wrapper to grab the word under the vim cursor using vim.eval('expand("<cword>")'), pass it to the Checker, extract the target definition, and go to the relevant line/column. One last tiny tweak is that not all nodes have an actual child node for the definition itself - eg the FunctionDef above starts on col 0, but the name itself is on col 4 - so we use a simply Python str.find to move to the actual column position of the name.

Now, of course this is still very far from perfect. It won't resolve names imported via from module import * - but of course no-one would ever use that anyway. Much more seriously though, it can't find names that are defined outside the current scope tree. For example, if you call a method that defines a series of attributes on an object, then return it, and then pass it to another method, inside that second method the scope chain has no record of where the new attributes were originally defined. Similarly, it won't find methods that are defined on an object's superclass. I haven't yet come up with any good way of fixing that, although in practice it doesn't seem to be a huge problem.

You can find the code on GitHub.

Comments !