From 67fd0ddbbeb7f6418cb151c1b4217dd29eafc62e Mon Sep 17 00:00:00 2001 From: Paul Tagliamonte Date: Mon, 30 Dec 2013 17:32:57 -0500 Subject: [PATCH] Document the compiler a little. --- docs/language/internals.rst | 167 ++++++++++++++++++++++++++++++++++++ 1 file changed, 167 insertions(+) diff --git a/docs/language/internals.rst b/docs/language/internals.rst index cf4f99c..df43061 100644 --- a/docs/language/internals.rst +++ b/docs/language/internals.rst @@ -13,6 +13,173 @@ Hy Models Write this. +Hy Internal Theory +================== + +.. _overview: + +Overview +-------- + +The Hy internals work by acting as a front-end to Python bytecode, so that +Hy it's self compiles down to Python Bytecode, allowing an unmodified Python +runtime to run Hy. + +The way we do this is by translating Hy into Python AST, and building that AST +down into Python bytecode using standard internals, so that we don't have +to duplicate all the work of the Python internals for every single Python +release. + +Hy works in four stages. The following sections will cover each step of Hy +from source to runtime. + +.. _lexing: + +Lexing / tokenizing +------------------- + +The first stage of compiling hy is to lex the source into tokens that we can +deal with. We use a project called rply, which is a really nice (and fast) +parser, written in a subset of Python called rply. + +The lexing code is all defined in ``hy.lex.lexer``. This code is mostly just +defining the Hy grammer, and all the actual hard parts are taken care of by +rply -- we just define "callbacks" for rply in ``hy.lex.parser``, which take +the tokens generated, and return the Hy models. + +You can think of the Hy models as the "AST" for Hy, it's what Macros operate +on (directly), and it's what the compiler uses when it compiles Hy down. + +Check the documentation for more information on the Hy models for more +information regarding the Hy models, and what they mean. + +.. TODO: Uh, we should, like, document models. + + +.. _compiling: + +Compiling +--------- + +This is where most of the magic in Hy happens. This is where we take Hy AST +(the models), and compile them into Python AST. A couple of funky things happen +here to work past a few problems in AST, and working in the compiler is some +of the most important work we do have. + +The compiler is a bit complex, so don't feel bad if you don't grok it on the +first shot, it may take a bit of time to get right. + +The main entry-point to the Compiler is ``HyASTCompiler.compile``. This method +is invoked, and the only real "public" method on the class (that is to say, +we don't really promise the API beyond that method). + +In fact, even internally, we don't recurse directly hardly ever, we almost +always force the Hy tree through ``compile``, and will often do this with +sub-elements of an expression that we have. It's up to the Type-based dispatcher +to properly dispatch sub-elements. + +All methods that preform a compilation are marked with the ``@builds()`` +decorator. You can either pass the class of the Hy model that it compiles, +or you can use a string for expressions. I'll clear this up in a second. + +First stage type-dispatch +~~~~~~~~~~~~~~~~~~~~~~~~~ + +Let's start in the ``compile`` method. The first thing we do is check the +Type of the thing we're building. We look up to see if we have a method that +can build the ``type()`` that we have, and dispatch to the method that can +handle it. If we don't have any methods that can build that type, we raise +an internal ``Exception``. + +For instance, if we have a ``HyString``, we have an almost 1-to-1 mapping of +Hy AST to Python AST. The ``compile_string`` method takes the ``HyString``, and +returns an ``ast.Str()`` that's populated with the correct line-numbers and +content. + +Macro-expand +~~~~~~~~~~~~ + +If we get a ``HyExpression``, we'll attempt to see if this is a known +Macro, and push to have it expanded by invoking ``hy.macros.macroexpand``, then +push the result back into ``HyASTCompiler.compile``. + +Second stage expression-dispatch +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The only special case is the ``HyExpression``, since we need to create different +AST depending on the special form in question. For instance, when we hit an +``(if true true false)``, we need to generate a ``ast.If``, and properly +compile the sub-nodes. This is where the ``@builds()`` with a String as an +argument comes in. + +For the ``compile_expression`` (which is defined with an +``@builds(HyExpression)``) will dispatch based on the string of the first +argument. If, for some reason, the first argument is not a string, it will +properly handle that case as well (most likely by raising an ``Exception``). + +If the String isn't known to Hy, it will default to create an ``ast.Call``, +which will try to do a runtime call (in Python, something like ``foo()``). + +Issues hit with Python AST +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Python AST is great; it's what's enabled us to write such a powerful project +on top of Python without having to fight Python too hard. Like anything, we've +had our fair share of issues, and here's a short list of the common ones you +might run into. + +*Python differentiates between Statements and Expressions*. + +This might not sound like a big deal -- in fact, to most Python programmers, +this will shortly become a "Well, yeah" moment. + +In Python, doing something like: + +``print for x in range(10): pass``, because ``print`` prints expressions, and +``for`` isn't an expression, it's a control flow statement. Things like +``1 + 1`` are Expressions, as is ``lambda x: 1 + x``, but other language +features, such as ``if``, ``for``, or ``while`` are statements. + +Since they have no "value" to Python, this makes working in Hy hard, since +doing something like ``(print (if true true false))`` is not just common, it's +expected. + +As a result, we auto-mangle things using a ``Result`` object, where we offer +up any ``ast.stmt`` that need to get run, and a single ``ast.expr`` that can +be used to get the value of whatever was just run. Hy does this by forcing +assignment to things while running. + +As example, the Hy:: + + (print (if true true false)) + +Will turn into:: + + if True: + _mangled_name_here = True + else: + _mangled_name_here = False + + print _mangled_name_here + + +OK, that was a bit of a lie, since we actually turn that statement +into:: + + print True if True else False + +By forcing things into an ``ast.expr`` if we can, but the general idea holds. + + +Runtime +------- + +After we have a Python AST tree that's complete, we can try and compile it to +Python bytecode by pushing it through ``eval``. From here on out, we're no +longer in control, and Python is taking care of everything. This is why things +like Python tracebacks, pdb and django apps work. + + Hy Macros =========