diff --git a/docs/language/core.rst b/docs/language/core.rst index 8838153..3b089ea 100644 --- a/docs/language/core.rst +++ b/docs/language/core.rst @@ -699,6 +699,20 @@ Returns the single step macro expansion of *form*. HySymbol('e'), HySymbol('f')])]) +.. _mangle-fn: + +mangle +------ + +Usage: ``(mangle x)`` + +Stringify the input and translate it according to :ref:`Hy's mangling rules +`. + +.. code-block:: hylang + + => (mangle "foo-bar") + 'foo_bar' .. _merge-with-fn: @@ -1431,6 +1445,22 @@ Returns an iterator from *coll* as long as *pred* returns ``True``. => (list (take-while neg? [ 1 2 3 -4 5])) [] +.. _unmangle-fn: + +unmangle +-------- + +Usage: ``(unmangle x)`` + +Stringify the input and return a string that would :ref:`mangle ` to +it. Note that this isn't a one-to-one operation, and nor is ``mangle``, so +``mangle`` and ``unmangle`` don't always round-trip. + +.. code-block:: hylang + + => (unmangle "foo_bar") + 'foo-bar' + Included itertools ================== diff --git a/docs/language/internals.rst b/docs/language/internals.rst index 48a4985..155ab0a 100644 --- a/docs/language/internals.rst +++ b/docs/language/internals.rst @@ -157,17 +157,8 @@ HySymbol ``hy.models.HySymbol`` is the model used to represent symbols in the Hy language. It inherits :ref:`HyString`. -``HySymbol`` objects are mangled in the parsing phase, to help Python -interoperability: - - - Symbols surrounded by asterisks (``*``) are turned into uppercase; - - Dashes (``-``) are turned into underscores (``_``); - - One trailing question mark (``?``) is turned into a leading ``is_``. - -Caveat: as the mangling is done during the parsing phase, it is possible -to programmatically generate HySymbols that can't be generated with Hy -source code. Such a mechanism is used by :ref:`gensym` to generate -"uninterned" symbols. +Symbols are :ref:`mangled ` when they are compiled +to Python variable names. .. 
_hykeyword: @@ -340,7 +331,7 @@ Since they have no "value" to Python, this makes working in Hy hard, since doing something like ``(print (if True True False))`` is not just common, it's expected. -As a result, we auto-mangle things using a ``Result`` object, where we offer +As a result, we reconfigure things using a ``Result`` object, where we offer up any ``ast.stmt`` that need to get run, and a single ``ast.expr`` that can be used to get the value of whatever was just run. Hy does this by forcing assignment to things while running. @@ -352,11 +343,11 @@ As example, the Hy:: Will turn into:: if True: - _mangled_name_here = True + _temp_name_here = True else: - _mangled_name_here = False + _temp_name_here = False - print _mangled_name_here + print _temp_name_here OK, that was a bit of a lie, since we actually turn that statement diff --git a/docs/language/interop.rst b/docs/language/interop.rst index df34016..34d61ea 100644 --- a/docs/language/interop.rst +++ b/docs/language/interop.rst @@ -8,6 +8,12 @@ Hy <-> Python interop Despite being a Lisp, Hy aims to be fully compatible with Python. That means every Python module or package can be imported in Hy code, and vice versa. +:ref:`Mangling ` allows variable names to be spelled differently in +Hy and Python. For example, Python's ``str.format_map`` can be written +``str.format-map`` in Hy, and a Hy function named ``valid?`` would be called +``is_valid`` in Python. In Python, you can import Hy's core functions +``mangle`` and ``unmangle`` directly from the ``hy`` package. + Using Python from Hy ==================== @@ -27,41 +33,6 @@ You can use it in Hy: You can also import ``.pyc`` bytecode files, of course. -A quick note about mangling --------- - -In Python, snake_case is used by convention. Lisp dialects tend to use dashes -instead of underscores, so Hy does some magic to give you more pleasant names. - -In the same way, ``UPPERCASE_NAMES`` from Python can be used ``*with-earmuffs*`` -instead. 
- -You can use either the original names or the new ones. - -Imagine ``example.py``:: - - def function_with_a_long_name(): - print(42) - - FOO = "bar" - -Then, in Hy: - -.. code-block:: clj - - (import example) - (.function-with-a-long-name example) ; prints "42" - (.function_with_a_long_name example) ; also prints "42" - - (print (. example *foo*)) ; prints "bar" - (print (. example FOO)) ; also prints "bar" - -.. warning:: - Mangling isn’t that simple; there is more to discuss about it, yet it doesn’t - belong in this section. -.. TODO: link to mangling section, when it is done - - Using Hy from Python ==================== diff --git a/docs/language/syntax.rst b/docs/language/syntax.rst index 149265a..deed2cb 100644 --- a/docs/language/syntax.rst +++ b/docs/language/syntax.rst @@ -2,25 +2,10 @@ Syntax ============== -Hy maintains, over everything else, 100% compatibility in both directions -with Python itself. All Hy code follows a few simple rules. Memorize -this, as it's going to come in handy. +identifiers +----------- -These rules help ensure that Hy code is idiomatic and interfaceable in both -languages. - - * Symbols in earmuffs will be translated to the upper-cased version of that - string. For example, ``foo`` will become ``FOO``. - - * UTF-8 entities will be encoded using - `punycode `_ and prefixed with - ``hy_``. For instance, ``⚘`` will become ``hy_w7h``, ``♥`` will become - ``hy_g6h``, and ``i♥u`` will become ``hy_iu_t0x``. - - * Symbols that contain dashes will have them replaced with underscores. For - example, ``render-template`` will become ``render_template``. This means - that symbols with dashes will shadow their underscore equivalents, and vice - versa. +An identifier consists of a nonempty sequence of Unicode characters that are not whitespace nor any of the following: ``( ) [ ] { } ' "``. Hy first tries to parse each identifier into a numeric literal, then into a keyword if that fails, and finally into a symbol if that fails. 
numeric literals ---------------- @@ -98,6 +83,53 @@ the error ``Keyword argument :foo needs a value``. To avoid this, you can quote the keyword, as in ``(f ':foo)``, or use it as the value of another keyword argument, as in ``(f :arg :foo)``. +.. _mangling: + +symbols +------- + +Symbols are identifiers that are neither legal numeric literals nor legal +keywords. In most contexts, symbols are compiled to Python variable names. Some +example symbols are ``hello``, ``+++``, ``3fiddy``, ``$40``, ``just✈wrong``, +and ``🦑``. + +Since the rules for Hy symbols are much more permissive than the rules for +Python identifiers, Hy uses a mangling algorithm to convert its own names to +Python-legal names. The rules are: + +- Convert all hyphens (``-``) to underscores (``_``). Thus, ``foo-bar`` becomes + ``foo_bar``. +- If the name ends with ``?``, remove it and prepend ``is_``. Thus, ``tasty?`` + becomes ``is_tasty``. +- If the name still isn't Python-legal, make the following changes. A name + could be Python-illegal because it contains a character that's never legal in + a Python name, it contains a character that's illegal in that position, or + it's equal to a Python reserved word. + + - Prepend ``hyx_`` to the name. + - Replace each illegal character with ``ΔfooΔ`` (or on Python 2, ``XfooX``), + where ``foo`` is the Unicode character name in lowercase, with spaces + replaced by underscores and hyphens replaced by ``H``. Replace ``Δ`` itself + (or on Python 2, ``X``) the same way. If the character doesn't have a name, + use ``U`` followed by its code point in lowercase hexadecimal. + + Thus, ``green☘`` becomes ``hyx_greenΔshamrockΔ`` and ``if`` becomes + ``hyx_if``. + +- Finally, any added ``hyx_`` or ``is_`` is added after any leading + underscores, because leading underscores have special significance to Python. + Thus, ``_tasty?`` becomes ``_is_tasty`` instead of ``is__tasty``. 
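The rules above can be sketched in plain Python. This is an illustrative approximation written for this document, not Hy's actual implementation: the names ``mangle_sketch``, ``escape_char``, and ``DELIM`` are hypothetical, the sketch assumes Python 3 (so the delimiter is ``Δ``), and it treats a character as illegal when it cannot follow a letter in a Python identifier.

```python
import keyword
import unicodedata

# Assumption: Python 3, so the delimiter is GREEK CAPITAL LETTER DELTA.
DELIM = "\u0394"


def escape_char(ch: str) -> str:
    """Spell out one illegal character between delimiters (hypothetical helper)."""
    name = unicodedata.name(ch, "")
    if name:
        # Lowercase name, hyphens -> "H", spaces -> underscores.
        label = name.lower().replace("-", "H").replace(" ", "_")
    else:
        # Unnamed character: "U" plus the code point in lowercase hex.
        label = "U" + format(ord(ch), "x")
    return DELIM + label + DELIM


def mangle_sketch(symbol: str) -> str:
    """Approximate the mangling rules listed above; not Hy's real code."""
    # Rule 1: hyphens become underscores.
    converted = symbol.replace("-", "_")

    # Set leading underscores aside so that any prefix lands after them.
    body = converted.lstrip("_")
    leading = converted[: len(converted) - len(body)]

    # Rule 2: a trailing "?" becomes a leading "is_".
    if body.endswith("?"):
        body = "is_" + body[:-1]

    # Rule 3: if the result is still not a legal, non-reserved Python name,
    # prefix it with "hyx_" and spell out every illegal character.
    candidate = leading + body
    if not candidate.isidentifier() or keyword.iskeyword(candidate):
        body = "hyx_" + "".join(
            ch if ch != DELIM and ("a" + ch).isidentifier() else escape_char(ch)
            for ch in body
        )
    return leading + body


for example in ["foo-bar", "tasty?", "_tasty?", "if"]:
    print(example, "->", mangle_sketch(example))
# foo-bar -> foo_bar
# tasty? -> is_tasty
# _tasty? -> _is_tasty
# if -> hyx_if
```

The sketch splits off leading underscores before adding any prefix, which is one simple way to satisfy the last rule; Hy itself may order these steps differently.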
+ +Mangling isn't something you should have to think about often, but you may see +mangled names in error messages, the output of ``hy2py``, etc. A catch to be +aware of is that mangling, as well as the inverse "unmangling" operation +offered by the ``unmangle`` function, isn't one-to-one. Two different symbols +can mangle to the same string and hence compile to the same Python variable. +The chief practical consequence of this is that ``-`` and ``_`` are +interchangeable in all symbol names, so you shouldn't assign to the +one-character name ``_``, or else you'll interfere with certain uses of +subtraction. + discard prefix --------------