:orphan:

==================================
Python 3 compatibility/conversions
==================================

Official compatibility: Flectra 1.0 will be the first LTS release to introduce
Python 3 compatibility, starting with Python 3.5. It will also be the first
LTS release to drop official support for Python 2.

Rationale: Python 3 has been around since 2008, and all Python libraries
used by the official Flectra distribution have been ported and are considered
stable. Most supported platforms have a Python 3.5 package, or a similar
way to deploy it. Preserving dual compatibility is therefore considered
unnecessary, and would represent a significant overhead in testing for the
lifetime of Flectra 1.0.

Python 2 and Python 3 are somewhat different languages, but thanks to
backports, forward ports and cross-compatibility libraries it is possible to
use a common subset of Python 2 and Python 3 in order to have a system
compatible with both.

Here are a few useful steps or reminders to make Python 2 code compatible
with Python 3.

.. important::

    This is not a general-purpose guide for porting Python 2 to Python 3; it
    is a guide to writing 2/3-compatible Flectra code. It does not go through
    all the changes in Python but rather through issues which have been found
    in the standard Flectra distribution, in order to show how to evolve such
    code so that it works on both Python 2 and Python 3.

References/useful documents:

* `What's new in Python 3? <https://docs.python.org/3.0/whatsnew/3.0.html>`_
  covers many of the changes between Python 2 and Python 3, though it is
  missing a number of changes which `were backported to Python 2.7 <https://docs.python.org/3/whatsnew/2.7.html#python-3-1-features>`_
  as well as :ref:`some feature reintroductions <p3support>` of later Python 3
  revisions
* `How do I port to Python 3? <https://eev.ee/blog/2016/07/31/python-faq-how-do-i-port-to-python-3/>`_
* `Python-Future <http://python-future.org/index.html>`_
* `Porting Python 2 code to Python 3 <https://docs.python.org/3/howto/pyporting.html>`_
* `Porting to Python 3: A Guide <http://lucumr.pocoo.org/2010/2/11/porting-to-python-3-a-guide/>`_ (a bit outdated but useful for the extensive comments on strings and IO)

.. _p3support:

Versions Support
================

A cross-compatible Flectra would only support Python 2.7 and Python 3.5 and
above: Python 2.7 backported some Python 3 features, and Python 2 features
were reintroduced in various Python 3 releases in order to make conversion
easier. Python 3.6 adds great features (f-strings, ...) and performance
improvements (ordered compact dicts) but does not seem to reintroduce
compatibility features, whereas:

* Python 3.5 reintroduced ``%`` for bytes/bytestrings (:pep:`461`)
* Python 3.4 has no specific compatibility improvement but is the lowest
  Python 3 version for PyLint
* Python 3.3 reintroduced the "u" prefix for proper (unicode) strings
* Python 3.2 made ``range`` views more list-like (backported to 2.7) and
  reintroduced ``callable``

.. warning::

    While Python 3 adds plenty of great features (keyword-only parameters,
    generator delegation, pathlib, ...), you must *not* use them in Flectra
    until Python 2 support is dropped.

.. note::

    In the *very rare* cases where you *need* to differentiate between
    Python 2 and Python 3, use the :data:`flectra.tools.pycompat.PY2` flag.

Semantics changes
=================

Dict & set iteration order ("Hash Randomisation")
-------------------------------------------------

In Python 2, the iteration order depends on the value's hash (modulo the
collection's capacity and conflict resolution), which provides a
spec-undefined but implementation-defined order. While that's not supposed to
happen, it turns out code may depend on the specific order of iteration over
a hash collection (``dict`` or ``set``).

Python 3.3 enables `hash randomisation`_ by default (this can be optionally
enabled on previous versions including Python 2 by providing the ``-R``
command-line parameter), which means *the order of iteration changes from one
run to the next*.

When discovered, this can be fixed by one of:

* making iteration steps properly independent (removing the dependency on
  the order of iteration)
* using a different checking method (e.g. when serialising sets or
  dictionaries and checking against the specific serialised value)
* fixing dependencies
* using a ``collections.OrderedDict`` or ``flectra.tools.misc.OrderedSet``
  instead of a regular one: they guarantee that the order of iteration is the
  order of insertion
* sorting the collection's items before iterating over them (this may require
  adding some sort of iteration key to the items)

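The last two fixes in the list above can be sketched as follows. The
``partners`` mapping and its contents are made up for illustration.

```python
from collections import OrderedDict

# Hypothetical data: names keyed by an arbitrary identifier.
partners = {"c": "Carol", "a": "Alice", "b": "Bob"}

# Sorting the items gives the same order on every run and every version.
sorted_names = [name for _key, name in sorted(partners.items())]

# An OrderedDict iterates in insertion order, whatever the hashes are.
ordered = OrderedDict()
for key in ("b", "c", "a"):
    ordered[key] = partners[key]
ordered_keys = list(ordered)
```

Sorting is usually the smaller change when the order only matters at the
point of iteration; an ordered collection fits better when the insertion
order itself carries meaning.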
Moved and removed
=================

Standard Library Modules
------------------------

Python 3 reorganised, moved or removed a number of modules in the standard
library:

* ``StringIO`` and ``cStringIO`` were removed, you can use ``io.BytesIO`` and
  ``io.StringIO`` to replace them in a cross-version manner (``io.BytesIO``
  for binary data, ``io.StringIO`` for text/unicode data).
* ``urllib``, ``urllib2`` and ``urlparse`` were redistributed across
  ``urllib.parse`` and ``urllib.request``.

  Since `requests`_ and `werkzeug`_ are already hard dependencies of Flectra,
  replace uses of ``urllib[2].urlopen``/``urllib2.Request`` with `requests`_;
  ``urlparse`` and a few utility functions (``urllib.quote``,
  ``urllib.urlencode``) are available through ``werkzeug.urls``, a backport
  of Python 3's ``urllib.parse``.

  .. warning:: `requests`_ does not raise by default on non-200 responses

* ``cgi.escape`` (HTML escaping) is deprecated in Python 3, prefer Flectra's
  own :func:`flectra.tools.misc.html_encode`.
* Most of ``types``'s content has been stripped out in Python 3: only
  "internal" interpreter types (e.g. CodeType, FrameType, ...) have been left
  in, other types can be obtained directly from the corresponding builtin or
  by getting the ``type()`` of a literal value.

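As a minimal sketch of the ``StringIO``/``cStringIO`` replacement described
above, the following uses only the standard ``io`` module. The buffer
contents are arbitrary sample data.

```python
import io

# Binary data goes through io.BytesIO on both Python 2 and Python 3.
binary_buffer = io.BytesIO()
binary_buffer.write(b"\x89PNG")
binary_content = binary_buffer.getvalue()

# Text data goes through io.StringIO, which only accepts text strings,
# hence the explicit "u" prefix for Python 2.
text_buffer = io.StringIO()
text_buffer.write(u"caf\xe9")
text_content = text_buffer.getvalue()
```

Writing a bytestring to ``io.StringIO`` (or text to ``io.BytesIO``) raises a
``TypeError``, which helps surface bytes/text confusion early.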
Absolute Imports (:pep:`328`)
-----------------------------

.. important::

    In Python 3, ``import foo`` can only import from a "top-level" library
    (absolute path). If trying to import a sibling or sub-module you *must*
    use an explicitly *relative import* e.g. ``from . import foo`` or
    ``from .foo import bar``.

In Python 2 ``import`` statements are ambiguous: if a file ``a.py`` contains
``import b``, the import system will first check if there's a ``b.py`` file
next to it before checking if there is a package called that on the
PYTHONPATH.

Furthermore, if a sibling file is named the same as a top-level package, the
library becomes inaccessible to both the file itself and its siblings. This
has actually happened in Flectra with :mod:`flectra.tools.mimetypes`.

Additionally, relative imports allow navigating "up" the tree by using
multiple leading ``.``.

.. note::

    Explicitly relative imports are always available in Python 2, and should
    be used everywhere.

You can ensure you are not using any implicitly relative import by adding
``from __future__ import absolute_import`` at the top of your files, or by
running the ``relative-import`` PyLint check.

Exception Handlers
------------------

.. important::

    All exception handlers must be converted to ``except ... as ...``. Valid
    forms are::

        except Exception:
        except (Exception1, ...):
        except Exception as name:
        except (Exception1, ...) as name:

In Python 2, ``except`` statements are of the form::

    except Exception[, name]:

or::

    except (Exception1, Exception2)[, name]:

But because the name is optional, this gets confusing and people can stumble
into the first form when trying for the second and write::

    except Exception1, Exception:

which will *not* yield the expected result.

Python 3 changes this syntax to::

    except Exception[ as name]:

or::

    except (Exception1, Exception2)[ as name]:

This form was implemented in Python 2.5 and is thus compatible across the
board.

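A short sketch of the cross-version form, catching two exception types and
binding the instance to a name. The ``parse_quantity`` helper is hypothetical
and only serves the example.

```python
def parse_quantity(value):
    """Return ``(number, None)`` or ``(None, error name)``."""
    try:
        return int(value), None
    except (TypeError, ValueError) as error:
        # The tuple lists the types to catch; ``error`` is the instance.
        return None, type(error).__name__


ok = parse_quantity("12")
bad_text = parse_quantity("abc")
bad_type = parse_quantity(None)
```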
Operators & keywords
--------------------

.. important:: The backtick operator ```foo``` must be converted to an
               explicit call to the ``repr()`` builtin.

.. important:: The ``<>`` operator must be replaced by ``!=``

These two operators were long recommended against/deprecated in Python 2,
and Python 3 removed them from the language.

.. _changed-exec:

.. important:: ``exec`` is now a builtin

In Python 2, ``exec`` is a statement/keyword. Much like ``print``, it's been
converted to a builtin function in Python 3. However because the Python 2
version can take a tuple parameter it is easy to convert the odd ``exec``
statement to the following cross-language forms::

    exec(source)
    exec(source, globals)
    exec(source, globals, locals)

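A minimal sketch of the two-argument form, assuming the executed source only
needs the names passed in explicitly. The source string and the ``namespace``
dict are illustrative.

```python
source = "result = base * 2"

# Explicit namespace: the executed code reads and writes this dict only.
namespace = {"base": 21}
exec(source, namespace)
result = namespace["result"]
```

Passing an explicit dict also avoids depending on how ``exec`` interacts with
the caller's local variables, which differs between the two versions.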
List/iteration builtins and methods
-----------------------------------

In Python 3, a number of builtins and methods formerly returning *lists* were
converted to return *iterators* or *views*, with the corresponding redundant
methods or functions having been *removed entirely*:

* In Python 3, ``map``, ``filter`` and ``zip`` return iterators,
  ``itertools.imap``, ``itertools.ifilter`` and ``itertools.izip`` have been
  removed.

  .. important::

      When possible, use comprehensions (list, generator, ...) rather than
      ``map`` or ``filter``.

* In Python 3, ``dict.keys``, ``dict.values`` and ``dict.items`` return
  *views* rather than lists, and the ``iter*`` and ``view*`` methods have
  been removed.

  .. important::

      When the result of the above methods is used for more than a one-shot
      loop (e.g. to be included in returned value), or when the dict needs
      to be modified during iteration, wrap the calls in a ``list()``.

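Both recommendations above can be illustrated with a small example. The
``stock`` dict is sample data.

```python
stock = {"apple": 3, "pear": 0, "plum": 5}

# A generator expression instead of filter(); sorted() returns a list on
# both versions.
in_stock = sorted(name for name, qty in stock.items() if qty > 0)

# Removing keys while iterating requires a snapshot of the keys, since
# iterating a Python 3 view raises a RuntimeError when the dict changes size.
for name in list(stock.keys()):
    if stock[name] == 0:
        del stock[name]
```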
builtins
--------

``cmp``
#######

The ``cmp`` builtin function has been removed from Python 3.

* Most of its uses are in ``cmp=`` parameters to sort functions where it can
  usually be replaced by a key function.
* Other uses found were obtaining the sign of an item (``cmp(item, 0)``). This
  can be replicated using the standard library's ``math.copysign``, e.g.
  ``math.copysign(1, item)`` will return ``1.0`` if ``item`` is positive and
  ``-1.0`` if ``item`` is negative. Unlike ``cmp(item, 0)``, it never returns
  ``0``: ``math.copysign(1, 0)`` is ``1.0``.

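The two replacements can be sketched as below. The ``lines`` records are
sample data, and ``sign`` is a hypothetical helper for the cases where a
result of ``0`` must be preserved.

```python
lines = [
    {"name": "b", "seq": 2},
    {"name": "a", "seq": 2},
    {"name": "c", "seq": 1},
]

# Python 2 only: lines.sort(cmp=...) with a two-argument comparison.
# Cross-version: a key function returning a tuple sorts on several criteria.
ordered = sorted(lines, key=lambda line: (line["seq"], line["name"]))
names = [line["name"] for line in ordered]


def sign(number):
    """Return -1, 0 or 1, like ``cmp(number, 0)`` did."""
    return (number > 0) - (number < 0)
```

When a comparison function is too involved to rewrite as a key,
``functools.cmp_to_key`` (available since Python 2.7 and 3.2) can wrap it.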
``execfile``
############

``execfile(path)`` has been removed completely from Python 3 but it is
trivially replaceable in all cases by::

    exec(open(path, 'rb').read())

or a variant thereof (see :ref:`exec changes <changed-exec>` for details).

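One possible variant, as a sketch: it closes the file deterministically and
goes through ``compile()`` so that tracebacks still mention the file name.
The temporary script exists only to keep the example self-contained.

```python
import os
import tempfile

# Write a throwaway script so the example is self-contained.
handle, path = tempfile.mkstemp(suffix=".py")
with os.fdopen(handle, "wb") as script:
    script.write(b"answer = 6 * 7\n")

namespace = {}
with open(path, "rb") as script:
    # compile() attaches the file name to the code object
    exec(compile(script.read(), path, "exec"), namespace)
os.remove(path)
answer = namespace["answer"]
```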
``file``
########

The ``file`` builtin has been removed in Python 3. Generally, it can just
be replaced by the ``open`` builtin, although you may want to use ``io.open``
which is more flexible and better handles the binary/text dichotomy,
:ref:`a big issue in cross-version Python <changed-strings>`.

.. note::

    In Python 3, the ``open`` builtin is actually an alias for ``io.open``.

``long``
########

In Python 2, integers can be either ``int`` or ``long``. Python 3 unifies this
under the single ``int`` type.

.. important::

    * the ``L`` suffix for integer literals must be removed
    * calls to ``long`` must be replaced by calls to ``int``
    * ``(int, long)`` for type-checking purposes must be replaced by
      :py:data:`flectra.tools.pycompat.integer_types`

* the ``L`` suffix on numbers is unsupported in Python 3, and unnecessary in
  Python 2 as "overflowing" integer literals will implicitly instantiate
  ``long``.
* in Python 2, a call to ``int()`` will implicitly create a ``long`` object if
  necessary.
* type-testing is the last and biggest issue as in Python 2 ``long`` is not a
  subtype of ``int`` (nor the reverse), and ``isinstance(value, (int, long))``
  is thus generally necessary to catch all integrals.

For that case, Flectra now provides a compatibility module with an
:py:data:`~flectra.tools.pycompat.integer_types` definition which can be used
for type-testing.

It is a tuple of types so when used with ``isinstance`` it can be provided
directly or inside another tuple alongside other types e.g.
``isinstance(value, (BaseModel, integer_types))``.

However when used with ``type`` directly (which should be avoided) you
should use the ``in`` operator, and if you need other types you need to
concatenate ``integer_types`` to another tuple.

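The three usages can be sketched with a local stand-in for ``integer_types``.
The definition below is an assumption about what the compatibility module
provides, not a copy of it, and the helper names are hypothetical.

```python
import sys

# Stand-in for flectra.tools.pycompat.integer_types, assumed here to be
# (int, long) on Python 2 and (int,) on Python 3.
if sys.version_info[0] == 2:
    integer_types = (int, long)  # noqa: F821 (only evaluated on Python 2)
else:
    integer_types = (int,)


def is_integer(value):
    # the tuple can be passed to isinstance directly
    return isinstance(value, integer_types)


def is_number(value):
    # ... or nested inside another tuple alongside other types
    return isinstance(value, (float, integer_types))


def is_exact_number(value):
    # with type(), use ``in`` and concatenate the tuples
    return type(value) in integer_types + (float,)
```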
``reduce``
##########

In Python 3, ``reduce`` has been demoted from builtin to ``functools.reduce``.
However this is because most uses of ``reduce`` can be replaced by ``sum``,
``all``, ``any`` or a list comprehension for a more readable and faster
result.

Adding ``from functools import reduce`` to the file is easy enough and is
compatible with Python 2.6 and later, but consider whether you get better
code by replacing it with some other method altogether.

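For illustration, here are two ``reduce`` calls next to their more readable
equivalents. The ``amounts`` list is sample data.

```python
from functools import reduce  # needed on Python 3, harmless on 2.6+

amounts = [10, 20, 12]

total_reduce = reduce(lambda acc, amount: acc + amount, amounts, 0)
total_sum = sum(amounts)  # same result, easier to read

any_large_reduce = reduce(
    lambda acc, amount: acc or amount > 15, amounts, False)
any_large = any(amount > 15 for amount in amounts)
```

Note that ``any`` also stops at the first match, whereas ``reduce`` always
walks the whole sequence.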
``xrange``
##########

In Python 3, ``range()`` behaves the same as Python 2's ``xrange``.

For cross-version code, you can just use ``range()`` everywhere: while this
will incur a slight allocation cost on Python 2, Python 3's ``range`` supports
the entire Sequence protocol and thus behaves very much like a regular
list or tuple.

Removed/renamed methods
-----------------------

.. important::

    * the ``has_key`` method on dicts must be replaced by use of the ``in``
      operator e.g. ``foo.has_key(bar)`` becomes ``bar in foo``.

``in`` for dicts was introduced in Python 2.3, leading to ``has_key`` being
redundant, and removed in Python 3.

Minor syntax changes
--------------------

* the ability to unpack a parameter (in the parameter declaration list) has
  been removed in Python 3 e.g.::

      def foo((bar, baz), qux):
          …

  is now invalid

* octal literals must be prefixed by ``0o`` (or ``0O``). Following the C
  family, in Python 2 an octal literal simply has a leading 0, which can be
  confusing and easy to get wrong when e.g. padding for readability (e.g.
  ``0013`` would be the decimal 11 rather than 13).

  In Python 3, leading zeroes followed by neither a 0 nor a period are an
  error; octal literals now follow the hexadecimal convention with a ``0o``
  prefix.

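Both changes have straightforward cross-version spellings, sketched below.
The ``describe`` function and its arguments are hypothetical.

```python
def describe(pair, separator):
    # Python 2 only: def describe((key, value), separator)
    key, value = pair  # unpack inside the body instead
    return "%s%s%s" % (key, separator, value)


described = describe(("lang", "en_US"), "=")

# 0o13 is octal for decimal 11; the Python 2 spelling 013 is a syntax
# error in Python 3.
octal_value = 0o13
permissions = 0o644
```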
.. _changed-strings:

Bytes/String/Text: The Big One
==============================

The most impactful Python 3 change by far is to the text model: for historical
reasons the distinction between Python 2's bytestrings (``bytes``/``str``) and
text strings (``unicode``) is fuzzy, and Python 2 will try to implicitly
convert between one and the other using the ASCII encoding.

Python 3 changes this: it removes the implicit conversions, removes APIs which
contribute to the fuzz and tends to strictly segregate the others to work on
either bytes or text.

This is fundamentally good and mostly sensible, but it means lots of breakage:

the builtins
------------

Python 3 removes both ``unicode`` and ``basestring``, and ``str`` now
corresponds to *text* strings (the old ``unicode``) with ``bytes`` being
bytestrings in both languages [#bytes]_.

Both versions have the following prefixes for string literals:

* ``b'foo'`` is a bytestring (``bytes`` object).

* ``'foo'`` is that version's ``str`` type, which may be either a bytestring
  or a text string [#native-string]_.

* ``u'foo'`` is that version's text string.

For best cross-version compatibility you should avoid unprefixed string
literals unless you *specifically* need a "native string" [#native-string]_.

For easier type-testing, :mod:`flectra.tools.pycompat` provides the following
constants:

* :data:`~flectra.tools.pycompat.string_types` is an alias/type tuple for testing
  string types, essentially a replacement of testing for ``basestring`` or
  ``(str, unicode)``.
* :data:`~flectra.tools.pycompat.text_type` is the proper *text* type for the
  current version, it should mostly be used for converting non-bytes objects
  to text.
* ``bytes`` should be avoided for type conversions, though it can be used to
  check if an object is a bytestring.

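A sketch of how these constants are typically used. The definitions below are
local stand-ins based on an assumption about what
``flectra.tools.pycompat`` exposes, and ``to_text`` is a hypothetical helper
rather than part of Flectra.

```python
import sys

# Stand-ins for the pycompat constants, assumed to map to these builtins.
if sys.version_info[0] == 2:
    text_type = unicode  # noqa: F821 (only evaluated on Python 2)
    string_types = (basestring,)  # noqa: F821
else:
    text_type = str
    string_types = (str,)


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return ``value`` as a text string."""
    if isinstance(value, bytes):
        # bytes is only used for the type check, then explicitly decoded
        return value.decode(encoding)
    return text_type(value)
```

Decoding with an explicit encoding, rather than calling ``text_type`` on a
bytestring, avoids relying on Python 2's implicit ASCII conversion.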
``open``
--------

.. important::

    the ``open`` builtin should always be explicitly used in binary mode
    (``rb``, ``wb``, ...)

    To read *text* files, use ``io.open``.

On both Python 2 and Python 3, ``open`` defaults to returning *native
strings* in default ("text") mode. However, in Python 3 that means it actually
decodes the file's bytes using whatever encoding was set up (the locale's
preferred encoding, often UTF-8), while in Python 2 it has no concept of
encoding.

Using ``open`` in binary mode provides bytestrings on both versions and works
fine. To read *text* files, use ``io.open`` and provide an explicit encoding.

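A small round trip showing both access modes. The temporary file and its
content are made up for the example.

```python
import io
import os
import tempfile

handle, path = tempfile.mkstemp(suffix=".txt")
os.close(handle)

# Text mode through io.open with an explicit encoding: text in, text out.
with io.open(path, "w", encoding="utf-8") as text_file:
    text_file.write(u"caf\xe9")

# Binary mode through the builtin: bytes on both versions.
with open(path, "rb") as binary_file:
    raw = binary_file.read()

with io.open(path, "r", encoding="utf-8") as text_file:
    text = text_file.read()
os.remove(path)
```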
base64
------

base64 is a bytes->bytes conversion. bytes->bytes codecs were removed from the
"native" encoding/decoding system which is now exclusively for bytes<->text
conversions: text is *encoded* to bytes and bytes are *decoded* to text.

.. important::

    both ``bytes.encode('base64')`` and ``bytes.decode('base64')`` must be
    migrated to using ``base64.b64encode`` and ``base64.b64decode``
    respectively.

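The migrated calls look like this, with an arbitrary sample payload.

```python
import base64

payload = b"hello world"

encoded = base64.b64encode(payload)  # bytes in, bytes out
decoded = base64.b64decode(encoded)

# The result is still a bytestring: decode it when text is needed.
encoded_text = encoded.decode("ascii")
```

Unlike the Python 2 ``base64`` codec, ``base64.b64encode`` does not insert
line breaks into its output, which may matter when comparing against
previously stored values.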
csv
---

``csv`` is a fairly vicious one: not only is it not a very good format, the
Python 2 and Python 3 versions of the library are text-model incompatible in
significant ways:

* Python 2's CSV only works on *ascii-compatible byte streams* (it has no
  encoding support at all) and extracts bytestring values
* Python 3's CSV only works on *text streams* and extracts text values
* And ``io`` doesn't provide "native string" streaming facilities.

However with respect to Flectra it turns out most or all uses of ``csv`` fit
inside a model of *byte stream to and from text values*.

This model is thus implemented by the cross-version wrappers
:func:`flectra.tools.pycompat.csv_reader` and
:func:`flectra.tools.pycompat.csv_writer`: they take a *UTF-8 byte stream* and
read or write *text* values.

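To illustrate the *byte stream to text values* model, here is a sketch of the
reading side for Python 3 only. It is not Flectra's implementation:
``read_rows`` is a hypothetical name, and a real cross-version wrapper would
need a separate Python 2 branch that decodes each value after parsing.

```python
import csv
import io


def read_rows(byte_stream, **params):
    """Python 3 sketch: yield rows of text from a UTF-8 byte stream."""
    # newline="" lets the csv module handle line endings itself
    text_stream = io.TextIOWrapper(
        byte_stream, encoding="utf-8", newline="")
    for row in csv.reader(text_stream, **params):
        yield row


stream = io.BytesIO(u"name,city\nZo\xeb,Li\xe8ge\n".encode("utf-8"))
rows = list(read_rows(stream))
```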
.. _hash randomisation: http://bugs.python.org/issue13703

.. _requests: http://docs.python-requests.org/

.. _werkzeug: http://werkzeug.pocoo.org/docs/urls/

.. [#bytes]

    with the caveat that Python 3 makes them less text-y and more byte-y e.g.
    in Python 2 ``b"foo"[0]`` is ``b"f"``, but in Python 3 it's ``102`` (the
    value of the first byte), you'll want to *slice* bytestrings for
    compatibility.

.. [#native-string]

    this is important because some API/contexts take a *native string* rather
    than either bytes or text. The ``csv`` module of the standard library is
    one such problematic API (it is also notoriously problematic for its
    terrible support of non-ascii-compatible encodings in Python 2).
    ``email.message_from_string`` is another one.