Bicycle Repair Man - Rewrite Python Sources

BRM is a Python source modification library that performs lossless modifications with a full round-trip guarantee. It is aimed at unstructured source parts, where the modification can be done directly on the tokens.

A simple example is a TokenTransformer that changes each + (plus) operator to a - (minus) operator:

from brm import TokenTransformer

class DestroyAllOfThem(TokenTransformer):

    # Replace each PLUS token with a MINUS
    def visit_plus(self, token):
        return token._replace(string="-")

transformer = DestroyAllOfThem()
assert transformer.transform("(2p) + 2 # with my precious comment") == "(2p) - 2 # with my precious comment"
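
The round-trip guarantee also means that a transformer which touches nothing acts as the identity function. A tiny illustration of that (assuming a bare TokenTransformer with no visitors applies no changes):

from brm import TokenTransformer

# No visitors registered, so nothing is rewritten; comments and odd
# spacing survive untouched.
source = "if x :\n    pass  # keep   me\n"
assert TokenTransformer().transform(source) == source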

One advantage of token-based refactoring over any structured tree representation is that you are much more liberal in what you can do. Do you want to prototype a new syntax idea, for example a √ (square root) operator? Here you go:

from brm import TokenTransformer, pattern

class SquareRoot(TokenTransformer):

    # Register a new token called `squareroot`
    def register_squareroot(self):
        return "√"

    # Match a squareroot followed by a number
    @pattern("squareroot", "number")
    def replace_squareroot(self, operator, token):
        return self.quick_tokenize(f"int({token.string} ** 0.5)")

sqr = SquareRoot()
assert eval(sqr.transform("√9")) == 3

Why BRM

  • BRM is an extremely simple, dependency-free, pure-Python tool of about 500 LoC that you can easily vendor.
  • BRM supports new Python syntax out of the box; there is no need to wait for changes upstream.
  • BRM supports incomplete files (and files that contain invalid Python syntax), as sketched below.
  • BRM supports introducing new syntax and making it permanent for prototypes.
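
Since BRM never builds a tree, even a half-written file can be transformed. A minimal sketch of that, reusing the DestroyAllOfThem transformer from above (the broken snippet is illustrative; it only needs to be tokenizable, not parseable):

from brm import TokenTransformer

class DestroyAllOfThem(TokenTransformer):

    # Replace each PLUS token with a MINUS
    def visit_plus(self, token):
        return token._replace(string="-")

# `if x ==` is incomplete and `2p` is not valid Python, yet the
# token-level replacement still goes through.
broken = "if x ==\n    2p + 2q\n"
print(DestroyAllOfThem().transform(broken))
# if x ==
#     2p - 2q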

If you need any of these, BRM might be the right fit. But I would warn against using it for complex refactoring tasks, since that is not a problem we intend to tackle. If you need such a tool, take a look at refactor or parso.

Permanency

If you love the concept of transformers and use them in real-world code, BRM exposes a custom encoding that runs your transformers automatically when specified.

  • Write a transformer
  • Copy it to the ~/.brm folder, or simply use cp <file>.py $(python -m brm)
  • Specify # coding: brm at the top of each file

Example:

from brm import TokenTransformer, pattern

class AlwaysTrue(TokenTransformer):

    STRICT = False

    # Make every if/elif statement `True`
    @pattern("name", "*any", "colon")
    def always_true_if(self, *tokens):
        statement, *_, colon = tokens
        if statement.string not in {"if", "elif"}:
            return
        true, = self.quick_tokenize("True")
        return (statement, true, colon)
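
Before installing it as an encoding, you can exercise the transformer directly (continuing from the AlwaysTrue definition above; the exact output whitespace depends on how BRM re-aligns the shortened condition):

print(AlwaysTrue().transform("if a > 2:\n    print('LOL')\n"))
# Expected, roughly:
# if True:
#     print('LOL')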

Let's put our transformer into BRM's transformer folder and run our example.

(.venv) [  9:12ÖS ]  [ isidentical@x200:~ ]
 $ cat -n r.py
     1  # coding: brm
     2
     3  a = 2
     4  if a > 2:
     5      print("LOL")
(.venv) [  9:12ÖS ]  [ isidentical@x200:~ ]
 $ cp test.py $(python -m brm)
(.venv) [  9:12ÖS ]  [ isidentical@x200:~ ]
 $ python r.py
LOL

TA-DA!

BRM Pattern Syntax

For BRM, a Python source file is just a sequence of tokens. It doesn't create any relationships between them, or even verify that the file is syntactically correct. For example, take a look at the following file:

if a == x:
    2 + 2 # lol

For BRM, in an abstract fashion, the file is just the following token sequence:

NAME NAME EQEQUAL NAME COLON NEWLINE INDENT NUMBER PLUS NUMBER COMMENT NEWLINE DEDENT ENDMARKER
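
You can reproduce this view with nothing but the standard library (this is plain CPython tokenization, no BRM involved):

import io
import tokenize

source = "if a == x:\n    2 + 2 # lol\n"

# exact_type distinguishes operator tokens (EQEQUAL, COLON, PLUS, ...)
# instead of lumping them all together as OP.
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tokenize.tok_name[tok.exact_type], end=" ")
# NAME NAME EQEQUAL NAME COLON NEWLINE INDENT NUMBER PLUS NUMBER COMMENT NEWLINE DEDENT ENDMARKER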

And internally it is processed like this:

[animated GIF: BRM pattern matching over the token stream]

If you want to match the binary plus operation here (2 + 2), you can create a pattern with number, plus, number.
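
As a sketch of how such a pattern could look in practice (FoldAddition and its constant-folding behaviour are illustrative, not a BRM built-in; the handler receives the matched tokens positionally, as in the SquareRoot example above, and the output spacing is approximate):

from brm import TokenTransformer, pattern

class FoldAddition(TokenTransformer):

    # Match `<number> + <number>` and replace all three tokens with a
    # single literal holding the computed sum.
    @pattern("number", "plus", "number")
    def fold_addition(self, left, plus, right):
        total = int(left.string) + int(right.string)
        return self.quick_tokenize(str(total))

print(FoldAddition().transform("2 + 2 # lol"))  # 4 # lol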

Note: If you want to visualize your patterns and see what they match, give examples/visualize.py a shot.

Extras

If you are using the TokenTransformer, there are a few handy functions that you might check out:

quick_tokenize(source: str, *, strip: bool = True) -> List[TokenInfo]
    Break the given source text into a list of tokens. If strip is true,
    the last two tokens (NEWLINE, EOF) are omitted.

quick_untokenize(tokens: List[TokenInfo]) -> str
    Convert the given sequence of tokens back to a representation that
    would yield the same tokens when tokenized again (a lossy conversion).
    If you want a full round-trip / lossless conversion, use
    tokenize.untokenize.

directional_length(tokens: List[TokenInfo]) -> int
    Calculate the linear distance between the first and the last token of
    the sequence.

shift_all(tokens: List[TokenInfo], x_offset: int, y_offset: int) -> List[TokenInfo]
    Shift each token in the given sequence by x_offset in the column
    offsets and by y_offset in the line numbers, and return the new list
    of tokens.

until(toktype: int, stream: List[TokenInfo]) -> Iterator[TokenInfo]
    Yield tokens until a token of toktype is seen. If no such token is
    found, raise a ValueError.

_get_type(token: TokenInfo) -> int
    Return the type of the given token. Useful with until(). (internal)
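
A quick tour of a few of these helpers (a sketch; it assumes they are exposed as methods on a plain TokenTransformer instance, the way quick_tokenize is used in the examples above):

from brm import TokenTransformer

t = TokenTransformer()

# strip=True (the default) drops the trailing NEWLINE/EOF pair, leaving
# exactly three tokens: NUMBER, PLUS, NUMBER.
tokens = t.quick_tokenize("2 + 2")

# Lossy rendering back to text: the token strings survive, the original
# positions do not.
print(t.quick_untokenize(tokens))

# Shift every token four columns to the right, e.g. to make room for an
# inserted prefix on the same line.
shifted = t.shift_all(tokens, x_offset=4, y_offset=0)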
