As someone who did a CPython Bytecode → Java bytecode translator (https://timefold.ai/blog/java-vs-python-speed), I strongly recommend against the CPython Bytecode → PySM Assembly step:

- CPython Bytecode is far from stable; it changes every version, sometimes changing the behaviour of existing bytecodes. As a result, you are pinned to a specific version of Python unless you make multiple translators.

- CPython Bytecode is poorly documented, with some descriptions being misleading/incorrect.

- CPython Bytecode requires restoring the stack on exception, since it keeps a loop iterator on the stack instead of in a local variable.

I recommend instead doing CPython AST → PySM Assembly. CPython AST is significantly more stable.
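To see the two levels side by side, here is a minimal sketch (not taken from either project) that compiles one small function and lists the AST node types and the bytecode opcode names it produces. The function `total` and the helper names `ast_node_types` and `opcode_names` are made up for illustration. Running it under several Python versions is one way to check how much each list shifts; no expected output is given because the opcode list depends on the interpreter version.

```python
import ast
import dis
import sys

# Hypothetical example function: a plain for loop with an accumulator.
SOURCE = """
def total(items):
    result = 0
    for item in items:
        result += item
    return result
"""


def ast_node_types(source):
    """Return the sorted set of AST node type names used by the source."""
    tree = ast.parse(source)
    return sorted({type(node).__name__ for node in ast.walk(tree)})


def opcode_names(source):
    """Return the opcode names CPython emits for the function `total`."""
    namespace = {}
    exec(compile(source, "<example>", "exec"), namespace)
    return [ins.opname for ins in dis.get_instructions(namespace["total"])]


if __name__ == "__main__":
    print("Python", sys.version_info[:2])
    print("AST nodes:", ast_node_types(SOURCE))
    print("Opcodes:  ", opcode_names(SOURCE))
```

The loop shows up as a single `For` node in the AST, while in the bytecode it is spread across an iterator setup, a `FOR_ITER` instruction, and one or more jumps.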



Thanks — really appreciate your insights.

You're absolutely right that CPython bytecode changes over time and isn’t perfectly documented — I’ve also had to read the CPython source directly at times because of unclear docs.

That said, I intentionally chose to target bytecode instead of the AST at this stage. Targeting the AST would actually make me more vulnerable to changes in the Python language itself (new syntax, new constructs), whereas bytecode changes are usually contained to VM-level behavior. It also made things much easier early on, because the PyXL compiler behaves more like a simple transpiler that takes known bytecode and maps it directly to PySM instructions, which made validation and iteration faster.

Either way, some adaptation will always be needed when Python evolves — but my goal is to eventually get to a point where only the compiler (the software part of PyXL) needs updates, while keeping the hardware stable.


CPython bytecode changes behaviour for no apparent reason and very suddenly, so you will be vulnerable to changes between Python versions. A few off the top of my head:

- In Python 3.10, jump arguments changed from byte offsets to instruction offsets (and in Python 3.11, absolute jumps were replaced with relative ones)

- In Python 3.11, the index of a cell variable is calculated differently depending on whether the cell corresponds to a parameter or to a local variable

- In Python 3.11, MAKE_FUNCTION has the code object at the TOS instead of the qualified name of the function
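How a jump's raw argument maps to its target depends on the interpreter version, which is easy to check with the standard `dis` module. The sketch below is an illustration, not code from either project: `count_down` and `jump_table` are hypothetical names. It prints each jump instruction's raw `arg` next to the target offset that `dis` resolves for it (`argval`). The `getattr` fallback is a defensive assumption in case `dis.hasjabs` is missing on some version.

```python
import dis
import sys


def count_down(n):
    """Hypothetical example: a while loop, which compiles to jump instructions."""
    while n > 0:
        n -= 1
    return n


def jump_table(func):
    """List (offset, opname, raw arg, resolved target) for every jump in func."""
    # hasjrel / hasjabs hold the opcodes with relative / absolute jump targets.
    jump_opcodes = set(dis.hasjrel) | set(getattr(dis, "hasjabs", ()))
    rows = []
    for ins in dis.get_instructions(func):
        if ins.opcode in jump_opcodes:
            rows.append((ins.offset, ins.opname, ins.arg, ins.argval))
    return rows


if __name__ == "__main__":
    print("Python", sys.version_info[:2])
    for offset, opname, raw_arg, target in jump_table(count_down):
        print(f"{offset:4d} {opname:28s} arg={raw_arg!s:4s} target={target}")
```

A translator that reads `arg` directly has to reimplement the version-specific rule that turns it into `target`; one that relies on `argval` pushes that work onto `dis`, but is then tied to running on the same CPython version that produced the bytecode.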

For what it's worth, I documented the detailed behaviour of each opcode (along with example Python sources) here: https://github.com/TimefoldAI/timefold-solver/blob/main/pyth... (up to Python 3.11).


This was my first thought as well. They will be stuck on a specific Python version.



