As someone who does data science, I roll my eyes every time I have to touch Python. It’s ubiquitous, but it actually sucks once you get used to better languages.
It is somewhat circular: Python was preferred because the earlier alternatives were Java or C(++), both of which had their shortcomings. SKLearn is still one of the most feature-complete and powerful ML libraries, and it was Python-only, so it drew a crowd.
I'd confidently bet that a lot of the people who write data science code would prefer Julia if they'd been taught it first.
There are lots of little things that add up. Expressing multiline closures in Python is clunky compared to almost any other dynamic language, whether Ruby, Lua, or Julia.
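To make that concrete (a toy sketch, with made-up names): Python's `lambda` is limited to a single expression, so any multiline closure has to be hoisted out into a named `def` and passed by name, where Ruby blocks or Julia `do`-blocks would sit inline.

```python
# A lambda can only hold one expression, so this multiline "closure"
# must become a separate named function instead of an inline block.
def score(record):
    base = record["value"] * 2
    if record.get("bonus"):
        base += 10
    return base

data = [{"value": 3}, {"value": 1, "bonus": True}]
# can't write the body of `score` inline at the call site
ranked = sorted(data, key=score)
print(ranked[0]["value"])
```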
asyncio is quite complex in how it works. Julia uses the same patterns for concurrency, but they're so much easier to grasp and work with.
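As a rough sketch of the ceremony involved (names here are invented for illustration): in Python everything async has to be marked with `async`/`await`, split off from ordinary functions, and funnelled through an event-loop entry point.

```python
import asyncio

async def fetch(name, delay):
    # stands in for an I/O-bound operation (network call, disk read, ...)
    await asyncio.sleep(delay)
    return name

async def main():
    # run both coroutines concurrently; gather preserves argument order
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))

# a regular function cannot await, so all async code must be entered
# through an event loop, e.g. asyncio.run
print(asyncio.run(main()))
```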
Multiple dispatch as used in Julia makes API design so much cleaner, and you can see that almost everywhere. I make a comparison of REST calls in Python and Julia here; Julia is much cleaner IMHO: https://erik-engheim.medium.com/explore-rest-apis-with-curl-...
The problem in Python is that you cannot easily reuse the same function name for different types. Instead of creating one abstraction across many different types, you have to invent all these different names, which can be hard to guess. In Julia there are often far fewer core concepts to learn, and they can be re-applied in far more ways.
Calling shell programs is done more elegantly in Julia, and the same goes for calling C functions. String interpolation is more obvious. There are not like 4 different ways of doing it.
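For context, here's the Python side of that comparison (a minimal sketch): shelling out goes through the `subprocess` module rather than any dedicated syntax, and f-strings are only the newest of several coexisting interpolation styles.

```python
import subprocess

name = "world"
# f-strings are the modern style, but %-formatting, str.format, and
# string.Template all still exist alongside them
greeting = f"hello {name}"

# no backtick/command-literal syntax: build an argv list and run it
result = subprocess.run(["echo", greeting], capture_output=True, text=True)
print(result.stdout.strip())
```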
Package management and environment management are much simpler and more elegantly done.
I agree some of this may seem unfair, since Python carries baggage from being an older language. But age also counts in its favor, with a wider selection of libraries; both should be taken into account when evaluating your choices.
> The problem in Python is that you cannot easily reuse the same function name for different types. Instead of creating one abstraction across many different types, you have to invent all these different names, which can be hard to guess. In Julia there are often far fewer core concepts to learn, and they can be re-applied in far more ways.
At least there's `functools.singledispatch` in the standard library, and there are apparently also multiple-dispatch libraries. I've never used either; duck typing with some try/except (and some isinstance) has served me well so far, but I agree it's not as clean.
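For the record, a minimal `singledispatch` sketch (note it dispatches on the type of the first argument only, which is the "single" limitation compared to Julia's multiple dispatch):

```python
from functools import singledispatch

@singledispatch
def describe(x):
    # fallback for any type without a registered implementation
    return "something"

@describe.register
def _(x: int):
    # chosen when the first argument is an int
    return "an int"

@describe.register
def _(x: list):
    # chosen when the first argument is a list
    return "a list"

print(describe(3), describe([1, 2]), describe("hi"))
```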
> There are not like 4 different ways of doing it.
Yet.
> Package management and environment management are much simpler and more elegantly done.
Yeah, it's a nightmare in Python. After setting up projects dozens of times now I still don't grok it.
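For what it's worth, the stdlib-only baseline I keep coming back to; this is just one of several competing workflows (pip+venv, pipenv, poetry, conda, ...), which is part of the problem:

```shell
# create an isolated, per-project environment using only the stdlib
python3 -m venv .venv

# activate it so `python` and `pip` resolve inside the environment
. .venv/bin/activate

# anything installed from here on stays local to .venv, not system-wide;
# printing sys.prefix confirms we're inside the environment
python -c "import sys; print(sys.prefix)"
```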
Not the OP, but for me it's the community's approach to performance: rewriting code in C and still calling it Python (?), instead of being more supportive of ongoing JIT endeavours.
Yes, Python is very dynamic, but no more so than Smalltalk, SELF, or Common Lisp, all of which have quite good JIT engines.
That's my opinion, and coming from you (from what I gather, you're seasoned in many areas of the computing field) it's not surprising. But the current mainstream is not really aware of all this; there's a bit of a Python cult due to SciPy et al.
There is another thing: most of the major libraries where Python is used as a DSL, written in a mix of C, C++, and Fortran, can be used from other languages just as well. There's nothing special about Python there, other than a lack of awareness of what everyone else is doing.
In the ML/DL world, or in physical simulators, people just compose a task, throw it at a CPU/GPU/TPU or a cluster of these, and let it run for a long time. I don't see how Julia will be different for these kinds of tasks. I understand that Julia solves the two-language problem, with all the goodies that multiple dispatch brings, but the Python ecosystem has progressed a lot in the past 2 years: now you have the Numba JIT, JAX JIT, PyTorch JIT, XLA JIT, and many other proprietary JITs that are not open-sourced. Since JAX (as an example) is mostly NumPy and Python, you can leverage your existing knowledge instead of having to learn a fundamentally new paradigm. I would say that Python has many "specialised" JIT engines, and it seems to work great for the community.

Don't get me wrong, Julia is interesting, I can't deny that, but I expect a huge adoption period for it. It can find its niche, as C++ did for extremely high-performance computing or Scala for Big Data (though Java is starting to replace many of its use cases). If you ask me now, I would say that the world converges around Java, C++, and Python when it comes to data, the old trio, and it will remain that way for at least another decade.
> but the Python ecosystem progressed a lot in the past 2 years, now you have Numba JIT, Jax JIT, PyTorch JIT, XLA JIT and many other proprietary JITs that are not open-sourced.
Python has a bunch of use-case-specific, non-interoperable, limited JITs that you have to learn separately.
Not the case with Julia: you write arbitrary code and it gets optimised; so much simpler. The Julia community has done some cool things with Flux, where complex, field-specific equations were dropped wholesale into neural-network definitions without having to rewrite anything. That sort of power is invaluable.
Not OP, but I absolutely prefer R, the Scheme inspiration is obvious and allows for flexibility completely impossible in Python (here's to PEP 638, but there's a ton of hostility to it from what I can tell).
I also really really really (really) like Julia, but don't quite think it's there yet. I'm optimistic though, these things take time.
It's growing in a sprawling, disorganized fashion. It's developing generics (with horrid ambiguous syntax) for semantic typing that's not generally used outside some applications. The walrus operator is obscene.
This is a very recent trend. I think it's a global wave; most languages have accelerated their pace in the last decade. ES6, PHP, heck... even Java is moving at Chrome speed now. It can be a bit messy, yeah.
Rust is hands down my favourite language, and it's what I write my personal projects in these days; Julia is second. Rust's philosophy of "the complexity doesn't go away, so be aware of it and pay it up front", along with the type system and compiler, feels aaaamazing once you grasp it: writing some code, knowing exactly how and where the failure points are, and having it compile knowing that it will be correct is hard to go back from.
I used to be a huge Python fan, and I’d dug through docs and guides for pretty much everything I could, so my frustrations aren’t “outsider criticisms” as such.
My biggest issue is the amount of "magic" that goes on and is actively encouraged; the language happily lets you get away with anything, and I've read and had to fix too much horrible Python code that technically does what's required, but in the most torturous and difficult-to-untangle manner possible. I've come to resonate with the idea of "what does the language/tooling encourage you to do?", and in my experience Python doesn't encourage a lot of good things. It encourages you to hack around problems rather than fixing them at the root, to lean on magic wherever possible (never backed by any kind of correctness guarantee), and to let the programmer do whatever they want, regardless of how bad or non-idiomatic it is.

The "type system" leaves a lot to be desired. There are optional type hints now, but the larger community seems ambivalent at best; uptake is glacial in my observations, and mypy is just sort of OK. The performance is pathetic, there's no getting around that, and the response of "just write it in C if you need speed" is a poor answer. The core dev team seems insistent on continually stacking in pointless new features (walrus operator, why?) while not really doing anything about the real issues (like the packaging situation).

I've also come to really dislike exception-based error handling: having no compiler or type-checking, and knowing that anything could explode anywhere, isn't a reassuring feeling once your codebase gets big enough. Yes, you can add try/except and code defensively to head off issues, but it doesn't take much before you've spent as much time and energy doing that as it would have taken to write the thing in a more suitable language, and you still end up with maybe 1/10th of the guarantees and none of the performance.
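To make the type-hint point concrete (a toy example): the hints are pure annotations at runtime, so nothing stops code from violating them unless you run a separate checker like mypy over it.

```python
def add(a: int, b: int) -> int:
    return a + b

# runs without complaint despite violating every annotation;
# only an external checker (mypy, pyright) would flag this call
print(add("chalk", "cheese"))
```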
If you want to write a web API, you'd be better off writing it in Golang, .NET, TypeScript on Node.js, possibly even Swift.
General-purpose stuff you could replace with any of those languages plus Rust. Admittedly it still has prime position for ML frameworks, but I'd be using Julia at work if it were up to me.
Edit: a sibling comment mentioned Async in Python - an experience that was so frustrating I’d excised it from memory.
Yeah, I've used Python at university and before, for ML and DL, for scripting, at work... it's very inconsistent and annoying. The package-management situation is horrible. Yet it's seen as this magical, beginner-friendly, clean, even beautiful and elegant language, while the reality is quite different.
How are you finding it? I’d like to think that with some suitable package evolution/development doing production ML stuff in it would actually be pretty reasonable.
What packages are you using? Linfa looks like it's developing strong legs, and SmartCore seems to be ticking away quietly in the background...
Yeah, I keep up with the Linfa group; they're making steady progress. I hadn't seen SmartCore yet, but that looks promising.
I mainly use tch-rs, which is just bindings around libtorch; there are a couple of rough edges wrapping C++ (function overloading), but overall it works great. I've also used ndarray a fair amount, which is nice.
It’s mostly a personal favourite, but once Ballista [1] gets a bit more developed, I expect we’ll tear out our Java/Spark pipelines and replace them with that.
The ML ecosystem in Rust is a bit underdeveloped at the moment, but work is ticking along on packages like Linfa and SmartCore, so maybe it'll get there? In my field, I'm mostly excited about its potential for correct, high-performance data pipelines that are straightforward to write in reasonable time, and hopefully a model-serving framework: I hate that so many of the current tools require annotating and shipping Python when model-serving shouldn't really need any Python code.