No, this comes from interacting with the community, companies, and large projects throughout the years, followed by research, publishing of papers, and careful analysis of the costs and benefits of introducing said feature! Only then did we add it.
I reflected more and I think I just prefer languages with a formal spec. The GitHub projects are just going to evolve over time to wherever the wind blows, while a spec captures a concerted effort to unify in time.
> Honest question, in the era of vibe and AI assisted coding are there any advantages to using untyped programming languages, apart from the fact that untyped languages have more training data for the LLM?
Author here.
Type systems restrict which programs can be expressed and increasing expressiveness often requires increasing type-system complexity (which, speaking from experience, both humans and agents will struggle with). Plus they are not the only mechanism to assert correctness (they only validate a subset of your program correctness and do not replace tests) and you are still on your own when it comes to actually recovering from unexpected errors (something Erlang/Elixir were designed for).
I'd say there are two flip sides to your question:
1. Given types do not replace tests, if you can use AI to automate full test coverage, are there actual benefits in static typing for coding agents? The downside of tests for humans is that we suck at writing them (but guided agents can do better) and they can take time to run (which agents do not care about).
2. Do we actually have any data or evaluations that show which typing discipline is better for agents? The only benchmark I am aware of [AutoCodeBenchmark] has Elixir come first (dynamic) and C# as second (static), so it doesn't answer the question. There are other benchmarks that show dynamic languages require fewer tokens to solve problems (but that's not a metric I particularly care about)
My gut feeling is that local structure, documentation, quality and quantity in the training data, etc. are likely to play a more important role than typing for coding agents. I'd also love to measure how agents perform on specific domains. If you are writing concurrent software, how do Elixir/Java/Rust/Go compare? But without data, it's hard to say.
> 2. Do we actually have any data or evaluations that show which typing discipline is better for agents? The only benchmark I am aware of [AutoCodeBenchmark] has Elixir come first (dynamic) and C# as second (static), so it doesn't answer the question. There are other benchmarks that show dynamic languages require fewer tokens to solve problems (but that's not a metric I particularly care about)
I am actually writing a paper on this right now, so there is nothing I can point you to yet, but yes. LLMs are better (they produce working code in fewer attempts, controlling for the relative size of the training corpus) when using type systems with inference and global unification. It is largely about the quality of the error feedback channel, so languages with very good compiler errors (accurate, localized, including the correction with the failure) can close a lot of ground.
But inference plus a sound type system gives you constraint propagation that genuinely restricts the ability of the LLM to get into trouble. Type systems that require annotation give up most of the benefit, since the annotations are themselves surface area for LLM mistakes. Unification also puts heavy limits on the expressiveness of the language, which is a confounder and may actually be a big part of the benefit too.
Everyone has been on the "the training data is better" thing but I actually don't think so. All of the languages that people report as being better because of good training data actually have fairly restrictive type systems. Elixir is an exception, but it has exceptionally good error messages! It also has, along with Erlang, pretty unique runtime semantics that may contribute, but that's outside my domain; I work on type systems. Debunking the training quality thing is not what I'm working on, but I have deep suspicions about that common wisdom.
That’s very exciting! Is there anywhere I could follow you for updates? If you don’t want to share it publicly, and are OK with sharing it privately, my email is my username on gmail. Thank you!!
In my experience restricting programs that can be expressed is a good thing, even more so with agentic engineering. The more guardrails there are (strong typing, TDD, computer use, ...), the more the solution space shrinks and the higher the chance of a robust solution. Sure, maybe this burns more tokens going in circles, but it feels less like a slot machine and more like a robot searching for a solution to a well-defined problem.
Devs have very strong opinions about dynamically typed programming languages. But reasons such as "exploratory programming", "expressiveness", and "taste" that make these languages feel good for humans to program in do not matter for agents. Agents don't care that the language "limits them" and prevents them from expressing the code in a succinct way because it would not type check.
Agreed on the guardrails bit. My point is that we still don't have much evidence that static types are an effective way to constrain the search space for coding agents, or how much value they add on top of other mechanisms. Redundancy can certainly be beneficial, but how much and at what cost?
On expressiveness, people often frame it as a dynamic-language goal, but a large portion of type system research is precisely about making type systems more expressive so they can describe a wider range of programs and invariants. This is clearly something both camps value. I suppose another interesting benchmark could be: how do coding agents perform across languages with different degrees of type-system expressiveness?
We may directionally agree, but it is hard to draw conclusions without measurements. Overall, I'd say this is much more of an open question than people give it credit for.
> if you can use AI to achieve full test coverage, are there actual benefits in static typing for coding agents?
Full test coverage doesn’t tell you if the tests behave correctly. So you could prompt an AI agent to write 100% test coverage where those tests could be exercising all code paths yet contribute 0% to the story of what the code does. You need human understanding of what the desired contract is that the tests check.
Imagine a contract lawyer who blindly signs any contract that they are given: they aren’t doing their job. They ought to have an idea in mind of what their client’s goals and limits are so they can determine if a given contract fulfils those needs.
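A minimal Python sketch of that gap between coverage and contract. The function and both tests are hypothetical, made up for illustration. The first test executes every line of a deliberately buggy function and still passes; only the second, which states what the result should be, notices the bug.

```python
def apply_discount(price, percent):
    """Deliberately buggy: subtracts the percentage as an absolute amount."""
    if percent < 0 or percent > 100:
        raise ValueError("percent must be between 0 and 100")
    return price - percent  # bug: should be price * (1 - percent / 100)


def hollow_test():
    """Executes every line of apply_discount but checks nothing about results."""
    apply_discount(200, 10)
    try:
        apply_discount(200, 150)
    except ValueError:
        pass
    return True


def contract_test():
    """Encodes the intended contract: 10% off 200 should be 180."""
    return apply_discount(200, 10) == 180
```

Both branches are covered by `hollow_test`, so a coverage tool would report 100% for `apply_discount`, yet only `contract_test` fails on the bug.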
Types are a declarative contract, so they can be a lighter yet more limited way to enforce a contract. The compiler can verify if all the declared types across the program agree with each other. This is especially helpful with refactoring, such as ensuring that adding a field has been rolled out everywhere.
Types aren’t to be just checked by the compiler, but checked by the human authors too. That’s why explicit type signatures are valuable, especially if they are kept intelligible. They encode the different variations in state and possible branching on that state. So you can whittle your types down as a way of whittling the solution down to be more focused. The problem in your head is reflected in the types, and any simplifications in the types then simplify the problem in your head, and any tests derived from that understanding.
I don't think these articles fully cover (pun intended) the claims being made.
First of all, we need to separate "types" from "static type checking". Elixir always had types and types by themselves won't eliminate tests. You can combine types with type checkers, as well as tests themselves (as described in the first article), to aid software verification. Plus many of the techniques discussed in the article (property-based testing, static analysis, etc) are available to dynamically typed languages too.
Some notes on the first article:
> For example, there is no test we could write that would show that our function never throws an exception or never goes in to an infinite loop, or contains no invalid references. Only static analysis can do this.
Static analysis is doing a lot of heavy lifting here. When applied to type checking, whether it can prove the absence of exceptions depends entirely on how expressive the type system and checker are.
For example, this Haskell function can fail at runtime even though it type checks:
    maxPosInteger :: (Ord a, Num a) => [a] -> a
    maxPosInteger xs = maximum (filter (> 0) xs)
If `xs` contains no positive elements, maximum fails. The type system does not rule this out.
As the article itself later discusses, proving stronger properties requires more expressive type systems, such as dependent types. Those systems can prove the absence of additional classes of failures, but they come with their own costs in complexity, ergonomics, inference, compile times, and so on. My recent ElixirConf talk touched on these trade-offs: https://www.youtube.com/watch?v=Ay-gnCqDw9o
But overall the article does not discuss coverage. Under some of the scenarios it presents, such as finite domains, exhaustive testing guided by coverage can prove the absence of bugs too. Additionally, some of the concerns the article has about Python, such as runtime redefinition and excessive polymorphism, do not really apply to dynamic languages like Elixir and Clojure.
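As a hedged illustration of the finite-domain case, here is a small Python sketch. The permission rule and its independently written spec are hypothetical. With three boolean inputs there are only eight cases, so checking all of them establishes that the function matches the spec on its whole domain.

```python
from itertools import product


def can_publish(is_author: bool, is_admin: bool, is_locked: bool) -> bool:
    """Hypothetical rule: admins can always publish; authors only when unlocked."""
    return is_admin or (is_author and not is_locked)


def exhaustive_check() -> int:
    """Compare the function against an independent spec for every possible input."""
    checked = 0
    for is_author, is_admin, is_locked in product([False, True], repeat=3):
        # Spec written a different way: locking removes the author's right only.
        expected = is_admin if is_locked else (is_admin or is_author)
        assert can_publish(is_author, is_admin, is_locked) == expected
        checked += 1
    return checked
```

This only scales to small domains, and it proves agreement with the spec, not that the spec itself is what was wanted.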
> Correctness oracles abound. We have test suites, fuzzers and property-based testers, runtime sanitizers, static analyzers, linters, strong type systems, and formal verifiers. Any time such a tool can be made available to the LLM, we’ll reap the benefits in terms of not dealing with bugs the hard way, later on.
I completely agree with that framing. Static type systems are valuable tools, but they're one tool among many. My overall point is that I wouldn't draw the line at static typing as the "must have" mechanism for software quality, especially in the context of AI-assisted development where multiple correctness oracles can be composed together.
>Type systems restrict which programs can be expressed and increasing expressiveness often requires increasing type-system complexity (which, speaking from experience, both humans and agents will struggle with). Plus they are not the only mechanism to assert correctness (they only validate a subset of your program correctness and do not replace tests)
This articulates a lot of my own thinking wrt type systems, speaking as a downstream user without a lot of exposure to prog language theory, and I wish this debate were more often framed in these terms.
Another reply to this comment hinted that it might be more about giving LLMs feedback loops and that to me also seems like a more likely mechanism.
I'm not an elixir user but I've watched it from a distance over the years – thank you for your efforts and your experimentation.
>Type systems restrict which programs can be expressed and increasing expressiveness often requires increasing type-system complexity (which, speaking from experience, both humans and agents will struggle with).
I used to hold a similar opinion, but the D language and this article by Patrick Li (HN JITX co-founder), the original author of the little-known but very powerful language Stanza, changed my mind [1],[2].
He argued that Ruby is a very expressive language, that this expressiveness enabled RoR at a time when other languages were less capable, and that accordingly the proof is in the pudding.
In Stanza, the new language from his PhD thesis, he designed an optional type system supporting both typed and untyped code; it seems very similar in concept to what you describe for Elixir in the OP article. Groovy also deserved a special mention, and the pudding is Grails.
Interestingly, both Elixir and Stanza have GC, but Stanza also supports non-GC code, namely LoStanza, in which the Stanza GC is written.
Interestingly, the D language pioneered this combination of GC (by default) and non-GC more seamlessly, even before Stanza.
Besides Ruby, these four languages, namely Elixir, Groovy, Stanza and D, all have expressive power similar to or better than Ruby's. Notably, both Stanza and D are compiled languages. Above all, D is an anomaly in a good way since it is a fully typed programming language. Kudos to Walter and the team for giving birth to a highly expressive, fully typed modern language that is very fast in both compilation and runtime, truly one of a kind [3].
Regarding the issue of comparatively smaller corpus for these languages as mentioned by others, I think the new self-distillation technique for LLM and code generation as proposed by Apple, MIT-ETH and UCLA can overcome this limitation [4].
> Groovy also deserved a special mention, and the pudding is Grails.
I vaguely remember that when Groovy became more typed (statically typed, that is; I believe you could always put the types in, but they were not checked), there was a theory that it hurt the language's potential uptake.
The reasoning was that people felt: if we are adding types and a project requires them, why don't we just use Java, Scala, Kotlin, etc.? So was it Java getting more features or Kotlin arriving that really hurt Groovy, or just that it became more of a typed language?
An analog (a typed language stealing users) could happen to Elixir, but I'm not really sure which language it would be.
> I think the new self-distillation technique for LLM and code generation as proposed by Apple
Speaking of Apple and eventual typing, Dylan was an amazing language that just never got traction. Open Dylan still exists but few know about it. Its eventual typing is unique because Dylan does CLOS-like multimethod dispatch instead of pattern matching.
> Groovy also deserved a special mention, and the pudding is Grails.
Not sure it is much of a success. Groovy gets unreadable very fast, and the editor won’t help you. Gradle moved to Kotlin, and it’s 10x better in readability and maintainability.
One thing something like AutoCodeBenchmark cannot demonstrate is what happens when you have human-written type definitions defining the domain before the LLM writes a line of code.
That is something I have found very effective in F#: I model the domain with types, I know what the type signatures of the functions I need are, and the LLM does the work of actually implementing those functions.
Here is a concrete example:
I have been playing around with a program to assist me with projects I make at home on my hobby-grade CNC router, which does not have an automatic toolchanger. I use a mix of Vectric VCarve and some older handwritten programs to generate GCode files. I end up with a USB drive with maybe 6 to 12 GCode files on it and a model in my head of "to make this product, I start with a board here, gotta install this square nose end mill and zero on this corner of the board, run files A and B. Then install a ball nose end mill and run file C. Then flip the board over lengthwise, switch to a smaller square nose end mill, zero here, run file D. etc. etc."
Although I try to name the GCode files in a self documenting way like 01_TopSide_25square.ngc, if I come back in 1 year and want to make the same thing again, I pretty much always have to open VCarve and eyeball what the hell all the files did and confirm where to zero, what size board to use, etc. So I'm making a tool where I can define those human-operator steps that go with the G-Code files, save it as a "project file", preview in 3d what each step will look like, and export to a printable PDF with screenshots and step-by-step instructions. Hopefully this will reduce the amount of rot that these projects suffer and the cognitive overhead of picking up an old one.
Modeling the steps as F# types was the very first step, like (small excerpt):
    type WorkpiecePlacement =
        { Id : WorkpieceId
          /// Corner of the workpiece we'll attach to the machine.
          WorkpieceCorner : WorkpieceSpace.Corner3D
          /// Point in machine-space we'll anchor this corner to.
          MachinePoint : MachineSpace.Point
          /// Which face of the workpiece is on top.
          FaceUp : WorkpieceSpace.Face
          /// Rotation around the up-axis.
          Yaw : WorkpieceSpace.Yaw
        }

    type OperationType =
        | PlaceWorkpiece of placement : Operation.WorkpiecePlacement
        | InstallTool of id : ToolId * slot : int option
        | ZeroAt of point : MachineSpace.Point
        | RunGCode of source : GCode.Source
        | RemoveWorkpiece of id : WorkpieceId
For the GCode simulator I needed a parser for GCode files, which produces a type with 1:1 equivalence to the GCode instruction set:
    type GCodeInstruction =
        // --- Motion ---
        | G0_RapidMove of axisMoves : (Axis * float<gcodeunit>) array
        | G1_Move of feedRate : float<gcodeunit/minute> option * axisMoves : (Axis * float<gcodeunit>) array
        | G2_ClockwiseArc of ArcParams
        | G3_CounterClockwiseArc of ArcParams
        | G4_Dwell of seconds : double
        // --- Plane selection ---
        | G17_SelectXYPlane
        | G18_SelectXZPlane
        | G19_SelectYZPlane
        // --- Unit selection ---
        | G20_Inches
        | G21_Millimeters
        // --- Distance mode ---
        | G90_AbsoluteDistance
        | G91_RelativeDistance
        // ... etc truncated, more instructions in real code
But my tool supports doing transforms on toolpaths, like rotating 90 degrees or offsetting so I can easily define that I want to make tiling copies of the same project.
To implement those transforms straight up as GCodeInstruction[] -> GCodeInstruction[] is a bad call. GCode is very stateful and lets you switch units, relative vs. absolute coordinate spaces, etc. in instructions. That makes the transform awkward and tricky to write.
So I have a ToolPath type that makes the transforms clean. It normalizes the many ways of expressing the same toolpath in GCode to a single representation with all absolute coordinates in metric units.
    type ToolPathInstruction =
        | Rapid of From : Point * To : Point
        | Linear of From : Point * To : Point * Feed : FeedRate
        | Arc of
            From : Point *
            To : Point *
            Center : Point *
            Plane : Plane *
            Direction : ArcDirection *
            Feed : FeedRate
        | ... etc truncated
That is the appropriate level for the transforms like offset, rotate, scale, etc. to operate on.
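To make the point about statefulness concrete, here is a rough Python sketch (not the F# code from this project). It assumes a simplified, hypothetical tuple encoding of instructions, handles only linear moves, and assumes the machine starts in absolute mode with millimeter units. `normalize` resolves the unit and distance-mode state once, and after that a transform such as `offset` has no state to track.

```python
from dataclasses import dataclass

MM_PER_INCH = 25.4


@dataclass(frozen=True)
class Point:
    x: float
    y: float
    z: float


def normalize(instructions, start=Point(0.0, 0.0, 0.0)):
    """Turn stateful G-code-like moves into absolute millimeter points.

    Hypothetical encoding: ("G20",), ("G21",), ("G90",), ("G91",),
    or ("G1", {"X": 1.0, "Y": 2.0}) for a linear move.
    """
    scale = 1.0      # current unit -> mm; assumed to start in millimeters
    absolute = True  # assumed to start in absolute distance mode
    pos = start
    points = []
    for code, *args in instructions:
        if code == "G20":
            scale = MM_PER_INCH
        elif code == "G21":
            scale = 1.0
        elif code == "G90":
            absolute = True
        elif code == "G91":
            absolute = False
        elif code == "G1":
            coords = {"X": pos.x, "Y": pos.y, "Z": pos.z}
            for axis, value in args[0].items():
                mm = value * scale
                coords[axis] = mm if absolute else coords[axis] + mm
            pos = Point(coords["X"], coords["Y"], coords["Z"])
            points.append(pos)
    return points


def offset(points, dx, dy):
    """Translate already-normalized points; no unit or mode state to track."""
    return [Point(p.x + dx, p.y + dy, p.z) for p in points]
```

Writing `offset` directly against the raw instruction stream would instead have to track the current units and distance mode for every move.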
Yet there is still ANOTHER level of toolpath-related operations that deserves its own type. When I'm doing simulation of material removal to check for crashes, or rendering the toolpath in 3d, I don't want to deal with arcs! The rendering/simulation is inherently an approximation. It will break down each arc into line segments. So sim code and rendering code shouldn't take a toolpath, it should take basically a line segment list, or in other words...
    type ApproxMove =
        { From : Vector3
          To : Vector3
          FeedRate : double<m/minute>
          IsRapid : bool
        }

    type ToolPathApproximation =
        { StartPosition : Vector3
          Moves : ApproxMove[]
        }
Having defined all these types it's clear that I need operations like:
And so on. An LLM is absolutely awesome at one-shotting the implementations.
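One operation of that kind, breaking an arc into line segments for simulation or rendering, could be sketched roughly as follows. This is Python rather than the project's F#, restricted to the XY plane, and the function name, the tuple representation of points, and the fixed segment count are all assumptions made for illustration.

```python
import math


def approximate_arc(start, end, center, clockwise, segments=16):
    """Break an XY-plane arc into straight segments.

    Points are (x, y) tuples. Returns a list of (from, to) pairs.
    A zero sweep is treated as a full circle.
    """
    radius = math.hypot(start[0] - center[0], start[1] - center[1])
    a0 = math.atan2(start[1] - center[1], start[0] - center[0])
    a1 = math.atan2(end[1] - center[1], end[0] - center[0])
    sweep = a1 - a0
    # Force the sweep to run in the requested direction.
    if clockwise and sweep >= 0:
        sweep -= 2 * math.pi
    elif not clockwise and sweep <= 0:
        sweep += 2 * math.pi
    points = [
        (center[0] + radius * math.cos(a0 + sweep * i / segments),
         center[1] + radius * math.sin(a0 + sweep * i / segments))
        for i in range(segments + 1)
    ]
    points[0], points[-1] = start, end  # pin endpoints to the exact inputs
    return list(zip(points, points[1:]))
```

A real implementation would likely pick the segment count from a chord tolerance instead of a constant, and would carry feed rate and Z through to each move.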
I would find it quite frustrating trying to model the same domain without any types, either having all methods working on a single toolpathy data structure that's not really the right fit for any of the places it's used, or having them work on multiple data structures without any clear delineation of which layer is expecting which toolpathy-thing that are all subtly but importantly different.
You are mixing runtime and compile-time dependencies. Runtime dependencies (circular or not) have no impact on compilation performance and stability. Phoenix does include one circular dependency (the layout is rendered by your endpoint and it references your endpoint) but it is a runtime one.
I spent 3 months analyzing failures caused by - what looked like - dirty builds but was caused by unstable compilation order. Which is quite obvious.
The solution is dynamic dependency resolution but this causes problems with macros.
The problem is easy to validate. Compile the application multiple times and compare hashes. I'm not sure if it's sufficiently visible in bootstrapped Phoenix but I saw it in toy apps as small as <1000 LoC.
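A rough way to script that comparison, as a Python sketch. The helper names are hypothetical, and it assumes you have already produced two clean builds and copied their output directories somewhere side by side. It only reports which files differ byte for byte and says nothing about why.

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def differing_files(build_a, build_b):
    """Return sorted paths that are missing from one build or differ in content."""
    a, b = tree_digest(build_a), tree_digest(build_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty result across repeated clean builds would suggest the output is stable; a non-empty one is a starting point for investigation, since byte-level differences can have causes other than compilation order.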
That can be a concern indeed but it is worth noting that strong arrows compose/propagate. So if you have a function without guards that calls a function that guards on said types, the caller is also strong! We will likely have mechanisms to measure "strength" when we introduce type annotations.
Is it fair to think of this as the ability for type information to be propagated in both directions, e.g. both up and down the callstack? So callees down the callstack may receive any type information the caller might have, while callers up the stack may also receive any information callees further down the stack might have? Please correct me if my understanding of what you wrote is way off base!
Elixir is gradually typed, while Gleam is fully statically typed.
Elixir's type system does not have generics, while Gleam's type system does.
Elixir has a powerful macro system, Gleam has no metaprogramming features.
Elixir’s compiler is written in Erlang and Elixir, Gleam’s is written in Rust.
Gleam has a more traditional C family style syntax.
Elixir has a namespace for module functions and another for variables, Gleam has one unified namespace (so there’s no special fun.() syntax).
Gleam standard library is distributed as Hex packages, which makes interoperability with other BEAM languages easier.
Elixir is a larger language, featuring numerous language features not present in Gleam.
Elixir has an official test framework with excellent support for concurrency, partitioning, parameterized tests, integrated error reports, and more. Gleam has no official test framework, but there are multiple community-maintained frameworks.
Both languages compile to Erlang but Elixir compiles to Erlang abstract format, while Gleam compiles to Erlang source. Gleam can also compile to JavaScript.
Elixir has superior BEAM runtime integration, featuring accurate stack traces and full support for tools such as code coverage, profiling, and more. Gleam’s support is much weaker due to going via Erlang source, resulting in less accurate line numbers with these tools.
Elixir and Gleam both use Erlang's OTP framework. Both have additional modules for working with OTP, which provide APIs more in the style of each respective language. Both can use Erlang's OTP APIs directly, but Elixir can do so more conveniently and concisely due to having a less-strict type system.
Elixir currently has superior deployment tooling, including support for OTP releases and OTP umbrella applications.
Gleam’s editor tooling is superior due to having a more mature official language server, but Elixir has recently announced an official language server project which is in active development.
Elixir is more mature than Gleam and has a much larger ecosystem.
Gleam and Elixir compile at similar speeds due to using the Erlang compiler as their compiler backend. Elixir's macros are evaluated at compile time, so a program that uses macros will take longer to compile the larger the amount of work performed in macros. Gleam has no language features that result in slower compilation.
> some stuff being sold as "all these libs/packages that haven't had any updates for over a year is fine because Elixir" I just don't buy it
I maintain more than 20 packages and, except for the major ones, like Phoenix and Ecto, they haven't been updated in more than a year and yes, they are all fine.
The language has been extremely stable. There have been almost no breaking changes in over a decade. Case in point: we introduced a whole gradual type system without making any changes to the language surface! The language is still on v1.x!
The syntax you are commenting on has always existed in Elixir, before v1.0, as part of patterns and guards.
You are commenting as if we added this now but we have made no changes to the language surface. The difference is that we now leverage these same language constructs to extract precise type information.
The team that built Erlang (Joe, Robert, Mike, and Bjorn) didn't know the actor model was actually a thing. They wanted to build reliable distributed systems and came up with the isolated processes model you find in Erlang today. Eventually (probably when Erlang was open sourced?), folks connected the dots that the actor model was the most accurate description of what was going on!