qexat's blog

My plans for Violette

A week ago, I shared on Bluesky a link to a new GitHub repository of mine, for the implementation of a programming language called Violette (link). Given the unusually high number of likes on the skeet (link), I thought y'all would be interested in what I would like to do with said language. So the present post details my plans for it, although it will likely end up being the first part of a series.

As YouTubers say, without further ado, let's get into it.

A universal token structure

The idea is as follows: there is one universal tokenizer that produces a universal token type; each front-end signals the token types that it supports by registering them in a global table. If an unsupported token is found[1], we report a syntax error and suggest the extensions[2] that support it.
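To make the registration idea concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration (the `Token` fields, the `TOKEN_SUPPORT` table, the `register` and `check` helpers, the front-end names); none of it is taken from Violette's code base.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """The single, universal token type (hypothetical fields)."""
    kind: str      # e.g. "IDENT", "LAMBDA", "TRAIT"
    lexeme: str
    position: int


# Global table: token kind -> names of the front-ends that accept it.
TOKEN_SUPPORT: dict[str, set[str]] = {}


def register(front_end: str, *kinds: str) -> None:
    """Called by a front-end to signal which token kinds it supports."""
    for kind in kinds:
        TOKEN_SUPPORT.setdefault(kind, set()).add(front_end)


def check(tokens: list[Token], enabled: set[str]) -> list[str]:
    """Return one syntax error per token that no enabled front-end supports."""
    errors = []
    for token in tokens:
        supporters = TOKEN_SUPPORT.get(token.kind, set())
        if supporters & enabled:
            continue  # at least one enabled front-end accepts this token
        hint = ""
        if supporters:
            hint = "; supported by: " + ", ".join(sorted(supporters))
        errors.append(
            f"syntax error at {token.position}: "
            f"unsupported token {token.lexeme!r}{hint}"
        )
    return errors
```

With a "core" front-end registering `IDENT` and a "traits" extension registering `TRAIT`, checking a `TRAIT` token while only "core" is enabled yields a single error that names "traits" as the extension to turn on.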

Two-phase tokenization

In order to produce better error messages, tokenization will happen in two phases: the first one is extremely tolerant and the second checks its output more strictly. Here is why this helps with reporting: when we are scanning a numeric literal, for example, instead of stopping at the first character that looks incorrect, we can just eat it anyway in the first phase and report all the incorrect ones in the second phase. That means less syntactic back-and-forth for those who are trying out the language (or those having a lil' skill issue), especially since we do not have tooling yet.
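Here is a small Python sketch of the two phases applied to a numeric literal. The literal grammar is an assumption (decimal digits, `_` as a separator, at most one `.`), and the function names are hypothetical; Violette's actual rules may well differ.

```python
from __future__ import annotations

import string


def scan_number(source: str, start: int) -> tuple[str, int]:
    """Phase 1 (tolerant): eat anything that could plausibly be part of the
    literal, even characters that are certainly wrong."""
    end = start
    while end < len(source) and (source[end].isalnum() or source[end] in "._"):
        end += 1
    return source[start:end], end


def validate_decimal(lexeme: str, start: int) -> list[str]:
    """Phase 2 (strict): report every offending character in one go."""
    errors = []
    seen_dot = False
    for offset, char in enumerate(lexeme):
        if char in string.digits or char == "_":
            continue
        if char == "." and not seen_dot:
            seen_dot = True  # the first dot is the decimal separator
            continue
        errors.append(
            f"column {start + offset}: unexpected {char!r} in numeric literal"
        )
    return errors
```

Scanning `12a4.5.6` this way produces one lexeme and two diagnostics (the `a` and the second `.`), instead of a single error that stops at the `a`.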

Custom infix operators

As of writing this, Violette does not have any infix operators at all. It is basically just a λ-calculus on (light) steroids. However, we obviously want the language to be useful in the future, so we will have to add them at some point.

The thing is, I am not a fan of hardcoded operators. They leave you with a specific set of privileged types that will enjoy them, unless you want the user to be able to use them as well for their own types, in which case you are forced to incorporate ad-hoc polymorphism in some way. I would like to keep the planned (limited) trait system for an extension, so we need to find an alternative solution for the base language. That answer is to let the user define their own operators.

Syntactically, I don't see a reason to prefix identifiers with an underscore (Violette might have a proper visibility system anyway), so underscores are free for another use: a standalone infix operator will be recognizable by an underscore on each side of its lexeme. Plus, Agda has shown us that a custom operator system is far from being incompatible with typeclasses, so it won't hinder the related language extension[2].
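As a rough illustration of the underscore convention, the Python sketch below recognizes a standalone operator name such as `_+_` and extracts its symbol. The set of allowed operator characters and the function name are assumptions; the post does not specify either.

```python
from __future__ import annotations

# Assumption: which characters may appear in an operator symbol.
OPERATOR_CHARS = set("+-*/<>=!&|^%~?:.$@")


def standalone_operator(lexeme: str) -> str | None:
    """Return the operator symbol if `lexeme` is an operator wrapped in
    underscores (e.g. "_+_"), otherwise None."""
    if len(lexeme) < 3:
        return None  # needs at least "_", one symbol character, "_"
    if not (lexeme.startswith("_") and lexeme.endswith("_")):
        return None
    symbol = lexeme[1:-1]
    if all(char in OPERATOR_CHARS for char in symbol):
        return symbol
    return None
```

Under these assumptions, `_>>=_` names the operator `>>=`, while `_x_` stays an ordinary identifier.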

A first-class module system

Another thing that I would like to have is a module system where modules are first-class expressions. I thought of reusing the block expression syntax prefixed with a keyword, i.e. module { expr_1 ; ... ; expr_n }. This makes sense to me because in Violette, everything (and I mean it) is an expression. Indeed, even files are expressions: I don't want to special-case them as modules, as my belief is that files are an operating system abstraction, not a build system one.

Anyway, there is currently no namespacing mechanism, which means that every Violette file is functionally forced to be a program. What I expect a library to be, then, is a file consisting of a module expression containing the desired functions and other declarations. Notably, I haven't really thought yet about how side effects in modules will be treated.

Module types

Since I want to statically type Violette, there should also be a way to encode the type of our modules. This is especially important and useful given that modules are truly first-class (no wrapping in OCaml fashion), and as such, functions will be able to take them as arguments and return them.

A visibility system? Maybe?

Another criticism I have of OCaml modules is the lack of granularity of interfaces[3]. As such, I have had a simple visibility system in my notes for a while now. It would allow a per-definition public/private modifier, applicable to any binding expression. Extra care will have to be taken regarding member visibility and potential leakage, notably with types and their constructors.

That said, I am not satisfied with this design yet, so I am quite wary of implementing it. I think it is something I will have to revisit later; I guess we could just fall back to the OCaml way in the meantime.

A small language for the toplevel

Having a REPL is another opportunity to make a language, one for commands; you know me, I'll take it. It will let the user enable and disable language extensions, use different backends, clear the console, manage the history and cache, and so on.
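To give an idea of what such a command language could look like, here is a tiny Python sketch of a dispatcher. The colon-prefixed syntax, the command names and the `ToplevelState` fields are all invented for this example; the actual design has not been decided.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToplevelState:
    """Hypothetical state that toplevel commands act upon."""
    extensions: set[str] = field(default_factory=set)
    backend: str = "default"
    history: list[str] = field(default_factory=list)


def run_command(state: ToplevelState, line: str) -> str:
    """Execute one toplevel command and return a message for the user."""
    words = line.split()
    if not words:
        return ""
    state.history.append(line)
    name, args = words[0], words[1:]
    if name == ":enable" and len(args) == 1:
        state.extensions.add(args[0])
        return f"enabled {args[0]}"
    if name == ":disable" and len(args) == 1:
        state.extensions.discard(args[0])
        return f"disabled {args[0]}"
    if name == ":backend" and len(args) == 1:
        state.backend = args[0]
        return f"backend set to {args[0]}"
    if name == ":history" and args == ["clear"]:
        state.history.clear()
        return "history cleared"
    return f"unknown command: {line.strip()}"
```

A real version would probably parse commands with the same tokenizer infrastructure as the language itself rather than splitting on whitespace, but the shape of the state it manipulates would be similar.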

A featureful version of the toplevel

Speaking of the toplevel, I would love to have features like advanced line editing and code highlighting. However, I am not familiar with Windows, so it is likely that these features won't be available there; I did exactly that in the past with the esoteric Vism (god, this thing is already two years old...).

Closing words

– I like type theory.
– Yet you did not mention anything adjacent in this blog post. Curious!

Honestly, this post really exists to declutter my mind of all the aforementioned ideas. I have had some thoughts on the type system, but I wanted to focus first on the infrastructure-oriented pieces, before plugging in semantic analysis. That's why I said in the beginning that there will probably be more of these as I continue to work on the compiler.

I think that is all. If you like my work, please consider supporting me on Ko-fi (link)! If you are a company with compiler-adjacent open positions, please hire me (email)! You can also follow me on Bluesky (link).

Interested in programming language development and you want to learn more about it? Join our awesome community of more than six thousand members (link)!


Addendum: what goals?

This post has pretty much been a list of either features or implementation details I want to add to Violette's compiler. You might have nodded at some of them and frowned at others. If you are familiar with the GHC Extension Circus®, you might have done the latter in particular when language extensions were mentioned. So let me explain.

I don't think Violette is my dream language; I don't consider myself capable of building that ideal today. Instead, I am using it as a playground to experiment with various ideas that aren't just a pile of small projects compiled in a repository[4], but switches that I can turn on and off while still enjoying a relatively correct experience with a reusable set of tools that allow me to write programs.

I think that extensions, while opt-in, will be subtractive. Unless I think of something better, λ-core will implement all the features at once, so extensions will be mostly syntactic. That said, this imposes the constraint that they are necessarily all compatible with each other (unless I give up on the difficult idea of having relatively decidable type-checking).

  1. That could be either during an intermediate phase between tokenization and parsing, or during parsing itself.

  2. See the addendum.

  3. If you want to make some members private, or simply follow the convention of having the documentation and the type annotations in a .mli file, you will probably end up with a lot of redundancy, such as with public type definitions. Jane Street has a workaround for this using an intermediate module, but frankly, I find it utterly disgusting (sorry Jane Street, also please hire me to work on a compiler!).

  4. To be clear, this is not a dig at language gardens. I think that they are extremely cool. I'm eternally grateful to their authors as they help(ed) me learn a ton.

#ML modules #OCaml #PLDI #elaboration #parser #programming language #type systems #type theory