Unison langage

lundi 11 janvier 2021 • 17 minutes de lecture

Let's talk about the Unison language.

Unison is a statically typed langage whose main influences are Haskell, Erlang and Frank. It is built upon a simple idea that has really appealing consequences:

Code is content-addressed and immutable

In many langages, the main way to identify something is its name. In Unison, definitions are identified by a hash computed after its content. For instance, those two definitions below have different names but their body is identical. So they share the same hash.

inc x = x + 1

add1 z = z + 1

For each hash, an AST is stored in the unison codebase.

In fact, the text representation that we have seen above is not the primary representation for unison code. The AST is. Text is one of many possible way to create/view/update unison code.

For instance, if we need our previously defined inc function to define something else, its name will be substituted by its hash in the resulting AST.

add2 x = inc (inc x)

Assuming that the corresponding hash for inc is #12345 , the below code will be preprocessed as this:

add2 x = #12345 (#12345 x)

Then, argument names are substituted by normalized ones:

add2 arg1 = #12345 (#12345 arg1)

With this hash based representation of our definition, we can compute an AST that is independant of names declared by developers.

This implies that we can change the name inc for another name without changing the definition of add2 .

Also once inc has been compiled, if you change its name, then is hash will not be modified. This mean that you do not need to recompile that function or other already compiled definitions that depends on it!

Installation

It is much more fun to try those things on your own machine while following along this article. So I suggest to do so, knowing that Unison is really easy to install, so I invite you to do so:

OSX (with homebrew)

brew tap unisonweb/unison
brew install unison-language

Once Unison is installed on your machine, you need to create a directory that unison will be launched against and cd inside it.

mkdir Unison # create a dir to start with
cd Unison

then initialize unison and launch the unison codebase manager

ucm init # init the codebase
ucm # entering unison

You should see something like this:

   _____     _             
  |  |  |___|_|___ ___ ___ 
  |  |  |   | |_ -| . |   |
  |_____|_|_|_|___|___|_|_|
  
  Welcome to Unison!
  
  You are running version: release/M1m
  
  I'm currently watching for changes to .u files under ~/Code/Unison
  
  Type help to get help.

.>

Once in the Unison REPL is launched, we import the base library.

.> pull https://github.com/unisonweb/base:.releases._latest base

And we are good to go!

The unison codebase manager

Once you have launched ucm, any file within the current directory is watched. Let's first create a file named scratch.u with our inc definition in it.

inc x = x + 1

When saving this file, you can see the output of the ucm process.

I found and typechecked these definitions in scratch.u. If you do an `add` or `update`, here's
how your codebase would change:
  
  ⍟ These new definitions are ok to `add`:
    
    inc : Nat -> Nat

Indicating that our definition is valid.

Now, if we want to try our inc function, we can add a watch expression. To do so, just add the following line at the end of scratch.u (you have to include the '>' symbol).

> inc 10

On saving the file, ucm will spot it and display:

Now evaluating any watch expressions (lines starting with `>`)... Ctrl+C cancels.


    3 | > inc 10
          ⧩
          11

Since our inc function seems to work as intended, we can add it to the codebase.

At the ucm prompt just type add:

.> add 
  
  ⍟ I've added these definitions:
  
    inc : Nat -> Nat

Now that our inc function is in the codebase, we can delete the content of scratch.u.

We can ask ucm to find it like this:

.> find inc

  1. inc : Nat -> Nat
  2. base.Int.increment : Int -> Int
  3. base.List.distinct : [a] -> [a]
  4. base.Nat.increment : Nat -> Nat
  5. base.Store.modify.examples.increment : Nat

In the list of results, the first item is our function. We can ask ucm to view it.

.> view 1

  inc : Nat -> Nat
  inc x =
    use Nat +
    x + 1

This is it! Enhanced with its infered type declaration and import statement (+ function from module Nat).

The find operation is handful to explore the current codebase. For instance, we can search for all definitions of a given type:

.> find : Nat -> Nat

  1.  base.Nat.complement : Nat -> Nat
  2.  base.Nat.increment : Nat -> Nat
  3.  base.Nat.leadingZeros : Nat -> Nat
  4.  base.Nat.trailingZeros : Nat -> Nat
  5.  base.Nat.decrement : Nat -> Nat

Changing names

Let's define another useful function:

scratch.u

add2 x = inc (inc x)

Once add2 is added to our codebase (with the add command), we can retrieve its definition with the view command

.> view add2

  add2 : Nat -> Nat 
  add2 x = inc (inc x)

We suddenly decide that we want to change the name of the inc function to add1.

In any other langage we would had to navigate to the inc definition, change its name, then change all its usages accordingly, but in unison we only have to run this simple ucm command:

.> move.term inc add1

As we can see, the add2 definition has been correctly updated:

.> view add2

  add2 : Nat -> Nat 
  add2 x = add1 (add1 x)

No recompilation is needed neither, it just works!

Navigating the history

With immutability comes great benefits, as functional programmers often says.

Since our codebase is always a well-typed immutable value, we have time travel for free.

.> reflog

  Here is a log of the root namespace hashes, starting with the most recent, along with the command
  that got us there. Try:
  
    `fork 2 .old`             
    `fork #fpkooer5hq .old`   to make an old namespace accessible again,
                              
    `reset-root #fpkooer5hq`  to reset the root namespace and its history to that of the specified
                              namespace.
  
  1.  #fiviinv92c : move.term ..inc .add1
  2.  #1mlujphnkc : add
  3.  #cb6lu9ahad : add
  4.  #lqs21dco5d : pull https://github.com/unisonweb/base:.releases._latest .base
  5.  #7asfbtqmoj : (initial reflogged namespace)

Documentation

in Unison documentation is first class, it has type Doc and a dedicated literal.

As an example, let's write documentation for our inc function:

inc.docs = [:
  `@inc n` returns `n + 1`

  the source for @inc is
  @[source] inc

  the result of evaluating `(@inc 1)` is @[evaluate] inc.ex1

  its signature is:
  `@[signature] inc`

  :]

You can then display this docs in ucm:

.> display inc.docs

  `inc n` returns `n + 1`
  
  the source for inc is
  inc x =
    use Nat +
    x + 1
  
  the result of evaluating `(inc 1)` is 2
  
  its signature is: `inc : Nat -> Nat`

To get more details on documentation inner syntax please see this page.

One neat thing to keep in mind is that the names you use in your docs have the same behaviour in case of renaming. For instance if we rename inc to add1 with:

.> move.term inc add1

and then display the doc again:

.> display inc.docs

  `add1 n` returns `n + 1`
  
  the source for add1 is
  add1 x =
    use Nat +
    x + 1
  
  the result of evaluating `(add1 1)` is 2
  
  its signature is: `add1 : Nat -> Nat`

Every dynamic value (the one starting by @) are updated accordingly.

Abilities

Abilities lets you use the same ordinary Unison syntax for programs that do (asynchronous) I/O, stream processing, exception handling, parsing, distributed computation, and lots more...

Hello world!

scratch.u:

greet name = printLine ("Hello " ++ name ++ "!")

program = '(greet "world")

ucm:

.> run program

Hello world!

When saving our scratch.u file, we can see the type of the greet function:

greet : Text -> {IO} ()

which means that the greet function takes a Text as argument, needs the IO ability and return nothing (() or Unit).

The functions that we have seen up to this point are pure (inc and add2). If ucm display their type as Nat → Nat, their real type is Nat → {} Nat, indicating that no abilities were needed.

A function from type a to type b that needs several abilities A1, A2 and A3 will have the type like a → {A1,A2,A3} b

Motivation

There are three main motivations for abilities in Unison:

Make side effects visible in the type signature.
Decouple interface from implementation.
Customize control flow.

Lazyness

As you may have noticed in the hello world example, the '(greet "world") expression was prefixed by a single quote. This is syntax sugar for (_ → (greet "world")) which denotes a function taking () as argument and returning the result of evaluating (greet "world").

scratch.u :

expr = '(1 + 1)
-- is equivalent to 
expr = (_ -> (1 + 1))

-- and can be resumed/evaluated like this 
> expr ()
-- or with syntax sugar 
> !expr

This kind of quoting is often used in conjunction of abilities, as we have seen with the hello world example.

This is because only functions can use abilities:

Unison is a 'purely functional' language. That means that evaluation of terms cannot in itself be effectful. Having to add the ' delay to effectfully-computed values is a consequence of that. Any effectful code needs to pull its punches — it's not causing effects to happen during evaluation, but rather it's evaluating to a function which can then explain to a handler what effects it would like to be performed. We'll see how that handler in turn can only either (a) turn the requested effects into pure computations, ready for evaluation, or (b) pass the buck, by translating them to requests in another ability, yielding another function. If there are abilities like network or disk I/O, then the buck gets passed all the way to the very outside of the program, and into the Unison runtime system, using the IO ability. At no point does a Unison term's evaluation ever directly generate an effect.

the IO ability

IO is a special ability, built in to Unison. It is through IO that effectful Unison programs can actually interact with the outside world — writing to the console, reading from files and sockets, and any other behavior that goes beyond simply evaluating Unison expressions to a result. When you run your program (for example using ucm's run command), operations from the IO ability are handled by the Unison runtime system.

Asking ucm about IO:

.> view IO

  ability base.io.IO where
    getFileSize_ : FilePath ->{base.io.IO} Either Error Nat
    kill_ : ThreadId ->{base.io.IO} Either Error ()
    send_ : Socket -> Bytes ->{base.io.IO} Either Error ()
    bracket_ :
      '{base.io.IO} a -> (a ->{base.io.IO} b) -> (a ->{base.io.IO} c) ->{base.io.IO} Either Error c
    getLine_ : Handle ->{base.io.IO} Either Error Text
    getText_ : Handle ->{base.io.IO} Either Error Text
    getFileTimestamp_ : FilePath ->{base.io.IO} Either Error EpochTime
    closeFile_ : Handle ->{base.io.IO} Either Error ()
    getTemporaryDirectory_ : {base.io.IO} (Either Error FilePath)
    getCurrentDirectory_ : {base.io.IO} (Either Error FilePath)
    renameDirectory_ : FilePath -> FilePath ->{base.io.IO} Either Error ()
    renameFile_ : FilePath -> FilePath ->{base.io.IO} Either Error ()
    receive_ : Socket -> Nat ->{base.io.IO} Either Error (Optional Bytes)
    fileExists_ : FilePath ->{base.io.IO} Either Error Boolean
    isDirectory_ : FilePath ->{base.io.IO} Either Error Boolean
    directoryContents_ : FilePath ->{base.io.IO} Either Error [FilePath]
    listen_ : Socket ->{base.io.IO} Either Error ()
    closeSocket_ : Socket ->{base.io.IO} Either Error ()
    clientSocket_ : HostName -> ServiceName ->{base.io.IO} Either Error Socket
    delay_ : Nat ->{base.io.IO} Either Error ()
    seek_ : Handle -> SeekMode -> Int ->{base.io.IO} Either Error ()
    serverSocket_ : Optional HostName -> ServiceName ->{base.io.IO} Either Error Socket
    accept_ : Socket ->{base.io.IO} Either Error Socket
    setBuffering_ : Handle -> Optional BufferMode ->{base.io.IO} Either Error ()
    openFile_ : FilePath -> Mode ->{base.io.IO} Either Error Handle
    throw : Error ->{base.io.IO} a
    fork_ : '{base.io.IO} a ->{base.io.IO} Either Error ThreadId
    getBuffering_ : Handle ->{base.io.IO} Either Error (Optional BufferMode)
    position_ : Handle ->{base.io.IO} Either Error Int
    setCurrentDirectory_ : FilePath ->{base.io.IO} Either Error ()
    createDirectory_ : FilePath ->{base.io.IO} Either Error ()
    removeDirectory_ : FilePath ->{base.io.IO} Either Error ()
    removeFile_ : FilePath ->{base.io.IO} Either Error ()
    systemTime_ : {base.io.IO} (Either Error EpochTime)
    isFileEOF_ : Handle ->{base.io.IO} Either Error Boolean
    isFileOpen_ : Handle ->{base.io.IO} Either Error Boolean
    isSeekable_ : Handle ->{base.io.IO} Either Error Boolean
    putText_ : Handle -> Text ->{base.io.IO} Either Error ()

Other abilities

Abilities are declared in an abstract way like this:

ability Store v where
  get : v
  put : v -> ()

This declaration introduces a Store ability which contains 2 actions: get and put.

Unison has no builtin mutable state mechanism. So this ability is introducing one.

This declaration does not specify any implementation (it is abstract). One nice thing about this is that we can plug any implementation we need (depending on the use-case) without modifying the code that is using those actions.

In order to be able to use this ability, we have to handle it. In Unison this is done by an Ability handler.

If e is an expression containing action calls and h is an ability handler then handle e with h is the way to get the result of evaluating e.

An ability handler is a function that states how to handle each action we want to use in the handled code (e). A handler for the Store ability will have to take care of Store.get and Store.put individually. But we will delay this and start with a simpler ability.

Abort

One of the most basic ability one can imagine is the Abort ability which is able to cut the execution of a program.

ability Abort where 
  cut : a

Here is a handler for it:

abortHandler returnedValue = cases 
  -- handling the cut action
  {Abort.cut -> continuation} -> returnedValue
  -- Exit case
  {a} -> a

There is 2 branches in our cases statement, the first one is handling our cut action by ignoring the continuation and immediately returning returnedValue.

The first branch starts with the action pattern {Abort.cut → continuation} that matches our only action Abort.cut and specifies what to do when this action is encountered. the continuation binding is holding the rest of the program. Any action pattern can do many different things with it: ignore it, call it, call it several times, record it... In this case, we are simply ignoring it and immediately returning returnedValue (the only argument of our handler).

The second branch {a} → a is the exit branch, every ability handler have to have one. It is taken when all actions calls have been handled.

Let's try it by adding this watch expression:

> handle 
    x = 4
    Abort.cut
    x + 2 
  with 
    abortHandler 0

it evaluates to 0 as intended (our abortHandler was initiated with 0). If we remove the Abort.cut line it evaluates to 6.

So what has happened here?

The first expression was evaluated normally, x was bound to 4.
Then we hit an action that has to be handled, so we defer the control of the execution to our ability handler giving him access to the continuation of the program, the first branch of our handler is taken, the continuation is ignored and the handler first argument (0 in this case) is returned.
The x + 2 expression is never computed.

Store again

Let's take a look at a more interesting handler.

A handler for the Store ability may look like this

storeHandler storedValue = cases
  -- handling the get action
  {Store.get -> cont} ->
    handle cont storedValue with storeHandler storedValue
  -- handling the put action 
  {Store.put v -> cont} ->
    handle cont () with storeHandler v
  -- the exit case
  {a} -> a

In general an action pattern has the following structure:

{A.c p_1 p_2 p_n -> k}

Where A is the name of the ability, c is the name of the action, p_1 through p_n are patterns matching the arguments to the action, and k is a continuation for the program.

Here, our handler has 3 cases, 2 for our get and put actions plus the exit case.

The 1st and 2nd branches are using the handler recursively to handle the evaluation of the rest of the computation.

The handler is initiated with an argument (storedValue ) that we will use to keep some piece of state along handling, this is a very common pattern.

One really dummy example that uses the Store ability:

-- an helper function to update the value of our store
storeUpdate f =
  Store.put (f Store.get)

-- our dummy function that uses a mutable state 
-- in order to count the number of zeros present in a list
countZeros = cases
  -- non empty list case
  head +: tail ->
    -- first we check if the head is zero and incrementing our store value if so
    if (head == 0) then storeUpdate inc else ()
    -- then we recur with the tail of the list
    countZeros tail
  -- empty list case, we are using Store.get to obtain our stored value and return it
  [] -> Store.get

-- we are initiating the storeHandler with the value 0 
-- and handling the countZeros call with it
> handle countZeros [1,0,3,0,1,0]  
  with storeHandler 0 
-- it returns: 3

Let see what happens on the evaluation of this watch expression:

storeHandler 0 is returning a handler function that will be used to handle the countZeros [1,0,3,0,1,0] expression.
[1,0,3,0,1,0] is matching the first branch of the countZeros function, we are binding 1 to head and [0,3,0,1,0] to tail.
We test if head equals 0: this is not the case so the line evaluate to ().
On the next line, we are recurring with tail.
We still match the first branch, binding 0 to head and [3,0,1,0] to tail.
This time head equals 0. So we call storeUpdate with the inc function.
The storeUpdate function is calling Store.get.
Store.get is an action. So the control is passed to our handler.
The first branch of storeHandler is matching the Store.get action.
The continuation of the program has to be called with the value that the action is meant to return, here a Nat, that is the currently stored value (the argument passed to storeHandler initially: 0).
Since the rest of the computation may contains other actions we still have to handle, we use the same handler to do so (wrapping it around our continuation call with an handle expression).
Now we are back in the storeUpdate function, the value that we got with Store.get (0) is incremented by inc and the Store.put action is called with this new value (1).
Again, we are entering our handler, this time matching the second branch.
Like the first branch, we will resume the computation, this time passing () to the continuation since the Store.put is meant to return a value of type Unit (()). This time we will slightly change the handler to capture the new value that we want to store (1).
And it continues, incrementing our store each time a 0 is encountered until the list is empty.
When the list is empty, the second branch of countZeros is taken and we call Store.get a last time to return the result (in this particular case: 3).