Glycogen

Motivation

In Clojure(Script), when fast polymorphic functions are needed, we are typically using Protocols.

As an example, we will introduce a protocol for adding things together.

(defprotocol IPlus
  (plus [x y] "add two things together"))

Then we extend this protocol to some types

(extend-protocol IPlus
   clojure.lang.IPersistentVector
   (plus [x y] (into x y))
   clojure.lang.IPersistentMap
   (plus [x y] (merge x y))
   clojure.lang.ISeq
   (plus [x y] (merge x y)))

If you are also targeting ClojureScript, you can use reader conditionals

(extend-protocol IPlus
  #?(:clj  clojure.lang.PersistentVector
     :cljs cljs.core/PersistentVector)
  (plus [x y] (into x y))
  #?(:clj  clojure.lang.IPersistentMap
     :cljs cljs.core/PersistentArrayMap)
  (plus [x y] (merge x y))
  #?(:clj  clojure.lang.ISeq
     :cljs cljs.core/List)
  (plus [x y] (concat x y)))

Those reader conditionals can become a bit noisy when this kind of code is strongly used in a project.

Let's check that everything is working as intended

(is (plus [1 2 3] [4 5 6])
    [1 2 3 4 5 6])

(is (plus {:a 1 :b 2} {:b 22 :c 3})
    {:a 1 :b 22 :c 3})

(is (plus (list 1 2) (list 1 2))
    (list 1 2 1 2))

It seems to work! But in fact, this is not so easy in ClojureScript:

(plus (range 4) (list 1 2)) 
;; this fails, complaining there is no implementation for cljs.core/Range

You may think that I should extend ISeq instead of cljs.core/List? But it does not work, because ClojureScript does not have interfaces like Clojure. To obtain the same thing that we have in Clojure, in ClojureScript we have to write this:

(extend-protocol IPlus
   cljs.core/PersistentVector
   (plus [x y] (into x y))
   ;; duplicated impls to cover all IMap
   cljs.core/PersistentArrayMap
   (plus [x y] (merge x y))
   cljs.core/PersistentHashMap
   (plus [x y] (merge x y))
   cljs.core/PersistentTreeMap
   (plus [x y] (merge x y))
   ;; same here
   cljs.core/List
   (plus [x y] (concat x y))
   cljs.core/Range
   (plus [x y] (concat x y))
   ;; ... many other duplicates for covering ISeq
   ;; complete list of classes actually is:
   ;; ArrayNodeSeq ChunkedCons ChunkedSeq Cons Cycle ES6IteratorSeq
   ;; EmptyList IndexedSeq Iterate KeySeq LazySeq List NodeSeq PersistentArrayMapSeq
   ;; PersistentQueue PersistentQueueSeq PersistentTreeMapSeq RSeq Range RangeChunk Repeat ValSeq
   )

Obviously, this is not really what we want... so i've wrote the Glycogen library

Proposition

If we forget about implementation details, the code defined above could be defined like this:

(defg plus [x y]
      :map (merge x y)
      :vec (into x y)
      :lst (concat x y))

We immediately spot the point here, not bothering with platform specific stuff. The defg macro is part of the glycogen.generics namespace.

So, as you see, the classes of the target platforms (and reader conditionals) have disappeared and the problematic expression works as intended.

(is (plus (range 4) (list 1 2))
    (list 0 1 2 3 1 2))

Of course this is a simple case. There is only one arity and no destructuration patterns (all cases are sharing the same argument vector pattern, this is not always what we want, sometimes one implementation need its own destructuring pattern. We will address these considerations at a later point. For now lets discuss those type-keywords that we are using to indicate the type(s) that each implementation belongs to. (e.g: :map :vec :lst ...)

To add glycogen to your project simply add glycogen {:mvn/version "0.1.1-SNAPSHOT"} to your deps.edn

Hidding host classes

One of the stepping stone on which this library is built upon is a little wrapper around hosting platform's type hierarchies that acts as a compatibility layer and let us forget about it.

It resides in the glycogen.types namespace.

(require '[glycogen.types :as t] 
         '[glycogen.state :as state])

The idea is really simple, there is one type register per target platform consisting of a map of type: keyword -> set of keyword or class-symbol

from clojure you can inspect it like this:

;; clojure's type registry
(t/get-reg)
;;=>
'{ 
   ;; primitives
	 :nil #{nil},
	 :num #{java.lang.Number},
	 :fun #{clojure.lang.Fn},
	 :lst #{clojure.lang.ISeq},
	 :vec #{clojure.lang.IPersistentVector},
	 :key #{clojure.lang.Keyword},
	 :sym #{clojure.lang.Symbol},
	 :str #{java.lang.String},
	 :link #{clojure.lang.MapEntry},
	 :set #{clojure.lang.IPersistentSet},
	 :map #{clojure.lang.PersistentArrayMap clojure.lang.PersistentHashMap},
	
	 ;; aggregates
	 :line #{:lst :vec},
	 :word #{:key :sym :str},
	 :hash #{:set :map},
	 :atom #{:num :fun :key :sym :str :link},
	 :coll #{:lst :vec :set :map},
	 :prim #{:num :fun :lst :vec :key :sym :str :link :nil :set :map}}

To see the clojurescript type registry:

(state/targeting-cljs (t/get-reg))
;;=>
 '{:nil #{nil}
   :num #{number},
   :fun #{function},
   :key #{Keyword},
   :sym #{Symbol},
   :str #{string},
   :link #{MapEntry},
   :set #{PersistentTreeSet PersistentHashSet},
   :map #{ObjMap PersistentHashMap PersistentTreeMap PersistentArrayMap},
   :vec #{MapEntry BlackNode Subvec RedNode PersistentVector},
   :lst #{IndexedSeq LazySeq PersistentTreeMapSeq NodeSeq PersistentArrayMapSeq ES6IteratorSeq ChunkedSeq 
          Cons Iterate RSeq ArrayNodeSeq Cycle ChunkedCons ValSeq Repeat PersistentQueueSeq EmptyList 
          PersistentQueue Range KeySeq List},
   
  :hash #{:set :map},
  :coll #{:lst :vec :set :map},
  :line #{:lst :vec},
  :word #{:key :sym :str},
  :atom #{:num :fun :key :sym :str :link},
  :prim #{:num :fun :lst :vec :key :sym :str :link :nil :set :map}}

Those type registry are certainly not complete but I think that's enough to see the point.

Along with the definitions of those registries, the glycogen.types namespace is defining some handy functions and macros to play with types :

;; basic hierarchy informations

(t/childs :coll)
;; is getting all children of a type =>
'[:lst
 :vec
 :set
 :map
 clojure.lang.ISeq
 clojure.lang.PersistentArrayMap
 clojure.lang.IPersistentSet
 clojure.lang.PersistentHashMap
 clojure.lang.IPersistentVector]

(t/parents :fun)
;; is getting all parents of a type=>
'(:atom :prim)

(t/classes :coll)
;; is getting all classes the type is belonging to =>
'(clojure.lang.ISeq
  clojure.lang.IPersistentVector
  clojure.lang.IPersistentSet
  clojure.lang.PersistentArrayMap
  clojure.lang.PersistentHashMap)

;; set litterals can be used to denotes union types
(t/childs #{:lst :vec})
;; returns all children for types :lst and :vec =>
'(clojure.lang.ISeq clojure.lang.IPersistentVector)

(t/childof :vec :coll) ;=> :vec (indicating success)

(t/parentof #{:str :sym} :sym) ;=> #{:sym :str}

(t/parentof :word #{:str :sym}) ;=> :word

(t/parentof :word #{:str :vec}) ;=> false

;; isa

(t/isa :num 1) ;;=> 1

(t/isa :coll []) ;;=> []

;; glycogen.types/isa is a macro, the below call expands to a fast typecheck :

((clojure.core/fn
  [G__4117]
  (clojure.core/when
    (clojure.core/or
      (clojure.core/seq? G__4117)
      (clojure.core/vector? G__4117)
      (clojure.core/set? G__4117)
      (clojure.core/map? G__4117))
    G__4117))
[])

;; you can use set notation here to

(t/isa #{:sym :line} 'aze) ;=> 'aze
(t/isa #{:sym :line} (list 1 2 3))) ;=> '(1 2 3)

The glycogen.types namespace is Clojure only, so you may wonder about Clojurescript... in fact it is not intended to expose anything at runtime in Clojurescript, it serves only at compile time when a macro is expanded, depending on the targeted platform, the type registry of Clojure or Clojurescript is used.

Usage

Now we've seen how glycogen.types works we will return to our initial intent, which is to be able to define generic functions in a more concise and powerful way.

Let's recap the points in which protocols declarations and implementations could be improved:

  • Be able to share implementations
  • Support for variadic arity
  • Avoiding to write reader conditionals
  • Type less, grasp the intent faster
  • Be able to clone an existing generic fonction
  • Partial implementation

Let's first import the library:

(ns glycogen.demo
  (#?(:clj :require :cljs :require-macros)
   [glycogen.generics :as g]))

Declaration

(g/defg fmap 
        "apply one or several function to something"
        ([this f])
        ([this f & fs]))

Here we are declaring a fmap generic function with a docstring a fixed arity and a variadic one.

Extension

(g/generic+ fmap

  ([x f] ;; arity 2

   ;; variadic arity
   :vec (mapv f x)

   ;; set literal can be used to implement several types at once
   #{:set :map} (into (empty x) (map f x))

   ;; the default case (if x does not implement fmap)
   (f x))



  ([x f & fs] ;; variadic arity

   ;; we have only a default case here
   (reduce fmap x (cons f fs))))

All at once

(g/defg fmap

  "apply one or several function to something"

  ([x f]
   :vec (mapv f x)
   #{:set :map} (into (empty x) (map f x))
   (f x))

  ([x f & fs]
   (reduce fmap x (cons f fs))))

So in this exemple we have addressed several points

  • variadic arity
  • code deduplication
  • hiding target platform details

Let's try fmap:

(is (fmap [1 2 3] inc)
    [2 3 4])

(is (fmap {:a 1 :b 2} (comp vec reverse))
    {1 :a, 2 :b})

;; variadic arity
(is (fmap [1 2 3] inc inc)
    [3 4 5])

One other thing that is I think a little annoying in clojure is that each implementer have to define all arities of the implemented generic. with glycogen's generics it is not mandatory, in the below exemple, the variadic arity has only a default case, it will be the implementation used for any type that implement only the arity 2 (in this case).

Let's add an implementation of fmapfor lists.

(g/generic+ fmap [x f]
  :lst (map f x))

(is (fmap (range 4) inc)
    '(1 2 3 4))

In clojure we would have been forced to give implementation for every arities of fmap. Here it is not the case, we can verify it by trying to use the variadic arity of fmapon a list.

(is (fmap (range 4) inc inc)
    '(2 3 4 5))

We can even only overide the variadic arity if needed:

(g/generic+ fmap [x f & fs]
  :lst
  (do (println "smart variadic fmap")
      (fmap x (apply comp (reverse (cons f fs))))))

(is (fmap (range 4) inc inc)
    '(2 3 4 5)) ;; prints "smart variadic fmap"

Related operations

In addition to def-protocol and extend-protocol, in clojure we have related operations like extend-type, defrecord, reify etc...

thing

In clojure and clojurescript we have reify that creates an anonymous class that implements some protocols. Here we can do roughly the same with glycogen.generics/thing.

(let [mything
      ;; we are creating an anonymous class that implement 
      ;; the previously defined generics in a really dummy way
      (g/thing (fmap [x f] [:fmaped x f])
               (plus [x y] [:plused x y]))]
  ;; checks
  (is (plus mything 1)
      [:plused mything 1])
  (is (fmap mything inc)
      [:fmaped mything inc]))

fork

One other thing that may be useful sometimes is the ability to stole a generic from somewhere else and build a new one by overriding some parts of the original one (without altering the original one and the code that depends on it). For this we have the glycogen.generics/fork operator.

(g/fork fmap ;; the generic that we are cloning/forking
        tweaked-fmap ;; the name that will hold the copy
        ;; it takes the same body format as defg or generic+
        ;; here we are only overiding the arity 2 implementation for vectors
        [x f]
        :vec (do (println "tweaked fmap") (mapv f x)))

(tweaked-fmap [1 2 3] inc) ;; printing "tweaked fmap"

The tweaked-fmap generic now exists on its own and is completly hermetic to fmap further changes/extensions.

type+

There is something similar to extend-type and its name is glycogen.generics/type+

Here we are extending the type :num to our previously defined generics (plus and fmap)

(g/type+ :num
         (fmap [x f] (println "fmaping num") (f x))
         (plus [x y] (+ x y)))

(is (with-out-str
      (is (fmap 1 inc)
          2))
    "fmaping num\n")

(is (plus 1 2)
    3)

deft

One thing we are still missing is the ability to introduce new types the way deftype or defrecord do it.

The deft macro is similar to defrecord but let you implement generics.

We will define a :pair type holding two fields car and cdr

Along with defining a new record, the deft macro defines some useful functions to work with your type:

  • a constructor function (here pair)
  • a casting generic function (here →pair) that can be implemented by other types in order to cast into the defined type.

(g/deft :pair ;; the type tag
        [car cdr] ;; the fields
        ;; generics implementations
        ;; note that the constructor function is available (pair)
        (plus [_ y] (pair car (if cdr (plus cdr y) y)))
        (fmap [_ f] (pair (f car) (when cdr (fmap cdr f)))))

(defn lst [& xs]
  (reduce (fn [p x] (pair x p))
          nil (reverse xs)))

(is (fmap (plus (lst 1 2 3) (lst 4 5 6))
          inc)
    (lst 2 3 4 5 6 7))

more about defg

precedence

The order of implementations matters, the semantics are similar to clojure/cond, the first implementation have priority on the laters.

(g/defg whoami [x]
        :vec "I'm a vector"
        :coll "I'm a collection")

(is (whoami [1 2])
    "I'm a vector")

(is (whoami (list 1 2))
    "I'm a collection")

bindings

As mentioned previously, sharing the binding pattern accross all the implementations of an arity is not always what we want. But in fact you can provides several times the same arity with different binding patterns.

To demonstrate this we will extend the ->pair generic that convert something to a pair (and has been automatically declared by the deft exemple above).

(g/generic+ ->pair 

      ([[car & cdr]] 
       :coll (pair car cdr))

      ([x] 
       :pair x 
       (pair x nil)))

(is (->pair [1 2 3])
    (->pair (list 1 2 3))
    (pair 1 (list 2 3)))

(is (->pair 1)
    (pair 1 nil))

Under the hood

In order for all of this to work we cannot map directly to clojure's protocol, I mean that in fact when defining a polyarity generic several protocols are defined, one for each arity and one for the variadic arity.

Let's take a look at the macro expansion of defg

'(do
   ;; first are doing some var cleaning, if the defined 
   ;; generic already exists we are removing related vars (a common case in dev)
  (do
    (clojure.core/ns-unmap (quote glycogen.article) (quote plus))
    (clojure.core/ns-unmap (quote glycogen.article) (quote p_plus_3))
    (clojure.core/ns-unmap (quote glycogen.article) (quote Iplus_3))
    (clojure.core/ns-unmap (quote glycogen.article) (quote p_plus_2))
    (clojure.core/ns-unmap (quote glycogen.article) (quote Iplus_2)))
   
   ;; for each arity of the defined generic we are defining a protocol
  (do
    ;; the arity 3 is holding our variadic arity
    (clojure.core/defprotocol glycogen.article/Iplus_3 (p_plus_3 [a_5712 a_5713 a_5714]))
    (clojure.core/defprotocol glycogen.article/Iplus_2 (p_plus_2 [a_5715 a_5716])))
   
   ;; we are wrapping all this in a function that will be the user calling interface
  (clojure.core/defn
    plus
    ;; the arity 2 is simply wrapping the arity 2 protocol
    ([a_5715 a_5716] (p_plus_2 a_5715 a_5716))
    ;; the variadic arity wraps the rest argument and uses the arity 3 protocol
    ([a_5712 a_5713 & a_5714] (p_plus_3 a_5712 a_5713 a_5714)))
   
   ;; for each implementation we are defining a var, it can serves several purposes, 
   ;; one is to easily implement copying of generics across namespaces
   ;; one other is to be able to inline some implementations (in some compiler context, no-one wants to see those wierd names in code)
  (do
    (do
      (clojure.core/ns-unmap (quote glycogen.article) (quote plus_2_IMPL_lst))
      (clojure.core/defn plus_2_IMPL_lst ([x y] (concat x y))))
    (do
      (clojure.core/ns-unmap (quote glycogen.article) (quote plus_2_IMPL_vec))
      (clojure.core/defn plus_2_IMPL_vec ([x y] (into x y))))
    (do
      (clojure.core/ns-unmap (quote glycogen.article) (quote plus_2_IMPL_map))
      (clojure.core/defn plus_2_IMPL_map ([x y] (merge x y))))
    
    ;; note that a default case that throw a "missing implementation" 
    ;; error is automatically defined when no default case in given by the user
    (do
      (clojure.core/ns-unmap (quote glycogen.article) (quote plus_2_IMPL_any))
      (clojure.core/defn
        plus_2_IMPL_any
        ([x y]
         (glycogen.prelude/error
           "missing implementation for generic: "
           (quote plus)
           "\npattern:\n"
           (quote [x y])
           "\nwhere:\n"
           {(quote x) x, (quote y) y}))))
    (do
      (clojure.core/ns-unmap (quote glycogen.article) (quote plus_3_IMPL_any))
      (clojure.core/defn plus_3_IMPL_any ([x y ys] (reduce plus (plus x y) ys)))))
   
   ;; then we are actually extending defined protocols (in clojurescript extend-type is used)
  (do
    (clojure.core/extend clojure.lang.ISeq glycogen.article/Iplus_2 {:p_plus_2 plus_2_IMPL_lst})
    (clojure.core/extend clojure.lang.IPersistentVector glycogen.article/Iplus_2 {:p_plus_2 plus_2_IMPL_vec})
    (clojure.core/extend clojure.lang.PersistentArrayMap glycogen.article/Iplus_2 {:p_plus_2 plus_2_IMPL_map})
    (clojure.core/extend clojure.lang.PersistentHashMap glycogen.article/Iplus_2 {:p_plus_2 plus_2_IMPL_map})
    (clojure.core/extend Object glycogen.article/Iplus_2 {:p_plus_2 plus_2_IMPL_any})
    (clojure.core/extend Object glycogen.article/Iplus_3 {:p_plus_3 plus_3_IMPL_any}))
   
   ;; finally we are returning the main function 
  plus)

Further ideas

Generic function should maybe being able to be anonymous and used as lambdas are. For exemple:

(let [f (fg [x] 
            :coll [:coll x] 
            :atom [:atom x]
            [:any x])]
  (is (f {:a 1})
      [:coll {:a 1}])
  (is (f 1)
      [:atom 1]))

But it brings several question to the table

  • how can we simulate closure behavior? is it even possible?
  • memory management