Motivation
In Clojure(Script), when fast polymorphic functions are needed, we are typically using Protocols.
As an example, we will introduce a protocol for adding things together.
(defprotocol IPlus
(plus [x y] "add two things together"))
Then we extend this protocol to some types
(extend-protocol IPlus
clojure.lang.IPersistentVector
(plus [x y] (into x y))
clojure.lang.IPersistentMap
(plus [x y] (merge x y))
clojure.lang.ISeq
(plus [x y] (merge x y)))
If you are also targeting ClojureScript, you can use reader conditionals
(extend-protocol IPlus
#?(:clj clojure.lang.PersistentVector
:cljs cljs.core/PersistentVector)
(plus [x y] (into x y))
#?(:clj clojure.lang.IPersistentMap
:cljs cljs.core/PersistentArrayMap)
(plus [x y] (merge x y))
#?(:clj clojure.lang.ISeq
:cljs cljs.core/List)
(plus [x y] (concat x y)))
Those reader conditionals can become a bit noisy when this kind of code is strongly used in a project.
Let's check that everything is working as intended
(is (plus [1 2 3] [4 5 6])
[1 2 3 4 5 6])
(is (plus {:a 1 :b 2} {:b 22 :c 3})
{:a 1 :b 22 :c 3})
(is (plus (list 1 2) (list 1 2))
(list 1 2 1 2))
It seems to work! But in fact, this is not so easy in ClojureScript:
(plus (range 4) (list 1 2))
;; this fails, complaining there is no implementation for cljs.core/Range
You may think that I should extend ISeq
instead of cljs.core/List
? But it does not work, because ClojureScript does not have interfaces like Clojure. To obtain the same thing that we have in Clojure, in ClojureScript we have to write this:
(extend-protocol IPlus
cljs.core/PersistentVector
(plus [x y] (into x y))
;; duplicated impls to cover all IMap
cljs.core/PersistentArrayMap
(plus [x y] (merge x y))
cljs.core/PersistentHashMap
(plus [x y] (merge x y))
cljs.core/PersistentTreeMap
(plus [x y] (merge x y))
;; same here
cljs.core/List
(plus [x y] (concat x y))
cljs.core/Range
(plus [x y] (concat x y))
;; ... many other duplicates for covering ISeq
;; complete list of classes actually is:
;; ArrayNodeSeq ChunkedCons ChunkedSeq Cons Cycle ES6IteratorSeq
;; EmptyList IndexedSeq Iterate KeySeq LazySeq List NodeSeq PersistentArrayMapSeq
;; PersistentQueue PersistentQueueSeq PersistentTreeMapSeq RSeq Range RangeChunk Repeat ValSeq
)
Obviously, this is not really what we want... so i've wrote the Glycogen library
Proposition
If we forget about implementation details, the code defined above could be defined like this:
(defg plus [x y]
:map (merge x y)
:vec (into x y)
:lst (concat x y))
We immediately spot the point here, not bothering with platform specific stuff. The defg
macro is part of the glycogen.generics
namespace.
So, as you see, the classes of the target platforms (and reader conditionals) have disappeared and the problematic expression works as intended.
(is (plus (range 4) (list 1 2))
(list 0 1 2 3 1 2))
Of course this is a simple case. There is only one arity and no destructuration patterns (all cases are sharing the same argument vector pattern, this is not always what we want, sometimes one implementation need its own destructuring pattern. We will address these considerations at a later point. For now lets discuss those type-keywords that we are using to indicate the type(s) that each implementation belongs to. (e.g: :map
:vec
:lst
...)
To add glycogen to your project simply add glycogen {:mvn/version "0.1.1-SNAPSHOT"}
to your deps.edn
Hidding host classes
One of the stepping stone on which this library is built upon is a little wrapper around hosting platform's type hierarchies that acts as a compatibility layer and let us forget about it.
It resides in the glycogen.types
namespace.
(require '[glycogen.types :as t]
'[glycogen.state :as state])
The idea is really simple, there is one type register per target platform consisting of a map of type: keyword -> set of keyword or class-symbol
from clojure you can inspect it like this:
;; clojure's type registry
(t/get-reg)
;;=>
'{
;; primitives
:nil #{nil},
:num #{java.lang.Number},
:fun #{clojure.lang.Fn},
:lst #{clojure.lang.ISeq},
:vec #{clojure.lang.IPersistentVector},
:key #{clojure.lang.Keyword},
:sym #{clojure.lang.Symbol},
:str #{java.lang.String},
:link #{clojure.lang.MapEntry},
:set #{clojure.lang.IPersistentSet},
:map #{clojure.lang.PersistentArrayMap clojure.lang.PersistentHashMap},
;; aggregates
:line #{:lst :vec},
:word #{:key :sym :str},
:hash #{:set :map},
:atom #{:num :fun :key :sym :str :link},
:coll #{:lst :vec :set :map},
:prim #{:num :fun :lst :vec :key :sym :str :link :nil :set :map}}
To see the clojurescript type registry:
(state/targeting-cljs (t/get-reg))
;;=>
'{:nil #{nil}
:num #{number},
:fun #{function},
:key #{Keyword},
:sym #{Symbol},
:str #{string},
:link #{MapEntry},
:set #{PersistentTreeSet PersistentHashSet},
:map #{ObjMap PersistentHashMap PersistentTreeMap PersistentArrayMap},
:vec #{MapEntry BlackNode Subvec RedNode PersistentVector},
:lst #{IndexedSeq LazySeq PersistentTreeMapSeq NodeSeq PersistentArrayMapSeq ES6IteratorSeq ChunkedSeq
Cons Iterate RSeq ArrayNodeSeq Cycle ChunkedCons ValSeq Repeat PersistentQueueSeq EmptyList
PersistentQueue Range KeySeq List},
:hash #{:set :map},
:coll #{:lst :vec :set :map},
:line #{:lst :vec},
:word #{:key :sym :str},
:atom #{:num :fun :key :sym :str :link},
:prim #{:num :fun :lst :vec :key :sym :str :link :nil :set :map}}
Those type registry are certainly not complete but I think that's enough to see the point.
Along with the definitions of those registries, the glycogen.types
namespace is defining some handy functions and macros to play with types :
;; basic hierarchy informations
(t/childs :coll)
;; is getting all children of a type =>
'[:lst
:vec
:set
:map
clojure.lang.ISeq
clojure.lang.PersistentArrayMap
clojure.lang.IPersistentSet
clojure.lang.PersistentHashMap
clojure.lang.IPersistentVector]
(t/parents :fun)
;; is getting all parents of a type=>
'(:atom :prim)
(t/classes :coll)
;; is getting all classes the type is belonging to =>
'(clojure.lang.ISeq
clojure.lang.IPersistentVector
clojure.lang.IPersistentSet
clojure.lang.PersistentArrayMap
clojure.lang.PersistentHashMap)
;; set litterals can be used to denotes union types
(t/childs #{:lst :vec})
;; returns all children for types :lst and :vec =>
'(clojure.lang.ISeq clojure.lang.IPersistentVector)
(t/childof :vec :coll) ;=> :vec (indicating success)
(t/parentof #{:str :sym} :sym) ;=> #{:sym :str}
(t/parentof :word #{:str :sym}) ;=> :word
(t/parentof :word #{:str :vec}) ;=> false
;; isa
(t/isa :num 1) ;;=> 1
(t/isa :coll []) ;;=> []
;; glycogen.types/isa is a macro, the below call expands to a fast typecheck :
((clojure.core/fn
[G__4117]
(clojure.core/when
(clojure.core/or
(clojure.core/seq? G__4117)
(clojure.core/vector? G__4117)
(clojure.core/set? G__4117)
(clojure.core/map? G__4117))
G__4117))
[])
;; you can use set notation here to
(t/isa #{:sym :line} 'aze) ;=> 'aze
(t/isa #{:sym :line} (list 1 2 3))) ;=> '(1 2 3)
The glycogen.types
namespace is Clojure only, so you may wonder about Clojurescript... in fact it is not intended to expose anything at runtime in Clojurescript, it serves only at compile time when a macro is expanded, depending on the targeted platform, the type registry of Clojure or Clojurescript is used.
Usage
Now we've seen how glycogen.types
works we will return to our initial intent, which is to be able to define generic functions in a more concise and powerful way.
Let's recap the points in which protocols declarations and implementations could be improved:
- Be able to share implementations
- Support for variadic arity
- Avoiding to write reader conditionals
- Type less, grasp the intent faster
- Be able to clone an existing generic fonction
- Partial implementation
Let's first import the library:
(ns glycogen.demo
(#?(:clj :require :cljs :require-macros)
[glycogen.generics :as g]))
Declaration
(g/defg fmap
"apply one or several function to something"
([this f])
([this f & fs]))
Here we are declaring a fmap
generic function with a docstring a fixed arity and a variadic one.
Extension
(g/generic+ fmap
([x f] ;; arity 2
;; variadic arity
:vec (mapv f x)
;; set literal can be used to implement several types at once
#{:set :map} (into (empty x) (map f x))
;; the default case (if x does not implement fmap)
(f x))
([x f & fs] ;; variadic arity
;; we have only a default case here
(reduce fmap x (cons f fs))))
All at once
(g/defg fmap
"apply one or several function to something"
([x f]
:vec (mapv f x)
#{:set :map} (into (empty x) (map f x))
(f x))
([x f & fs]
(reduce fmap x (cons f fs))))
So in this exemple we have addressed several points
- variadic arity
- code deduplication
- hiding target platform details
Let's try fmap
:
(is (fmap [1 2 3] inc)
[2 3 4])
(is (fmap {:a 1 :b 2} (comp vec reverse))
{1 :a, 2 :b})
;; variadic arity
(is (fmap [1 2 3] inc inc)
[3 4 5])
One other thing that is I think a little annoying in clojure is that each implementer have to define all arities of the implemented generic. with glycogen's generics it is not mandatory, in the below exemple, the variadic arity has only a default case, it will be the implementation used for any type that implement only the arity 2 (in this case).
Let's add an implementation of fmap
for lists.
(g/generic+ fmap [x f]
:lst (map f x))
(is (fmap (range 4) inc)
'(1 2 3 4))
In clojure we would have been forced to give implementation for every arities of fmap
. Here it is not the case, we can verify it by trying to use the variadic arity of fmap
on a list.
(is (fmap (range 4) inc inc)
'(2 3 4 5))
We can even only overide the variadic arity if needed:
(g/generic+ fmap [x f & fs]
:lst
(do (println "smart variadic fmap")
(fmap x (apply comp (reverse (cons f fs))))))
(is (fmap (range 4) inc inc)
'(2 3 4 5)) ;; prints "smart variadic fmap"
Related operations
In addition to def-protocol
and extend-protocol
, in clojure we have related operations like extend-type
, defrecord
, reify
etc...
thing
In clojure and clojurescript we have reify
that creates an anonymous class that implements some protocols. Here we can do roughly the same with glycogen.generics/thing
.
(let [mything
;; we are creating an anonymous class that implement
;; the previously defined generics in a really dummy way
(g/thing (fmap [x f] [:fmaped x f])
(plus [x y] [:plused x y]))]
;; checks
(is (plus mything 1)
[:plused mything 1])
(is (fmap mything inc)
[:fmaped mything inc]))
fork
One other thing that may be useful sometimes is the ability to stole a generic from somewhere else and build a new one by overriding some parts of the original one (without altering the original one and the code that depends on it). For this we have the glycogen.generics/fork
operator.
(g/fork fmap ;; the generic that we are cloning/forking
tweaked-fmap ;; the name that will hold the copy
;; it takes the same body format as defg or generic+
;; here we are only overiding the arity 2 implementation for vectors
[x f]
:vec (do (println "tweaked fmap") (mapv f x)))
(tweaked-fmap [1 2 3] inc) ;; printing "tweaked fmap"
The tweaked-fmap
generic now exists on its own and is completly hermetic to fmap
further changes/extensions.
type+
There is something similar to extend-type
and its name is glycogen.generics/type+
Here we are extending the type :num
to our previously defined generics (plus
and fmap
)
(g/type+ :num
(fmap [x f] (println "fmaping num") (f x))
(plus [x y] (+ x y)))
(is (with-out-str
(is (fmap 1 inc)
2))
"fmaping num\n")
(is (plus 1 2)
3)
deft
One thing we are still missing is the ability to introduce new types the way deftype
or defrecord
do it.
The deft
macro is similar to defrecord
but let you implement generics.
We will define a :pair
type holding two fields car
and cdr
Along with defining a new record, the deft
macro defines some useful functions to work with your type:
- a constructor function (here
pair
) - a casting generic function (here
→pair
) that can be implemented by other types in order to cast into the defined type.
(g/deft :pair ;; the type tag
[car cdr] ;; the fields
;; generics implementations
;; note that the constructor function is available (pair)
(plus [_ y] (pair car (if cdr (plus cdr y) y)))
(fmap [_ f] (pair (f car) (when cdr (fmap cdr f)))))
(defn lst [& xs]
(reduce (fn [p x] (pair x p))
nil (reverse xs)))
(is (fmap (plus (lst 1 2 3) (lst 4 5 6))
inc)
(lst 2 3 4 5 6 7))
more about defg
precedence
The order of implementations matters, the semantics are similar to clojure/cond, the first implementation have priority on the laters.
(g/defg whoami [x]
:vec "I'm a vector"
:coll "I'm a collection")
(is (whoami [1 2])
"I'm a vector")
(is (whoami (list 1 2))
"I'm a collection")
bindings
As mentioned previously, sharing the binding pattern accross all the implementations of an arity is not always what we want. But in fact you can provides several times the same arity with different binding patterns.
To demonstrate this we will extend the ->pair
generic that convert something to a pair (and has been automatically declared by the deft
exemple above).
(g/generic+ ->pair
([[car & cdr]]
:coll (pair car cdr))
([x]
:pair x
(pair x nil)))
(is (->pair [1 2 3])
(->pair (list 1 2 3))
(pair 1 (list 2 3)))
(is (->pair 1)
(pair 1 nil))
Under the hood
In order for all of this to work we cannot map directly to clojure's protocol, I mean that in fact when defining a polyarity generic several protocols are defined, one for each arity and one for the variadic arity.
Let's take a look at the macro expansion of defg
'(do
;; first are doing some var cleaning, if the defined
;; generic already exists we are removing related vars (a common case in dev)
(do
(clojure.core/ns-unmap (quote glycogen.article) (quote plus))
(clojure.core/ns-unmap (quote glycogen.article) (quote p_plus_3))
(clojure.core/ns-unmap (quote glycogen.article) (quote Iplus_3))
(clojure.core/ns-unmap (quote glycogen.article) (quote p_plus_2))
(clojure.core/ns-unmap (quote glycogen.article) (quote Iplus_2)))
;; for each arity of the defined generic we are defining a protocol
(do
;; the arity 3 is holding our variadic arity
(clojure.core/defprotocol glycogen.article/Iplus_3 (p_plus_3 [a_5712 a_5713 a_5714]))
(clojure.core/defprotocol glycogen.article/Iplus_2 (p_plus_2 [a_5715 a_5716])))
;; we are wrapping all this in a function that will be the user calling interface
(clojure.core/defn
plus
;; the arity 2 is simply wrapping the arity 2 protocol
([a_5715 a_5716] (p_plus_2 a_5715 a_5716))
;; the variadic arity wraps the rest argument and uses the arity 3 protocol
([a_5712 a_5713 & a_5714] (p_plus_3 a_5712 a_5713 a_5714)))
;; for each implementation we are defining a var, it can serves several purposes,
;; one is to easily implement copying of generics across namespaces
;; one other is to be able to inline some implementations (in some compiler context, no-one wants to see those wierd names in code)
(do
(do
(clojure.core/ns-unmap (quote glycogen.article) (quote plus_2_IMPL_lst))
(clojure.core/defn plus_2_IMPL_lst ([x y] (concat x y))))
(do
(clojure.core/ns-unmap (quote glycogen.article) (quote plus_2_IMPL_vec))
(clojure.core/defn plus_2_IMPL_vec ([x y] (into x y))))
(do
(clojure.core/ns-unmap (quote glycogen.article) (quote plus_2_IMPL_map))
(clojure.core/defn plus_2_IMPL_map ([x y] (merge x y))))
;; note that a default case that throw a "missing implementation"
;; error is automatically defined when no default case in given by the user
(do
(clojure.core/ns-unmap (quote glycogen.article) (quote plus_2_IMPL_any))
(clojure.core/defn
plus_2_IMPL_any
([x y]
(glycogen.prelude/error
"missing implementation for generic: "
(quote plus)
"\npattern:\n"
(quote [x y])
"\nwhere:\n"
{(quote x) x, (quote y) y}))))
(do
(clojure.core/ns-unmap (quote glycogen.article) (quote plus_3_IMPL_any))
(clojure.core/defn plus_3_IMPL_any ([x y ys] (reduce plus (plus x y) ys)))))
;; then we are actually extending defined protocols (in clojurescript extend-type is used)
(do
(clojure.core/extend clojure.lang.ISeq glycogen.article/Iplus_2 {:p_plus_2 plus_2_IMPL_lst})
(clojure.core/extend clojure.lang.IPersistentVector glycogen.article/Iplus_2 {:p_plus_2 plus_2_IMPL_vec})
(clojure.core/extend clojure.lang.PersistentArrayMap glycogen.article/Iplus_2 {:p_plus_2 plus_2_IMPL_map})
(clojure.core/extend clojure.lang.PersistentHashMap glycogen.article/Iplus_2 {:p_plus_2 plus_2_IMPL_map})
(clojure.core/extend Object glycogen.article/Iplus_2 {:p_plus_2 plus_2_IMPL_any})
(clojure.core/extend Object glycogen.article/Iplus_3 {:p_plus_3 plus_3_IMPL_any}))
;; finally we are returning the main function
plus)
Further ideas
Generic function should maybe being able to be anonymous and used as lambdas are. For exemple:
(let [f (fg [x]
:coll [:coll x]
:atom [:atom x]
[:any x])]
(is (f {:a 1})
[:coll {:a 1}])
(is (f 1)
[:atom 1]))
But it brings several question to the table
- how can we simulate closure behavior? is it even possible?
- memory management