Getting Started with Category Theory

July 1, 2024

Category theory is a fascinating and unreasonably powerful tool for thinking about computation, transformation, and relationships between things in the world. That makes it useful for mathematicians, computer scientists, and philosophers alike. I've struggled through learning a decent chunk of the theory on my own (meaning, no university courses), and now that I have a couple years of that struggle behind me, I thought I should document some of the really useful intuitions and perspectives on category theory that made everything click for me but which aren't really discussed in any of the materials I came across. This post will be a bit long, acting as a whirlwind tour of many important concepts.

I often find that the most useful and time-saving thing I do when learning category theory is to get out a pencil and paper and sketch the concepts in any way that makes sense to me. The learning benefits massively from the combination of seeing things organized two-dimensionally and figuring out enough to be able to draw anything in the first place. I recommend you draw out example diagrams and tables as you read through this post, though I'll try to make that as unnecessary as I can. With this many ideas, it can be valuable to take your time getting through the post. I also plan on making a little wiki of concepts, recording some videos, and publishing some parts of this post as subposts that are more digestible, but having all this here in a post is great for control-F/command-F searching for things, and for having a pedagogical order of concepts.

I know not all my explanations here will sufficiently answer all questions; that's the nature of the medium. But hopefully I give enough direction for searching online (or in books) for any answers you might need. At the end of the post I mention some resources that have taught me almost everything I know. To get a sense of what today's mathematicians are actually doing in category theory research, though, you have to read papers; struggling through papers is a major source of intuition and depth for me. Examples of papers with fascinating category theory are Adjunction Models For Call-By-Push-Value With Stacks, Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire, and Recovering Purity with Comonads and Capabilities, all of which are beyond what is discussed here but which I'd love to write posts on in the future.

This post assumes some light familiarity with set theory, such as how a function in set theory is a set of pairs, or how there's no set of all sets. I often use the term "collection," implicitly referring to set-theoretical issues such as the possibility of needing to use classes instead of sets in some situations. Otherwise, this post tries to be fairly self-contained. (Hence its length.)

Without further ado, let's get started.

What's a category?

If you feel like you already know what a category is, you might still get value from this section, as I tried to fill it with all the intuition, philosophy, advice, and perspective that has helped me so far.

Informally, a category is just a collection of things, along with a particular relationship between these things. For example, imagine the collection of all possible states our world could be in (Einstein being the American president, dogs not existing, and any other wacky thing you can come up with). In any particular world \(W_1\), there are other worlds \(W_2\) that we can "get to" if some sequence of events were to occur, transforming \(W_1\) into \(W_2\). At the very least, every state can "get to" itself by "doing nothing." For example, I can get from a world where I have an apple to a world where I don't by eating the apple, or giving it away, or throwing it in a lake, etc.

Formally, a category \(C\) is a collection of objects (often written \(\texttt{ob}(C)\)) and a collection of arrows (often called "morphisms," each starting at an object and ending at an object). To be a valid category, there are some requirements these collections need to satisfy. It's very helpful to draw these on paper. For any object \(X\) in \(C\), there must be a morphism from \(X\) to \(X\) (drawn as a little loop at \(X\)) that "does nothing" (typically written \(id_X\) or \(1_X\) but sometimes unfortunately written as just \(X\) when what is meant is "clear from context"). This is called an "identity arrow," "identity on \(X\)," or "identity of \(X\)." Also, for any objects \(X\), \(Y\), and \(Z\) in \(C\), for any two morphisms \(f:X\to Y\) and \(g:Y\to Z\), there must be a morphism \(g\circ f:X\to Z\) (pronounced "\(g\) after \(f\)") that is exactly the same as following \(f\) from \(X\) to \(Y\) and from there following \(g\) to finally get to \(Z\). This is called the composition of \(f\) and \(g\), and the composition operator \(\circ\) must be associative. These requirements are drawn formally with the following diagrams. These diagrams are called "commutative diagrams" or "commuting diagrams," meaning that any two paths from one object to another must be equal (one says the diagram "commutes"). Therefore, each diagram is visually representing a bunch of equations, such as \(f\circ id_X=f\).
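If you're a programmer, here's a tiny Haskell sketch of these requirements, mirroring the Control.Category class from the base library (functions are the canonical instance):

```haskell
import Prelude hiding (id, (.))

-- A sketch of the category axioms as a typeclass, mirroring
-- Control.Category from Haskell's base library.
class Category cat where
  id  :: cat a a                        -- the identity arrow on each object
  (.) :: cat b c -> cat a b -> cat a c  -- composition: "g after f"

-- Functions form the canonical instance: objects are types, arrows are functions.
instance Category (->) where
  id x = x
  (g . f) x = g (f x)

-- Laws the instance must satisfy (checked by us, not the compiler):
--   f . id == f   and   id . f == f    -- identity
--   h . (g . f) == (h . g) . f         -- associativity
```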

These requirements aren't too difficult to satisfy; categories show up all over the place. Any transitive, reflexive relation (these are called "preorders") gives rise to a category, including \(\leq\) on the integers. So for any integer, say \(7\), it's less than or equal to itself (\(7\leq 7\)), giving the identity arrows. And for any three integers \(a\), \(b\), and \(c\), if \(a\leq b\) and \(b\leq c\), then \(a\leq c\), giving composition. And composition here is automatically associative: since there's at most one arrow between any two integers, any two ways of composing a chain of \(\leq\)s must produce the same arrow.

You'll also find a category in any "monoid," that is, an associative binary operation (like \(+\)) on a set (like the integers) which has a unit for the operation (like \(0\)). You can form a category with a single object \(M\) where the arrows all loop from \(M\) back to \(M\), representing the elements of the set (such as the integers). So for example, \(1:M\to M\), \(2:M\to M\), etc. \(M\) is called a "monoid object." This is a proper category because any two of these arrows can be composed using the binary operation, which we've required to be associative. For example, \(3+4:M\to M\). And there's an arrow for the unit (such as \(0\), for \(+\)) that behaves as the identity arrow of \(M\).
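Here's that one-object category as a quick Haskell sketch; the names Arrow, compose, and identityArrow are just mine for illustration:

```haskell
-- The addition monoid on integers as a one-object category: the single
-- object M is implicit, every arrow is an integer looping on M,
-- composition is (+), and the identity arrow is 0.
newtype Arrow = Arrow Integer deriving Show

compose :: Arrow -> Arrow -> Arrow
compose (Arrow g) (Arrow f) = Arrow (g + f)   -- g . f

identityArrow :: Arrow
identityArrow = Arrow 0                       -- id_M
```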

*Psst-- Draw those examples out on a sheet of paper! Show some example diagrams! It really helps ideas crystallize. Here's one:*

It's been said in some category theory learning material that I can't find now that a category is something like a bridge between the definition of a preorder and the definition of a monoid, capturing each in a way that extends the other. This is an informal idea but I found it helpful. A preorder is a category with multiple objects but at most one arrow between each pair of objects, and a monoid is a category with multiple arrows but only one object. Bartosz Milewski (a popular category theory educator) suggested that a category can also be seen as a "proof-relevant" preorder, meaning objects can be related to each other and themselves in multiple ways that are seen as distinct, as opposed to typical relations, which either relate two given objects or don't.

In programming, we typically use a slightly different-looking category. Namely, we typically use categories where the objects are types (or some similar collection, like sets) and the arrows are functions (or function-like in some way). That's why I've been naming objects uppercase and naming morphisms starting from "f." This particular kind of category has certain properties right out of the gate. The "relationships" between objects aren't propositions (like \(4\leq 5\)) but transformations (like \(f(x)=x+1\)). We often have values within objects (since the objects are collections), which wouldn't make sense with an object like \(7\). That means there are multiple distinct values, like \(7\) or \(8\), that are different from each other but represented by the same object in the category (say, int). Category theory can't talk about these values directly (it can hardly say things about even the objects, as it generally uses morphisms for all reasoning), though if you have something like a singleton set \(S\) then the set of all functions from \(S\) to any set \(X\) is equivalent to the set \(X\) itself, and gives a way of talking about the elements of \(X\) in category theory.

Categories where the objects are collections are EXTREMELY powerful, and this is the main reason we say a category has a "collection" of objects instead of a "set:" we'd like to be able to construct categories where the objects are all the sets, and since there's no set-of-all-sets, this category's objects don't form a set. "Small" categories are ones where the objects form a set, and "large" categories are the rest, and include many interesting and powerful categories I'll discuss later on.

I think these categories-of-collections are also very useful and underappreciated for analytic philosophers, which is why the first example category I gave was an analysis of possible worlds. That category is an example where objects have an enormous amount of information (the entire state of the world), and arrows are transformations (sequences of events).

Another philosophical point is that we try whenever possible to avoid testing two objects for equality. Equality is seen as a philosophically problematic or even potentially ill-defined notion in the full generality of category theory. So, for example, it's important to talk about the composition law as "given objects \(X\), \(Y\), \(Z\), and morphisms \(f:X\to Y\) and \(g:Y\to Z\), there's a morphism \(g\circ f:X\to Z\) such that etc. etc." Instead of, "given morphisms \(f:X\to Y\) and \(g:Z\to W\), if \(Y=Z\) then there's a morphism \(g\circ f:X\to W\) etc." or anything like that. It's useful to note that there's nothing saying that \(X\), \(Y\), and \(Z\) have to be different from each other. Recall the monoid-based category, with only one object \(M\), where composition was between two arrows \(f:M\to M\) and \(g:M\to M\). The rules for identity arrows, like \(id_Y\circ f=f\;\) (for some \(f:X\to Y\)), certainly apply when \(X\) and \(Y\) are the same object.

A final philosophical point that's very important for getting any further in category theory is that one should be thinking in terms of the sets of morphisms from an object to another. These sets are called "homsets" and are written \(C(X,Y)\) for the set of arrows starting at \(X\) and ending at \(Y\) in the category \(C\). These sets are also written \(\texttt{Hom}(X,Y)\). Category theory never wants to be talking about the details of a particular object; these should be seen as opaque and uninformative, even if they have a rich internal structure. Instead, the philosophy of category theory is that objects are completely described by the arrows going to and from them, and these arrows should be what you refer to when you want to say something about an object. Note that homsets run into the same "large" and "small" issue that collections of objects do: when talking about the homsets of a category, one uses the term "local," so a "locally small" category is one where the collection of arrows between any two objects is always a set. Advanced category theory often "enriches" categories, a term that means using something besides sets for these collections, such as categories, but I'm getting ahead of myself!

In many cases, the arrows of a category will also have some representation as objects in that same category. For example, in a category of sets and functions, functions are also sets, so they may be present as both arrows and objects in the category. Categories where every arrow has a corresponding object are called "closed," though this adjective doesn't appear by itself because there are multiple interesting formal definitions to pick between. The object of an arrow \(f:X\to Y\) is called an "exponential" or "exponential object" and is written \(Y^X\). This notation comes from the idea that if \(X\) and \(Y\) are sets, the number of elements (functions) in the set \(X\to Y\) is equal to the number of elements in \(Y\) raised to the power of the number of elements in \(X\), which is kinda fun I think. I won't properly define exponential objects until later on when I discuss products and universal constructions, but just knowing about them informally helps one realize the depth with which one can think about arrows in a category.

Category theory on category theory

Now we have a bunch of definitions and intuition about what a category is, but the rest of the results of category theory are still impenetrable. That is, until we start thinking about shifting perspectives. Notice that when I gave an example of a category based on a preorder, the objects were the set of all integers. Then when I gave an example of a category based on a monoid, the morphisms were the set of all integers. It's the same set, but in one category they're objects and in another they're arrows! These kinds of shifts between different categories that describe the same thing in very different ways are all over the place in category theory.

A particularly powerful example of this is categories where the objects are categories. These categories mean that when you're doing some work with a category, you're also working in larger categories where your category is an object, and those categories have arrows to and from your category that you can actually use to help with your reasoning. The primary example of this is called \(\texttt{Cat}\), a large category where the objects are all small categories. The arrows in \(\texttt{Cat}\) are called "functors," and they're functions (really mappings, in the event of a large or locally large category where they can't be functions) mapping objects of some category \(C\) into objects of some category \(D\), and mapping the arrows in \(C\) to arrows in \(D\). Functors also get an extra restriction: functors must preserve commuting diagrams. What this means is that composition and identity must be preserved: \(F(g\circ f)=Fg\circ Ff\) and \(F(id_X)=id_{FX}\). (Category theory doesn't do parentheses for applying mappings to things; \(Fg\) might be written \(F(g)\) in what you're used to.) That preservation condition is an important idea and worth thinking about a little, and hopefully will become more clear in the coming paragraphs. It makes functors a sort of homomorphism, if you're familiar with group theory or related fields of abstract algebra. In more friendly language, functors are required to preserve the structure of a category. So if I have some diagram in my category \(C\) then a functor \(F:C\to D\) gives me a diagram in \(D\) that can be drawn in a very similar way:

This does not mean that if \(X\), \(Y\), and \(Z\) are different from each other, then \(FX\), \(FY\), and \(FZ\) are too. It only means that, assuming the first diagram commutes, the second one does too. \(FX\) could be equal to \(FY\), and \(Fp\) could even just be \(id_{FX}\)! So long as commuting diagrams are mapped to diagrams that also commute. One example of a functor that I like is from the monoid-based category of addition on natural numbers to the monoid-based category of multiplication on natural numbers. The single monoid object gets mapped into the other single monoid object, of course. There are a bunch of functors to choose from here, but I like powers of 2 so I'll map every morphism \(n\) in the addition category to the morphism \(2^n\) in the multiplication category. That's a functor because if I compose two morphisms in the addition category (say, \(2+3\)) then the functor does the right thing in the multiplication category: the image of the composite, \(2^{2+3}\), is the composite of the images, \(2^2\cdot 2^3\). Functors are often written with a dash in the place of the parameter, so this functor would be written \(2^-\). Here's a diagram to help illustrate:

This represents how exponentiation turns addition into multiplication! Composition in these categories is addition and multiplication respectively, and the functor is raising 2 to a power, so the functors-preserve-composition equation \(F(g\circ f)=Fg\circ Ff\) becomes \(2^{n+m}=2^n\cdot 2^m\), a rule for exponentials we learn in high school!
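Here's that preservation-of-composition check as plain arithmetic, in a tiny Haskell sketch (expTwo and check are just names I made up):

```haskell
-- The action of the functor 2^- on arrows (natural numbers):
expTwo :: Integer -> Integer
expTwo n = 2 ^ n

-- Preservation of composition: compose in the addition category first,
-- or apply the functor and then compose in the multiplication category;
-- either way you get the same arrow.
check :: Bool
check = expTwo (2 + 3) == expTwo 2 * expTwo 3   -- 32 == 32, so True
```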

I find it helpful to remember that functors are just arrows in \(\texttt{Cat}\). They're just transformations from collections of objects and morphisms to other collections of objects and morphisms, and we've attached a restriction to them that we've found useful and commonly-satisfied. A category of all small categories could be created where the arrows aren't functors, and functors aren't some new axiomatic thing in category theory. It's still all just objects and arrows, and functors are just an example of morphisms. They just feel different when we use them because we're not working in \(\texttt{Cat}\) but within some object in \(\texttt{Cat}\), so the arrows of \(\texttt{Cat}\) have a sort of strange (but useful!) effect.

In programming, as I mentioned before, we often deal with categories of collections. So an object (say, a set) contains elements within it. It's very important to keep in mind that functors just map objects to objects and can't touch the elements within at all. A functor might map the set \(\{4\}\) to the empty set \(\{\}\), and note that there isn't any function from \(\{4\}\) to the empty set, since there's no way to transform \(4\) into nothing! The real function is on the collections of objects (and morphisms), of which \(\{4\}\) and \(\{\}\) are just regular old members. This fact makes functors useful in programming. An "endofunctor" is a looping morphism in \(\texttt{Cat}\), so it goes from a category back to the same category. A popular example in programming is the List functor, which maps any type T into the type of lists whose elements are of type T, called something like List<T>. Again, T could be void and this mapping is still perfectly fine, because we're just transforming types, not the values of those types. List<void> is a perfectly valid type and has exactly one value, namely []. The List functor maps morphisms (functions) \(f:T\to U\) to \(\texttt{map}\;f:\texttt{List}\;T\to\texttt{List}\;U\), functions which apply \(f\) to each element in the list to produce a new list. That's something we do a lot in functional programming and I think that's awesome!
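In Haskell, this is exactly the standard Functor class, whose laws are the preservation conditions from earlier; for lists, fmap is map. A small sketch:

```haskell
-- The shape of Haskell's standard Functor class: the type constructor f
-- is the action on objects, fmap is the action on arrows.
--   class Functor f where
--     fmap :: (a -> b) -> f a -> f b
-- Laws (the preservation conditions): fmap id == id
--                                     fmap (g . f) == fmap g . fmap f

bumpAll :: [Int] -> [Int]
bumpAll = map (+ 1)           -- map is fmap specialized to lists

example :: [Int]
example = bumpAll [1, 2, 3]   -- [2, 3, 4]
```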

\(\texttt{Cat}\) has exponential objects too (objects representing arrows), called "functor categories." Remember, the objects of \(\texttt{Cat}\) are categories, so exponential objects are categories. A functor category representing the arrows in \(\texttt{Cat}\) from a category \(C\) to a category \(D\) is written \([C,D]\) (or, in typical exponential notation, \(D^C\)). The objects in such a category are functors from \(C\) to \(D\). The arrows transform one functor into another, and are called "natural transformations." Similar to how functors are like functions with an extra restriction that makes them useful, natural transformations are transformations that satisfy a "naturality condition," hence calling them "natural." Formally, for two functors \(F,G:C\to D\), a natural transformation \(\alpha:F\Rightarrow G\) is a family of morphisms in \(D\) transforming "outputs" of \(F\) into "outputs" of \(G\). Namely, \(\alpha_X:FX\to GX\). So going back to the example of an addition monoid-based category \(M_+\) and a multiplication monoid-based category \(M_\cdot\), \(2^-,3^-:M_+\to M_\cdot\) are two functors (each doing exponentiation) and if I map all the powers of two in \(M_\cdot\) to the corresponding powers of three then that transforms \(2^-\) into \(3^-\). The naturality condition that natural transformations must satisfy is depicted below as a diagram that must commute, and also illuminates how the transformation works:

That is, \(Gf\circ\alpha_X=\alpha_Y\circ Ff\). Note that this entire diagram takes place in \(D\), and the only mention of \(C\) are \(X\) and \(Y\), which are objects in \(C\).

I've heard many times that being able to talk about and study natural transformations is the original "point" of category theory. Note once more that natural transformations, transforming functors into other functors, aren't some new special feature we just added to category theory. They're just arrows in categories! Not only that, but their abstract meaning in a functor category is more concrete in the categories functors act on, where they are just families of arrows. In functional programming we're quite comfortable with families of morphisms: they're (parametrically) polymorphic functions! Indeed, it's been proven that a parametrically-polymorphic function f<T>: F<T> -> G<T> (or, in a more Haskelly notation, f :: F a -> G a) satisfies the naturality condition and is a natural transformation if F and G are functors. So the abstract-ness of natural transformations gets concrete pretty quickly in, say, a language like Haskell.
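For a concrete example, the classic safeHead idiom is a natural transformation from the list functor to Maybe:

```haskell
-- One polymorphic function = one family of arrows, indexed by the type a.
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

-- The naturality condition, which parametricity gives us for free:
--   fmap f . safeHead == safeHead . map f
```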

One of the main reasons natural transformations felt like arcane magic to me for a long time was that people often draw them as arrows between arrows, like so:

This makes it appear as though there's some new thing we're allowed to do with categories, while in reality a natural transformation is just another regular morphism in some category somewhere. This notation does, however, help illustrate a complication with natural transformations: there are two different ways to compose them! "Vertical" composition is the kind you might expect:

But there's also "horizontal" composition, which I could draw either of two ways:

This is a powerful form of composition that gets used a lot but one that took me a while to wrap my head around. A composition of functors \(H\circ F\) is just some functor out there, and so is \(K\circ G\). They may behave quite differently from \(H\) or \(F\) or \(K\) or \(G\). It seems strange that transformations from \(F\) to \(G\) and \(H\) to \(K\) would be enough to find a natural transformation between these two mysterious functors. But indeed they are! Consider the natural transformation written "pointwise" or "by components:" \(\gamma_X:H(FX)\to K(GX)\). (\(\gamma\) is what I'm calling the natural transformation \(\beta\circ \alpha\).) If we think just for a moment about cases where objects are types and arrows are functions, this is a polymorphic function that takes a value of type H (F a) (using a Haskelly notation) and returns a value of type K (G a). Available to us are two polymorphic functions alpha: F a -> G a and beta: H a -> K a. So we can define \(\gamma_X\) as gamma hfa = beta (fmap alpha hfa) in Haskell, or \(\gamma_X=\beta_{GX}\circ H\alpha_X\) in category theory notation. You could have instead done \(\gamma_X=K\alpha_X\circ \beta_{FX}\), and because of the naturality condition these two definitions are equal. So we define the horizontal composition \(\beta\circ\alpha=\gamma\).
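Here's a concrete horizontal composition in Haskell, picking convenient functors (\(F\) = lists, \(G\) = Maybe, \(H\) = Maybe, \(K\) = lists); the names are mine:

```haskell
import Data.Maybe (maybeToList)

-- alpha is the safeHead from the previous example:
alpha :: [a] -> Maybe a          -- alpha : F => G, with F = [], G = Maybe
alpha []      = Nothing
alpha (x : _) = Just x

beta :: Maybe a -> [a]           -- beta : H => K, with H = Maybe, K = []
beta = maybeToList

-- The horizontal composite gamma : H . F => K . G:
gamma :: Maybe [a] -> [Maybe a]
gamma = beta . fmap alpha        -- equal to fmap alpha . beta, by naturality
```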

Now one last tricky thing is that if we have two ways to compose, is the following diagram potentially ambiguous?

We've got a square of natural transformations here, and it's unclear if we horizontally compose into a column first and then vertically compose, or if we vertically compose into a row first and then horizontally compose. Luckily, the naturality condition of natural transformations guarantees that either way produces the same result! Either way you end up with the same \(K\circ F\to M\circ H\) natural transformation. This is called the Interchange Law, and it has some pretty cool consequences I'll get to in future posts.

A popular example of natural transformations in a language like Haskell is monads. Note that my explanation here won't help much with understanding how to use monads in your code; I'm just discussing the math. A monad is an endofunctor \(T:C\to C\) with two natural transformations \(\eta:id_C\Rightarrow T\) and \(\mu:T\circ T\Rightarrow T\). There are also some conditions to satisfy, but let's start by unpacking just that much! \(id_C\) is an identity arrow in \(\texttt{Cat}\), so it's a functor that maps every object and morphism in \(C\) into that same object or morphism. \(T\circ T\) is the composition of \(T\) with itself, which is a thing you can do when an arrow loops from an object back to the same object (remember, \(T\) is an endofunctor, aka a looping arrow in \(\texttt{Cat}\)). So we've got three functors we're talking about here (\(id_C\), \(T\), and \(T\circ T\)) and two natural transformations to get between them. Now, the restrictions on these natural transformations are nicely illustrated in two commuting diagrams:

Note that the compositions labeling the arrows in these diagrams are compositions of natural transformations. In particular, they are horizontal compositions, which you can only really know by context. Recall that \(id_C\circ T=T\), so \(\eta\) transforms the implicit \(id_C\) into \(T\) and \(id_T\) (an identity arrow in the functor category \([C,C]\)) does nothing to \(T\), so horizontally composed into \(\eta\circ id_T\) they transform \(T\) into \(T\circ T\). If the arrow were a vertical composition instead, then it would mean \(id_T\) would transform \(T\) into \(T\) and then you'd do \(\eta\) on \(T\), which makes no sense because \(\eta:id_C\to T\) isn't an arrow coming out of \(T\). In the second diagram, \(id_T\circ\mu:T\circ T\circ T\to T\circ T\) might be more clear when written as \(id_T\circ\mu:T\circ(T\circ T)\to T\circ T\), which is equal because composition is associative.

Sometimes writing natural transformations as \(\alpha_X: FX\to GX\) instead of \(\alpha:F\to G\) clarifies what's going on, making it feel more concrete. Note that in Haskell we'd write return :: a -> T a for some monad T instead of \(\eta:id_C\to T\), because we write natural transformations as polymorphic functions alpha :: F a -> G a and \(id_C\) applied to any type a is just equal to that same type a. And for \(\mu\) we have the polymorphic function join :: T (T a) -> T a for any monad T.
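For the list monad specifically, these components are familiar Prelude functions (etaList and muList are just my names for the components):

```haskell
-- eta and mu for the list monad, written as their components:
etaList :: a -> [a]       -- return / pure
etaList x = [x]

muList :: [[a]] -> [a]    -- join
muList = concat

-- The two commuting diagrams, as equations on join:
--   join . fmap join == join . join               -- associativity square
--   join . fmap return == id == join . return     -- unit triangles
```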

The last point I'll bring up about category theory on category theory is the notion of "enrichment." A typical category has sets \(\texttt{Hom}(X,Y)\) of arrows going from any object \(X\) to any object \(Y\) (it might be the empty set if there are no such arrows). The fact that these collections of arrows are sets means a category is "locally small," and sometimes that's taken as part of the definition of a category. But the idea of a category can be generalized to have other kinds of collections for the arrows between objects. If the collection is a "class" and too big to form a set, then the category can be called "locally large," for example. When generalizing categories this way, we pick a category and say that these collections are objects in that category. For example, a locally small category (aka the normal kind) uses the category \(\texttt{Set}\) of all sets (it's a large category), which means that the collections of arrows are sets. In this system, the idea of "homset" gets replaced with what are called "hom-objects," since they're objects in a category. You then say that locally small categories are "enriched over \(\texttt{Set}\)." If the collection of arrows is a small category instead of a set, then you'd generally say the hom-objects are in \(\texttt{Cat}\).

Funnily enough, \(\texttt{Cat}\) itself is an example of a category enriched over \(\texttt{Cat}\)! That only means that the collections of functors between categories in \(\texttt{Cat}\) are categories, not just sets. Also, \(\texttt{Set}\) is enriched over \(\texttt{Set}\), since function spaces \(A\to B\) are just sets of functions. Remember, exponential objects are objects representing arrows, and \(\texttt{Set}\) has them because a homset is a set, and \(\texttt{Cat}\) has them because the collection of functors from an object \(C\) to an object \(D\) forms a category. So indeed, a category being enriched over itself just means it has exponential objects for all its arrows!

Because \(\texttt{Cat}\) is enriched over \(\texttt{Cat}\), the arrows between two objects (categories) form a category, and we can say there are arrows between the arrows. That's what leads to the above notation of natural transformations as arrows between arrows. We call \(\texttt{Cat}\) a "2-category" because there are arrows between the arrows, the idea being that a set is only objects and is a "0-category," and then adding arrows between objects is a "1-category." If you have arrows between arrows between arrows then it's a 3-category. Note that the category of 1-categories (which is \(\texttt{Cat}\)) is a 2-category; the category of 2-categories is a 3-category and this pattern continues indefinitely. I'm told advanced researchers are studying \(\infty\)-categories but we're well beyond what I know now!

Other important concepts

It's unusual to introduce natural transformations before universal constructions (like, say, a product). That choice has made some things harder (I still owe you a formal definition of exponential objects!) but I think it does more good than bad, and it gives me enormous power now to talk deeply about these other powerful concepts.

One simple one to start off with is "dual" categories. Any category \(C\) has a dual category, written \(C^{op}\), where you simply flip the direction of all the arrows. This is still a valid category because if you had two morphisms \(f:X\to Y\) and \(g:Y\to Z\) that compose into \(g\circ f:X\to Z\) in \(C\) then in \(C^{op}\) you have \(f:Y\to X\), \(g:Z\to Y\), and \(f\circ g:Z\to X\) which is all perfectly valid category theory stuff. Looping arrows (including identity arrows) still loop on the same object, though their actual definition might change if they aren't identity arrows. Obviously, the dual of a dual is the original category.

I like to think of objects in dual categories as "absences" of objects. This is building on the perspective of morphisms as transformations. So if there's an object \(X\) in a category \(C\), the corresponding \(X\) in \(C^{op}\) represents a hole that would be filled if we had an \(X\). For example, if \(C\) is a category of types and functions, and has some object int for integers, then the corresponding object in \(C^{op}\) means not having an integer. Where does this weird idea come from? Well, if I have a function like f: bool -> int then I could use it to produce an integer and fill the hole, but that just leaves me needing a boolean! So a function from bool to int transforms a need for integers into a need for booleans!

To give this intuition a bit more credit, category theory uses the dual category to represent the inputs to arrows. Let me explain. Consider a functor \(FX=X\to A\) for some fixed object \(A\). So in a category of types and functions, \(F\) would map the type \(\texttt{int}\) of integers to the type \(\texttt{int}\to A\) of functions from integers to \(A\). Except that's not quite right, because the first \(\texttt{int}\) is a presence of integers and the second \(\texttt{int}\) is a need for integers. Therefore we actually say \(F\) maps objects from a dual category into objects in the original category: \(F:C^{op}\to C\). Alternatively, we say \(F\) is a "contravariant" functor from \(C\) to \(C\), which just means that the input category is actually the dual of what's written. All the other functors are then called "covariant;" these are just terms you have to memorize, which I personally find a bit painful, but I guess they make sense in some other fields of math.
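In Haskell, contravariant functors are captured by the Contravariant class from base; here's the "need for an \(X\)" functor from above as an instance (the Needs name is mine):

```haskell
import Data.Functor.Contravariant (Contravariant (..))

-- F x = x -> a, for some fixed result type a: "a need for an x".
newtype Needs a x = Needs (x -> a)

-- A function f : x -> y turns a need for y into a need for x,
-- by precomposition; note the flipped direction.
instance Contravariant (Needs a) where
  contramap f (Needs g) = Needs (g . f)
```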

The other use of dual categories is as a way of finding interesting ideas. Consider the monad defined earlier, which had certain commutative diagrams it had to satisfy. What would you be defining if you flipped all those arrows? A monad in a dual category \(C^{op}\) becomes a "comonad" in \(C\), which is a less famous concept but certainly a powerful one that I've mentioned in past posts and I'm sure will come up again in future ones. The idea is that if you flip all the arrows in a definition like the definition of monads, you use the "co-" prefix to name the new concept you find. Here are the commutative diagrams a comonad must satisfy, just so you can see the flipped arrows (\(\epsilon\) is "co-\(\eta\)" and \(\delta\) is "co-\(\mu\)"):

This whole "co-" business with dual categories is the idea behind the "a coconut is a nut" category theory joke :)

Another important concept to discuss is the idea of a "universal construction." The idea is we have some object we want to define, so we specify some condition it has to satisfy, and then either the category has no objects that satisfy it, has one object that satisfies it, or has multiple objects that satisfy it. In the first two cases our job is done, but if we have multiple objects that satisfy our condition then we need to specify how we're going to pick out the object we're looking for. In a universal construction, this means specifying how every other candidate interacts with the one we're looking for. If there is such an object that interacts with the rest in that way, then it's the object we're looking for; if not, then the category just doesn't have the object we're looking for. Whew, that was abstract! Let's look at an example. A "product" of two objects \(X\) and \(Y\) is written \(X\times Y\) and is an object with morphisms \(p:X\times Y\to X\) and \(q:X\times Y\to Y\), called "projections." If there are multiple such objects, then they all have a unique arrow to the product such that the following diagram commutes:

So there's just one arrow \(h\) from a candidate to the product and the morphisms from a candidate to \(X\) and \(Y\) are equal to the composition of \(h\) and the respective morphisms of the product. Note that when the candidate is the product, \(h=id_{X\times Y}\). If these restrictions can't be satisfied, then \(X\) and \(Y\) don't have a product.
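In the category of Haskell types, products are pair types, the projections are fst and snd, and the unique arrow \(h\) is built by pairing (what Control.Arrow calls (&&&)); a sketch:

```haskell
-- The mediating arrow from a candidate z with arrows f and g:
pair :: (z -> a) -> (z -> b) -> (z -> (a, b))
pair f g z = (f z, g z)           -- this is (&&&) from Control.Arrow

-- The commuting triangles from the diagram:
--   fst . pair f g == f
--   snd . pair f g == g
```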

If every pair of objects has a product in the category, the category is "cartesian," and called a "cartesian monoidal category." I'll touch on the "monoidal" part in a moment, but I just want to mention that in \(\texttt{Set}\), the category of all sets and functions, products are sets of pairs \(A\times B\) where the first component comes from some set \(A\) and the second component comes from \(B\). The most famous example by far is the cartesian plane, where we do geometry in high school! It's the set of points \((x,y):\mathbb{R}\times\mathbb{R}\), and it's most people's first time hearing the word "cartesian." These sets of pairs are called "cartesian products" in set theory, hence the terms "cartesian plane" and "cartesian monoidal category," and the name "product" (and corresponding multiplication symbol \(\times\)) for the object. Also note that the cartesian product of two sets \(A\) and \(B\) has a number of elements equal to the number of elements in \(A\) times the number of elements in \(B\).

\(\texttt{Cat}\) is a cartesian monoidal category, so you can take two categories \(C\) and \(D\) and they have a product \(C\times D\). What is this new category? The objects are pairs of objects \((X,Y)\) where \(X\) is an object in \(C\) and \(Y\) is an object in \(D\). The morphisms are pairs of morphisms \((f,g):(X,Z)\to(Y,W)\) where \(f:X\to Y\) is a morphism in \(C\) and \(g:Z\to W\) is a morphism in \(D\). \(C\times D\) is called a "product category." If you want your functor to take two arguments (that is, to map objects and arrows from two different categories into objects and arrows in a third category) then you say it comes from a product category. Functors going out from product categories are called "bifunctors." The condition functors must satisfy (called "functoriality") applies to bifunctors too, of course, and is then called "bifunctoriality:" \(F((h,k)\circ(f,g))=F(h\circ f,k\circ g)=F(h,k)\circ F(f,g)\).
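Haskell's Data.Bifunctor captures exactly this, with pairs as the classic instance:

```haskell
import Data.Bifunctor (bimap)

-- bimap applies a pair of arrows to a pair of objects:
example :: (Int, Bool)
example = bimap (+ 1) not (3, True)   -- (4, False)

-- Bifunctoriality, in Haskell spelling:
--   bimap (h . f) (k . g) == bimap h k . bimap f g
```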

It's a good time to talk about "isomorphisms." In category theory, an isomorphism is just when two objects have arrows between them that "cancel out." Formally, if two objects \(X\) and \(Y\) have arrows \(f:X\to Y\) and \(g:Y\to X\) such that \(g\circ f=id_X\) and \(f\circ g=id_Y\), then \(X\) and \(Y\) are "isomorphic," written \(X\cong Y\), and \(f\) and \(g\) form an isomorphism, written \(X\leftrightarrow Y\) in diagrams. Why do isomorphisms matter? Well, consider an object \(Z\) isomorphic to a product \(X\times Y\):

For one, \(Z\) is a candidate for the universal construction of products, since there are morphisms \(p\circ f:Z\to X\) and \(q\circ f:Z\to Y\). But there's an arrow \(g\) going from the product to \(Z\), and any other candidate also has an arrow going to \(Z\): \(g\circ h\) for the \(h\) of that candidate! This means that objects isomorphic to products are just as good candidates for products. Therefore we say a product is defined "up to isomorphism," which means there might be multiple products but they're all isomorphic, and thus in some important sense equivalent to each other. Why equivalent? Well, any commutative diagram with a product in it could have the product swapped out with anything isomorphic, and the diagram would still commute! You just need to replace any arrow going to the product \(s:W\to X\times Y\) with the arrow \(g\circ s:W\to Z\), and any arrow going out of the product \(t:X\times Y\to W\) (including the projections \(p\) and \(q\)) with the arrow \(t\circ f:Z\to W\). Philosophically, if we think of an object as fully explained by its morphisms, well, objects that are isomorphic have essentially the same morphisms as each other, because of composition with the isomorphism, so the objects are philosophically identical.

Isomorphisms make products a little odd. Since we define things up to isomorphism, products are commutative up to isomorphism: \(X\times Y\) is isomorphic to \(Y\times X\)! Consider \(\texttt{Set}\), where these objects are sets of pairs. A function \(f(x,y)=(y,x)\) will map pairs in \(X\times Y\) to pairs in \(Y\times X\), and it is its own inverse, so \(f\circ f=id\). By the same sort of reasoning, products are associative up to isomorphism: \(X\times(Y\times Z)\cong(X\times Y)\times Z\). That means we often talk about products with more than two components, like \(X\times Y\times Z\).

Now, that's enough about products. Remember dual categories! We just defined products, so we must have just defined something called "coproducts" as well! A product \(X\times Y\) in a dual category \(C^{op}\) is a coproduct in \(C\), written \(X+Y\). This means two things. First, a coproduct has arrows coming to it from the objects \(X\) and \(Y\), which we often call \(\texttt{inl}\) and \(\texttt{inr}\) instead of \(p\) and \(q\) respectively. Second, the commutative diagram of the universal construction gets its arrows flipped as well, so a coproduct has a single arrow going to each of the other candidates, instead of coming from them. A category that has a coproduct for every pair of objects is called a "cocartesian monoidal" category.

The first question to ask: are coproducts a useful idea in any way? It turns out they're extremely useful: in \(\texttt{Set}\), they're disjoint unions! In a category of types and functions, they're sum types (like an enum type in Rust, or datatypes with multiple variants in Haskell), hence the \(+\) notation. Also note that the disjoint union of two sets \(A\) and \(B\) has a number of elements equal to the number of elements of \(A\) plus the number of elements of \(B\).
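In Haskell, the coproduct is Either, the injections are the constructors Left and Right, and the mediating arrow out of the coproduct is the Prelude's either function (copair below is my name for it):

```haskell
-- The arrow out of a coproduct, given arrows out of each component:
copair :: (a -> z) -> (b -> z) -> (Either a b -> z)
copair f _ (Left x)  = f x
copair _ g (Right y) = g y          -- this is 'either' from the Prelude

-- The flipped triangles: copair f g . Left == f, copair f g . Right == g
```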

Another simple and powerful idea is that of the "terminal object," often written \(1\). This is a very simple universal construction: it's the object where every object has exactly one arrow (no more, no less) going to it. These objects tend to be uninformative: in \(\texttt{Set}\) these are singleton sets (which are all isomorphic to each other, so it's fine to call one the terminal object). There's only one function from a set to a singleton set: the function that ignores its argument and returns the element of the singleton set no matter what. Terminal objects have one arrow to themselves as well, of course, which must be the identity arrow. Note that, up to isomorphism, terminal objects are a do-nothing ("unit") object for products: \(X\cong X\times 1\cong 1\times X\). That's because if you have an \(X\), you can produce an \(X\times 1\), since you can always produce \(1\), and (when \(1\) is a singleton set, say) it'll always be the same value "in" \(1\), up to isomorphism. Likewise, given an \(X\times 1\), you can use the projection \(p\) to get the \(X\). These two things constitute an isomorphism.
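In Haskell, the terminal object is the unit type (); here's a sketch of the unique arrow into it and the \(X\cong X\times 1\) isomorphism (names are mine):

```haskell
bang :: a -> ()               -- the one arrow from any type into ()
bang _ = ()

toPair :: a -> (a, ())        -- one half of X ≅ X × 1
toPair x = (x, ())

fromPair :: (a, ()) -> a      -- the other half: the projection p
fromPair = fst
-- fromPair . toPair == id and toPair . fromPair == id
```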

Now, the dual construction is called an "initial" object, written \(0\). These have exactly one arrow to every object, including to themselves via the identity arrow. In \(\texttt{Set}\) this is the empty set. It has one arrow to itself (an empty set of pairs) and one function to every other set (also an empty set of pairs). Initial objects are the unit for coproducts: \(X\cong X+0\cong 0+X\). In \(\texttt{Set}\), this is shown by how the union of empty set with any other set is just that other set!
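Dually, Haskell's initial object is Void from Data.Void, whose unique outgoing arrow is absurd; here's the \(X\cong 0+X\) isomorphism as a sketch:

```haskell
import Data.Void (Void, absurd)

fromSum :: Either Void a -> a    -- 0 + X -> X
fromSum (Left v)  = absurd v     -- can't happen: Void has no values
fromSum (Right x) = x

toSum :: a -> Either Void a      -- X -> 0 + X
toSum = Right
```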

Now what is this "monoidal" thing that keeps getting mentioned? Is it the same as those monoid-based categories from earlier? Nope. The actual definition of a monoidal category is a little painful in its full generality, so I'll just define "strict" monoidal categories, which are what people usually have in mind anyway. A strict monoidal category is a category \(C\) with a bifunctor \(\otimes:C\times C\to C\) called the "tensor product" that is associative (\(X\otimes(Y\otimes Z)=(X\otimes Y)\otimes Z\)) and has an object \(1\) called the "unit" such that \(1\otimes X=X=X\otimes 1\) for any \(X\) in \(C\). In a (non-strict) monoidal category, these equations are only required to hold up to isomorphism, though the precise details of that are a little hairy. Cartesian monoidal categories have \(\times\) as the tensor product and the terminal object as the unit. Cocartesian monoidal categories have \(+\) as the tensor product and the initial object as the unit. That's what makes these categories monoidal. It's also interesting to realize that \(\times\) and \(+\) are actually bifunctors!

Let's finally define exponential objects! In a monoidal category, given objects \(X\) and \(Y\), an exponential object \(Y^X\) is an object such that there's an arrow \(\texttt{eval}:Y^X\otimes X\to Y\). It's a universal construction, so for any candidate exponential object \(Z\) with its arrow \(e:Z\otimes X\to Y\), there's exactly one arrow \(u: Z\to Y^X\) such that the following diagram commutes:

The notation \(u\otimes id_X\) is taking advantage of the fact that \(\otimes\) is a bifunctor, so can map a pair of morphisms \(u:Z\to Y^X\) and \(id_X:X\to X\) into a morphism from \(Z\otimes X\) to \(Y^X\otimes X\). With this sort of notation you have to just know that the things on either side of the \(\otimes\) are morphisms, not objects, to know what's going on. Note that when the tensor product is a (cartesian) product, we write \(\times\) instead of \(\otimes\).

So the idea of an exponential object \(Y^X\) is that if you have a \(Y^X\) and an \(X\), there's a way to produce a \(Y\). In that way, exponential objects behave like the collection of arrows from \(X\) to \(Y\). When a category is enriched over itself, like \(\texttt{Set}\) and \(\texttt{Cat}\) are, it just means that the exponential objects and the collections of arrows are actually one and the same. For example, in \(\texttt{Set}\), a function space \(A\to B\) is the set of all functions from the set \(A\) to the set \(B\), which is equal to the homset \(\texttt{Set}(A,B)\). Such categories are "monoidal closed categories," meaning any collection of arrows from an object \(X\) to an object \(Y\) has a corresponding exponential object \(Y^X\). Exponential objects only make sense in monoidal categories because you need some way of saying you have both the input object \(X\) and the arrows \(Y^X\) to get to the output object \(Y\). To say "two objects together as an object," we use the monoidal bifunctor \(\otimes\).
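In Haskell, the exponential \(Y^X\) is the function type, eval is application on a pair, and the unique mediating arrow is the Prelude's curry; a sketch:

```haskell
eval :: (a -> b, a) -> b          -- eval : Y^X ⊗ X -> Y
eval (f, x) = f x

mediate :: ((z, a) -> b) -> (z -> (a -> b))   -- this is the Prelude's curry
mediate e z = \x -> e (z, x)

-- The commuting diagram: eval (mediate e z, x) == e (z, x)
```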

If that bifunctor is \(\times\), then we instead call it a "cartesian closed category," sometimes written CCC. If the bifunctor is \(+\) then we call it a "cocartesian closed category," sometimes written CoCCC. And if a category has both all products and all coproducts then we call it a "bicartesian monoidal category," and if it also has all exponential objects then it's a "bicartesian closed category," sometimes written BCCC. \(\texttt{Set}\) is a BCCC because of cartesian products, singleton sets, disjoint unions, the empty set, and function spaces. Categories of types and functions, used in functional programming, are often BCCCs because of tuple types, the unit type (), tagged unions, the empty type void, and function types.

The dual concept, called a "coexponential" and written \(Y-X\), is a thing, but I won't talk much more about it here. I'm sure I'll discuss coexponentials in depth in future posts, because they're quite interesting in the study of logic. The term for having all coexponentials is "coclosed," and a "cocartesian coclosed category" comes up enough that it's worth mentioning; it means having all coproducts, an initial object, and all coexponentials. These are dual to cartesian closed categories (CCCs). Again, I'll likely write more about these categories in the future but they're not important enough to cover here.

Notice that all of the commuting diagrams in this post are perfectly valid as tiny little categories on their own, if we of course include the implicit identity arrows and composed arrows. A functor from such a "diagram category" into a category \(C\) is essentially just picking places in \(C\) that match the diagram; functors preserve commuting diagrams, after all. The collection of all functors from a diagram category into \(C\) is the collection of all collections of objects and arrows in \(C\) that satisfy the diagram. For this reason, the formal definition of a diagram is a functor from a diagram category into the category of interest, often written \(F:D\to C\).

One interesting diagram category is the category with a single object and a single morphism (the identity morphism). Functors from such a category into a category \(C\) just pick an object in the category, and the collection of all such functors is basically the collection of objects in \(C\). This category (and all the categories isomorphic to it) is the terminal object in \(\texttt{Cat}\), so we might call it \(1\). Similarly, a diagram category of two objects with one arrow between them leads to picking all the morphisms in \(C\). These categories give a way for reasoning about the internals of categories in \(\texttt{Cat}\), which can only talk about functors between categories instead of the objects and arrows within them.

This formal idea of diagram gives an alternative way of thinking about many universal constructions, which I personally prefer. Consider products. Instead of a universal construction, we could define products first with the diagram category of two objects with only identity arrows. I'll call this category \(\mathbf{2}\). Functors from this category into a category \(C\) just pick two objects in \(C\), which could even be the same object. Now consider what's called a "constant functor," a functor that maps all objects into some specific object \(X\) and maps all arrows into \(id_X\). This is typically written \(\Delta_X\). There's a constant functor \(\Delta_c:\mathbf{2}\to C\) for every object \(c\) in \(C\). For a given diagram \(F:\mathbf{2}\to C\) and a given constant functor \(\Delta_c:\mathbf{2}\to C\), there might be a natural transformation \(\alpha:\Delta_c\to F\). If there is, its naturality square looks like the following:

It's a triangle instead of a square because \(\Delta_c X=\Delta_c Y\), so the top corners of the square are the same point. There's no arrow between \(FX\) and \(FY\) because it would have to be \(Ff\) for some morphism \(f:X\to Y\), but there are no such arrows in the diagram category. Anyway, hopefully you can see that this natural transformation makes \(c\) a possible ("candidate") product of \(FX\) and \(FY\)! Now, instead of any universal construction stuff, we take all the constant functors that have a natural transformation to \(F\), and form a new category out of the corresponding constant objects (all the "\(c\)"s if you will). The arrows in this new category are the arrows those objects had between each other in \(C\) that commute with the arrows down to \(FX\) and \(FY\). Now we can simply define the product of \(FX\) and \(FY\) to be the terminal object in this new category!

This general trick of using functors as diagrams, constant functors as particular objects, natural transformations between the two, and forming categories which might have initial or terminal objects, is a very cool and broad technique that is, in certain situations, more flexible and (in my opinion) concrete than a universal construction. For example, if we take natural transformations from the diagram to constant functors, then the initial object in the resulting category is the coproduct of the pair of objects. (Draw that out!) Moreover, I find that thinking about definitions this way gives me a much deeper understanding of universal constructions!

A natural transformation from a constant functor to a diagram is called a "cone," because you draw the one point at the top with arrows going down to all the objects in the diagram, which then looks like the bottom of a cone or pyramid. The category of constant functors with natural transformations to a diagram is called a "cone category" for that diagram. The terminal object in a cone category is called a "limit" of the diagram. Thus products are limits. Flipping the arrows (aka looking at the same thing in a dual category), a natural transformation to a constant functor from a diagram is called a "cocone." The initial object in a cocone category is called a "colimit." Thus coproducts are colimits. Note the generality of limits and colimits: they can be about any diagram, not just the two-object diagram we used for products.

Conclusion

Whew! That was quite a long ride! But by the end of it we're doing fairly serious work in category theory, the sort one might see in academic papers. I know I didn't have much chance to explain the value of many of these concepts, except in defining other concepts of unclear value, but I hope at least that this post can serve as a door into a very impenetrable field. If you find yourself needing to know what "the terminal object in a cone category" is talking about, this post gives direction. When I started out, looking at definitions on Wikipedia since I didn't know any better, I felt that everything was defined in terms of everything else and there were a million names for things, so there was zero way for me to get even a foot in the door. This post aims to be exactly what I needed in that moment: a slow, measured explanation from the ground up of the concepts, names, and intuitions. You may have noticed that I put a particular amount of effort into giving the names and notations for every concept, often putting names in quotes when I define them. This kind of resource would have served as a guide on the side as I attempted to parse a comically-terse Wikipedia entry.

All of that is just to say that my goal here was both to build up concepts slowly and incrementally with minimal prerequisites, and to be as searchable as possible with command-F, control-F, or search engines. I hope I did a decent job at these two goals.

I should also mention the materials I came across in my personal self-teaching journey. Bartosz Milewski's lectures and nLab are well-known resources that I found helpful, the former for their beginner-friendly presentation of concepts in a proper order, and the latter for details on particular concepts. Be warned, though, that nLab can be hilariously hard to read for beginners. Milewski also has a famous blog, if you prefer the medium of text. I've also found Wikipedia to be helpful at times, in the same way as nLab, but it's known to be inaccurate for fuzzier/more-philosophical concepts like constructivism. Less well-known (but super valuable) resources are Richard Southwell and the Catsters. For certain harder concepts, these last two resources were more helpful than nLab or Wikipedia by far, and they cover a few important concepts that Milewski doesn't. I learned the sequent calculus, Heyting algebra, and the theory and intuition of dependent types from Southwell, and can't recommend him enough.

If you like my work, consider supporting me on Ko-fi or GitHub Sponsors! At the time of writing I'm job-searching, and anything in the meantime really helps! My biggest dream is to be able to make content like this full-time, including videos as well, from this kind of support.