When Percy met Pafnuty

Math papers are an odd beast. They out of necessity contain a lot of technical details and can appear quite abstruse. They are filled with jargon, formulæ, and other bits that can easily scare the non-specialist reader.

At the same time, there is a lot of beauty, elegance, and interesting ideas floating about. I hope, over the next few blog posts, to go over some of my published papers and to express their content in a way that will appeal to the interested non-specialist reader (Hi John!).

A happy accident

My first paper came out almost accidentally. I was working on a part of my thesis, mucking around with some computations that I was trying to understand, when I happened across a curious connection between my thesis problem and some mathematical work studied by Percy MacMahon in 1921. I couldn’t explain it, and so it was recommended that I appeal to George Andrews who was far more experienced in this subject. After a quick “I find your question extremely interesting” response from him, a few days later we had put together a short little paper which proved to be invaluable in my thesis.

So let’s go over this paper. It’s not a particularly long one, and it has a couple important ideas that crop up all throughout my work, which we will introduce and discuss here. The overview of the paper is that we prove some surprising connections between some mathematical work of the wonderfully bearded Pafnuty Chebyshev and some other work of Percy MacMahon (whose moustache is pretty spectacular as well). This ends up having some pleasant consequences for my thesis, which will be discussed at a later point.

But what does that all mean? Well, to begin with,

Let’s talk about Generating Functions

Let’s start by assuming you have a collection of numbers that you find interesting. We could start with 1, 2, 3, 5, 7, 11, say. You can start with your favourite set of numbers (I recommend looking around on the On-Line Encyclopedia of Integer Sequences, for example), and your list can be as long as you like, even infinite!

Briefly speaking, what we are interested in is a way to talk about all of these numbers together at once, while respecting their order—which, after all, is part of their structure. We want to be able to then manipulate the whole list in ways that illuminate and provide information about the nature of the underlying numbers, or that provide information about whatever our source of these numbers was.

Let’s look at an example. The Fibonacci numbers are the following:

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, …

That is, we start with the numbers 0 and 1, and then the next number is the sum of the previous two numbers. So for example, we have that 5 + 8 = 13, 5 and 8 being the two numbers before 13.

[There is a fun aside about Fibonacci numbers: If you take the ratio of successive Fibonacci numbers, say 3/2, 5/3, …, 144/89, …, then these get closer and closer to the “golden ratio”, which is approximately 1.618. By pure coincidence, the ratio of a mile to a kilometre is very close to 1.609, which is quite close to 1.618. So if you want to convert from miles to kilometres, you can just choose the next Fibonacci number to get a good approximation. For example, since the next Fibonacci number after 8 is 13, it follows that 8 miles is pretty close to 13 kilometres. In practice, you only need to memorize the first few Fibonacci numbers to get a pretty good guess for this.]
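If you would like to see the aside above in action, here is a small Python sketch. The function name `fibonacci` and the constant `KM_PER_MILE` are my own choices for illustration; the only outside fact used is that an international mile is 1.609344 kilometres.

```python
def fibonacci(count):
    """Return the first `count` Fibonacci numbers, starting 0, 1."""
    numbers = [0, 1]
    while len(numbers) < count:
        numbers.append(numbers[-1] + numbers[-2])
    return numbers[:count]

fib = fibonacci(13)  # 0, 1, 1, 2, 3, 5, ..., 144

# Ratios of successive Fibonacci numbers: 1/1, 2/1, 3/2, 5/3, ..., 144/89
ratios = [fib[i + 1] / fib[i] for i in range(1, len(fib) - 1)]
print("last ratio:", ratios[-1])

KM_PER_MILE = 1.609344  # length of an international mile in kilometres

# Compare the true conversion with the "next Fibonacci number" guess.
for miles, guess in zip(fib[3:], fib[4:]):
    exact = miles * KM_PER_MILE
    print(f"{miles} mi is {exact:.1f} km; Fibonacci guess: {guess} km")
```

The guess drifts slowly away from the true value as the numbers grow, since 1.618 is a little larger than 1.609, but for everyday distances it is hard to beat.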

So what is the “generating function”  for the Fibonacci numbers? This is given by

0x^0 + 1x^1 + 1x^2 + 2x^3 + 3x^4 + 5x^5 + 8x^6 + 13x^7 + 21x^8 + …

where you can see that we’ve put the Fibonacci numbers (in red) as the numbers in front of powers of a variable x. To use an analogy due to Herbert Wilf, we can think of a generating function as hanging the Fibonacci numbers on a clothesline.


Figure 1: A clothesline

Anyhow, you could ask and answer a lot of questions about these numbers using this clothesline very quickly. For example, you can show that the 7-th Fibonacci number is the closest integer to

\displaystyle \frac{1}{\sqrt{5}}\Big[\frac{1 + \sqrt{5}}{2}\Big]^7

Figure 2: Math!

(Try it out! It’s kind of cool!) The same formula, with 7 replaced by any other position n, allows you to compute other Fibonacci numbers without having to compute all of the ones before them.
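Here is one way to try it out, as a rough Python sketch. The function names are mine, and I am assuming the convention from the list above that the 0-th Fibonacci number is 0. Because this uses floating-point arithmetic, it should only be trusted for moderately sized n.

```python
import math

def fibonacci_closed_form(n):
    """Closest integer to (1/sqrt(5)) * ((1 + sqrt(5)) / 2) ** n."""
    golden = (1 + math.sqrt(5)) / 2
    return round(golden ** n / math.sqrt(5))

def fibonacci_by_adding(n):
    """The n-th Fibonacci number, built up by repeated addition."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fibonacci_closed_form(7), fibonacci_by_adding(7))

# The two methods agree on the first forty Fibonacci numbers.
agree = all(fibonacci_closed_form(n) == fibonacci_by_adding(n) for n in range(40))
print(agree)
```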

The moral of the story is that generating functions are tools that package a whole list of numbers into a single object; by studying that object, we can better understand the numbers that make it up, and whatever those numbers came from.

What about Modular Forms?

So let’s look at a few particular examples of generating functions. Consider a regular lattice of points in a line, a plane, or any higher dimensional space as in the following image.


Figure 3: Colourful telephone poles

This is a one-dimensional lattice; you can think of it like the integers along a real line, or evenly spaced telephone poles along the road. I’ve marked one special one, which I call the “origin”, in red. What we want to do is count the number of points a fixed distance from this origin. For technical reasons we will see shortly, we count them not by their distance d, but by the square of the distance, d^2. For example, the blue dots are of distance 2 from the red dot, but the squared distance is 2^2 = 4.

We will now write down a generating function so that the coefficient in front of x^{d^2} is the number of points whose squared distance from the origin is d^2. For the case above, we obtain the expression

1x^0 + 2x^1 + 0x^2 + 0x^3 + 2x^4 + 0x^5 + 0x^6 + 0x^7 + 0x^8 + 2x^9 + …

since there is exactly one point whose distance is zero from the red dot (the red dot itself), and then two points whose distance is one, two points whose distance is two (so the distance squared is four), two points whose distance is three (and hence distance squared is nine), and so forth. There are of course no points whose distance squared is 2, or 3, and so on. This expression is called the theta function of the lattice.

Our next example is the following lattice.


Figure 4: Dotty dotty dots

This new example is a two-dimensional square lattice. The colours are chosen so that red is again the origin, and then each other colour represents all the points a fixed distance from the (red) origin. If we count them up, we end up with

1x^0 + 4x^1 + 4x^2 + 4x^4 + 8x^5 + 4x^8 + …

As we discussed above, the powers of x are the squared distance, and the coefficient is the count. So there are 8 orange dots, all of whose squared distance from the red dot is 5 (remember your Pythagorean theorem: a^2 + b^2 = c^2!).


Figure 5: 1 + 4 = 5, I promise

In particular, this is why we use the squared distance instead of the distance—this will always be a whole number.
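If you would rather let a computer do the counting, the following Python sketch tallies the points of the integer lattice by squared distance from the origin. The function name `theta_coefficients` and its parameters are hypothetical names of my own; it handles the line (dimension 1), the square lattice (dimension 2), and the higher-dimensional analogues by brute force.

```python
import math
from collections import Counter
from itertools import product

def theta_coefficients(dimension, max_power):
    """Coefficients of x^0, x^1, ..., x^max_power for the integer lattice.

    The coefficient of x^k is the number of lattice points whose
    squared distance from the origin is exactly k.
    """
    reach = math.isqrt(max_power)  # no coordinate can exceed this in size
    counts = Counter()
    for point in product(range(-reach, reach + 1), repeat=dimension):
        squared_distance = sum(c * c for c in point)
        if squared_distance <= max_power:
            counts[squared_distance] += 1
    return [counts[k] for k in range(max_power + 1)]

print(theta_coefficients(1, 9))  # telephone poles
print(theta_coefficients(2, 8))  # dotty dotty dots
```

The zeros in the output are the powers of x that were silently skipped in the two-dimensional expression above: no point of the square lattice has squared distance 3, 6, or 7 from the origin.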

Having written these two generating functions down, we get to our point. If you start with a lattice (as always, there are a few technical conditions that I’m glossing over), then when you build a theta function as we did above, you will always get what we call a modular form. In particular, both

1x^0 + 2x^1 + 2x^4 + 2x^9 + 2x^16 + 2x^25 + …

and

1x^0 + 4x^1 + 4x^2 + 4x^4 + 8x^5 + 4x^8 + …

are modular forms.

So what are modular forms? Well, I’m the first to admit that I’m not an expert in their study. To me, they are simply a particular kind of generating function that interests other mathematicians. More glibly, they are generating functions whose coefficients are likely to show up on the OEIS. Of course, these theta functions are not the only things that are modular forms; many other generating functions are modular, although they make up a vanishingly small proportion of all generating functions. (The theta functions are, however, the only ones for which we have any real sense as to what their modularity means; everything else is a mysterious and lovely accident.)

In the end, the study of modular forms is a rich and classical subject that unites many seemingly disparate areas of mathematics. It ties together geometry and number theory (as above), analysis (studying the generating function itself) and, of late, even some branches of theoretical physics. Modular forms were, in fact, used in 1994 by Sir Andrew Wiles to prove the famous Fermat’s Last Theorem.

A = B

A surprising amount of mathematics comes down to statements of the form A = B. While this seems not so interesting in some cases (1 + 1 = 2), in its most exciting versions we are showing that two seemingly different objects or quantities are equal to each other, or possibly that there are two ways of writing the same thing.

We now get to the main point of our paper. At its heart, it’s an A = B statement, namely that there are two ways of writing a certain generating function (of Chebyshev polynomials, the aforementioned work of Pafnuty Chebyshev). What do we mean by that in this case? Well, look at the following image.


Figure 6: Up Up Down Down Left Right…

If we look at this generating function from one side (so to speak), we get a generating function for Chebyshev polynomials. If we look at it from the other side, we get one for some functions studied by MacMahon. This in and of itself is a nice and non-obvious statement, and was not known by MacMahon. But the real point of the paper was to examine certain consequences of this fact.

The Chebyshev polynomials are given by the rows in the diagram above, and MacMahon’s functions are given by the columns. Due to the fact that they describe the same collection of data—together with what we know about Chebyshev’s polynomials—we can deduce that the columns are modular forms, a fact which was also not known by MacMahon!

To re-iterate, we find a novel way of writing a generating function for Chebyshev polynomials, which we then leverage to show that another type of function is a modular form.

What are those functions, you might ask? They themselves are generating functions, specifically those that count a certain geometric type of object. However, as they are the subject of two other papers that I have written, I’ll discuss them in more detail in a later blog post.

Posted in Uncategorized | Leave a comment

Tropical Geometry, part 4.

We’re finally at the point where we can provide the first definition of Tropical Geometry, and for the sake of personal historicity, it will be the one that I didn’t particularly like when I learned of it.

Remember that the point of algebraic geometry, as it’s studied, is that we study a geometric object by studying the functions defined on that object. The two perspectives are equivalent, and you can use tools from one to study the other (and of course, vice versa).

As an aside, this is one way that you can understand what is meant by non-commutative geometry, at least coarsely. One fact about all of the geometric objects that we study is that the rings of functions defined on any of these things are what we call commutative. That is, f(x)g(x) = g(x)f(x) for every pair of functions f(x), g(x). If you think about this, this is really a reflection of the fact that at its heart, the functions take values in real or complex numbers (well…), and so since those satisfy x \cdot y = y\cdot x, it follows that functions into them do as well.

So where does non-commutative geometry come in? Well, what happens when you study non-commutative rings? What do they represent “geometrically”?

The point in this case is that if we generalize the left-hand side of the equivalence

commutative k-algebras \iff geometric stuff

then this in some sense should provide something that is a generalization of the right-hand side as well. This is a well-trodden tactic in mathematics, and provides us with notions such as a stack (note: not the same as a stack in the computer science world!), or derived schemes, or even derived stacks (combine the two).

Anyhow! So I promised that I would talk about Tropical geometry, and how it fits into the picture. Well, here goes.

See, a ring is something that satisfies a collection of properties (or axioms) which state how we can multiply and add things together. These basically mean that rings behave like the familiar integers, real numbers, or whatever; they look like the normal number systems that we’re all familiar with. It turns out that this list of properties is all we really need to build a phenomenally rich geometric world.

So for Tropical geometry, we look at a slightly different starting point. Consider the real numbers, but with the following funny rules for “addition” and “multiplication”:

x \oplus y = \max\{x, y\}

and

x \otimes y = x + y

Ok, what the hell is this? Multiplication is addition now? Addition is… the maximum? This seems very strange (and it is!), but it turns out that with this bizarre notion of addition and multiplication we still get a surprising number of the properties that normal addition and multiplication have. For example, we still have that

x \otimes (y \oplus z) = (x \otimes y) \oplus (x \otimes z)

i.e. the distributive law. We also have a multiplicative identity (x \otimes 0 = x for every x), we have multiplicative inverses (since x \otimes (-x) = 0, the identity). We even can have an additive identity if we include -\infty in the package. What we can’t have though, is additive inverses and hence no subtraction.
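None of this needs to be taken on faith. Below is a minimal Python sketch that spot-checks the rules above on a handful of sample numbers. The names `t_add`, `t_mul`, and `NEG_INF` are my own, and checking a few samples is of course evidence rather than a proof.

```python
from itertools import product

NEG_INF = float("-inf")  # plays the role of "zero" for tropical addition

def t_add(x, y):
    """Tropical addition: the maximum."""
    return max(x, y)

def t_mul(x, y):
    """Tropical multiplication: ordinary addition."""
    return x + y

samples = [-3, 0, 2, 7]  # integers, so every comparison below is exact

# Distributive law: x (*) (y (+) z) = (x (*) y) (+) (x (*) z)
distributive = all(
    t_mul(x, t_add(y, z)) == t_add(t_mul(x, y), t_mul(x, z))
    for x, y, z in product(samples, repeat=3)
)
mult_identity = all(t_mul(x, 0) == x for x in samples)       # 0 acts like "one"
mult_inverse = all(t_mul(x, -x) == 0 for x in samples)       # -x undoes x
add_identity = all(t_add(x, NEG_INF) == x for x in samples)  # -infinity acts like "zero"

# No candidate y makes max(2, y) equal to -infinity: no additive inverse for 2.
has_add_inverse = any(t_add(2, y) == NEG_INF for y in samples + [NEG_INF])

print(distributive, mult_identity, mult_inverse, add_identity, has_add_inverse)
```

The last check is the interesting one: since max(2, y) is always at least 2, nothing can ever bring it back down to the additive identity, which is exactly why subtraction is unavailable.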

So yeah, weird. Something which satisfies this collection of rules is a semi-ring, and with this in mind, we do exactly what you should be now expecting: Tropical geometry is geometry done using semi-rings.

Tada!


Tropical Geometry, part 3.

So we have discussed the following idea. Given a geometric object X, we can study it by studying the functions that are defined on X, which we will write as \mathcal{O}_X (I’m not actually sure what the \mathcal{O} stands for, but this is in a certain I’m-slightly-lying-to-you way the standard way of writing this).

Now, functions are objects that we can add together (f(x) + g(x)), we can multiply them together (f(x)g(x)), and perhaps if we feel like it, we can also scale them by multiplying them by a real (or even complex) number (\lambda \cdot f(x)). They are, to use mathematical terminology, a ring or an algebra. So restated, as above, we can associate to every geometric thingy X its associated ring/algebra \mathcal{O}_X.

One of the great shifts in the 20th century is that you can actually do the reverse to this as well. That is, to every ring R, there is a canonically associated geometric object (a scheme) which we denote as \mathrm{Spec}\, R. Moreover, these associations are inverse to each other. That is, we have (in a certain sense)

\mathrm{Spec}\, \mathcal{O}_X = X

and

\mathcal{O}_{\mathrm{Spec}\, R} = R.

(I should really stress again that I am slightly lying to you here. There is a context in which this is 100% true, but there are some subtleties to what I am saying. Caveat lector.)

Let’s go over a few examples just to ground ourselves here. The simplest non-trivial example in some sense is the following. If we write the ring of polynomials in one variable as \mathbb{C}[x] = \{f(x) = a_0 + a_1x + \cdots + a_nx^n \mid a_i \in \mathbb{C}\} then this is certainly a ring (in fact, an algebra, because you can multiply polynomials by real or complex numbers) since you can add and multiply polynomials together. So what is the corresponding geometric object? It is just the complex plane! The rough idea is that a polynomial is determined by its roots, and so we identify a polynomial f(x) with its zero set. That is,

f(x) \leftrightarrow \{z \in \mathbb{C} \mid f(z) = 0\}

For another similar example, if we consider polynomials in two variables (for example, f(x, y) = 4y^2 - 2xy + 11xy^2 - \pi x) and let the ring/algebra of all of these be written as \mathbb{C}[x,y], then we have that

\mathrm{Spec}\, \mathbb{C}[x,y] = \mathbb{C}^2

and you may be able to guess how this generalizes.

Finally, to tie ourselves into the previous post, consider the following example. Suppose that we define the ring R to be the collection of all two-variable polynomials f(x, y) where we identify any two of them if their difference is a multiple of h(x, y) = x^2 + y^2 - 1. You can check that this makes sense as a definition, and given that, we have that \mathrm{Spec}\, R is (caveat lector, again) the circle!

So the tl;dr version of this post: up to some finicky details that can be dealt with, algebraic things like rings and algebras are the same as geometric things. This is a powerful, powerful tool.


Tropical Geometry, part 2.

So last post we went over the origin of the name “Tropical Geometry”, but not what it was. I would like to start to do that, but I think that in order to do so we have to take a few steps back and understand a little bit about algebraic geometry as a whole.

The idea of algebraic geometry is to study the geometry of objects defined by algebra. Let’s look at the simplest non-trivial example. As you may recall from high school mathematics, a circle of radius R in the plane can be seen as the set of all solutions to the equation

x^2 + y^2 - R^2 = 0

although I have perhaps written it somewhat idiosyncratically, with all of the terms on the left-hand side of the equals sign. The point is that a circle can be defined by a polynomial equation, and these are the objects that interest us: those geometric figures that can be described by polynomial equations (this is the algebra part of algebraic geometry).

By contrast, if we consider the graph of the function y = e^x, then there is no algebraic equation that the coordinates of this graph will satisfy, and so it is not an object that we are interested in in this context.

So how does one study these? Well, it turns out that a major insight was that you can study objects (geometric or otherwise) by studying all of the functions that are defined on those objects. In our case since we are concerned with—for the time being—figures that are cut out by polynomials in the plane, we are also going to restrict ourselves to considering polynomial-type functions defined on these objects. So what are those?

Well, an obvious source of such a function is any polynomial in the variables x, y. Since our circle lies in the plane, any function defined on the plane a fortiori will define a function on our circle: just define the value of the function on the circle to be the value of the planar function at that point.

The problem with this approach is that you will typically get too many functions. There may be more than one function defined on the plane whose values on the circle are the same! For example, the two polynomials

f(x, y) = x^2

and

g(x,y) = -y^2 + R^2

will secretly yield the exact same function on our circle. The reason is that f(x, y) - g(x, y) = x^2 + y^2 - R^2, which is exactly the defining polynomial of our circle, and so vanishes at every point on it! So what we should do is say that any two functions on the plane are, for our purposes, the same function if they differ by a multiple of the defining equation of our geometric figure. It turns out that if we do this, then we can get a meaningful way to talk about all of the functions on our figure.
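To see these two polynomials agreeing on the circle while disagreeing elsewhere, here is a short numerical sketch in Python. The radius `R = 2.0`, the helper `on_circle`, and the twelve sample angles are arbitrary choices of mine for illustration; the comparison uses a small tolerance because the points on the circle are computed in floating point.

```python
import math

R = 2.0  # radius, chosen only for illustration

def f(x, y):
    return x ** 2

def g(x, y):
    return -y ** 2 + R ** 2

def on_circle(angle):
    """The point of the circle of radius R at the given angle."""
    return R * math.cos(angle), R * math.sin(angle)

angles = [2 * math.pi * k / 12 for k in range(12)]

# On the circle, f and g take the same value (up to rounding).
agree_on_circle = all(
    math.isclose(f(*on_circle(t)), g(*on_circle(t)), abs_tol=1e-12)
    for t in angles
)

# Away from the circle they are genuinely different functions.
differ_off_circle = f(1.0, 1.0) != g(1.0, 1.0)

print(agree_on_circle, differ_off_circle)
```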

Moreover—and this shouldn’t necessarily be obvious—in a certain sense, one can show that if we do this, then the geometric figure is entirely equivalent to the so-obtained functions. That is, it is completely equivalent to study either the figure itself, or the functions as we have described them. This is a very powerful shift in perspective.


Tropical Geometry, part 1.

Tropical geometry is a funny one. When I first learned of it, I had two reactions: first, I hated the name, and second, I thought it was unmotivated and was really just generalization for the sake of generalization.

I was wrong, on both counts.

Let’s talk first about the name, before we get into what Tropical geometry is and why I was wrong about its motivation (or lack thereof). It is named in honour of Imre Simon, a Hungarian-born mathematician living in Brazil. Since he was one of the pioneers in this field, and since he lived in the tropics… whence the name.

I’ve actually heard someone say further that he lived and worked on opposite sides of the tropic of Capricorn, which was also part of the name. That said, I’ve only heard this part once, and so I’m not sure how much I believe it.

Anyhow, originally I disliked the name due to its frivolous nature. Perhaps part of this was due to my initial dislike of the subject, but either way I was bothered by how un-descriptive it was. By contrast, mathematical terms are typically named either after a mathematician or in some descriptive manner. Hilbert space. Sheaf. Étale. Gromov-Witten theory. Solvable. Space-filling curve. Markov process.

In particular, the descriptive naming is something that works very well. The name itself tells you something about what you’re studying, which helps a lot in remembering the ideas involved.

However, one problem that occurs frequently is that mathematicians as a group can be strikingly unimaginative in naming their objects, and so we end up with a proliferation of “normal” objects, or “regular” ones. And one of the most infamous examples, of course, is that it is perfectly reasonable to describe something as being both reduced and irreducible.

By contrast—or even hypocritically—I had always loved the whimsical nature of some of the names coming from physics. I love the name quark, and even more than that I really love their names—strange, charmed—although I would have preferred that they stuck with the “truth” and “beauty” quarks instead of the “top” and “bottom” quarks.

And of course, here is a problem. On one hand, I disliked the term “tropical” for its irreverence; on the other, I lauded physicists for their whimsical name choices.

In the end, the name won out, at least to me.


Some thoughts on a provoking discussion

So I recently stumbled upon (via Izabella Laba) the discussion at Scott Aaronson’s blog that arose from the events surrounding the dismissal of Walter Lewin.

Amazingly enough, I actually read through all 593 comments. This was a surprisingly intelligent and civil discussion (on the internet!) between people who don’t completely see eye-to-eye about everything, and about sexism, no less!

The discussion is a little disheartening for the first (roughly) hundred comments or so. However, starting some time around the linked comment, things get a lot better—as a whole, the major players in the discussion actually listened and seemed to empathize with one another, if not perfectly all the time.

A few thoughts:

  1. I think that Scott (and the main people in the discussion) did a great job of ignoring the more troll-ish posts. There are a number scattered throughout—towards the end, in particular, there is a post which calls for the ban of Amy (if only for a few days!), which thankfully is largely ignored. However…

    Comments such as these are an interesting instance of Lewis’ Law. I do believe that Scott is a good person who, as much as possible, eschews overly sexist views. But it’s interesting that underlying a discussion about the role of women in STEM fields there is (quite literally!) a constant low-level buzz of commentary that at the least borders on anti-feminist. So if you were someone reading this post who held views similar to those of Amy (which are not radical in the least), on one hand you would welcome the fact that the major discussion is civil and interesting. On the other hand, it’s also believable that you might feel like the room in which the discussion is happening is subtly hostile to you and your views. Is it surprising that women might be discouraged from self-advocacy in situations like this?

    I really should stress that I think that Scott did a wonderful job in this discussion of staying on point, not engaging the trolls, etc. But the existence of these background comments really does suggest something, I think.

  2. On that note, seriously? Amy is by no means a “radical feminist” in her postings. I would describe her as pretty middle-of-the-road (although that may say more about me than anything, I guess). She advocates for communication and being aware of the existence of structural imbalances. CRAZY AND RADICAL INDEED.
  3. Reading through this sort of discussion really makes me think again about the difficulty of communication when we don’t define our terms, or in this case, when either the context is difficult to convey, or the terms themselves may not be easily definable. Many of the flare-ups that occurred throughout the discussion seemed to result from a mis-reading of what one of the other posters was trying to convey. Not all, certainly, since not everyone agreed on a variety of issues. But there were still many of them.

Anyhow, it was a surprisingly edifying read, although I can’t really say if I would recommend reading through all 593 comments, which will take quite a long time regardless. Still, I’m glad to see that civil discussion about sexism among people who do not agree can take place in this day and age. Kudos to Scott, Amy, Gil, Vijay, dorothy, and a few others.


Projectivity (continued)

So what does it mean for a variety to be projective? Well, that’s easy: A variety is projective if you can embed it in projective space.

That’s easy, but that’s not particularly helpful.

What are the benefits of being projective? Why is it something that we should care about?

The way I see it, the main advantage of projectivity is that any analytic projective variety is in fact algebraic i.e. it can be described in terms of zero sets of polynomials, and not just analytic functions. This is essentially a loose paraphrase of Chow’s theorem.

So this explains why projectivity is a good thing, but it doesn’t tell us how to detect it. To help with this, let’s consider what we do get if a variety (we will only really care about tori, but for now we will be more general) is projective.

On \mathbb{P}^N, we have the line bundle \mathcal{O}(1). Thus, given an embedding f : X \to \mathbb{P}^N, we can pull this line bundle back to obtain a line bundle L := f^*\mathcal{O}(1). This is an ample bundle; that is, if we take a sufficiently high power L^{\otimes n}, then sections of this new bundle will in fact yield an embedding into projective space of some dimension. More specifically, choose a basis s_0, \ldots, s_N for H^0(X, L^{\otimes n}). Then, as this line bundle is base-point free (it comes from a map into projective space), we can consider the map

X \to \mathbb{P}H^0(X, L^{\otimes n}) \qquad x \mapsto (s_0(x) : \cdots : s_N(x))

and this map will be an embedding.

Conversely, given a pair (X, L) consisting of a complex manifold X together with an ample line bundle L, we can see that X must be projective. Such a pair is called a polarized variety*.

Now, many manifolds come with natural choices of polarizations; for example, every non-elliptic curve has either its canonical or its anti-canonical bundle ample, and so is naturally polarized. Elliptic curves are polarized as well, but you can’t use their canonical bundle, since it is trivial.

The same is of course true with complex tori; their canonical bundles are trivial, and so these do not provide us with a projective embedding. So let’s see what else a polarization gives us.

Let’s consider the first Chern class of our line bundle. We have (since we are working with the complex numbers) the exponential sequence of sheaves

0 \to \mathbb{Z} \to \mathcal{O} \to \mathcal{O}^\times \to 0

which yields the long exact sequence some of whose low degree terms are

\cdots \to H^1(X, \mathcal{O}) \to H^1(X, \mathcal{O}^\times) \to H^2(X, \mathbb{Z}) \to \cdots

where H^1(X, \mathcal{O}^\times) is the Picard group of X (denoted Pic(X)); that is, the group of line bundles on X. The map to H^2(X, \mathbb{Z}) is the map which takes a line bundle to its first Chern class c_1(L). It is this that we use to understand what makes a manifold projective.

In the case of tori, we know very well what H^2(X, \mathbb{Z}) (henceforth we will omit the coefficient ring if it is the integers) is. In fact, due to the Künneth theorem and the fact that topologically, a complex torus is simply a product of circles, we have the isomorphisms

H^2(X) \cong \Lambda^2 H^1(X) \cong \Lambda^2 H_1(X)^\vee \cong \big(\Lambda^2 H_1(X)\big)^\vee

Exercise: Check these!

That is, an element of H^2(X) can be thought of as an alternating bilinear form E on the underlying lattice H_1(X). In particular, the first Chern class of a polarization on a complex torus X = \mathbb{C}^k / \Gamma is an alternating form on its underlying lattice \Gamma = H_1(X).

Now, it is not too hard to see (and you should check this) that there is a bijective correspondence between alternating bilinear forms E on a lattice \Gamma \subset \mathbb{C}^k which satisfy

E(iv, iw) = E(v,w)

and hermitian forms H on \mathbb{C}^k which satisfy \mathfrak{Im}\, H(\Gamma, \Gamma) \subset \mathbb{Z}; this is given by the bijection

E(-,-) \qquad \iff \qquad E(i-,-) + iE(-,-)

Another way to say this is that alternating bilinear forms on \Gamma which are compatible with the complex structure on \mathbb{C}^k are (essentially) the same as hermitian forms on \mathbb{C}^k whose imaginary parts take integer values on \Gamma.
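For a concrete sanity check of this correspondence, here is a small Python sketch in the simplest case I could think of. Everything specific in it is an assumption of mine rather than something from the discussion above: I take k = 1, the lattice \Gamma to be the Gaussian integers inside \mathbb{C}, and the sample hermitian form H(v, w) = v \bar{w} (linear in the first variable). The function names are likewise made up for illustration.

```python
def hermitian(v, w):
    """Sample hermitian form on C: H(v, w) = v * conj(w)."""
    return v * w.conjugate()

def alternating(v, w):
    """The alternating form E, taken to be the imaginary part of H."""
    return hermitian(v, w).imag

def rebuilt(v, w):
    """H recovered from E via E(iv, w) + i E(v, w)."""
    return complex(alternating(1j * v, w), alternating(v, w))

# A patch of the lattice of Gaussian integers a + bi.
lattice = [complex(a, b) for a in range(-2, 3) for b in range(-2, 3)]
pairs = [(v, w) for v in lattice for w in lattice]

integer_valued = all(alternating(v, w).is_integer() for v, w in pairs)
is_alternating = all(alternating(v, v) == 0 for v in lattice)
compatible = all(alternating(1j * v, 1j * w) == alternating(v, w) for v, w in pairs)
recovers_h = all(rebuilt(v, w) == hermitian(v, w) for v, w in pairs)

print(integer_valued, is_alternating, compatible, recovers_h)
```

The lattice points have small integer coordinates, so the floating-point arithmetic here is exact and the equality tests are safe.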

And the magic about this is that an element E \in H^2(X) is the first Chern class of an ample line bundle if and only if this latter condition is satisfied and the corresponding hermitian form H is positive definite.

*Well, that’s not exactly correct. It isn’t the line bundle L that is the polarization, but the class of the line bundle in the Néron-Severi group of X. But it’s close enough.
