On suboptimal optimization.

I’ve been helping a friend learn the math behind optimization so that she can pass a graduation-requirement course in linear algebra. 

Optimization is a wonderful mathematical tool.  Biochemists love it – progression toward an energy minimum directs protein folding, among other physical phenomena.  Economists love it – whenever you’re trying to make money, you’re solving for a constrained maximum.  Philosophers love it – how can we provide the most happiness for a population?  Computer scientists love it – self-taught translation algorithms use this same methodology (I still believe that you could mostly replace Ludwig Wittgenstein’s Philosophical Investigations with this New York Times Magazine article on machine learning and a primer on principal component analysis).

But, even though optimization problems are useful, the math behind them can be tricky.  I’m skeptical that this mathematical technique is essential for everyone who wants a B.A. to grasp – my friend, for example, is a wonderful preschool teacher who hopes to finally finish a degree in child psychology.  She would have graduated two years ago except that she’s failed this math class three times.

I could understand if the university wanted her to take statistics, as that would help her understand psychology research papers … and the science underlying contemporary political debates … and value-added models for education … and more.  A basic understanding of statistics might make people better citizens.

Whereas … linear algebra?  This is a beautiful but counterintuitive field of mathematics.  If you’re interested in certain subjects – if you want to become a physicist, for example – you really should learn this math.  A deep understanding of linear algebra can enliven your study of quantum mechanics.

The summary of quantum mechanics: animation by Templaton.

Then again, Werner Heisenberg, who was a brilliant physicist, had a limited grasp of linear algebra.  He made huge contributions to our understanding of quantum mechanics, but his lack of mathematical expertise occasionally held him back.  He never quite understood the implications of the Heisenberg Uncertainty Principle, and he failed to provide Adolf Hitler with an atomic bomb.

In retrospect, maybe it’s good that Heisenberg didn’t know more linear algebra.

While I doubt that Heisenberg would have made a great preschool teacher, I don’t think that deficits in linear algebra were deterring him from that profession.  After each evening that I spend working with my friend, I do feel that she understands matrices a little better … but her ability to nurture children isn’t improving.

And yet.  Somebody in an office decided that all university students here need to pass this class.  I don’t think this rule optimizes the educational outcomes for their students, but perhaps they are maximizing something else, like the registration fees that can be extracted.

Optimization is a wonderful mathematical tool, but it’s easy to misuse.  Numbers will always do what they’re supposed to, but each such problem begins with a choice.  What exactly do you hope to optimize?

Choose the wrong thing and you’ll make the world worse.

#

Figure 1 from Eykholt et al., 2018.

Most automobile companies are researching self-driving cars.  They’re the way of the future!  In a previous essay, I included links to studies showing that unremarkable-looking graffiti could confound self-driving cars … but the issue I want to discuss today is both more mundane and more perfidious.

After all, using graffiti to make a self-driving car interpret a stop sign as “Speed Limit 45” is a design flaw.  A car that accelerates instead of braking in that situation is not operating as intended.

But passenger-less self-driving cars that roam the city all day, intentionally creating as many traffic jams as possible?  That’s a feature.  That’s what self-driving cars are designed to do.

A machine designed to create traffic jams?

Despite my wariness about automation and algorithms run amok, I hadn’t considered this problem until I read Adam Millard-Ball’s recent research paper, “The Autonomous Vehicle Parking Problem.” Millard-Ball begins with a simple assumption: what if a self-driving car is designed to maximize utility for its owner?

This assumption seems reasonable.  After all, the AI piloting a self-driving car must include an explicit response to the trolley problem.  Should the car intentionally crash and kill its passenger in order to save the lives of a group of pedestrians?  This ethical quandary is notoriously tricky to answer … but a computer scientist designing a self-driving car will probably answer, “no.” 

Otherwise, the manufacturers won’t sell cars.  Would you ride in a vehicle that was programmed to sacrifice you?

Luckily, the AI will not have to make that sort of life and death decision often.  But here’s a question that will arise daily: if you commute in a self-driving car, what should the car do while you’re working?

If the car was designed to maximize public utility, perhaps it would spend those hours serving as a low-cost taxi.  If demand for transportation happened to be lower than the quantity of available, unoccupied self-driving cars, it might use its elaborate array of sensors to squeeze into as small a space as possible inside a parking garage.

But what if the car is designed to benefit its owner?

Perhaps the owner would still want the car to work as a taxi, just as an extra source of income.  But some people – especially the people wealthy enough to afford to purchase the first wave of self-driving cars – don’t like the idea of strangers mucking around in their vehicles.  Some self-driving cars would spend those hours unoccupied.

But they won’t park.  In most cities, parking costs between $2 and $10 per hour, depending on whether it’s street or garage parking, whether you purchase a long-term contract, etc. 

The cost to just keep driving is generally going to be lower than $2 per hour.  Worse, this cost is a function of the car’s speed.  If the car is idling at a dead stop, it will use approximately 0.1 gallon per hour, costing 25 cents per hour at today’s prices.  If the car is traveling at 30 mph without breaks, it will use approximately 1 gallon per hour, costing $2.50 per hour.
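Here is a minimal Python sketch of that arithmetic.  The fuel price of $2.50 per gallon is not stated outright above; it is what the quoted figures imply (0.1 gallon for 25 cents).  The function and variable names are mine, purely for illustration.

```python
# Figures come from the paragraphs above; the fuel price is implied by
# "0.1 gallon per hour, costing 25 cents per hour".
FUEL_PRICE_PER_GALLON = 2.50              # dollars (implied, not quoted)
PARKING_LOW, PARKING_HIGH = 2.00, 10.00   # dollars per hour

def hourly_fuel_cost(gallons_per_hour):
    """Dollars per hour spent on fuel at a given burn rate."""
    return gallons_per_hour * FUEL_PRICE_PER_GALLON

idling_cost = hourly_fuel_cost(0.1)     # stopped dead in traffic
cruising_cost = hourly_fuel_cost(1.0)   # steady 30 mph

print(f"idling:   ${idling_cost:.2f} per hour")
print(f"cruising: ${cruising_cost:.2f} per hour")
print(f"parking:  ${PARKING_LOW:.2f} to ${PARKING_HIGH:.2f} per hour")
```

Idling undercuts even the cheapest parking by a wide margin, while cruising at 30 mph lands inside the parking price range.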

To save money, the car wants to stay on the road … but it wants traffic to be as close to a standstill as possible.

Luckily for the car, this is an easy optimization problem.  It can consult its onboard GPS to find nearby areas where traffic is slow, then drive over there.  As more and more self-driving cars converge on the same jammed streets, they’ll slow traffic more and more, allowing them to consume the workday with as little motion as possible.
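To show how little cleverness this takes, here is a rough Python sketch.  Everything in it is an assumption made for illustration: the street names and speeds are invented, and the fuel model is just a straight line drawn through the two data points quoted earlier (0.1 gallon per hour at a standstill, 1 gallon per hour at 30 mph).  No real car is claimed to work this way.

```python
# HYPOTHETICAL traffic data: street name -> current average speed in mph.
streets = {"Main St": 4.0, "Oak Ave": 22.0, "River Rd": 30.0}

FUEL_PRICE_PER_GALLON = 2.50  # dollars, implied by the essay's figures

def gallons_per_hour(speed_mph):
    # ASSUMPTION: linear interpolation between the two quoted data points,
    # 0.1 gal/hour at 0 mph and 1.0 gal/hour at 30 mph.
    return 0.1 + (1.0 - 0.1) * speed_mph / 30.0

def hourly_cost(speed_mph):
    """Dollars per hour to keep moving at the given speed."""
    return gallons_per_hour(speed_mph) * FUEL_PRICE_PER_GALLON

# The car's whole "strategy": head for whichever street is slowest.
cheapest_street = min(streets, key=lambda name: hourly_cost(streets[name]))
print(cheapest_street, round(hourly_cost(streets[cheapest_street]), 2))
```

Because cost rises with speed in this model, minimizing cost and seeking out the worst traffic are the same thing.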

Photo by walidhassanein on Flickr.

Pity the person sitting behind the wheel of an occupied car on those streets.  All the self-driving cars will be having a great time stuck in that traffic jam: we’re saving money!, they get to think.  Meanwhile the human is stuck swearing at empty shells, cursing a bevy of computer programmers who made their choices months or years ago.

And all those idling engines exhale carbon dioxide.  But it doesn’t cost money to pollute, because one political party’s worth of politicians willfully ignore the fact that capitalism, by philosophical design, requires we set prices for scarce resources … like clean air, or habitable planets.

On uncertainty (with cartoon ending).

See this monstrosity, in its entirety, at the end of this essay.

Reading about the uncertainty principle in popular literature almost always sets my teeth on edge.

I assume most people have a few qualms like that, things they often see done incorrectly that infuriate them.  After a few pointed interactions with our thesis advisor, a friend of mine started going berserk whenever he saw “it’s” and “its” misused on signs.  My middle school algebra teacher fumed whenever he saw store prices marked “.25% off!” when they meant you’d pay three quarters of the standard price, not 99.75%.  A violinist friend with perfect pitch called me (much too early) on a Sunday morning to complain that the birds on her windowsill were out of tune… how could she sleep when they couldn’t hit an F#??

“Ha,” I say.  “That’s silly… they should just let it go.”  But then I start frowning and sputtering when I read about the uncertainty principle.  Anytime somebody writes a line to the effect of, we’ve learned from quantum mechanics that measurement obscures the world, so we will always be uncertain what reality might have been had we not measured it.

My ire is risible in part because the idea isn’t so bad.  It even holds in some fields.  Like social psychology, I’d say.  If a research group identifies a peculiarity of the human mind and then widely publicizes their findings, that peculiarity might go away.  There was a study published shortly before I got my first driver’s license concluding that the rightmost lanes of toll booths were almost always fastest.  Now that’s no longer true.  Humans can correct their mistakes, but first they have to realize they’re mistaken.

That’s not the uncertainty principle, though.

And, silly me, I’d always thought that this misconception was due to liberal arts professors wanting to cite some fancy-sounding physics they didn’t understand.  I didn’t realize the original misconception was due to Heisenberg himself.  In The Physical Principles of the Quantum Theory, he wrote (and please note that this is not the correct explanation for the uncertainty principle):

Thus suppose that the velocity of a free electron is precisely known, while the position is completely unknown.  Then the principle states that every subsequent observation of the position will alter the momentum by an unknown and undeterminable amount such that after carrying out the experiment our knowledge of the electronic motion is restricted by the uncertainty relation.  This may be expressed in concise and general terms by saying that every experiment destroys some of the knowledge of the system which was obtained by previous experiments.

Most of this isn’t so bad, despite not being the uncertainty principle.  The next line is worse, if what you’re hoping for is an accurate translation of quantum mechanics into English.

This formulation makes it clear that the uncertainty relation does not refer to the past; if the velocity of the electron is at first known and the position then exactly measured, the position for times previous to the measurement may be calculated.  Then for these past times ∆p∆q [“p” stands for momentum and “q” stands for position in most mathematical expressions of quantum mechanics] is smaller than the usual limiting value, but this knowledge of the past is of a purely speculative character, since it can never (because of the unknown change in momentum caused by the position measurement) be used as an initial condition in any calculation of the future progress of the electron and thus cannot be subjected to experimental verification.

That’s not correct.  Because the uncertainty principle is not about measurement, it’s about the world and what states the world itself can possibly adopt.  We can’t trace the position & momentum both backward through time to know where & how fast an electron was earlier because the interactions that define a measurement create discrete properties, i.e. they are not revealing crisp properties that pre-existed the measurement.

Heisenberg was a brilliant man, but he made two major mistakes (that I know of, at least.  Maybe he had his own running tally of things he wished he’d done differently).  One mistake may have saved us all, as was depicted beautifully in Michael Frayn’s Copenhagen (also… they made a film of this?  I was lucky enough to see the play in person, but I’ll have to watch it again!) — who knows what would’ve happened if Germany had the bomb?

Heisenberg’s other big mistake was his word-based interpretation of the uncertainty principle he discovered.

His misconception is understandable, though.  It’s very hard to translate from mathematics into words.  I’ll try my best with this essay, but I might botch it too; it’s going to be extra-hard for me because my math is so rusty.  I studied quantum mechanics from 2003 to 2007 but since then haven’t had professional reasons to work through any of the equations.  Eight years of lassitude is a long time, long enough to forget a lot, especially because my mathematical grounding was never very good.  I skipped several prerequisite math courses because I had good intuition for numbers, but this meant that when my study groups solved problem sets together we often divided the labor such that I’d write down the correct answer then they’d work backwards from it and teach me why it was correct.

I solved equations Robert Johnson crossroads style, except I had a Texas Instruments graphing calculator instead of a guitar.

The other major impediment Heisenberg was up against is that the uncertainty principle is most intuitive when expressed in matrix mechanics… and Heisenberg had no formal training in linear algebra.  I hadn’t realized this until I read Jagdish Mehra’s The Formulation of Matrix Mechanics and Its Modifications from his Historical Development of Quantum Theory.  A charming book, citing many of the letters the researchers sent to one another, providing mini-biographies of everyone who contributed to the theory.  The chapter describing Heisenberg’s rush to learn matrices in order to collaborate with Max Born and Pascual Jordan before the former left for a lecture series in the United States has a surprising amount of action for a history book about mathematics… but the outcome seems to be that Heisenberg’s rushed autodidacticism left him with some misconceptions.

Which is too bad.  The key idea was Heisenberg’s, the idea that non-commuting variables might underlie quantum behavior.

Commuting? I should probably explain that, at least briefly.  My algebra teacher, the same one who turned apoplectic when he saw miswritten grocery store discount signs, taught the subject like it was gym class (which I mean as a compliment, despite hating gym class).  Each operation was its own sport with a set of rules.  Multiplication, for instance, had rules that let you commute, and distribute, and associate.  When you commute, you get to shuffle your players around.  7 • 5 will give you the same answer as 5 • 7.

But just because kicks to the head are legal in MMA doesn’t mean you can do ’em in soccer.  You’re allowed to commute when you’re playing multiplication, but you can’t do it in quantum mechanics.  You can’t commute matrices either, which was why Born realized that they might be the best way to express quantum phenomena algebraically.  If you have a matrix A and another matrix B, then A • B will often not be the same as B • A.

That difference underlies the uncertainty principle.

So, here’s the part of the essay wherein I will try my very best to make the math both comprehensible and accurate.  But I might fail at one or the other or both… if so, my apologies!

A matrix is an array of numbers that represents an operation.  I think the easiest way to understand matrices is to start by imagining operators that work in two dimensions.

Just like surgeons all dressed up in their scrubs and carrying a gleaming scalpel and peering down the corridors searching for a next victim, every operator needs something to operate on.  In the case of surgeons, it’s moneyed sick people.  In the case of matrices, it’s “vectors.”

As a first approximation, you can imagine vectors are just coordinate pairs.  Dots on a graph.  Typically the term “vector” implies something with a starting point, a direction, and a length… but it’s not a big deal to imagine a whole bunch of vectors that all start from the origin, so then all you need to know is the point at which the tip of an arrow might end.

It’ll be easiest to show you some operations if we have a bunch of vectors.  So here’s a list of them, always with the x coordinate written above the y coordinate.

x:   3   4   5   2   6   1   7   3   5
y:   0   0   0   1   1   2   2   5   5

That set of points makes a crude smiley face.

Figure: the nine points above, plotted as a crude smiley face.

And we can operate on that set of points with a matrix in order to change the image in a predictable way.  I’ve always thought the way the math works here is cute… you have to imagine a vector leaping out of the water like a dolphin or killer whale and then splashing down horizontally onto the matrix.  Then the vector sinks down through the rows.

It won’t be as fun when I depict it statically, but the math works like this:

a   b   •   x   =   a•x + b•y
c   d       y       c•x + d•y

Does it make sense why I imagine the vector, the (x,y) thing, flopping over sideways?
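If code is easier to read than a dolphin metaphor, here is a small Python sketch of the same operation.  The function name `apply` and the example matrix are my own choices for illustration; each row of the matrix is paired off with the flopped-over vector, multiplied term by term, and summed.

```python
def apply(matrix, vector):
    """Multiply a 2x2 matrix (a list of two rows) by a vector (x, y)."""
    x, y = vector
    (a, b), (c, d) = matrix
    # The vector "splashes down" across each row in turn.
    return (a * x + b * y, c * x + d * y)

# An arbitrary example matrix, chosen only to make the arithmetic visible.
m = [[1, 2],
     [3, 4]]

result = apply(m, (5, 6))
print(result)  # (1*5 + 2*6, 3*5 + 4*6)
```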

The simplest matrix is something called an “identity” matrix.  It looks like this:

1   0

0   1

When we multiply a vector by the identity matrix, it isn’t changed.  The zeros mean the y term of our initial vector won’t affect the x term of our result, and the x term of our initial vector won’t affect the y term of our result.  Here:

1   0   •   x   =   1•x + 0•y   =   x
0   1       y       0•x + 1•y       y
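The same check can be run on the whole smiley face.  This is just a sketch in Python; the helper `apply` is a name I made up, and the points are the nine coordinate pairs listed earlier.

```python
def apply(matrix, vector):
    """Multiply a 2x2 matrix (a list of two rows) by a vector (x, y)."""
    x, y = vector
    (a, b), (c, d) = matrix
    return (a * x + b * y, c * x + d * y)

identity = [[1, 0],
            [0, 1]]

# The nine points of the smiley face, as (x, y) pairs.
smile = [(3, 0), (4, 0), (5, 0), (2, 1), (6, 1),
         (1, 2), (7, 2), (3, 5), (5, 5)]

unchanged = [apply(identity, point) for point in smile]
print(unchanged == smile)
```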

And there are a couple other simple matrices we might consider (you’ll only need to learn a little more before I get back to that “matrices don’t commute” idea).

If we want to make our smiling face twice as big, we can use this operator:

2   0

0   2

Hopefully that matrix makes a little bit of sense.  The x and y terms still do not affect each other, which is why we have the zeros on the upward diagonal, and every coordinate must become twice as large to scoot everything farther from the origin, making the entire picture bigger.
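As a quick sanity check, here is a Python sketch that doubles every point of the smile.  Again, `apply` and the variable names are mine, not standard library functions.

```python
def apply(matrix, vector):
    """Multiply a 2x2 matrix (a list of two rows) by a vector (x, y)."""
    x, y = vector
    (a, b), (c, d) = matrix
    return (a * x + b * y, c * x + d * y)

double = [[2, 0],
          [0, 2]]

smile = [(3, 0), (4, 0), (5, 0), (2, 1), (6, 1),
         (1, 2), (7, 2), (3, 5), (5, 5)]

bigger_smile = [apply(double, point) for point in smile]
print(bigger_smile)
```

Every coordinate comes out twice as far from the origin, so the eyes at (3, 5) and (5, 5) move to (6, 10) and (10, 10).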

We could instead make a mirror image of our picture by reflecting across the y axis:

-1   0

0    1

Or rotate our picture 90º counterclockwise:

0  -1

1   0

The rotation matrix has those terms because the previous Y axis spins down to align with the negative X axis, and the X axis rotates up to become the positive Y axis.
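You can watch the axes move in a few lines of Python.  This is only an illustrative sketch with made-up helper names; it feeds the tips of the X and Y axes through the rotation matrix and one eye of the smile through the reflection matrix.

```python
def apply(matrix, vector):
    """Multiply a 2x2 matrix (a list of two rows) by a vector (x, y)."""
    x, y = vector
    (a, b), (c, d) = matrix
    return (a * x + b * y, c * x + d * y)

reflect = [[-1, 0],
           [0, 1]]    # mirror across the y axis
rotate = [[0, -1],
          [1, 0]]     # quarter turn counterclockwise

x_axis_tip = (1, 0)
y_axis_tip = (0, 1)

print(apply(rotate, x_axis_tip))   # X axis swings up onto the positive Y axis
print(apply(rotate, y_axis_tip))   # Y axis swings over onto the negative X axis
print(apply(reflect, (3, 5)))      # the left eye hops across the y axis
```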

And those last two operators, mirror reflection and rotation, will let us see why the commutative property does not hold in linear algebra.  Why A • B is not necessarily equal to B • A if both A & B are matrices.

Here are some nifty pictures showing what happens when we first reflect our smile then rotate, versus first rotating then reflecting.  If the matrices did commute, if A • B = B • A, the outcome of the pair of operations would be the same no matter what order they were applied in.  And they aren’t! The top row of the image below shows reflection then rotation; the bottom row shows rotating our smile then reflecting it.

Figure: top row, the smile reflected and then rotated; bottom row, the smile rotated and then reflected.
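The same comparison can be made without pictures.  In this Python sketch (helper names are my own), "reflect then rotate" means the reflection acts on the point first, so the combined matrix is rotate • reflect, and vice versa for the other order.

```python
def matmul(a, b):
    """Product of two 2x2 matrices, each given as a list of two rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

reflect = [[-1, 0],
           [0, 1]]    # mirror across the y axis
rotate = [[0, -1],
          [1, 0]]     # quarter turn counterclockwise

# The matrix written on the right acts on the vector first.
reflect_then_rotate = matmul(rotate, reflect)
rotate_then_reflect = matmul(reflect, rotate)

print(reflect_then_rotate)
print(rotate_then_reflect)
print(reflect_then_rotate == rotate_then_reflect)
```

The two products differ in sign, which is exactly the upside-down-versus-right-side-up difference between the two orderings.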

And that, in essence, is where the uncertainty principle comes from.  Although there is one more mathematical concept that I should tell you about, the other rationale for using matrices to understand quantum mechanics in the first place.

You can write a matrix that would represent any operation or any set of forces.  One important class of matrices is those that use the positions of each relevant object, like the locations of each electron around a nucleus, in order to calculate the total energy of a system.  The electrons have kinetic energy based on their momentum (their mass times the derivative of their position with respect to time) and potential energy related to their position itself, due to interaction with the protons in the nucleus and, if there are multiple electrons, repulsive forces between each other…

(I assume you’ve heard the term “two-body problem” before, used by couples who are trying to find a pair of jobs in the same city so they can move there together.  It’s a big issue in science and medicine, double matching for residencies, internships, post-docs, etc.  Well, it turns out that nobody thinks it’s funny to make a math joke out of this and say, “At least two-body problems are solvable.  Three-body problems have to be approximated numerically.”)

…but once you have a wavefunction (which is basically just a fancy vector, now with a stack of functions instead of a stack of numbers), you can imagine acting upon it with any matrix you want.  Any measurement you make, for instance, can be represented by a matrix.  And the cute thing about quantum mechanics, the thing that makes it quantized, is that only a discrete set of answers can come out of most measurements.  This is because a measurement causes the system to adopt an eigenfunction of the matrix representing that measurement.

An eigenfunction is a vector that still looks the same after it’s been operated upon by a particular matrix (from the German word “eigen,” which means something like “own” or “self”).  If we consider the operator for reflection that I jotted out above, you can see that a vector pointing straight up will still resemble itself after it’s been acted upon.
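Here is a small Python sketch of that claim, using the reflection matrix from earlier (the helper name `apply` is mine).  A vector pointing straight up comes back untouched; a vector pointing sideways comes back flipped, which still lies along its original line, so it counts as an eigenvector too (with a multiplier of -1); a diagonal vector comes back pointing somewhere new, so it does not.

```python
def apply(matrix, vector):
    """Multiply a 2x2 matrix (a list of two rows) by a vector (x, y)."""
    x, y = vector
    (a, b), (c, d) = matrix
    return (a * x + b * y, c * x + d * y)

reflect = [[-1, 0],
           [0, 1]]    # mirror across the y axis

straight_up = (0, 1)
sideways = (1, 0)
diagonal = (1, 1)

print(apply(reflect, straight_up))  # unchanged: an eigenvector
print(apply(reflect, sideways))     # flipped onto its own line: also an eigenvector
print(apply(reflect, diagonal))     # knocked onto a different line: not an eigenvector
```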

And a neat property of quantum mechanics is that every operator representing a measurement has a set of eigenfunctions that spans whatever space you’re working with.  For instance, the X & Y axes together span all of two-dimensional space… but so does any pair of non-parallel lines.  You could pick any pair of lines that cross and use them as a basis set to describe two-dimensional space.  Any point you want to reach can indeed be arrived at by moving some distance along your first line and then some distance along your second.
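To make "some distance along your first line and then some distance along your second" concrete, here is a Python sketch that solves for those two distances.  The function name and the example directions are my own inventions; the method is just Cramer's rule for two equations in two unknowns.

```python
def coordinates_in_basis(point, u, v):
    """Find (a, b) such that a*u + b*v == point, for non-parallel u and v."""
    det = u[0] * v[1] - u[1] * v[0]
    if det == 0:
        raise ValueError("u and v are parallel, so they cannot span the plane")
    a = (point[0] * v[1] - point[1] * v[0]) / det
    b = (u[0] * point[1] - u[1] * point[0]) / det
    return a, b

# Two crossing diagonal lines instead of the usual X and Y axes.
u = (1, 1)
v = (1, -1)

a, b = coordinates_in_basis((3, 5), u, v)
print(a, b)
# Rebuild the point from the two distances to confirm we arrive at (3, 5).
print((a * u[0] + b * v[0], a * u[1] + b * v[1]))
```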

This is relevant to quantum mechanics because any measurement collapses the system into an eigenfunction of its representative matrix, and the probability that it will end up in any one state is determined by the amount of that eigenfunction you need to describe its previous wavefunction in your new basis set.

That is one ugly sentence.

Maybe it’s not so surprising that Heisenberg described this incorrectly in words, because this is somewhat arduous…

Here, I’ll draw another nifty picture.  We’ll have to imagine two different operations (you could even get ahead of me and imagine that these represent measuring position and momentum, since that’s the pair of famous variables that don’t commute), and the eigenvectors for these operations are represented by either the blue arrows or the red arrows below.

Figure: two pairs of eigenvectors, one pair drawn as blue arrows and the other as red arrows.

If we make a measurement with the blue matrix, it’ll collapse the system into one of the two blue eigenvectors.  If we decide to measure the same property again, i.e. act upon the system with the blue matrix again, we’re sure to see that same blue eigenvector.  We’ll know what we’ll be getting.

But once the system has collapsed into a blue arrow, if we measure with the red matrix the system has to shift to align with one of the red arrows.  And our probability of getting each red answer depends upon how similar each red arrow is to the blue arrows… the one that looks more like our current state is more likely to occur, but because neither red arrow matches a blue arrow perfectly, there’s a chance we’ll end up with either answer.
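"How similar" has a precise meaning: the probability of landing on an arrow is the square of the overlap (the dot product) between that arrow and the current state.  The Python sketch below is a toy, with assumptions I should flag: the arrows are ordinary real two-dimensional unit vectors, each pair is perpendicular, and the 30 degree tilt between the blue and red pairs is a number I picked arbitrarily.

```python
import math

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

# ASSUMPTION: the red arrows are the blue arrows tilted by 30 degrees.
theta = math.radians(30)
blue = [(1.0, 0.0), (0.0, 1.0)]
red = [(math.cos(theta), math.sin(theta)),
       (-math.sin(theta), math.cos(theta))]

def outcome_probabilities(state, basis):
    """Chance of collapsing onto each arrow in `basis`: squared overlap."""
    return [dot(state, arrow) ** 2 for arrow in basis]

# The system currently sits in the first blue arrow; now measure with red.
probabilities = outcome_probabilities(blue[0], red)
print(probabilities)
```

The red arrow that leans closer to the current blue state is the likelier outcome, but the other one still turns up a quarter of the time in this toy setup.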

And if we want to make a blue measurement, then red, then blue… the two blue measurements won’t necessarily be the same.  After we’re in a state that matches a red eigenvector, we have some probability to flop back to either blue eigenvector, depending, again, on how similar each is to the red eigenvector we land in.
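The odds of a blue, red, blue sequence returning the same blue answer twice can be worked out in the same toy model.  As before, this Python sketch assumes real two-dimensional unit vectors and an arbitrary 30 degree tilt between the blue and red arrows; neither number comes from real physics.

```python
import math

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

# ASSUMPTION: the red arrows are the blue arrows tilted by 30 degrees.
theta = math.radians(30)
blue = [(1.0, 0.0), (0.0, 1.0)]
red = [(math.cos(theta), math.sin(theta)),
       (-math.sin(theta), math.cos(theta))]

start = blue[0]  # result of the first blue measurement

# Sum over both red outcomes: chance of landing on that red arrow,
# times the chance of flopping from it back to the original blue arrow.
p_same_blue = sum(dot(start, r) ** 2 * dot(r, start) ** 2 for r in red)
print(p_same_blue)
```

Without the red measurement in between, the second blue measurement would agree with the first every time; with it, agreement is no longer guaranteed.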

That’s the uncertainty principle.  That position is simply not well-defined when momentum is precisely known, and vice versa.  The eigenfunctions for one type of measurement do not resemble the eigenfunctions for the other measurement.  Which means that the type of measurement you have to make in order to know one or the other property invariably changes the system and gives you an unpredictable result… it’s like you’re rolling dice every time you switch which flavor of measurement you’re making.
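For readers who want the symbols, here is the standard textbook statement, using q for position and p for momentum as in the Heisenberg quotation above.  The hats (marking operators) and the sigmas (the spread of results you would get over many repeated measurements) are the usual textbook notation rather than anything defined in this essay.

```latex
% Position and momentum do not commute:
[\hat{q}, \hat{p}] = \hat{q}\hat{p} - \hat{p}\hat{q} = i\hbar

% which forces a floor on the product of their spreads:
\sigma_q \, \sigma_p \ge \frac{\hbar}{2}
```

If the two operators commuted, the first line would equal zero and the floor in the second line would vanish.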

But the measurement isn’t causing error.  It’s revealing an underlying probability distribution.  That is, there is no conceivable “gentle” way of measuring that will give a predictable answer, because the phenomenon itself is probabilistic.  Because the mechanics are quantized, because there are no in-between states, the system flops like a landbound fish from eigenvectors of one measurement to eigenvectors of the other.

Which is why it bothers me so much to see the uncertainty principle described as measurement obscuring reality when the idea crops up in philosophy or literature.  Those allusions also tend to place too much import on the idea of “observers,” like the old adage about a tree making or not making sound when it falls in an empty forest.  Perhaps I did a bad job of this too by writing “measurement” so often.  Maybe that word makes it sound as though quantum collapse requires intentional human involvement.  It doesn’t.  Any interaction between a quantum system and a semi-classical system will couple them and can cause the probabilistic distribution of wavefunctions to condense into particle-like behavior.

And I think the biggest difference between the uncertainty principle and the way it’s often portrayed in literature is that, rather than measurements obscuring reality, you could almost say that measurements create reality.  There wasn’t a discrete state until the measurement was made.  It’s like asking an inebriated collegiate friend who just learned something troubling about his romantic partner, “Well, what are you going to do?”  He’ll probably answer.  While you’re talking about it, it’ll seem like he’s going to stick to that answer.  But if you hadn’t asked he probably would’ve continued to mull things over, continued to exist in that seemingly in-between state where there’s both a chance that he’ll break up or try to work things out.  By asking, you learn his plan… but you also forced him to come up with a plan.

And it’s important that our collegian be drunk in this analogy… because making a different measurement has to re-randomize behavior.  Even after he resolves to break up, if you ask “Where should we go for our midnight snack,” mulling that over would make him forget what he’d planned to do about the whole dating situation.  The next time you ask, he might decide to ride it out.  It’s only when allowed to keep the one answer in the forefront of his mind that the answer stays consistent.

The uncertainty principle says that position and momentum can’t both be known precisely not because measurement is difficult, but because elementary particles are too drunk to remember where they are when you ask how fast they’re moving.

And, here, a treat!  As a reward for wading through all this, I’ve drawn a cartoon version of Heisenberg’s misconception.  Note that this is not, in fact, the correct explanation for the uncertainty principle… but do you really need me to sketch a bunch of besotted electrons?

Cartoon: Heisenberg’s misconception of the uncertainty principle.