## Friday, May 29, 2009

Today I got around to trying out a simplified molecular version of the gate model that will replace my hyperbolic function.

The kinetics are all arbitrary for the model, but the shape of the transfer function looks even better than the made-up model from before. There's an almost perfectly linear section in the middle -- it looks more made-up than my made-up model! This is assuming that all three reactions have the same strength. Next, I need reasonable terms for the three reaction rates.

## Sunday, May 24, 2009

### More parameter space of "standing" circuit

Using the parameter space maps made last time, I've set the "standing" circuit into a place where it has a nearly symmetric bi-stable steady-state at p1 =0.25 and p2=0.50.

The following is the derivative at a given concentration of standing. This dy/dt vs y plot (I don't know if there is a correct name for this kind of plot) shows that there are two stable steady states at the zero crossings, -5 and +5. There's also an unstable point near zero. It is not exactly at zero because the gate model functions do not cross at zero, as seen below.
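This kind of plot is sometimes called a phase line. Reading the steady states off it is easy to automate; here is a minimal sketch using a made-up cubic-ish rate function in place of the real gate-model dy/dt (only the fixed points near ±5 and the slightly offset unstable point are carried over from the description):

```python
import numpy as np

def dydt(y):
    # Stand-in rate function with stable fixed points near -5 and +5 and an
    # unstable one just above zero; the real dy/dt comes from the gate model.
    return -0.01 * (y - 5.0) * (y + 5.0) * (y - 0.2)

# Sample dy/dt over a range of y; the grid is chosen so it never lands
# exactly on a root, so every steady state shows up as a sign change.
y = np.linspace(-7.995, 7.995, 1600)
r = dydt(y)

# Zero crossings of dy/dt are the steady states; stable where dy/dt falls
# through the crossing (negative slope), unstable where it rises.
idx = np.where(np.sign(r[:-1]) != np.sign(r[1:]))[0]
steady = y[idx]
stable = (r[idx + 1] - r[idx]) < 0.0
for v, st in zip(steady, stable):
    print(f"y* = {v:5.2f}  {'stable' if st else 'unstable'}")
```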

Now I continue the analysis with the "tired" half of the circuit. I'm interested in the response of "tired" when the "standing" input reaches 0, the point at which the tired circuit will charge fully.

Charging of the tired circuit when standing is 0 and tired starts at its steady-state value of -5

So, "tired" reaches 0 (the point at which gate 5 is going to be fully on) within about 20 time units when standing = 0.

The following is a sampling of the parameter space for p1 and p2 given "standing" = 0. The steady-state value of tired changes as a function of p1, so for each graph I've started "tired" off at the appropriate steady-state and then watched the evolution when "standing" = 0. This demonstrates that I can delay both the onset of tired (when it hits zero) and how high tired gets at steady-state by adjusting these two parameters.
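As a sketch of what that sweep looks like in code, with a purely linear stand-in rate law (p2 charges tired, p1 discharges it, and the 0.01 RNase term drags it toward zero; the real model uses the gate transfer functions, and all the numbers here are placeholders):

```python
RNASE = 0.01  # the shared pull-toward-zero "resistor" value from before

def simulate_tired(p1, p2, t_max=1000.0, dt=0.01):
    """Charge the isolated "tired" node while "standing" = 0.

    Starts from the steady state set by p1 alone, then records when
    tired first crosses zero (the onset) and where it finally settles.
    """
    y = -p1 / RNASE          # steady state before charging begins
    t_cross = None
    for step in range(int(t_max / dt)):
        y += (p2 - p1 - RNASE * y) * dt   # simple Euler step
        if t_cross is None and y >= 0.0:
            t_cross = step * dt
    return t_cross, y

for p1 in (0.02, 0.05):
    for p2 in (0.07, 0.10):
        onset, final = simulate_tired(p1, p2)
        print(f"p1={p1:.2f} p2={p2:.2f}  onset={onset:6.1f}  final={final:5.1f}")
```

Even in this toy version, p1 sets both the starting depth and drags the final level down, while p2 raises the final level and pulls the onset earlier, which matches the two knobs described above.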

Next up, I put the circuit back together again...

## Wednesday, May 20, 2009

### Parameter space of "standing" circuit

I've been working on decomposing the traveling pulse circuit in order to understand the parameter space. Today I've worked on the isolated "standing" circuit.

There are two parts: the "pull down" gate that constantly tries to pull the system to a negative value, and the resistor that tries to pull it to zero. The ratio of the pull down gate (1) to the resistor (RNase) determines the steady-state level when the feedback gate 3 is not active. The RNase resistor must be common to all nodes, so I treat it as a fixed parameter; I picked the value 0.01 out of thin air for it.
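With the feedback gate off, the node is just a linear balance between the constitutive pull down and the RNase drain, so the steady state is the ratio of the two. A one-line sketch (the linear rate law is a stand-in; 0.01 is the resistor value above, and p1 = 0.01 is just a guess that reproduces a low state near -1):

```python
RNASE = 0.01  # the fixed "resistor" value picked above

def off_state(p1, rnase=RNASE):
    # With feedback off: dy/dt = -p1 - rnase * y  =>  steady state y* = -p1 / rnase
    return -p1 / rnase

print(off_state(0.01))  # -1.0
```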

For the following graphs, I pick different starting conditions for "standing" and let the circuit evolve. Each colored trace in the chart is one run of the circuit. Note that there are two steady states: one is about 28 and the other is about -1. If the "standing" value falls below about -0.5 then it goes to the low steady-state; above that it goes high. I like this chart in comparison to typical transfer function plots because it lets you see both the kinetics and the steady-states in one place.
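A sketch of that kind of chart in code: a one-variable model with a pull down, an RNase drain, and a sigmoid stand-in for the gate 3 feedback, run from several starting values. The feedback strength (0.29) and the switching width are invented here purely to land the steady states near -1 and 28 with a critical point near -0.5, as described above:

```python
import math

P1, P2, RNASE = 0.01, 0.29, 0.01   # pull down, feedback strength, resistor

def feedback(y):
    # Stand-in for gate 3's transfer curve: switches on sharply near y = -0.5.
    return 0.5 * (1.0 + math.tanh((y + 0.5) / 0.02))

def run(y0, t_max=2000.0, dt=0.05):
    y = y0
    for _ in range(int(t_max / dt)):
        y += (-P1 + P2 * feedback(y) - RNASE * y) * dt
    return y

for y0 in (-3.0, -1.0, -0.7, -0.45, 0.0, 5.0):
    print(f"start {y0:5} -> steady {run(y0):6.2f}")
```

Starts below roughly -0.5 decay to about -1; starts above it switch the feedback on and saturate near (P2 - P1) / RNASE = 28.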

Here's the same chart but zoomed in around the origin so you can see that the critical point is about -0.5 which is determined by the gate model.

I varied the two parameters over a range and plotted the parameter space result (best viewed on large monitor).

From top to bottom p1 is increasing. From left to right p2 is increasing. Increasing p2 shifts the steady-state of the "standing" state upwards and thereby separates the two states more dramatically. As p1 is increased -- moving from top to bottom -- both the top and bottom steady-states shift downwards, but the bottom one seems to move faster. In the lower left, the two states blur into each other and are poorly defined. So, in general you'd like to push p2 and p1 fairly high, but this comes at the cost of slowing down the approach to steady-state as the states are pushed further apart. When the other half of the circuit is added, the p2 value will have to be smaller than p5, so that will determine the upper bound of p2.

### My conversation with Alpha

I tried out Wolfram's Alpha this morning. First, something technical and mathematical as it suggests:

Where are the tidal phase singularities?

> Wolfram|Alpha isn't sure what to do with your input. ...

The same search on Google not only brings up links to maps but also brings up the scanned and OCR pages from Winfree's book -- via Google books -- where I got the phrase! Google is amazing.

Why should I use wolfram alpha?

> Wolfram|Alpha isn't sure what to do with your input.

The same search on Google came up with the pages on Wolfram's own site and many more reviews.

Why is Stephen Wolfram so cocky?

> Wolfram|Alpha isn't sure what to do with your input. ... person: Stephen Wolfram ... chemical element: element Wolfram

Tungsten (according to Wikipedia) is also called "Wolfram", which is why it has the chemical symbol "W", but nowhere on Wolfram's summary page about tungsten does it mention this. If you do this same search on Google, the first hit is a Slashdot article about the outrageous TOS on Alpha that's only *16 hours* old! Google continues to amaze.

How big of an ego does Stephen Wolfram have?

> Wolfram|Alpha isn't sure what to do with your input. ...

The same search on Google returns all kinds of hits from book reviews and whatnot complaining about his inflated ego.

All joking aside, I did like its stock summary page (one of its suggested searches). When you ask it about something it knows, it does present a very well-formatted result with lots of good technical information. But the TOS is absurd.

## Tuesday, May 19, 2009

### Complementary logic ideas

Talking with John this morning about the equivalence between the gates we're proposing and electrical analogs. John points out that our gates are like "half of a tri-state gate". We started thinking about higher-order logic cells using the proposed gates and realized that you can be logically complete assuming that you can mix gates with complementary inputs and only lose some fraction of them to a bi-molecular cancellation. If this is not the case -- if you lose everything -- then there might still be a way to do it with extra translation stages, but I haven't thought that through yet.

(Image update 21 May. Thanks to Erik for pointing out that I forgot the promoter completion domain.)

Assuming that the above gate cancellation reaction is not favorable (or that tethering the gates reduces its favorability), you could combine the gates to make buffers, inverters, and a biased AND gate. The AND doesn't produce a very clean output, but it would have the property that when inputs A & B are + the output is +, while all other input combinations give outputs ranging from slightly - to very -.
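A signed-logic sketch of the biased AND idea, with each buffer contributing one arbitrary unit of push on the shared output and a constitutive pull down supplying a one-unit bias (the unit sizes are illustrative, not rates):

```python
def biased_and(a, b):
    # Each buffer gate pushes the output + when its input is + and - when -,
    # and a constitutive pull-down biases the node one unit downward.
    contrib = lambda x: 1 if x > 0 else -1
    return contrib(a) + contrib(b) - 1

for a in (+1, -1):
    for b in (+1, -1):
        print(f"A={a:+d} B={b:+d} -> {biased_and(a, b):+d}")
```

Only (+, +) lands positive; the other combinations come out slightly to very negative, matching the behavior described above.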

### Traveling pulse - a stable orbit

I started hunting around in parameter space trying to get my head around what makes the traveling pulse stable and predictable. I don't yet have a set of exact rules, but what I've learned is that the reactions need to be slow compared to the diffusion. This is achieved by simply lowering the concentration of the gates and resistors appropriately. Next, the pull down gates 1 & 2 are very small compared to the feedback and shutdown gates. Also, the "tired" charging gate is very small so that you can delay the onset of the shutdown.

The biggest point is obvious when you look at the phase diagram: you have to let the system get back into steady-state before another pulse hits it. Also interesting is how perfectly straight the edges of the phase diagram are. I think this means that the gates are run way out of their linear regions and are sitting in steady-state most of the time. I'm going to try to make a graph to make sense of that.

I also found that it is easy to make complex patterns form when you push the system really hard, as in the following class-3-like cellular automata. Note that the system was started with symmetric initial conditions and has fully symmetric rules, yet it stays symmetric only until it starts to interact with itself; once it reaches the boundaries, it becomes asymmetric. Fascinating. I suppose this is because the "periodicity" of the pattern is not related to the size of the container, so the two periods start to alias in some weird sense.

## Friday, May 15, 2009

### Idea: Cut healthcare costs? Reduce the patent duration.

Brooks has a good essay today about the proposed underwhelming health care cost-cutting measures. I agree that none of the proposed changes sound like enough to take a reasonable bite out of our growing health care costs; and I doubt that for such a big problem there exist many easy fixes. But there is one very easy fix that would have a huge impact -- cut patent durations from 20 years to, say, 10. Of course, innovating companies will hate the idea of reducing their patents and boring-old manufacturers will love it, but I guarantee that 10 years from now there will be an incredible drop in drug prices.

We have a fundamental problem that no one wants to admit: until some revolution in drug development takes place (e.g., if it turns out that siRNAs are a magic bullet), we simply cannot have guns, butter, and bandages -- at least we can't have every newfangled "bandage" being made at such an incredible pace.

We have an impossible expectation for our health care that we don't have for any other sector of our economy. We simultaneously want the free market to invent new treatments on a for-profit motive and then we want everyone to have access to the result. In contrast, we don't expect every driver in the country to have access to a Lamborghini just because Lamborghinis exist. We don't expect everyone to have access to the latest iPhone gadget just because it exists. But we do expect -- for good ethical and moral reasons -- that everyone should have access to whatever the latest, best treatments are. While this expectation is understandable, it's nevertheless schizophrenic: "Pharma: go be innovative, invest a lot of money to make amazing drugs! Oh my god, why are they so expensive?" We don't say: "Apple: go be innovative, invest a lot of money to make amazing phones! Oh my god, why are they so expensive?" (Actually some people do, but most just recognize that if the phone is too expensive they'll just do without.)

Health care is always going to involve an insurance middle man be it private, public, or all-messed-up-in-between as it is now. So, health care will always be a collective venture. It is simply irrational to expect that we can collectively afford every possible innovation, just as it would be irrational to expect that we could all collectively own the latest iPhone gadgets. Thus, the systemic way to change the collective system is to simply lower the profit bar. And this can be done by changing one simple variable: the duration of patents. Make patents last 10 years and drug companies won't build as many expensive drugs and, yes, more people will die of things that could have been prevented. But, recognize that this is already the case! The 20 year limit is totally arbitrary. Had it been set at, say, 30 years then there would exist, right now, more amazing but even more expensive drugs and therefore because the number is set at 20 and not 30 we are "heartlessly" letting people go untreated because of an arbitrary number. The number has changed before (upwards) and we can change it again, downwards -- at least for drugs -- if we collectively choose to. It's the only "easy" fix.

## Thursday, May 14, 2009

### Traveling Pulse Phase Diagrams

Working on understanding the behavior of my amorphous traveling pulse, "Mexican Wave". On the right is a marked-up phase diagram of the two states, "standing" on the x axis and "tired" on the y axis. The mark-ups show the regions where different parts of the circuit are operational. This has helped me get my head around what has to be adjusted to make the system more predictable. One lesson is that the mystery of why the pulse travels at different speeds has something to do with the fact that the system does not usually get all the way back down into the same steady-state. The bottom steady-state point, "not standing and not tired", should be determined by the relationship of the pull down gates 1 & 2 and the grounding resistors. So, the next thing I'm going to do is try to adjust things so that I give the system enough time to always settle down into that same point. Then I can tackle understanding how the other gates reshape this phase chart.

An observation. The one directional traveling pulse on the left is making a pattern that looks like the branching pattern on a plant stem. This reminds me of a plant branching model Wolfram talked about in NKS.

## Wednesday, May 13, 2009

### More fun with Traveling Pulse

I started messing around today with the amorphous traveling pulse from yesterday. The first thing I did was try creating an asymmetric starting condition by "pipetting" in both a spot of "standing" as yesterday and also a spot of "tired" adjacent to it, so that the pulse could travel only in one direction. As before, the x axis is cyclical space, which is why the pulse travels off to the left and then reappears on the right.

Inexplicably, the pulse does not always travel at the same velocity. I have no idea why; maybe it's an artifact of the integration, but it seems periodic -- like it's accelerating and decelerating in some predictable way.
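One way to quantify this would be to track the pulse peak frame by frame and unwrap the cyclic jump. A sketch against a synthetic space-time array (the helper and the test pulse are made up for illustration, not the actual simulation output):

```python
import numpy as np

def pulse_velocities(frames, dx=1.0, dt=1.0):
    """Per-step pulse speed from a (time x space) array on a cyclic domain.

    Tracks the peak position at each timestep and unwraps the cyclic jump,
    so a pulse drifting off one edge and back onto the other still reads
    as smooth motion.
    """
    n = frames.shape[1]
    pos = np.argmax(frames, axis=1).astype(float)
    steps = np.diff(pos)
    steps = (steps + n / 2) % n - n / 2   # shortest cyclic displacement
    return steps * dx / dt

# Synthetic check: a pulse moving 2 cells per step around a 100-cell ring.
t, x = np.meshgrid(np.arange(80), np.arange(100), indexing="ij")
frames = np.exp(-(((x - 2 * t) % 100) ** 2) / 4.0)
print(pulse_velocities(frames))  # a constant 2.0 everywhere
```

Running this on the real space-time output would show whether the speed really oscillates or whether it is an integration artifact.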

I then start exploring parameter space of the circuit, repeated here for reference.

(Drawing revised 19 May)

I started with 3 vs 5. All things being equal, it should be the case that the concentration of gate 5 needs to be greater than the concentration of gate 3 so that it can overpower "standing" when "tired". As the following phase chart of 3 vs 5 illustrates, this is true. Also, as 3 grows so does the pulse width. This is intuitive because the harder p3 works to pull up "standing", the longer it will take for the discharge circuit to overpower it. Graph of P3 vs P5:

Then I started on P3 vs P4. P4 determines how fast it gets "tired", so more P4 should create a narrower pulse width, which is indeed the case. As you would expect, there's a limit: P4 can make the system tired so quickly that the pulse disappears (it becomes tired the instant it stands). However, there's a relationship between P3, the charging drive, and P4, the "getting tired" drive. As the standing driver is increased, you have to compensate with how fast you become "tired". Makes sense. Ratios in the 5-7 ballpark seem to work well given the arbitrary other settings I have. Graph of P3 vs P4:

Crazy things happen when you change the two stabilizing gates p1 and p2. In this simulation the pull down resistors are set to 0.01 and diffusion to 0.3. As p1 increases the pulse travels slower, which makes sense as it is harder to charge standing. (Thanks to Xi for pointing out that I had previously stated this backwards.) At some critical value, the charge circuit can't keep up with the diffusion and pull down sides and the pulse evaporates. Really weird things start happening around p1=0.01 and p2=0.07; it looks like the system becomes unstable and pattern forming, which is cool.

Some close-ups of instability patterns. They look like Sierpinski triangles, which makes some vague sense because standing and tired are in opposition to each other and can act as some kind of binary counter where diffusion permits the next space over to act as the carry bit. (I say this while waving my hands furiously :-)

## Tuesday, May 12, 2009

### Traveling Pulse Amorphous Computer

After a few meetings with John, Nam, Xi, Edward, and Andy in the last few weeks I think I have a plausible molecular gate model that can make some interesting amorphous computations. Specifically, I've been trying to make the "Mexican Wave" -- an amorphous pulse wave.

A variable "A" is encoded by the log ratio of the concentration of two RNA species: a sense strand called "A+" and its anti-sense strand called "A-".
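To make the encoding concrete, here is one self-consistent way to map a signed value onto two strand concentrations and back. The fixed total strand budget and the logistic split are my assumptions; the post only fixes the log-ratio readout:

```python
import math

def encode(a, total=1.0):
    """Split signed value `a` into (plus, minus) concentrations whose log
    ratio equals `a`. The fixed budget `total` is an assumed convention."""
    plus = total / (1.0 + math.exp(-a))
    return plus, total - plus

def decode(plus, minus):
    return math.log(plus / minus)   # A = log([A+] / [A-])

p, m = encode(1.5)
print(p, m, decode(p, m))  # decode recovers 1.5 up to float error
```

One nice property of the log-ratio encoding: A = 0 corresponds to equal amounts of sense and anti-sense strand, and negating A just swaps the two strands.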

(Image updated 21 May -- Thanks to Erik for pointing out I left off the promoter completion domain)

Gates are molecular beacons that use promoter disruption to squelch the generation of some output strand. For now, all gates are unary operators. The RNAs can be displaced off the beacons by toe-hold mediated strand displacement. This design is basically the Winfree lab's transcriptional circuits, but where the gate is a hairpin DNA molecular beacon and where variables are encoded by the log ratio of sense and anti-sense strands instead of in proportion to the concentration of an ssRNA.

(Note: I updated this diagram to change the naming convention on 17 May 2009. Updated again on 21 May -- thanks to Erik for noticing I left off the promoter completion domain.)

Gates are modeled as having hyperbolic production curves and can be built according to one of four choices of sense and anti-sense sequence on the inputs and outputs. As a matter of convention, the sense strand is labeled "+" relative to the ssRNAs, not relative to the DNA because the concentration of the RNAs is the variable of interest in these systems.

To explore the model, I created a circuit that I hoped would make an amorphous pulse propagating wave. Below, I switch into an electrical analogy, which I do for my own sanity. The charge across the capacitors represents the two variables, which I call "standing" and "tired" by analogy with the Mexican Wave. The gates are labeled like "i+o-" meaning "when input is + the output will be -". (I've changed around the naming convention several times; this update is as of 17 May.) The gates without inputs are under constitutive promotion and are labeled only by what they output. All nodes are pulled down by the same RNases, represented here as resistors to ground from each capacitor. The two variables are assumed to diffuse at equal rates. The only changeable parameter is assumed to be the concentrations of the gates.

(Thanks to Xi and John for help reworking this diagram. I updated it on 19 May.)

This circuit can be thought of like this: "standing" and "tired" are constantly being pulled low by gates 1 & 2 against the action of the resistors. If the rest of the gates weren't there, this would ensure the system stays "not standing" and "not tired". Gate 3 puts feedback on "standing", so a small threshold level of "standing" will generate more until it saturates in steady-state against the resistor. Gate 4 increases "tired" when "standing". Gate 5 is in high concentration relative to the other gates and can thus overpower the "standing" variable when "tired".
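That description can be sketched as a pair of rate equations for a single well-mixed node (no diffusion yet). The gate activation curve and every rate constant below are guesses tuned only to show the qualitative excitable-pulse behavior, not values from the actual model:

```python
import math

K = 0.01                                            # shared RNase pull toward zero
G1, G2, G3, G4, G5 = 0.005, 0.005, 0.2, 0.03, 0.4   # guessed gate concentrations

def on(x):
    # Stand-in gate activation: a sharp switch around zero.
    return 0.5 * (1.0 + math.tanh(x / 0.2))

def run(s=2.0, tired=-G2 / K, t_max=600.0, dt=0.05):
    trace = []
    for _ in range(int(t_max / dt)):
        ds = -G1 + G3 * on(s) - G5 * on(tired) - K * s   # gates 1, 3, 5
        dtired = -G2 + G4 * on(s) - K * tired            # gates 2, 4
        s += ds * dt
        tired += dtired * dt
        trace.append((s, tired))
    return trace

trace = run()
standing = [s for s, _ in trace]
print(f"standing: peak {max(standing):.1f}, trough {min(standing):.1f}, "
      f"final {trace[-1][0]:.2f}")
```

Starting with standing kicked up to 2, standing first rises on the gate 3 feedback, tired charges up behind it, gate 5 then knocks standing far down, and both variables slowly relax back toward their pulled-down rest states -- a single pulse.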

Here are the 1D amorphous results. The two plots are "standing" (left) and "tired" (right). The X axis of each is space (cyclical coordinates). The Y axis from bottom to top is increasing time. Blue represents a high ratio of - to + strands. Red represents a high ratio of + to - strands. Black represents an even ratio. At time zero, a pulse of + is added to the "standing" variable, representing a manual pipetting operation at some point in space. As time passes (bottom to top) the pulse propagates in both directions at a constant rate until the two pulses hit each other and then stop.
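The same idea extended to a 1D ring with diffusion gives the flavor of these space-time plots. Everything here (rates, activation curve, diffusion constant) is guessed rather than taken from the real simulation; np.roll makes the space cyclical:

```python
import numpy as np

K = 0.01                                            # RNase pull toward zero
G1, G2, G3, G4, G5 = 0.005, 0.005, 0.2, 0.03, 0.4   # guessed gate levels
D, DT, N, STEPS = 0.3, 0.05, 100, 4000              # diffusion, step, cells, steps

def on(x):
    return 0.5 * (1.0 + np.tanh(x / 0.2))           # stand-in gate activation

def laplacian(a):
    return np.roll(a, 1) + np.roll(a, -1) - 2 * a   # cyclical space

s = np.full(N, -G1 / K)            # rest state: "not standing"
t = np.full(N, -G2 / K)            # rest state: "not tired"
s[N // 2 - 2 : N // 2 + 3] = 2.0   # pipette a spot of + standing in the middle

widest = 0
for _ in range(STEPS):
    s = s + DT * (-G1 + G3 * on(s) - G5 * on(t) - K * s + D * laplacian(s))
    t = t + DT * (-G2 + G4 * on(s) - K * t + D * laplacian(t))
    widest = max(widest, int((s > 0.0).sum()))

print("widest band of standing cells:", widest)
```

Stacking snapshots of `s` row by row would reproduce the left-hand space-time plot: the seeded spot ignites, two fronts run off in both directions, and the tired wake keeps each front from turning back.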

## Sunday, May 10, 2009

### Understanding Principal Component Analysis via cool Gapminder graphs

Gapminder.org is a wonderful site full of "statistical porn". This chart in particular is a fascinating graph that demonstrates the correlation between income and child mortality rates. It is also a great example to teach about a cool statistical tool: "Principal Component Analysis".

In this graph of regions there is an obvious negative correlation between infant mortality and income illustrated by the fact that the data points scatter along a line from upper left to lower right. In other words, if you knew only the infant mortality rate or the income of a region you could make a reasonable guess at the other.

Principal Component Analysis (PCA) is a statistical tool that's very useful in situations like this. PCA delivers a new set of axes that are well aligned to correlated data like this -- I've illustrated them here with black and red lines. For each axis, it also returns a "variance strength", which I've represented as the length of the black and red axes. (Actually, I just hand-approximated these axes by eye for the purposes of illustration.)

The strongest new axis returned by PCA (the black one) aligns well with the primary axis of the data. In other words, if one were forced to summarize a region with a single number it would be best to do so with the position along this black axis. The zero point on the axis is arbitrary but is usually positioned in the center of the data (the mean). Positive valued points along this black axis would be those regions further toward the lower right and negative valued regions would be those further toward the upper left. Let’s call this new axis “wealth” to separate it in our minds from “income” which is the horizontal axis of the original data set. Increases in “wealth” represent an increase in income and drop in infant mortality simultaneously.

The second axis returned by PCA is shown as the red axis. Countries that lie far off the main diagonal trend-line (black axis) have particularly unique infant mortality rates given their wealth which we’ll assume is because of something unique about their health care systems. Points well below the black axis are regions that have very good health care given their wealth and those above it have particularly poor health care given their wealth.

Because PCA gives us convenient axes that are well aligned to the data, it makes sense to just rotate the graph to align to these new axes, as illustrated here. Nothing has changed; we've simply made the graph easier to read.

Before even looking at specific regions on these new axes, one could guess that socialist countries would score more negatively along this red axis and that those whose economies are heavily biased toward mineral extraction -- where income tends to be very unevenly distributed -- would score more positively. Indeed, this is confirmed. The most obvious outliers below the black axis are Cuba and Vietnam, where communist governments have directed the economy to spend disproportionately on health care; the outliers on the other side are Saudi Arabia, South Africa, and Botswana -- all regions heavily dependent on resource extraction, where the mean income statistics hide the reality that a few are doing very well while the vast majority are in extreme relative poverty.

One particularly interesting outlier is Washington DC, which lies as far along the red axis as Botswana! In other words, based on this realigned graph, you might guess that the wealth in DC is as unevenly distributed as it is in Botswana. Fascinating! (The observation is probably at least partially explained by the fact that it is the only all-urban "state", and urban areas tend to have wider income distributions than rural/suburban areas.) Also note that all of the points in the United States (orange) are well into positive territory on the red axis -- our health care system is as messed up relative to our wealth as those of Chad, Bhutan, and Kazakhstan, countries with completely screwed-up governmental agendas. Think of it this way: the degree to which our infant mortality rates are "good" owes everything to our wealth and comes despite the factors independent of wealth! In other words, countries that provide average health care relative to their wealth, like El Salvador, Ukraine, Australia, and the UK, fall right on the black axis, but we fall significantly above that line -- roughly the same place as countries that are, independent of their wealth, really messed up, like Chad and Kazakhstan. (A caveat: the chart is on a log scale, so the comparative analysis is more subtle than I'm making it out to be here.)

PCA returns not only the direction of the new axes but also the variance of the data along those axes. To understand this, imagine for a moment that all the regions of the world had exactly the same health care given their income; in that case all the points would align perfectly along the main trend line (the black axis) and the variance along the red axis would be zero. In this imaginary case, the data would be "one dimensional"; that is, income and infant mortality would be one and the same statement: if you knew one, you'd know the other exactly. Now imagine the opposite scenario, in which there was no relationship at all between income and infant mortality; then we would see a scattering of points all over the place and there wouldn't be any obvious trend lines. Neither of these imaginary scenarios is what we see in the actual data. It isn't quite a line along the black axis, but neither is it a buckshot scattering of points, so we can say the data is somewhere between 1-dimensional and 2-dimensional. If both variances are large and equal to each other, then the system is 2-dimensional, while if one variance is large and the other is near zero, then the system is nearly 1-dimensional. In other words, PCA permits you to summarize complicated data by finding axes of low variance and simply eliminating them. This technique is called "dimensional reduction" and is a very powerful tool for summarizing complicated data sets such as would arise if we looked at more than two variables. For example, we might add car ownership, water accessibility, education, average adult height, etc. to the analysis, at which point performing a dimensional reduction would help us get our heads around any simplifications we might wish to make.
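The variance bookkeeping can be sketched the same way. Here synthetic two-column data that is "mostly 1-dimensional" (points near a line plus a little off-axis noise; not the real Gapminder numbers) gets reduced to its strongest axis, and the fraction of variance that axis captures tells us how little the reduction loses:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, mostly one-dimensional data: 500 points near a line plus
# small perpendicular noise (standing in for the income/mortality scatter).
t = rng.normal(size=500)
data = np.column_stack([3.0 * t, -2.0 * t]) + rng.normal(scale=0.3, size=(500, 2))

mean = data.mean(axis=0)
centered = data - mean
eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))

# Fraction of total variance along the strongest axis; near 1.0 means
# the data is effectively one-dimensional.
explained = eigvals[-1] / eigvals.sum()

# Dimensional reduction: keep only the score along the strongest axis,
# then reconstruct the 2D points from that single number each.
scores = centered @ eigvecs[:, -1]
reduced = np.outer(scores, eigvecs[:, -1]) + mean
```

For data like this, `explained` comes out close to 1 and the reconstruction error is tiny: one number per point really does summarize two. With buckshot-scattered data the two eigenvalues would be comparable and the reduction would throw away half the story.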

## Wednesday, May 6, 2009

### External link: My Manhattan Project

This is an excellent article in New York Magazine about a software engineer on Wall Street.

Some quotes and thoughts.

> "Over time, the users of any software are inured to the intricate nature of what they are doing."

Well put. This is the heart of all software successes and failures. Software is the perfect tool to lie to others and lie to ourselves with. It is the ultimate obfuscatory tool if you let it be.

Thomas Jefferson fought against a Hamilton-supported economy based on industry and banking. Hamilton was right, of course, but Jefferson had a good point. A detail of that eighteenth-century debate that has intrigued me: if financial instruments were already obfuscated to Jefferson in the 18th century, imagine what they must be like now. This article confirmed my intuition for what must have been going on: software sold and maintained by an external company helped to obfuscate the transactions to everyone involved. Of course, the technology should have allowed them to be understood too, but it sounds like some monopolies of thought took hold because it was short-term profitable for them to do so. It was Jefferson's worries manifest in twenty-first-century technology. Maybe there is some law lurking, like "The Law of Constant Obfuscation": at any given time, technology will permit obfuscation to a constant level.

> “Mike,” he told me when denying my request, “can you really look for people dumber than you and then take advantage of them? That’s what trading is all about.”

Ha!

> "I was very good at programming a computer. And that computer, with my software, touched billions of dollars of the firm’s money. Every week. That justified [my salary]. When you’re close to the money, you get the first cut. Oyster farmers eat lots of oysters, don’t they?"

Rationalization is such a powerful force! As is momentum. Feynman makes an excellent point in one of his books where he talks about forgetting why he worked on the Manhattan Project. He joined because, like his collaborators, he found the idea of Hitler having unilateral nuclear power unimaginably scary. But after Hitler was defeated, he forgot why it was he was working on the project, and the momentum of the technical challenge remained. He regretted that he didn't re-evaluate his thinking after VE Day.

## Monday, May 4, 2009

### External Link: Energy Flux Graph from Lawrence Livermore Nat. Labs

I love this graph from Lawrence Livermore National Laboratory illustrating the flux of energy through the US economy. Some things that surprised me:

1) The amount of energy wasted in the transport of electricity is staggering, slightly more than the total amount of oil imported (in energy-equivalent units); technological improvement in that sector would make an enormous contribution.

2) Transportation, as I expected, is woefully inefficient. What I didn't appreciate was the magnitude: the energy wasted by transport is approximately equal to all the coal burned!

3) The residential / commercial waste is surprisingly low. One assumes that some fraction of that waste is attributable to insulation and so forth, but even if you took a big bite out of that with building improvements, you wouldn't make a dent in the big picture. It boils down to this: if one's goal is to reduce waste (which is a very different goal than reducing consumption), then electrical and transport are the obvious primary targets.

## Saturday, May 2, 2009

### Vaccinate your child or gramps gets it in the stomach!

There seems to be a growing ignorance about vaccination. From my informal queries of friends and acquaintances who have chosen not to immunize their children or who do not get flu vaccines, I have found that few people understand that vaccination is part of a greater social compact, not merely a personal cost/benefit analysis. The effect is called Herd Immunity. When we vaccinate ourselves, and especially our children, we are adding to the communal common defenses. Obviously, everyone would like to have a defensive wall built to protect a community, yet everyone would prefer not to contribute. But that's not the way a good society works; we share the costs of doing things that benefit the common good.

Immunization of children is particularly important for two reasons: 1) Children's immune systems respond to vaccination much more effectively than do others', especially the elderly, who are the most likely to die of viral diseases such as influenza. 2) Children are responsible for much of the transport of viruses throughout a community, owing to their mobility and lack of hygiene.

For example, a controlled experiment conducted in 1968 by the University of Michigan demonstrated that large-scale vaccination of children conferred a 2/3 reduction in influenza illnesses across all age groups. For a nice article on the subject, here's a Slate article from 2008.
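The herd-immunity logic can be illustrated with a toy SIR epidemic model. All the parameters below (r0 = 3, a 10-day infectious period) are illustrative stand-ins, not fitted to influenza or to the Michigan study:

```python
def epidemic_size(vaccinated_fraction, r0=3.0, days=300):
    """Fraction of the population ever infected, from a discrete-time
    SIR model. All parameters are illustrative, not fitted to data."""
    gamma = 0.1                    # recovery rate = 1 / infectious period
    beta = r0 * gamma              # transmission rate
    s = 1.0 - vaccinated_fraction  # vaccinated people start out immune
    i, r = 1e-4, 0.0               # a small seed of initial infections
    for _ in range(days):
        new_inf = beta * s * i     # new infections this step
        rec = gamma * i            # recoveries this step
        s -= new_inf
        i += new_inf - rec
        r += rec
    return r

# Vaccinating more than 1 - 1/r0 of the population (~67% here) keeps the
# outbreak from ever taking off, protecting the unvaccinated as well.
```

With no vaccination, most of the population is eventually infected; past the herd-immunity threshold the epidemic never takes off, so even those who can't mount a strong immune response (like the elderly) are protected by everyone else's shots.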

I propose an old-fashioned poster campaign to inform about the social benefits of vaccination. Here are a couple of prototype posters I photoshopped up this afternoon. (Apologies to Norman Rockwell!)

(Original photo Adam Quartarolo via WikiCommons)
