Projects of Zack Booth Simpson

Wednesday, April 15, 2009

Molecular computers -- A historical perspective. Part 2

We left off last time discussing the precision of an analog signal.

Consider a rising analog signal that looks like the following ramp.

Notice that there's noise polluting this signal. Clearly, this analog signal is not as precise as it would be without noise. How do we quantify this precision? The answer was worked out in the first half of the 20th century and is known as the Shannon-Hartley theorem. When the receiver decodes this analog variable, what is heard is not just the intended signal but the intended signal plus the noise (S+N); this value can be compared to the level of pure noise (N). The ratio (S+N)/N therefore describes how many discrete levels are available in the encoding.

The encoding on the left is very noisy and therefore only 4 discrete levels can be discerned without confusion; the one in the middle is less noisy and permits 8 levels; on the right, the low noise permits 16 levels. The number of discrete encodable levels is the precision of the signal and is conveniently measured in bits -- the number of binary digits it would take to encode this many discrete states. The number of binary digits needed is given by the log base 2 of the number of states, so we have log2( (S+N)/N ), which is usually algebraically simplified to log2(1+S/N).
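As a quick check of the arithmetic, here is a minimal Python sketch of the log2(1+S/N) formula. The function name `precision_bits` and the S/N values are made up for illustration; they are chosen so that (S+N)/N comes out to the 4, 8, and 16 levels described above.

```python
import math


def precision_bits(signal: float, noise: float) -> float:
    """Bits of precision for signal level S over noise level N: log2(1 + S/N)."""
    if signal < 0 or noise <= 0:
        raise ValueError("signal must be >= 0 and noise must be > 0")
    return math.log2(1.0 + signal / noise)


# Hypothetical S/N values picked so that (S+N)/N equals 4, 8 and 16 levels.
for levels in (4, 8, 16):
    s_over_n = levels - 1
    print(levels, "levels ->", precision_bits(s_over_n, 1.0), "bits")
```

With these inputs the three encodings come out to 2, 3, and 4 bits respectively.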

It is important to note that although Shannon and Hartley (working separately) developed this model in the context of electrical communication equipment, there is nothing in this formulation that speaks of electronics. The formula is a statement about information in the abstract -- independent of any particular implementation technology. The formula is just as useful for characterizing the information content represented by the concentration of a chemically-encoded biological signal as it is for the voltage driving an audio speaker or the precision of a gear-work device.

We're not quite done yet with this formulation. The log2(1+S/N) formula speaks of the maximum possible information content in a channel at any given moment. But signals in a channel change; channels with no variation are very dull!

(A signal with no variation is very dull. Adapted from Flickr user blinky5.)

To determine the capacity of a channel one must also consider the rate at which it can change state. If, for example, I used the 2 bit channel from above I could vary the signal at some speed as illustrated below.

(A 2-bit channel changing state 16 times in 1 second.)

This signal is thus sending 2 bits * 16 per second = 32 bits per second.
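The same multiplication can be written as a small sketch, assuming the channel is described only by its number of distinguishable levels and how many times per second it changes state. The function name is hypothetical.

```python
import math


def bits_per_second(levels: int, changes_per_second: float) -> float:
    """Data rate = (bits per state change) * (state changes per second)."""
    if levels < 1 or changes_per_second < 0:
        raise ValueError("levels must be >= 1 and the rate must be >= 0")
    return math.log2(levels) * changes_per_second


# The 2-bit (4-level) channel from above, changing state 16 times per second.
rate = bits_per_second(levels=4, changes_per_second=16)
print(rate)  # 32.0
```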

All channels -- be they transmembrane kinases, hydraulic actuators, or telegraph wires -- have a limited ability to change state. This capacity is generically called its "bandwidth", but that term is a bit oversimplified, so let's look at it more carefully.

It is intuitive that real-world devices cannot instantaneously change their state. Imagine, for example, inflating a balloon. Call the inflated balloon "state one". Deflate it and call this "state zero". Obviously there is a limited rate at which you can cycle the balloon from one state to the other. You can try to inflate the balloon extremely quickly by hitting it with a lot of air pressure but there's a limit -- at some point the pressure is so high that the balloon explodes during the inflation due to stress.

(A catastrophic failure of a pneumatic signalling device from over-powering it. From gdargaud.net)

Most systems are like the balloon example -- they respond well to slow changes and poorly to fast changes. Also like the balloon, most systems fail catastrophically when driven to the point where the energy flux is too high -- usually by melting.

(A device melted from overpowering it. Adapted from flickr user djuggler.)

Consider a simple experiment to measure the rate at which you can switch the state of a balloon. Connect the balloon to a bicycle pump and drive the pump with a spinning wheel. Turn the wheel slowly and write down the maximum volume the balloon obtains. Repeat this experiment for faster and faster rates of spinning the wheel. You'll get a graph as follows.

(Experimental apparatus to measure the cycling response of a pneumatic signal.)

(The results from the balloon experiment where we systematically increased the speed of cycling the inflation state.)

On the left side of the graph, the balloon responds fully to the cycling and thus has a good signal (S). But at these slow speeds very few bits can be transmitted, so not much information can be sent despite the balloon's good response. Further to the right, the balloon still has a good response and we're now sending bits much more rapidly, so we can send a lot of information at these speeds. By the far right of the graph, when the cycling is extremely quick, the balloon's response falls off and finally hits zero when it pops; that point defines the frequency limit.

The total channel capacity of our balloon device is an integral along this experimentally sampled frequency axis: for each small slice of frequency we multiply the width of that slice (in cycles per second) by log2( 1+S/N ), where S is now the measured response from our experiment, which we'll call S(f) = "the signal at frequency f". We didn't bother to measure noise as a function of frequency in our thought experiment, but we'll imagine we can do that just as easily, giving a new graph N(f) = "the noise at frequency f". The total information capacity (C) of the channel is the sum of all these products across the frequency samples we took, up to the bandwidth limit (B) where the balloon popped.
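Written out, this is the frequency-dependent form of the capacity: C = integral from 0 to B of log2(1 + S(f)/N(f)) df. Below is a minimal sketch of estimating it from sampled data with the trapezoid rule. The frequency, signal, and noise values are invented for illustration and are not measurements from any real balloon; the function name is likewise hypothetical.

```python
import math


def channel_capacity(freqs, signal, noise):
    """Trapezoid-rule estimate of C = integral of log2(1 + S(f)/N(f)) df.

    freqs  -- sampled frequencies in Hz, ascending, ending at the limit B
    signal -- measured response S(f) at each sampled frequency
    noise  -- measured noise N(f) at each sampled frequency (must be > 0)
    """
    if not (len(freqs) == len(signal) == len(noise)):
        raise ValueError("freqs, signal and noise must have the same length")
    # Bits available at each sampled frequency.
    density = [math.log2(1.0 + s / n) for s, n in zip(signal, noise)]
    total = 0.0
    for i in range(1, len(freqs)):
        df = freqs[i] - freqs[i - 1]
        total += 0.5 * (density[i] + density[i - 1]) * df
    return total


# Made-up data: full response at low frequency, falling to zero at B = 4 Hz.
freqs = [0.0, 1.0, 2.0, 3.0, 4.0]
signal = [3.0, 3.0, 3.0, 1.0, 0.0]
noise = [1.0, 1.0, 1.0, 1.0, 1.0]

capacity = channel_capacity(freqs, signal, noise)
print(capacity, "bits per second")  # 6.0 bits per second
```

When S/N is flat across the whole band, the integral collapses to B * log2(1 + S/N), which is the bits-times-rate product from the 2-bit example above.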

If you want to characterize the computational/communication aspects of any system you have to perform the equivalent of this balloon thought experiment. Electrical engineers all know this by heart as they've had it beaten into them since the beginning of their studies. But, unfortunately, most biochemists, molecular biologists, and synthetic biologists have never even thought about it. Hopefully that will start to change. As we learn more about biological pathways and become more sophisticated engineers of them, our understanding will remain unnecessarily shallow until we come to universally appreciate the importance of these characteristics.

Next, amplifiers and digital devices. To be continued...

Tuesday, April 14, 2009

Molecular computers -- A historical perspective. Part 1

I've been having discussions lately with Andy regarding biological/molecular computers and these discussions have frequently turned to the history of analog and digital computers as a reference -- a history not well-known by biologists and chemists. I find writing blog entries to be a convenient way to develop bite-sized pieces of big ideas and therefore what follows is the first (of many?) entries on this topic.

In order to understand molecular computers -- be they biological or engineered -- it is valuable to understand the history of human-built computers. We begin with analog computers -- devices that are in many ways directly analogous to most biological processes.

Analog computers are ancient. The first surviving example is the astonishing Antikythera Mechanism (watch this excellent Nature video about it). Probably built by the descendants of Archimedes' school, this device is a marvel of engineering that computed astronomical values such as the phase of the moon. The device predated equivalent devices by at least a thousand years -- thus furthering Archimedes' already incredible reputation. Mechanical analog computers all work by the now familiar idea of inter-meshed gear-work -- input dials are turned and the whirring gears compute the output function by mechanical transformation.

(The Antikythera Mechanism via WikiCommons.)

Mechanical analog computers are particularly fiddly to "program", especially to "re-program". Each program -- as we would call it now -- is hard-coded into the mechanism, indeed it is the mechanism. Attempting to rearrange the gear-work to represent a new function requires retooling each gear not only to change their relative sizes but also because the wheels will tend to collide with one another if not arranged just so.

Despite these problems, mechanical analog computers advanced significantly over the centuries and by the 1930s sophisticated devices were in use. For example, shown below is the Cambridge Differential Analyzer that had eight integrators and appears to be easily programmable by nerds with appropriately bad hair and inappropriately clean desks. (See this page for more diff. analyzers including modern reconstructions).

(The Cambridge differential analyzer. Image from University of Cambridge via WikiCommons).

There's nothing special about using mechanical devices as a means of analog computation; other sorts of energy transfer are equally well suited to building such computers. For example, MONIAC, built in 1949, was a hydraulic analog computer that simulated an economy by moving water from container to container via carefully calibrated valves.

(MONIAC. Image by Paul Downey via WikiCommons)

By the 1930s electrical amplifiers were being used for such analog computations. An example is the 1933 Mallock machine that solved simultaneous linear equations.

(Image by University of Cambridge via WikiCommons)

Electronics have several advantages over mechanical implementation: speed, precision, and ease of arrangement. For example, unlike gear-work, electrical computers can have easily re-configurable functional components. Because the interconnecting wires have small capacitance and resistance compared to the functional parts, the operational components can be conveniently rewired without having to redesign the physical aspects of the mechanism, i.e. unlike gear-work, wires can easily avoid collision.

Analog computers are defined by the fact that the variables are encoded by the position or energy level of something -- be it the rotation of a gear, the amount of water in a reservoir, or the charge across a capacitor. Such simple analog encoding is very intuitive: more of the "stuff" (rotation, water, charge, etc.) encodes more of the represented variable. For all its simplicity, however, such analog encoding has serious limitations: range, precision, and serial amplification.

All real analog devices have limited range. For example, a water-encoded variable will overflow when the volume of its container is exceeded.

(An overflowing water-encoded analog variable. Image from Flickr user jordandouglas.)

In order to expand the range of variables encoded by such means all of the containers -- be they cups, gears, or electrical capacitors -- must be enlarged. Building every variable for the worst-case scenario has obvious cost and size implications. Furthermore, such simple-minded containers only encode positive numbers. To encode negative values requires a sign flag or a second complementary container; either way, encoding negative numbers significantly reduces the elegance of such methods.

Analog variables also suffer from hard-to-control precision problems. It might seem that an analog encoding is nearly perfect -- for example, the water level in a container varies with exquisite precision, right? While it is true that the molecular resolution of the water in the cup is incredibly precise, an encoding is only as good as the decoding. For example, a water-encoded variable might use a small pipe to feed the next computational stage and as the last drop leaves the source reservoir, a meniscus will form due to water's surface tension and therefore the quantity of water passed to the next stage will differ from what was stored in the prior stage. This is but one example of many such real-world complications. For instance, electrical devices suffer from thermal effects that limit precision due to added noise. Indeed, the faster one runs an electrical analog computer the more heat is generated and the more noise pollutes the variables.

(The meniscus of water in a container -- one example of the complications that limit the precision of real-world analog devices. Image via WikiCommons).

Owing to such effects, the precision of all analog devices is usually much less than one might intuit. The theoretical limit of the precision is given by Shannon's formula. Precision (the amount of information encoded by the variable, measured in bits) is log2( 1+S/N ). It is worth understanding this formula in detail as it applies to any sort of information storage and is therefore just as relevant to a molecular biologist studying a kinase as it is to an electrical engineer studying a telephone.

.... to be continued.

Utility yard fence

In the last few days I've finished up the fence line that separates the backyard from the utility yard. This involved staining more boards with Pinofin which is as malodorous as it is beautiful. Thanks to Jules for the help with staining! Fortunately she is hard-of-smelling so didn't notice how bad it was!

Saturday, April 11, 2009

Finished workshop drawers

Today I finished attaching the hardware to my new tool drawers. I'm stupidly excited about them as I can put away all my tools and clear out a lot of clutter from my shop.

We ordered the boxes from Drawer Connection. They really did a great job; the boxes are perfectly square, dovetail-joined, glued, sanded, and polyed. As Bruce said, "I'll never build another box again." It's a demonstration to me of how custom web-based CNC construction is the future of a lot of products. We ordered about 30 boxes of all different sizes and the total was only about $1100 including shipping. There's no possible way we could have made them for that.

Thursday, April 9, 2009

Finished utility yard

Finished up the utility yard today, which involved raising the AC units and changing grade a little bit. This weekend I'm going to stain the pickets and rebuild the rear fence line.

Tuesday, April 7, 2009

The 21st Century Chemical / Biological Lab.

White Paper: The 21st Century Chemical / Biological Lab.

Electronic and computer engineering professionals take for granted that circuits can be designed, built, tested, and improved in a very cheap and efficient manner. Today, the electrical engineer or computer scientist can write a script in a domain specific language, use a compiler to create the circuit, use layout tools to generate the masks, simulate it, fabricate it, and characterize it all without picking up a soldering iron. This was not always the case. The phenomenal tool-stack that permits these high-throughput experiments is fundamental to the remarkable improvements of the electronics industry: from 50-pound AM tube-radios to iPhones in less than 100 years!

Many have observed that chemical (i.e. nanotech) and biological engineering are to the 21st century what electronics was to the 20th. That said, chem/bio labs – be they in academia or industry – are still in their “soldering iron” epoch. Walk into any lab and one will see every experiment conducted by hand, transferring micro-liter volumes of fluid in and out of thousands of small ad-hoc containers using pipettes. This sight is analogous to what one would have seen in electronics labs in the 1930s – engineers sitting at benches with soldering iron in hand. For the 21st century promise of chem/nano/bio engineering to manifest itself, the automation that made large-scale electronics possible must similarly occur in chem/bio labs.

The optimization of basic lab techniques is critical to every related larger-scale goal, be it curing cancer or developing bio-fuels. All such application-specific research depends on experiments and therefore reducing the price and duration of such experiments by large factors will not only improve efficiency but also make possible work that was not previously possible. While such core tool paths are not necessarily “sexy”, they are critical. Furthermore, a grand vision of chem/bio automation is one that no single commercial company can tackle, as such a vision requires both a very long time commitment and a very wide view of technology. It is uniquely suited to the academic environment as it both depends upon and affords cross-disciplinary research towards a common, if loosely defined, goal.

Let me elucidate this vision with a science-fiction narrative:

Mary has a theory about the effect of a certain nucleic acid on a cancer cell line. Her latest experiment involves transforming a previously created cell line by adding newly purchased reagents, an experiment that involves numerous controlled mixing steps and several purifications. In the old days, she would have begun her experiment by pulling out a pipette, obtaining reagents out of the freezer, off of her bench, and from her friend's lab, and then performing her experiment in an ad hoc series of pipette operations. But today, all that is irrelevant; today, she never leaves her computer.

She begins the experiment by writing a protocol in a chemical programming language. Like high-level languages used by electrical and software engineers for decades, this language has variables and routines that allow her to easily and systematically describe the set of chemical transformations (i.e. “chemical algorithms”) that will transpire during the experiment. Many of the subroutines of this experiment are well established protocols such as PCR or antibody separation and for those Mary need not rewrite the code but merely link in the subroutines for these procedures just as a software engineer would. When Mary is finished writing her script, she compiles it. The compiler generates a set of fluidic gates that are then laid out using algorithms borrowed from integrated circuit design. Before realizing the chip, she runs a simulator and validates the design before any reagents are wasted – just as her friends in EE would do before they sent their designs to “tape out.” Because she can print the chip on a local printer for pennies, she is able to print many identical copies for replicate experiments. Furthermore, because the design is entirely in a script, it can be reproduced next week, next year, or by someone in another lab. The detailed script means that Mary’s successors won’t have to interpret a 10 page hand-waving explanation of her protocol translated from her messy lab notes in the supplementary methods section of the paper she publishes – her script *is* the experimental protocol. Indeed, this abstraction means that, unlike in the past, her experiments can be copyrighted or published under an open source license just as code from software or chip design can be.
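To make the idea of a scripted protocol a little more concrete, here is a toy sketch in Python standing in for the imagined chemical programming language. Everything in it is hypothetical: the `Transfer` step, the `mix` subroutine, the reagent names, and the volumes are invented for illustration, and the "simulator" does nothing more than track volumes so that a shortage is caught before any fluid is moved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One primitive step: move `volume_ul` microliters from `source` to `dest`."""
    source: str
    dest: str
    volume_ul: float


def mix(a, b, dest, volume_ul):
    """Reusable subroutine: combine equal volumes of two fluids in `dest`."""
    return [Transfer(a, dest, volume_ul), Transfer(b, dest, volume_ul)]


def protocol():
    """The top-level script: a flat list of primitive steps built from subroutines."""
    steps = []
    steps += mix("cell_line", "reagent", "well_1", 2.0)
    steps += mix("well_1", "buffer", "well_2", 1.5)
    return steps


def simulate(steps, stock):
    """Dry-run the steps against starting volumes; fail before anything is wasted."""
    volumes = dict(stock)
    for n, step in enumerate(steps):
        available = volumes.get(step.source, 0.0)
        if step.volume_ul > available:
            raise ValueError(
                f"step {n}: {step.source} has {available} uL, needs {step.volume_ul} uL"
            )
        volumes[step.source] = available - step.volume_ul
        volumes[step.dest] = volumes.get(step.dest, 0.0) + step.volume_ul
    return volumes


# Hypothetical starting stock, in microliters.
stock = {"cell_line": 5.0, "reagent": 5.0, "buffer": 5.0}
final = simulate(protocol(), stock)
print(final)
```

Because the whole experiment is data, the same list of steps can be re-run, diffed, or handed to another lab, which is the point of the narrative above. A real system would of course also have to model timing, mixing, and chip layout, none of which is attempted here.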

Designing and printing the chip is only the first step. Tiny quantities of specific fluids need to be moved into and out of this chip – the “I/O” problem. But Mary’s lab, like any, both requires and generates thousands of isolated chemical and biological reagents each of which has to be stored separately in a controlled environment and must be manipulated without risking cross-contamination. In the old days, Mary would have used hundreds of costly sterilized pipette tips as she laboriously transferred tiny quantities of fluid from container to container. Each tip would be wastefully disposed of despite the fact that only a tiny portion of it was actually contaminated – such was the cost when everything had to be large enough to be manipulated by hand. In the old days, each of the target containers – from large flasks to tiny plastic vials – would have had to be hand-labeled, resulting in benches piled with tiny cryptic scribbled notes with all of the confusion and inefficiency that results from such clutter. Fortunately for Mary, today all of the stored fluids for her entire lab are maintained in a single fluidic database; she never touches any of them. In this fluidic database, a robotic pipette machine addresses thousands of individual fluids. These fluids are stored inside of tubes that are spooled off of a single supply and cut to length and end-welded by the machine as needed. Essentially, this fluidic database has merged the concepts of “container” and “pipette” – it simply partitions out a perfectly sized container on-demand and therefore the consumables are cheaper and less wasteful. Also, the storage of these tube-containers is extremely compact in comparison to the endless bottles (mostly filled with air) that one would have seen in the old days. The fluid-filled tubes could be simply wrapped around temperature-controlled spindles and, just like an electronic database or disk drive, the system can optimize itself by “defragmenting” its storage spindles, ensuring there’s always efficient usage of the space. Furthermore, because the fluidic database knows the manifest of its contents, all reagent accounting can be automated and optimized.

Mary has her experiment running. But moving all these fluids around is just a means to an end. Ultimately she needs to collect data about the performance of her new reagent on the cancer line in question. In the old days, she would have run a gel, used a fluorescent microscope, or employed any number of other visualization techniques to quantify her results – any of these measurements would have required a large and expensive machine. But today, most of these measurements are either made directly on the same chip as the fluidics using printable chemical / electronic sensors or, where the sensors can’t be printed, made through an interface to a standardized re-usable sensor array. The development of those standards was crucial to the low capital cost of her equipment. Before far-sighted university engineering departments set those standards, each diagnostic had its own proprietary interface and therefore the industry was dominated by an oligopoly of several companies. But now, the standards have promoted competition and thus the price and capabilities of all the diagnostics have improved.

As Mary’s chemical program executes on her newly minted chip, she gets fluorescent read-outs on one channel and antibody detection on another – all such diagnostics were written into her experimental program in the same way that a “debug” or “trace” statement is placed into a software program. After her experiment runs, the raw sensor data is uploaded to the same terminal where she wrote the program and she begins her analysis without getting out of her chair.

After the experiment, the disposable chip and the temporary plumbing that connected to it are all safely incinerated to avoid any external contamination. In the old days, such safety protocols would have had to be known by every lab member and this would have required a time-consuming certification process. But today, all of these safety requirements are enforced by the equipment itself and therefore there’s much less risk of human mistake. Furthermore, because of the enhanced safety and lower volumes, some procedures that were once classified as bio-safety level 3 are now BSL 2 and some that were 2 are now 1, meaning that more labs are available to work on important problems.

Mary’s entire experiment from design to data-acquisition took her under 1 hour – compared to a week by old manual techniques. Thanks to all of this automation, Mary has evaluated her experiment and moved on to her next great discovery much faster than would have been possible before. Moreover, because so little fluid was used in these experiments her reagents last longer and therefore the cost has also fallen. Mary can contemplate larger-scale experiments than anybody dreamed of just a decade ago. Mary also makes many fewer costly mistakes because of the rigor imposed by writing and validating the entire experimental script instead of relying on ad hoc procedures. Finally, the capital cost of the equipment itself has fallen due to standardization, competition, and economies of scale. The combined result of these effects is to make the acquisition of chemical and biological knowledge orders of magnitude faster than was possible just decades ago.

Monday, April 6, 2009

Macro-scale examples of chemical principles

I like macro-scale examples of chemical principles. Here are two I've noticed recently.

I was very slowly pouring popcorn into a pot with a little bit of oil. The kernels did not distribute themselves randomly but instead formed some long chain aggregations because, apparently, the oil made them more likely to stick to each other than to stand alone. This kind of aggregation occurs frequently at the molecular scale when some molecule has an affinity for itself.

This is wheelbarrow chromatography. During a rain, water and leaves fell into this wheelbarrow. Notice that the leaves and the stems separated; apparently the stems are lighter than water and the leaves are heavier. This sort of "phase separation" trick is frequently used by chemists to isolate one type of molecule from another in a complex mixture. Sometimes the gradient of separation might be variable density as in this example, but other times it might be hydrophobicity or affinity to an antibody or many other types of clever chemical separations known generically as "chromatography". Note that the stems clustered. Like the popcorn above, apparently there is some inter-stem cohesion force that results in aggregation as occurs in many chemical solutions.