Blog

August 27, 2019

Using a Virtual Environment in Jupyter notebooks

Recently, I wanted to try out a library that goes along with a paper that was just published. I didn’t want to actually install the library, because it comes with a bunch of specific dependencies, so I used a virtual environment for my testing. This is easy.
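
The rough workflow is just a few commands (a hedged sketch of the common approach; `myenv` and `some-library` are placeholder names, and the post may do things differently):

```bash
# Create and activate an isolated environment
python3 -m venv myenv
source myenv/bin/activate

# Install the library under test plus ipykernel inside the environment
pip install ipykernel some-library   # 'some-library' is a placeholder

# Register the environment as a selectable Jupyter kernel
python -m ipykernel install --user --name=myenv
```

After that, the environment shows up as a kernel option inside Jupyter, and deleting the `myenv` directory (and the registered kernel) cleans everything up.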

Read More

July 23, 2019

Color palettes

When I was working at Illumina, one of my projects was to create a visualization for the results of the whole genome sequencing pipeline. I took my figure to one of my coworkers to ask what he thought. His response was that the structure was nice, but he couldn’t distinguish between some of the colors because he was colorblind. He shared this article with me, and that was my first real introduction to using different color palettes for figures.

Read More

July 22, 2019

docopt

A while back I came across a really cool Python library for command-line interfaces called docopt. I was exposed to it through this video.

Read More

July 22, 2019

Information Criteria

When I was an undergrad I took a course that covered linear regression and ANOVA in depth. We covered both theory and basic statistical computing in R (mostly how to use the models we learned about). At some point in the course we discussed model selection: forward selection and backward elimination. In forward selection, you start with no variables in your model and greedily “improve” the model by adding one variable at a time. In backward elimination, you start with a model including all possible variables and greedily “improve” the model by removing one variable at a time.
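
To make the greedy idea concrete, here is a rough sketch of forward selection driven by AIC (a hedged illustration using statsmodels, not code from the course or the post; `forward_selection` and the toy data are made up for the example):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_selection(X, y):
    """Greedy forward selection: at each step, add the single variable
    that lowers the AIC the most; stop when no addition helps."""
    remaining = list(X.columns)
    selected = []
    best_aic = sm.OLS(y, np.ones(len(y))).fit().aic  # intercept-only baseline
    while remaining:
        aics = {v: sm.OLS(y, sm.add_constant(X[selected + [v]])).fit().aic
                for v in remaining}
        best_var = min(aics, key=aics.get)
        if aics[best_var] >= best_aic:
            break  # no single variable improves the AIC any further
        selected.append(best_var)
        remaining.remove(best_var)
        best_aic = aics[best_var]
    return selected

# Toy data: y depends on x1 and x2 but not on the noise column.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "noise"])
y = 2 * X["x1"] - 1.5 * X["x2"] + rng.normal(size=200)
print(forward_selection(X, y))  # most likely ['x1', 'x2']
```

Backward elimination is the mirror image: start with every variable in the model and greedily drop the one whose removal lowers the AIC the most.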

Read More

July 16, 2019

pbapply

Last week I had a long-running apply (essentially running correlations on vectors of length around 10^5; I had around 10^5 of these vectors to compare to a “reference”). It was taking a long time on my laptop, so I wanted a way to estimate how long it would take and decide whether I should use an HPC cluster instead.

Read More

July 8, 2019

Sublime Snippets

I often write code using Sublime. Today, I was writing a script in Python and scaffolding out the different functions I thought that I would need, using the pass keyword as a placeholder for the body of each of the functions. A habit that I have (maybe a bad one) is to leave the comment # todo to mark places where I need to make changes. This could be actually filling in an implementation, removing placeholder code, or any other sort of change. Today, one # todo was for refining a constant value, others were for placeholder data, etc. I realized that there was probably a way to make it so that when I typed pass Sublime would autocomplete (or I could tab-complete) to pass # todo.
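
One way to do exactly that is a Sublime snippet file (a hedged sketch; the trigger, scope, and filename are up to you, and the post may set it up differently). Saved as something like `pass-todo.sublime-snippet` in your User package:

```xml
<snippet>
    <!-- What gets inserted when the trigger is tab-completed -->
    <content><![CDATA[pass  # todo]]></content>
    <!-- Typing this and pressing Tab expands the snippet -->
    <tabTrigger>pass</tabTrigger>
    <!-- Only fire inside Python files -->
    <scope>source.python</scope>
    <description>pass with a # todo marker</description>
</snippet>
```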

Read More

June 21, 2019

f-divergence

There are many applications of the information-theoretic idea of Kullback-Leibler divergence (also known as KL-divergence and relative entropy). It is a measure of how one probability distribution differs from another. Given two discrete probability distributions \(p,q\) defined on the same probability space, the KL-divergence \(D\) between \(p\) and \(q\) is defined as
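\[
D(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)},
\]

where the sum runs over the shared support and we use the convention \(0 \log 0 = 0\).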

Read More

June 18, 2019

Enbrel

Two weeks ago the Washington Post published this article about Pfizer’s rheumatoid arthritis biologic, Enbrel (Etanercept). The main idea of the article is that Pfizer had data in 2015 showing that Enbrel could be useful in treating Alzheimer’s and didn’t act on it. I don’t want to rehash the discussion about whether or not Pfizer did the right thing. It’s covered nicely by Derek Lowe and John Carroll. Instead, since I’m trying to make it a priority to learn more biology this summer (and in general), I want to talk about Enbrel, its target (TNF alpha), and its history.

Read More

June 10, 2019

Law of the Unconscious Statistician

The Law of the Unconscious Statistician is something that I’m pretty sure I’ve used before, but I’m not sure that I’ve ever proved it. The law is a theorem that states
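(in its continuous form) that for a random variable \(X\) with density \(f_X\) and a function \(g\),

\[
\mathbb{E}[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx,
\]

with the analogous sum over the probability mass function in the discrete case.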

Read More

June 7, 2019

Rejection Sampling

Sometimes we want to sample from a distribution that we don’t necessarily know how to sample from directly (i.e. it isn’t one of the distributions that comes built into our favorite software package). The classic example is generating random points uniformly distributed within a circle. The idea is to enclose the circle within a square. It is easy to generate uniform points inside a square. We do that, and then throw away all points that don’t fall inside the circle. Here is some Python code that does this:
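
(What follows is a minimal sketch of the idea rather than the post's exact code; the function name and sample size are placeholders.)

```python
import numpy as np

def sample_in_circle(n, radius=1.0, seed=0):
    """Draw n points uniformly from a circle by rejection sampling:
    propose uniform points in the enclosing square and keep only
    those that land inside the circle."""
    rng = np.random.default_rng(seed)
    accepted = []
    while len(accepted) < n:
        # Propose a point uniformly in the square [-radius, radius]^2
        x, y = rng.uniform(-radius, radius, size=2)
        # Keep it only if it falls inside the circle
        if x**2 + y**2 <= radius**2:
            accepted.append((x, y))
    return np.array(accepted)

points = sample_in_circle(1000)
```

The acceptance rate is the ratio of the circle's area to the square's, \(\pi/4 \approx 0.785\), so on average you throw away about a fifth of the proposals.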

Read More

June 5, 2019

Neyman-Pearson Lemma

The Neyman-Pearson Lemma is a fundamental result in the theory of hypothesis testing and can also be restated in a form that is foundational to classification problems in machine learning. Even though the Neyman-Pearson lemma is a very important result, it has a simple proof. Let’s go over the theorem and its proof.
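
For reference, one standard (non-randomized) form of the statement: for testing a simple null \(H_0: \theta = \theta_0\) against a simple alternative \(H_1: \theta = \theta_1\), any test that rejects exactly when

\[
\Lambda(x) = \frac{L(x \mid \theta_1)}{L(x \mid \theta_0)} > k
\]

and that has size \(\alpha\) is the most powerful test among all tests of size at most \(\alpha\).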

Read More

May 21, 2019

Inverse Transform Sampling

Imagine that your computer is only able to sample from a uniform distribution on \([0,1]\). This is useful, but often you want to sample from a distribution other than uniform (normal, binomial, Poisson, etc.). Is there a way to make your uniform random variable look like a draw from a different distribution? For univariate distributions, inverse transform sampling provides a solution to this problem.
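
As a concrete illustration (a hedged sketch, not necessarily the post's example; the exponential distribution is used here because its inverse CDF has a simple closed form):

```python
import numpy as np

def sample_exponential(n, rate=1.0, seed=0):
    """Inverse transform sampling for the Exponential(rate) distribution.
    The CDF is F(x) = 1 - exp(-rate * x), so F^{-1}(u) = -log(1 - u) / rate."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 1.0, size=n)   # U ~ Uniform(0, 1)
    return -np.log(1.0 - u) / rate      # F^{-1}(U) has the target distribution

samples = sample_exponential(10_000, rate=2.0)
print(samples.mean())  # should be close to 1 / rate = 0.5
```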

Read More

May 14, 2019

Horner's method

I recently came across Horner’s method for the first time. It’s a simple algorithm for evaluating polynomials at a point and is a good example of why we don’t necessarily compute things by applying a definition directly (or by the simplest or most intuitive method). For example: matrix operations (multiplications, inverses) are often done under the hood in ways that don’t follow the definitions or methods you might learn in your first linear algebra class.
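
Here is a minimal sketch of the method (the function name and example polynomial are just for illustration):

```python
def horner(coeffs, x):
    """Evaluate a polynomial at x using Horner's method.

    `coeffs` lists coefficients from the highest power down to the
    constant term, so [2, -3, 1] represents 2*x**2 - 3*x + 1.
    """
    result = 0
    for c in coeffs:
        result = result * x + c  # one multiply and one add per coefficient
    return result

print(horner([2, -3, 1], 4))  # 2*16 - 3*4 + 1 = 21
```

Rewriting \(2x^2 - 3x + 1\) as \(((2)x - 3)x + 1\) is the whole trick: \(n\) multiplications and \(n\) additions for a degree-\(n\) polynomial, instead of computing each power separately.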

Read More

March 28, 2019

Automated Science

When I started my master’s program in bioinformatics and genomics, I only had a view of computational biology and bioinformatics from the outside looking in. The program and my later experience at Illumina would push me towards genomics, but at that point I had a very shallow understanding of the field and envisioned myself working in any number of sub-areas. One kind of scary thought: in the summer of 2014, when I was starting to look for companies I was excited about in the hope of finding an internship, I came across Theranos. Luckily, I didn’t even end up applying there, and I am grateful that I got to work at Illumina.

Read More

March 27, 2019

Eigenvalue Properties

When I took the math subject GRE I learned that in order to solve common linear algebra questions on the exam quickly, you should keep in mind a few facts about eigenvalues.
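
Two of the most commonly used facts of this kind (an illustration, not necessarily the post's exact list) are that the trace of a matrix equals the sum of its eigenvalues and the determinant equals their product:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
eigenvalues = np.linalg.eigvals(A)   # for this matrix: 5 and 2

print(np.isclose(np.trace(A), eigenvalues.sum()))        # trace = 7 = 5 + 2
print(np.isclose(np.linalg.det(A), eigenvalues.prod()))  # det   = 10 = 5 * 2
```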

Read More

March 22, 2019

Sicherman Dice

Sicherman dice are a “relabeling” of a standard pair of dice in such a way that their sum has the same distribution as a standard pair of dice (i.e. there is only 1 way to get a 2, 2 ways to get a 3, etc.). While a standard pair of dice is labeled with (1,2,3,4,5,6) on each of the two dice, Sicherman dice are labeled with (1,2,2,3,3,4) and (1,3,4,5,6,8). It’s left as an exercise to the reader to confirm that this relabeling has the same distribution as a standard pair of dice :)
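
If you would rather let the computer do the exercise, here is a quick check (a small sketch, not taken from the post):

```python
from collections import Counter
from itertools import product

standard = [(1, 2, 3, 4, 5, 6), (1, 2, 3, 4, 5, 6)]
sicherman = [(1, 2, 2, 3, 3, 4), (1, 3, 4, 5, 6, 8)]

def sum_distribution(dice):
    """Count how many of the 36 face combinations give each possible sum."""
    return Counter(a + b for a, b in product(*dice))

print(sum_distribution(standard) == sum_distribution(sicherman))  # True
```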

Read More

January 11, 2019

Favorite Books of 2018

In 2017 I set a goal of reading 52 books, and fell short at only 31. In 2018, I was more methodical about accomplishing my goal. I started tracking my books publicly on Goodreads. My train commute from Boston to Providence also provided a lot of extra time for reading (when I probably should have been working instead). My final tally was 76 books in 2018, and you can see the full list here if you are interested.

Read More

October 26, 2018

Academic Competitions

Over the past week, I’ve found myself thinking about academic competitions several times. Growing up, the only academic competition I participated in was my elementary school’s annual spelling bee. In high school, I never joined the debate team or DECA or anything else. When I was in college I competed twice in the Mathematical Contest in Modeling (MCM), once as a completely unprepared sophomore and once as a more reasonably prepared senior. I flirted with doing the Putnam exam, but I doubt I would have accomplished much.

Read More

October 8, 2018

IR Sensitivity in Lips

Today I finished reading Physics for Future Presidents by Richard Muller. In the discussion of applications of infrared radiation, I found one short section particularly interesting (not that the rest of the book wasn’t). I have copied it here:

Read More

July 17, 2018

How Google Works

I recently read How Google Works and found it really interesting. Usually I don’t take notes when reading (maybe I should…), but I found myself marking a lot of pages to come back to later. These notes aren’t at all a comprehensive representation of the book, just a collection of ideas that I found most striking.

Read More