Predicting The Frequency Of Asteroid Impacts With A Poisson Process


An Application of the Poisson Process and Poisson Distribution to Model Earth Asteroid Impacts

Here’s some good news: if you’ve spent hours studying a concept by reading books and class notes on the theory and you just can’t seem to get it, there’s a better way to learn. Starting with theory is always difficult and frustrating because you can’t see the most important part of a concept: how it’s used to solve problems. In contrast, learning by doing — working through problems — is more effective because it gives you context, letting you fit the technique into your existing mental framework. Moreover, studying through applications is more enjoyable for those motivated by an intrinsic desire to solve problems.

In this article, we’ll apply the concepts of a Poisson Process and Poisson Distribution to model Earth asteroid impacts. We’ll build on the principles covered in The Poisson Process and Poisson Distribution Explained, putting the ideas into practice with real-world data. Through this project, we’ll get a sense of how to use statistical concepts to solve problems and how to compare observed results with theoretically expected outcomes.
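As a rough illustration of comparing observed results with the theoretical expectation, the sketch below simulates a Poisson process by drawing exponential waiting times between events. The impact rate, the time window, and the function name are hypothetical values chosen for illustration, not figures from the article.

```python
import random


def simulate_impact_counts(rate_per_year, years, n_trials, seed=0):
    """Simulate a Poisson process n_trials times and return the event counts.

    Waiting times between events in a Poisson process are exponentially
    distributed, so we add up exponential draws until the window is exceeded.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_trials):
        elapsed = 0.0
        impacts = 0
        while True:
            elapsed += rng.expovariate(rate_per_year)
            if elapsed > years:
                break
            impacts += 1
        counts.append(impacts)
    return counts


# Hypothetical inputs: an assumed rate of 0.2 impacts per year over 10 years.
RATE_PER_YEAR = 0.2
YEARS = 10

counts = simulate_impact_counts(RATE_PER_YEAR, YEARS, n_trials=10_000)
observed_mean = sum(counts) / len(counts)
expected_mean = RATE_PER_YEAR * YEARS  # theoretical mean is rate * time

print(f"observed mean: {observed_mean:.3f}")
print(f"expected mean: {expected_mean:.3f}")
```

With enough trials, the observed mean count should land close to rate × time, which is the kind of check the article describes.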

The full code for this article (with interactive apps) can be run by clicking the image. The Jupyter Notebook is on GitHub.

Click to launch a Jupyter Notebook for this article.

Probability Mass Function of Number of Asteroid Impacts

Read More

What I Learned From Writing A Data Science Article Every Week For A Year

School of Athens by Raphael (Source)

Lessons on the life-changing power of data science writing

There ought to be a law limiting people to one use of the term “life-changing” to describe a life event. Had a life-changing cup of coffee this morning? Well, hope it was good because that’s the one use you get! If this legislation came to pass, then I would use my allotment on my decision to write about data science. This writing has led directly to 2 data science jobs, altered my career plans, moved me across the country, and ultimately made me more satisfied than when I was a miserable mechanical engineering university student.

In 2018, I made a commitment to write on data science and published at least one article per week for a total of 98 posts. It was a year of change for me: a college graduation, 4 jobs, 5 different cities, but the one constant was data science writing. As a culture, we are obsessed with streaks and convinced that those who complete them must have gained profound knowledge. Unlike other infatuations, this one may make sense: to do something consistently for an extended period of time, whether that is coding, writing, or staying married, requires impressive commitment. Doing a new thing is easy because our brains crave novelty, but doing the same task over and over once the newness has worn off requires a different level of devotion. Now, to continue the grand tradition of streak completers writing about the wisdom they gained, I’ll describe the lessons learned in “The Year of Data Science Writing.”

The five takeaways from a year of weekly data science writing are:

  1. You can learn everything you need to know to be successful in data science without formal instruction
  2. Data science is driven by curiosity
  3. Consistency is the most critical factor for improvement in any pursuit
  4. Data science is empirical: instead of relying on proven best methods, you have to experiment to figure out what works
  5. Writing about data science, or anything, is mutually beneficial: it helps both you and the entire community

Each of the takeaways is accompanied by a graph of my Medium article stats. You can find the Jupyter Notebook here or get your own stats with this article.

Read More

Interactive Controls For Jupyter Notebooks


How to use interactive IPython widgets to enhance data exploration and analysis

There are few actions less efficient in data exploration than re-running the same cell over and over again, each time slightly changing the input parameters. Despite knowing this, I still find myself repeatedly executing cells just to make the slightest change, for example, choosing a different value for a function, selecting various date ranges for analysis, or even adjusting the theme of a plotly visualization. Not only is this inefficient, but it’s also frustrating, disrupting the flow of an exploratory data analysis.

The ideal solution to this issue would be interactive controls to change inputs without needing to rewrite or rerun code. Fortunately, as is often the case in Python, someone has already run into this problem and developed a great tool to solve it. In this article, we’ll see how to get started with IPython widgets (ipywidgets), interactive controls you can build with one line of code. This library allows us to turn Jupyter Notebooks from static documents into interactive dashboards, perfect for exploring and visualizing data.
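To give a feel for what a one-line control might look like, here is a minimal sketch built on the `interact` function from ipywidgets. The `scaled_values` and `show_controls` functions are hypothetical examples, not taken from the article, and the import is guarded so the snippet also loads where ipywidgets is not installed.

```python
try:
    from ipywidgets import interact
except ImportError:
    # ipywidgets is optional here; the plain function below still works.
    interact = None


def scaled_values(multiplier=2):
    """Return the integers 0 through 4 scaled by `multiplier` (toy example)."""
    return [multiplier * x for x in range(5)]


def show_controls():
    """Attach a slider to `scaled_values`; intended for a Jupyter Notebook cell."""
    if interact is None:
        raise RuntimeError("ipywidgets is not installed")
    # The one line that builds the control: an integer range (min, max)
    # is rendered as a slider, and the function reruns when it moves.
    return interact(scaled_values, multiplier=(1, 10))
```

Calling `show_controls()` in a notebook cell should display a slider for `multiplier` and rerun `scaled_values` whenever it changes, so no cell needs to be edited or re-executed by hand.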

You can view a completely interactive running notebook with the widgets in this article on mybinder by clicking the image below.

Click Here for an Interactive Notebook

IPython widgets, unfortunately, do not render on GitHub or nbviewer, but you can still access the notebook and run it locally.

Example of interactive widgets for data visualization

Read More

A Great Public Health Conspiracy


The collusion of science, medicine, and the government to improve public health. Episode Five of The Reality Project.

In January 2013, the city council of Windsor, Ontario made a curious choice. By a vote of 8–3, they implemented a plan to increase rates of tooth decay and cavities among the town’s children by more than 50%. What evil action did these councilors impose? Holding Halloween multiple times per year? Letting a candy company make school lunches? No, they did something far more disastrous: they deliberately chose to ignore mountains of medical advice, give in to public hysteria, and undo one of the greatest public health achievements of the last century, the fluoridation of public water.

Water fluoridation, adding the mineral fluoride to public water supplies to protect against tooth decay, has been a common practice since 1945. About 400 million people worldwide drink water with added fluoride, including at least 66% of the US population. Thousands of studies have found water fluoridation to be beneficial and risk-free, with the Centers for Disease Control and Prevention concluding: “fluoride is both safe and effective in preventing and controlling dental caries” (caries being the medical term for tooth decay/cavities). Nonetheless, the government of one small Canadian town thought they knew better than the entire medical community.

After 6 years, a 51% increase in cavities among children, a 300% increase in low-income families needing financial support for dental care, and untold suffering for the town’s children, the Windsor council reluctantly reversed its decision in a December 2018 meeting. At the meeting, 5 dental experts testified, giving their full support to water fluoridation in accordance with all the medical evidence. On the other hand, 20 citizens, none with a medical degree, voiced their opposition based on unfounded fears and a belief that government should not “medicate” the public even when that medication prevents children from going to the hospital. Fortunately, at the end of the day, reason (and a poll showing 80% of the citizens wanted water fluoridation) prevailed, and Windsor will once again enjoy the benefits.

Read More

The Poisson Distribution And Poisson Process Explained


A straightforward walk-through of a useful statistical concept

A tragedy of statistics in most schools is how dull it’s made. Teachers spend hours wading through derivations, equations, and theorems, and, when you finally get to the best part — applying concepts to actual numbers — it’s with irrelevant, unimaginative examples like rolling dice. This is a shame as stats can be enjoyable if you skip the derivations (which you’ll likely never need) and focus on using the ideas to solve interesting problems.

In this article, we’ll cover Poisson Processes and the Poisson distribution, two important probability concepts. After highlighting only the relevant theory, we’ll work through a real-world example, showing equations and graphs to put the ideas in a proper context.
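For readers who want the central formula in concrete form, the Poisson distribution gives the probability of observing k events in a window as P(k) = e^(-λ) λ^k / k!, where λ is the event rate multiplied by the length of the window. The sketch below evaluates that formula with the standard library; the function name and the example rate are illustrative assumptions, not values from the article.

```python
import math


def poisson_pmf(k, rate, time):
    """Probability of exactly k events in a window of length `time`.

    `rate` is the average number of events per unit time, so the expected
    number of events in the window is lam = rate * time.
    """
    lam = rate * time
    return math.exp(-lam) * lam**k / math.factorial(k)


# Hypothetical example: events arriving at 3 per hour, observed for 1 hour.
for k in range(6):
    print(f"P({k} events) = {poisson_pmf(k, rate=3, time=1):.4f}")
```

Summing the probabilities over all possible k gives 1, which is a quick sanity check when plugging in other rates.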

Read More