Putting personal data to use

Criteria for Analysis

Criteria for Analytic Approach

 

What do we need to consider for all approaches?

For each data situation, several approaches can be applied, each with its own assumptions and shortcomings.  Whichever method is adopted, however, the overarching characteristics listed below should be quantifiable and open to analysis.

 
 

How do we know if a difference is significant?

In statistics and probability, a false positive is described as a type I error.  Its probability is denoted alpha: the probability of rejecting the null hypothesis when it is actually true.  In plain words, alpha is the significance threshold that we set in advance.  It is compared against the 'p value', which is the probability of seeing a result at least as extreme as the one observed in the data if random chance alone were at work.  For each approach to be useful, we need a method to determine this probability, which can be used either as a decision point in a binary decision (e.g., did the treatment work?) or for screening results from a loop over many possible variables (e.g., which genes are associated with the outcome?).
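
One way to obtain such a probability without distributional assumptions is a permutation test. The sketch below is a minimal illustration, not a prescribed method: the function name `permutation_p_value`, the sample values, and the 0.05 threshold are all assumptions made for the example. It shuffles the group labels many times and counts how often chance alone produces a difference in means at least as large as the observed one.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Hypothetical helper for illustration; assumes the observations are
    exchangeable between groups when the null hypothesis is true.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


ALPHA = 0.05  # assumed significance threshold (type I error rate)

# Made-up measurements, for illustration only.
treated = [5.1, 6.0, 5.8, 6.3, 5.9, 6.1, 5.7, 6.4]
control = [4.2, 4.9, 4.4, 5.0, 4.6, 4.8, 4.3, 4.7]

p_value = permutation_p_value(treated, control)
print(f"p = {p_value:.4f}; reject null at alpha = {ALPHA}: {p_value < ALPHA}")
```

The same function can serve as the binary decision point (compare one p value with alpha) or be called once per variable in a screening loop.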

 

Do we have enough data to draw any conclusions?

The counterpart of the type I error is the type II error.  Its probability is denoted beta: the probability of failing to reject the null hypothesis when it is actually false.  In plain words, this is a measure of whether we have collected enough data to detect a difference between groups if one exists.  The complement of this measure (1 minus beta) is referred to as the 'power'.  It is very important when analyzing data because it can mean the difference between stating that there is no relationship between factors and stating that there is not enough data to tell.
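
Power can be estimated by simulation: generate many experiments in which a real difference exists and count how often the test detects it. The sketch below is illustrative only. The function name `estimate_power`, the effect size of 0.5 standard deviations, and the use of a two-sided z-test with a known standard deviation of 1 are assumptions chosen to keep the example short.

```python
import random
from statistics import NormalDist, fmean


def estimate_power(n_per_group, effect_size, alpha=0.05,
                   n_simulations=2000, seed=0):
    """Fraction of simulated experiments in which the null is rejected.

    Hypothetical helper; assumes both groups are normally distributed
    with a known standard deviation of 1 and uses a two-sided z-test.
    """
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = (2 / n_per_group) ** 0.5  # SE of a difference in means
    rejections = 0
    for _ in range(n_simulations):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        z = (fmean(treated) - fmean(control)) / standard_error
        if abs(z) > critical_z:
            rejections += 1
    return rejections / n_simulations


# Power (1 minus beta) for the same true effect at different sample sizes.
for n in (10, 50, 100):
    power = estimate_power(n, effect_size=0.5)
    print(f"n per group = {n:3d}: estimated power = {power:.2f}")
```

With a small sample, most simulated experiments miss an effect that is really there, which is why a non-significant result from a small study is better read as "not enough data" than as "no relationship".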

 

What are the assumptions of the approach, and where might there be bias?

Arguably the most important quote describing quantitative methods, attributed to the statistician George Box, is that "all models are wrong, some models are useful."  In developing an analytical approach, there are always going to be assumptions about how the data were generated or collected, and how they would otherwise exist in nature.  When these assumptions are violated in a systematic way, we say that there is bias.  Some bias is generally unavoidable, but it can be limited if the method accounts for it in application.
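
A small simulation can show what a systematic violation looks like. The sketch below assumes a hypothetical collection process in which values below a threshold are never recorded. The function name `mean_of_recorded` and the threshold of -0.5 are invented for the example. Because the analysis assumes every value had the same chance of being recorded, the estimate stays off target no matter how much data is collected.

```python
import random
from statistics import fmean


def mean_of_recorded(n, record_threshold, seed=0):
    """Return (mean of all values, mean of only the recorded values).

    Hypothetical illustration: values are drawn from a normal distribution
    with true mean 0, but values below record_threshold are never recorded.
    """
    rng = random.Random(seed)
    population = [rng.gauss(0.0, 1.0) for _ in range(n)]
    recorded = [x for x in population if x >= record_threshold]
    return fmean(population), fmean(recorded)


# More data shrinks random error but does not remove the systematic error.
for n in (100, 10_000, 200_000):
    full_mean, recorded_mean = mean_of_recorded(n, record_threshold=-0.5)
    print(f"n = {n:7d}: all values = {full_mean:+.3f}, "
          f"recorded only = {recorded_mean:+.3f}")
```

A method that models the recording step (for example, by treating the data as truncated) could limit this bias, which is the sense in which bias can be accounted for in application.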