Over the last few years, I have been trying to tease relationships out of data. I have tried a wide variety of methods and finally found one that has been producing good results.
The method is conceptually straightforward:
- Take the actual reading and apply a monotonically increasing function to it. Thus, if value_a < value_b, then func(value_a) < func(value_b)
- Transform the resulting data so that it follows a rectangular (uniform) distribution across all samples
- Hypothesis test the values from people who recorded symptoms, using P = 0.01 as the threshold
Once the candidate associations have been identified, we can also test whether a sample’s item satisfies the hypothesis.
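The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the exact procedure used here: the monotonic function is taken to be a logarithm, the rectangular transform is a mid-rank percentile, and the hypothesis test is a one-sample Kolmogorov-Smirnov test against the uniform distribution with an asymptotic p-value. The text does not say which test is used, so that choice, the function names, and the readings are all hypothetical.

```python
import math

def to_uniform(values):
    """Map values to mid-rank percentiles in (0, 1); tied values share a score."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid = (i + j) / 2 + 0.5  # average 0-based position, centred in its slot
        for k in range(i, j + 1):
            scores[order[k]] = mid / n
        i = j + 1
    return scores

def ks_uniform_pvalue(scores):
    """Approximate Kolmogorov-Smirnov p-value against Uniform(0, 1) (assumed test)."""
    xs = sorted(scores)
    n = len(xs)
    # Largest gap between the empirical CDF and the uniform CDF.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.3:
        return 1.0  # the series below converges slowly here; p is ~1 anyway
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * total))

# Hypothetical readings for all samples (illustrative values only).
readings = [float(v) for v in range(1, 101)]

# Step 1: any increasing function works; it never changes the ordering.
transformed = [math.log(v) for v in readings]

# Step 2: rectangular (uniform) scores across all samples.
scores = to_uniform(transformed)

# Step 3: test the scores of the (hypothetical) people who recorded a symptom.
high_group = [scores[i] for i in range(70, 100)]    # clustered at the high end
spread_group = [scores[i] for i in range(0, 100, 5)]  # spread evenly
p_high = ks_uniform_pvalue(high_group)
p_spread = ks_uniform_pvalue(spread_group)
is_candidate = p_high < 0.01
```

Because the second step depends only on rank order, the choice of monotonic function does not change the resulting scores; it matters only if the transformed values are used elsewhere.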
This approach has some nice characteristics, because it:
- does not assume the relationship is linear in the values
- does not assume a normal distribution
- does not assume that associations occur only at the extremes (i.e. too high or too low)
- can detect a shift into a middle range, which in some cases is statistically significant
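To make the middle-range point concrete, here is a small sketch comparing two checks on the same uniform scores. Both checks are stand-ins chosen for illustration and are not taken from the method itself: a z-test on the mean, which only notices shifts toward the high or low end, and an exact binomial test for an excess of scores inside a window. The window bounds of 0.25 and 0.75, the function names, and the scores are hypothetical.

```python
import math

def mean_shift_pvalue(scores):
    """Two-sided z-test that the mean of Uniform(0, 1) scores equals 0.5."""
    n = len(scores)
    # Under uniformity the mean is 0.5 and the variance of one score is 1/12.
    z = (sum(scores) / n - 0.5) / math.sqrt(1.0 / (12.0 * n))
    return math.erfc(abs(z) / math.sqrt(2.0))

def window_pvalue(scores, lo=0.25, hi=0.75):
    """One-sided exact binomial test for an excess of scores inside [lo, hi]."""
    n = len(scores)
    hits = sum(lo <= s <= hi for s in scores)
    p0 = hi - lo  # chance of landing in the window under uniformity
    return sum(math.comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
               for k in range(hits, n + 1))

# Hypothetical symptom-group scores clustered in the middle of the range.
middle_peak = [0.40 + 0.01 * i for i in range(21)]  # roughly 0.40 to 0.60

p_mean = mean_shift_pvalue(middle_peak)   # large: the mean sits at 0.5
p_window = window_pvalue(middle_peak)     # small: every score is in the window
```

In this example the mean-based check sees nothing unusual, while the window check falls well below the P = 0.01 threshold. In practice the window would need to be fixed before looking at the data, otherwise the p-value is overstated.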
Adjusting “Middle Peak” patterns
Both of the above are typical beliefs that people will attempt to apply to the data.