Tuesday, May 5, 2015

Data & Modeling

Recently, in research, I've come across the question of what proportion of data to modeling is ideal. You see both extremes in theses - some are purely data-based, with little effort put into modeling the data, while others are purely simulation-based and are not (at times cannot be) tested in real life. My own research has seen both: some work based entirely on simulation, and other work based entirely on data, with very little effort made to model it.

Neither approach is right or wrong. There are genuinely situations where, due to limitations of money or even technology, collecting data is next to impossible. Alternatively, there are situations where the system being modeled is so complex and poorly understood that data and data-based models are the only practical options. Ideally, a mix of both would be suitable, though what proportion is right is subjective.

How does one go about judging such work, then? Both approaches have shortcomings, and these should be scrutinized. Models need to make reasonable assumptions that are not obviously incorrect (easier said than done!) and should explore a range of possible solutions. Sensitivity analysis is a popular tool for understanding to what extent the assumptions change the answer. The key is careful, in-depth analysis of the model. Purely data-based work should be judged strictly on experimental results - many researchers try to prove a pre-determined hypothesis without adequate data, and this needs to be shot down. Repeatability is another issue, for, as my adviser likes to say, 'anyone can prove anything with just one data point!'
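To make the sensitivity-analysis idea concrete, here is a minimal one-at-a-time sketch in Python. The logistic-growth model, the parameter names, and the ranges are all hypothetical stand-ins, not anything from a particular project; the point is simply to vary each assumption over a plausible range and see how much the output moves.

import numpy as np

def model(growth_rate, carrying_capacity, t=10.0, x0=1.0):
    # Hypothetical logistic-growth model; stands in for any simulation.
    return carrying_capacity / (1 + (carrying_capacity / x0 - 1) * np.exp(-growth_rate * t))

# Baseline assumptions and plausible ranges (illustrative values only).
baseline = {"growth_rate": 0.5, "carrying_capacity": 100.0}
ranges = {"growth_rate": (0.3, 0.7), "carrying_capacity": (80.0, 120.0)}

base_out = model(**baseline)
for name, (lo, hi) in ranges.items():
    # Sweep one parameter at a time, holding the others at baseline.
    outputs = [model(**dict(baseline, **{name: value}))
               for value in np.linspace(lo, hi, 11)]
    spread = max(outputs) - min(outputs)
    print(f"{name}: output varies by {spread:.2f} around baseline {base_out:.2f}")

A parameter whose sweep barely changes the output is one the conclusions do not hinge on; a parameter that swings the output widely is an assumption that deserves the careful scrutiny discussed above.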

The ideal kind of research, a mixture of both modeling and data, is also the most difficult to assess because it carries both burdens. No wonder, then, that so few do it!
