In this project, we used county-level data on physical inactivity, obesity, and diabetes rates to predict diabetes prevalence. Our first attempts with simple linear regression models proved inadequate because the data exhibited heteroskedasticity. A quadratic model augmented with an interaction term predicted diabetes rates more accurately.
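For concreteness, one general form of a quadratic model with an interaction term in two predictors is written out here. The notation is assumed for illustration ($y$ for the diabetes rate, $x_1$ for inactivity, $x_2$ for obesity), and which of these terms the fitted model actually retains is not specified in this summary.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2
    + \beta_3 x_1^2 + \beta_4 x_2^2
    + \beta_5 x_1 x_2 + \varepsilon
```

Here $\beta_5 x_1 x_2$ is the interaction term, and $\varepsilon$ is the error term whose non-constant variance is what heteroskedasticity refers to.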
To compare the test errors of several models, including the quadratic one, we restricted the analysis to counties with complete data for all three variables. With more data, a broader pattern might emerge, allowing for a simpler and more precise model. We summarize our findings below and recommend directions for future research using additional data.
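The test-error comparison described above can be sketched as follows. This is a minimal illustration on synthetic data, not the county dataset or the exact models from the analysis: the function names (`design_matrix`, `holdout_mse`), the 75/25 train/test split, and the simulated coefficients are all assumptions made for the example.

```python
import numpy as np


def design_matrix(inactivity, obesity, quadratic):
    """Build regression columns: intercept and linear terms, plus
    squared and interaction terms when quadratic is True."""
    cols = [np.ones_like(inactivity), inactivity, obesity]
    if quadratic:
        cols += [inactivity**2, obesity**2, inactivity * obesity]
    return np.column_stack(cols)


def holdout_mse(train, test, quadratic):
    """Fit ordinary least squares on train and return the mean
    squared error on test. Each argument is (inactivity, obesity, y)."""
    x1_tr, x2_tr, y_tr = train
    x1_te, x2_te, y_te = test
    coef, *_ = np.linalg.lstsq(
        design_matrix(x1_tr, x2_tr, quadratic), y_tr, rcond=None
    )
    residuals = y_te - design_matrix(x1_te, x2_te, quadratic) @ coef
    return float(np.mean(residuals**2))


# Synthetic stand-in for the county data (hypothetical values).
rng = np.random.default_rng(0)
n = 400
inactivity = rng.uniform(10, 40, n)
obesity = rng.uniform(15, 45, n)
diabetes = (
    2.0
    + 0.10 * inactivity
    + 0.05 * obesity
    + 0.01 * inactivity * obesity  # simulated interaction effect
    + rng.normal(0.0, 0.5, n)
)

# Hold out the last 25% of rows as the test set.
split = int(0.75 * n)
train = (inactivity[:split], obesity[:split], diabetes[:split])
test = (inactivity[split:], obesity[split:], diabetes[split:])

mse_linear = holdout_mse(train, test, quadratic=False)
mse_quadratic = holdout_mse(train, test, quadratic=True)
print(f"linear test MSE:    {mse_linear:.3f}")
print(f"quadratic test MSE: {mse_quadratic:.3f}")
```

Because the simulated response contains an interaction effect, the quadratic model should show the lower test error here. On real data the ranking would depend on the data itself.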