
PREDICTIVE MODEL & EDA ON WHO LIFE EXPECTANCY DATA

Everyone wants a long and healthy life, and many people work to stay fit in order to increase their life expectancy. Life expectancy depends on many factors, however. Some, such as birthplace, gender, and family background, are beyond a person's control; others, such as BMI and choices around addictive substances, can be changed. This project aims to predict the average life expectancy of each nation so that governments and the WHO can take action to improve it.


The dataset contains the following features, which are used to model the average life expectancy:

1   Year
2   Status
3   Country
4   Adult_Mortality
5   Infant_Deaths
6   Alcohol
7   Percentage_Exp
8   HepatitisB
9   Measles
10  BMI
11  Under_Five_Deaths
12  Polio
13  Tot_Exp
14  Diphtheria
15  HIV/AIDS
16  GDP
17  Population
18  thinness_1to19_years
19  thinness_5to9_years
20  Income_Comp_Of_Resources
21  Schooling

Life expectancy differs markedly between developing and developed nations, as the chart below shows.

developing vs developed.png

These countries have the highest average life expectancy:

highest.png

The figure below shows the correlation matrix of all features.

who_corr_edited.png

Next, the data is normalized and explored using Keras and scikit-learn, and finally split into training and testing datasets. 20% of the total data is reserved for testing.
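The normalization and 80/20 split described above can be sketched without any dependencies, to show what the two steps do. This is a minimal illustration, not the project's actual preprocessing: the helper names (`split_rows`, `fit_scaler`, `transform`), the seed, and the toy feature rows are all hypothetical. The sketch also fits the scaling statistics on the training rows only, which is one common way to keep test data out of the preprocessing; the original project may have normalized before splitting.

```python
import random
from statistics import mean, pstdev


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle row indices and hold out `test_fraction` of the rows for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


def fit_scaler(train_rows):
    """Per-column mean and standard deviation, computed on training rows only."""
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # avoid dividing by zero
    return means, stds


def transform(rows, means, stds):
    """Standardize each value: subtract the column mean, divide by the column std."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical toy rows with three columns: Adult_Mortality, BMI, Schooling.
rows = [
    [263.0, 19.1, 10.1],
    [271.0, 18.6, 10.0],
    [74.0, 58.0, 14.2],
    [335.0, 23.3, 11.4],
    [13.0, 47.7, 13.9],
    [116.0, 62.8, 17.3],
    [59.0, 66.6, 20.4],
    [118.0, 54.9, 12.7],
    [65.0, 57.6, 15.9],
    [129.0, 17.5, 10.0],
]

train, test = split_rows(rows, test_fraction=0.2)
means, stds = fit_scaler(train)
train_scaled = transform(train, means, stds)
test_scaled = transform(test, means, stds)  # reuse the training statistics

print(len(train), "training rows,", len(test), "test rows")
```

In practice, scikit-learn's `train_test_split` and `StandardScaler` cover the same two steps.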


Three models are used to predict life expectancy:

  1) Linear Regression Model

  2) Mixed Effect Model

  3) Deep Neural Networks

Linear Regression Model

First, linear regression is used to predict life expectancy. Performance is measured by the root mean squared error (RMSE). The RMSE for linear regression was 4.9438. The chart below plots the residuals against the predicted values.
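For reference, the RMSE and the residuals mentioned above follow the standard definitions below. The notation is introduced here for illustration: $y_i$ is the actual life expectancy of observation $i$, $\hat{y}_i$ the model's prediction, and $n$ the number of test observations. The sign convention for the residual (actual minus predicted) is an assumption; the source does not state which one its plots use.

```latex
e_i = y_i - \hat{y}_i,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, and the result is in the same unit as the target (years).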

who lr.png

Mixed Effect Model


This model proved very effective for this dataset. The RMSE for the mixed effect model is 1.556047. The chart below plots the residuals against the predicted values.
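The source does not say which grouping or random-effects structure was used. As an illustration only, one plausible form is a random intercept per country, where $c$ indexes countries and $t$ years (both symbols, and the choice of grouping, are assumptions made here):

```latex
y_{ct} = \beta_0 + \mathbf{x}_{ct}^{\top} \boldsymbol{\beta} + u_c + \varepsilon_{ct},
\qquad
u_c \sim \mathcal{N}(0, \sigma_u^2),
\quad
\varepsilon_{ct} \sim \mathcal{N}(0, \sigma^2)
```

Under this form, the shared term $u_c$ lets all observations from one country shift together, which ordinary linear regression cannot express because it treats every row as independent.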

who_mixed.png

Deep Neural Networks


The neural network is designed as follows.


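from tensorflow import keras
from tensorflow.keras import layers

# The layer list is assumed to be wrapped in a Sequential model.
model = keras.Sequential([
    layers.Dense(128, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(32, activation='relu'),
    layers.Dense(16, activation='relu'),
    layers.Dense(1),  # single linear output: predicted life expectancy
])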
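To get a feel for the size of this network, the number of trainable parameters in a stack of dense layers can be worked out by hand: each layer has one weight per input-output pair plus one bias per output. The sketch below does that arithmetic. The input width `N_FEATURES` is a hypothetical value, since the source does not state how many columns reach the network after preprocessing.

```python
# Layer widths taken from the architecture above.
LAYER_UNITS = [128, 64, 32, 16, 1]

# Hypothetical input width; the real value depends on the preprocessing.
N_FEATURES = 20


def dense_param_counts(n_inputs, units):
    """Return the trainable parameter count of each dense layer in order."""
    counts = []
    for n_out in units:
        counts.append(n_inputs * n_out + n_out)  # weights + biases
        n_inputs = n_out  # this layer's output feeds the next layer
    return counts


counts = dense_param_counts(N_FEATURES, LAYER_UNITS)
total = sum(counts)

for n_units, n_params in zip(LAYER_UNITS, counts):
    print(f"Dense({n_units}): {n_params} parameters")
print("total:", total)
```

With the assumed 20 inputs the total comes to 13,569 parameters, most of them in the first two layers.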


These are some predictions from the neural network alongside the actual life expectancy values.

       Predictions   True
568    65.390266     64.5
569    66.062790     59.2
570    73.869949     73.3
571    71.569748     78.0
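The four sample rows above are enough to show how an RMSE is computed, as in the minimal sketch below. Four rows are far too few to reproduce an RMSE measured over a whole test set, so the value printed here is only an illustration. The residual is taken as actual minus predicted, which is an assumption about the sign convention.

```python
import math

# The four sample rows shown above (row indices 568 to 571).
predictions = [65.390266, 66.062790, 73.869949, 71.569748]
actual = [64.5, 59.2, 73.3, 78.0]

# Residual = actual value minus predicted value (assumed sign convention).
residuals = [a - p for p, a in zip(predictions, actual)]

# RMSE = square root of the mean of the squared residuals.
rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))

for p, a, r in zip(predictions, actual, residuals):
    print(f"predicted {p:.2f}  actual {a:.1f}  residual {r:+.2f}")
print(f"RMSE over these four rows: {rmse:.2f}")
```

The second and fourth rows miss by more than six years each and dominate the result, which shows how sensitive RMSE is to a few large errors.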


The RMSE for the DNN is 3.1332.

In this particular case, the mixed effect model outperforms both ordinary linear regression and the neural network architecture used here (RMSE of 1.556047 versus 4.9438 and 3.1332), since it takes into account the dependence in the data.
