When calculating profitability measures for a life insurance company, one of the most important inputs is the probability that a policy is still in force at a given time after the start of risk bearing. These probabilities are given by the survival function. In this paper, we examine data from a Hungarian insurance company in order to build models for the survival functions of two life insurance products. To estimate the survival function from the individual parameters of a new policy, we use Cox regression. However, not all parameters of a new policy are relevant for estimating the survival function, so model selection algorithms are needed. Furthermore, if the exact effects of the policy parameters on the survival function can be determined, the insurance company can direct its sales team to acquire policies with positive technical results. When the traditional model selection techniques proposed in the literature (such as best subset, stepwise and regularization methods) are applied to our data, we find that the effects of the selected predictors on survival cannot be determined, because there is a harmful degree of multicollinearity among them. To tackle this problem, we propose combining Cox regression with the hybrid metaheuristic of Láng et al. (2017) in order to eliminate multicollinearity from the final model. On the test sets, the performance of the models selected by the metaheuristic rivals that of the traditional algorithms while using noticeably fewer predictors. These predictors are not significantly correlated with one another, and each is significant for survival. We show that, by applying metaheuristics, we can produce a model with good predictive capability and interpretable predictor effects. These predictor effects can be used to direct the sales activities of the insurance company.
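As a minimal illustration of how a Cox model turns policy parameters into a survival function, the sketch below uses the standard proportional hazards relation S(t | x) = S0(t)^exp(β·x). The baseline survival values, the coefficients and the two predictors are hypothetical numbers chosen for the example; they are not estimates from the paper's data.

```python
import numpy as np

def cox_survival(baseline_survival, beta, x):
    """Survival curve under proportional hazards: S(t | x) = S0(t) ** exp(beta . x)."""
    relative_risk = np.exp(np.dot(beta, x))
    return np.asarray(baseline_survival, dtype=float) ** relative_risk

# Hypothetical baseline in-force probabilities at policy years 0..3.
baseline = np.array([1.0, 0.95, 0.90, 0.80])
# Hypothetical coefficients for two illustrative policy parameters.
beta = np.array([0.5, -0.2])

x_reference = np.zeros(2)        # policy at the baseline covariate values
x_new = np.array([1.0, 0.0])     # policy with the first parameter raised by one unit

s_reference = cox_survival(baseline, beta, x_reference)
s_new = cox_survival(baseline, beta, x_new)
print(s_reference)
print(s_new)
```

A positive coefficient raises the hazard and therefore lowers the in-force probability at every later time. This is why interpretable coefficients matter: when predictors are strongly collinear, the sign and size of an individual coefficient can no longer be read this way.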
Allen, G. I. (2013): Automatic Feature Selection via Weighted Kernels and Regularization. Journal of Computational and Graphical Statistics 22(2): 284–299.
Arnold, T. B. (2017): KerasR: R Interface to the Keras Deep Learning Library. The Journal of Open Source Software 2.
Begg, C. B. – Cramer, L. D. – Venkatraman, E. S. – Rosai, J. (2000): Comparing Tumour Staging and Grading Systems: A Case Study and a Review of the Issues, Using Thymoma as a Model. Statistics in Medicine 19(15): 1997–2014.
Breheny, P. (2013): ncvreg: Regularization Paths for SCAD- and MCP-Penalized Regression Models. R package version 2.6-0.
Calaway, R. – Weston, S. (2014): doParallel: Foreach Parallel Adaptor for the Parallel Package. R package version 1 (8).
Calcagno, V. – de Mazancourt, C. (2010): glmulti: An R Package for Easy Automated Model Selection with (Generalized) Linear Models. Journal of Statistical Software 34(12): 1–29.
Cox, D. R. (1972): Regression Models and Life-Tables. Journal of the Royal Statistical Society. Series B (Methodological) 34(2): 187–220.
Cox, D. R. (1975): Partial Likelihood. Biometrika 62(2): 269–276.
Fan, J. – Li, R. (2002): Variable Selection for Cox's Proportional Hazards Model and Frailty Model. Annals of Statistics 30(1): 74–99.
Fan, J. – Feng, Y. – Wu, Y. (2010): High-Dimensional Variable Selection for Cox's Proportional Hazards Model. In: Berger, J. O. – Cai, T. T. – Johnstone, I. M. (eds): Borrowing Strength: Theory Powering Applications–A Festschrift for Lawrence D. Brown. Beachwood, OH: Institute of Mathematical Statistics, pp. 70–86.
Furnival, G. M. – Wilson, R. W. (1974): Regressions by Leaps and Bounds. Technometrics 16(4): 499–511.
Gillespie, B. (2006): Checking Assumptions in the Cox Proportional Hazards Regression Model. Presented at the Midwest SAS Users Group (MWSUG) Dearborn, Michigan.
Grosen, A. – Jørgensen, P. L. (2000): Fair Valuation of Life Insurance Liabilities: The Impact of Interest Rate Guarantees, Surrender Options, and Bonus Policies. Insurance: Mathematics and Economics 26(1): 37–57.
Jia, J. – Yu, B. (2010): On Model Selection Consistency of the Elastic Net. Statistica Sinica 20: 595–611.
Láng, B. – Kovács, L. (2014): Linear Regression Model Selection Using Improved Harmony Search Algorithm. SEFBIS Journal 9(1): 15–22.
Láng, B. – Kovács, L. – Mohácsi, L. (2017): Linear Regression Model Selection Using a Hybrid Genetic – Improved Harmony Search Parallelized Algorithm. SEFBIS Journal 11(1): 2–9.
Leng, C. – Zhang, H. (2006): Model Selection in Nonparametric Hazard Regression. Nonparametric Statistics 18(7–8): 417–429.
Lumley, T. – Therneau, T. (2004): The Survival Package. R News 4(1): 26–28.
Minerva, T. – Paterlini, S. (2010): Regression Model Selection Using Genetic Algorithms. Proceedings of the 11th WSEAS International Conference on Recent Advances in Neural Networks, Fuzzy Systems & Evolutionary Computing: 19–27.
Saldana, D. F. – Feng, Y. (2018): SIS: An R Package for Sure Independence Screening in Ultrahigh Dimensional Statistical Models. Journal of Statistical Software 83(2): 1–25.
Sheldon, T. J. – Smith, A. D. (2004): Market Consistent Valuation of Life Assurance Business. British Actuarial Journal 10(3): 543–605.
Tibshirani, R. (1997): The Lasso Method for Variable Selection in the Cox Model. Statistics in Medicine 16(4): 385–395.
Vanderhoof, I. T. – Altman, E. (eds) (2013): The Fair Value of Insurance Liabilities (Vol. 1). Springer Science & Business Media.
Zhang, Z. (2016): Variable Selection with Stepwise and Best Subset Approaches. Annals of Translational Medicine 4(7): 136.
Zhao, P. – Yu, B. (2006): On Model Selection Consistency of Lasso. The Journal of Machine Learning Research 7: 2541–2563.