Data Mining of Telematics Data: Unveiling the Hidden Patterns in Driving Behaviour


With the advancement in technology, telematics data which capture vehicle movements information are becoming available to more insurers. As these data capture the actual driving behaviour, they are expected to improve our understanding of driving risk and facilitate more accurate auto-insurance ratemaking. In this paper, we analyze an auto-insurance dataset with telematics data collected from a major European insurer. Through a detailed discussion of the telematics data structure and related data quality issues, we elaborate on practical challenges in processing and incorporating telematics information in loss modelling and ratemaking. Then, with an exploratory data analysis, we demonstrate the existence of heterogeneity in individual driving behaviour, even within the groups of policyholders with and without claims, which supports the study of telematics data. Our regression analysis reiterates the importance of telematics data in claims modelling; in particular, we propose a speed transition matrix that describes discretely recorded speed time series and produces statistically significant predictors for claim counts. We conclude that large speed transitions, together with higher maximum speed attained, nighttime driving and increased harsh braking, are associated with increased claim counts. Moreover, we empirically illustrate the learning effects in driving behaviour: we show that both severe harsh events detected at a high threshold and expected claim counts are not directly proportional with driving time or distance, but they increase at a decreasing rate.