Statistical Analysis of Transport Data

Transport is one of the economic sectors that produce the largest volumes of automatically collected and often spatially explicit data. These include records on the movement of passengers, goods and vehicles, as well as information on ticket purchases, the condition of the transport network, traffic counts, financial transactions, and a wide range of performance indicators.

This abundance of data gives rise to confusion for at least two reasons. First, while transport operators and agencies possess vast amounts of data, they are often uncertain how to use it effectively. At the same time, the costs of data storage and compliance with privacy regulations are increasing.

Second, as I often emphasise on this website, transport is an inherently interdisciplinary field. Economists, engineers and social scientists approach transport data with very different traditions and methodological toolkits. It is essential that both researchers and practitioners recognise the importance of addressing statistical uncertainty appropriately and of distinguishing causal relationships from mere correlations. These principles are well understood in some parts of the community, but remain largely unfamiliar or overlooked in others.

Most of the research projects I work on involve statistics and data analysis to some extent. They range from purely empirical studies to predominantly theoretical work in which we use randomised quantitative methods to assess model sensitivity. Below, the relevant research outputs are grouped into four categories.