Data validity and statistical conformity with Benford's Law
Mario Maggi
2021-01-01
Abstract
Benford’s Law is a statistical regularity observed in many datasets. Assessing whether a large dataset complies with Benford’s Law is a problem of considerable relevance, mainly because of its practical consequences. The task can be approached by introducing a statistical distance between the empirical distribution of the data and the random variable associated with Benford’s Law. This paper addresses the problem of measuring the compliance with Benford’s Law of a random variable, which can be seen as describing the empirical distribution of a collection of data. It proposes a statistical methodology for identifying the critical values for conformity or nonconformity with Benford’s Law under some well-established statistical distances. The approach rests on the selection of a family of parametric random variables (the lognormal distribution, in our case) and of a reference statistical distance (the mean absolute deviation). The results are discussed in light of the existing literature, and some open problems are presented.
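
To make the quantities named in the abstract concrete, the following is a minimal Python sketch, not the paper's own procedure. It assumes that conformity is measured on first significant digits, that Benford’s Law assigns probability log10(1 + 1/d) to digit d = 1, ..., 9, and that the mean absolute deviation (MAD) is the average absolute gap between observed and Benford frequencies over the nine digits. The function names, the seed, the sample size and the lognormal parameters are all illustrative choices.

```python
import math
import random


def benford_probs():
    """First-digit probabilities under Benford's Law: log10(1 + 1/d), d = 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x):
    """Leading significant digit of a positive float."""
    # Scientific notation with 17 significant digits puts the leading
    # digit first (e.g. '3.1400000000000000e-02') without rounding it up.
    return int(f"{x:.16e}"[0])


def first_digit_freqs(data):
    """Relative frequencies of the first digits 1..9 in a sample."""
    counts = [0] * 9
    for x in data:
        counts[first_digit(x) - 1] += 1
    return [c / len(data) for c in counts]


def mad(data):
    """Mean absolute deviation between sample and Benford first-digit frequencies."""
    freqs = first_digit_freqs(data)
    return sum(abs(f - b) for f, b in zip(freqs, benford_probs())) / 9


# Illustrative samples (assumed parameters): a widely dispersed lognormal
# and a tightly concentrated one, both with log-mean 0.
rng = random.Random(0)
wide = [rng.lognormvariate(0.0, 2.0) for _ in range(20000)]
narrow = [rng.lognormvariate(0.0, 0.2) for _ in range(20000)]

mad_wide = mad(wide)
mad_narrow = mad(narrow)
print(f"MAD, sigma = 2.0: {mad_wide:.4f}")
print(f"MAD, sigma = 0.2: {mad_narrow:.4f}")
```

In this sketch the widely dispersed sample should give a much smaller MAD than the concentrated one. This is why a parametric family such as the lognormal is a convenient way to move gradually between conformity and nonconformity. The sketch does not attempt to reproduce the critical values derived in the paper.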