Home arrow Examples arrow Benford analysis the easy way
Benford analysis the easy way
As interest in fraud detection grows, the use of Benford analysis is growing in the audit community. This technique relies on comparing the distribution of the leftmost digit in a set of numbers to a theoretical distribution. This is something nextanalytics does with ease - no spreadsheet forumlas to copy or break, adapting to new data sets with almost no effort.

As reference and background, we have used a dataset provided by the Journal of Accountancy in an article written by Lehman, Weidenmeir Watson and Jones.

Assuming we start with their source data (saved to a CSV file) and have to accept it as-is, which is typical in an audit situation, our first task is to import the data we want in the format we need.

Transaction,Date,Employee,Amount,Manager
1,01/01/2006,1981,2.46,
2,01/01/2006,1947,54.34,
3,01/01/2006,1947,12.84,
4,01/01/2006,1033,31.86,

A simple ImportData command brings in the first 'Transaction' column from the CSV file as numeric data, ignores the date because we won't use it, grabs the 'Employee' column as the row label, the 'Amount' as a text column, and ignores the last 'Manager' column. 

importdata,SuperBenford.csv,0,1,numeric_,ignore_,rowlabel_,text_,ignore_

Employee Amount Transaction
1981 2.46 1
1947 54.34 2
1947 12.84 3
1033 31.86 4
8136 17.68 5
1337 80.80 6
1985 350.25 7
1619 20.30 8
1981 8.51 9
1947 25.10 10

Once we have the data, we can filter to just the rows where the Amount starts with '1', then get a collapsed list of Employee numbers and a count. 

selectbycaptions,KeepBySel,Row,0,False,-1,False,,True,StartsWith,1,1
combinesimilarbyindex,-1,Sum

Employee Count Transaction
1033 1,370 39,148,765
1337 1,347 37,876,936
1619 1,414 39,200,048
1947 2,035 56,999,933
1952 1,663 48,389,110
1981 1,444 40,872,509
1985 1,992 55,278,885
4311 1,693 48,044,769
8004 2,249 64,520,338
8136 1,579 43,246,663

Repeating for the digits 2 through 9, we are left with nine different pages, one for each digit, which we join together with a single command.

ConcatenatePagesWith,Digit,StartsWith

Employee From Count
1033 1 1,370
1337 1 1,347
1619 1 1,414
1947 1 2,035
1952 1 1,663
1981 1 1,444
1985 1 1,992
4311 1 1,693
8004 1 2,249

We then perform a simple pivot to get the digits across the columns. This gives us a table profiling each employee and the number of times they had amounts starting with each digit.

SwapTextColumnWithColumn,From

Employee 1 2 3 4 5 6 7 8 9
1033 1,370 807 571 442 364 306 265 234 210
1337 1,347 793 557 422 353 302 264 231 206
1619 1,414 837 594 456 377 317 275 243 218
1947 2,035 1,197 844 658 537 455 393 351 309
1952 1,663 983 692 538 440 369 321 284 253
1981 1,444 859 608 487 378 321 273 245 218
1985 1,992 1,172 829 644 530 444 385 337 305
4311 1,693 992 705 540 445 378 329 287 260
8004 2,249 1,315 938 728 592 503 431 384 343
8136 1,579 927 654 572 414 353 305 271 245

Converting each row into a ratio of the row total, we have the proportion of each employee's amounts that started with each digit.

Compare,ToAxis,Ratio,Sum,Row,,,-1,-1,0,0,False,False,30,

These results can now be compared to the theoretical or ideal value for each digit, loaded as a separate ('Target') page. This comparison can be expressed as percentage of the target, where 100 is a perfect match.   

Compare,ToARow,PercentOf,Actual,Row,Target,,Target,-1,0,0,True,False,1,

With nextanalytics, we can also easily create arbitrary alert levels, such as plus or minus 2.5 or 5 percent. Simple steps that make the headlines stand out on the page.

Employee 1 2 3 4 5 6 7 8 9
1033 100 100 100 100 101 100 100 100 100
1337 100 101 100 97 100 101 102 101 101
1619 99 100 100 99 101 100 100 100 101
1947 100 100 100 100 100 100 100 101 100
1952 100 101 100 100 100 99 100 100 100
1981 99 101 101 104 99 99 97 99 99
1985 100 100 100 100 101 100 100 99 100
4311 100 100 100 99 100 100 101 100 101
8004 100 100 100 100 100 100 99 100 100
8136 99 99 98 111 98 99 99 100 101

That is all there is.  Got another dataset, simply change one line to import the data - the rest of the project runs as is.

 

 
< Prev   Next >