Predictive Analytics and Fraud Detection: Detect, Identify, and Protect

This is an essay I wrote in 2015 for an IBM business partner about their end-to-end Counterfraud software stack and solutions.

Anyone who has been a victim of identity theft knows that fraud carries a heavy cost for everyone it touches. The financial costs are heavy, and for victims the personal costs can be heavy and lasting. When we discuss the numbers involved in fraud, often in the millions of dollars, we can forget the faces of the victims, who often take years to recover from its effects on their lives. Fraud examiners, the people we call Subject Matter Experts (SMEs), build business rules based on what has already occurred, that is, on the after-the-fact examination of crimes committed. With predictive analytics running on powerful enterprise architectures, such as IBM’s predictive analytics stack for Counterfraud or Anti-Money Laundering (AML), which often includes SPSS and i2, we can instead look forward: to detect, identify, and protect against fraud before it occurs.


Fraud affects every industry, including healthcare, retail, insurance, and banking. It can appear as internal theft, cheating, money laundering, ghost employees, or identity theft. In its 2014 Report to the Nations, the Association of Certified Fraud Examiners (ACFE, 2014) estimated that a typical organization loses 5 percent of its annual revenue to fraud, with a median loss of $145,000 per case. Applied worldwide, that is roughly $3.7 trillion ending up in the wrong hands. Companies pass the costs of fraud on to you and me, the consumers, as higher prices.

Modeling fraud differs from other types of predictive analytics because fraud has unique characteristics. Van Vlasselaer (2013) describes five of them. 1) Fraud is uncommon: it occurs in less than 1 percent of cases, which makes it hard to detect. 2) Fraud is well-considered: those who commit it have a plan in place. 3) Fraud is imperceptibly concealed: it is difficult to detect both because it is infrequent and because it is deliberately hidden. 4) Fraud is carefully organized: those who commit it operate in social networks that can be tracked and whose patterns can be predicted. 5) Fraud is time-evolving: its patterns change over time.
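The first characteristic, rarity, has a practical consequence for how a fraud model is judged. The sketch below is a minimal illustration on made-up data, not part of any IBM product: when fraud is well under 1 percent of cases, a model that never flags anything still scores very high accuracy while catching no fraud at all. The data and the function names are assumptions chosen for the example.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means fraud."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of all cases that were labeled correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    """Share of actual fraud cases that were flagged."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical data: 1,000 cases, 5 of them fraudulent (0.5 percent).
y_true = [1] * 5 + [0] * 995

# A "model" that never flags anything.
never_flags = [0] * 1000

print(f"accuracy: {accuracy(y_true, never_flags):.3f}")  # accuracy: 0.995
print(f"recall:   {recall(y_true, never_flags):.3f}")    # recall:   0.000
```

This is why fraud models are usually judged on how much of the actual fraud they catch, and at what false-alarm cost, rather than on overall accuracy.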

To avoid detection, fraudsters purposefully alter their patterns. Those who commit fraud or money laundering are often involved in other criminal activities as well. Adding external data sources, such as Dow Jones, LexisNexis, or even social media, to a company’s internal data is therefore an invaluable way of identifying fraudsters. Subject matter expertise is the essential place to start when modeling fraud. SMEs hold both industry knowledge and knowledge of fraudsters’ past behaviors. They understand the red flags, the business rules, and the profiles of fraudsters (Nisbet, 2009). Statistical algorithms, however, given rich datasets, can predict future behavior.

IBM’s solutions for AML and fraud are both built on IBM’s software stack, which combines SPSS predictive analytics, i2, Cognos Business Intelligence, and Watson cognitive computing. The stack can resolve identities: if a person has gone by several identities, or partial identities, at multiple addresses, these are resolved to one known person. It can also ingest multiple data sources. The algorithms in the predictive software can analyze the data for anomalies, social networks, important predictors, and key phrases or related terms.
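To make the idea of resolving several identities to one known person concrete, here is a minimal sketch. It is not how the IBM software works internally; it is a deliberately simple, assumed approach that links records sharing a normalized phone, email, or address and groups them with a union-find structure. The records, field names, and matching rule are all hypothetical, and a real system would weigh evidence far more carefully (a shared address alone does not prove two records are the same person).

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and keep only letters and digits so trivial variants match."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def resolve(records, fields=("phone", "email", "address")):
    """Group record indices that share any normalized identifier."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group's root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # (field, normalized value) -> first record index
    for idx, rec in enumerate(records):
        for field in fields:
            value = rec.get(field)
            if not value:
                continue
            key = (field, normalize(value))
            if key in first_seen:
                # Merge this record's group with the earlier record's group.
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


# Hypothetical records: three name variants spread over two addresses.
records = [
    {"name": "John Q. Smith", "address": "12 Oak St.", "phone": "555-0100"},
    {"name": "J. Smith", "address": "98 Elm Ave", "phone": "(555) 0100"},
    {"name": "Jon Smyth", "address": "98 Elm Ave.", "email": "js@example.com"},
    {"name": "Mary Jones", "address": "7 Pine Rd", "phone": "555-0199"},
]

print(resolve(records))  # [[0, 1, 2], [3]]
```

Records 0 and 1 are linked by the phone number, and records 1 and 2 by the address, so all three resolve to one person even though no single field matches across all of them.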

In a recent case study, a Midwestern state contracted to search for known financial fraud using IBM’s Social Media Analytics. Data sources included Board Reader (blogs, social media sites, microblogs, Facebook) and Twitter. Several text analytics ontologies were built; an ontology here means a particular way of organizing the terms being searched for on the web. At a high level, the terms included locations in the state where high crime and fraud were known to have occurred, as well as gathering places such as sporting venues, shopping areas, and clubs. Because the fraud was occurring in networks and was imperceptibly concealed (Van Vlasselaer, 2013), meaning that those discussing it on the web were not doing so openly, other methods were needed to discover it.

It was determined that those committing the fraud knew not to enable geolocation for social media on their mobile devices and other sites. However, language is a code and a marker. Just as millennials use Twitter and texting with the idea that their messages are opaque to their parents, or to anyone who does not share their slang, fraudsters do much the same (Swanson, 2008). They use language in unique and changing ways. Language is the indicator, and it changes quickly. To keep up with these changes, algorithms follow the language and suggest the next set of terms to search for. The SMEs’ knowledge was thus the essential starting place. From there, knowledge of text analytics, psychology, linguistics, and predictive analytics supplied the skillsets that “cracked the code” to find the fraud and prevent future fraudulent activity.
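One simple way an algorithm can suggest the next set of terms is by co-occurrence: start from the terms the SMEs already know, find posts that contain them, and rank the other words that keep appearing alongside. The sketch below assumes this approach for illustration only; the case study does not say which method was actually used, and the posts, seed term, and stopword list are invented placeholders.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def suggest_terms(posts, seed_terms, stopwords, top_n=3):
    """Rank non-seed words by how many seed-containing posts they appear in."""
    seeds = set(seed_terms)
    counts = Counter()
    for post in posts:
        tokens = set(tokenize(post))
        if tokens & seeds:  # only look at posts that mention a known term
            counts.update(tokens - seeds - set(stopwords))
    # Sort by count (descending), then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


# Hypothetical posts; "widget" stands in for a term the SMEs already know.
posts = [
    "got fresh widget from the bluebird tonight",
    "bluebird has widget again",
    "anyone seen the bluebird",
    "weather is nice today",
]
seed_terms = {"widget"}
stopwords = {"the", "from", "has", "got", "again"}

print(suggest_terms(posts, seed_terms, stopwords))
# ['bluebird', 'fresh', 'tonight']
```

An analyst would review the suggestions, add the useful ones (here, "bluebird") to the seed list, and repeat, which is how the term list can keep pace as the language shifts.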

References:

ACFE (2014) Report to the Nations on Occupational Fraud and Abuse 2014 Global Study. Association of Certified Fraud Examiners.

Friedman, J., & Combs, G. (1993) Narrative Therapy: The Social Construction of Preferred Realities. W. W. Norton & Company: NY.

Nisbet, R. (2009) “Fraud Detection” in Statistical Analysis & Data Mining Applications, Robert Nisbet, John Elder, & Gary Miner. Elsevier: NY.

Swanson, E. (2008). Generation Text. Thesis for Creighton University.

Van Vlasselaer, V. (Nov 2013) Lecture: Social Network Analysis for Fraud Detection: http://www.kdnuggets.com/2013/11/lecture-social-network-analysis-for-fraud-detection.html retrieved on 3/3/2014.




