Kognitio examples
Basket Analysis | Basket Analysis |
|
Querying the databaseThe Kognitio WX2 database is super-scalable, enabling it to query the buying behavior from 1,000 stores with 16,000 products over 52 weeks -- 11 billion rows of data -- with only a two or three second delay. On a smaller scale, the following example was run on a typical desktop computer with a reduced scope of just 25 stores over a 23 week period (21 million rows) in only a few minutes. the complete nextanalytics script along with the database query is included at the bottom of the page The process starts with the query itself, included in full within the nextanalytics script for clarity. Usually, the query would be stored in a separate file where it can be developed, tested and maintained by a database expert, freeing the business analyst from having to deal with any intricacies of the SQL command syntax. Supplying external prompt valuesThe script includes the use of a variable, or 'prompt' to define the product of interest - in this case, products containing 'after eight'. Because nextanalytics supports external prompt values, it would be easy to design a customized user interface that allows the user to choose their own value for this prompt and supply it to the nextanalytics script. Since nextanalytics comes with an Open Source user interface, customization is a simple matter. For an example on how to do this, click here. Contents of the basketThe query returns a list of products that were purchased in the same transaction as 'after eight' -- i.e. they were in the same shopping basket and purchased by the same person. The WEEK column is a number representing the calendar week of the year. The ITEMS_SOLD column is the total number of these companion items sold with 'after eight's for each week.
Pivot (swap) the sales week dimension to the columnsUsing nextanalytics, we create a crosstab of query result. See the example of a Simple pivot operation to get a better explanation of the challenges with converting SQL queries to a crosstab and how nextanalytics makes it easy. The page that results from the operation shows the how many items were sold in each week that were in baskets that included 'after eight' in them.
Total and sortThe next step in the example adds a summary column; in this case a Sum of the values across the row. As an alternative (or even additionally) to Sum, nextanalytics could just as easily have calculated the average, median or maximum. After adding the summary column, it's useful to sort to see the most commonly purchased items (that had 'after eight' in the basket). By looking at the crosstab, we can see the buying behaviour over the course of the weeks from the query. There are many analytic operations that could be employed on this including but not limited to normalization, standard deviations (showing the outliers), coloring by cell value (revealing patterns), and many more. If you incorporated a Data Visualization package, it would be interesting to view this data plotted with a multi-line chart that showed the top products.
Looking at the report, we see plenty of counts of 1 and 2 items in each week with the occasional 3 and 5 items appearing. You might wonder, for each product, how many times only 1 was purchased with 'after eight' each week, how many times 2 were purchased, and so on. This would indicate whether the sales were steady or fluctuated frequently. This is the kind of information that would be good to know when correlating with a specific marketing activity such as an advertisement or a point of purchase display. Getting the count of values in each rowWith nextanalytics, this kind of 'frequency distribution' analysis is easy. Script lines 31 and 32 take care of answering this question. The page below shows how many times each number appeared in each row. For the Chubby Rabbit&Chicken (the first product in the list), there were 7 weeks when only 1 item was sold with 'after eight', 5 weeks where 2 were sold, and 2 weeks where 3 were sold.
Analytical reports should be self-explanatory so nextanalytics provides an easy way to add the context for the end user. With the nextanalytics script language, you can change column headings, add a page title and subtitle, and insert the Total column from another page. Now the meaning of the report is more clear, and one can see whether the top selling companion products were steady, low volume inclusions or whether they were more 'lumpy' in their distribution.
You have seen how easily nextanalytics can take an initial query of many millions of records to answer just a couple of questions about buying behaviour. You have also seen how the answers of one question can raise others, and how quickly and easily nextanalytics can reveal answers. This is adhoc analysis in its best possible form: minimal complexity to get the answers you seek. Summary and complete script listingThe following is the entire script required to produce the foregoing analytics.
Complete script listing
|
| Next > |
|---|
The Kognitio WX2 database can perform complex queries against a very large dataset and return the results very quickly. When coupled with nextanalytics, the query result can be further processed and viewed in a wide variety of customizable and useful ways.
These examples demonstrate how nextanalytics can leverage the power of the Kognitio WX2 database to deliver a complete high speed analytical solution. These examples are provided as an introduction to some of the things that can be done, even when working from a very large dataset (e.g. 11 billion records). Everything you see is extensible and customizable.
Visit Kognitio's web site for more information.