Home arrow Examples arrow Iterative sorting, filtering and more
Iterative sorting, filtering and more

population_thumb.pngWhen dealing with analytics, there are times that a combination of filtering, sorting and calculations are wanted. Because of the limitations and complexity of many business reporting solutions, it has become common practice to extract the data and manually finish the analysis in a spreadsheet like Microsoft Excel. That extra effort is not required with nextanalytics -- it is a simple matter to perform almost any combination of filtering, sorting and common calculations like growth or net change.

 

In this example, we'll extract data from a 540,000-row database of U.S. city populations over a 44-year period, looking for the cities that have grown over the previous 20 years, and with populations over 1 million people, in states with more than 5 such cities. None of these criteria are complex concepts, and many business analytics scenarios have similar combinations of filtering, growth calculations, and sorting but many reporting solutions are incapable of performing the sequential operations required. With nextanalytics, we generate a simple script to solve a simple problem. 

The entire nextanalytics script, complete with database query, is included at the bottom of the page

Querying the database

From a 540,000-record database of U.S. city population statistics, we extract the state and city populations for each year available. Note that the year column is only two digits, and for 2000 through 2004, a simple SQL query returns single digits. While we could write a longer SQL query to reformat the years to 4-digits, we can also perform this type of change within nextanalytics and reduce the need for skilled SQL coding.

population1.png

Swap the year to the columns to make a crosstab

With one command (script line 6), we swap the text column "year" to the columns. This is effectively a "pivot operation" because it creates a crosstab that has a column for every unique value contained in the year column. Please take note that this crosstab was created without a cube and there was no need to predefine metadata or or even know what the values were in the original column.

Script lines 9 through 17 convert the column captions into 4-digit dates, ensure they are in the right order, and set a page title.

population2.png

Filter 1 million plus and calculate 20-year growth

In one script command (line 20), we filter the data to keep only the cities with populations over 1 million in 2004. To calculate the 20-year growth, we compare each cell to one 20-columns to the left (line 24) and express the result as a percentage. As you can see by looking at the number of parameters in these two script lines, there are plenty of other filtering and comparison options available within nextanalytics, but have no fear -- our open source authoring tool porvides a wizard-like interface to make sense of it all.

We give this page a title because we will come back to it. All intermediate steps within nextanalytics are kept as 'pages' in memory, and can be referenced freely from anywhere else in the script sequence.

population3.png 

Large, growing cities by state

Another quick filter operation to keep only growing cities (with growth numbers above 100 percent), we then combine similar rows by state to get a count of the number of cities and, in this case, an average growth figure (line 33).

population4.png

States with the most large growing cities

One more filter operation to keep the counts of 5 or higher, and we have the states with the most large (>1 million population) growing (compared with 20-years ago) cities. To find out what those cities are, we go back to a previous page 

population5.png 

Large growing cities in the states with the most

To go back to a previous page, we simple reference the page title or caption (line 40). Then we perform another type of filtering operation (line 41) - filtering by the rows of another page, in this case, the Winning States page we just created. We keep only those rows that start with one of the states in the Winning States list.

To make the results a little more useful, we apply some alert colors to the final page, showing red/green for numbers below/above the average of the entire page, showing the larger and smaller numbers on the page and making a few patterns more evident.

population.png

Summary and complete script listing

As you have seen here, it is not difficult to perform successive analytics operations with nextanalytics. The scenario here occurs daily within business - filtering sales growth figures, looking for shrinking manufacturing volumes, calculating the month-over-month expenses for cost centers that are over budget. These analytic concepts are not complicated, but typical SQL query and OLAP reporting tools make it difficult to get the information. Spreadsheets are frequently used, but they are labor-intensive and error-prone.

With nextanalytics, sequential analytic steps are easily handled in simple scripts like this one.

Complete script listing
  1. prompt,add,[CONNECTION],"Driver={PostgreSQL ANSI};Server=postgres;Database=population;UID=enterprisedb;PWD=password;"
  2. query,command,"SELECT state as rowlabel_state, city as text_city, year as text_year,sum(pop) as numeric_pop FROM population GROUP BY 1,2,3"
  3. query,runsave,ODBC,[CONNECTION],stateCityYearPop.nod
  4. ; pivot the date to the columns to make a crosstab
  5. swapTextColumnWithColumn,year
  6. ; rename the column captions to 4-digit dates
  7. changeLabelCaption,false,Column,*,19*
  8. changeLabelCaption,false,Column,190,2000
  9. changeLabelCaption,false,Column,191,2001
  10. changeLabelCaption,false,Column,192,2002
  11. changeLabelCaption,false,Column,193,2003
  12. changeLabelCaption,false,Column,194,2004
  13. ; sort the columns in numeric order
  14. sortColumnLabel,false,false
  15. PageCaption,All city populations by year
  16. ; keep only cities over 1 million in 2004
  17. FilterByValue,RemoveByVal,LessThan,Row,-1,2004,False,0,1000000,True,Actual
  18. PageCaption,All cities over 1 million in 2004
  19. ; keep only cities that were growing in 2004
  20. Compare,ToRelativeCell,PercentOf,Actual,Row,,,-1,-1,0,-20,False,False,Current,
  21. FilterByValue,RemoveByVal,LessThan,Row,-1,2004,False,0,100,True,Actual
  22. selectrange,column,last,10
  23. keepselected
  24. PageCaption,Population growth in cities over 1 million
  25. ; Count the number of cities per state
  26. select,column,Last
  27. KeepSelected
  28. combinesimilarbyindex,-1,Average
  29. ; keep only states with 5 or more cities that meet all the criteria
  30. FilterByValue,RemoveByVal,LessThan,Row,-1,Count,False,0,5,True,Actual
  31. PageCaption,Winning States~
  32. ; go back to the earlier page and filter matching states, then highlight
  33. Select,Page,Population growth in cities over 1 million
  34. selectbycaptions,KeepBySel,Row,-1,True,Winning States,False,,False,ExactMatch,0,
  35. SortRowLabel,False
  36. SetTupleValue,Page,Average,[Average]
  37. ChangeAlertTo,axis,-1,Page,Average,0,False,,-1
  38. PageCaption,Growth of Qualifying Cities in Top States
  39. PageDescription,Twenty-year growth (%) for cities over 1 million and growing in 2004. Green/Red indicates above/below the page average ([Average]).

 

 
< Prev   Next >