Home arrow Blog

nextanalytics blog

A new perspective of business analytics

ward's Blog
Description:
No desc available

My Ramblings on Open Source BI

Posted by: ward in Untagged  on

Seth Grimes published an article about the potential growth of four BI companies. 

“That core software components are free makes open source attractive to users as well as systems integrators and independent software vendors that sell products and services built on open source BI components.” 

Take note: People at least start out expecting products which advertise as Open Source to be free.

The companies he wrote about leverage the words "open source" in their marketing albeit with the word "Commercial" associated with "Open Source". 

I am worried that, to someone who's not "in the know," this distinction is too subtle.  I am thinking of the general press (ie. not BI press) and customers who are new to BI, etc etc. 

They latch onto the words "Open Source," we all know that. They think it's all one and the same, they're talking about True Open Source.

I would worry that might feel as if a bait-and-switch occurred once they learned what's really going on.  To varying degrees, this will eventually detract from the brand and philosophy of "Open Source" (per the OSI definition).  In essence, this is what Roberto Gallopini joked about when he read one of our early days press releases.    BTW, that particular press release was a mistake, some of the wording about Open Source got out of our control and published before we could correct it -- Roberto was quite right to tease us about it :).

Seth said:

 “Commercial open source BI vendors, notably Pentaho and JasperSoft, offer these components in free community editions with open source licenses, and also packaged with non-open source extensions in paid, supported, indemnified editions.
"It's hard to get going without help from the vendor, and anytime I need to upgrade, I get a little bit nervous. (from Seth's article, a quote by Venkat Gaddipati, CTO at online marketplace OnForce)"
“Beyond Compliance harnesses the Palo OLAP Server on the back end and the non-open source Palo Worksheet Server for report distribution.”

To me, these quotes imply that a customer needs closed software to find it useful.   Are these companies really Open Source vendors? I see what I see.   Form your own opinion.

In a separate article by Stephen Swoyer, Vincent Pineau, GM Talend Americas, said:

"We definitely do not feel that open source should equate to free. Yes, we are lower [in price] than our competitors, but we also do not want people to think that cheap in price or cheap in features. We believe that the right features are what people should focus on," Pineau says
At least that's a pretty plain statement, nobody can complain about that.

Ok, then, we have had a project up at Codeplex for a long time now.  It is true open source.  I'd really prefer you pay for the commercial one (at this site).  Can we now call ourselves Commercial Open Source? Is that now an acceptable term? 

Premise-Promise of Open Source

Seth said “For most open source BI adopters, however, the solution search starts with cost.” 

Folks, from my experience, people seeking Open Source want a FREE, totally free, experience.  They want to pay nothing. Period.

This isn't the case with Commercial Open Source vendors. Customers end up paying $30,000.  And they might spend a month or two of expensive IT time trying to get it working before they resort to these Commercial Open Source vendors. 

Remember, this is something they originally thought would get free.  $30,000+ and several months of work is not free.  One might even argue that "Commercial Open Source" is not in-expensive.

Sometimes it's better to pay and get documentation, support, and quality control. IMO, you will end up spending less.

Our own philosophy is that the stuff you need to be open-source, is. 

For example, our User Interface and programmatic interfaces are all Open Source. Supported, customizable Open Source.

Therefore, one shouldn't divide the BI world into two categories.  There are more than the four "commercial open source vendors" and the monolithic mainstream BI vendors who were mentioned. 

In fact, there are plenty of other vendors who can fulfill BI needs very satisfactorily for less cost than Pentaho and Jaspersoft.

Those are good companies. I know some of the founders and they are great people. I can't quite comment on their product quality since Open Source by its nature can easily have random bugs injected and no one is accountable.  

My Point is, if you're shopping for Commercial Open Source, then also open your minds to other vendors too.  The TCO will be about the same and you might have a better experience. Not everyone charges as much as the big guys, although, I wish :)  ...


Cindy Howson on cool things happening in BI

Posted by: ward in Untagged  on

Cindi Howson wrote an article on November 10, 2008  for Intelligent Enterprise
I found the article very informative.  She wrote about a number of things. I took an interest in "In-memory analytics" and "Advanced Visualization".  Following are my comments on the topics she wrote about. 


In-memory analytics

:  One of the potential benefits of in-memory is that it replaces the expensive & time-consuming aspects of reporting databases or cubes. 

When a user has "new" business question, underlying data is created on-the-fly with no IT overhead either to design it, or to maintain it.


Advanced Visualization

:  Most "visualization" technology gets its data directly from a database or cube and suffers the limitations of query technology (and not using a business tier in a 3 tier model): 

* single-pass queries, such as SQL;
* single data source;
* complex SQL & MDX being the gating factor in answering business questions;

Due visualization being based on query technology, the data you're going to discover just isn't going to be that interesting.


Here's my solution

What these products need is a "business layer" sitting on top of the query results. 

Advanced metrics and kpis can be developed in a business layer, without much impact on the database.

If it can be created in-memory, so much the better.

If visualization products can plot it, even better.

Our product acts is an in-memory business layer and can supply data to any BI or visualization tool. We give them useful numbers to display and explore visually.

On top of that, per your previous page in this article, it is highly desirable that the business user can experiment with and author their own metrics and KPIs.  Again, we've solved that.  Come take a look!

 




First steps in developing KPIs & Metrics

Posted by: ward in Untagged  on

Craig Schiff wrote an article at http://www.intelligententerprise.com/showArticle.jhtml?articleID=51201364 some time ago.  Following are some salient points he made:

"Without steady involvement by the business side, IT's dashboard projects are doomed to failure."
"IT thinks in terms of improving access to data, whereas finance focuses on improving its processes (particularly the painful ones)."
"What's most important is the content: that is, what's being measured and displayed on the dashboard."
"confront one of the toughest challenges: How do you get to that short list of agreed-upon key measures?"
"Once you've settled the corporate strategy, chosen the metrics development team, and put a facilitator in place, you can get down to business. The team needs to work down from the strategy to determine the key business drivers."
"the organization may not have the underlying data necessary to perform the calculations to support the KPI. What do you do then? Throw the KPI out? Wrong answer. What most companies do is manually enter the data for the KPI."
"KPIs should offer a good mix of financial as well as operational measures."

I think I've represented some of his key points accurately.  It's enough to tell me (and you) that Craig and I think alike.   His article was written in 2006 and following are my comments on what I think has evolved since then. 

To begin, I've think it's necessary to cut down the reliance on IT in providing the metrics and generating the KPIs because:

  1. Managers like to generate their own information whether it be KPIs or metrics. The key evidence of this is the popularity of Excel. They need this to be able to react to fast changing business needs.  Routing new business questions and performance metrics and indicators through IT is too slow.

  2. He suggested that, if a company isn't tracking the data, entering it by hand is risky and an auditable process won't be likely.  For me, this would be a least preferable approach, but at the time it was written, it was a decent & practical one.

    Instead, I believe there's a very good chance the data is there in the company, it's just that conventional BI tools are limited to what they can do. 

    If you're using a cube, then the data has to be in the one cube or you have to be a grandmaster in MDX to get what you want. If you're using a reporting tool, then the data has to be available from a single SQL query. Relying on this kind of technology and its limitations is just plain in-adequate now that better choices are avaialble. Let's for the moment ignore the fact that they charge way too much money and the IT overhead is a trememendous financial burden.
  3. Users, not IT, need a way to read in data from multiple heterogenous data sources into the same report.

My work at nextanalytics has been addressing this evolving requirement.   I think we've made great gains and will continue to do so in the future.

 













How to Improve Your Scorecards and Dashboards

Posted by: ward in Untagged  on

Most people don't realize just how much work there is in populating the data for a dashboard or scorecard.  Let’s make a list of some of the things that contribute to the problem.

The Query.  Getting the data

To get useful data, SQL is too difficult and expertise is scarce and expensive.  Moreover, SQL can’t do all the things you need. Often, the solution requires extra tables which are usually undesirable.  MDX & Cubes are an alternative, but they are expensive to design, create and maintain especially if the needs are likely to change.

Once the query has been submitted, too much data returns.  The ODBC layer or DataSet Object uses too much memory and causes failures at the presentation layer.

Putting The Data Into The Chart

Now that you have the data, there's too many labels, and large ones affect how the legends and chart labels appear. It is way too easy to get a chart that looks horrible.  Visualizations fail on large number of rows and columns. Overwriting risers, lines, symbols are all quite common.

Most chart package support some  local data manipulation and calculation operations, but they are not integrated with the query, causing a design & implementation burden for the presentation layer developer.

Crosstabs & Datagrid Objects

Just like with charts, too many and large labels take up too much screen real-estate. There’s no room for numbers and you can’t visualize anything about your data. Too much scrolling.

As with chart packages, local math and sort operations are not integrated with the query, and it’s difficult to save the user’s actions to the same state the next time they open the view.

UI Controls  (Combos, listboxes, etc )

Plainly said, these fail given a normal amount of production data.  UI developers resort to hard-coding but slowly changing data (in the business) makes the dashboard a maintenance nightmare.  Metadata mitigates this, but at a great overhead expense .

Three Part Solution

Reduce the volume, provide subsets as a result of analytical workflows, not just arbitrary hierarchy like OLAP provides.  From the user interface, let the users choose an analyzed subset to look at with their scorecard or dashboard.  Alternatively, use analytics to aggregate lower level detail into upper level views but avoid the overhead of a cube or temporary database tables.

Use analytic workflows to improve the content. Pre-calculate patterns, trends, outliers, comparisons, and whatever else suits your business. 

Provide your users with valuable "already analyzed" numbers.  If you do this, you don’t need to chart as much data, and the data being shown is more useful than raw “database data”.  

Avoid the trap of encouraging them to export to Excel, spreadsheets are an entity that requires auditability per Sarbanes-Oxley. Don't get yourself into a legal mess.

Improve your charts and crosstabs by using programmatic techniques to perform text and numeric substitutions.  This makes your visualizations more readable.

Three easy steps. That's all it takes to make scorecards & dashboards more useful.   

Nextanalytics makes these steps easy.  It offers additional processing on query results and transparently provides the analyzed data to presentation layers.  A simple, elegant solution that is fast and easy to implement.

 


Questions to ask before deciding to use a cube for analytics

Posted by: ward in Untagged  on

Is a cube  what you need?

Bill Inmon (aka: the father of the data warehouse) is a world famous expert in data warehousing and a big advocate of cubes, which he calls “Dimensional Modeling”.   Inmon wrote an article published under the title "Issues with Dimensional Modeling" (http://www.dmreview.com/issues/20000501/2184-1.html). 

Despite being an advocate of cubes, he explains at least ten root cause for problems that occur when cubes are in-appropriately.  I’m not allowed to quote him due to a restrictive copyright statement at the bottom of that URL. However, a paraphrased subset of his view seems to be:

  • dimensional modeling fits where the requirements are known before the infrastructure is built.
  • dimensional modeling is not a good candidate for “exploration warehouses.”
  • dimensional modellers have crafted a fine hammer. Now everything looks like a nail to them.

Basically, one cube won’t answer the questions for all people. If you try to deliver a system that does that, you will fail. 

He also says you must take the time to know the specific needs of the user before you start.  Implicitly, if you don’t know what reports people will want, then cubes aren’t the way to go.

To me, this means a cube doesn’t sound right for analysis, at least not my definition of it.  My definition of an analytical workflow is that you ask a question, get the answer and then either revise the question or move on to another one.

In “A Dimensional Modeling Manifesto,” http://www.dbmsmag.com/9708d15.html, Ralph Kimball discusses problems that arise from building fact tables that start at too high a level.  He advocates working at the transaction level.  He states that, if a fact table were designed at a higher level, then it would be difficult to find and maintain the correct dimensional members, the attributes, and facts for the fact table. 

Unfortunately, computing resources and operating systems were relatively less powerful back then (circa 90 to 97) and for that and many other reasons, packing fact tables full of transaction level detail was rarely done.   The cube sizes just got too big when designers followed that mantra. In addition to data volumes being too big for memory, probably the most evident problem were the OLAP Clients and Reporting tools.  They have memory and GUI limitations, so they could not expose a large number of dimensions and fact table data.  Thus, it is necessary to simplify the data, to aggregate multiple physical entities into a single abstract entity.  Summarized data is better than nothing.

Kimball's recommendation of working at the transaction level is a key point that should be re-visited with more modern tools than the current round of OLAP cube clients, ones that don't have the limitations in the GUI or their crude visualization.

So, as volume creeps up, what happens? Who hasn't loaded up production data only to find the tool just crawls along, doesn’t work, or looks terrible.  Is the problem that there's too much data and metadata for GUI tools to render properly? Or is it the tool?  If you have a great hammer, does everything have to look like a nail?

  • With too much metadata, GUI controls operate slowly and with errors, users can’t find the members they are looking for, and the labels get too long.
  • With too much data and metadata, visualizations get cluttered because labels overwrite and there are too many datapoints, colors get re-used, and you can’t discern individual points.

If the size of the data and metadata is going to be a problem, it must be reduced to an acceptable level. Most cube designers therefore have no choice but to aggregate transaction level values to a single cube member.  As-is, this is fine for visualization of a high level view of your business, but it doesn’t necessarily lead to useful or correct analysis of your business.

Life to date historical information, the Root Cause

Cubes will usually contain life to date information.  For example, a fact table must contain every product that’s ever been sold and the product dimension must contain all products ever sold.  If a company is large enough to afford BI, it’s not uncommon to see dimensions with levels that have 100,000 to 500,000 (often more) members.

It's also a good point to mention another intrinsic problem that happens: Slowly Changing Dimensions.  Wiki documents various techniques for dealing with the gradual degradation of cubes that they cause. There are no good answers, just workarounds. Bottom line: In a production world, cubes become larger, and slower.  If you keep the focus of the reports they produce narrow, the effect is less. The more reports you try to produce, the worse it gets.  http://en.wikipedia.org/wiki/Dimensional_database

The Downstream Effect of the Life to Date and Slowly Changing dimensions

The query design parts of the tools don’t have any awareness of the age of the data. Even products that haven’t been sold in ten years show up alongside those that were sold this week. The problem is: GUI controls tends to freeze up when asked to fetch more than 30,000 members.

To help alleviate this, parent child relationships are established. Many potential analytic operations become difficult once members are no longer siblings which represents a challenge either in cube design or necessitates the needs for some very complex cube queries.

If a user is forced to work within an arbitrary parent-child relationship then it’s difficult to find members.  To mitigate this, to make these more self-documenting, many report designers display the hierarchy as part of the name. We have seen 250 character product names where only the trailing few characters are different.  Think for a moment how that appears in a drop down list box or a crosstab or a the legend of a chart.

The visualization experience suffers, because long labels tend to cause overwriting in the UI (notably the charts) and the extra characters reduces the amount of meaningful data on the screen (e.g. extra indenting is required, more text is shown rather than numbers).

This, and for other reasons due to relative superiority of Excel in filtering and calculations, most cube users open a crosstab and immediately export to Microsoft Excel.   Building a cube to feed Excel sounds like a lot of overhead.  If they’re going to end up in Excel, what does that say about the investment you put into the cube and it’s dimensional approach?

Because of aggregation, it’s very easy to completely miss important facts when working with cube data

The business analyst is expected to recognize important changes in data. Let’s consider some scenarios that might reasonably arise in a production environment. Start by assuming you have a member labelled “soda”. Soda is a summary of at least three distinct inventory items. “cola in bottle 10 oz english label,  orange in can 10 oz english label,  orange in can 10 oz spanish label” all get rolled into “3 soda”.

Here’s what might happen, and you’d never even know it:

  1. Suppose that sales of cans takes market share from the bottle packaging.  The analyst will never notice that the shift in purchasing behaviour .  To track cans and bottles separately after the cube is designed and being built is a complicated task and will take some time. They will be chasing the trend long after its happened.
  2. Suppose you had a special promotion with inventory from a new supplier whereby you sold 1000 extra colas but refunded 500. In a cube, you’d see positive change of 500 and you’d report a positive new trend, the organization would order more supplies but what really should be reporting is an issue relating to a new supplier.
  3. Suppose that you sold a 10,000 extra colas and correspondingly fewer orange. That’s an important shift in market share, you’d want to know about it, but through aggregation, you wouldn’t know about it. 

Because of the aggregation of individual transactions to a single “soda”, changes within soda are hidden. An analyst would reasonably conclude “Sales of Soda are basically unchanged” but we know they are anything but unchanged:  

  1. There’s a shift to cans,
  2. Coke market share rose, Sprite dropped, and
  3. you’re refunding a large portion of what you sell because of a new supplier.

The point is: The culprit in the wrong conclusion was the aggregation to a single member. If you had had the foresight and a client tool that could handle the volume, you could've seen those separate transactions as separate members.  Cube technology being what it is, it's not easy to go back and re-engineer the incoming data to fix it after the fact. 

The lesson learned is that aggregation hides the important data but aggregation is a necessary aspect of using cubes. Important anomalies, patterns, and trends that occur at the shop floor level will not be seen by most cube users due to aggregation.     

The OLAP cube vendor will tell you start your analysis by looking from the top down and drilling to detail when you see important values.   This story sounds reasonable, it suggests you can see important business trends after which you can drill to detail relational records.  The problem is this: Those scenarios posed above would not have been flagged by a cube user. There'd be no drill thru to detail because the user would not have known the problem was occurring.

Other points can be made that detract from the strategy of exposing analytics through summarized data:

  • People cannot reliably and consistently see that a pie slice or bar riser in one period is smaller or larger than another period, unless it is extreme.
  • aggregation will often mask changes, whereby the sum of child values net to essentially the same value, even though individually they have experienced dramatic change
  • the sheer difference in the magnitude of a parent’s value can hide the significant changes that occur one of it’s child’s values. (e.g. would a 20% change in Albany change the sales total for New York state?)
  • conclusions made by observing superficial members are not practical and actionable.  What’s the point of the business analyst extending trend lines on summary data when it has known issues in the deriving the final values?
Due to aggregation, detail is rarely in a cube.   It’s only in the detail that will you see real-world outliers, patterns, trends, and anomalies as they happen, not in the summarized data.

Don't Ignore License Cost and Installation 

If you are still considering doing analytics with an OLAP Cube, you will need the cube server license and then you will need to install it. Don’t under-estimate the cost of the license.

If you are OEM or ISV, you will end up with another licensee in your installation and their will erode your profit, year after year.

Also, don't underestimate the magnitude of effort associated with installing their software. Even just installing Microsoft SSAS (from scratch) can take days -  try it!

In my opinion...

The only reasonable time to use a cube is if you already have it built and it has the calculations you want already in it.  But you are still exposed to the risk of not seeing the issues that arise at the layer below your fact table or the level of summarization that you're looking at, so beware.

 


Rapidly Changing Query Environments

Posted by: ward in Untagged  on

For small and medium size businesses, SMB, cubes are too burdensome. 

Mainly, there are too many skill sets required and the risk of being dependent on highly specialized people is too high a business risk and cost.   Even with the expertise, cubes are slow to design and build and the manufacturing overhead is large. 

For example, it can easily take a day just to install the OLAP software, then it can take days to design the dimensional model.  Then building the production processes to populate the cube on an ongoing basis can be a significant risk and expense.  Finally, once you have a cube being built, you need an MDX expert, and these people aren't very common and, after all these years that the industry has had to learn MDX, that says something about how difficult it is. 

Even with successful query results coming back from a cube, you have to figure how to render the results.  There aren't very many good OLAP clients that don't cost a lot, and the simple ones are probably not what you need for a dashboard or scorecard. 

If you're a solution provider or application developer deploying cubes for a living, then you will also have noticed that it's difficult to design and build cube related solutions remotely, so travel costs are higher than they need be, all detracting from the profitability of the solution.  

In short, I am suggesting that cube related costs typically overwhelm the cost-beneifit of most applications and many business needs are simply deferred or reduced in scope.

All in all, cubes involve quite a bit of effort and time.  If your original business questions change dramatically, it's all wasted. You may not need the cube anymore, but you may not able to drop the cube either, so your disposable IT budget is permanently eroded.

The net benefit of cubes are an almost factorial combinations of crosstabs.  But, in the same consideration, much if not most business information is now being delivered in a scorecard or dashboard.  Since a high degree of interactivity and user education is mutually exclusive with the goals of a dashboard or scorecard, I really don't see a benefit to having access to such a large number of crosstabs.  

In short, the benefit (of having lots of crosstabs) over cost of a cube (design, implementation and maintenance) isn't there.

 


Great Alternatives to Cubes

Posted by: ward in Untagged  on

OLAP isn't going away, but it is time to stop equating the word OLAP with cubes.  OLAP means high performance, optimized for fast retrieval. There's an increasing number of alternatives to delivering OLAP.  Search for keywords such as  BI appliances, cloud hosted BI, column oriented databases, in-memory analytics.  You'll find plenty of good quality products including in our partner list.

Most of them are less complex and less expensive than cubes yet they deliver on original promise of OLAP but rarely use cube technology.  How is this possible?  Because of the evolution to low cost memory and high speed large volumes of disk space, the data processing algorithms were able to improve.

It's time to stop equating the word Cube with the word OLAP. OLAP is alive and well, but cubes are not the only way to deliver OLAP. 

nextanalytics delivers high performance, better information, and is less expensive to implement.  It's better and more flexible than cube technology.


Can SMB afford Open Source BI or Traditional BI?

Posted by: ward in Untagged  on

I was looking for statistics on BI in the Small and Medium Sized (SMB) market today. It’s surprisingly hard to find material. This blog documents some of my findings and makes a few comments.

First, a Definition of SMB. Techtarget suggests (http://searchcio-midmarket.techtarget.com/sDefinition/0,,sid183_gci1005201,00.html) anyone with less than 1,000 employees is medium sized or smaller. 

Based on my anecdotal experience, one could assume that if you have 1,000 employees, you could make $100M a year in revenue and 100 employees could make $10M in revenue.

If you have $100M in revenue, it’s likely that your annual IT budget is about $3M.  $500k of that might be BI, probably less, but not all of that is money spent on BI software itself, you also have to have your IT staff and the equipment.   I read that somewhere but I can’t find the URL anymore. I think it might have been something O'Reilly wrote (If I find it, I'll add it as a comment).   

If you really wanted to see the benefit of general purpose BI, then perhaps a 10% of your users would need BI.  The rest could probably read the static copies of the report in a web browser or spreadsheet, so I'll ignore those in this discussion.

Can you afford BI?

In 2008, the price for end user BI has been driven down.

Let’s consider a product recently released by SAP/Business Objects.  CrystalReports 2008 costs $495 for the entry level, $895 for the ability to create, and $995 for the ability to integrate with Xcelsius.  If someone is outfitting 100 users, they are going to spend $50,000 just for the software.  I don’t know if that includes all the BI infrastructure, somehow I think not.  I am going to look into that and might report my findings in a separate blog entry (this blog entry is supposed to be about money).  We don’t know what the fees are to install and run it.

I don't think Cognos is so public about their pricing. If I can find a reference, I'll add it as a comment to this (or maybe someone else can).

We can estimate the support and maintenance costs by looking at what Open Source vendors are charging.   They market their products as Open Source, so theoretically the software being referred to in the following quote is free: “JasperSoft charges $25,000 for a year's worth of licensing and support, according to Gentile, while Pentaho's Open BI Suite starts at $30,000 per year. Actuate's Business Report Studio goes for $695 per user per year.” 
(http://searchdatamanagement.techtarget.com/news/article/0,289142,sid91_gci1307929,00.html)

This means it’s probably a good general purpose number to add to the cost of a Business Objects (BOBJ) or Cognos (COGN) estimate if you wanted to guess at the labour and support costs. It's ignoring the differences in complexity, but at the end of the day, it's probably similar complexity to install CrystalReports 2008 versus Pentaho or Jaspersoft.

In conclusion, to get the software in and installed, it looks like you’re going to spend $100,000 for 100 users, in other words, $1,000 per user. 

Software Licenses + Support and Installation is not the Whole Cost !!! 

But, remember, getting the software is just the start!!!  Per my recent blog: http://www.nextanalytics.com/MyBlog/MyBlog/Can-a-business-intelligence-product-be-used-to-answer-analytic-questions.html, each time you build a report or analytic, you will have to face upfront and ongoing IT costs.  These are the costs that will drive your cost of BI up each time you try to build a report or analytic.

The goal for at least SMB is to reduce the cost of each new report or analytic, especially if it is not one you need to run every day or if it is likely to change in the foreseeable future. 

 

To deliver something in BI,  one has to write a SQL query or create a cube.    But doing this is actually a two step process.  

1.       you must find the data and the relationships between tables, if they exist. 

2.       the data usually needs filtering, grouping, and calculations and other things that change it from a list of data, to something that you make decisions with.

Maybe the SQL or MDX can do it, but more than likely, it can’t.  If it can’t, then you must design intermediate tables and views and run a series of processes to populate them.   If you’re going for a cube, a cube needs to be built or an existing one modified, and this might even be on top of those intermediate tables and views.  

The costs of doing all this can be significant, but that’s just a start.  The IT deparment will never know when it can drop support for these tables, views, and cubes so it’s probably going to be a permanent investment.

That’s the crux of the problem:  The nature of analytics often means that the need evolves.  It’s either because the analytics identified the problem or the analytic led to other questions.  By their nature, analytics are iterative and sequential. For example:

  • I want to see the fastest growing products but it's more complicated than just telling me one measurement.
  • for each region, I want to see which products were growing faster than the rolling six month average, at least six times out of the last N periods.
  • next, again for each region, of those products that were just discovered, which ones grew by more than twenty five percent from the start of the period to the end of the period.
  • I want to know the intersection of those two sets of products across all regions
  • By the way, I won’t need these questions answered until next year at this time, although I might change them a bit.

That’s an example of one analytic combined with another, iteratively and sequentially. Neither of which would be possible in MDX or SQL without the creation of a complex underlying data structure.  Notice that this isn’t simply an OLAP  “drill down” or a “slice and dice”. These are distinct questions, with the answers needing to be merged and intersected.  This kind of query would take only a few hours in nextanalytics, or a few months in a BI tool.

The problem for the company who wants to use BI is this: Once those products have been identified, the analytic request isn’t needed for another year.   When the request comes through next year, it might be a very different set of questions, so the IT money that’s been spent is wasted. It might even have become part of the infrastructure and can’t be removed without great expense or risk. In the long run, an IT budget is eroded by most analytic requests unless they are the trivial sort you can get from an existing OLAP cube.

The real question is: Why incur such a significant up front and ongoing cost for something that isn’t regularly needed and might need dramatic changes the next time it’s requested?   In my opinion, a business intelligence approach is not really a very good fit for analytics, unless it is very clear that the analytic question is permanent and static.  Even then, it’s entirely possible that the implementation and infrastructure cost is too high to justify many early stage analytic requests where they're not even sure where the questions will lead yet.

Upcoming blogs will consider the roles of dashboards, charting components, spreadsheets, and writing analytic software from scratch.


Improve your data access

Posted by: ward in Untagged  on

It Starts By Improving On the Data Access Layer

Many reporting and analysis products rely on SQL and MDX to obtain their data. In the context of delivering analytics, these query languages can get extremely complicated and, even if the syntax does exist to deliver the result, it might require changes to the database or datawarehouse or for a new cube to be designed and maintained. Any of these mitigations raise the cost and risk of the solution.

The root cause is that many analytics require sequential passes, multi-datasource data access, and iterative processing. In some situations, Excel is used to mitigate, but it is not a strong solution because formulae and ranges and relationships between data can be easily broken due to dynamic nature of production data and business in general.

In contrast, nextanalytics offers the ability to design "simple" SQL or MDX queries, ones that don’t need a lot of training and expertise, and certainly won’t require changes to a database or to build a complex relationship with an Excel workbook. Then, nextanalytics follows up with scripts to perform the requisite sequential, iterative, and multi-data source steps. This saves a great deal of complexity with producing dashboards that have analytics in them, and makes many projects feasible which otherwise wouldn’t be.

There are over 100 commands that can be put together in almost any combination. These are stored in text based script files, playable on demand. These can be parameterized to give the end user some degree of control over what the operations do. The scripts can be run in real-time or in batch and can read and write data in a wide variety of ways. In fact, since we also offer a unix version, the scripts can do a data processing on one server and send full or partial subsets to another platform.

nextanalytics is multi-page, which means that each operation you perform is still resident and can be rendered or cached. Pages are referenceable objects, which means they serve as input to the next command. This is how we deliver on the ability to be iterative and sequential. An analogy would be that pages are similar in concept to worksheets. Each time you issue a nextanalytics command, a new "sheet" is created and it is automatically the input to the next command.

And, an important point: nextanalytics is multi-datasource. It can load a CSV file as well as a database or cube query, all at the same time, each one in its own page. And then the result sets can be compared. This has great advantage when you want to visualize a comparison between a budget spreadsheet and actual values.


<< Start < Prev 1 2 Next > End >>
read.png 
explore.png
discuss.png