What you are describing is a challenge that R&D has been wrestling for a long time. Let me try to describe what the problem is from the perspective of the internal workings of the product:
All data sets in Datawatch Panopticon are organized into hierarchic data trees in memory as soon as the data set is applied to a visualization.
The data tree is capable of getting re-created and changed often - that happens each time someone changes a breakdown or drills or changes the visible detail level on a breakdown. For each data set and each data tree, there is no meta data at all; this means, there is no information stored about how many items you have of a certain value, what distribution you have in a numeric column and so on. The reason for this is that everything in the product is engineered to cope with continuously updating, streaming data, where aggregate values are calculated on-the-fly for each level of the data tree.
And as a consequence of that, there is no record or "awareness" about how many times a particular value occurs in a data tree, for example the value "Customer 1". That is why a Count of values must be based on a numeric column, instead of just counting values in a text column. The data model knows about the number of rows in the original data set, and the number of nodes in the data tree. Each row in the data set is associated with a value for each numeric column on that row, and each node in the data tree is associated with the aggregate values of numeric columns of all rows that are part of that node and its child nodes.
So, when C1 is a child node of both R1, R2, and R3, it will inevitably contribute with a count of 1 in all three places, since it causes a data tree node to be created in all three places. And putting Client before Region in the breakdown (in the data tree) makes no difference.
The closest thing that we have been able to create, is an aggregation method called Combinations. It works like Count, with this difference:
Count will count the number of rows in a numeric column of a data set
Combinations will count the number of nodes at the most granular level of the tree, i.e. the number of leaf nodes in the tree.
This means that if you use a data set like this:
and put ONLY Client on the breakdown, you will see that:
Aggregation method COUNT applied to a numeric column gives a total of 12 (the number of rows)
Aggregation method COMBINATIONS applied to a numeric column gives a total of 9 (unique values in Client column)
However, as soon as you put Region into the breakdown as well, you will be splitting up the previous single tree node for C1 and C2 into additional nodes (3 for C1 and 2 for C2), which means the number of leaf nodes grow from 9 to 12, which is equal to the number of rows in the data set, so COUNT = COMBINATIONS at that point.
So, the short answer is No, you can't do Count Distinct on a text column in Datawatch Designer, but I hope my response makes it understandable why that is so.
Best regards, Theo Klemming
Problem is that this solution can only work if we have only one counting column in the table.
I would have to precise I need to have several count(unique) columns in the same table, for example :
- # of clients
- # of new clients
- # of closed clients
It seems "Combination" will give always the same count in all columns like this.
Exactly Theo the data seem like this :
But can be written like this too with Boolean :
With that data, and a breakdown containing only the Client column, you would get a correct values for Closed and New if you aggregate them as MEAN. You would use one or the other and aggregate as COMBINATIONS and name it "Count".
HOWEVER: This will only look right for each row in a visual table; the Grand Total summary will be wrong for Closed and New since it will show the over all mean. What you would want to do here is aggregate as a mean for each value of Client, but then aggregate all the mean values as a sum across all clients. That would mean using different aggregation formulas on different levels in the same tree, and you can't. Further, calculated columns are no help here, since they can only access values on one and the same row.
I suppose what all this boils down to is that it is difficult to get correct counts for unique entities in the data if each unique entity exists more than once in the data set. If the product had supported a count distinct on text columns while regarding an entire multi-level data tree, then the count distinct part would be covered for any kind of breakdown, but at the moment it doesn't.
It is also difficult to device a work-around for Count Closed and Count New, due to the requirement to apply the same aggregation method from top to bottom of a data tree. I actually can't see how the Count Closed and Count New would ever be done, as long as each Client exists on more than 1 row in the data set, without using mean aggregation up to one point in the breakdown data tree, and then sum for everything beyond that point down to the root. But that is not available functionality.
The last thing I can recommend that you take a look at is to use External Aggregates, which means that you provide the aggregate values for each measure, for each level in the breakdown, on separate rows in the data set. Please read more on this page:
Best regards, Theo