I have data like this:
Studid Date Perf
001 01/01/2008 90
001 02/01/2008 89
001 03/02/2008 91
002 01/01/2008 75
002 02/01/2008 79
002 03/02/2008 69
I set Perf as PREDICT. When I run the query
SELECT * FROM [Cluster_Model]
I get:
Perf
82
Can anyone explain how clustering works, and how to write a query that groups the values here by Studid?
Based on the query result, it seems that your model uses Perf as a Predictable column.
First, how to solve the clustering problem:
Make sure the Studid column is used as an Input attribute (and not a key). If you only want to use Studid to build clusters, then ignore all other columns.
Once the model is trained, the cluster of a new data point can be determined with a query like:
SELECT Cluster() FROM [Cluster_Model] PREDICTION JOIN <new data>
Note that Cluster() is a function computed by the model on top of new data, and not a predictable variable.
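To make Cluster() more concrete, here is a minimal Python sketch of assigning a new case to a single cluster when Studid is the only input. It assumes a simplified mixture model in which each cluster stores a prior weight and a probability distribution over Studid values. The cluster names, the probabilities and the `cluster_of` helper are hypothetical; they are not what SSAS actually stores or computes internally.

```python
# Hypothetical model: each cluster has a prior weight and a
# distribution over Studid values (illustrative numbers only).
CLUSTERS = {
    "Cluster 1": {"prior": 0.5, "studid": {"001": 0.95, "002": 0.05}},
    "Cluster 2": {"prior": 0.5, "studid": {"001": 0.05, "002": 0.95}},
}


def cluster_of(studid):
    """Return the cluster with the highest prior * P(Studid | cluster).

    An unseen Studid scores 0.0 in every cluster, so the first cluster
    is returned in that case.
    """
    def score(name):
        c = CLUSTERS[name]
        return c["prior"] * c["studid"].get(studid, 0.0)

    return max(CLUSTERS, key=score)


print(cluster_of("001"))
print(cluster_of("002"))
```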
If you need to see the distance from the new data point to all the clusters, then the query should look like:
SELECT PredictHistogram( Cluster() ) FROM [Cluster_Model] PREDICTION JOIN <new data>
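A similarly hedged sketch of the per-cluster view that PredictHistogram( Cluster() ) gives: it turns prior * P(Studid | cluster) into a membership probability for every cluster and lists the clusters from most to least likely. The model values and the `cluster_histogram` helper are hypothetical. SSAS also reports a distance for each cluster, and this sketch does not attempt to reproduce that value.

```python
# Hypothetical model (same illustrative numbers as a two-cluster example).
CLUSTERS = {
    "Cluster 1": {"prior": 0.5, "studid": {"001": 0.95, "002": 0.05}},
    "Cluster 2": {"prior": 0.5, "studid": {"001": 0.05, "002": 0.95}},
}


def cluster_histogram(studid):
    """List (cluster, membership probability) pairs, most likely first."""
    scores = {
        name: c["prior"] * c["studid"].get(studid, 0.0)
        for name, c in CLUSTERS.items()
    }
    total = sum(scores.values())
    if total == 0:
        raise ValueError(f"Studid {studid!r} has zero probability in every cluster")
    rows = [(name, s / total) for name, s in scores.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, prob in cluster_histogram("002"):
    print(f"{name}: {prob:.2f}")
```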
Now, here is what your original query does, and why it returns that result:
The query executes a prediction, based on the model, against new data (this is general DMX syntax, not specific to clustering). The prediction is executed for all predictable attributes of the mining model (*), i.e. Perf, and the new data is empty: all attributes have the missing state.
Now, here is how Clustering prediction works: the algorithm computes the distances from the input data point to all the clusters, then predicts the target attribute as a weighted average of the target attribute's distributions across all the clusters. For your data, the overall mean of Perf is (90 + 89 + 91 + 75 + 79 + 69) / 6 ≈ 82.2, which is consistent with the 82 returned for an empty input.
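The weighted-average prediction can be sketched in Python as follows. It assumes a simplified two-cluster model whose priors and Studid distributions are hypothetical; the per-cluster Perf means are the per-student averages of the six sample rows. It also assumes that an all-missing input leaves only the cluster priors as weights, which is how this sketch models the empty-input case. The helper names `membership` and `predict_perf` are illustrative, not SSAS functions.

```python
# Hypothetical two-cluster model; perf_mean values are the per-student
# averages of the sample rows (001: 90.0, 002: 223 / 3).
CLUSTERS = {
    "Cluster 1": {"prior": 0.5, "studid": {"001": 0.95, "002": 0.05}, "perf_mean": 90.0},
    "Cluster 2": {"prior": 0.5, "studid": {"001": 0.05, "002": 0.95}, "perf_mean": 223 / 3},
}


def membership(studid=None):
    """Normalized cluster weights; a missing Studid leaves only the priors."""
    scores = {}
    for name, c in CLUSTERS.items():
        likelihood = 1.0 if studid is None else c["studid"].get(studid, 0.0)
        scores[name] = c["prior"] * likelihood
    total = sum(scores.values())
    if total == 0:
        raise ValueError(f"Studid {studid!r} has zero probability in every cluster")
    return {name: s / total for name, s in scores.items()}


def predict_perf(studid=None):
    """Weighted average of the per-cluster Perf means."""
    weights = membership(studid)
    return sum(weights[name] * CLUSTERS[name]["perf_mean"] for name in CLUSTERS)


print(round(predict_perf(), 2))       # empty input, like SELECT * FROM [Cluster_Model]
print(round(predict_perf("001"), 2))  # Studid = 001
```

Under these assumptions the empty input yields about 82.17, while Studid = 001 pulls the prediction toward that student's cluster (about 89.22).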
If you want to cluster by Studid only, but still make predictions for Perf (based on the distribution of Perf in the clusters), then make sure that Studid is Input and Perf is Predict Only.
Hope this helps.