
Commit

Merge pull request apache#156 from cristiancrc/patch-2
Update evaluation.html.md.erb
pferrel committed Oct 12, 2015
2 parents bad760a + 4805eb0 commit e8e0550
Showing 1 changed file with 14 additions and 14 deletions.
28 changes: 14 additions & 14 deletions docs/manual/source/templates/recommendation/evaluation.html.md.erb
@@ -38,13 +38,13 @@ mandatory parameter,
2. the `EngineParamsGenerator`, which contains a list of engine params to test
against (a minimal sketch follows below).
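For orientation, an `EngineParamsGenerator` is typically a small object that fills in `engineParamsList` with the variations to test, usually defined in the template's Evaluation.scala. The following is a hedged sketch patterned after the stock recommendation template; the parameter classes (`DataSourceParams`, `DataSourceEvalParams`, `ALSAlgorithmParams`) and their fields are assumptions about your template, not part of this commit.

```scala
// Hedged sketch only: class and field names are assumed to match the
// stock recommendation template and may differ in your code base.
package org.template

// Newer versions use org.apache.predictionio.controller instead.
import io.prediction.controller.{EngineParams, EngineParamsGenerator}

object EngineParamsList extends EngineParamsGenerator {
  // Base engine params shared by every variation under test.
  private[this] val baseEP = EngineParams(
    dataSourceParams = DataSourceParams(
      appName = "MyApp1",
      evalParams = Some(DataSourceEvalParams(kFold = 5, queryNum = 10))))

  // Three ALS variations; `pio eval` trains and scores each of them.
  // ALSAlgorithmParams arguments: (rank, numIterations, lambda, seed).
  engineParamsList = Seq(
    baseEP.copy(algorithmParamsList = Seq(("als", ALSAlgorithmParams(10, 20, 0.01, Some(3L))))),
    baseEP.copy(algorithmParamsList = Seq(("als", ALSAlgorithmParams(10, 40, 0.01, Some(3L))))),
    baseEP.copy(algorithmParamsList = Seq(("als", ALSAlgorithmParams(20, 20, 0.01, Some(3L)))))
  )
}
```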
The following command kickstarts the evaluation
workflow for the recommendation template (replace "org.template" with your package).

```
$ pio build
...
$ pio eval org.template.RecommendationEvaluation \
  org.template.EngineParamsList
```

You will see the following output:
@@ -100,7 +100,7 @@ Metrics:

The console prints out the evaluation metric score of each set of engine params, and finally
pretty-prints the optimal engine params. Among the 3 sets of engine params we evaluate,
the best Precision@k has a score of ~0.1521.


## The Evaluation Design
@@ -109,7 +109,7 @@ We assume you have read the [Tuning and Evaluation](/evaluation) section. We
will cover the evaluation aspects which are specific to the recommendation
engine.

In recommendation evaluation, the raw data is a sequence of known ratings. A
rating has 3 components: user, item, and a score. We use the $k$-fold method for
evaluation: the raw data is sliced into a sequence of (training, validation)
data tuples.
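To make the slicing concrete, here is a hedged sketch of the $k$-fold split using plain Scala collections; the template performs the equivalent split on Spark RDDs inside `readEval`, so this is only an illustration of the idea.

```scala
// Illustration only: each rating lands in exactly one validation fold
// and serves as training data in the remaining k - 1 folds.
case class Rating(user: String, item: String, rating: Double)

def kFoldSplit(ratings: Seq[Rating], k: Int): Seq[(Seq[Rating], Seq[Rating])] = {
  val indexed = ratings.zipWithIndex
  (0 until k).map { fold =>
    val training   = indexed.collect { case (r, i) if i % k != fold => r }
    val validation = indexed.collect { case (r, i) if i % k == fold => r }
    (training, validation)
  }
}
```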
@@ -126,7 +126,7 @@ using the known rating of a user.
There are multiple assumptions we have to make when we evaluate a
recommendation engine:

- Definition of 'good'. We want to quantify whether the engine is able to recommend
items which the user likes; to do so, we need to define what is meant by 'good'. In this
example, we have two kinds of events: 'rate' and 'buy' (a sketch of this mapping
follows the list). The 'rate' event is
associated with a rating value which ranges from 1 to 4, and the 'buy'
@@ -138,7 +138,7 @@ above the threshold is considered 'good'.
data contains ratings for all user-item tuples. In contrast, for a system containing
1000 items, a user may only have rated 20 of them, leaving 980 items unrated. There
is no way for us to tell with certainty whether the user likes an unrated product.
When we examine the evaluation result, it is important for us to keep in mind
that the final metric is only an approximation of the actual result.

- Recommendation affects user behavior. Suppose you are an e-commerce company and
@@ -158,7 +158,7 @@ behavior is homogenous.
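As referenced in the first assumption above, here is a hedged sketch of how observed events could be mapped to a binary notion of 'good' (relevant). The event names mirror this example, but the mapping of 'buy' to the maximum rating is an assumption for illustration, not text taken from this commit.

```scala
// Hedged sketch: decide whether an event marks an item as 'good' for a user.
sealed trait UserEvent
case class Rate(user: String, item: String, rating: Double) extends UserEvent
case class Buy(user: String, item: String) extends UserEvent

def isGood(event: UserEvent, ratingThreshold: Double): Boolean = event match {
  // An explicit rating counts as 'good' only at or above the threshold.
  case Rate(_, _, rating) => rating >= ratingThreshold
  // Assumption: a purchase is treated as the maximum rating, hence always 'good'.
  case Buy(_, _)          => true
}
```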

In MyRecommendation/src/main/scala/***Engine.scala***,
we define the `ActualResult` which represents the user rating for validation.
It stores the list of ratings in the validation set for a user.

```scala
case class ActualResult(
@@ -168,9 +168,9 @@ case class ActualResult(
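// The snippet above is truncated by the diff view. A hedged reconstruction of
// the stock template's ActualResult follows; the field name and the Rating
// shape are assumptions, not text copied from this commit.
case class Rating(user: String, item: String, rating: Double)

case class ActualResult(
  ratings: Array[Rating]  // the known ratings reserved for validation
) extends Serializable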

### Implement Data Generate Method in DataSource

In MyRecommendation/src/main/scala/***DataSource.scala***,
the `readEval` method reads, and selects, data from the datastore
and returns a sequence of (training, validation) data.

```scala
case class DataSourceEvalParams(kFold: Int, queryNum: Int)
@@ -292,7 +292,7 @@ to determine what the candidates know.
A good metric should be able to distinguish the good from the bad.

A way to define 'relevant' is to use the notion of a rating threshold. If the user
rating for an item is higher than a certain threshold, we say it is relevant.
However, without looking at the data, it is hard to pick a reasonable threshold.
We can set the threshold as high as the maximum rating of 4.0, but it may
severely limit the relevant set size, and the precision scores will be close to
@@ -338,12 +338,12 @@ We have two lists of parameters (lines 2 to 3): `ratingThreshold` defines what r
and `k` defines how many items we evaluate in the `PredictedResult`.
We generate a list of all combinations (line 11).
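A hedged sketch of that combination step, assuming the template's `PrecisionAtK` metric class takes `k` and `ratingThreshold` parameters; the concrete values are examples only.

```scala
// Every (ratingThreshold, k) pair becomes one metric variant to report.
// PrecisionAtK is assumed to come from the template's Evaluation.scala.
val ratingThresholds = Seq(2.0, 4.0)
val ks = Seq(1, 3, 10)

val allMetrics = for {
  t <- ratingThresholds
  k <- ks
} yield PrecisionAtK(k = k, ratingThreshold = t)
```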

These metrics are specified as `otherMetrics` (lines 9 to 11); they
will be calculated and generated on the evaluation UI.
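For intuition, here is a hedged, self-contained sketch of the Precision@k idea these metrics report: of the top k recommended items, the fraction the user actually rated at or above the threshold. The stock metric's exact normalization may differ.

```scala
// Illustration of Precision@k; None means the query produced no items
// and is excluded from the averaged score.
def precisionAtK(recommended: Seq[String],
                 relevant: Set[String],
                 k: Int): Option[Double] = {
  val topK = recommended.take(k)
  if (topK.isEmpty) None
  else Some(topK.count(relevant.contains).toDouble / topK.size)
}
```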

To run this evaluation, you can:

```
$ pio eval org.template.ComprehensiveRecommendationEvaluation \
  org.template.EngineParamsList
```

