§
Collaborative filtering
The basic concept of collaborative filtering fits in one sentence: if two people liked the same items multiple times in the past, they will probably like the same items in the future. The algorithm stores the rating each user gave each item (0 if the user didn't rate the item). To recommend items to user A, we find other users, such as B and C, who are similar to A, and recommend to A what B and C liked. "Similar" here means that there is some function simil(A, B) that measures the similarity between users A and B. The predicted rating of user u for the i-th item is then an aggregation of the ratings that a set U of users similar to u gave that item:

$$r_{u,i} = \operatorname{aggr}_{u' \in U} r_{u',i}$$
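There are many possible simil functions. Pearson correlation and cosine similarity are the most popular, so I will define only these two:
1) Cosine similarity

$$\operatorname{simil}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i} A_i B_i}{\sqrt{\sum_{i} A_i^2} \, \sqrt{\sum_{i} B_i^2}}$$

2) Pearson correlation

$$\operatorname{simil}(A, B) = \frac{\sum_{i} (A_i - \bar{A})(B_i - \bar{B})}{\sqrt{\sum_{i} (A_i - \bar{A})^2} \, \sqrt{\sum_{i} (B_i - \bar{B})^2}}$$

Here A and B are the rating vectors $r_x$ and $r_y$ of two users x and y, and $\bar{A}$ and $\bar{B}$ are the means of their entries. To keep everything simple, I will use cosine similarity. Let's go through an example to see how collaborative filtering works. In this example, ratings range from 1 to 10, and 0 marks an item the user has not rated.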
| | User 1 | User 2 | User 3 | User 4 | User 5 | User 6 | User 7 | User 8 | User 9 | User 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Item 1 | 5 | 2 | 0 | 0 | 9 | 5 | 2 | 0 | 0 | 9 |
| Item 2 | 10 | 0 | 7 | 2 | 1 | 10 | 0 | 7 | 2 | 1 |
| Item 3 | 0 | 0 | 5 | 8 | 0 | 0 | 0 | 5 | 8 | 0 |
| Item 4 | 1 | 9 | 8 | 0 | 0 | 1 | 9 | 8 | 0 | 0 |
| Item 5 | 0 | 6 | 0 | 0 | 10 | 0 | 6 | 0 | 0 | 10 |
| Item 6 | 10 | 10 | 4 | 0 | 8 | 10 | 10 | 4 | 0 | 8 |
| Item 7 | 0 | 3 | 2 | 5 | 1 | 0 | 3 | 2 | 5 | 1 |
| Item 8 | 0 | 3 | 2 | 5 | 1 | 0 | 3 | 2 | 5 | 1 |
| Item 9 | 0 | 3 | 2 | 5 | 1 | 0 | 3 | 2 | 5 | 1 |
| Item 10 | 0 | 3 | 2 | 5 | 1 | 0 | 3 | 2 | 5 | 1 |
| Item 11 | 0 | 3 | 2 | 5 | 1 | 0 | 3 | 2 | 5 | 1 |
| Item 12 | 0 | 3 | 2 | 5 | 1 | 0 | 3 | 2 | 5 | 1 |
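The two similarity measures named earlier, cosine similarity and Pearson correlation, can be sketched in plain Python and tried on two columns of the table. This is a minimal illustration under stated assumptions: the function names are hypothetical, unrated items are kept as 0 (the convention used in this article), and returning 0.0 for an all-zero vector is my own choice to avoid dividing by zero.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a vector with no ratings is similar to nothing
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation, computed as the cosine of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity(
        [x - mean_a for x in a],
        [y - mean_b for y in b],
    )


# Columns "User 1" and "User 2" of the ratings table (Items 1 to 12).
USER_1 = [5, 10, 0, 1, 0, 10, 0, 0, 0, 0, 0, 0]
USER_2 = [2, 0, 0, 9, 6, 10, 3, 3, 3, 3, 3, 3]

print(round(cosine_similarity(USER_1, USER_2), 4))
print(round(pearson_correlation(USER_1, USER_2), 4))
```

Many implementations compute Pearson correlation only over the items both users rated. The sketch keeps the zeros instead, so that it matches the way ratings are stored here.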
The goal is to recommend some items to User 1. First, calculate the cosine similarity between User 1 and every user.
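This step can be sketched as a loop over the columns of the ratings table. The names `RATINGS`, `column` and `similarities_to` are hypothetical helpers introduced for this illustration, and the matrix is copied from the table with one row per item.

```python
import math

# Ratings table: one row per item (Items 1 to 12), one column per user
# (Users 1 to 10). 0 means the user has not rated the item.
RATINGS = [
    [5, 2, 0, 0, 9, 5, 2, 0, 0, 9],
    [10, 0, 7, 2, 1, 10, 0, 7, 2, 1],
    [0, 0, 5, 8, 0, 0, 0, 5, 8, 0],
    [1, 9, 8, 0, 0, 1, 9, 8, 0, 0],
    [0, 6, 0, 0, 10, 0, 6, 0, 0, 10],
    [10, 10, 4, 0, 8, 10, 10, 4, 0, 8],
] + [[0, 3, 2, 5, 1, 0, 3, 2, 5, 1] for _ in range(6)]  # Items 7 to 12 are identical


def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a vector with no ratings is similar to nothing
    return dot / (norm_a * norm_b)


def column(ratings, user_index):
    """All ratings given by one user, in item order."""
    return [row[user_index] for row in ratings]


def similarities_to(ratings, target_index):
    """Cosine similarity between the target user and every user, the target included."""
    target = column(ratings, target_index)
    return [
        cosine_similarity(target, column(ratings, u))
        for u in range(len(ratings[0]))
    ]


sims = similarities_to(RATINGS, 0)  # index 0 is User 1
for user_number, sim in enumerate(sims, start=1):
    print(f"User {user_number}: {sim:.4f}")
```

User 1 compared with itself gives a similarity of 1 (up to floating-point rounding), and so does User 6, whose column repeats User 1's ratings.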
| | Similarity to User 1 |
|---|---|
| User 1 | UNKNOWN |
| User 2 | UNKNOWN |
| User 3 | UNKNOWN |
| User 4 | UNKNOWN |
| User 5 | UNKNOWN |
| User 6 | UNKNOWN |
| User 7 | UNKNOWN |
| User 8 | UNKNOWN |
| User 9 | UNKNOWN |
| User 10 | UNKNOWN |
Here 1 represents complete similarity and 0 no similarity. How can we use this information to predict $r_{u,i}$? One popular choice is to aggregate the ratings of the users in U, where U denotes the set of the top N users most similar to u.
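One common form of such a prediction is the similarity-weighted average of the neighbours' ratings. It is used here as an assumption, not as the article's confirmed formula:

$$r_{u,i} = k \sum_{u' \in U} \operatorname{simil}(u, u') \, r_{u',i}, \qquad k = \frac{1}{\sum_{u' \in U} |\operatorname{simil}(u, u')|}$$

The sketch below applies it to the ratings table. The helper names, the choice N = 3, the exclusion of the target user from U, and counting an unrated item as a rating of 0 are all assumptions made for illustration.

```python
import math

# Ratings table: one row per item (Items 1 to 12), one column per user
# (Users 1 to 10). 0 means the user has not rated the item.
RATINGS = [
    [5, 2, 0, 0, 9, 5, 2, 0, 0, 9],
    [10, 0, 7, 2, 1, 10, 0, 7, 2, 1],
    [0, 0, 5, 8, 0, 0, 0, 5, 8, 0],
    [1, 9, 8, 0, 0, 1, 9, 8, 0, 0],
    [0, 6, 0, 0, 10, 0, 6, 0, 0, 10],
    [10, 10, 4, 0, 8, 10, 10, 4, 0, 8],
] + [[0, 3, 2, 5, 1, 0, 3, 2, 5, 1] for _ in range(6)]  # Items 7 to 12 are identical


def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a vector with no ratings is similar to nothing
    return dot / (norm_a * norm_b)


def column(ratings, user_index):
    """All ratings given by one user, in item order."""
    return [row[user_index] for row in ratings]


def predict_ratings(ratings, target_index, top_n):
    """Similarity-weighted average of the top_n neighbours' ratings, per item."""
    target = column(ratings, target_index)
    # Similarity of every other user to the target (the target itself is skipped).
    sims = [
        (cosine_similarity(target, column(ratings, u)), u)
        for u in range(len(ratings[0]))
        if u != target_index
    ]
    # U: the top_n most similar users.
    neighbours = sorted(sims, reverse=True)[:top_n]
    weight_total = sum(abs(sim) for sim, _ in neighbours)  # this is 1 / k
    if weight_total == 0:
        return [0.0] * len(ratings)
    return [
        sum(sim * row[u] for sim, u in neighbours) / weight_total
        for row in ratings
    ]


predictions = predict_ratings(RATINGS, target_index=0, top_n=3)
for item_number, value in enumerate(predictions, start=1):
    print(f"Item {item_number}: {value:.2f}")
```

Because unrated items count as 0 in this sketch, a neighbour who has not rated an item pulls its prediction down. Restricting U to users who actually rated the item is a frequent alternative.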
| | Predicted rating for User 1 |
|---|---|
| Item 1 | UNKNOWN |
| Item 2 | UNKNOWN |
| Item 3 | UNKNOWN |
| Item 4 | UNKNOWN |
| Item 5 | UNKNOWN |
| Item 6 | UNKNOWN |
| Item 7 | UNKNOWN |
| Item 8 | UNKNOWN |
| Item 9 | UNKNOWN |
| Item 10 | UNKNOWN |
| Item 11 | UNKNOWN |
| Item 12 | UNKNOWN |
To recommend items, pick the items with the highest predicted ratings that User 1 has not rated yet.
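This final selection can be sketched as a filter followed by a sort. The function name `recommend` and the parameter `k` are hypothetical, and the prediction values in the example are made-up placeholders, not results computed in this article.

```python
def recommend(user_ratings, predictions, k):
    """Return the 1-based numbers of the k unrated items with the highest predictions."""
    # Keep only the items the user has not rated (stored as 0).
    unrated = [i for i, rating in enumerate(user_ratings) if rating == 0]
    # Highest predicted rating first; ties keep their original item order.
    ranked = sorted(unrated, key=lambda i: predictions[i], reverse=True)
    return [i + 1 for i in ranked[:k]]


# User 1's column from the ratings table (Items 1 to 12).
user_1 = [5, 10, 0, 1, 0, 10, 0, 0, 0, 0, 0, 0]
# Placeholder predictions, made up for illustration only.
example_predictions = [2.3, 8.4, 2.7, 4.4, 0.0, 6.8, 1.1, 1.1, 1.1, 1.1, 1.1, 1.1]

print(recommend(user_1, example_predictions, k=2))
```

Items 1, 2, 4 and 6 never appear in the result, however high their predictions are, because User 1 has already rated them.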
§
Links
https://github.com/MrMineev/Different-Recommendation-Systems
https://en.wikipedia.org/wiki/Cosine_similarity
https://en.wikipedia.org/wiki/Pearson_correlation_coefficient