Recommending Items to Users

Collaborative filtering is a popular technique for generating user-item recommendations. In this example, we take the list of which users bought which items and build an affinity score between each pair of items. When a customer buys an item, we use this affinity score to recommend other items to that customer.

In [44]:
from __future__ import division  # __future__ imports must be the first statement in the cell
import pandas as pd
import numpy as np

Loading the Dataset

The source data set contains a user ID and an item ID in each row. It lists the users and the items they bought, one purchase per line.

In [4]:
user_ItemData = pd.read_csv('ratings.csv')
user_ItemData.head(5)
Out[4]:
   userId  ItemId
0    1001    5001
1    1001    5002
2    1001    5005
3    1002    5003
4    1002    5004
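
The `ratings.csv` file is not included with this post. As a stand-in, the sketch below builds an equivalent data frame in memory. The user lists are reconstructed from the per-item output printed later in this post; the row order of the real file beyond the first five lines is an assumption.

```python
import pandas as pd

# Users per item, reconstructed from the output printed later in the post.
# The row order of the real ratings.csv is an assumption.
purchases = {
    5001: [1001, 1003, 1004],
    5002: [1001, 1003, 1005],
    5005: [1001, 1002, 1004],
    5003: [1002],
    5004: [1002, 1004, 1005],
}

# Flatten to one (user, item) row per purchase
rows = [(user, item) for item, users in purchases.items() for user in users]

# Sort by user, then item, so the first rows match the head() output above
user_ItemData = (
    pd.DataFrame(rows, columns=['userId', 'ItemId'])
    .sort_values(['userId', 'ItemId'])
    .reset_index(drop=True)
)
print(user_ItemData.head(5))
```

With this frame in place of the `pd.read_csv` call, the remaining cells can be run without the original file.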

Building the Affinity Score

To build the affinity score, we could use an off-the-shelf open source or commercial collaborative filtering library. In this example, though, we are going to write a simple algorithm that generates item-to-item affinities.

In [8]:
user_ItemData.ItemId.unique()
Out[8]:
array([5001, 5002, 5005, 5003, 5004], dtype=int64)

There are 5 unique items. Let’s build an affinity score for each pair of these 5 items.
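
To make the score concrete: the affinity between two items is taken here as the number of users who bought both items, divided by the total number of users. A minimal worked example for items 5001 and 5002 follows, with the user lists copied from the output printed further down. The variable names are only for illustration.

```python
# Users who bought each item (copied from the output later in this post)
users_5001 = {1001, 1003, 1004}
users_5002 = {1001, 1003, 1005}
total_users = 5

# Users who bought both items
common = users_5001 & users_5002

# float() keeps the division correct on Python 2 as well
score = len(common) / float(total_users)

print('{} common users, score {}'.format(len(common), score))
# prints: 2 common users, score 0.4
```

Note that the score depends only on co-purchases, so it is the same whichever item of the pair is looked up first.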

In [12]:
# Get the list of unique items
item_List = list(user_ItemData.ItemId.unique())
item_List
Out[12]:
[5001, 5002, 5005, 5003, 5004]
In [28]:
# Get the number of unique users
userCount = user_ItemData.userId.nunique()
userCount
Out[28]:
5
In [45]:
# Create an empty data frame to store the affinity score for each item pair.
itemAffinity = pd.DataFrame(columns=['item1', 'item2', 'AffinityScore'])
rowCount = 0

for idx1 in range(len(item_List)):

    # Get the list of users who bought item 1
    item1Users = user_ItemData[user_ItemData.ItemId == item_List[idx1]]['userId'].tolist()
    print('item1 {}'.format(item1Users))

    # Item 2 is every item after item 1 in the list, so each pair is visited once
    for idx2 in range(idx1 + 1, len(item_List)):

        item2Users = user_ItemData[user_ItemData.ItemId == item_List[idx2]]['userId'].tolist()
        print('item2 {}'.format(item2Users))

        # Count the users who bought both items
        commonUsers = len(set(item1Users).intersection(item2Users))
        print('Common Users are --> {}'.format(commonUsers))

        # Score = common users / total users (float() avoids integer division on Python 2)
        score = float(commonUsers) / userCount
        print('Score is  {}'.format(score))

        # Add a score for item 1, item 2
        itemAffinity.loc[rowCount] = [item_List[idx1], item_List[idx2], score]
        rowCount += 1
        # Add a score for item 2, item 1. The same score applies irrespective of the order.
        itemAffinity.loc[rowCount] = [item_List[idx2], item_List[idx1], score]
        rowCount += 1

# Check the final result
itemAffinity.head()
        
item1 [1001, 1003, 1004]
item2 [1001, 1003, 1005]
Common Users are --> 2
Score is  0.4
item2 [1001, 1002, 1004]
Common Users are --> 2
Score is  0.4
item2 [1002]
Common Users are --> 0
Score is  0.0
item2 [1002, 1004, 1005]
Common Users are --> 1
Score is  0.2
item1 [1001, 1003, 1005]
item2 [1001, 1002, 1004]
Common Users are --> 1
Score is  0.2
item2 [1002]
Common Users are --> 0
Score is  0.0
item2 [1002, 1004, 1005]
Common Users are --> 1
Score is  0.2
item1 [1001, 1002, 1004]
item2 [1002]
Common Users are --> 1
Score is  0.2
item2 [1002, 1004, 1005]
Common Users are --> 2
Score is  0.4
item1 [1002]
item2 [1002, 1004, 1005]
Common Users are --> 1
Score is  0.2
item1 [1002, 1004, 1005]
Out[45]:
    item1   item2  AffinityScore
0  5001.0  5002.0            0.4
1  5002.0  5001.0            0.4
2  5001.0  5005.0            0.4
3  5005.0  5001.0            0.4
4  5001.0  5003.0            0.0
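
The nested loop is easy to follow but scans the data frame once per item. As an alternative sketch (not the method used in this post), the same scores can be computed in one step from a user-by-item matrix. The sample data below is reconstructed from the output above, and the names `bought`, `co_counts`, and `affinity_matrix` are introduced only for this illustration.

```python
import pandas as pd

# Sample data reconstructed from the output above (stand-in for ratings.csv)
purchases = {
    5001: [1001, 1003, 1004],
    5002: [1001, 1003, 1005],
    5005: [1001, 1002, 1004],
    5003: [1002],
    5004: [1002, 1004, 1005],
}
rows = [(user, item) for item, users in purchases.items() for user in users]
data = pd.DataFrame(rows, columns=['userId', 'ItemId'])

# User x item matrix of 0/1 flags: 1 if the user bought the item at least once
bought = pd.crosstab(data.userId, data.ItemId).clip(upper=1)

# (items x users) . (users x items): for every item pair,
# the number of users who bought both
co_counts = bought.T.dot(bought)

# Divide by the number of users to get the same score as the loop.
# The diagonal holds each item's own share of buyers, not a pair score.
affinity_matrix = co_counts / float(bought.shape[0])

print(affinity_matrix.loc[5001, 5002])
```

Each off-diagonal cell of `affinity_matrix` corresponds to one row of `itemAffinity`, so a lookup such as `affinity_matrix.loc[5001, 5002]` replaces the filter on `item1` and `item2`.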

Recommending Items

Suppose a customer bought item 5001. We can query this data frame for the rows where item1 is 5001 and sort the matching item2 values by score in descending order. This is the list of items to recommend to the customer, in that order.

In [54]:
searchItem = 5001
reco_list = itemAffinity[itemAffinity.item1 == searchItem][['item2', 'AffinityScore']]\
.sort_values('AffinityScore', ascending=False)

print('Recommendations for {} are {}'.format(searchItem, list(reco_list.item2)))
Recommendations for 5001 are [5002.0, 5005.0, 5004.0, 5003.0]
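
The list above still includes item 5003, whose score is 0.0, so it has no buyers in common with 5001. One possible refinement, sketched below, wraps the lookup in a function that drops zero-score items and caps the list length. The function name `recommend` and its `top_n` parameter are hypothetical, and the small data frame only mirrors the scores shown earlier for item 5001.

```python
import pandas as pd

# A few rows in the same shape as itemAffinity (scores copied from the output above)
itemAffinity = pd.DataFrame(
    [(5001, 5002, 0.4), (5001, 5005, 0.4), (5001, 5003, 0.0), (5001, 5004, 0.2)],
    columns=['item1', 'item2', 'AffinityScore'],
)

def recommend(affinity, item, top_n=3):
    """Hypothetical helper: best-scoring partner items for `item`."""
    # Keep only pairs that start with `item` and share at least one buyer
    matches = affinity[(affinity.item1 == item) & (affinity.AffinityScore > 0)]
    # Highest score first; the order among tied scores is not guaranteed
    ranked = matches.sort_values('AffinityScore', ascending=False)
    return ranked.item2.head(top_n).tolist()

print(recommend(itemAffinity, 5001))
```

Filtering on a positive score is a design choice: with a small catalogue it may be preferable to show a weak recommendation rather than none.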