# Assignment 7

Modified: 2015/01/14 02:43 by admin - Uncategorized
We have created a Google Group for discussion. If you have any questions, please post them there so that everybody can see our replies, and please try not to email your questions to us directly.

# The Task: Participating in the Kaggle Competition

In this assignment, you are asked to participate in an ongoing real-world data mining competition hosted by Kaggle (a platform where you can compete with world-class data miners on real data mining applications and possibly win a substantial prize). This year, you are asked to design and implement data mining methods to solve the following problem:

Click-Through Rate Prediction, \$15,000 (prize)

You can click the title above to get all the information you need to take part in the competition, including how to register, how to get the data, and how your method is evaluated. For newcomers to Kaggle, we also give some brief guidance:
• Find the competition homepage here, and read the description, evaluation, rules, prizes and all the other information you need to know.
• Use any methods you have learned to build a model from the training examples, use this model to predict the test samples, and submit the prediction file here. Your result is evaluated almost instantly and your rank is shown on the leaderboard.
• Change your Team Name (the name shown on the leaderboard, not your account name) in My Team to your student number.

• Participate in this competition and achieve as high a rank as you can.
• You are encouraged to reuse the code you have already written in previous assignments. (5% reward)
• Write a report to describe your methods and implementation.
• Submit your code, as well as the prediction file.
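
The train, predict and submit steps above can be sketched as follows. This is only a minimal baseline under stated assumptions, not the required method: it uses logistic regression with hashed categorical features trained by stochastic gradient descent, and it assumes the data are CSV files with an `id` column and a binary `click` label, and that the prediction file has the header `id,click`. Check the competition's data and evaluation pages for the actual format. The tiny in-memory tables stand in for the real files.

```python
import csv
import io
import math
import zlib

D = 2 ** 16  # size of the hashed weight vector (an assumed value; tune as needed)


def hash_features(row, skip=("id", "click")):
    """Map each 'column=value' pair of a row to an index in [0, D)."""
    return [zlib.crc32(f"{k}={v}".encode("utf-8")) % D
            for k, v in row.items() if k not in skip]


def predict(weights, idx):
    """Logistic regression on one-hot hashed features: sigmoid of the weight sum."""
    z = sum(weights[i] for i in idx)
    z = max(min(z, 35.0), -35.0)  # clip to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, lr=0.1, epochs=1):
    """Plain SGD on the logistic loss; `rows` is a list of dicts with a 'click' key."""
    weights = [0.0] * D
    for _ in range(epochs):
        for row in rows:
            idx = hash_features(row)
            gradient = predict(weights, idx) - float(row["click"])
            for i in idx:
                weights[i] -= lr * gradient
    return weights


def write_submission(weights, rows, out):
    """Write one predicted click probability per test row (assumed 'id,click' layout)."""
    writer = csv.writer(out)
    writer.writerow(["id", "click"])
    for row in rows:
        p = predict(weights, hash_features(row))
        writer.writerow([row["id"], f"{p:.6f}"])


# Hypothetical toy data; the column names 'site' and 'device' are made up.
train_csv = "id,click,site,device\n1,1,a,phone\n2,0,b,desktop\n3,1,a,phone\n4,0,b,desktop\n"
test_csv = "id,site,device\n5,a,phone\n6,b,desktop\n"

train_rows = list(csv.DictReader(io.StringIO(train_csv)))
test_rows = list(csv.DictReader(io.StringIO(test_csv)))

weights = train(train_rows, epochs=20)
out = io.StringIO()
write_submission(weights, test_rows, out)
print(out.getvalue())
```

For the real data, replace the `io.StringIO` objects with opened files. Because `train` processes one row at a time, the same loop can stream a large training file when `epochs=1`.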

# Data

We have downloaded the data for you; you can find them at ftp://lamda.nju.edu.cn/mg_dm14/Data/

# Submission

• Do NOT plagiarize; plagiarism will be seriously penalized. Be careful when writing your report: whenever you use the words or work of others, cite them clearly so that readers can tell which parts are actually yours. Details on how plagiarism is identified can be found in "Introduction to the Guidelines for Handling Plagiarism Complaints".
• Do NOT falsify results; data fraud will be penalized even more seriously. Record your results honestly in the report, and NEVER EVER modify the performance results manually.
• Pack your report, code and Submission.csv into a zip file named with your student ID, e.g., 'MG1433001.zip'. If you make multiple submissions, append '_' and a number, such as 'MG1433001_1.zip'. We will use the version with the largest number as your final submission.

• The file format should be zip, no other format is acceptable!
• NO submission after the deadline is acceptable!
• NO email submission will be accepted!
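
The packing and naming rules above can be sketched as a small helper. This is an illustration only: the function names, and the file names `report.pdf` and `code.py` in the usage note, are hypothetical; only `Submission.csv` and the archive naming pattern come from the rules above.

```python
import zipfile
from pathlib import Path


def archive_name(student_id, version=None):
    """'MG1433001.zip' for a first submission, 'MG1433001_1.zip' for a numbered one."""
    if version is None:
        return f"{student_id}.zip"
    return f"{student_id}_{version}.zip"


def pack_submission(student_id, files, out_dir=".", version=None):
    """Write the given files into a zip archive named after the student ID."""
    target = Path(out_dir) / archive_name(student_id, version)
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            # Store each file under its base name, without the local directory path.
            zf.write(f, arcname=Path(f).name)
    return target


print(archive_name("MG1433001", 1))  # MG1433001_1.zip
```

A call such as `pack_submission("MG1433001", ["report.pdf", "code.py", "Submission.csv"], version=1)` would then produce `MG1433001_1.zip` in the current directory. The helper handles individual files only; a code directory would need to be walked and added file by file.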

ftp://lamda.nju.edu.cn/mg_dm14/assignment7/

# Evaluation

For rank:
We will take your absolute rank r1 (your rank among all the participants, the set p1) and your relative rank r2 (your rank among all the students from this course, the set p2) from the leaderboard. Your final score for the rank part is then given by:

 $\text{Score}_{\text{rank}} \propto \frac{1}{2}(\frac{\sum_{p \in p_1}I(r_{p}>r_1)}{|p_1|-1}+\frac{\sum_{p \in p_2}I(r_{p}>r_2)}{|p_2|-1})$

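where r_p is the rank of participant p within the corresponding group, and I is an indicator function: I(condition)=1 if the condition is true, 0 otherwise. Each fraction is therefore the share of the other participants in that group who are ranked below you (a larger rank number means a worse position).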
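
As a worked example of the formula above, the sketch below computes the bracketed quantity for made-up leaderboard sizes. The function names are hypothetical, and since the formula is stated only up to proportionality, any final scaling is left out.

```python
def fraction_ranked_below(my_rank, group_ranks):
    """Share of the other group members whose rank number is larger than my_rank."""
    if len(group_ranks) < 2:
        raise ValueError("the group needs at least two members")
    worse = sum(1 for r in group_ranks if r > my_rank)  # sum of I(r_p > my_rank)
    return worse / (len(group_ranks) - 1)


def score_rank(r1, p1_ranks, r2, p2_ranks):
    """Average of the absolute-rank term and the relative-rank term."""
    return 0.5 * (fraction_ranked_below(r1, p1_ranks)
                  + fraction_ranked_below(r2, p2_ranks))


# Made-up numbers: 101 participants overall, 21 students in the course.
all_participants = list(range(1, 102))  # absolute ranks 1..101
course_students = list(range(1, 22))    # relative ranks 1..21

# Absolute rank 21 gives 80/100 = 0.8; relative rank 6 gives 15/20 = 0.75.
score = score_rank(21, all_participants, 6, course_students)
print(f"{score:.3f}")  # 0.775
```

The top-ranked participant in both groups gets 1.0 and the last-ranked gets 0.0, so the quantity always lies between 0 and 1.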

For report:
Technique: clearly explain why you chose your method, how you implemented it, and how it performs on this data mining task.
Language: concise, precise, and logical.
Organization: good structure, with clearly and properly separated sections and paragraphs.
Citations: all work that is not your own should be correctly referenced.

If plagiarism is identified, the report will receive no score.

# Contact TA

Mr. Qing Da and Mr. Yue Zhu

