# Assignment 6

Modified: 2014/09/17 11:56 by admin
# The Task: Mining from a real-world data set (1)

In this assignment, you will take part in an ongoing real-world data mining competition hosted by Kaggle (a platform where you can compete with world-class data miners on real data mining applications and, at the same time, possibly win a handsome prize). This year, you are asked to design and implement data mining methods to solve the following problem:

Recognize users of mobile devices from accelerometer data (\$5,000 prize)

You can click the title above to find all the information you need for the competition, including how to register, how to get the data, and how your method will be evaluated. Nonetheless, here is some brief guidance for Kaggle newcomers:
• Find the competition homepage here and read the description, evaluation, rules, prize, and all the other information you need to know.
• Then use any of the methods you have learned to build a model from the training examples, use this model to make predictions on the test samples, and submit the prediction file here. Your result is evaluated almost instantly, and your rank is shown on the leaderboard.
• Under My Team, change the Team Name (the name shown on the leaderboard, not your account name) to your student number, such as MG1333001.
• Participate in the competition and aim for as high a rank as you can.
• Write a report to describe your methods and implementation.
• Submit your code, as well as the prediction file.
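
The predict-and-submit step above can be sketched in Python as follows. This is a minimal illustration, not an official Kaggle tool: it assumes (hypothetically) that the sample submission is a CSV file whose first column identifies each test item and whose last column holds the value to predict, and that your trained model is wrapped in a function that maps a row ID to a prediction. The function name `write_submission` is illustrative; check the real sampleSubmission file on the competition page before relying on this layout.

```python
import csv
from typing import Callable


def write_submission(sample_path: str, out_path: str,
                     predict: Callable[[str], object]) -> int:
    """Write predictions in the same layout as the sample submission.

    Assumption (hypothetical): column 0 of the sample file is the row ID and
    the last column is the value to predict. The header and all other
    columns are copied unchanged; only the last column is replaced.
    Returns the number of data rows written.
    """
    n_rows = 0
    with open(sample_path, newline="") as src, \
            open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(next(reader))  # copy the header row verbatim
        for row in reader:
            if not row:  # csv.reader yields [] for blank lines; skip them
                continue
            writer.writerow(row[:-1] + [predict(row[0])])
            n_rows += 1
    return n_rows
```

For example, `write_submission("sampleSubmission.csv", "prediction_on_test.csv", my_predict)`, where `my_predict` is a hypothetical wrapper around your trained model. Copying the sample file row by row keeps the header and row order identical to what the evaluator expects.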

NOTES:
• Please use this MS Word template to write your report in Chinese, with an English abstract.
• Do NOT plagiarize; plagiarism will be severely penalized. Be careful when writing your report: whenever you use the words or work of others, cite them clearly so that a reader can tell which parts are actually yours. Details on how plagiarism is identified can be found in "Introduction to the Guidelines for Handling Plagiarism Complaints".

# Submission

• Pack all the files to be submitted, e.g., report.docx, code.zip, and prediction_on_test.csv. The prediction_on_test.csv file must be in the same format as the sampleSubmission file provided by Kaggle.
• Name the archive after your student ID, e.g., 'MG1333001.zip'.

The archive must be a zip file; no other format is accepted.
NO submissions will be accepted after the deadline!
NO email submissions will be accepted!

Upload your zip file to: ftp://lamda.nju.edu.cn/mg_dm13/assignment6/
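
Before uploading, a quick local check and the packing step might look like the Python sketch below, which uses only the standard library. The format check is a rough heuristic (same header row and same number of rows as sampleSubmission), not Kaggle's actual validation, and the helper names `matches_sample` and `pack_submission` are hypothetical.

```python
import csv
import os
import zipfile
from typing import Iterable


def matches_sample(pred_csv: str, sample_csv: str) -> bool:
    """Rough format check: same header row and same number of rows
    as the sample submission (a heuristic, not Kaggle's validator)."""
    def read_rows(path: str) -> list:
        with open(path, newline="") as f:
            return [row for row in csv.reader(f) if row]  # ignore blank lines
    pred, sample = read_rows(pred_csv), read_rows(sample_csv)
    return bool(sample) and len(pred) == len(sample) and pred[0] == sample[0]


def pack_submission(student_id: str, files: Iterable[str],
                    out_dir: str = ".") -> str:
    """Zip the given files into <out_dir>/<student_id>.zip, storing each
    file under its base name, and return the archive path."""
    archive = os.path.join(out_dir, f"{student_id}.zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            zf.write(path, arcname=os.path.basename(path))
    return archive
```

A typical call would be `pack_submission("MG1333001", ["report.docx", "code.zip", "prediction_on_test.csv"])` after `matches_sample("prediction_on_test.csv", "sampleSubmission.csv")` returns `True`. Storing files under their base names keeps local directory paths out of the archive.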

# Evaluation

For the rank part:
Let $p_1$ be the set of all participants on the leaderboard and $p_2$ the set of students from this course. From the leaderboard we take your absolute rank $r_1$ (your rank within $p_1$) and your relative rank $r_2$ (your rank within $p_2$); your final score for the rank part is then given by:

 $\text{Score}_{\text{rank}} \propto \frac{1}{2}(\frac{\sum_{p \in p_1}I(r_{p}>r_1)}{|p_1|-1}+\frac{\sum_{p \in p_2}I(r_{p}>r_2)}{|p_2|-1})$

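where $I$ is the indicator function, with $I(\text{condition}) = 1$ if the condition is true and 0 otherwise, and $r_p$ is the rank of participant $p$ in the corresponding ranking. Because a larger rank number means a lower position, each fraction is the proportion of the other participants (first term) or the other students in the course (second term) who are ranked below you.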
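
To make the rank formula concrete, here is a small Python sketch that evaluates it, leaving out the unspecified proportionality constant. It assumes every participant has a distinct integer rank, with 1 being the best; the function names and the numbers in the example are hypothetical.

```python
from typing import Sequence


def fraction_ranked_below(ranks: Sequence[int], my_rank: int) -> float:
    """Compute sum_{p in P} I(r_p > my_rank) / (|P| - 1).

    `ranks` holds the rank of every participant in P, including you; your
    own entry contributes 0 because I(r > r) = 0.
    """
    if len(ranks) < 2:
        raise ValueError("need at least two participants")
    below = sum(1 for r in ranks if r > my_rank)
    return below / (len(ranks) - 1)


def rank_score(all_ranks: Sequence[int], r1: int,
               class_ranks: Sequence[int], r2: int) -> float:
    """Average of the absolute (p_1) and relative (p_2) components."""
    return 0.5 * (fraction_ranked_below(all_ranks, r1)
                  + fraction_ranked_below(class_ranks, r2))


# Hypothetical example: 101 participants overall, you are 11th;
# 11 students in the course, you are 3rd among them.
score = rank_score(range(1, 102), 11, range(1, 12), 3)
print(round(score, 4))  # 0.85
```

Under the no-ties assumption each fraction reduces to $(|p| - r)/(|p| - 1)$, so the top rank contributes 1 and the last rank contributes 0; here $(101-11)/100 = 0.9$ and $(11-3)/10 = 0.8$, averaging to $0.85$.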

For the report:
• Technique: clearly explain why you chose your method, how you implemented it, and how it performs on this data mining task.
• Language: concise, precise, and logical.
• Organization: a good structure, with clearly and properly separated sections and paragraphs.
• Citations: all work that is not your own must be correctly referenced.

If plagiarism is identified, the report will receive no score.

# Leaderboard (final version, 2013-11-20 12:45)

# Presentation

About five submissions are selected to be presented in class by their authors:
• MG1333011
• MG1333023
• MG1333033
• MG1333062
• MG1333069

# Contact the TAs

Mr. Sheng-Jun Huang and Mr. Qing Da

