# Assignment 2

Modified: 2014/09/28 12:27 by admin - Uncategorized
We have created a Google Group for discussion. If you have any questions, please post them there so that everybody can see our replies, rather than emailing us directly.

Description:
• Implement Stochastic Gradient Descent for Logistic Regression (referred to as SGDLR).
• The log-likelihood of Logistic Regression in the binary classification case can be written as:
 $l(\beta)=\sum_{i=1}^N\{y_i\beta^\top x_i-\log(1+e^{\beta^\top x_i})\}, \quad y_i \in \{0,1\}$

• The gradient of the log-likelihood is
 $\dfrac{\partial l(\beta)}{\partial \beta}=\sum_{i=1}^N x_i(y_i-p(x_i;\beta)),$

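where
 $p(x_i;\beta)=\dfrac{\exp(\beta_0 + \beta_1^\top x_i)}{1+\exp(\beta_0 + \beta_1^\top x_i)}.$
Here $\beta=(\beta_0,\beta_1)$, and the compact form $\beta^\top x_i$ used above assumes that $x_i$ is extended with a leading constant 1, so that the intercept $\beta_0$ is absorbed into $\beta$.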
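
As a sketch of how this gradient turns into a stochastic update: instead of summing over all $N$ samples, SGD takes one step per sample $i$ using only that sample's term. Assuming a step size $\eta > 0$ (a symbol introduced here for illustration, not part of the assignment text), one possible update is
 $\beta \leftarrow \beta + \eta\, x_i\,(y_i-p(x_i;\beta)).$
The sign is positive because the log-likelihood is being maximized; minimizing the negative log-likelihood gives the same step.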

• A detailed introduction to Stochastic Gradient Descent for Logistic Regression can be found here
• The task is a multiclass classification problem, so you have to extend the binary classifier to the multiclass case. There are two common strategies for this: "one vs. rest" and "one vs. one". Detailed information can be obtained here. Logistic Regression can also be applied to the multiclass case directly; if you are interested, give it a try!
• Conduct 10-fold cross-validation with SGDLR on the benchmark data used in Assignment 1, and report the mean and standard deviation of the accuracy.
• Write a brief report presenting your results and comparing them with your naive Bayes solution. Which one is better?
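
The steps above can be sketched roughly as follows. This is a minimal illustration in pure Python, not a reference solution: the function names (`train_sgdlr`, `train_one_vs_rest`, `cross_validate`), the fixed learning rate `lr`, the number of `epochs`, and the `seed` are all assumptions made for the example, and it uses the "one vs. rest" strategy. Loading the benchmark data is left out.

```python
import math
import random
import statistics


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_sgdlr(X, y, lr=0.1, epochs=50, seed=0):
    """Binary SGDLR: one gradient-ascent step per sample.

    X is a list of feature lists, y a list of 0/1 labels.
    Returns beta with the intercept stored at index 0.
    """
    rng = random.Random(seed)
    beta = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order
        for i in order:
            xi = [1.0] + list(X[i])  # prepend constant 1 for the intercept
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            # beta <- beta + lr * x_i * (y_i - p(x_i; beta))
            beta = [b + lr * (y[i] - p) * v for b, v in zip(beta, xi)]
    return beta


def score(beta, x):
    """Linear score beta_0 + beta_1^T x."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def train_one_vs_rest(X, labels, **kw):
    """Train one binary classifier per class (class k vs. all others)."""
    return {k: train_sgdlr(X, [1 if l == k else 0 for l in labels], **kw)
            for k in sorted(set(labels))}


def predict_one_vs_rest(models, x):
    """Pick the class whose classifier gives the highest score."""
    return max(models, key=lambda k: score(models[k], x))


def cross_validate(X, labels, k=10, seed=0, **kw):
    """k-fold cross-validation; returns (mean, std) of fold accuracies."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # k roughly equal folds
    accs = []
    for f in range(k):
        test = set(folds[f])
        train = [i for i in idx if i not in test]
        models = train_one_vs_rest([X[i] for i in train],
                                   [labels[i] for i in train], **kw)
        hits = sum(predict_one_vs_rest(models, X[i]) == labels[i]
                   for i in folds[f])
        accs.append(hits / len(folds[f]))
    # statistics.stdev is the sample standard deviation (n - 1 denominator)
    return statistics.mean(accs), statistics.stdev(accs)
```

With the data loaded as a list of feature vectors `X` and a list of class labels `labels`, a call such as `mean_acc, std_acc = cross_validate(X, labels, k=10)` would return the two numbers asked for in the report. A constant learning rate and unscaled features are the simplest choices here; a decaying step size or feature scaling may work better on the actual benchmark.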

## Benchmark Dataset

The same as Assignment 1.

## Programming Language

• The same language you chose for the first assignment.

# Submission

• Do NOT plagiarize; plagiarism will be seriously penalized. Be careful when writing your report: whenever you use the words or work of others, cite them clearly so that readers can tell which parts are actually yours. Details on how plagiarism is identified can be found in "Introduction to the Guidelines for Handling Plagiarism Complaints".
• Do NOT falsify results; data fraud will be penalized even more seriously. Record your results honestly in the report, and NEVER modify performance results manually.
• Pack your report and code into a zip file named after your student ID, e.g., 'MG1433001.zip'. If you submit more than once, append '_' and a number, such as 'MG1433001_1.zip'. We will use the version with the largest number as your final submission.

• The file format should be zip, no other format is acceptable!
• NO submission after the deadline is acceptable!
• NO email submission will be accepted!

ftp://lamda.nju.edu.cn/mg_dm14/assignment2/

# Evaluation

For implementation:
• Efficiency
• Performance
• Code style

For report:
• Technique: clearly explain all the components you used in your implementation.
• Language: concise, precise, and logical.

If plagiarism is identified, no score will be given for the report.

# Contact TA

Mr. Qing Da and Mr. Yue Zhu
