News
- [7 March] The exam calendar
- [5 March] UPDATED: Calendar for the project 3 presentations
- [28/2-13] Questions for the oral exam. Linear regression + logistic regression + further methods.
- Download links for the course literature have been updated! See below. You can now download the pdf of each book chapter!
- Welcome letter (updated version, 11th October 2013)
- First meeting Monday January 21, 2013, 13.15-15.00 in E:3308.
- Lectures Mondays 13.15-15.00 in E:3308 and Wednesdays 10.15-12.00 in E:C (except where noted in red). Exercises Wednesdays 15.15-17.00 in E:1145. Computer exercises Fridays 10.15-12.00 in the rooms indicated in the schedule below.
- Do you wish to follow the course? If you are an LTH student please send an email to anna@maths.lth.se, otherwise you should make a late application at http://www.lu.se/lubas/info-uh-lu-MASM22 .
R
We will use the statistical program R, which can be downloaded from http://ftp.sunet.se/pub/lang/CRAN/ free of charge for all major platforms. It is a good idea to install it on your own computer, if you have one. It is also good practice to choose an appropriate editor for writing R programs; I have therefore set up a page for the Tinn-R editor (Tinn-R is for Windows only, but the page contains alternative suggestions for other operating systems) and one for RStudio (for Windows/Linux/MacOS).
Notice that this course is about statistics and is not an in-depth course about R. We will discuss the commands needed to produce the desired output and answer the relevant statistical questions. However, we will not consider tips and tricks, good programming practice or any advanced use of such a powerful computer language. R has a large and friendly user community, and you will be able to find plenty of good guides, tutorials and answered questions with a simple Google search. Here follow some of the many guides freely available on the web:
- R.pdf A (small) R Tutorial
- RTutorial.pdf A Short R Tutorial
- R-intro.pdf An Introduction to R (it can also be found in R's Help menu)
Schedule
Preliminary plan.
| Week | Date and room | Contents | Chapter | Notes |
|---|---|---|---|---|
| v3 | Mon 21/1 in E:3308 | Introduction; Review of simple linear regression | Rawlings, Ch. 1 | lecture_1.pdf (updated! only third last slide has been changed) |
| | Wed 23/1 in E:C | Multiple regression | Rawlings, Ch. 2-3 | lecture_2.pdf (updated! only slides 8 and 11 have been modified); The Matrix Cookbook (might be useful); R code for lectures 1-2 (can be opened with R or any text editor) |
| | Wed 23/1 in E:1145 | Exercises: 1.4, 1.9 (*), 1.14, 1.15, 1.16, 1.19, 1.21 (*); 3.5, 3.6, 3.9, 3.10, 3.12, 3.14 (*): skip the "analysis of variance" part. | Exercises 1.x, Exercises 3.x | Answers to exercises 1.x and 3.x; R code for exercise 1.9 |
| | Fri 25/1 in E:Hacke and E:Panter | Computer exercise, work on project 1 | project 1 manual; project1_data.Rdata (R data-file); project1_data.csv (semicolon-separated ascii data-file) | |
| v4 | Mon 28/1 in E:3308 | Analysis of variance | Rawlings, Ch. 1.4, 1.6, 4.1-5 | lecture_3.pdf; some R code for multiple regression; some R code for t and F tests |
| | Wed 30/1 in E:C | Class variables and variable selection | Rawlings, Ch. 9 and 7 respectively | lecture_4.pdf; R code for lecture 4; data for lecture 4 (right mouse click + save-link-as) |
| | Wed 30/1 in E:1145 | Exercises: older ones + 4.6(a)(i)+(b)(ii)+(c)(i), 4.10(a)+(b)+(c)+(e) | Exercises 4.x | Answers to exercises 4.x |
| | Fri 1/2 in E:Falk and E:Val | Computer exercise, work on project 1 | | |
| v5 | Mon 4/2 in E:3308 | Problem areas in least squares | Rawlings, Ch. 10 | lecture_5.pdf |
| | Wed 6/2 in E1406 | Regression diagnostics | Rawlings, Ch. 11 | lecture_6.pdf; R code for lecture 6; data for lecture 6 |
| | Wed 6/2 in E:1145 | Exercises: old ones + 10.1-10.2, 11.1-11.2 | answers Ch. 10-11 | |
| | Fri 8/2 in E:Falk and E:Val | Computer exercise, work on project 1 | | |
| v6 | Mon 11/2 in E:3308 | Peer assessment, project 1. Summary of linear regression. If time allows I will start talking about binary data and odds ratios. | | |
| | Wed 13/2 in E:C | Binary data and odds ratios. Logistic regression | Christensen, Ch. 1, 2.1-3, 2.6; also check sec. 1.4.1-1.4.2 in Agresti | lecture_7.pdf; R code for lecture 7 (as shown at lecture); R code for lecture 7 (richer version) |
| | Wed 13/2 in E:1145 | Exercise: ex. on binary data (but skip (d)) | Solution for exercise on binary data | |
| | Fri 15/2 in E:Falk and E:Val | Computer exercise, final deadline project 1, work on project 2 | project2.pdf (manual) | |
| v7 | Mon 18/2 in E:3308 | Maximum Likelihood and likelihood ratio tests | Agresti, sec. 3.4.4; check here section 3 on Newton-Raphson | lecture 8; R code for lecture 8 |
| | Wed 20/2 in E:C | Residuals and model validation in logistic regression | lecture_9.pdf; R code for lecture 9; made up data; more made up data | |
| | Wed 20/2 in E:Falk, E:Val | Computer exercise, work on project 2 | | |
| | Fri 22/2 in E:Falk and E:Val | Computer exercise, work on project 2 | | |
| v8 | Mon 25/2 in E:3308 | Poisson, Negative binomial | Christensen, Ch. 1.5, 9 | lecture_10.pdf; R code Poisson regr data Neg bin regr data |
| | Wed 27/2 in E1406 | 10-11: Peer assessment, project 2; 11-12: Quantile regression | Quantile Regression; An R "vignette" | lecture_11.pdf; f11a.R (simple version, as shown at lecture); f11.R (applies QR to several situations) |
| | Wed 27/2 in E:Falk, E:Val | (Computer) exercise | | |
| | Fri 1/3 in E:Falk and E:Val | Computer exercise, Final deadline project 2, work on project 3 | project3.pdf | |
| v9 | Mon 4/3 in E:Varg, E:Val | Work on project 3 | | |
| | Wed 6/3 in E:C | CANCELLED Lecture: Summary | | |
| | Wed 6/3 in E:Falk, E:Val | Work on project 3 | | |
| | Fri 8/3 in E:Falk and E:Val | Work on project 3 | | |
| v10-11 | | Presentation of project 3. See the calendar for the project 3 presentations (updated) | | |
| v12--... | | Individual oral exams. See the exam calendar | | |
Level
Advanced level.
Aim
Regression analysis deals with modelling how one characteristic (height, weight, price, concentration, etc) varies with one or several other characteristics (sex, living area, expenditures, temperature, etc). Linear regression is introduced in the basic course in mathematical statistics but here we expand with, e.g., "how do I check that the model fits the data", "what should I do if it doesn't fit", "how uncertain is it", and "how do I use it to draw conclusions about reality".
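As a rough illustration of these questions, here is a minimal R sketch on simulated data. The variable names (`height`, `weight`, `dat`) and the numbers used to generate the data are made up for this example and are not course material.

```r
# Simulated (hypothetical) data: weight depends linearly on height plus noise
set.seed(1)
height <- runif(50, min = 150, max = 200)           # cm
weight <- -80 + 0.9 * height + rnorm(50, sd = 5)    # kg
dat <- data.frame(height, weight)

# Least-squares fit of a simple linear regression
fit <- lm(weight ~ height, data = dat)

summary(fit)              # estimates, standard errors, t and F tests
confint(fit)              # "how uncertain is it": 95% confidence intervals

# "Does the model fit the data": residuals, usually inspected graphically,
# e.g. plot(fitted(fit), res) and qqnorm(res)
res <- residuals(fit)

# Drawing conclusions: prediction interval for a new person of height 180 cm
new <- data.frame(height = 180)
pred <- predict(fit, newdata = new, interval = "prediction")
pred
```

The same pattern (fit, inspect residuals, quantify uncertainty, predict) carries over to multiple regression by adding terms to the right-hand side of the formula.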
When performing a survey where people can answer "yes/no" or "little/just fine/much", or "car/bicycle/bus" or some other categorical alternative, you cannot use linear regression. Then you need logistic regression instead. This is the topic in the second half of the course.
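For the binary ("yes/no") case, a logistic regression can be sketched in R as follows. The data are simulated, and the names `age`, `answer` and `survey` as well as the coefficients used to generate the data are assumptions made only for this illustration. Answers with more than two categories need multinomial or ordinal models, which this sketch does not cover.

```r
# Simulated (hypothetical) survey: probability of answering "yes" grows with age
set.seed(2)
n <- 200
age <- runif(n, min = 20, max = 70)
p <- plogis(-4 + 0.08 * age)              # assumed true probabilities
answer <- rbinom(n, size = 1, prob = p)   # 1 = "yes", 0 = "no"
survey <- data.frame(age, answer)

# Logistic regression: models the log-odds of "yes" as linear in age
fit <- glm(answer ~ age, family = binomial, data = survey)
summary(fit)

# Estimated odds ratio for one additional year of age
odds_ratio <- exp(coef(fit)["age"])

# Estimated probability of "yes" for a 40-year-old
p_hat <- predict(fit, newdata = data.frame(age = 40), type = "response")
```

Unlike a straight line fitted to 0/1 data, the estimated probabilities from this model always stay between 0 and 1.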
Contents
Least squares and the maximum likelihood method; odds ratios; Multiple linear and logistic regression; Matrix formulation; Methods for model validation, residuals, outliers, influential observations, multicollinearity, change of variables; Choice of regressors, F-test, likelihood ratio test; Confidence intervals and prediction. Introduction to: Correlated errors, Poisson regression as well as multinomial and ordinal logistic regression.
Prerequisites
At least 60 ECTS at university level including an introductory course in mathematical statistics, e.g. MASA01 Mathematical statistics, basic course, 15hp, or MASB02 Mathematical statistics (for chemists) 7.5hp, or MASB03 Mathematical statistics (for physicists) 9hp, or MASB11 Biostatistics, basic course 7.5hp, or equivalent.
Teaching and examination
The teaching consists of lectures, exercises, computer exercises and project work. The examination is written and oral in the form of project reports, written and oral opposition, and individual oral examination.
Literature (NEWS of 10th January: download links have been fixed and you can now download the pdf for each chapter!)
- Rawlings, J.O., Pantula, S.G., Dickey, D.A.: Applied Regression Analysis - A Research Tool, 2ed, Springer, available as e-book.
- Christensen, R.: Log-Linear Models and Logistic Regression, 2ed, Springer, available as e-book.
Lecturer
Umberto Picchini, tel 046 222 9270, office MH:125a, email address.
Teaching Assistants
Behnaz Pirzamanbin, tel 046 222 4623, office MH:326, behnaz@maths.lth.se
Elham Pirnia, elham@maths.lth.se
Learning outcomes
Knowledge and understanding
For a passing grade the student must
- Describe the differences between continuous and discrete data, and the resulting consequences for the choice of statistical model
- Give an account of the principles behind different estimation methods,
- Describe the statistical properties of such estimates as appear in regression analysis,
- Interpret regression relations in terms of conditional distributions,
- Explain the concepts of odds and odds ratio, and describe their relation to probabilities and to logistic regression.
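As a brief reminder of the standard definitions behind the last point (the symbols $p$, $p_0$, $p_1$, $x$, $\beta_0$ and $\beta_1$ are notation chosen here and may differ from the lecture notes): for an event with probability $p$,

$$
\text{odds} = \frac{p}{1-p}, \qquad p = \frac{\text{odds}}{1+\text{odds}}, \qquad \text{OR} = \frac{p_1/(1-p_1)}{p_0/(1-p_0)},
$$

where $p_0$ and $p_1$ are the probabilities in the two groups being compared. In a logistic regression with a single covariate $x$,

$$
\log\frac{p(x)}{1-p(x)} = \beta_0 + \beta_1 x \quad\Longrightarrow\quad \frac{\text{odds}(x+1)}{\text{odds}(x)} = e^{\beta_1},
$$

so $e^{\beta_1}$ is the odds ratio for a one-unit increase in $x$.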
Skills and abilities
For a passing grade the student must
- Formulate a multiple linear regression model for a concrete problem,
- Formulate a multiple logistic regression model for a concrete problem,
- Estimate the parameters in the regression model and interpret them,
- Examine the validity of the model and make suitable modifications of the model,
- Use the resulting model for prediction,
- Use some statistical computer program for analysis of regression data, and interpret the results,
- Present the analysis and conclusions of a practical problem in a written report and an oral presentation.
Judgement and approach
For a passing grade the student must
- Always check the prerequisites before stating a regression model,
- Evaluate the plausibility of a performed study,
- Reflect on the limitations of the chosen model and estimation method, as well as alternative solutions.