## News

- [16/5-18] Quantile regression has been removed from this year's course.
- [15/5-18] Lecture 10 added. Fixed the bug in the R-code. Added Project 3 for MASM22/FMSN30.
- [2/5-18] Lecture 9 added. List of questions for the oral exam updated.
- [27/4-18] Lecture 8 added.
- [25/4-18] Lecture 7 added and the bugs in the R-file fixed.
- [24/4-18] Instructions and data for Project 2 added.
- [23/4-18] Solutions to compulsory computer exercise 3 added.
- [18/4-18] Lecture 6 and Compulsory Lab 3 added. Also added a list of questions for the oral exams for the linear regression part of the course.
- [16/4-18] Slides and R-code for Lecture 5 added.
- [13/4-18] You can now start booking times for the presentations of Project 3 and for your oral exams.
- [4/4-18] Solutions to compulsory computer exercise 2 added.
- [27/3-18] Corrected the code and slides for the interaction plot in Lecture 4. Also updated Project 1 with questions 3.3 and 3.4.
- [26/3-18] Solutions to compulsory Lab 1 added. R-code from Lecture 3 added. Slides and R-code for Lecture 4 added.
- [23/3-18] Slides for Lecture 3 added.
- [22/3-18] Compulsory Lab 2 added.
- [21/3-18] Bugfix in R-code for Lecture 1. Added R-code for Lecture 2.
- [20/3-18] Slides for Lecture 2 and compulsory Lab 1 added. Also added Project 1. This will be updated later!
- [19/3-18] Sign-up (Choose time) now available for all labs and project help sessions where the two groups have different times.
- [19/3-18] Slides and R-code for Lecture 1 added. Also added an introduction to R: Lab 0.
- [23/2-18] Welcome letter
- [6/2-18] The course starts **Monday 19 March, 2018, 13.15-15.00 in MH:Riesz**.
  - This course is taught jointly with FMSN40 Linear and logistic regression with data gathering, 9hp: please check http://www.maths.lth.se/matstat/kurser/fmsn40/
- [6/2-18] FMSN30/MASM22 schedule
- [6/2-18] **Compulsory** computer exercises: Wednesday **21 March**, 10.15-12.00 or 13.15-15.00; Tuesday **27 March**, 10.15-12.00 or 13.15-15.00; Wednesday **18 April**, 10.15-12.00 or 13.15-15.00.
- [6/2-18] **Peer review** of project reports: Project 1 Monday **23 April**, 13.15-14.00; Project 2 Wednesday **16 May**, 8.15-9.00.
- [6/2-18] **Project 1+2 deadlines**: Project 1 Thursday **26 April**, 16.00; Project 2 Thursday **17 May**, 16.00.
- [6/2-18] **Project 3 presentations**: Thursday **24 May**, 8.15-10 or 13.15-17.00, or Friday **25 May**, 10.15-12.00, or Monday **28 May**, 8.15-17.00.
- [6/2-18] **Oral exams**: 30 May - 21 June.

### R

We will use the statistical program R, which can be downloaded free of charge for all major platforms from http://ftp.acc.umu.se/mirror/CRAN/. It is a good idea to install it on your own computer, if you have one. It is also good practice to use a suitable editor for writing and running R programs; I have therefore set up a page about RStudio (for Windows/Linux/MacOS).

Note that this course is about **Statistics** and is not
an in-depth course about R. We will discuss the commands
needed to produce the desired output and answer the relevant
statistical questions. However, we will not cover
tips and tricks, good programming practice or any advanced use
of this powerful language. R has a large and friendly
user community, and you will be able to find plenty of good
guides, tutorials and answered questions with a simple Google
search. Here are some of the many guides freely available
on the web:

- R.pdf A (small) R Tutorial
- RTutorial.pdf A Short R Tutorial
- R-intro.pdf An Introduction to R (the most up-to-date version can be found in R's Help menu)

### Computer Labs

You will have the chance to book specific computer lab
sessions. That is, you do not have to attend all the labs listed in the
schedule below, only the ones you book. Pay special attention
to the mandatory labs, denoted in **bold**: you MUST
attend one of those each week for the first three weeks.

### Literature

- Rawlings, J.O., Pantula, S.G., Dickey, D.A.: *Applied Regression Analysis - A Research Tool*, 2ed, Springer, available as e-book.
- Agresti, A.: *An Introduction To Categorical Data Analysis*, 2ed, Wiley, 2007, available as e-book.

### Schedule spring 2018

| Week | Time and place | Contents | Additional Support | Files |
|---|---|---|---|---|
| w12 | Mon 19/3, 13.15-15, MH:Riesz | 1. Introduction; Review of simple linear regression, linear relationships, linear models and basic assumptions (normality, homoscedasticity, linearity, independence), least squares estimation, basic properties of expectation, variance and covariance; mean and variance of least squares estimators | Rawlings, Ch. 1 | f1_vt18.pdf; f1_vt18.R (bugfix 21/3) |
| | Wed 21/3, 8.15-10, MH:Riesz | 2. Continuation of simple linear regression; distribution of least squares estimators; prediction; confidence intervals; hypothesis testing, p-values, quantiles | Rawlings, Ch. 1 | f2_vt18.pdf; f2_vt18.R |
| | Wed 21/3, 10.15-12 or 13.15-15, MH:230 | Compulsory computer lab 1 | | Lab 0; Lab 1; Lab 1-solutions |
| | Thu 22/3, 8.15-10, MH:230 or Fri 23/3, 13.15-15, MH:231 | Work on project 1 | | Project 1 (updated 27/3); plasma.txt |
| w13 | Mon 26/3, 13.15-15, MH:Riesz | 3. Multiple regression: matrix notation, properties of least squares estimators for multiple regression; confidence intervals for multiple regression; critical requirements; ill-ranked design matrices, lack of invertibility | Rawlings, Ch. 3, 4, 6.5 | f3_vt18.pdf; f3_vt18.R; f3_matriser.R |
| | Tue 27/3, 8.15-10, MH:Riesz | 4. Categorical variables. Analysis of variance: variability decomposition. Global F-test. ANOVA tables. Partial F-test. | Rawlings, Ch. 4, 9 | f4_vt18.pdf (plotfix 27/3); f4_vt18.R (bugfix 27/3) |
| | Tue 27/3, 10.15-12 or 13.15-15, MH:230 | Compulsory computer lab 2 | | Lab 2; sleep.txt; Lab2-solutions |
| | Wed 28/3, 13.15-15, MH:230 or MH:231 | Work on project 1 | | |
| w14 | EASTER BREAK and RE-EXAM PERIOD | | | |
| w15 | | | | |
| w16 | Mon 16/4, 13.15-15, MH:Riesz | 5. R-squared, Adjusted-R-squared. AIC & BIC, automatic selection methods | Rawlings, Ch. 7 | f5_vt18.pdf (error on p.4 fixed); f5_vt18.R |
| | Wed 18/4, 8.15-10, MH:Riesz | 6. Problem areas in least squares; Regression diagnostics: outliers w.r.t. X (leverage), distribution of residuals, standardised and studentised residuals; graphical tools for residual analysis. Influential observations (Cook's distance, DFBETAS) | Rawlings, Ch. 10-11 | f6_vt18.pdf; f6_vt18.R; f6data.txt; f6_residvar.pdf |
| | Wed 18/4, 10.15-12 or 13.15-15, MH:230 | Compulsory computer lab 3 | | Lab 3; CDI.txt; Lab3-solutions |
| | Thu 19/4, 13.15-15, MH:230 or MH:231 | Work on project 1 | | |
| w17 | Mon 23/4, 13.15-15, MH:Riesz | 13.15-14.00: Peer assessment, project 1. 14.15-15: Wrapping up linear regression | | |
| | Wed 25/4, 8.15-10, MH:Riesz | 7. Binary data, Bernoulli and binomial distributions, odds ratios, and an introduction to logistic regression | Agresti: ch. 1, sec 1.2.1, sec 2.3 | f7_vt18.pdf; f7_vt18.R (bugfix 25/4) |
| | Wed 25/4, 10.15-12 or 13.15-15, MH:230 | Work on project 1 and start on project 2 | | Project 2; pm10.txt |
| | Thu 26/4, 16.00 | Project 1 final deadline at 16.00. MASM22/FMSN30 students email the report to FMSN30@matstat.lu.se. Subject field: Project1 by studid1 and studid2 | | |
| w18 | Wed 2/5, 8.15-12, MH:Riesz | 8. Maximum likelihood estimation, Newton-Raphson, properties, deviance and likelihood ratio tests. | Agresti: 1.3.1, 1.4.1, 2.3.1-2.3.3; several topics scattered in chapter 4, particularly sections 4.1-4.2. | f8_vt18.pdf; f8_vt18.R |
| | Thu 3/5, 8.15-10, MH:Riesz | 9. Akaike (again), Pseudo-R2, residuals and model validation in logistic regression. | | f9_vt18.pdf; f9_vt18.R; f9_data.txt |
| w19 | Tue 8/5, 10.15-12 or 13.15-15, MH:230 | Work on project 2 | | |
| | Wed 9/5, 13.15-15, MH:230 or MH:231 | Work on project 2 | | |
| w20 | Tue 15/5, 15.15-17, MH:Riesz | 10. Poisson distribution and Poisson regression; Negative binomial regression | Agresti: several sections in Chapter 3. | f10_vt18.pdf; f10_vt18.R (bugfix 15/5); poisson_sim.csv; f10b.txt |
| | Wed 16/5, 8.15-10, MH:Riesz | 8.15-9.00: Peer assessment, project 2. 9.15-10: 11. | | |
| | Wed 16/5, 13.15-15, MH:230 or MH:231 | Work on project 2 and start on project 3 | | Project 3; cardio.txt |
| | Thu 17/5, 13.15-15, MH:230 or MH:231 | Work on project 2 and/or start on project 3 | | |
| | Thu 17/5, 16.00 | Project 2 final deadline at 16.00. MASM22/FMSN30 students email the report to FMSN30@matstat.lu.se. Subject field: Project2 by studid1 and studid2 | | |
| w21 | Tue 22/5, 13.15-15, MH:230 or MH:231 | Work on project 3 | | |
| | Wed 23/5, 8.15-10, MH:230 or MH:231 | Work on project 3 | | |
| | Thu 24/5, 9.15-10, MH:Sigma | Project 3 oral presentations: Per Niklas+Lampros, Karolina | | |
| | Thu 24/5, 13.15-15, MH:Sigma | Project 3 oral presentations: Rita+Elisabeth, Rickard+Gabriella, Johan+Martin, Amanda, Carl+Amanda | | |
| | Thu 24/5, 15.15-16, MH:Sigma | Project 3 oral presentations: Kevin+Nikolaos, Martin | | |
| | Fri 25/5, 10.15-12, MH:Sigma | Project 3 oral presentations: Yen+Dongni, Oskar, Carl+Jesper | | |
| w22 | Mon 28/5, 9.15-10, MH:Sigma | Project 3 oral presentations: Rasmus+Nathaniel, Mara | | |
| | Mon 28/5, 10.15-11, MH:Sigma | Project 3 oral presentations: Adrian+Henrik, Björn | | |
| | Mon 28/5, 13.15-14, MH:Sigma | Project 3 oral presentations: Zongguo, Christ-Roi | | |
| | Mon 28/5, 15.15-17, MH:Sigma | Project 3 oral presentations: Marcus+Jan, Justinas+Niklas, Kasper | | |
| | Wed 30/5, 10.15-11, MH:227 | Project 3 oral presentations: Emmy+Evelina, Juan Pablo+Anamda | | |
| | Wed 30/5, Thu 31/5, Fri 1/6 | Oral exams | | Choose time; Questions updated 16/5. |
| w23 | Mon 4/6, Tue 5/6, Thu 7/6, Fri 8/6 | Oral exams | | |
| w24 | Mon 11/6, Tue 12/6, Wed 13/6, Thu 14/6, Fri 15/6 | Oral exams | | |
| w25 | Mon 18/6, Tue 19/6, Thu 21/6 | Oral exams | | |

### Level

Advanced level.

### Aim

Regression analysis deals with modelling how one characteristic (height, weight, price, concentration, etc) varies with one or several other characteristics (sex, living area, expenditures, temperature, etc). Linear regression is introduced in the basic course in mathematical statistics but here we expand with, e.g., "how do I check that the model fits the data", "what should I do if it doesn't fit", "how uncertain is it", and "how do I use it to draw conclusions about reality".

When performing a survey where people can answer "yes/no" or "little/just fine/much", or "car/bicycle/bus" or some other categorical alternative, you cannot use linear regression. Then you need logistic regression instead. This is the topic in the second half of the course.
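As a rough illustration of the yes/no case, the simplest logistic regression model with one explanatory variable can be written as below. The notation ($Y$, $x$, $p(x)$, $\beta_0$, $\beta_1$) is assumed here for the sketch and may differ from the notation used in the lectures.

```latex
% Binary response Y (1 = "yes", 0 = "no") and one explanatory variable x.
% p(x) is the probability of a "yes" given x.
\begin{align*}
  p(x) &= P(Y = 1 \mid x), \\
  \text{odds}(x) &= \frac{p(x)}{1 - p(x)}, \\
  \ln \frac{p(x)}{1 - p(x)} &= \beta_0 + \beta_1 x, \\
  p(x) &= \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}, \\
  \frac{\text{odds}(x + 1)}{\text{odds}(x)} &= e^{\beta_1}.
\end{align*}
```

The model is linear in the log-odds rather than in the probability itself, so $p(x)$ always stays between 0 and 1, and $e^{\beta_1}$ is the odds ratio for a one-unit increase in $x$.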

### Contents

Least squares and the maximum likelihood method; odds ratios; Multiple linear and logistic regression; Matrix formulation; Methods for model validation, residuals, outliers, influential observations, multicollinearity, change of variables; Choice of regressors, F-test, likelihood ratio test; Confidence intervals and prediction. Introduction to: Correlated errors, Poisson regression as well as multinomial and ordinal logistic regression.

### Prerequisites

At least 60 ECTS at university level including an introductory course in mathematical statistics, e.g. MASA01 Mathematical statistics, basic course, 15hp, or MASB02 Mathematical statistics (for chemists) 7.5hp, or MASB03 Mathematical statistics (for physicists) 9hp, or MASB11 Biostatistics, basic course 7.5hp, or equivalent.

### Teaching and examination

The teaching consists of lectures, computer exercises and project work. Attendance at the three compulsory computer exercises is mandatory. The examination is both written and oral: written reports for projects 1 and 2, an oral presentation of project 3, and an individual oral examination.

### Lecturer

Anna Lindgren, tel 046-2224276, office MH:136, Matematikcentrum, anna@maths.lth.se.

### Teaching Assistants

Rachele Anderson, tel 046-2224580, office MH:323, rachele@maths.lth.se

Vladimir Pastukhov, tel 046-2227974, office MH:324, pastuhov@maths.lth.se

### Learning outcomes

#### Knowledge and understanding

For a passing grade the student must

- Describe the differences between continuous and discrete data, and the resulting consequences for the choice of statistical model,
- Give an account of the principles behind different estimation methods,
- Describe the statistical properties of such estimates as appear in regression analysis,
- Interpret regression relations in terms of conditional distributions,
- Explain the concepts of odds and odds ratio, and describe their relation to probabilities and to logistic regression.

#### Skills and abilities

For a passing grade the student must

- Formulate a multiple linear regression model for a concrete problem,
- Formulate a multiple logistic regression model for a concrete problem,
- Estimate the parameters in the regression model and interpret them,
- Examine the validity of the model and make suitable modifications of the model,
- Use the resulting model for prediction,
- Use some statistical computer program for analysis of regression data, and interpret the results,
- Present the analysis and conclusions of a practical problem in a written report and an oral presentation.

#### Judgement and approach

For a passing grade the student must

- Always check the prerequisites before stating a regression model,
- Evaluate the plausibility of a performed study,
- Reflect on the limitations of the chosen model and estimation method, as well as alternative solutions.
