- [22/3-19] Slides and R-code for Lecture 1 added. Sign-up for all three compulsory exercises open.
- [20/3-19] Instructions for computer exercise 1 added, see the appropriate time in the schedule below.
- [12/3-19] Sign-up for compulsory lab 1 open. If either of the time slots becomes fully booked, we will add a second room for that slot. Update 14/3: the 10-12 slot now has a second room.
- [12/3-19] Welcome letter
- [23/1-19] The course starts Monday 25 March, 2019, 13.15-15.00 in MH:Riesz.
- This course is taught jointly with FMSN40 Linear and logistic regression with data gathering, 9hp: please check http://www.maths.lth.se/matstat/kurser/fmsn40/
- [23/1-19] FMSN30/MASM22 schedule
- [23/1-19] Compulsory computer exercises: Wednesday 27 March, 10.15-12.00 or 13.15-15.00; Wednesday 3 April, 10.15-12.00 or 13.15-15.00; Wednesday 10 April, 10.15-12.00 or 13.15-15.00.
- [23/1-19] Peer review of project reports: Project 1 Monday 15 April, 13.15-14.00; Project 2 Wednesday 15 May, 8.15-9.00.
- [23/1-19] Project 1+2 deadlines: Project 1 Wednesday 17 April, 16.00; Project 2 Friday 17 May, 16.00.
- [23/1-19] Project 3 presentations: Monday 27 May 8.15-10 or 10.15-12 or 15.15-17 or Tuesday 28 May 10.15-12 or 15.15-17 or Wednesday 29 May, 8.15-10 or 13.15-15 or 15.15-17.
- [23/1-19] Oral exams: 3-20 June.
We will use the statistical program R, which can be downloaded free of charge for all major platforms from http://ftp.acc.umu.se/mirror/CRAN/. It is a good idea to install it on your own computer, if you have one. It is also good practice to use an appropriate editor for writing and running R programs; I have therefore set up a page for RStudio (for Windows/Linux/macOS).
Note that this course is about statistics and is not an in-depth course on R. We will discuss the commands needed to produce the desired output and answer the relevant statistical questions. However, we will not cover tips and tricks, good programming practice, or any advanced use of such a powerful language. R has a large and friendly user community, and you will be able to find plenty of good guides, tutorials and answered questions with a simple Google search. Here follow some of the many guides freely available on the web:
Course specific help:
You will have the chance to book specific computer lab sessions. That is, you do not have to attend all labs listed in the schedule below, only the ones you book. Pay special attention to the mandatory labs, denoted in bold: you MUST attend one of those each week for the first three weeks.
- Rawlings, J.O., Pantula, S.G., Dickey, D.A.: Applied Regression Analysis - A Research Tool, 2nd ed., Springer, available as e-book.
- Agresti, A.: An Introduction to Categorical Data Analysis, 2nd ed., Wiley, 2007, available as e-book.
Schedule spring 2019
|w13||Mon 25/3, 13.15-15, MH:Riesz||Lecture 1: Introduction; Review of simple linear regression, linear relationships, linear models and basic assumptions (normality, homoscedasticity, linearity, independence), least squares estimation, basic properties of expectation, variance and covariance; mean and variance of least squares estimators.||Rawlings Ch.1|
|Wed 27/3, 8.15-10, MH:Riesz||Lecture 2: Continuation of simple linear regression; distribution of least squares estimators; prediction; confidence intervals; hypothesis testing, p-values, quantiles.||Rawlings Ch.1|
|Wed 27/3, 10.15-12 MH:230+231 or 13.15-15, MH:230||Compulsory computer lab 1||Sign-up|
|Thu 28/3, 8.15-10, MH:230+231||Work on project 1|
|w14||Mon 1/4, 13.15-15, MH:Riesz||Lecture 3. Multiple Regression: matrix notation, properties of least squares estimators for multiple regression; confidence intervals for multiple regression; critical requirements; rank-deficient design matrices, lack of invertibility.||Rawlings Ch.3, 4, 6.5|
|Wed 3/4, 8.15-10, MH:Riesz||Lecture 4. Categorical variables. Analysis of variance: variability decomposition. Global F-test. ANOVA tables. Partial F-test.||Rawlings Ch.4, 9.|
|Wed 3/4, 10.15-12 MH:230+231 or 13.15-15, MH:230||Compulsory computer lab 2||Sign-up|
|Thu 4/4, 8.15-10, MH:230+231||Work on project 1|
|w15||Mon 8/4, 13.15-15, MH:Riesz||Lecture 5. R-squared, Adjusted-R-squared. AIC & BIC, automatic selection methods||Rawlings Ch.7|
|Wed 10/4, 8.15-10, MH:Riesz||Lecture 6. Problem areas in least squares; Regression diagnostics: outliers w.r.t. X (leverage), distribution of residuals, standardised and studentised residuals; graphical tools for residual analysis. Influential observations (Cook's distance, DFBETAS)||Rawlings Ch.10-11|
|Wed 10/4, 10.15-12 MH:230+231 or 13.15-15, MH:230||Compulsory computer lab 3||Sign-up|
|Thu 11/4, 8.15-10, MH:230+231||Work on project 1|
|w16||Mon 15/4, 13.15-15, MH:Riesz||13.15-14.00: Peer assessment; 14.15-15: Wrapping up linear regression|
|Tue 16/4, 8.15-10, MH:Riesz||Lecture 7. Binary data, Bernoulli and binomial distributions, odds ratios, and an introduction to logistic regression||Agresti Ch. 1, sec 1.2.1, sec 2.3|
|Wed 17/4, 13.15-15, MH:230||Finish project 1 and start on project 2|
|Wed 17/4, 16.00||Project 1 final deadline at 16:00. MASM22/FMSN30 students email the report to FMSN30@matstat.lu.se. Subject field: Project1 by studid1 and studid2|
|w17||EASTER BREAK and RE-EXAM PERIOD|
|w19||Mon 6/5, 13.15-15, MH:Riesz||Lecture 8. Maximum likelihood estimation, Newton-Raphson, properties, deviance and likelihood ratio tests.||Agresti: 1.3.1, 1.4.1, 2.3.1-2.3.3; several topics scattered in chapter 4, particularly sections 4.1-4.2.|
|Wed 8/5, 8.15-10, MH:Riesz||Lecture 9. Akaike (again), Pseudo-R2, residuals and model validation in logistic regression.|
|Wed 8/5, 10.15-12, MH:230||Work on project 2|
|Thu 9/5, 8.15-10, MH:230||Work on project 2|
|w20||Mon 13/5, 13.15-15, MH:Riesz||Lecture 10. Poisson distribution and Poisson regression; Negative binomial regression||Agresti: several sections in Chapter 3.|
|Wed 15/5, 8.15-10, MH:Riesz||8.15-9.00: Peer assessment; 9.15-10: Lecture 11. Summary of logistic regression.|
|Wed 15/5, 10.15-12, MH:230||Work on project 2 and start on project 3|
|Thu 16/5, 8.15-10, MH:230||Work on project 2 and/or start on project 3|
|Fri 17/5, 16.00||Project 2 final deadline at 16.00. MASM22/FMSN30 students email the report to FMSN30@matstat.lu.se. Subject field: Project2 by studid1 and studid2|
|w21||Wed 22/5, 13.15-15, MH:230||Work on project 3|
|Thu 23/5, 8.15-10, MH:230||Work on project 3|
|w22||Mon 27/5, 8.15-10, MH:Sigma||Project 3 oral presentations|
|Mon 27/5, 10.15-12, MH:Sigma||Project 3 oral presentations|
|Mon 27/5, 15.15-17, MH:Sigma||Project 3 oral presentations|
|Tue 28/5, 10.15-12, MH:Sigma||Project 3 oral presentations|
|Tue 28/5, 15.15-17, MH:Sigma||Project 3 oral presentations|
|Wed 29/5, 8.15-10, MH:Sigma||Project 3 oral presentations|
|Wed 29/5, 13.15-15, MH:Sigma||Project 3 oral presentations|
|Wed 29/5, 15.15-17, MH:Sigma||Project 3 oral presentations|
|w23||Mon 3/6, Tue 4/6, Wed 5/6, 8.15-17.00||Oral exams|
|w24||Mon 10/6, Tue 11/6, Wed 12/6, Fri 14/6, 8.15-17.00||Oral exams|
|w25||Mon 17/6, Tue 18/6, Wed 19/6, Thu 20/6, 8.15-17.00||Oral exams|
Regression analysis deals with modelling how one characteristic (height, weight, price, concentration, etc) varies with one or several other characteristics (sex, living area, expenditures, temperature, etc). Linear regression is introduced in the basic course in mathematical statistics but here we expand with, e.g., "how do I check that the model fits the data", "what should I do if it doesn't fit", "how uncertain is it", and "how do I use it to draw conclusions about reality".
When performing a survey where people can answer "yes/no" or "little/just fine/much", or "car/bicycle/bus" or some other categorical alternative, you cannot use linear regression. Then you need logistic regression instead. This is the topic in the second half of the course.
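As a rough sketch in standard textbook notation (the symbols $Y_i$, $x_{ij}$, $\beta_j$, $\varepsilon_i$ and $p_i$ are introduced here for illustration and are not taken from the course material), the two model families for a response $Y_i$ and $k$ explanatory variables can be written as:

```latex
% Multiple linear regression: continuous response Y_i
Y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2) \ \text{independent}

% Binary logistic regression: Y_i \in \{0, 1\}, with p_i = P(Y_i = 1)
\ln\frac{p_i}{1 - p_i} = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}
\quad\Longleftrightarrow\quad
p_i = \frac{e^{\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}}}
           {1 + e^{\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}}}
```

Here $p_i/(1 - p_i)$ is the odds of a "yes" answer, and $e^{\beta_j}$ is the odds ratio associated with a one-unit increase in $x_{ij}$ when the other variables are held fixed. This binary form covers the "yes/no" case; responses with more than two categories need the multinomial or ordinal extensions.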
Least squares and the maximum likelihood method; odds ratios; Multiple linear and logistic regression; Matrix formulation; Methods for model validation, residuals, outliers, influential observations, multicollinearity, change of variables; Choice of regressors, F-test, likelihood ratio test; Confidence intervals and prediction. Introduction to: Correlated errors, Poisson regression as well as multinomial and ordinal logistic regression.
At least 60 ECTS at university level including an introductory course in mathematical statistics, e.g. MASA01 Mathematical statistics, basic course, 15hp, or MASB02 Mathematical statistics (for chemists) 7.5hp, or MASB03 Mathematical statistics (for physicists) 9hp or MASB11 Biostatistics, basic course 7.5hp, or equivalent.
The teaching consists of lectures, computer exercises and project work. Attendance at the three computer exercises is compulsory. The examination is both written and oral: written reports for projects 1 and 2, an oral presentation of project 3, and an individual oral examination.
Knowledge and understanding
For a passing grade the student must
- Describe the differences between continuous and discrete data, and the resulting consequences for the choice of statistical model,
- Give an account of the principles behind different estimation methods,
- Describe the statistical properties of such estimates as appear in regression analysis,
- Interpret regression relations in terms of conditional distributions,
- Explain the concepts of odds and odds ratio, and describe their relation to probabilities and to logistic regression.
Skills and abilities
For a passing grade the student must
- Formulate a multiple linear regression model for a concrete problem,
- Formulate a multiple logistic regression model for a concrete problem,
- Estimate the parameters in the regression model and interpret them,
- Examine the validity of the model and make suitable modifications of the model,
- Use the resulting model for prediction,
- Use some statistical computer program for analysis of regression data, and interpret the results,
- Present the analysis and conclusions of a practical problem in a written report and an oral presentation.
Judgement and approach
For a passing grade the student must
- Always check the prerequisites before stating a regression model,
- Evaluate the plausibility of a performed study,
- Reflect on the limitations of the chosen model and estimation method, as well as alternative solutions.