Engineering economics problems
Project Instructions
For this project, you will be required to use data from one of the following articles:
Angrist, J. and V. Lavy (2009). The effects of high stakes high school achievement awards: Evidence from a randomized trial. American Economic Review 99(4), 1384 – 1414. [Data]
Banerjee, A., E. Duflo, R. Glennerster, and C. Kinnan (2015). The miracle of microfinance?
Evidence from a randomized evaluation. American Economic Journal: Applied Economics 7(1),
22 – 53. [Data]
Banerji, R., J. Berry, and M. Shotland (2017). The impact of maternal literacy and participation programs: Evidence from a randomized evaluation in India. American Economic Journal: Applied Economics 9(4), 303 – 337. [Data]
Bertrand, M., and S. Mullainathan (2004). Are Emily and Greg more employable than Lakisha
and Jamal? A field experiment on labor market discrimination. American Economic Review
94(4), 991 – 1013. [Data]
Chong, A., I. Cohen, E. Field, E. Nakasone, and M. Torero (2016). Iron deficiency and schooling attainment in Peru. American Economic Journal: Applied Economics 8(4), 222 – 255. [Data]
Gneezy, U., J. List, J. Livingston, X. Qin, S. Sadoff, and Y. Xu (2019). Measuring success in
education: The role of effort on the test itself. American Economic Review: Insights 1(3), 291 –
308. [Data]
Muralidharan, K. and V. Sundararaman (2011). Teacher performance pay: Experimental evidence from India. Journal of Political Economy 119(1), 39 – 77. [Data]
You must let me know (via email) which of these articles you are interested in no later than 8am on January
25; if I do not hear from you by then, I will make the choice for you.
As noted on the syllabus, you are not being asked to replicate the article that you are getting your data
from. Instead, once you let me know which of these articles you are interested in, I will suggest a slight
variation on it for you to do (e.g., if the original article analyzed performance for all students, I might suggest
that you focus only on the performance of boys).
Ultimately, your aim in this project is to answer a causal (not “casual”) question such as the following:
Does being placed into a small class cause students to perform better academically? To do so, you will be
required to use a regression model of the following form:
Outcome_i = α + β Treatment_i + X_i γ + U_i,

where Outcome_i is the outcome (e.g., a test score) for the ith individual, Treatment_i is equal to 1 if the ith individual receives the treatment (e.g., being placed into a small class) and 0 otherwise, X_i is a vector of control variables (e.g., age, gender, etc.) for the ith individual, and U_i is an idiosyncratic error term.
The main parameter of interest is β, which is known as the “average treatment effect” or ATE (no one
really cares what α or γ are). Thus, the null hypothesis you will want to test is H0 : β = 0. If any of this
is unclear to you, please make sure to spend some time watching my videos that review the background
material you are expected to be familiar with from your previous courses in statistics/econometrics.
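As a minimal sketch of this estimation in R (the variable names score, treat, age, and female, and the data frame mydata, are placeholders, not names from any of the articles):

```r
# Hypothetical sketch only: estimate the ATE by OLS and form the
# t-statistic for H0: beta = 0 using an HC standard error.
library(sandwich)

fit  <- lm(score ~ treat + age + female, data = mydata)
beta <- coef(fit)["treat"]                  # estimated ATE (beta-hat)
se   <- sqrt(diag(vcovHC(fit)))["treat"]    # HC standard error
tval <- beta / se                           # t-statistic for H0: beta = 0
```

Note that vcovHC() defaults to one particular HC variant; check its documentation if you care which one you are using.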
Submission Instructions
The project will be completed through 4 “instalments” each worth 20% of your final grade (the other 20%
is the test). You should think of these instalments not as 4 separate pieces of work, but rather 4 versions of
the same piece of work, each one being “better” than the one that came before it. That is, each instalment
should not only add new features, but also improve the existing features (this means fixing any technical
errors you had previously, making your writing more clear, etc.).
Instalment submissions must be made via a private Google Drive folder that I will share with you once
you let me know which article you are interested in. Specifically, each instalment will require you to upload
files named paper-x.pdf and code-x.txt, where x is the instalment number (e.g., your first instalment
will require files named paper-1.pdf and code-1.txt). The file named paper-x.pdf is to be a PDF file
containing the latest version of the write-up for your project. The file named code-x.txt is to be a plain text
file containing the latest version of your R code. For the first instalment, you will also need to upload your
data file (do not modify this file in any way, i.e., do not rename it or convert it to a different format). All of
these files must be contained entirely within the Google Drive folder I share with you; please do not create
any subfolders within this folder! If you have done everything correctly, you will have uploaded exactly 9
files in this folder by the end of the course (2 for each instalment plus 1 containing your data).
Please note that failure to precisely follow the above submission guidelines will result in a mark of zero.
For example, if you were to upload your write-up as a Word file rather than a PDF file, or your R code as a
rich text file rather than a plain text file, you would get a zero.1
Your R Code
The single most important thing to keep in mind about this project is that I need to be able to replicate all
of your results. To run your code, I will set my working directory to the Google Drive folder I have shared
with you and enter the following command:
source("code-x.txt")
(where x is the instalment number). Your code needs to be written so that the above produces every
single number that appears in your write-up. If this doesn’t work for any reason, you will get a mark of
zero. Thus, I recommend that you work in exactly the same fashion yourself rather than “interactively”
(i.e., typing commands directly into the R console). Indeed, before submitting any instalment, you should
re-start the R console and run the above command to make sure you get the results you are expecting.
For the purpose of this project, you are not permitted to use any R packages except for haven (for reading
data saved in Stata format) and sandwich (for computing HC standard errors).
Below are some general guidelines for your R code. If you fail to follow any of these guidelines, you
will get a mark of zero.
• Do use the following as your very first line to ensure R’s memory is cleared: rm(list=ls())
• Do not include any line beginning with > (i.e., lines that you copied from the R console).
• Do not include any calls to the setwd() function.
• Do not include any "path" references when reading in your data. That is, you should have something like read.table("data.txt") rather than read.table("/Users/JaneDoe/ECN723/data.txt"), and just manually set the working directory in R to the location where you've saved your data file (remember: when I run your code, I will set my working directory to the Google Drive folder I have shared with you, i.e., the folder containing your data file).
• Do not include any calls to the install.packages() function or the remove.packages() function.
However, do make sure to include a call to the library() function for any package(s) you use.
• Do not include any calls to functions that open a graphical interface such as the View() function (you
can use this yourself if you would like, but it will just create an error for me).
• Do not create separate data frames for your treated and non-treated groups. You should have a single
data frame containing all of your observations, and within this data frame, there should be a treatment
variable equal to 1 for observations in the treated group and 0 for observations in the non-treated
group.
• Use the attach() function exactly once (and make sure to do so only after you have “cleaned” your
data).
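Put together, a code file respecting these guidelines might be laid out as follows (a sketch only; the file name data.dta and the cleaning step are placeholders):

```r
rm(list = ls())        # very first line: clear R's memory

library(haven)         # the only permitted packages
library(sandwich)

mydata <- read_dta("data.dta")   # no path and no setwd() call

# ... clean the data here (e.g., drop observations with NA values) ...

attach(mydata)         # called exactly once, after cleaning
```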
1Please make sure you understand the difference between “plain text” and “rich text”.
Your Write-up
You will need to create a short write-up describing precisely what you have done/found. It must be no more
than 10 pages in length, but all else equal, shorter is better (clearly explain everything you are doing in
detail, but keep it concise).
Your write-up should be written so that it would be easy for another student in this course to read it and
understand exactly what you have done/found. That is, your “target audience” consists of readers who know
something about economics and econometrics, but don’t necessarily know anything about the specific topic
you are writing about (do not assume that your readers have read the article that you obtained your data
from). This means that you can skip explaining straightforward things like how to calculate a T-statistic
and put all of your energy into explaining the design of your experiment, what all of your variables measure,
and what your results tell you about your causal question of interest.
Your write-up must be split into 3 sections:
1. Introduction
This section should very clearly explain what your causal question of interest is, and how your experiment is designed. Make sure to explain exactly what your “treatment” is. This section should be 1 to
2 pages in length.
2. Data and Model
This section should provide a very clear explanation of the model you are estimating and how all
the different variables in it are defined. Be very specific. For example, if your outcome variable is
“TestScore” you need to explain exactly what this is measuring, i.e., what kind of test it is, when the
test took place, what the score is out of, etc. Information about your outcome, treatment, and control
variables should be summarized in a table (call it Table 1; see ecn723-project-sample.pdf for an
example).
You should also include a table here providing the sample mean (and its standard error) of all of these
variables for the entire sample and also for each group (treated and non-treated) separately; this table
should also clearly list the total number of observations as well as the number of observations in each
group (call it Table 2; see ecn723-project-sample.pdf for an example).
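A sketch of how the Table 2 entries might be computed (the names score and treat are hypothetical; this assumes the data have already been cleaned and attached):

```r
# Sample mean and its standard error, overall and by treatment group.
m_all <- mean(score);             se_all <- sd(score) / sqrt(length(score))
m_trt <- mean(score[treat == 1]); se_trt <- sd(score[treat == 1]) / sqrt(sum(treat == 1))
m_ctl <- mean(score[treat == 0]); se_ctl <- sd(score[treat == 0]) / sqrt(sum(treat == 0))

# Observation counts for the bottom row of Table 2.
n_all <- length(treat); n_trt <- sum(treat == 1); n_ctl <- sum(treat == 0)
```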
In addition to your “full” regression model that includes all of your variables, you will also be required
to estimate a “basic” version of it that does not include your control variables, i.e., a model of the
following form:
Outcome_i = α + β Treatment_i + U_i.
Rather than writing out equations for both models, however, just write out the equation for your full
model and then explain in words that your basic model is identical but excludes the control variables.
You do not need to go into any of the details about your econometric methods, but you should
clearly state what methods you are using. For example, you might tell us that you are estimating the
parameters in your model using OLS and that you are providing us with HC standard errors for them.
Finally, make sure to clearly describe exactly what hypothesis you will be testing and how this relates
back to your causal question of interest.
Overall, this section should be 3 to 4 pages in length.
3. Results
This section should clearly describe your results. You should have a table here showing your average
treatment effect estimates (and their standard errors) from your basic and full models (call it Table 3;
see ecn723-project-sample.pdf for an example). Remember that no one cares about the estimates
of α or γ; all that we care about is your estimate of β (the ATE). Most importantly, you need to
formally test the hypothesis you described in Section 2 (do this using the results from both your basic
model and your full model, but base your overall conclusion on the full model as it should provide
a more accurate estimate of the average treatment effect). This section should be about 2 pages in
length.
Your write-up does not need a “Conclusions” section or any appendices (remember that I have your R code,
so there is no need to include it in your write-up). You only need the 3 sections (and 3 tables) described
above; no more, no less.
In addition to this outline, you must adhere to the following formatting guidelines:
• Use 1 inch margins on all sides, and number each page inside the bottom margin (centered).
• Use “justified” alignment for all paragraphs (i.e., text stretched out from the left margin to the right
margin).
• Double-space everything (except footnotes and notes for tables, which should be single-spaced). However, do not include an extra space between sections (i.e., there should be exactly one line between the
last word of a section and the title of the next section, not two or three).
• Use a 12 pt font size for everything (except footnotes and notes for tables, which should be 10 pt).
• Do not include a title page. The first line of text should be your main title (centered and in bold), the second line of text should be your name (centered), the third line of text should be the title of the first section (left-justified and in bold), and so on.
• Do not indent the first line of the first paragraph of a section, but do indent the first line of each
subsequent paragraph.
• Use bold for your main title, the number/title of each section, and the title of each table, but nowhere
else.
• Use footnotes rather than endnotes.
• Do not paste any R code or output into your write-up.
• Tables should only contain horizontal lines, and these horizontal lines should only be at the top of the
table, after the header row, and at the bottom of the table.
• Above each table, you must write “Table X: Blah blah blah” (without the quotation marks) where
“X” is the table number and “Blah blah blah” is the description.
• Always refer to tables by writing “Table X” (without the quotation marks) where X is the table number
(notice that Table is capitalized). For example, you might write “… are shown in Table 1”.
• You do not need a “References” section since you are only going to cite one paper (the paper you
found your data from). Instead, include a full reference to this paper in a footnote the first time you
mention it, and always refer to it as “Lastname1 and Lastname2 (year)” (if there are two authors) or
“Lastname1 et al. (year)” (if there are 3 or more authors). For example, you might write something
like “Angrist and Lavy (2009) estimate…” or “Banerjee et al. (2015) examine…”. Do not ever write
first names, article titles, or journal names in the main body of text.
All of these formatting rules are demonstrated in the file named ecn723-project-sample.pdf. Please read
it very closely. If you don’t follow these formatting guidelines, I will just stop reading and give you a zero.
Timeline
Again, you must let me know (via email) the paper that you would like to get your data from no later than
January 25 at 8am; if you fail to do so, I will make the choice for you. The important point is that you
need to start working on your first instalment absolutely no later than January 25; if you think you can do
it on the weekend of February 13/14, you are setting yourself up for disaster (you will probably find the
first instalment to be the most difficult one since it will require you to get your data imported into R and
“cleaned up”; indeed, each instalment should be progressively easier for you).
For each instalment, there are a set of minimum tasks that you need to achieve:
Instalment 1 (due February 15, 8am):
- Create the basic layout of your write-up and ensure you have it formatted properly. If you do not follow the formatting guidelines on this or any other instalment, you will get a mark of zero.
- Write the entirety of Section 1.
- In Section 2, give a detailed description of what your outcome/treatment/control variables are and complete Table 1.
- After getting rid of any observations with NA values (for any of your variables), compute the number of observations you have in the entire sample and in both the treated and non-treated groups (you will know you are on the right path if the total number in these two groups is equal to the number in the entire sample; you will get a mark of zero if this is not the case). Fill these values into the bottom row of Table 2.
- Make sure that your data file is uploaded into the Google Drive folder I share with you.

Instalment 2 (due March 8, 8am):
- Compute your summary statistics and complete Table 2. To check that you are on the right path, make sure that (a) the sample mean of your treatment variable for the entire sample is equal to the number of observations in the treated group divided by the number of observations in the entire sample, and (b) for every variable, the sample mean for the entire sample lies somewhere between the sample mean for the treated group and the sample mean for the non-treated group. If either of these conditions is not satisfied, you will get a mark of zero.
- Specify your regression model and describe the hypothesis you will be testing in order to complete Section 2 (this should come after Table 2).
- Read over Section 1 again and spend some time improving your writing (please don't think it is already "perfect"; your writing can always be improved). Do not neglect this step!

Instalment 3 (due March 29, 8am):
- Use OLS to estimate your basic model and fill in the first column of Table 3. To check that you are on the right path, use the numbers in Table 2 to compute the two-sample T-statistic for comparing the mean of the outcome variable between the treated and non-treated groups (you can just do this by hand to check for yourself; do not use the t.test() function in R as it makes some silly assumptions); the numerator and denominator of this test statistic should be equal to the estimated coefficient on your treatment variable and its standard error, respectively (you don't need to report the value of this test statistic in your write-up; just compute it in R to check that you are on the right path). If this condition is not satisfied, you will get a mark of zero.
- Use the results in the first column of Table 3 to test the null hypothesis that the coefficient on your treatment variable is equal to zero (if you don't do this correctly, you will get a mark of zero). Discuss your findings in Section 3 right below Table 3.
- Read over Sections 1 and 2 again and spend some time improving your writing. Again, do not neglect this step!

Instalment 4 (due April 19, 8am):
- Use OLS to estimate your full model and fill in the second column of Table 3. Use these results to again test the null hypothesis that the coefficient on your treatment variable is equal to zero (if you don't do this correctly, you will get a mark of zero). Discuss your findings in Section 3, making sure to compare them to what you found using your basic model. In case your findings differ, you should base your conclusion on the full model as it should provide a more accurate estimate of the average treatment effect. Make sure that Section 3 is very clearly written as you will not have an opportunity to revise it.
- Read over Sections 1 and 2 again and spend some time improving your writing.
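The by-hand check for instalment 3 can be sketched as follows (assuming the Table 2 group means and standard errors are stored in hypothetical variables m_trt, se_trt, m_ctl, and se_ctl; how exactly the result matches the OLS output can depend on which HC variant you use for the standard errors):

```r
# Two-sample t-statistic computed "by hand" from the Table 2 numbers.
num   <- m_trt - m_ctl              # should match the coefficient on treatment
denom <- sqrt(se_trt^2 + se_ctl^2)  # should match its reported standard error
tstat <- num / denom
```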
Remember that paper-2.pdf and code-2.txt should be improved/expanded versions of paper-1.pdf and
code-1.txt, respectively, and so on. Nothing is “written in stone”; you can add/remove/modify any part
of your code or write-up for any new instalment. For example, even though you will have written your
introduction for the first instalment, you still need to put some effort into improving that section in each
subsequent instalment.
Feedback
Inside the private Google Drive folder I share with you, there will be a file named feedback.txt that I will
use to give you feedback on each instalment (this will be updated within one week of every new submission;
I will make sure to indicate which instalment I am referring to so that there is no confusion). Please make
sure to incorporate all of the feedback I leave into your next instalment. The absolute worst thing you can
possibly do in this course is to ignore my feedback. If I start reading a new instalment and see that you have
ignored the feedback I gave on your previous instalment, I will just stop reading and give you a zero.
Image Processing and Deep Learning (EEEM063)
DIGITS Introductory Deep Learning Lab
Dr John Collomosse – Autumn 2017

Introduction

In this lab you will use a web-based interface called "DIGITS" to run some simple classification experiments using a convolutional neural network (CNN). This lab assumes use of DIGITS v3.0.0 or higher. CNNs are state-of-the-art deep neural networks that perform well at machine perception tasks such as image classification. You will start learning about these in detail within Week 6.

To do this work you will need to connect to a teaching server called aineko1 on the campus network. The server is not visible off campus, so you are encouraged to use the lab or library PCs on campus to do this work. If you want to use your own laptop then you will be able to connect via the campus-wide "eduroam" wifi network. The campus-wide "The Cloud" network will not work. It is possible to connect from off campus using the https://anywhere.surrey.ac.uk facility and entering the web interface address, including http://, into the text box in the upper-right of the portal.

The address of the web interface is http://aineko.eps.surrey.ac.uk:34448

Verify now that you can connect to the web interface via your browser, or you will not be able to progress any further.

It is possible to install your own version of DIGITS on your local lab PC (e.g. in the Penguin lab). Since we have a large class this year, this may be useful if the aineko server becomes very busy. You can refer to the supplementary instructions on SurreyLearn if you find it desirable to do this. If you want to install DIGITS yourself on your own machine then you might find those instructions a useful starting point, but note that we do not have the resource to assist 50+ students installing their own DIGITS on their own variants/configurations of Linux, so if you try this you are on your own.
The aineko server (pictured right) is an Intel i7 PC with 4 Nvidia Titan-X GPUs which power our deep learning experiments.

1. Getting Started

We will be working with a dataset of hand-written numbers 0-9, collected by the US postal service from mail. The dataset is called MNIST and contains 70k images, each only 28×28 pixels in size.

You must first create your own work area on the server and download the database. Remotely log in to the teaching server using ssh. In the Linux labs you can open up a terminal window using Ctrl-Alt-T and then enter the following:

ssh aineko.eps.surrey.ac.uk -l your_username

Note that the character after the hyphen is a lower-case L, not a 1! Please substitute your_username with your actual username. The password is your URN. If you are prompted by an "are you sure..." prompt, just type the word yes. If you can't log in then we will have to create you an account. Alternatively you may issue the same command in the Mac OS "Terminal" (usually found under /Applications/Utilities) or use a Windows ssh client such as PuTTY.

When you have logged in, please create a folder in the 'scratch' area of the server to work in:

mkdir /scratch/Teaching/your_username

Please note that anyone will be able to access anyone else's work on this part of the teaching server, so take care not to trample over each other's files or leave anything sensitive such as coursework submissions in this space.

Now download the MNIST dataset in a format that can be used with DIGITS, using the built-in tool:

cd /opt/DIGITS
python -m digits.download_data mnist /scratch/Teaching/your_username/mnist

Change into the workspace you created:

cd /scratch/Teaching/your_username

If you type ls to list the files in your workspace, you will see that a folder mnist has been created containing folders train and test. Both folders contain subfolders 0, 1, 2, ..., 9 which contain the images.

1 https://en.wikipedia.org/wiki/Accelerando
We will be using train as our training image set (contains around 60k images), which we will show to the CNN during training. We will use test as our test image set (contains around 10k images), which we won't show to the CNN during training, but will use after training to measure how well the CNN has learned to recognise the ten kinds of digit.

In addition to images, each folder train and test contains a pair of text files:

train.txt or test.txt, which is a list of every file in the image set, a space, and then a number which is associated with a class (there are 10 classes, numbered 0-9). One line in the file corresponds to one image file.

labels.txt, which contains 10 lines, each providing a descriptive name for each of the 10 classes – which coincidentally in this case are also the names 0, 1, 2, ..., 9.

Take a look inside the files using the Linux cat command to see how they are formatted, e.g.

cat train/train.txt

Remember you can use Ctrl-C to stop if it is scrolling for a long time.

cat train/labels.txt

Imagine if we were working on a different image classification problem with the ImageNet dataset, which contains 16 million images of 1000 classes of object. We would see numbers in train.txt from 0-999 and then 1000 lines in labels.txt containing the actual names of each class, e.g. dog, cat, tree, etc.

2. Import the dataset into DIGITS

Go to http://aineko.eps.surrey.ac.uk:34448 and look at the "Datasets" tab – there may or may not be datasets already listed in there from other users. In any case you will be creating your own by following these steps. Click on the blue "Images" button by "New Dataset" and select "Classification" as the dataset type from the dropdown menu.

Now fill in the form you are presented with as per the following page. On the server everyone has access to everything – there are no private work areas. So, it is very important that you name everything using a standard convention. We will create a dataset called 'yourusername_mnist_dataset'.
Make sure that everything you create in DIGITS starts with the prefix yourusername_

You need to click on the "Use Text Files" tab, which will use the train.txt etc. files we just inspected as lists from which to build. Note that the dataset name starts myusername_, i.e. it is jc0028_mnist_dataset. Ensure you follow this naming convention to prevent problems with other users. Note that:

• images are greyscale and of size 28×28.
• we have unchecked "validation" and checked "test".
• we are going to use files already on the teaching server (in your area) rather than uploading them via the browser, so check "Use local paths on server".
• the locations of the training, test and labels text files are:
/scratch/Teaching/yourusername/mnist/train/train.txt
/scratch/Teaching/yourusername/mnist/test/test.txt
/scratch/Teaching/yourusername/mnist/train/labels.txt
• finally note that "image folder (optional)" is filled in with /scratch/Teaching/yourusername/

Click create and you will see some progress bars in blue on the right-hand side of the screen. It will take about 60 seconds to create the dataset from the 70k images in MNIST. If you get errors, check you didn't leave off the trailing / on that last field, and check all spelling.

If you click on the word "DIGITS" on the top-left to go home, or go to the original URL, you will see your dataset listed among the active datasets with "Done" (or in progress if you didn't wait for the blue bars to go to 100%, in which case wait for the job to complete on the main screen).

3. Training a CNN

Now we will train a standard CNN called "LeNet" to recognise the numbers 0-9 in the MNIST dataset. This popular yet simple CNN architecture is included as a preset within DIGITS so is easy to try out. On the Models tab/box, click on the blue Images button by New Model, and pick Classification. Then fill in the form that appears to tell DIGITS how to run the training.
First you need to select the dataset you prepared (which will be easy to find, because you named it using yourusername_ as a prefix). Next you need to select the LeNet CNN. Finally you need to name this training job. Again we keep to a careful convention. We will use:

yourusername_dataset_network_anythingyouwant

So, I have used, for example, jc0028_mnist_lenet_exp1, where exp1 means experiment 1, but you can use anything you like for this. It makes sense to keep a notepad beside the PC to record what settings you used for each experiment, for ease of use later.

For now, leave all the other settings as they were originally and click Create. You will see a blue progress bar again on the right, and a graph in the centre of the page which will update itself as training proceeds. After about 60 seconds (30 iterations or, to use the terminology, "epochs") of training, the job will be complete.

If you click out of this screen back to the DIGITS home page (click on DIGITS on the top-left), you will see a list of all experiments. You will be able to get back into these results by clicking on the old experiment. Later, when you run longer experiments, you can do this to leave a job running on the server and return later to analyse the results. Please do not use more than 1 GPU out of the 4 available for any single job.

Your end result should look something like this – a blue graph spanning 30 epochs that converges to zero after about 5 epochs. The first graph is a plot of something called the "training loss". The second is a plot of the "learning rate". Both are vs. the epoch number 1-30. We will discuss the meaning of the graphs shortly.

NOTE: Choosing a GPU

The server you are using has 4 GPUs. You can either let DIGITS choose which GPU to use (the default option) or select GPUs at the bottom of the screen in DIGITS by highlighting them in a list. We advise you let DIGITS choose the GPU for you by not adjusting anything in this section.
However, sometimes DIGITS will get confused and allocate all jobs to a single GPU, and you may see "out of memory" or "cuDNN 0=4" or similarly phrased errors. In this case the GPU is fully loaded and you should manually select a different GPU. If you want to see which GPUs are heavily loaded, you can run the nvidia-smi command within the ssh window you created in step 1:

$ nvidia-smi
Fri Nov 4 16:55:42 2016
+-------------------------------+----------------------+----------------------+
| NVIDIA-SMI 361.45.18            Driver Version: 361.45.18                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name       Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp Perf  Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:05:00.0      On |                  N/A |
|  28%  61C   P8    18W / 250W  |   240MiB / 12287MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 0000:06:00.0     Off |                  N/A |
|  27%  62C   P8    18W / 250W  |    24MiB / 12287MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX TIT...  Off  | 0000:09:00.0     Off |                  N/A |
|  28%  61C   P8    15W / 250W  |    24MiB / 12287MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX TIT...  Off  | 0000:0A:00.0     Off |                  N/A |
|  50%  81C   P2    90W / 250W  | 10200MiB / 12287MiB  |     53%      Default |
+-------------------------------+----------------------+----------------------+

Here you can see the first three GPUs are idle (the 4th GPU has a job taking up 53% of its capacity).

4. Testing the trained model

When you clicked Create to train the LeNet CNN, a process called "supervised training" took place. In supervised training, a classifier (here, a CNN) is shown many examples of images and their correct labels, e.g. a picture of a 2 and the label 2. Eventually the classifier learns a model that can be applied to new, unseen data – which we call the "test" data. The supervised training of a CNN is performed iteratively.
Subsets of labelled training data are shown to the network in training iterations called "epochs". The CNN should get better at classifying data each epoch. If you followed the steps above correctly, your LeNet CNN was trained for 30 epochs using batches of training data sampled from the 60k images in the MNIST dataset. In LeNet there are 1 million weights internally within the neural network that are configured during training. We call this learned configuration the "model".

The trained model can now be applied to some new data (some or all of the "test" image set) to check how well it is performing. This tests the CNN's ability to generalise over unseen data, i.e. how well it learned. Recall that MNIST contains a test set of 10k images, entirely separate from the training data. We will use some of those images now to test the network. The local path to a particular image in the test set is as follows:

/scratch/Teaching/yourusername/mnist/test/n/mmmmm.png

where you can substitute n for any digit 0-9, and mmmmm.png for some 5-digit number, e.g. 01570.png, to get different test images. Note that not all numbers are used.

Let's download a single image to our local workstation from the teaching server. In your terminal window (hit Ctrl-Alt-T in Linux), type the following:

scp yourusername@aineko.eps.surrey.ac.uk:/scratch/Teaching/yourusername/mnist/test/0/01570.png .

Remember your password is your URN unless you changed it (use passwd to change it). If you have any problems, ensure you haven't omitted the . (dot) at the end. Make sure you are using a fresh local terminal window, and aren't typing in the original window you used to download MNIST, since typing commands there will run them on the server – not on your local machine. This will download the specified test image from your area on the teaching server to your home folder. Now you can upload it to DIGITS to see if your trained CNN can correctly recognise the image.
Recall your model from the DIGITS home screen and scroll to just below the blue graphs. Above the “Classify One” button, use “Choose file” to select the test image you just downloaded. Click on Classify One to upload the image back to the teaching server. In the result you will see the image on the left, and the top 5 labels the CNN believes should be associated with that image. In this case (01570.png) the CNN is saying with 98.66% probability that the digit is a zero (correct).

Now let’s test a more substantial set of test images. Download the entire 10k-image test.txt list:

scp yourusername@aineko.eps.surrey.ac.uk:/scratch/Teaching/yourusername/mnist/test/test.txt .

Now we need to edit the test.txt file. It will look something like this:

./mnist/test/7/00000.png 7
./mnist/test/2/00001.png 2
./mnist/test/1/00002.png 1
./mnist/test/0/00003.png 0
./mnist/test/4/00004.png 4
./mnist/test/1/00005.png 1
etc.

Use a text editor to modify each line to point to the absolute path of each file, i.e.:

/scratch/Teaching/yourusername/mnist/test/7/00000.png 7
/scratch/Teaching/yourusername/mnist/test/2/00001.png 2
/scratch/Teaching/yourusername/mnist/test/1/00002.png 1
/scratch/Teaching/yourusername/mnist/test/0/00003.png 0
/scratch/Teaching/yourusername/mnist/test/4/00004.png 4
etc.

You can delete the rest of the file – there is no need to test all 10k images! Now select the edited test.txt file on your local machine using the “Choose file” button above the “Classify Many” button. Once chosen, hit Classify Many. This will sample 100 test images at random from the file and show the output. In this output, the column “Ground Truth” tells you what the test image actually is, and the top 5 classes are shown in successive columns. Here we can see the CNN performs very well, as the correct class is identified in the first column with near 100% confidence every time. Congratulations – you have now trained and tested your first CNN!

5. Analysing the training of the CNN model

Testing your trained model takes time, but we can get some indication as to how well the network learned without even looking at test data. From the DIGITS home screen, click on your trained model to pull up the training graph again. The blue line on the graph is the “training loss”. When the CNN is trained, the weights in the network are adjusted to minimise a mathematical expression called a “loss function”. These come in various forms, but a popular one for classification is the “SoftMax Loss” (discussed in lectures).

Training occurs iteratively in “epochs”. Here we have used 30 epochs of training, but it is clear this was excessive, as the loss bottomed out at around 5 epochs. During a single epoch of training, a batch of data is sampled at random from all of the training data. Each image in the batch is fed through the CNN with its current configuration of weights to get a classification decision. The loss function measures how “wrong” that decision is – for example, an image of a 4 might be classified as a 5. These losses are combined to produce a score – the loss – which is shown here in blue. The loss is also used internally during training to update the weights of the CNN, via a process called back-propagation, so that it performs better in the next epoch.

Initially – in the first epoch – the weights in the CNN are totally random, so the loss is very high; but after just a few epochs of training, the loss is much lower (better). If the loss doesn’t tend toward zero quickly, and instead hovers around the same high value despite many epochs of training, we say the network has “not converged”. A network failing to converge is the main obstacle to overcome when applying deep learning to your classification problem. Practical reasons for non-convergence include:

• Problems with the training data, e.g. not diverse enough, or not enough of it (soln. more data)
• The CNN architecture is the wrong design (soln. try other designs)
• The learning rate is wrong (soln. try other learning rates)

6. Learning Rate

The other graph we saw on the model page plotted the learning rate over time. In the default DIGITS configuration, the learning rate starts high and automatically reduces as the epoch count rises. The initial learning rate was set in the “learning rate” box on the left of the screen when you created the model – its value was 0.01.

The learning rate is a critical factor in getting the CNN to train correctly. For every machine learning problem there is a “sweet spot”. Too low, and the CNN will take a very long time to train and may not converge at all. Too high, and the network will converge to a high loss value, i.e. not train. Go back to your model, e.g. yourusername_mnist_lenet_exp1, and hit the “Clone Job” button. You are now setting up another model for training – so modify the name of the model to exp2. Now try changing the learning rate to 0.001, i.e. an order of magnitude lower. Similarly, try changing it to 0.1. What happens? When hunting for a good learning rate, it is normal to vary it in orders of magnitude like this (0.1, 0.01, 0.001, …) rather than in small steps (0.01, 0.02, etc.). As you can see, there are also “advanced learning rate” options which control the stepping-down behaviour observed in the graph.

7. Working with a Validation Set

Monitoring the training loss graph is an important debugging tool when trying to get your CNN to train properly. However, it is often insufficient to predict how well the trained CNN will perform over unseen test data. This is because the network might be learning to classify the training data very well, but be hopeless at classifying unseen test data. We call this “overfitting”, and it is a common problem in training any machine learning system. Overfitting usually occurs if your training data is not sufficiently diverse to capture likely test data scenarios, or you have run the training for too many epochs.
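The symptom of overfitting can be spotted mechanically: a loss measured on held-out data bottoms out and then starts rising, even while the training loss keeps falling. Here is a minimal sketch; the per-epoch loss values are made up for illustration and are not DIGITS output.

```python
# Toy illustration of spotting overfitting from per-epoch losses (made-up values).

def best_stopping_epoch(val_losses):
    """Return the (0-indexed) epoch with the lowest held-out loss.
    Training past this epoch is where overfitting sets in."""
    return min(range(len(val_losses)), key=lambda e: val_losses[e])

train_loss = [2.3, 1.1, 0.6, 0.3, 0.10, 0.05, 0.02, 0.01]  # keeps falling
heldout_loss = [2.4, 1.2, 0.7, 0.4, 0.25, 0.30, 0.40, 0.55]  # bottoms out, then rises

print(best_stopping_epoch(heldout_loss))  # prints 4
```

Here the training loss still improves after epoch 4, but the held-out loss is rising: the network is memorising the training data rather than learning to generalise, and training should have been stopped at epoch 4.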
To counter this problem, we hold back a small amount of the training set – called the “validation” set – and we calculate the loss over this validation set too. Whilst it does not impact the training of the network directly, it allows us to see – at each epoch – how the trained network would perform against some unseen data. Normally we would expect both the validation and training loss to converge, i.e. go low; but after further epochs the validation loss might rise again. That combination – a low training loss and a high validation loss – shows us that the network has overfitted. It means we should have stopped the training at an earlier epoch.

We will now work with a more challenging dataset that is split 3 ways into train, validation and test data. The dataset is called “iCub” and was created by waving 4 different objects in front of the iCub robot’s webcam and saving the resulting 3029 video frames as separate images, separated into 4 folders – ball, cube, cup, tractor. Such a dataset could be created using any video camera and free software to break a video file up into individual frames.

Step 1 – Create the dataset in DIGITS

Create a new image classification dataset from the DIGITS homepage by clicking on the blue button in the Datasets box and selecting “Classification”. Name the dataset yourusername_icub_dataset. For this dataset you can simply use the “Use Image Folder” tab. In the “Training images” box enter:

/scratch/Teaching/robotobjects

In the “% for validation” box, enter 25%. In the “% for testing” box, enter 10%. Leave everything else as initially set. Hit Create to build the dataset.

Step 2 – Train the CNN

Create a new CNN training model from the DIGITS home screen, as you did before. Instead of using LeNet we will select AlexNet – a deeper CNN (it has more layers) that you will learn about in lectures. Change the number of training epochs from 30 to 15, and enter a model name under the usual naming convention, e.g.
yourusername_icub_alexnet_exp1.

Training will proceed as before (it will take a few minutes), and your graph will contain not only the training loss (blue) but a further two traces, based on the validation data. The green trace is a loss calculated over the validation data. It does not influence the training process, but is a useful monitor to check we are not overfitting. At 15 epochs we can see both nicely converge to zero, so we can be confident this is not the case. The yellow trace is the accuracy, which should be roughly the inverse of the green (validation loss) trace. It is the percentage of the validation data that was classified correctly using the CNN at that epoch. In this case we get around 100% at 15 epochs. We can see this training has been a success.

With the simple datasets in this lab sheet, it is difficult to cause the CNN to overfit. However, this is something you should watch for (i.e. blue drops but green starts to rise, or yellow starts to drop) in other, more substantial classification problems.

8. Resuming training in DIGITS if a job aborts

Sometimes a job will (seemingly inexplicably) abort itself in DIGITS. Or, it may be interesting to continue training beyond the epoch count initially specified. In either case, you can hit Clone Job on the model and resume training by specifying a new job name and selecting the “Previous Networks” tab. Just pick the model corresponding to the job that stopped, and the epoch you wish to resume from (practically, if the job stopped at epoch n, restart it from n-1 or n-2).

Congratulations – you have now trained and tested a couple of CNNs for the task of image classification. In the lectures you will learn a lot more about the internals of CNNs and how they work. This concludes the DIGITS Introductory lab for EEEM063.