
# Tracking thesis progress

Posted on 2016-09-22

I’m currently writing my master’s thesis. I’d like to have some idea of how to answer questions like: How well am I doing this week? Have I been stalled for some time? How fast can I write when I’m in the zone? How long until I’m finished?

To help me answer those, I’m gathering data with a cron script that logs, every hour, the number of pages in the PDF of my thesis and the number of lines of code I’ve written.

This is a technique I copied from a video from the excellent screencast series Destroy All Software. The author uses it in several videos. One variant of this data collection idea he uses is going over the Git history of a project and collecting data ex post by checking out every historic revision and running a data collection script over it.

My use of this technique lets me produce a pretty graph that looks like this:

Other really nice uses of this technique include asking:

• How many tests were failing in each revision?
• Which authors committed how many lines of code, over time?
• How has the speed of a POST request to /clients/create evolved over time? Which changes affected it positively and negatively?
• How has the speed of my website’s rendering evolved over time?
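The questions above all follow the same pattern: check out each revision, run a measurement, and record the result with the commit's timestamp. Here is a minimal Python sketch of that loop. The function names `parse_log` and `measure_history` are made up for illustration, and the sketch assumes Python 3.7+ and a clean clone that you don't mind leaving in a detached-HEAD state.

```python
import subprocess


def parse_log(output):
    """Turn `git log --format='%H %ct'` output into (sha, unix_time) pairs."""
    revisions = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        sha, timestamp = line.split()
        revisions.append((sha, int(timestamp)))
    return revisions


def measure_history(repo, measure_cmd):
    """Run the shell command measure_cmd in every revision of repo, oldest first.

    Returns a list of (unix_time, stripped_stdout) tuples.
    Each revision is checked out in turn, so the working tree is modified
    and the repository ends up on the last visited commit.
    """
    log = subprocess.check_output(
        ["git", "log", "--reverse", "--format=%H %ct"], cwd=repo, text=True)
    results = []
    for sha, timestamp in parse_log(log):
        subprocess.check_call(["git", "checkout", "--quiet", sha], cwd=repo)
        out = subprocess.check_output(
            measure_cmd, cwd=repo, shell=True, text=True)
        results.append((timestamp, out.strip()))
    return results
```

For example, `measure_history("/path/to/clone", "git ls-files | wc -l")` would count the tracked files in each revision. The path is a placeholder, and any command that prints a single value would do.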

My crontab line looks like this:

0 * * * * log-pagecount

The log-pagecount script in question is dead simple:

#!/bin/bash
set -e

cd /home/prvak/master/text
make >/dev/null
# The field number depends on how pdfinfo pads its "Pages:" line with spaces.
COUNT=$(pdfinfo /home/prvak/master/text/thesis.pdf | grep Pages: | cut -d' ' -f11)
DATE=$(date +%Y%m%d%H%M)
echo "$DATE $COUNT" >> /home/prvak/misc/master-pagecount-log.log

SLOC_LOGFILE=/home/prvak/misc/master-sloc-log.log
echo "---- CUT ----" >> $SLOC_LOGFILE
echo $DATE >> $SLOC_LOGFILE
cloc /home/prvak/master/code >> $SLOC_LOGFILE
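Picking the page count by field number is a bit fragile, because it relies on the exact number of spaces pdfinfo prints after "Pages:". If that ever bites, one alternative is to match the line with a regular expression. This is only a sketch: `page_count` is a hypothetical helper, and the sample text is made up to resemble pdfinfo output rather than copied from a real run.

```python
import re


def page_count(pdfinfo_output):
    """Return the page count found in pdfinfo's text output, or None."""
    match = re.search(r"^Pages:\s+(\d+)\s*$", pdfinfo_output, re.MULTILINE)
    return int(match.group(1)) if match else None


# Made-up sample that mimics the "Key:   value" layout of pdfinfo.
sample = "Title:          Thesis\nPages:          42\nEncrypted:      no\n"
print(page_count(sample))  # prints 42
```

The regular expression does not care how many spaces separate the key from the value, so it keeps working if the padding changes.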

The lines-of-code counting is done by the cloc tool (AlDanial/cloc on GitHub). In Arch Linux, you can install it by running pacman -S cloc.

Then, in ~/misc/sloc-count-to-log.py, I have a simple script that changes the format of the lines-of-code log file:

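#!/usr/bin/python
# Convert the cloc log into "DATE SLOC" lines that Gnuplot can read.
with open('master-sloc.tsv', 'w') as f:
    for line in open('master-sloc-log.log'):
        line = line.strip()
        # cloc's summary row starts with SUM; its last column is the code line count.
        if line.startswith('SUM'):
            f.write("%s %s\n" % (date, line.split(' ')[-1]))
        # The timestamp written by log-pagecount precedes each cloc report.
        if line.startswith('2016'):
            date = line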

Then, I have this Gnuplot script in ~/misc/plot-thesis-stats.gnuplot:

set datafile separator ' '
set xdata time
set y2tics
set timefmt "%Y%m%d%H%M"
set key right bottom
plot \
"master-pagecount-log.log" u 1:2 w lines axes x1y1 title "Pages", \
"master-sloc.tsv" u 1:2 w lines axes x1y2 title "SLOC"

To see the pretty graph, I do this:

~ $ cd ~/misc
~/misc $ gnuplot
(...)
gnuplot> load 'plot-thesis-stats.gnuplot'
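The same page-count log can also give a rough answer to "How long until I'm finished?". Below is a small Python sketch that assumes each log line has the form `YYYYmmddHHMM COUNT`, which is what the Gnuplot script reads. The function names, the sample values and the 60-page target are all invented for illustration, and a straight-line extrapolation is of course a crude model of thesis writing.

```python
from datetime import datetime


def parse_pagecount_log(lines):
    """Parse 'YYYYmmddHHMM COUNT' lines into (datetime, int) pairs."""
    samples = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        when = datetime.strptime(parts[0], "%Y%m%d%H%M")
        samples.append((when, int(parts[1])))
    return samples


def pages_per_day(samples):
    """Average growth between the first and last sample, in pages per day."""
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    days = (t1 - t0).total_seconds() / 86400
    return (p1 - p0) / days if days > 0 else 0.0


def days_until(samples, target_pages):
    """Naive linear extrapolation; None when no progress was recorded."""
    rate = pages_per_day(samples)
    if rate <= 0:
        return None
    return max(0.0, (target_pages - samples[-1][1]) / rate)


# Invented sample: 4 pages written over 2 days.
log = ["201609200900 30", "201609220900 34"]
samples = parse_pagecount_log(log)
print(pages_per_day(samples))   # prints 2.0
print(days_until(samples, 60))  # prints 13.0
```

Passing only the last week's samples instead of the whole log would give a rate that reflects recent pace rather than the overall average.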

I have enjoyed procrastinating by writing this post more than looking at my script munching away training samples for my classifiers.

Enjoy!