Working hours of a start-up employee

200 lines of code (Python)

From November 2017 until July 2019 I worked for a seed-funded Berlin start-up. During most of this time I tracked my working hours. In this post I analyse these data.

Development of extra hours during one and a half years

The surprisingly good performance of dumb classification algorithms

140 lines of code (Python)

When evaluating binary classification algorithms it is a good idea to have a baseline for the performance measures. In this blog post I calculate the classification performance of really dumb classifiers. These models do not use any feature information. If your own classification model performs just like them, there is a problem.

Summary of F1-scores of dumb classifiers
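
To make the baseline idea concrete, here is a minimal sketch of the F1-score a feature-free classifier would get. The teaser does not say which dumb classifiers the post covers, so the two variants below (always predict positive, always predict negative) and all function names are assumptions chosen for illustration.

```python
def f1_score(tp, fp, fn):
    """F1 from confusion-matrix counts; taken as 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_always_positive(n_pos, n_neg):
    """Hypothetical dumb classifier that labels every case positive."""
    # Every positive is caught (no false negatives), every negative is a false positive.
    return f1_score(tp=n_pos, fp=n_neg, fn=0)


def f1_always_negative(n_pos, n_neg):
    """Hypothetical dumb classifier that labels every case negative."""
    # No positives are ever predicted, so there are no true positives.
    return f1_score(tp=0, fp=0, fn=n_pos)


if __name__ == "__main__":
    n_total = 1000
    for prevalence in (0.1, 0.5, 0.9):
        n_pos = round(n_total * prevalence)
        n_neg = n_total - n_pos
        print(
            f"prevalence={prevalence:.1f}  "
            f"always-positive F1={f1_always_positive(n_pos, n_neg):.3f}  "
            f"always-negative F1={f1_always_negative(n_pos, n_neg):.3f}"
        )
```

With a share p of positive cases, the always-positive classifier has precision p and recall 1, so its F1-score is 2p / (1 + p). On balanced data that is already about 0.67, which is why a model's F1-score should be compared against such a baseline rather than against zero.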

Predicting typical completion rates of online courses

140 lines of code (Python)

Massive open online courses (MOOCs) did not revolutionise education. Why? They suffer from abysmal completion rates: most students who start a MOOC never finish it. In this blog post I estimate what the completion rates of my own company's e-learning courses would be if we offered them as standard MOOCs.

Modelling rating data correctly using ordered logistic regression

70 lines of code (Python)

Using rating data to predict how much people will like a product is trickier than it seems. Even though ratings are often treated as if they were measurements, they are actually rankings: ordered categories without fixed distances between them. The difference is not just academic. In this blog post I show how using an appropriate model for such data improves prediction accuracy.
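
As a rough illustration of what "an appropriate model" means here, the sketch below computes rating probabilities under the standard ordered logit (proportional odds) formulation, where P(rating <= k) = sigmoid(c_k - eta). The cutpoints, the predictor values and the function names are hypothetical; the teaser does not show the model or library the post actually uses.

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def rating_probabilities(eta, cutpoints):
    """Return P(rating = k) for k = 1 .. len(cutpoints) + 1.

    eta is the linear predictor (higher means the product is liked more);
    cutpoints are strictly increasing thresholds on the same latent scale.
    """
    # Cumulative probabilities P(rating <= k); the top category always reaches 1.
    cumulative = [sigmoid(c - eta) for c in cutpoints] + [1.0]
    probabilities = []
    previous = 0.0
    for cum in cumulative:
        probabilities.append(cum - previous)
        previous = cum
    return probabilities


if __name__ == "__main__":
    # Hypothetical thresholds for a five-star scale; note the unequal spacing.
    cutpoints = [-2.0, -0.5, 0.5, 2.0]
    for eta in (-1.0, 0.0, 1.0):
        probs = rating_probabilities(eta, cutpoints)
        formatted = ", ".join(f"{p:.2f}" for p in probs)
        print(f"eta={eta:+.1f}: [{formatted}]")
```

The model only uses the order of the rating categories. The distances between them are captured by the cutpoints instead of being assumed equal, which is the assumption a plain linear regression on the star values would make.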

Creating the right data map

280 lines of code (R)

Information with a geographical element is often best visualised with a map. However, big regions tend to dominate maps regardless of their actual importance. I show possible ways around this issue and let you generate the right data map for your own purposes without needing to code.

An analysis of the rental bike market in Berlin

200 lines of code (Python)

The summer of 2018 was a wild one for the rental bike market in Berlin. Many new bike systems pushed into the market early on. By now, two have already left. In between, I counted every rented bike I saw. Which bikes got rented the most?

Starting off in data science

200 lines of code (R)

A little more than a year ago, I decided to pursue a career in data science. Today, I work as an educational data scientist for StackFuel, a small start-up in Berlin. How did I do it?

SQL versus R - which is faster?

225 lines of code (R)

Is it worth organising your data in a database if all you are interested in is speed? It depends on what you are doing with the data. This guide shows you where to expect speed advantages from SQLite and where from R.
