Using rating data to predict how much people will like a product is trickier than it seems. Even though ratings are often treated as if they were measurements, they are actually rankings. The difference is not just academic. In this blog post I show how using an appropriate model for such data improves prediction accuracy.
Information with a geographical element is best visualised with a map. However, big regions tend to dominate maps regardless of their actual importance. I show possible ways around this issue and let you generate the right data map for your own purposes without needing to code.
The summer of 2018 was a wild one for the rental bike market in Berlin. Many new bike systems pushed into the market early on. By now, two have already left. In between, I counted every rented bike I saw. Which bikes got rented the most?
Using a machine learning algorithm out of the box is problematic when one class in the training set dominates the other. The Synthetic Minority Over-sampling Technique (SMOTE) addresses this problem by generating synthetic examples of the minority class. In this tutorial I'll walk you through how SMOTE works and then how the code of the SMOTE function works.
Is it worth organising your data in a database if all you are interested in is speed? It depends on what you are doing with the data. This guide shows you where SQLite and where R has the speed advantage.
On 24 September Germans will elect a new federal parliament. In this tutorial, I text mine the main parties' election manifestos, derive the latent semantic space and visualise it to see who is closer to whom in German politics.
In just one month, the most populous country in the European Union, Germany, is going to the polls. In this short tutorial, I text mine the main parties' election manifestos in order to visualise the state of German politics.
There are more than 23,000 Germans studying in the Netherlands. Many of them don’t realise that back in Germany they will be penalised. The reason is foreign grade discrimination. What can be done about it?