An analysis of the rental bike market in Berlin

2018 was a wild summer for the bike rental market in Berlin. In the beginning, many new bicycle systems were introduced to the market. By now, two have already left. In the meantime, I’ve counted every rental bike I saw. Which bikes are rented the most?

Berlin bike rental market in 2018

On May 7, I started counting every rental bike I saw in Berlin. I ride my bike to work and the weather has been amazing this summer so there was a lot to count on. This summer has been very special as six new app-based rental services have entered the market that were previously only known by nextbike (Deezer brand), Call-A-Bike (Lidl brand) and Donky.

By May, the last of the new arrivals (Ofo) kicked off. So I started counting, and the number at the top of this post shows the overall bike rental market share. I share my analysis and explain it in great detail at the end of this post so you can recreate everything I did.

Mo-Bike is one of the most popular bike rental apps in Berlin right now. Nextbike (using the Deezer brand) is in second place, with almost a third fewer in my sample of rented bikes. The Berlin bike rental market leader was Call-A-Bike (under the Lidl brand). They rank only third with a market share of 17%. The latest rookie Ofo quickly caught up to fourth overall.

I dug around a bit in the data and tried to find the differences between the days of the week. Whether it’s the weekend or not, is one bike app preferred over the other? The data here is not entirely clear, but I assume it is not. What is most striking in the next picture is that Mo-Bike dominates all days.

Changing market

After I stopped counting, two bike rental companies left the market: Ofo and O-Bike. I didn’t find the O-Bike end particularly surprising as they had an odd pricing structure (see below), few bicycles on offer, and very few rented bikes in my sample.

But Ofo was a surprise. This newbie did a really good job. However, underneath the overall good performance lies an unexpected development. Ofo pressed hard, thundered, and then went broke. These are the bike rental market shares for the seven weeks during which I counted the rented bikes.

The picture is even clearer when not looking at different weeks but instead at a 7-day rolling average. In the following graph you can see the yellow line of Ofo briefly going up aroung 1 June, even surpassing Mo-Bike. Afterwards it returns to a below 10% market share which is where the company had started off.

Moreover, you can see that I wasn’t in Berlin around 18 May (the gap in the data).

Different pricing structures

How can we explain the different market shares? I believe a huge factor is the availability of bikes and the cost of rentals. Cycling comfort probably only comes third. The only factor I could get some quantifiable information on is cost. The following plot shows the cost of rented bikes from different systems over time.

Lime-E is strikingly expensive. However, it is also the only system offering electric-assist bikes. I predict that if Lime-E stays, these bikes will remain a luxury without much market share.

Mo-Bike dominates the market and the cost-plot shows why. For a twenty minute bike ride, they offer the cheapest deal. Byke is equally cheap and over time even cheaper. However, they didn’t put enough bikes on the streets of Berlin in order to dominate the market. If Mo-Bike manages to somehow turn a profit from these tiny prices, they stand a very good chance of dominating the rental bike market for the foreseeable future.

What the previous figure doesn’t show are the set-up costs of different rental bike systems. Some apps require a yearly subscription fee or a one-time deposit. Once one takes this into account, O-Bike is the most expensive system by far. A 79 EUR deposit is a huge barrier for new customers. It’s really no surprise, O-Bike was so unsuccessful.

The future of rental bikes in Berlin

There are currently huge discussions in Germany as to whether the new app-based rental bike systems ‘litter’ the streets with bikes. I believe we will look back with envy at the summer of 2018 when there was a huge choice of different rental bikes. If the legal framework remains unchanged (i.e. without regulation), Mo-Bike and perhaps one other system should remain on offer. In the future, there will likely be less choice for Berliners and tourists who this summer quickly grew accustomed to cheap, high quality bikes at every corner, waiting for customers with smart phones.

For cyclists, the summer of 2018 was a wild ride.

Time series analysis in Python

We start off the data analysis by importing the pandas module for data analytics and two matplotlib modules for plotting.

 # module import  import pandas as pd  import matplotlib.pyplot as plt  import matplotlib.ticker as mtick   

Next, we set some global parameters, such as the hex colours of the different brands.

 # global parameters  colours = ['#006183', '#7AB538', '#FF3000', '#FFD800', '#00C930', '#00838F', '#FF6A00', '#FF9A24', 'grey']  weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday', '0']  plt.style.use('fivethirtyeight')   

We are ready to download the data from github and clean it a bit.

 # data gathering  df = pd.read_excel('https://github.com/ri/bike_rentals/raw/master/Bike-sharing_sample.xlsx',                     skiprows=range(4))    # data cleaning  df.fillna(value=0, inplace=True)    # data types  df.loc[:, 'Date'] = pd.to_datetime(df.loc[:, 'Date'], errors='coerce')  df = df.set_index('Date')   

The first plot we produce is this:

 #overall proportion plot  prop_total = df.loc[:, 'Deezer': 'non-App'].sum()/df.loc[:, 'Deezer': 'non-App'].sum().sum() * 100  prop_total = pd.DataFrame({'market_share':prop_total,                             'colours':colours})  prop_total.sort_values('market_share', inplace=True, ascending=False)  fig_total, ax = plt.subplots(figsize=[8, 6])  prop_total.loc[:, 'market_share'].plot(kind='bar', ax=ax, color=prop_total.loc[:, 'colours'])  ax.set(ylabel='Market share (May and June 2018)', title='Rental bike market in Berlin')    # yticks  ax.set_yticks(range(0, 40, 10))  fmt = '%.0f%%' # Format you want the ticks, e.g. '40%'  yticks = mtick.FormatStrFormatter(fmt)  ax.yaxis.set_major_formatter(yticks)    ax.grid(visible=False, axis='x')  ax.text(0.8, 0.8, 'N = {}'.format(int(df.loc[:, 'Deezer': 'non-App'].sum().sum())),          horizontalalignment='right',          transform=ax.transAxes)  fig_total.text(0.99, 0.01, '@ri', color='grey', style='italic',           horizontalalignment='right')  plt.tight_layout()  fig_total.savefig('rental_shares_total.png')   

Next, we can define a function to plot the market share over time. Unfortunately, the function needs to be tweaked a bit for showing the development during the week (week day analysis) and the development across weeks (calendar week analysis).

 def prop_plotter(df_count, df_prop):        fig, ax = plt.subplots(tight_layout=True, figsize=[10, 6])        df_prop.plot(ax=ax, color=colours, legend=False,                   marker='.', markersize=20,                   linewidth=3)        #x ticks      if df_prop.index.dtype == 'int64' or df_prop.index.dtype == '<m8[ns]'< span="">:  # calendar week          ax.set_xticks(df_prop.index)        labels = [item.get_text() for item in ax.get_xticklabels()]      print(labels)      for i in range(df_prop.shape[0]):          if df_prop.index.dtype == 'int64':  # calendar week              labels[i] = 'N={}nn{}'.format(int(df_count.loc[df_count.index[i], 'sum']),                                              int(df_prop.index[i]))          elif df_prop.index.dtype == 'str':  # weekdays              labels[i + 1] = 'N={}nn{}'.format(int(df_count.loc[df_count.index[i], 'sum']),                                              df_prop.index[i])        ax.set_xticklabels(labels)        # yticks      ax.set_yticks(range(0, 50, 10))      fmt = '%.0f%%' # Format you want the ticks, e.g. '40%'      yticks = mtick.FormatStrFormatter(fmt)      ax.yaxis.set_major_formatter(yticks)        ax.set(ylabel='rental bike market share in Berlin',             title=' ')        # new legend      if df_prop.index.dtype == 'int64':          x_pos = ax.get_xticks()      else:          x_pos = ax.get_xticks()[1:-1]      if df_prop.index.dtype == '<m8[ns]'< span="">:          x_adjustment = len(x_pos)/30      else:          x_adjustment = len(x_pos) / 50      for i, brand in enumerate(df_prop.columns):          print(brand)          ax.text(x=x_pos[-1] + x_adjustment,                  y=df_prop.loc[df_prop.index[-1], brand],                  s=brand,                  color=colours[i],                  verticalalignment='center')        ax.margins(0.15, 0.05)      ax.grid(visible=False, axis='x')        #author line      fig.text(0.99, 0.01, '@ri', color='grey', style='italic',                       horizontalalignment='right')        return fig, ax  </m8[ns]'<></m8[ns]'<> 

Thanks to Python’s time series functionality, one can get to the aggregated data by calendar week pretty quickly. We use DataFrame.resample() and give it the argument "7d" meaning seven days. Because the first entry is a Monday, we aggregate periods from Monday to Sunday. The following code produces this figure:

 df_KW1 = df.resample("7d").sum().loc[:, 'Deezer':'sum']  df_KW1.index = df_KW1.index.week  df_KW1.index.name = 'calendar week'    df_KW1_prop = df_KW1.div(df_KW1['sum'], axis=0).loc[:, :'non-App'] * 100    fig_KW, ax_KW = prop_plotter(df_KW1, df_KW1_prop)  ax_KW.set(title='The rise and fall of Ofo')   

Even the aggregation by day of the week is surprisingly pain free. It makes use of DataFrame.index.weekday which stores which day of the week a DatetimeIndex corresponds to. We can then use DataFrame.groupby() in order to aggregate by this new weekday index. The following code produces this figure:

 df_d = df.copy()  df_d.index = pd.Series(weekdays)[df_d.index.weekday]  df_day1 = df_d.groupby(df_d.index).sum().loc[:, 'Deezer':'sum']  df_day1.index.name = 'week day'    df_day1_prop = df_day1.div(df_day1['sum'], axis=0).loc[:, :'non-App']    df_day1 = df_day1.reindex(weekdays[:-1])  df_day1_prop = df_day1_prop.reindex(weekdays[:-1]) * 100    fig_day, ax_day = prop_plotter(df_day1, df_day1_prop)  ax_day.set(title='Intra-week differences in bike rental market share')  fig_day.savefig('rental_shares_day.png')   

The rolling average analysis uses the DataFrame.rolling() method. We specify a 7 period rolling average for data which we aggregated by day just before. If less than 4 periods during these 7 days are filled, we will get a NaN value.

 df_D = df.resample("D").sum().loc[:, 'Deezer':'sum']  df_rolling = df_D.rolling(7, min_periods=4, center=True).sum()  df_rolling_prop = df_rolling.div(df_rolling['sum'], axis=0).loc[:, :'non-App'] * 100    fig_rolling, ax_rolling = prop_plotter(df_rolling, df_rolling_prop)  ax_rolling.set(title='A volatile bike rental market')  ax_rolling.set(ylabel='rental bike market share in Berlin n(7 day rolling average)',                 xlabel='May                                                  Junen2018')  fig_rolling.savefig('rental_shares_rolling.png')   

Calculating the costs of different bike systems was done thanks to information in this document.

 duration = 120  df_cost = pd.DataFrame(data={'Deezer': [(i//30 + 1) * 1.5 for i in range(duration)],                          'Lidl':[(i//30 + 1) * 1.5 for i in range(30)] + [((i//30 + 1) * 1) + 0.5 for i in range(30, duration)],                          'Mo-Bike':[(i//20 + 1) * 0.5 for i in range(duration)],                          'Ofo':[(i//30 + 1) * 0.8 for i in range(duration)],                          'Lime-E':[(i//1 + 1) * 0.15 + 1 for i in range(duration)],                          'Byke':[(i//30 + 1) * 0.5 for i in range(duration)],                          'Donkey':[(i//30 + 1) * 1.25 for i in range(duration)],                          'O-Bike':[(i//30 + 1) * 1 for i in range(duration)]})  df_cost = df_cost[df.columns[5:-2]]#  reorder columns  df_cost_extra = df_cost.copy()  df_cost_extra['Lidl'] = df_cost_extra['Lidl'] + 3  df_cost_extra['Mo-Bike'] = df_cost_extra['Mo-Bike'] + 2  df_cost_extra['O-Bike'] = df_cost_extra['O-Bike'] + 79   

Having calculated the costs per minute, we can plot them. In order to be able to do that, we define a new function.

 def cost_plotter(df):        fig_cost, ax_cost = plt.subplots(tight_layout=True, figsize=[10, 6])      df.plot(drawstyle="steps-post", linewidth=4, color=colours,               ax=ax_cost,               legend=False)      ax_cost.set(xlabel='Rental duration (minutes)',         ylabel='Cost without deposit or subscription (EUR)',         ylim=[0, 8],         title='Price differences between rental bikes in Berlin')      ax_cost.grid(visible=False, axis='x')        # additional plotting with different line styles      lstyle = [':', ':', ':',                   '-', '-', '-',                   ':', '-', '-']      for i in [1, 2, 0]:          df.iloc[:, i].plot(drawstyle="steps-post",                      linewidth=4, color=colours[i],                      linestyle=lstyle[i],              ax=ax_cost,              legend=False)        # new legend      x_pos = ax_cost.get_xticks()      x_adjustment = len(x_pos)/5      for i, brand in enumerate(df_cost.columns):          if (brand == 'Lime-E') and (df.max().max() == df['Lime-E'].max()):              ax_cost.text(x=x_pos[-2] + x_adjustment,                  y=max(ax_cost.get_yticks()),                  s=brand,                  color=colours[i],                  verticalalignment='center')          else:              ax_cost.text(x=x_pos[-2] + x_adjustment,                  y=df.loc[df.index[-1], brand],                  s=brand,                  color=colours[i],                  verticalalignment='center')        ax_cost.margins(0.15, 0.05)      ax_cost.grid(visible=False, axis=1)        fig_cost.text(0.99, 0.01, '@ri', color='grey', style='italic',                       horizontalalignment='right')      return fig_cost, ax_cost   

Having defined the function, the plotting of the costs can be done with the following code.

 fig_cost, ax_cost = cost_plotter(df_cost)  fig_cost.savefig('rental_costs.png')    fig_cost_extra, ax_cost_extra = cost_plotter(df_cost_extra)  ax_cost_extra.set(ylabel='Cost including deposit or subscription (EUR)',                    ylim=[0, 90],                    title='Price differences between rental bikes including set-up costs')  fig_cost_extra.savefig('rental_costs_extra.png')   

The complete script which I used for this post can be found on github here.

Like this post? Share it with your followers or follow me on Twitter!

Berlin bike rental market in 2018

Changing market

Different pricing structures

The future of rental bikes in Berlin

Time series analysis in Python

Leave a Comment Cancel Reply