Berlin bike rental market in 2018
On May 7, I started counting every rental bike I saw in Berlin. I ride my bike to work and the weather has been amazing this summer, so there was a lot to count. This summer has been very special: six new app-based rental services entered a market previously served only by nextbike (Deezer brand), Call-A-Bike (Lidl brand) and Donkey.
By May, the last of the new arrivals (Ofo) had launched. So I started counting, and the figure at the top of this post shows the overall bike rental market shares. I share my analysis code and explain it in detail at the end of this post so you can recreate everything I did.
Mo-Bike is the most popular bike rental app in Berlin right now. Nextbike (under the Deezer brand) is in second place, with almost a third fewer rented bikes in my sample. Call-A-Bike (under the Lidl brand) used to be the Berlin market leader; it now ranks only third, with a market share of 17%. The newest entrant, Ofo, quickly caught up to fourth place overall.
I dug around a bit in the data to look for differences between the days of the week. Is one bike app preferred over another depending on whether it's the weekend or not? The data is not entirely clear, but I suspect not. What is most striking in the next figure is that Mo-Bike dominates on every day of the week.
Changing market
After I stopped counting, two bike rental companies left the market: Ofo and O-Bike. I didn't find O-Bike's exit particularly surprising, as they had an odd pricing structure (see below), few bicycles on offer, and very few rented bikes in my sample.
But Ofo was a surprise. This newcomer did really well overall. However, underneath the good overall performance lies an unexpected development: Ofo pushed hard, surged, and then went bust. These are the bike rental market shares for the seven weeks during which I counted rented bikes.
The picture is even clearer when looking not at individual weeks but at a 7-day rolling average. In the following graph you can see the yellow line of Ofo briefly going up around 1 June, even surpassing Mo-Bike. Afterwards it returns to a market share below 10%, which is where the company had started off.
Moreover, you can see that I wasn’t in Berlin around 18 May (the gap in the data).
Different pricing structures
How can we explain the different market shares? I believe the biggest factors are the availability of bikes and the cost of rentals. Cycling comfort probably only comes third. The only factor I could get some quantifiable information on is cost. The following plot shows the cost of a rental in the different systems as a function of rental duration.
Lime-E is strikingly expensive. However, it is also the only system offering electric-assist bikes. I predict that if Lime-E stays, these bikes will remain a luxury without much market share.
Mo-Bike dominates the market and the cost plot shows why. For a twenty-minute bike ride, they offer the cheapest deal. Byke is equally cheap and, for longer rides, even cheaper. However, they didn't put enough bikes on the streets of Berlin to dominate the market. If Mo-Bike manages to somehow turn a profit from these tiny prices, they stand a very good chance of dominating the rental bike market for the foreseeable future.
What the previous figure doesn't show are the set-up costs of the different rental bike systems. Some apps require a yearly subscription fee or a one-time deposit. Once one takes this into account, O-Bike is the most expensive system by far. A 79 EUR deposit is a huge barrier for new customers. It's really no surprise that O-Bike was so unsuccessful.
The future of rental bikes in Berlin
There are currently huge discussions in Germany as to whether the new app-based rental bike systems 'litter' the streets with bikes. I believe we will look back with envy at the summer of 2018, when there was a huge choice of different rental bikes. If the legal framework remains unchanged (i.e. without regulation), Mo-Bike and perhaps one other system should remain on offer. In the future, there will likely be less choice for Berliners and tourists, who this summer quickly grew accustomed to cheap, high-quality bikes at every corner, waiting for customers with smartphones.
For cyclists, the summer of 2018 was a wild ride.
Time series analysis in Python
We start off the data analysis by importing the pandas module for data analytics and two matplotlib modules for plotting.
    # module import
    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib.ticker as mtick
Next, we set some global parameters, such as the hex colours of the different brands.
    # global parameters
    colours = ['#006183', '#7AB538', '#FF3000', '#FFD800', '#00C930',
               '#00838F', '#FF6A00', '#FF9A24', 'grey']
    weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
                'Saturday', 'Sunday', '0']
    plt.style.use('fivethirtyeight')
We are ready to download the data from GitHub and clean it a bit.
    # data gathering
    df = pd.read_excel('https://github.com/ri/bike_rentals/raw/master/Bike-sharing_sample.xlsx',
                       skiprows=range(4))
    # data cleaning: treat empty cells as zero bikes counted
    df.fillna(value=0, inplace=True)
    # data types: unparseable dates become NaT instead of raising
    df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
    df = df.set_index('Date')
The first plot we produce is this:
    # overall proportion plot
    prop_total = df.loc[:, 'Deezer': 'non-App'].sum() / df.loc[:, 'Deezer': 'non-App'].sum().sum() * 100
    prop_total = pd.DataFrame({'market_share': prop_total, 'colours': colours})
    prop_total.sort_values('market_share', inplace=True, ascending=False)
    fig_total, ax = plt.subplots(figsize=[8, 6])
    prop_total.loc[:, 'market_share'].plot(kind='bar', ax=ax, color=prop_total.loc[:, 'colours'])
    ax.set(ylabel='Market share (May and June 2018)', title='Rental bike market in Berlin')
    # yticks
    ax.set_yticks(range(0, 40, 10))
    fmt = '%.0f%%'  # Format you want the ticks, e.g. '40%'
    yticks = mtick.FormatStrFormatter(fmt)
    ax.yaxis.set_major_formatter(yticks)
    ax.grid(visible=False, axis='x')
    ax.text(0.8, 0.8, 'N = {}'.format(int(df.loc[:, 'Deezer': 'non-App'].sum().sum())),
            horizontalalignment='right', transform=ax.transAxes)
    fig_total.text(0.99, 0.01, '@ri', color='grey', style='italic', horizontalalignment='right')
    plt.tight_layout()
    fig_total.savefig('rental_shares_total.png')
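The market share calculation at the start of the plotting code can be tried out on its own. The following is a minimal sketch with made-up counts; the brand names BrandA to BrandC and all numbers are placeholders, not values from my sample.

```python
import pandas as pd

# Hypothetical daily counts for three made-up brands (not the real sample).
counts = pd.DataFrame(
    {"BrandA": [6, 4], "BrandB": [3, 2], "BrandC": [1, 4]},
    index=pd.to_datetime(["2018-05-07", "2018-05-08"]),
)

totals = counts.sum()                  # bikes counted per brand over all days
share = totals / totals.sum() * 100    # percentage of all bikes counted
share = share.sort_values(ascending=False)
print(share)
```

Dividing the per-brand totals by the grand total is all that the first line of the plotting code does; everything after it is presentation.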
Next, we can define a function to plot the market share over time. Unfortunately, the function needs to be tweaked a bit for showing the development during the week (week day analysis) and the development across weeks (calendar week analysis).
    def prop_plotter(df_count, df_prop):
        fig, ax = plt.subplots(tight_layout=True, figsize=[10, 6])
        df_prop.plot(ax=ax, color=colours, legend=False, marker='.', markersize=20, linewidth=3)
        # x ticks
        if df_prop.index.dtype == 'int64' or df_prop.index.dtype == '<M8[ns]':  # calendar week
            ax.set_xticks(df_prop.index)
        labels = [item.get_text() for item in ax.get_xticklabels()]
        print(labels)
        for i in range(df_prop.shape[0]):
            if df_prop.index.dtype == 'int64':  # calendar week
                labels[i] = 'N={}\n\n{}'.format(int(df_count.loc[df_count.index[i], 'sum']),
                                                int(df_prop.index[i]))
            elif df_prop.index.dtype == 'str':  # weekdays
                labels[i + 1] = 'N={}\n\n{}'.format(int(df_count.loc[df_count.index[i], 'sum']),
                                                    df_prop.index[i])
        ax.set_xticklabels(labels)
        # yticks
        ax.set_yticks(range(0, 50, 10))
        fmt = '%.0f%%'  # Format you want the ticks, e.g. '40%'
        yticks = mtick.FormatStrFormatter(fmt)
        ax.yaxis.set_major_formatter(yticks)
        ax.set(ylabel='rental bike market share in Berlin', title=' ')
        # new legend: brand names written next to the last data point
        if df_prop.index.dtype == 'int64':
            x_pos = ax.get_xticks()
        else:
            x_pos = ax.get_xticks()[1:-1]
        if df_prop.index.dtype == '<M8[ns]':
            x_adjustment = len(x_pos) / 30
        else:
            x_adjustment = len(x_pos) / 50
        for i, brand in enumerate(df_prop.columns):
            print(brand)
            ax.text(x=x_pos[-1] + x_adjustment, y=df_prop.loc[df_prop.index[-1], brand],
                    s=brand, color=colours[i], verticalalignment='center')
        ax.margins(0.15, 0.05)
        ax.grid(visible=False, axis='x')
        # author line
        fig.text(0.99, 0.01, '@ri', color='grey', style='italic', horizontalalignment='right')
        return fig, ax
Thanks to pandas' time series functionality, one can get to the data aggregated by calendar week pretty quickly. We use DataFrame.resample() and give it the argument "7d", meaning seven days. Because the first entry is a Monday, we aggregate periods from Monday to Sunday. The following code produces this figure:
    df_KW1 = df.resample("7d").sum().loc[:, 'Deezer':'sum']
    # DatetimeIndex.week was removed in pandas 2.0; isocalendar() gives the ISO week number
    df_KW1.index = df_KW1.index.isocalendar().week.astype('int64').to_numpy()
    df_KW1.index.name = 'calendar week'
    df_KW1_prop = df_KW1.div(df_KW1['sum'], axis=0).loc[:, :'non-App'] * 100
    fig_KW, ax_KW = prop_plotter(df_KW1, df_KW1_prop)
    ax_KW.set(title='The rise and fall of Ofo')
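To see what the seven-day resampling does in isolation, here is a small sketch on synthetic data. The two brands and their constant daily counts are invented for illustration; only the start date (Monday, 7 May 2018) mirrors my sample.

```python
import pandas as pd

# Hypothetical sample: constant daily counts for 14 days, starting on a Monday.
days = pd.date_range("2018-05-07", periods=14, freq="D")
daily = pd.DataFrame({"BrandA": 1, "BrandB": 2}, index=days)

# Sum the counts in consecutive seven-day bins, anchored at the first day.
weekly = daily.resample("7D").sum()

# Label each bin with the ISO calendar week of its first day.
weekly.index = weekly.index.isocalendar().week.astype("int64").to_numpy()
weekly.index.name = "calendar week"
print(weekly)
```

Because the bins are anchored at the first observation, the Monday-to-Sunday alignment only holds as long as the data starts on a Monday.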
Even the aggregation by day of the week is surprisingly painless. It makes use of DataFrame.index.weekday, which stores which day of the week each entry of a DatetimeIndex corresponds to. We can then use DataFrame.groupby() in order to aggregate by this new weekday index. The following code produces this figure:
    df_d = df.copy()
    df_d.index = pd.Series(weekdays)[df_d.index.weekday]
    df_day1 = df_d.groupby(df_d.index).sum().loc[:, 'Deezer':'sum']
    df_day1.index.name = 'week day'
    df_day1_prop = df_day1.div(df_day1['sum'], axis=0).loc[:, :'non-App']
    df_day1 = df_day1.reindex(weekdays[:-1])
    df_day1_prop = df_day1_prop.reindex(weekdays[:-1]) * 100
    fig_day, ax_day = prop_plotter(df_day1, df_day1_prop)
    ax_day.set(title='Intra-week differences in bike rental market share')
    fig_day.savefig('rental_shares_day.png')
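The weekday aggregation can be sketched on synthetic data as well. Note that this version swaps my lookup list for DatetimeIndex.day_name(), which returns the English day names directly; the single brand and its counts are made up.

```python
import pandas as pd

weekday_names = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

# Hypothetical counts 0..13 over two full weeks, starting on a Monday.
days = pd.date_range("2018-05-07", periods=14, freq="D")
daily = pd.DataFrame({"BrandA": range(14)}, index=days)

# Group rows by the name of their weekday and sum the counts per group.
by_day = daily.groupby(daily.index.day_name()).sum()

# groupby sorts the group labels alphabetically; restore Monday-to-Sunday order.
by_day = by_day.reindex(weekday_names)
print(by_day)
```

The reindex step matters: without it, Friday would come first on the x-axis.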
The rolling average analysis uses the DataFrame.rolling() method. We specify a 7-period rolling window over the data, which we first aggregate by day. If fewer than 4 of the 7 periods in a window are filled, we get a NaN value.
    df_D = df.resample("D").sum().loc[:, 'Deezer':'sum']
    df_rolling = df_D.rolling(7, min_periods=4, center=True).sum()
    df_rolling_prop = df_rolling.div(df_rolling['sum'], axis=0).loc[:, :'non-App'] * 100
    fig_rolling, ax_rolling = prop_plotter(df_rolling, df_rolling_prop)
    ax_rolling.set(title='A volatile bike rental market')
    ax_rolling.set(ylabel='rental bike market share in Berlin \n(7 day rolling average)',
                   xlabel='May June\n2018')
    fig_rolling.savefig('rental_shares_rolling.png')
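The effect of min_periods and center is easiest to see on a tiny series. This is only a sketch: the counts are invented, and missing days are marked as NaN here purely to show when a window is rejected.

```python
import numpy as np
import pandas as pd

# Hypothetical counts: four observed days followed by five missing days.
days = pd.date_range("2018-05-07", periods=9, freq="D")
counts = pd.Series([2, 2, 2, 2] + [np.nan] * 5, index=days)

# Centered 7-day window: each day is summed with up to 3 days before and after.
# A window with fewer than 4 non-missing values yields NaN.
rolled = counts.rolling(7, min_periods=4, center=True).sum()
print(rolled)
```

The first four days each see all four observations inside their window, while later days see three or fewer and therefore come out as NaN.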
Calculating the costs of different bike systems was done thanks to information in this document.
    duration = 120
    df_cost = pd.DataFrame(data={
        'Deezer': [(i//30 + 1) * 1.5 for i in range(duration)],
        'Lidl': [(i//30 + 1) * 1.5 for i in range(30)]
                + [((i//30 + 1) * 1) + 0.5 for i in range(30, duration)],
        'Mo-Bike': [(i//20 + 1) * 0.5 for i in range(duration)],
        'Ofo': [(i//30 + 1) * 0.8 for i in range(duration)],
        'Lime-E': [(i//1 + 1) * 0.15 + 1 for i in range(duration)],
        'Byke': [(i//30 + 1) * 0.5 for i in range(duration)],
        'Donkey': [(i//30 + 1) * 1.25 for i in range(duration)],
        'O-Bike': [(i//30 + 1) * 1 for i in range(duration)]})
    df_cost = df_cost[df.columns[5:-2]]  # reorder columns
    # add subscription fees and deposits
    df_cost_extra = df_cost.copy()
    df_cost_extra['Lidl'] = df_cost_extra['Lidl'] + 3
    df_cost_extra['Mo-Bike'] = df_cost_extra['Mo-Bike'] + 2
    df_cost_extra['O-Bike'] = df_cost_extra['O-Bike'] + 79
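Most of the tariffs above follow the same pattern: every started block of minutes costs a fixed amount. A minimal sketch of that rule, with a helper name (step_price) that is my own invention for this illustration and three tariffs copied from the list comprehensions above:

```python
def step_price(minute, block_minutes, price_per_block):
    """Price in EUR when `minute` full minutes have elapsed (0 = ride just started)."""
    started_blocks = minute // block_minutes + 1  # every started block is charged
    return started_blocks * price_per_block


# (block length in minutes, EUR per block), as encoded in df_cost above
tariffs = {"Mo-Bike": (20, 0.5), "Byke": (30, 0.5), "Ofo": (30, 0.8)}

for minute in (19, 20, 30):
    prices = {brand: step_price(minute, *tariff) for brand, tariff in tariffs.items()}
    print(minute, prices)
```

This makes the comparison between Mo-Bike and Byke concrete: both cost the same up to minute 19, but Mo-Bike starts its second block at minute 20 while Byke only does so at minute 30.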
Having calculated the costs per minute, we can plot them. In order to be able to do that, we define a new function.
    def cost_plotter(df):
        fig_cost, ax_cost = plt.subplots(tight_layout=True, figsize=[10, 6])
        df.plot(drawstyle="steps-post", linewidth=4, color=colours, ax=ax_cost, legend=False)
        ax_cost.set(xlabel='Rental duration (minutes)',
                    ylabel='Cost without deposit or subscription (EUR)',
                    ylim=[0, 8],
                    title='Price differences between rental bikes in Berlin')
        ax_cost.grid(visible=False, axis='x')
        # additional plotting with different line styles
        lstyle = [':', ':', ':', '-', '-', '-', ':', '-', '-']
        for i in [1, 2, 0]:
            df.iloc[:, i].plot(drawstyle="steps-post", linewidth=4, color=colours[i],
                               linestyle=lstyle[i], ax=ax_cost, legend=False)
        # new legend: brand names written next to the end of each line
        x_pos = ax_cost.get_xticks()
        x_adjustment = len(x_pos) / 5
        for i, brand in enumerate(df.columns):
            if (brand == 'Lime-E') and (df.max().max() == df['Lime-E'].max()):
                # Lime-E runs off the top of the plot, so pin its label to the highest tick
                ax_cost.text(x=x_pos[-2] + x_adjustment, y=max(ax_cost.get_yticks()),
                             s=brand, color=colours[i], verticalalignment='center')
            else:
                ax_cost.text(x=x_pos[-2] + x_adjustment, y=df.loc[df.index[-1], brand],
                             s=brand, color=colours[i], verticalalignment='center')
        ax_cost.margins(0.15, 0.05)
        ax_cost.grid(visible=False, axis='x')  # axis must be 'x', 'y' or 'both'
        fig_cost.text(0.99, 0.01, '@ri', color='grey', style='italic', horizontalalignment='right')
        return fig_cost, ax_cost
Having defined the function, the plotting of the costs can be done with the following code.
    fig_cost, ax_cost = cost_plotter(df_cost)
    fig_cost.savefig('rental_costs.png')
    fig_cost_extra, ax_cost_extra = cost_plotter(df_cost_extra)
    ax_cost_extra.set(ylabel='Cost including deposit or subscription (EUR)',
                      ylim=[0, 90],
                      title='Price differences between rental bikes including set-up costs')
    fig_cost_extra.savefig('rental_costs_extra.png')
The complete script which I used for this post can be found on GitHub here.