Differences between cyclists in a bike-share

Introduction

This scenario introduces a fictional bike-sharing company, Cyclistic, operating in the Chicago, IL, area. The business wants to move casual riders to its membership pool to drive recurring revenue. The temperature in December is typically -6 to 0 degrees Celsius, so it makes sense to secure stable subscription revenue in a business category that is so weather-dependent.

As a part of the Marketing Analytics team, I am helping figure out the strategies and communications to set this migration in motion. Therefore, the analysis of this case study is looking to answer a particular question:

How do annual members and casual riders use Cyclistic bikes differently?

To break such a broad question down into more tangible ones, I first need to understand what the available data contains.

Preparation

As this is a Google Data Analyst Professional Certificate case study, Google graciously provided all data in .csv format, seemingly machine-generated, divided by months. That would eliminate most human errors, which would make cleaning more straightforward. I will look at the data from November 2021 to October 2022. I've stored the dataset on bit.io to access all 12 months using SQL queries.

As I look at the table, there are a few distinct parameter groups that I can use for my analysis:

As suspected, there were no issues with duplicates, but once I started to look at null values, inconsistencies began to appear.

Unfortunately, when investigating the location columns, a good ⅕ of the data came up with null values. The station ID columns also showed formatting inconsistencies. Depending on the station, an ID can be five digits, two leading letters with nine digits, two letters with a dash plus three digits, or just three digits. That is a wide variety of formats for a single field. I have some presumptions about what is happening there, but this would require further investigation.
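To put numbers on the station ID formats, each ID could be sorted into one of the four shapes described above. This is only a minimal sketch: the regular expressions are my reading of those descriptions, and the label names and function names are illustrative, not taken from the dataset.

```python
import re
from collections import Counter

# One pattern per format described in the text. The labels are hypothetical.
STATION_ID_PATTERNS = {
    "five_digits": re.compile(r"\d{5}"),
    "two_letters_nine_digits": re.compile(r"[A-Za-z]{2}\d{9}"),
    "two_letters_dash_three_digits": re.compile(r"[A-Za-z]{2}-\d{3}"),
    "three_digits": re.compile(r"\d{3}"),
}


def classify_station_id(station_id):
    """Return the label of the matching format, or 'null' / 'other'."""
    if station_id is None or station_id.strip() == "":
        return "null"
    cleaned = station_id.strip()
    for label, pattern in STATION_ID_PATTERNS.items():
        # fullmatch ensures the whole ID fits the pattern, not just a prefix.
        if pattern.fullmatch(cleaned):
            return label
    return "other"


def summarise_formats(station_ids):
    """Count how many IDs fall into each format bucket."""
    return Counter(classify_station_id(s) for s in station_ids)
```

Anything landing in the "other" bucket would point to a format I have not spotted yet.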

Let's take a step back and reevaluate. The reliability of the location data is questionable so far unless I investigate the coordinates. There are still the rideable type and timestamp parameters, and timestamps alone can give valuable information. Refocusing on the business task, I have enough to work with even without the location data. I still recommend further analysis of location-related details to get more granular marketing targeting information.

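Hence I've chosen to look at three separate lines of inquiry:

Q1: Are there different preferences for a type of bike?
Q2: When do either of the groups choose to cycle?
Q3: What are the differences in time spent 'on the bike'?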

After looking at these questions, I will better understand the differences between these cohorts and how significant they may be.

I started my initial manipulations by looking at only one month, as the total dataset is over 5M records. While Questions 1 and 3 generally don't require any calculated columns, Question 2 needs a ride_length parameter. I created it by subtracting the start timestamp from the end timestamp.
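The ride_length calculation can be sketched in a few lines. I did this step in SQL, so the Python below is only an illustration of the same subtraction. The timestamp layout and the argument names started_at and ended_at are assumptions and not confirmed column names.

```python
from datetime import datetime

# Assumed timestamp layout. Adjust if the export uses a different one.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def ride_length_seconds(started_at, ended_at):
    """Ride length in seconds: end timestamp minus start timestamp."""
    start = datetime.strptime(started_at, TIMESTAMP_FORMAT)
    end = datetime.strptime(ended_at, TIMESTAMP_FORMAT)
    # A negative result means the ride "ended" before it started,
    # which flags a bad record rather than a real trip.
    return (end - start).total_seconds()
```

For example, a ride from "2022-07-01 08:00:00" to "2022-07-01 08:12:30" comes out at 750 seconds.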

As I looked at the ride length distribution, I realised that I needed to filter the data before analysing it further. I can see a significant long tail and a big spike in rides under 1 minute. The spike under a minute can be explained by user experience errors or connection issues, as the ride essentially finishes before it starts. That would also explain why frequencies drop off significantly at around 1 minute and then increase again.

I determined where to cut off the long tail by looking at the member rides in isolation. They stop around the 24-hour mark, so I will cut both groups there to keep the sets comparable. I treat anything over 24 hours as an outlier, as such rides represent under 1% of records.
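The two cut-offs described above can be expressed as one simple filter. This is a sketch under assumptions: the constant and function names are my own, and whether a ride of exactly 1 minute or exactly 24 hours is kept is a choice I made for illustration.

```python
MIN_SECONDS = 60            # drop the spike of rides under 1 minute
MAX_SECONDS = 24 * 60 * 60  # cut the long tail at the 24-hour mark


def keep_ride(ride_length_seconds):
    """True when the ride length falls inside the analysed window."""
    # Both bounds are inclusive here, which is an assumption.
    return MIN_SECONDS <= ride_length_seconds <= MAX_SECONDS


def filter_rides(lengths_seconds):
    """Return only the ride lengths that pass both cut-offs."""
    return [s for s in lengths_seconds if keep_ride(s)]
```

Negative lengths, where the end timestamp precedes the start, are dropped by the same lower bound.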

Analysis

Q1: Are there different preferences for a type of bike?

I will look at the differences over three different timescales - the hour of the day, day of the week and month of the year.

Looking at the data, I can immediately highlight the first conclusion: docked bikes are completely unused by the subscriber base, likely due to the fixed nature of docking, which limits access to their destinations. Therefore I've excluded any data about them because they are irrelevant to our comparison.

Looking at the hourly scale, we can see spikes in member usage around 8:00 and 17:00, which are clear indicators of commuting for those working between the hours of 8 and 17. This suggests that members mainly use Cyclistic to commute to and from their workplaces. Additionally, members use classic bikes more at peak hours. Unfortunately, I do not have the data to ascertain whether this has anything to do with bike availability.

In contrast, the peak of use for casual riders is at 17:00, which suggests travelling to post-workday social or personal activities. In addition, casual riders prefer electric bikes during the average day.
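The hourly view above boils down to counting rides per rider type and start hour. Here is a minimal sketch of that grouping. The input shape, pairs of rider type and start time, and the function names are assumptions for illustration and not the query I actually ran.

```python
from collections import Counter
from datetime import datetime


def rides_per_hour(rides):
    """Count rides keyed by (rider_type, hour of day).

    `rides` is an iterable of (rider_type, started_at) pairs, where
    started_at is a datetime. This layout is hypothetical.
    """
    return Counter((rider_type, started_at.hour) for rider_type, started_at in rides)


def peak_hour(counts, rider_type):
    """Hour of day with the most rides for the given rider type."""
    hourly = {hour: n for (rt, hour), n in counts.items() if rt == rider_type}
    # Raises ValueError if the rider type has no rides at all.
    return max(hourly, key=hourly.get)
```

Adding the rideable type to the key would give the same split per bike type.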

As in the hourly splits, members are relatively indifferent about which bike to use, whether electric or classic, when looking at the days of the week. Also, member usage declines over the weekend, which suggests different lifestyle patterns.

Comparatively, casual riders are more active as the weekend arrives, with a significant demand increase on Saturday, mainly suggesting recreational use. This data view also reinforces casual riders' preference for electric bikes.

One key thing visible immediately is a significant swing around July in bike choices towards electric bikes. I suspect this is due to increased availability, but I cannot corroborate this.

With that in mind, this would explain the significant gap between electric and classic bike usage in June, as a supposed increase in electric bike capacity in July collapsed the gap. More importantly, though, after the increase in electric bike availability, electric bikes became more popular than classic bikes among member riders.

Casual riders exhibited a similar pattern to member riders between June and July but exited the summer months with a clear preference for electric bikes.

To conclude on different preferences for bike types:

Q2: When do either of the groups choose to cycle?

As with the previous question, I will examine this inquiry at multiple levels - the hour of the day and the day of the week, but within a monthly framework.

When looking at the data for the year overall, the expected trends show - low use in the winter months and high use in the summer months. But let's take a closer look, in particular, at the winter and summer months separately to spot any emerging trends.

As evidenced by the spike pattern, member cyclists stick to their commuting patterns throughout the winter months. However, the frequency was reduced by 52% from 175k to 84k for member cyclists. Casual cyclists experienced a 74% reduction in the same period, explained by the average air temperature of -4.92 ℃ recorded in January 2022 (as per Weather Underground).
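As a quick check on the arithmetic, the member figure follows from the usual percentage reduction formula. The function name below is my own, and the inputs are the rounded 175k and 84k quoted above, so the result is approximate.

```python
def percent_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (before - after) / before * 100


# Member rides: 175k down to 84k, using the rounded figures from the text.
member_drop = percent_reduction(175_000, 84_000)
print(round(member_drop))  # 52
```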

As mentioned earlier in the analysis, casual cyclists also see peak demand in the late afternoon, likely due to choosing Cyclistic for social trips.

Looking at the summer period, the member group continues the commuter pattern, with a more distinct afternoon peak, which suggests a subgroup of members who probably commute home by bike. Similar behaviour is visible in the casual rider numbers, as they match afternoon demand more closely than in the winter months.

In our winter cross-section, looking at the weekday timescale, member riders showed a similar pattern to what was visible earlier in our analysis - a slowdown in demand on the weekend compared to the working week.

Detailed analysis revealed that the low demand for both groups starting the weekend before Christmas had skewed the totals towards the two work weeks at the start of the month.

The same pattern of members riding during the workweek and casual riders during weekends continues through the summer months, with a significant demand spike on July weekends from casual riders. Growing demand also coincides with increased electric bike popularity, which may explain the growth. Further investigation is necessary for a more conclusive answer.

As the seasons turn more cycling-friendly, the swing in demand within the week also intensifies: in July, demand grows by 43% from the low on Mondays to the high on Saturdays. In contrast, during freezing January, demand grows by only 30% from the low on Sundays to the high on Thursdays - a signifier of weak casual rider demand.

Finally, in this reporting period, from November 2021 to October 2022, 80.63% of rides happened from April until October, with 76.28% of member rides and 86.92% of casual rides, respectively. While there are other reasons, like the increase in electric bike availability in July, this figure emphasises the difference in demand between the seasons. Additionally, it highlights how much more that demand fluctuates within the casual group.
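The seasonal shares above come from dividing the April to October ride count by the full-year count. Here is a minimal sketch of that calculation. The input shape, a mapping from month number to ride count, and the names are assumptions for illustration.

```python
# April (4) through October (10), matching the period quoted in the text.
WARM_MONTHS = frozenset(range(4, 11))


def warm_season_share(rides_by_month):
    """Percentage of rides that fall in April through October.

    `rides_by_month` maps a month number (1-12) to a ride count.
    """
    total = sum(rides_by_month.values())
    if total == 0:
        raise ValueError("no rides to summarise")
    warm = sum(n for month, n in rides_by_month.items() if month in WARM_MONTHS)
    return warm / total * 100
```

Running it once per rider type gives the separate member and casual shares.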

To conclude:

Q3: What are the differences in time spent 'on the bike'?

For the final angle of inquiry, I will be looking at the duration of rides between the user groups.

When plotting out all intervals, it is apparent that most rides are under about half an hour. So to have a better understanding of these differences, let's zoom in.

I have filtered for the top 75% in this view, essentially dropping the lower quartile. I also simplified the curve by rounding ride lengths to the nearest minute and added back the rideable types to visualise the differences between them.

Firstly, electric bikes have shorter ride lengths, as evidenced by the peak frequency at 4 minutes for members and 6 minutes for casuals. Shorter ride lengths mean bikes become available again sooner and can generate more rides.

Secondly, the top 50% of members' rides are under 10 minutes. Casual riders, in comparison, spend from 4 to 13 minutes when selecting the top half of the set.
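The frequency curves behind these observations rest on rounding each ride to a whole minute and counting how often each value occurs. Here is a minimal sketch of that step. The function names are mine, the half-up rounding rule is an assumption, and the median is included only as a simple summary, not as the exact filter I applied.

```python
from collections import Counter
from statistics import median


def to_nearest_minute(seconds):
    """Round a ride length in seconds to the nearest whole minute (half up)."""
    return int((seconds + 30) // 60)


def minute_frequencies(lengths_seconds):
    """Count rides per rounded minute, giving one point per minute on the curve."""
    return Counter(to_nearest_minute(s) for s in lengths_seconds)


def peak_minute(lengths_seconds):
    """The rounded ride length that occurs most often."""
    return minute_frequencies(lengths_seconds).most_common(1)[0][0]


def median_minutes(lengths_seconds):
    """Median ride length in minutes: half of the rides are at or below it."""
    return median(lengths_seconds) / 60
```

Computing these separately per rider type and rideable type allows the peaks to be compared directly.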

Furthermore, I want to look at whether there are time differences depending on the day of the week. I have filtered these views for the top 50% to highlight the most frequent uses.

Beyond the previously mentioned slowdown in demand over the weekend, member ride length frequencies are uniform throughout an average week. Members might be sticking to the same destinations, as they are primarily commuters, but a more in-depth analysis is required to prove that. Entirely anecdotally, everyone seems to rush on Mondays, as the frequencies drop relatively steeply after the peak. Without further analysis, however, that remains an impression rather than a fact.

More interestingly, casual riders exhibit a more pronounced demand for longer rides during peak days over the weekend. I cannot yet tell whether that is due to a slower pace or increased distance. What the data does show is that weekend rides peak at longer intervals, approximately 8 minutes, compared to 6 to 7 minutes during the week. Weekend rides also hold that length: at the long end of the top-half range, their frequencies stay above the level seen at the 4-minute lower bound, whereas weekday frequencies fall below it.

To conclude:

Conclusion and recommendations

After all this slicing and dicing, it is important to conclude and pull together some valuable insights.

To answer the main question of how annual members and casual riders use Cyclistic bikes differently, let's look again at the three avenues of inquiry I presented earlier.

For our first question, I concluded that docked bikes are unused by the member group. I also found that while members initially had no clear preference for either type of bike, their preference for electric bikes grew as electric bike availability increased.

For our second question, the conclusions were that members are primarily commuters, whereas casual riders would cycle more over the weekends. I also understood that the majority of demand is dependent on weather, as 80% of the trips happened within the warmer months of the year, with casual rider demand being very flexible.

Thirdly, electric bikes have shorter ride lengths, which increases individual bike turnover and enables more rides with the same fleet. The most frequent member rides take under 10 minutes from departure to destination, with little variance in times, whereas casual riders take longer trips and skew towards even longer ones over weekends.

To coalesce these findings, these are a few main takeaways and recommendations:

Finally, these would be some of the areas worth looking into for further analysis:

Thank you for your time.