Differences between cyclists in a bike-share
Introduction
This scenario introduces a fictional bike-sharing company, Cyclistic, operating in the Chicago, IL, area. The business wants to convert casual riders into annual members to drive recurring revenue. The temperature in December is typically -6 to 0 degrees Celsius, so it makes sense to secure stable subscription revenue in a business that is so weather-dependent.
As a part of the Marketing Analytics team, I am helping figure out the strategies and communications to set this migration in motion. Therefore, the analysis of this case study is looking to answer a particular question:
How do annual members and casual riders use Cyclistic bikes differently?
To break such a broad question into more tangible ones, I first need to understand what the available data contains.
Preparation
As this is a Google Data Analyst Professional Certificate case study, Google graciously provided all data in .csv format, seemingly machine-generated, divided by months. That would eliminate most human errors, which would make cleaning more straightforward. I will look at the data from November 2021 to October 2022. I've stored the dataset on bit.io to access all 12 months using SQL queries.
As I look at the table, there are a few distinct parameter groups that I can use for my analysis:
A record of the type of cycle
Time-related data - timestamps
Geographical data with location distinctions
Geographical coordinate data
A column indicating whether a member or a casual cyclist generated the entry
As suspected, there were no issues with duplicates, but once I started to look at null values, inconsistencies began to appear.
Unfortunately, when investigating the columns with location distinctions, roughly a fifth of the data came up with null values. Other formatting inconsistencies were apparent in the station ID columns. Station IDs come in several formats depending on the station: five digits, two leading letters with nine digits, two letters with a dash plus three digits, and just three digits. I have some presumptions about what is happening there, but this would require further investigation.
Let's take a step back and reevaluate. The reliability of the location data is questionable so far unless I investigate the coordinates. The rideable type and timestamp parameters remain, and timestamps alone can give valuable information. To refocus on the business task: even without location data, I have enough to investigate further. I recommend a separate analysis of location-related details to get more granular marketing targeting information.
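The format check described above can be sketched roughly as follows. This is a minimal illustration, not the query I ran on bit.io: the pattern labels and the sample IDs are made up, and the patterns simply restate the four formats listed in the text.

```python
import re
from collections import Counter

# Patterns mirroring the four station ID formats described above.
# The labels are illustrative, not names used in the dataset.
STATION_ID_PATTERNS = {
    "five_digits": re.compile(r"\d{5}"),
    "two_letters_nine_digits": re.compile(r"[A-Za-z]{2}\d{9}"),
    "two_letters_dash_three_digits": re.compile(r"[A-Za-z]{2}-\d{3}"),
    "three_digits": re.compile(r"\d{3}"),
}


def classify_station_id(station_id):
    """Return the label of the first matching format, or "null" / "other"."""
    if station_id is None or not station_id.strip():
        return "null"
    value = station_id.strip()
    for label, pattern in STATION_ID_PATTERNS.items():
        if pattern.fullmatch(value):
            return label
    return "other"


# Hypothetical sample values, made up for illustration only.
sample_ids = ["13022", "AB123456789", "KA-503", "331", None, "", "Hubbard St"]
format_counts = Counter(classify_station_id(s) for s in sample_ids)
null_share = format_counts["null"] / len(sample_ids)
```

Counting the "null" and "other" buckets on the real table would show how much of the location data is usable and which IDs fall outside the known formats.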
Hence I've chosen to look at three separate lines of inquiry:
Are there different preferences for a type of bike?
When does either group choose to cycle?
What are the differences in time spent 'on the bike'?
After looking at these questions, I will better understand the differences between these cohorts and how significant they may be.
I started my initial manipulations by looking at only one month, as the total dataset is over 5M records. While Questions 1 and 2 generally don't require any calculated columns, Question 3 needs a ride_length parameter. I create it by subtracting the start timestamp from the end timestamp.
As I looked at the ride length distribution, I realised that I needed to filter the data before analysing it further. I can see a significant long tail and a big spike under 1 minute. The spike in rides under a minute can be explained by user experience errors or connection issues, as the ride essentially finishes before it starts. That is why there is a significant drop-off at around 1 minute and a subsequent increase.
I determined where to cut off the long tail by looking at the member rides on their own. They stop around the 24-hour mark, so I cut both groups there to keep the sets comparable. Rides over 24 hours are outliers, representing under 1% of records.
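As a minimal sketch of the ride_length calculation, assuming the timestamps are stored as text in a "YYYY-MM-DD HH:MM:SS" layout (the format string and the function name are assumptions for illustration):

```python
from datetime import datetime

# Assumed timestamp layout; adjust if the export uses a different one.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def ride_length_minutes(started_at, ended_at):
    """Ride length in minutes: end timestamp minus start timestamp."""
    start = datetime.strptime(started_at, TIMESTAMP_FORMAT)
    end = datetime.strptime(ended_at, TIMESTAMP_FORMAT)
    return (end - start).total_seconds() / 60


# Hypothetical rides, including one that crosses midnight.
short_ride = ride_length_minutes("2022-07-01 08:00:00", "2022-07-01 08:12:30")
overnight_ride = ride_length_minutes("2022-07-01 23:50:00", "2022-07-02 00:20:00")
```

Working with full timestamps rather than times of day means rides that cross midnight still come out with a positive length.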
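The two cut-offs can be expressed as a simple filter. This is a sketch under the assumption that ride lengths are already available in minutes; the constant names and the sample values are hypothetical.

```python
# Cut-offs taken from the text: drop rides under 1 minute and over 24 hours.
MIN_RIDE_MINUTES = 1
MAX_RIDE_MINUTES = 24 * 60


def filter_rides(ride_lengths):
    """Return the kept ride lengths and the share of records dropped."""
    kept = [m for m in ride_lengths if MIN_RIDE_MINUTES <= m <= MAX_RIDE_MINUTES]
    dropped_share = 1 - len(kept) / len(ride_lengths) if ride_lengths else 0.0
    return kept, dropped_share


# Made-up ride lengths in minutes, for illustration only.
sample_lengths = [0.2, 0.9, 5, 12.5, 45, 1500]
kept_lengths, dropped_share = filter_rides(sample_lengths)
```

Returning the dropped share alongside the kept rides makes it easy to confirm how many records the filter actually removes.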
Analysis
Q1: Are there different preferences for a type of bike?
I will look at the differences over three different timescales - the hour of the day, day of the week and month of the year.
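A rough sketch of how rides can be counted per rider group and bike type on each of the three timescales. The value labels ("member", "casual", "classic_bike", "electric_bike"), the timestamp layout and the sample rides are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def timescale_key(started_at, timescale):
    """Map a start timestamp to its hour, weekday name or year-month."""
    ts = datetime.strptime(started_at, "%Y-%m-%d %H:%M:%S")
    if timescale == "hour":
        return ts.hour
    if timescale == "weekday":
        return WEEKDAYS[ts.weekday()]
    if timescale == "month":
        return ts.strftime("%Y-%m")
    raise ValueError(f"unknown timescale: {timescale}")


def count_rides(rides, timescale):
    """Count rides per (rider type, bike type, timescale bucket)."""
    counts = Counter()
    for rider_type, bike_type, started_at in rides:
        counts[(rider_type, bike_type, timescale_key(started_at, timescale))] += 1
    return counts


# Hypothetical rides: (rider type, bike type, start timestamp).
rides = [
    ("member", "classic_bike", "2022-07-01 08:05:00"),
    ("member", "electric_bike", "2022-07-01 08:40:00"),
    ("casual", "electric_bike", "2022-07-02 17:15:00"),
    ("casual", "electric_bike", "2022-07-02 17:45:00"),
]
by_hour = count_rides(rides, "hour")
by_weekday = count_rides(rides, "weekday")
by_month = count_rides(rides, "month")
```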
Looking at the data, I can immediately highlight the first conclusion: docked bikes are completely unused by members, likely due to the fixed nature of dockings, which limits access to their destinations. Therefore I've excluded docked bikes from the comparison, as they are irrelevant to it.
Looking at the hourly scale, we can see spikes in member usage around 8:00 and 17:00, which are clear indicators of commuting for those working between the hours of 8 and 17. This suggests members mainly use Cyclistic to commute to and from their workplaces. Additionally, members use classic bikes more at peak hours. Unfortunately, I lack the data to ascertain whether this has anything to do with bike availability.
In contrast, the peak of use for casual riders is at 17:00, which suggests travelling to post-workday social or personal activities. In addition, casual riders prefer electric bikes during the average day.
On the day-of-week scale, as in the hourly splits, members are relatively indifferent about which bike to use, whether electric or classic. Their bike usage also declines over the weekend, which suggests different lifestyle patterns.
Comparatively, casual riders are more active as the weekend arrives, with a significant demand increase on Saturday, mainly suggesting recreational use. This data view also reinforces casual riders' preference for electric bikes.
On the monthly scale, one thing is visible immediately: a significant swing in bike choices towards electric bikes around July. I suspect this is due to increased availability, but I cannot corroborate this.
If so, this would explain the significant gap between electric and classic bike usage in June, which collapsed in July following the supposed increase in electric bike capacity. More importantly, after electric bike availability increased, electric bikes became more popular than classic bikes among member riders, which had not been the case before.
Casual riders exhibited a similar pattern to member riders between June and July but exited the summer months with a clear preference for electric bikes.
To conclude on different preferences for bike types:
Docked bikes are unused by members, likely due to inflexibility.
On average, members choose electric and classic bikes equally.
Casual riders have a clear preference for electric bikes.
As electric bikes became more readily available in July 2022, electric rideables are an emerging choice for members.
Q2: When does either group choose to cycle?
As with the previous question, I will examine this inquiry at multiple levels - the hour of the day and the day of the week, but within a monthly framework.
When looking at the data for the year overall, the expected trends show - low use in the winter months and high use in the summer months. But let's take a closer look, in particular, at the winter and summer months separately to spot any emerging trends.
As evidenced by the spike pattern, member cyclists stick to their commuting patterns throughout the winter months. However, their ride counts fell by 52%, from 175k to 84k. Casual cyclists experienced a 74% reduction in the same period, likely explained by the average air temperature of -4.92 ℃ recorded in January 2022 (as per Weather Underground).
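The reduction figure for members is plain percentage-change arithmetic, sketched here with the rounded counts quoted above (the function name is illustrative):

```python
def percent_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


# Rounded member ride counts quoted in the text: 175k down to 84k.
member_drop = percent_reduction(175_000, 84_000)
```

With these rounded inputs the result comes out at roughly 52%, matching the figure in the text.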
As mentioned earlier in the analysis, casual cyclists also see peak demand in the late afternoon, likely due to choosing Cyclistic for social trips.
Looking at the summer period, the member group continues the commuter pattern, with a more distinct afternoon peak, which probably points to a subset of members who commute home by bike. Similar behaviour is visible in the casual rider numbers, as they match afternoon demand more closely than in the winter months.
In our winter cross-section, looking at the weekday timescale, member riders showed a similar pattern to what was visible earlier in our analysis - a slowdown in demand on the weekend compared to the working week.
Detailed analysis revealed that the low demand for both groups starting the weekend before Christmas had skewed the totals towards the two work weeks at the start of the month.
The same pattern, members during the workweek and casual riders at weekends, continues during the summer months, with a significant demand spike on July weekends from casual riders. Growing demand also coincides with increased electric bike popularity, which may explain the growth. Further investigation is necessary for a more conclusive answer.
As the seasons turn more cycling-friendly, the swing in demand within a week also widens: in July, demand grows by 43% from low-demand Mondays to Saturday highs. In contrast, during freezing January, demand grows by only 30% from low Sundays to high Thursdays - a sign of weak casual rider demand.
Finally, in this reporting period, from November 2021 to October 2022, 80.63% of all rides happened from April until October: 76.28% of member rides and 86.92% of casual rides. While there are other reasons, like the increase in electric bike availability in July, this figure emphasises the difference in demand between the seasons. Additionally, it highlights how flexible that demand is across the casual group.
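The seasonal share can be computed from monthly ride counts along these lines. The monthly totals in the example are made up purely to show the calculation; they are not the Cyclistic figures.

```python
# April to October, the warm period used in the text.
WARM_MONTHS = {4, 5, 6, 7, 8, 9, 10}


def warm_season_share(monthly_counts):
    """Percentage of rides that fall in April-October.

    `monthly_counts` maps a month number (1-12) to a ride count.
    """
    total = sum(monthly_counts.values())
    warm = sum(n for month, n in monthly_counts.items() if month in WARM_MONTHS)
    return 100 * warm / total


# Made-up counts for illustration only.
example_counts = {1: 10, 4: 30, 7: 50, 11: 10}
example_share = warm_season_share(example_counts)
```

Running the same function separately on member and casual counts gives the per-group shares that the comparison relies on.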
To conclude:
Member cyclists primarily use their bikes to commute, with distinct demand increases around commuting times - 8:00 and 17:00, with a decrease in use over the weekend.
Usually, casual users use the service during social hours after 17:00 and on weekends.
The vast majority of demand across the year is over the warm period - from April to October.
Casual rider demand is more flexible annually, as only about 13% of rides happen from November to March.
Members stay on the commuter schedule throughout the year.
Q3: What are the differences in time spent 'on the bike'?
For the final angle of inquiry, I will be looking at the duration of rides between the user groups.
When plotting out all intervals, it is apparent that most rides are under about half an hour. So to have a better understanding of these differences, let's zoom in.
I have filtered for the top 75% in this view, essentially dropping the lower quartile. I also simplified the curve by rounding results to the nearest minute and added back the rideable types to visualise the differences between them.
Firstly, electric cycles have shorter ride lengths, as evidenced by the peak frequency of 4 minutes for members and 6 minutes for casuals. Shorter ride lengths mean bicycles become available again sooner and can generate more rides.
Secondly, the top 50% of members' rides are under 10 minutes. Casual riders, in comparison, spend from 4 to 13 minutes when selecting the top half of the set.
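A small sketch of the rounding and peak-finding step described above, assuming ride lengths in minutes. The sample values are hypothetical, and rounding half up is my choice here, since Python's built-in round() rounds halves to the nearest even number.

```python
import math
from collections import Counter
from statistics import median


def to_whole_minutes(ride_lengths):
    """Round each ride length to the nearest minute (halves round up)."""
    return [math.floor(m + 0.5) for m in ride_lengths]


def peak_minute(ride_lengths):
    """Return the most frequent rounded ride length and its count."""
    histogram = Counter(to_whole_minutes(ride_lengths))
    return histogram.most_common(1)[0]


# Made-up ride lengths in minutes, for illustration only.
sample_lengths = [3.6, 4.2, 4.4, 6.1, 9.8, 12.0]
peak, peak_count = peak_minute(sample_lengths)
median_minutes = median(to_whole_minutes(sample_lengths))
```

The peak shows where rides are most frequent, while the median splits the set in half, which is the kind of cut used for the "top 50%" views.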
Furthermore, I want to look at whether there are time differences depending on the day of the week. I have filtered these views for the top 50% to highlight the most frequent uses.
Beyond the previously mentioned slowdown in demand over the weekend, member ride lengths are uniform throughout an average week. Members might be sticking to the same destinations, as they are primarily commuters, but a more in-depth analysis is required for that to be proven. Purely anecdotally, everyone seems to rush on Mondays, as the frequencies drop off relatively steeply after the peak. That cannot be treated as fact without further analysis, however.
More interestingly, casual riders show more demand for longer rides on their peak days over the weekend. I cannot yet tell whether that is due to a slower pace or increased distance. Weekend rides peak at longer durations, approximately 8 minutes, compared to 6 to 7 minutes during the week. They also hold that length: across the top half of records, frequencies stay above the level seen at the 4-minute starting point. Weekday records, by contrast, fall below the 4-minute level at the long end of the range.
To conclude:
Electric cycles have shorter ride lengths, making cycles available earlier and increasing turnover.
The most frequent member rides are under 10 minutes, and ride lengths stay much the same throughout an average week.
Casual riders ride for longer and have more rides at longer times, especially over the weekend.
Conclusion and recommendations
After all this slicing and dicing, it is time to pull together some valuable insights.
To answer the main question, how annual members and casual riders use Cyclistic bikes differently, let’s look again at the three avenues of inquiry I presented earlier.
Are there different preferences for a type of bike?
When does either group choose to cycle?
What are the differences in time spent ‘on the bike’?
For our first question, I concluded that docked bikes are unused by the member group. I also found that while members initially had no clear preference for either type of bike, their preference for electric bikes grew as electric bike availability increased.
For our second question, the conclusions were that members are primarily commuters, whereas casual riders cycle more over the weekends. I also found that most demand depends on the weather, as 80% of the trips happened within the warmer months of the year, with casual rider demand being very flexible.
Thirdly, electric bikes have shorter ride lengths, therefore increasing individual bike turnover and enabling more rides with the same fleet. Most frequent member cyclists spend under 10 minutes from departure to destination, with little variance in times, whereas casual riders have longer trips and skew towards longer trips over weekends.
To coalesce these findings, these are a few main takeaways and recommendations:
Docked bikes seem to be completely unattractive to the member group, so further investment in this category seems unjustified.
Electric bikes have an upside, both in total ride times and in preferences among user groups. Casual riders preferred them throughout the reporting period, and as availability increased, members started preferring them as well.
Members get the most value from the subscription when they use it regularly, which fits their use of bikes to commute to work. Catering to these patterns may increase subscription uptake.
Finally, these would be some of the areas worth looking into for further analysis:
Location-related information, to see the spatial flows of the member group and assist with bike allocation.
Total bike availability and turnover, to understand whether bikes are utilized efficiently on a per-unit basis.
Information about docking stations, their classification and any patterns that lead to choosing or not choosing them relative to the rest of the fleet.
Weather-related information, with regard to changes in precipitation and the impact they may have on the use of the service.