Skip to the content.
My Case Studies

Cyclistic Case Study Analysis 2020

Using Spreadsheets, Google Bigquery (SQL), and Tableau

By Cristian Arevalo

Last Updated November 14, 2023

Table of Contents


About This Project


This case study is my capstone project for the Google Data Analatics Professional Certificated (via Coursera).

Here, I assume the role of a junior data analyst working in a marketing analyst team at Cylistic, a fictional bike-share company stationed in Chicago based on Divvy bikes. The objective is to understand how casual riders and annual members use Cyclistic bikes differently by identifying trends and more. These insights will help the marketing team to develop new marketting strategies to convert casual riders into annual members. But first, Cyclistic executives must be compelled with data insights and professional data visualizations.

The main tools I use are spreadsheets/excel, SQL and Tableau. Here are the highlights:

A more in-depth breakdown of the case study scenario is included below, followed by my full report.

The analysis is based on the Divvy case study “‘Sophisticated, Clear, and Polished’: Divvy and Data Visualization” written by Kevin Hartman (found here)

Additionally, all data cleaning and analysis is accomplish through the use of SQL in Google CLoud BigQuery (Sandbox Edition). Python is used to upload large datasets (over 10MBs), the rest are manually uploaded through the site from csv files. Spreadsheets (Excel) is used to inspect data closely after being broken down through SQL queries. Finally, Tableau is used to provided compelling visuals.

Introduction


Cyclistic is a bike-share company based in Chicago, USA. In 2016, Cyclistic Launched a successful bike-share program offering bike rentals throughout the city. Since then, the program has expanded to a fleet over 5,800 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.

Divvy Bikes in Chicago

Based on Divvy Bikes Stationed in Chicago

Customers are divided into two classes: casual riders (those who purchase either the single-ride or full-day passes) and Cyclistic members (who purchase annual memberships).

Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. The flexible pricing plans (single-ride passes, full-day passes, and annual memberships) aided in bringing in new customers.

Although pricing flexibility helps in attracking new customers, Cyclistic’s finance analyst have concluded that annual members are much more profitable than casual riders. The director of marketting (Lily Moreno) believes that increasing the number of annual members will be the key to future growth. Since casual riders have chosen Cyclistic for their mobility needs and are already aware of the annual membership program, a marketing strategy aimed at converting casual riders into members is to be held.

Defining The Problem


Lily, the director of marketing, and the marketing analytics team are looking for ways to convert Cyclistic’s casual riders into annual members. The main problem is designing marketing strategies to accomplishing this task by identifying and understading the difference between casual riders & member riders in their usage of Cyclistic Bikeshare program. There are three questions that will guide the marketing team in their future marketing program.

The marketing team would like to know:

Specifically, my focus will revolve around on how do annual members and casual riders use Cyclistic bikes differently.

By looking at the data, we will be able to first get a broad sense of certain patterns that are occurring in the two different groups. Understanding the differences will provide more accurate customer profiles for each group. These insights will help the marketing analytics team design high quality targeted marketing for converting casual riders into members. For the Cyclistic executive team, these insights will help Cyclistic maximize the number of annual members and will fuel future growth for the company.

Business Task

Analyze historical bike trip data to identify trends in how annual members and casual riders use Cyclistic bikes differently.

About Data Sources


Prep Work


The data is downloaded, skimmed, prepared, cleaned, and manipulated. This section deals with all the dirty work prior to the actual analysis and visualization. The prep-work lays the foundation of our data problem solving, and takes more than twice the time and effort as the analysis.

If you are more interested with the analysis, and results, you can jump ahead to Data Summary or if you want a brief peak at how the analysis is performed please check out Performing Analysis.

Step 1: Collect Data


Step 2: Wrangle Data and Combine into a Single Table


Preliminary Inspection

Once data is uploaded, it is important to compare schemas (e.g. column names and data type) for each of the tables and inspect the tables through preview to look for incongruencies. Here are the results:

Fixing Column Data Type

The following query is ran for multiple tables to correct column types.

CREATE OR REPLACE TABLE `project.dataset.table` AS (
  SELECT * REPLACE (
    CAST(started_at AS TIMESTAMP) AS started_at, 	
    CAST(ended_at AS TIMESTAMP) AS ended_at, 
    CAST(start_station_id AS STRING) AS  start_station_id, 
    CAST(end_station_id AS STRING) AS end_station_id 
    )
  FROM `project.dataset.table`);

Combine Trip Data

The bike trip data is divided into 10 tables, however we interested in looking at all rides occuring during 2020.

The following query is used to create a new table, called divvy_trips_2020 for reference, containing all trip data occuring in 2020 is created.

CREATE TABLE IF NOT EXISTS `case-study1-bike-share.divvy_trips_2020_analysis.divvy_trips_2020`
SELECT * FROM `case-study1-bike-share.divvy_trips_2020_data.divvy_trips_2020_*`;

Inspecting Combined Trip Data

Briefly skiming the new table shows that:

There is still a need to check for nulls or missing values.

Step 3: Clean Up and Add Data to Prepare for Analysis


Key Problems and Solutions

The first part of tidying data is look for issues/concerns regarding the data. Here are some of them:

  1. Some station names can have more than one id.
    • Create new unique station ids.

    • Prior to December ids were unique intergers, afterward alphanumeric ids were added but old ids were still being used.

    • The result, the station id is sometimes in accordance with the previous data, at times missing, and at others a combination of alphanumeric characters.

  2. Some station names have duplicates that end with a word or symbol in parathesis.

    • Establish a single unique name per station.

    • There are two sources of duplicates: those that end with “(*)” and those that end with “(Temp)”.

  3. There are some station names that corresponds to quality checks or other.
    • Filter these stations when interest on insights only about customers

    • Relevant NEW IDS: 310, 311, 312, 631, 455, and 45

      • HQ QR (id=310)
      • Hubbard st bike checking (Lbs-wh-test) (id=311)
      • Hubbard_test_lws (id=312)
      • Watson Testing-divvy (id=633)
      • Base-2132 W Hubbard Warehouse (id=45)
      • Mt1-Eco5.1-01 (id=455)
  4. The data can only be aggregated at the ride-level, which is too granular.
    • Add additional columns of data, such as the weekday & month when trips begin. This provide additional opportunities to aggregate the data.
  5. There is no field measuring trip duration.
    • Create a new column indicating ride_length and Scrutinize ride duration.

    • This presents more opportunities to compare user behavior between casual and member riders.

  6. Additionally, there are some rides were trip durations are negative.
    • Remove bad data.

    • This includes several hundred rides where Divvy took bikes out of circulation for Quality Control reasons.

    • This may also correspond to early cancellation times of rides by users.

  7. Some crucial data is missing in columns with Null values.
    • These instances are represented by Null values and it is important to be aware of them for data cleaning.

Checking For Missing Data

First order of business is finding out if anything important is missing from the data, so we check which columns contain NULLS using the following query.

SELECT column_name, COUNT(1) AS nulls_count
FROM `case-study1-bike-share.divvy_trips_2020_analysis.divvy_trips_2020` as table1,
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(table1), r'"(\w+)":null')) column_name
GROUP BY column_name
ORDER BY nulls_count DESC

Results

After checking for nulls or missing data, we discover that some important information is missing from the trips: start/end station names & ids.

The table below showcases which columns have missing information and how many rows are missing.

column_name nulls_count
end_station_id 111342
end_station_name 110881
start_station_id 95282
start_station_name 94656
end_lat 4255
end_lng 4255

Checking For Duplicate Data

A common problem often encounter while cleaning data is duplicate names, which we’ll tackle in this section. Specifically, we will be looking for duplicate station names or multiple ids for one station using the following query.

WITH all_stations AS (
  SELECT start_station_name, start_station_id, end_station_name, end_station_id
  FROM `case-study1-bike-share.divvy_trips_2020_analysis.divvy_trips_2020` 
)
SELECT DISTINCT station_name, station_id
FROM ( (SELECT DISTINCT start_station_name AS station_name, start_station_id AS station_id FROM all_stations)
UNION ALL (SELECT DISTINCT end_station_name AS station_name, end_station_id AS station_id FROM all_stations)
)
GROUP BY station_name, station_id
ORDER BY station_name

Results

After querying a list of stations with their respective ids, the outcome is downloaded as a csv file to view in EXCEL (unfortunately, some station names were cut off when viewing the table in BigQuery).

Below are two primary examples showcasing duplicates by either having the different ids or names.

station_name station_id
Damen Ave & Walnut (Lake) St 656
Damen Ave & Walnut (Lake) St KA17018054
Damen Ave & Walnut (Lake) St (*) 656
Wentworth Ave & Cermak Rd 120
Wentworth Ave & Cermak Rd 13075
Wentworth Ave & Cermak Rd (Temp) 120

Creating Table About Stations

Due to the present duplicate data occuring in start & end stations, there is a need to fix the problem to not have excess information and be concise. A way to address these stations is by creating a new table as a list of station information. The desired parameters will be station_name, station_id, lat, lng, and later down the line we will add the number of users per member_casual as member_riders and casual_riders (to display overall station activity for marketing purposes).

A summary of the steps taken to create a new table showcasing the station info is presented below (meanwhile, the full query will be available here):

  1. The new table will be called divvy_stations_2020.

  2. We begin by identifying unique station names from start & end station names, fixing duplicates to only present unique station names.

  3. Then, we create unique station ids that correspond to only one station.

  4. Finally, we average the latitude and longitude values of all ride trips corresponding to the stations to determine the lat and lng geo-location of each station.

    • Now that we have the most basic station information, we need consider the trip data from 2020 where we notice that the geo-location can be different for the same station.

    • Latitude / Longitude values for the stations are included in the same row as each ride’s information. Since each bike has its own GPS device, there is slight variance in the lat/long values of every station per ride. However, each station can only have one unique geo-location, so we take mean value for all respective lat/long values of a station.

    • These values are then used to replaced the start and end station lat/long values for each ride.

Results

The full table is available here: cleaned and uncleaned. Below are the first 5 results.

station_name station_id lat lng
2112 W Peterson Ave 1 41.991161729691882 -87.683592172455633
63rd St Beach 2 41.781052497804261 -87.57612809187053
900 W Harrison St 3 41.8747676018918 -87.649804773961
Aberdeen St & Jackson Blvd 4 41.877720779404484 -87.654792138860145
Aberdeen St & Monroe St 5 41.880414761538468 -87.655516799346827
Aberdeen St & Randolph St 6 41.884114663376479 -87.654270784691533

Cleaning Operations

Now that we have established a solid foundation of information about each station, we will use this data (station_name, station_id, lat, lng) to replace the trips corresponding information about start & end stations. The full query is available here and the verification of the results is available here, however below I will discussion a brief overview of the steps taken.

  1. The goal of this new query is to create a new version of divvy_trip_data (v2), where certain cases are excluded, to make analysis of the trip data easier.

  2. Rides are removed when:
    • Station names are missing (meaning they are NULL) from start or end station names because these rides cannot be tracked.
      • The new station ids will be assigned based on their station names.
    • Trip durations that are less than 60 seconds because these ride lengths are similar to a false start.
      • Moreover, the first minute of all rides are charged a fix fee (even if rides are less than a minute).
    • Trip durations are greater than 24 hours because Cyclistic bikes are expected to be returned to an appropriate station once they are no longer being used.
      • Keeping a bike longer than a day is synonymous to stealing it and such cases result in an additional fee.
  3. Station names are altered to prevent duplicate names.
    • Duplicates end with a word or symbol in parathesis.

    • This makes implementing new station ids smoother.

  4. Using divvy_stations_2020, station information is updated for both start & end stations.
    • Specifically, the following parameters are updated (8 parameters total):

      • Station name
      • Station id
      • Starting latitude
      • Startting longitude
  5. Additional data is added to aggregate trip info more easily during analysis
    • The following fields of data are implemented:

      • Starting month number (1-12) as month_num
      • Starting month name as starting_month
      • Starting day of the week (number) as weekday_num [1-7]
      • Starting day of the week as weekday [Sun-Sat]
      • Trip duration as ride_length measured in seconds

Results

ride_id rideable_type started_at ended_at start_station_name start_station_id end_station_name end_station_id start_lat start_lng end_lat end_lng member_casuaL month_num starting_month weekday_num weekday ride_length
99C72397983C1DF5 docked_bike 2020-08-10 23:49:04.000000 UTC 2020-08-11 12:49:16.000000 UTC Lake Park Ave & 35th St 351 Lake Park Ave & 35th St 351 41.831275652901688 -87.6087894057451 41.831275652901688 -87.6087894057451 casual 8 Aug 2 Mon 46812
08306A0B4F8B9C44 docked_bike 2020-08-10 23:49:04.000000 UTC 2020-08-11 00:56:14.000000 UTC Lake Park Ave & 35th St 351 Lake Park Ave & 35th St 351 41.831275652901688 -87.6087894057451 41.831275652901688 -87.6087894057451 casual 8 Aug 2 Mon 4030
7A3E9EC138E2D0AD docked_bike 2020-08-10 23:49:17.000000 UTC 2020-08-11 00:18:37.000000 UTC Stockton Dr & Wrightwood Ave 586 Kedzie Ave & Milwaukee Ave 333 41.931312260664328 -87.638733542147918 41.929576232244784 -87.707870112443771 casual 8 Aug 2 Mon 1760
5858B7C1FC4A89A2 docked_bike 2020-08-10 23:49:28.000000 UTC 2020-08-11 00:57:04.000000 UTC Lake Park Ave & 35th St 351 Lake Park Ave & 35th St 351 41.831275652901688 -87.6087894057451 41.831275652901688 -87.6087894057451 casual 8 Aug 2 Mon 4056
CDF2EA7B3F5ED5FF docked_bike 2020-08-10 23:49:48.000000 UTC 2020-08-11 00:39:13.000000 UTC Halsted St & 59th St 286 Halsted St & 59th St 286 41.787533838414639 -87.644880773373984 41.787533838414639 -87.644880773373984 casual 8 Aug 2 Mon 2965

Verified

Number of quality check uses in cleaned trip data

member_casual quality_check_users
casual 71
member 197

From the following stations:

station_name station_id
Base - 2132 W Hubbard Warehouse 45
HQ QR 310
Hubbard St Bike Checking (Lbs-Wh-Test) 311
Hubbard_Test_Lws 312
Mtl-Eco5.1-01 455
Watson Testing - Divvy 633

Manipulating Data

After the cleaning operation, our data is nearly ready for analysis. But before that, we manipulate it to ease our analysis.

Results: Top 5 Starting Stations

station_name station_id lat lng total_rides_from total_rides_to members_from_st casuals_from_st members_to_st casuals_to_st
Clark St & Elm St 135 41.902963895113132 -87.631312305029311 31012 31180 19886 11126 20565 10615
Kingsbury St & Kinzie St 345 41.889192992430722 -87.638501411531365 21882 21337 16260 5622 16454 4883
St. Clair St & Erie St 564 41.894421578032492 -87.622700546409931 21188 23853 15559 5629 17410 6443
Dearborn St & Erie St 208 41.894004025551475 -87.629318830077878 23559 24134 15453 8106 16039 8095
Broadway & Barry Ave 56 41.937610986894825 -87.644098303068489 23167 24057 15372 7795 15787 8270

Step 4: Perform Analysis


Now that the preliminary work is done we can begin conducting a detailed analysis on the data. But first we need to recall the goal: to discover how do annual members and casual riders use Cyclistic bikes differently. The full query used for performing analysis is provided here.

Specifically, we are interested in analyzing user behavior in the following ways:

  1. Weekly

  2. Monthly

  3. Time of day

  4. Type of rideable

  5. And geographically

Cyclistic Weekly Analysis

Riders behave very different based on weekly trips. During the weekly analysis, we scrutinize the following parameters:

Resulting Table

This table is available for download here.

member_casual weekday_num weekday avg_ride_len_secs avg_ride_len max_ride_len min_ride_len num_rides part_of_tot_rides
casual 1 Sun 2520 00:42:00 23:55:54 00:01:01 242713 0.072880308537139046
casual 2 Mon 2239 00:37:19 23:59:49 00:01:01 131039 0.039347553490740761
casual 3 Tue 2101 00:35:01 23:54:26 00:01:01 127485 0.038280381083243054
casual 4 Wed 2061 00:34:21 23:57:59 00:01:01 141395 0.04245718698878418
casual 5 Thu 2073 00:34:33 23:48:04 00:01:01 150913 0.045315191202223465
casual 6 Fri 2183 00:36:23 23:58:33 00:01:01 188455 0.056588063043044826
casual 7 Sat 2427 00:40:27 23:59:54 00:01:01 296754 0.089107394658012387
member 1 Sun 1044 00:17:24 23:29:04 00:01:01 254494 0.076417831928453209
member 2 Mon 857 00:14:17 23:17:37 00:01:01 276207 0.082937672807462157
member 3 Tue 855 00:14:15 23:56:39 00:01:01 298331 0.089580926139898681
member 4 Wed 861 00:14:21 23:39:52 00:01:01 311855 0.093641826432245057
member 5 Thu 870 00:14:30 23:46:15 00:01:01 310516 0.093239760069375219
member 6 Fri 901 00:15:01 23:52:08 00:01:01 307400 0.092304107502756513
member 7 Sat 1036 00:17:16 23:59:43 00:01:01 292739 0.087901796116621467

Cyclistic Monthly Analysis

Riders also behave very differently based on monthly trips. During the monthly analysis, we scrutinize the following parameters:

Resulting Table

This table is available for download here.

member_casual month_num starting_month avg_ride_len_secs avg_ride_len max_ride_len min_ride_len num_rides part_of_tot_rides
casual 1 Jan 2273 00:37:53 23:48:04 00:01:05 7678 0.002305500772303723
casual 2 Feb 2364 00:39:24 23:55:55 00:01:06 12206 0.0036651396752721077
casual 3 Mar 2474 00:41:14 23:53:10 00:01:01 24418 0.0073320809922000924
casual 4 Apr 2372 00:39:32 23:58:33 00:01:01 23331 0.0070056835788770725
casual 5 May 2678 00:44:38 23:53:20 00:01:01 85914 0.0257977068705004
casual 6 Jun 2554 00:42:34 23:55:42 00:01:01 152794 0.045880005861340854
casual 7 Jul 2686 00:44:46 23:58:54 00:01:01 265559 0.079740359415499409
casual 8 Aug 2273 00:37:53 23:59:49 00:01:01 277794 0.083414207025441583
casual 9 Sep 1961 00:32:41 23:59:33 00:01:01 212041 0.063670316392296658
casual 10 Oct 1692 00:28:12 23:59:54 00:01:01 120624 0.03622020384974789
casual 11 Nov 1763 00:29:23 23:55:03 00:01:01 72106 0.021651528873109178
casual 12 Dec 1436 00:23:56 23:41:27 00:01:01 24289 0.0072933456965987411
member 1 Jan 652 00:10:52 23:20:02 00:01:01 134865 0.040496400319971558
member 2 Feb 666 00:11:06 23:52:08 00:01:01 125383 0.037649205956467535
member 3 Mar 784 00:13:04 23:41:42 00:01:01 114351 0.034336587498528658
member 4 Apr 1094 00:18:14 23:56:39 00:01:01 60215 0.018080975384770603
member 5 May 1172 00:19:32 23:29:04 00:01:01 111415 0.03345498418158626
member 6 Jun 1107 00:18:27 23:46:15 00:01:01 184961 0.055538907052105881
member 7 Jul 1050 00:17:30 23:52:12 00:01:01 276113 0.082909447088186761
member 8 Aug 980 00:16:20 22:05:05 00:01:01 318269 0.0955677813623774
member 9 Sep 901 00:15:01 22:59:48 00:01:01 279042 0.083788948489863971
member 10 Oct 830 00:13:50 23:59:43 00:01:01 211750 0.063582936771986634
member 11 Nov 811 00:13:31 23:36:24 00:01:01 147124 0.044177454496537247
member 12 Dec 748 00:12:28 22:49:28 00:01:01 88054 0.026440292394429805

Cyclistic Time of Day Analysis

Riders appear to follow notable trends based on the start of their trips. During the monthly analysis, we scrutinize the following parameters:

Resulting Table

This table is available for download here.

member_casual weekday_num weekday time_of_day num_rides part_of_tot_rides
casual 1 Sun 0 5504 0.0016527059456576832
casual 1 Sun 1 3812 0.0011446429987004157
casual 1 Sun 2 2264 0.00067981945148419239
casual 1 Sun 3 1176 0.00035312176455185967
casual 1 Sun 4 714 0.00021439535704934336
casual 1 Sun 5 591 0.00017746170310386825
casual 1 Sun 6 1128 0.000338708631304845
casual 1 Sun 7 1912 0.0005741231410060848
casual 1 Sun 8 3546 0.0010647702186232094
casual 1 Sun 9 6924 0.0020790944708818674
casual 1 Sun 10 11461 0.0034414358363340676
casual 1 Sun 11 17117 0.0051397833706072977
casual 1 Sun 12 20244 0.0060787389469284412
casual 1 Sun 13 23164 0.0069555378861218342
casual 1 Sun 14 24292 0.0072942465174266795
casual 1 Sun 15 24548 0.0073711165614107578
casual 1 Sun 16 22629 0.0067948915051394831
casual 1 Sun 17 19534 0.0058655446843163487
casual 1 Sun 18 16201 0.0048647327444767676
casual 1 Sun 19 12597 0.0037825466565134152
casual 1 Sun 20 9089 0.0027291868350440922
casual 1 Sun 21 5993 0.0017995397406116453
casual 1 Sun 22 4722 0.001417891983175069
casual 1 Sun 23 3551 0.0010662715866697735
casual 2 Mon 0 2210 0.0006636046765813009

Cyclistic Rideable Analysis

Riders have a higher tendency to use certain rideable types than others. This analysis aims to ascertain which rideables are more favored. During the analysis we look out for the following parameters:

Resulting Table

This table is available for download here

rideable_type member_casual num_rides part_of_tot_rides
classic_bike casual 11158 0.0033504529327122875
classic_bike member 58496 0.017564805050361891
docked_bike casual 1125692 0.33801559981455104
docked_bike member 1784073 0.53571003898752545
electric_bike casual 141904 0.042610026255924395
electric_bike member 208973 0.062749076958924968

Cyclistic Geographic Analysis

Members and casual riders concentrate their trips on different destinations. Members spreadout throughout Chicago, meanwhile casual users are more focus near the pier. Moreover, when looking at the starting stations, riders follow a notable trend based on their start

Riders appear to follow notable trends based on the start of their trips. During the monthly analysis, we scrutinize the following parameters:

Resulting Table

This table is available for download here.

Data Summary


The data can be summarized as below:

Table: Summary of Trips

summary_of started_with ended_with eliminated retained (%)
rides 3,541,683 3,330,296 211,387 94.031
stations 696 683 13 98.132

Table: Summary of Riders

member_casual num_riders percent avg_ride
casual 1,278,754 38.40 37:55
member 2,051,542 61.60 15:15

Data Story: Cyclistic Bikeshare Analysis of 2020


For the full interactive Tableau Story please visit here!.

Below you will find key insights pertaining to each dashboard within the story, you are more than welcome to have a look.

Customer Usage

Seasonal trends played a major role in the number of rides Cyclistic experienced. During the year 2020, casual riders accounted for 38.4% of total rides while member riders made up a whopping 61.6% of total trips. Summer time brought a huge increase in the number of casual riders, in addition to overall ridership. Meanwhile, during winter time, by far the least popular season, there is a major drop-off in rides. This makes sense because bike riding is more suited for warm weather, however it important to plan out events, activities, or promotional content during Cyclistic’s most popular time: summer.

By taking advantage of the seasonal trend, and the fact that casual riders are already familiar with the Cyclistic program and using Cyclistic bikes, this can be a good opportunity to convert them to annual members.

It is also important to note user preference when it comes to which rideable popularity. Docked bikes are by far the most popular but electric bikes also receive their fair share of attention. Moving forward, focusing on these bikes will continue to yield more customers.

Monthly Analysis

Taking a deeper dive into how customers use Cyclistic bikes throughout the year, we consider the number of trips taken & their average ride length on a month-to-month basis.

Seasonal Trend

We confirm once more the favorite season for riders is summer. Moreover, a rising trend is seen during early spring, when the number of rides begin increasing around May. Peak usage occurs during August, then we see a falling trend up during fall, ending with the least favorite cycling period: winter.

Best Time For Marketting Campaign

Since we know that riders prefer biking during summer time and we see a large influx of trips taken during late Spring, this means a sufficient number of users are aware and have experience Cyclistics bikeshare services. Thus, early May is the optimal time to begin advertising.

Weekly Analysis

Number of Rides

When looking at the number of rides, casual riders show a dramatic predisposition to favor weekend for biking. Meanwhile, annual members are overall consitent in how many rides they take throughout the week; their highest usage day occurs midweek.

Ride Length

Saturday and Sunday are favored by both groups for longer rides, revealing a U-shape pattern when ride duration decreases during the week. Moreover, just like during the monthly analysis, casual riders have nearly double the ride time of members.

Insight

Despite their high inclination towards riding during week, casual riders still take plentiful bike trips during week. This shows promise when attempting to convert to members. Of course, usage time plays a role in their conversion rate. Shorter trips indicate a purposeful usage (of knowing where you are going), while longer trips can be associete with exploring or touring. But with so many riders, Cyclistic is bound gain notably more members by targetting those key individuals.

Targetting Group

Casual Riders with ride times whose average ride duration nears that of members (~15mins) and uses the Cyclistic bikeshare program consistently.

Potential Secondary Marketting Campaign

Due to casual riders being more prominent during the weekend, a potential way to increase the number Cyclistic users is by inviting even more casual riders. Specific bonus packages can be tailure for leasure riders, such a Museum pass, beach activities, coastal group rides, discounts using Cyclistic for certain activities, and more!

Daily Analysis

Knowing when riders take trips is important in determining Cyclistic use cases, so we take a deeper dive into how often riders take trips throughout the week, except we consider their experience on an hour-to-hour basis.

Popular Times

During the week, Cyclistic experiences peak usage times at 8AM & 5PM, hours associated with commuting to-and-from work. During the weekend, popular times occur from 11AM to 5PM suggesting that rides are taken for leasure.

Insights

Members most likely use Cyclistic bikes to comute, and while casual riders favor weekends, there are plenty of users who take trips during similar periods as members. These casual riders are already using Cyclistic services similar to members, and thus have a higher chance of being converted.

Geographical Analysis

Majority of trips are taken downtown Chicago, regardless of user type. However, stations with more activy from casual users are near the pier or spreadout across stations. Popular stations not near the pier and near the downtown area are dominated by annual members.

This means that near the pier, there are a lot of riders leasurely spending their time. Still, plenty of casual riders exist along the cost line and within stations with numerious trips (downtown) which holds promise in conversion.

Insights The prior two targetting groups still hold promise in increasing the number users. Moreover, if Cyclistic invents a leasure member group we can convert casual riders who enjoy weekend rides into members. This is of couse in addition to casual riders who share the use cases with current annual members (of 2020).

Conclusion

Recommendations


By focusing on seasonality, usage time, and usage case, we can formulate successful marketting strategies to convert casual riders to members.

Seasonality

In Spring the demand by casual riders increases, peaking during the summer. This is an opportune time to host a targeted advertising campaign and promotional activities.

Launch the marketing campaign in early Spring to draw-in incoming casual riders and run through the summer to convert them into annual members.

Schedule different promotions and offers to highlight the value of becoming an annual member instead of remaining a casual rider. Discounts for signing up now, raise awareness that summer will soon arrive and be ready.

Usage Time

Daily usage revealed that both groups peak during the evening (~5PM), however in the morning only members showed high activity. This could mean that most casual riders are not aware of the potential benefits of using Cyclistic rides for commuting early in the morning.

Use digital media to raise awareness about the benefits of using Cyclistic bikeshare for morning and evening comutes.

The goal is to get casual riders to spend more time on Cyclistic bikes, increasing the odds of upgrading their membership status.

Usage Case

Many casual riders share the pattern of enjoying leasure rides during the weekends. Proof is their drastic change in number of rides when compared to weekdays. So why not tap into that potential?

Create a new package deal for annual members, catoring to customers who want to spend leasure time riding Cyclistic Bikes.

Adding these new benefits, for members and incoming casual riders, specifically tailored to entertainment and joyful activites will improve conversion rate and member retention in the longer.

Use digital channels to market and promote these new benefits. Possibly hosting summer events to guide casual riders into how they can enjoy leasure rides with others.


Final Thoughts


From the Cyclistic case study, the analysis results showed that casual riders could be categorized into two groups: those that shared similarities with members and those that used Cyclistic bikes for leasure or entertainment.

The marketting campaign holds promise in converting casual users to annual members. Though not easy, with targetted advertising via email and social media, Cyclistic can improve the conversion rate by raising awareness of their benefits and tailuring new benefits to bring more riders

Thank you for taking the time to read this case study

Working on the capstone project was a good opportunity to learn about the real-world data analysis process. I learned a ton about data preparation and cleaning by actually working on those processes. Checking for data integrity and reliability is crucial before anything.

It’s been fun finding out how data analysis can inform business decisions and lead to effect actions. Until next time, have fun analyzing!