Google Data Analytics Certificate Case Study

Overview

The following will contain my case study on the fictional bike-share company, Cyclystic. This case study is in relation to the Google Data Analytics Professional Certificate. I received the data from this program through this link. However, the data has been made available Motivate International Inc. under this license. The CSV files for this project are very large, especially when put together. That being said, I will be using R and RStudio to clean, analyze, and visualize this data as these tools are better suited for large datasets.

About the Company

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.

Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments.One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.

Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.

Business Problem

Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.

The goal of this case study is to answer the question: “How do annual members and casual riders use Cyclistic bikes differently?”

1 - Load Packages

library(ggplot2)
library(tidyverse)
library(here)
library(janitor)
library(skimr)
library(readr)
library(plyr)
library(lubridate)
library(maptools)
library(mapproj)
library(mapdata)
library(ggmap)

2 - Load Data

all_trip_data <- list.files(path = "./CSV", pattern = "*.csv", full.names = TRUE) %>% 
  lapply(read_csv, na.omit) %>% 
  bind_rows

3 - Clean Data

In the following sections, I will detail my process of cleaning the data.

3.1 - Create Data Frame of data

all_trips_df <- data.frame(all_trip_data)
head(all_trips_df)

##                 X1            X2                  X3                  X4
## 1          ride_id rideable_type          started_at            ended_at
## 2 ACB6B40CF5B9044C electric_bike 2020-10-31 19:39:43 2020-10-31 19:57:12
## 3 DF450C72FD109C01 electric_bike 2020-10-31 23:50:08 2020-11-01 00:04:16
## 4 B6396B54A15AC0DF electric_bike 2020-10-31 23:00:01 2020-10-31 23:08:22
## 5 44A4AEE261B9E854 electric_bike 2020-10-31 22:16:43 2020-10-31 22:19:35
## 6 10B7DD76A6A2EB95 electric_bike 2020-10-31 19:38:19 2020-10-31 19:54:32
##                               X5               X6                         X7
## 1             start_station_name start_station_id           end_station_name
## 2  Lakeview Ave & Fullerton Pkwy              313       Rush St & Hubbard St
## 3   Southport Ave & Waveland Ave              227 Kedzie Ave & Milwaukee Ave
## 4     Stony Island Ave & 67th St              102   University Ave & 57th St
## 5            Clark St & Grace St              165     Broadway & Sheridan Rd
## 6 Southport Ave & Wrightwood Ave              190    Stave St & Armitage Ave
##               X8                 X9                X10                X11
## 1 end_station_id          start_lat          start_lng            end_lat
## 2            125  41.92610083333334 -87.63897733333333  41.89034516666667
## 3            260  41.94817233333333         -87.663911 41.929527666666665
## 4            423  41.77346216666667 -87.58537166666666  41.79145466666667
## 5            256  41.95085483333333 -87.65924433333333          41.952809
## 6            185 41.928856833333334         -87.663962 41.917776833333335
##                  X12           X13
## 1            end_lng member_casual
## 2         -87.626068        casual
## 3 -87.70781783333334        casual
## 4 -87.60004966666666        casual
## 5 -87.65010316666667        casual
## 6 -87.69143366666667        casual

colnames(all_trips_df)

##  [1] "X1"  "X2"  "X3"  "X4"  "X5"  "X6"  "X7"  "X8"  "X9"  "X10" "X11" "X12"
## [13] "X13"

3.2 - Rename Columns

Looks like we’re going to have to change the column names before we proceed.

names(all_trips_df) [1:13] <- c("ride_id", "rideable_type", "started_at", "ended_at", "start_station_name", 
                                "start_station_id", "end_station_name", "end_station_id", "start_lat", "start_lng", 
                                "end_lat", "end_lng", "member_casual")

We will need to remove the first row of the df as it contains what was supposed to be the column names

all_trips_df <- all_trips_df[-c(1),]

Let’s check out what our df looks like now…

head(all_trips_df)

##            ride_id rideable_type          started_at            ended_at
## 2 ACB6B40CF5B9044C electric_bike 2020-10-31 19:39:43 2020-10-31 19:57:12
## 3 DF450C72FD109C01 electric_bike 2020-10-31 23:50:08 2020-11-01 00:04:16
## 4 B6396B54A15AC0DF electric_bike 2020-10-31 23:00:01 2020-10-31 23:08:22
## 5 44A4AEE261B9E854 electric_bike 2020-10-31 22:16:43 2020-10-31 22:19:35
## 6 10B7DD76A6A2EB95 electric_bike 2020-10-31 19:38:19 2020-10-31 19:54:32
## 7 DA6C3759660133DA electric_bike 2020-10-29 17:38:04 2020-10-29 17:45:43
##               start_station_name start_station_id           end_station_name
## 2  Lakeview Ave & Fullerton Pkwy              313       Rush St & Hubbard St
## 3   Southport Ave & Waveland Ave              227 Kedzie Ave & Milwaukee Ave
## 4     Stony Island Ave & 67th St              102   University Ave & 57th St
## 5            Clark St & Grace St              165     Broadway & Sheridan Rd
## 6 Southport Ave & Wrightwood Ave              190    Stave St & Armitage Ave
## 7      Larrabee St & Division St              359        Wells St & Huron St
##   end_station_id          start_lat          start_lng            end_lat
## 2            125  41.92610083333334 -87.63897733333333  41.89034516666667
## 3            260  41.94817233333333         -87.663911 41.929527666666665
## 4            423  41.77346216666667 -87.58537166666666  41.79145466666667
## 5            256  41.95085483333333 -87.65924433333333          41.952809
## 6            185 41.928856833333334         -87.663962 41.917776833333335
## 7             53  41.90353233333333 -87.64334533333333 41.894397166666664
##              end_lng member_casual
## 2         -87.626068        casual
## 3 -87.70781783333334        casual
## 4 -87.60004966666666        casual
## 5 -87.65010316666667        casual
## 6 -87.69143366666667        casual
## 7 -87.63431216666666        casual

3.3 - Drop NAs

Before proceeding, I dropped all rows that contained an NA value.

all_trips_df <- all_trips_df %>% 
  drop_na() # drops rows with NA

Let’s take a look at what we have now.

head(all_trips_df)

##            ride_id rideable_type          started_at            ended_at
## 1 ACB6B40CF5B9044C electric_bike 2020-10-31 19:39:43 2020-10-31 19:57:12
## 2 DF450C72FD109C01 electric_bike 2020-10-31 23:50:08 2020-11-01 00:04:16
## 3 B6396B54A15AC0DF electric_bike 2020-10-31 23:00:01 2020-10-31 23:08:22
## 4 44A4AEE261B9E854 electric_bike 2020-10-31 22:16:43 2020-10-31 22:19:35
## 5 10B7DD76A6A2EB95 electric_bike 2020-10-31 19:38:19 2020-10-31 19:54:32
## 6 DA6C3759660133DA electric_bike 2020-10-29 17:38:04 2020-10-29 17:45:43
##               start_station_name start_station_id           end_station_name
## 1  Lakeview Ave & Fullerton Pkwy              313       Rush St & Hubbard St
## 2   Southport Ave & Waveland Ave              227 Kedzie Ave & Milwaukee Ave
## 3     Stony Island Ave & 67th St              102   University Ave & 57th St
## 4            Clark St & Grace St              165     Broadway & Sheridan Rd
## 5 Southport Ave & Wrightwood Ave              190    Stave St & Armitage Ave
## 6      Larrabee St & Division St              359        Wells St & Huron St
##   end_station_id          start_lat          start_lng            end_lat
## 1            125  41.92610083333334 -87.63897733333333  41.89034516666667
## 2            260  41.94817233333333         -87.663911 41.929527666666665
## 3            423  41.77346216666667 -87.58537166666666  41.79145466666667
## 4            256  41.95085483333333 -87.65924433333333          41.952809
## 5            185 41.928856833333334         -87.663962 41.917776833333335
## 6             53  41.90353233333333 -87.64334533333333 41.894397166666664
##              end_lng member_casual
## 1         -87.626068        casual
## 2 -87.70781783333334        casual
## 3 -87.60004966666666        casual
## 4 -87.65010316666667        casual
## 5 -87.69143366666667        casual
## 6 -87.63431216666666        casual

4 - Analyze

4.1 - Format columns

Upon further inspection, I realized that I needed to change the data types of the columns for later analysis.

all_trips_df <- all_trips_df %>% 
  mutate(started_at = ymd_hms(started_at), # formats date as ymd hms
         ended_at = ymd_hms(ended_at), 
         start_station_id = as.numeric(start_station_id),
         end_station_id = as.numeric(end_station_id),
         start_lat = as.numeric(start_lat),
         start_lng = as.numeric(start_lng),
         end_lat = as.numeric(end_lat),
         end_lng = as.numeric(end_lng),
         ride_length_min = round(as.numeric(ended_at-started_at), digits = 2), # returns the diff in time in sec
         day_of_week = weekdays(started_at)) # retrieves day of the week that the ride started on

## Warning: 12 failed to parse.

## Warning: 12 failed to parse.

## Warning in eval(cols[[col]], .data, parent.frame()): NAs introduced by coercion

## Warning in eval(cols[[col]], .data, parent.frame()): NAs introduced by coercion

## Warning in eval(cols[[col]], .data, parent.frame()): NAs introduced by coercion

## Warning in eval(cols[[col]], .data, parent.frame()): NAs introduced by coercion

## Warning in eval(cols[[col]], .data, parent.frame()): NAs introduced by coercion

## Warning in eval(cols[[col]], .data, parent.frame()): NAs introduced by coercion

# change ride_length_min to min instead of sec
all_trips_df <- all_trips_df %>% 
  mutate(ride_length_min = round(as.numeric((ended_at-started_at)/60), digits = 2)) # returns diff in minutes

head(all_trips_df)

##            ride_id rideable_type          started_at            ended_at
## 1 ACB6B40CF5B9044C electric_bike 2020-10-31 19:39:43 2020-10-31 19:57:12
## 2 DF450C72FD109C01 electric_bike 2020-10-31 23:50:08 2020-11-01 00:04:16
## 3 B6396B54A15AC0DF electric_bike 2020-10-31 23:00:01 2020-10-31 23:08:22
## 4 44A4AEE261B9E854 electric_bike 2020-10-31 22:16:43 2020-10-31 22:19:35
## 5 10B7DD76A6A2EB95 electric_bike 2020-10-31 19:38:19 2020-10-31 19:54:32
## 6 DA6C3759660133DA electric_bike 2020-10-29 17:38:04 2020-10-29 17:45:43
##               start_station_name start_station_id           end_station_name
## 1  Lakeview Ave & Fullerton Pkwy              313       Rush St & Hubbard St
## 2   Southport Ave & Waveland Ave              227 Kedzie Ave & Milwaukee Ave
## 3     Stony Island Ave & 67th St              102   University Ave & 57th St
## 4            Clark St & Grace St              165     Broadway & Sheridan Rd
## 5 Southport Ave & Wrightwood Ave              190    Stave St & Armitage Ave
## 6      Larrabee St & Division St              359        Wells St & Huron St
##   end_station_id start_lat start_lng  end_lat   end_lng member_casual
## 1            125  41.92610 -87.63898 41.89035 -87.62607        casual
## 2            260  41.94817 -87.66391 41.92953 -87.70782        casual
## 3            423  41.77346 -87.58537 41.79145 -87.60005        casual
## 4            256  41.95085 -87.65924 41.95281 -87.65010        casual
## 5            185  41.92886 -87.66396 41.91778 -87.69143        casual
## 6             53  41.90353 -87.64335 41.89440 -87.63431        casual
##   ride_length_min day_of_week
## 1           17.48    Saturday
## 2           14.13    Saturday
## 3            8.35    Saturday
## 4            2.87    Saturday
## 5           16.22    Saturday
## 6            7.65    Thursday

4.2 - Filter Data

all_trips_df <- all_trips_df %>% 
  filter(start_station_name != "WATSON TESTING - DIVVY") # seemed to be a test location for the company, decided to filter out

all_trips_df <- all_trips_df %>% 
  filter(ride_length_min > 0) # filter out rides less than 0 minutes

4.3 - Mutate Data

Create df for Casual Riders

casual_trips_df <- all_trips_df %>% # data frame of rides for casual riders
  filter(member_casual == "casual",
         ride_length_min < 1500, # filter out rides longer than 24hrs
         year(started_at) > "2020") %>% # filter out 2020 data
  mutate(day_of_week = factor(day_of_week, levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")), # create a hierarchy of weekdays
         month = month(started_at)) # create column representing the month the ride was in

Create df for Annual Members

annual_member_trips_df <- all_trips_df %>% # data frame of rides for annual members
  filter(member_casual == "member", 
         ride_length_min < 1500, # filter out rides longer than 24hrs
         year(started_at) > "2020") %>% # filter out 2020 data
  mutate(day_of_week = factor(day_of_week, levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")), # create a hierarchy of weekdays
         month = month(started_at)) # create column representing the month the ride was in

4.4 - Summary Statistics

summary(casual_trips_df)

##    ride_id          rideable_type        started_at                 
##  Length:1932167     Length:1932167     Min.   :2021-01-01 00:12:27  
##  Class :character   Class :character   1st Qu.:2021-06-05 18:41:49  
##  Mode  :character   Mode  :character   Median :2021-07-20 18:11:13  
##                                        Mean   :2021-07-15 22:34:36  
##                                        3rd Qu.:2021-08-31 19:40:18  
##                                        Max.   :2021-10-31 23:57:33  
##                                                                     
##     ended_at                   start_station_name start_station_id 
##  Min.   :2021-01-01 00:12:30   Length:1932167     Min.   :    20   
##  1st Qu.:2021-06-05 19:15:45   Class :character   1st Qu.: 13033   
##  Median :2021-07-20 18:33:07   Mode  :character   Median : 13156   
##  Mean   :2021-07-15 23:03:22                      Mean   : 12547   
##  3rd Qu.:2021-08-31 19:57:59                      3rd Qu.: 13303   
##  Max.   :2021-11-01 10:58:41                      Max.   :202480   
##                                                   NA's   :1032809  
##  end_station_name   end_station_id      start_lat       start_lng     
##  Length:1932167     Min.   :    20    Min.   :41.65   Min.   :-87.83  
##  Class :character   1st Qu.: 13033    1st Qu.:41.88   1st Qu.:-87.65  
##  Mode  :character   Median : 13154    Median :41.90   Median :-87.64  
##                     Mean   : 12575    Mean   :41.90   Mean   :-87.64  
##                     3rd Qu.: 13303    3rd Qu.:41.93   3rd Qu.:-87.62  
##                     Max.   :202480    Max.   :42.06   Max.   :-87.53  
##                     NA's   :1029974                                   
##     end_lat         end_lng       member_casual      ride_length_min  
##  Min.   :41.65   Min.   :-87.83   Length:1932167     Min.   :   0.02  
##  1st Qu.:41.88   1st Qu.:-87.65   Class :character   1st Qu.:   9.70  
##  Median :41.90   Median :-87.64   Mode  :character   Median :  16.98  
##  Mean   :41.90   Mean   :-87.64                      Mean   :  28.76  
##  3rd Qu.:41.93   3rd Qu.:-87.63                      3rd Qu.:  31.18  
##  Max.   :42.17   Max.   :-87.52                      Max.   :1499.85  
##                                                                       
##     day_of_week         month       
##  Monday   :214482   Min.   : 1.000  
##  Tuesday  :200828   1st Qu.: 6.000  
##  Wednesday:202655   Median : 7.000  
##  Thursday :209040   Mean   : 6.981  
##  Friday   :273331   3rd Qu.: 8.000  
##  Saturday :446123   Max.   :10.000  
##  Sunday   :385708

min(casual_trips_df$ride_length_min)

## [1] 0.02

mean(casual_trips_df$ride_length_min)

## [1] 28.75529

max(casual_trips_df$ride_length_min)

## [1] 1499.85

summary(annual_member_trips_df)

##    ride_id          rideable_type        started_at                 
##  Length:2222600     Length:2222600     Min.   :2021-01-01 00:02:24  
##  Class :character   Class :character   1st Qu.:2021-05-21 08:37:00  
##  Mode  :character   Mode  :character   Median :2021-07-17 14:20:31  
##                                        Mean   :2021-07-09 06:38:52  
##                                        3rd Qu.:2021-09-06 20:17:45  
##                                        Max.   :2021-10-31 23:58:15  
##                                                                     
##     ended_at                   start_station_name start_station_id 
##  Min.   :2021-01-01 00:08:39   Length:2222600     Min.   :    20   
##  1st Qu.:2021-05-21 08:49:06   Class :character   1st Qu.: 13050   
##  Median :2021-07-17 14:39:18   Mode  :character   Median : 13179   
##  Mean   :2021-07-09 06:52:23                      Mean   : 12402   
##  3rd Qu.:2021-09-06 20:32:34                      3rd Qu.: 13323   
##  Max.   :2021-11-01 07:42:23                      Max.   :202480   
##                                                   NA's   :1276813  
##  end_station_name   end_station_id      start_lat       start_lng     
##  Length:2222600     Min.   :    20    Min.   :41.65   Min.   :-87.83  
##  Class :character   1st Qu.: 13053    1st Qu.:41.88   1st Qu.:-87.66  
##  Mode  :character   Median : 13179    Median :41.90   Median :-87.64  
##                     Mean   : 12437    Mean   :41.90   Mean   :-87.65  
##                     3rd Qu.: 13323    3rd Qu.:41.93   3rd Qu.:-87.63  
##                     Max.   :202480    Max.   :42.06   Max.   :-87.53  
##                     NA's   :1280120                                   
##     end_lat         end_lng       member_casual      ride_length_min  
##  Min.   :41.65   Min.   :-87.83   Length:2222600     Min.   :   0.02  
##  1st Qu.:41.88   1st Qu.:-87.66   Class :character   1st Qu.:   5.88  
##  Median :41.90   Median :-87.64   Mode  :character   Median :  10.02  
##  Mean   :41.90   Mean   :-87.65                      Mean   :  13.52  
##  3rd Qu.:41.93   3rd Qu.:-87.63                      3rd Qu.:  17.00  
##  Max.   :42.07   Max.   :-87.53                      Max.   :1495.63  
##                                                                       
##     day_of_week         month      
##  Monday   :295823   Min.   : 1.00  
##  Tuesday  :332869   1st Qu.: 5.00  
##  Wednesday:342877   Median : 7.00  
##  Thursday :324804   Mean   : 6.75  
##  Friday   :322785   3rd Qu.: 9.00  
##  Saturday :322414   Max.   :10.00  
##  Sunday   :281028

min(annual_member_trips_df$ride_length_min)

## [1] 0.02

mean(annual_member_trips_df$ride_length_min)

## [1] 13.52448

max(annual_member_trips_df$ride_length_min)

## [1] 1495.63

After a quick look at the summary statistics, we can see that casual riders spend more time on average per each ride than annual members. Casual riders spend and average of 28.7552883 minutes per ride, whereas annual members spend an average of 13.5244758 minutes.

5 - Plots

5.1 - Bar Charts

Casual Riders

casual_weekly_plot <- ggplot(data = casual_trips_df, mapping = aes(x=day_of_week)) +
  geom_bar(aes(fill = day_of_week)) + # color for each weekday
  labs(
    title = "Number of Weekly Casual Rides",
    subtitle = "January 2021 - October 2021",
    x = "Day of the Week",
    y = "Number of Rides",
    fill = "Day of the Week"
  ) +
  theme(axis.text.x = element_text(angle = 90)) + # turns xaxis labels 90 degrees counterclockwise
  facet_wrap(~month) # create a weekly plot for each month
casual_weekly_plot

Annual Members

member_weekly_plot <- ggplot(data = annual_member_trips_df, mapping = aes(x=day_of_week)) +
  geom_bar(aes(fill = day_of_week)) + # color for each weekday
  labs(
    title = "Number of Weekly Member Rides",
    subtitle = "January 2021 - October 2021",
    x = "Day of the Week",
    y = "Number of Rides",
    fill = "Day of the Week"
  ) +
  theme(axis.text.x = element_text(angle = 90)) + # turns xaxis labels 90 degrees counterclockwise
  facet_wrap(~month) # create a weekly plot for each month
member_weekly_plot

5.2 - Density Plots

Map Setup

###
## create map of chicago from https://cfss.uchicago.edu/notes/raster-maps-with-ggmap/
#
chi_bb <- c( # store bounding box coordinates
  left = -87.936287,
  bottom = 41.679835,
  right = -87.447052,
  top = 42.000835
)

chicago_stamen <- get_stamenmap( # retrieve map from Stamen Maps
  bbox = chi_bb,
  zoom = 11
)

chicago_map <- chicago_stamen
ggmap(chicago_map) # use ggmap to view

Casual Riders

casual_ride_density <- ggmap(chicago_map) + # map to show yearly ride density
  stat_density2d(data = casual_trips_df, aes(x=start_lng, y=start_lat), color="Deep Pink", fill=stat(nrow(casual_trips_df)), alpha=I(0.5), geom = "polygon") +
  labs(
    title = "Ride Density for Casual Riders",
    subtitle = "January 2021 - October 2021",
    x = "Longitude",
    y = "Latitude",
  )

casual_ride_density

## Warning: Removed 26981 rows containing non-finite values (stat_density2d).

Annual Members

member_ride_density <- ggmap(chicago_map) + # map to show ride yearly ride density
  stat_density2d(data = annual_member_trips_df, aes(x=start_lng, y=start_lat), color="Deep Pink", fill=stat(nrow(annual_member_trips_df)), alpha=I(0.5), geom = "polygon") +
  labs(
    title = "Ride Density for Members",
    subtitle = "January 2021 - October 2021",
    x = "Longitude",
    y = "Latitude",
  )

member_ride_density

## Warning: Removed 33910 rows containing non-finite values (stat_density2d).

6 - Conclusion

To sum up everthing that has been shown so far, we loaded the necessary packages and data, cleaned and filtered the data, manipulated the data frame to fit our needs, analyzed the data, and built some beautiful plots. Now it’s time to circle back to our business problem. We had two questions to conquerer.

How do annual members and casual riders use Cyclistic bikes differently?
How can Cyclystic convert casual riders into annual members and boost profits?

Findings

We found that for both casual and annual riders, the winter months (November, December, January, February, and most of March) are exceedingly lower in terms of the number of rides. And this makes sense, the winter months are typically cold, rainy, windy, and/or snowy in Chicago. Those conditions are not favorable for biking, even for the most dedicated users. However, the Spring, Summer, and early Fall months are the peak of Cyclystic’s users’ activity. The weather is fair, if not beautiful, allowing for an optimal environment for biking.

We also found that there are differences in how casual and annual members use this service. Casual riders have high activity on the weekend days in comparison the weekdays. Annual members have a different spread of activity. These users utilize the bikes more equally throughout the week as a whole with a slight emphasis on weekdays rather than weekends. This could indicate that these users are using this service to commute to and from work or to run to get lunch during a busy day.

Another interesting find was that most rides for both casual and annual riders where located in downtown Chicago. In addition, a lot of rides start up and down the water front area. We could infer that this is due to the city’s contract with Cyclystic in terms of where bikes are allowed to be located to avoid having them littering the streets. We could also infer that it would be an optimal area for users to get around and see what the city has to offer in a more efficient way that walking or Ubering everywhere.

Next Steps

Moving forward, there are some things I think Cyclystic should do if they want to convert casual riders into annual members.

Direct marketing efforts toward engaging users during the winter months.
Encourage more community building events (i.e. group rides) during weekdays to engage those who have historically only ridden on the weekends.
Expand geographical limits of the fleet to include more neighborhoods in the surrounding downtown area.

Google Data Analytics Certificate Case Study

Cyclystic - How does a bike-share navigate speedy success?

Austin Seo

12/13/2021

Overview

About the Company

Business Problem

The goal of this case study is to answer the question: “How do annual members and casual riders use Cyclistic bikes differently?”

1 - Load Packages

2 - Load Data

3 - Clean Data

3.1 - Create Data Frame of data

3.2 - Rename Columns

3.3 - Drop NAs

4 - Analyze

4.1 - Format columns

4.2 - Filter Data

4.3 - Mutate Data

Create df for Casual Riders

Create df for Annual Members

4.4 - Summary Statistics

5 - Plots

5.1 - Bar Charts

Casual Riders

Annual Members

5.2 - Density Plots

Map Setup

Casual Riders

Annual Members

6 - Conclusion

Findings

Next Steps