The Impact of Reviews on Customer Acquisition

Summary

To understand the impact of reviews on customer acquisition rates, we developed a causal model of how various factors influence customer acquisition and attempted to isolate the direct effect of Review Count.

We find strong evidence of a direct impact of increased review counts on customer acquisition rates. Specifically, we estimate that increasing a client's review count by 23% directly nets roughly 1 additional customer per quarter, or roughly 4 additional customers per year. Because the relationship is proportional, the estimated impact of each additional review is much stronger for companies with relatively few reviews, and weaker for companies that already have a large number of reviews.

For example, a company that starts with only 5 reviews and then gains an extra 50 is expected to acquire an additional 11 customers per quarter, or 44 new customers per year. In contrast, a company that starts with 500 reviews and receives an extra 50 may not even notice a boost to its new customer acquisition, with a predicted lift of roughly 0.45 new customers per quarter.

One important limitation of this analysis is that it does not incorporate the recency of reviews as a causal factor, which may strongly influence customer acquisition rates. This is discussed further in section 3.4.

Estimation Formula

The derived formula for estimating the intra-quarter lift in customers acquired is:

\[ L = 11.06 \cdot \log_{10}\left(\frac{R_{current} + R_{extra}}{R_{current}}\right) \]
  • \(L\) is the number of _extra_ customers acquired by end-of-quarter given we have an additional \(R_{extra}\) reviews at the beginning of the quarter.
  • \(R_{extra}\) is the number of extra reviews at the beginning of the quarter.
  • \(R_{current}\) is the current number of reviews.

By setting \(L\) to 1 and solving, we find \(\frac{R_{current} + R_{extra}}{R_{current}} = 10^{1/11.06} \approx 1.23\), i.e. an estimated requirement of a 23% lift in review count to acquire 1 additional customer per quarter.
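
As a concrete, minimal sketch of this formula (the coefficient is the 11.06 estimate derived in the Causal Analysis section; the function names below are ours, not part of any existing codebase):

```python
import math

L_COEFFICIENT = 11.06  # estimated direct effect of R_log on L (see Causal Analysis)

def customer_lift(r_current: float, r_extra: float) -> float:
    """Estimated extra customers acquired in a quarter from r_extra new reviews."""
    return L_COEFFICIENT * math.log10((r_current + r_extra) / r_current)

def review_lift_for_one_customer() -> float:
    """Fractional lift in review count needed for ~1 extra customer per quarter."""
    return 10 ** (1 / L_COEFFICIENT) - 1

print(customer_lift(5, 50))            # ~11.5 (the Summary's 5-review example)
print(customer_lift(500, 50))          # ~0.46 (the Summary's 500-review example)
print(review_lift_for_one_customer())  # ~0.23, i.e. the 23% lift quoted above
```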

Predicted Impact by Company Size

If we assume there is no causal relationship between customer count and review count, then we can derive specific values for the L-Coefficient (11.06 in the general case) at different company sizes.

| Company Size (# of Customers) | L-Coefficient | % Review Lift Required for 1 Additional Customer per Quarter |
| --- | --- | --- |
| < 92 | 10.25 | 25.2% |
| 93 - 151 | 10.59 | 24.2% |
| 152 - 218 | 10.91 | 23.5% |
| 219 - 328 | 11.36 | 22.5% |
| > 328 | 12.33 | 20.5% |

This seems to indicate that larger companies require a smaller proportional lift in reviews to achieve the same customer acquisition boost. However, the assumption of causal independence is a strong one, and caution should be used when interpreting these results. Sections 2 and 3 expand on these limitations.

Methodology

Selection and Cleaning

Companies with the following qualities are considered:

  • Companies for which more than 80% of customer records have an associated Job, Payment, Invoice, or Case.
  • Companies that existed prior to January 1, 2021.

These selection criteria were chosen because we can be reasonably certain about customer counts for clients with integrations for the associated event types. Clients were then further filtered by eliminating companies with unusually high-frequency lifts in customer acquisition or review count (top 10%). Data was collected for the year 2021.
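
For illustration, a minimal pandas sketch of this selection step is given below. The file name and column names are hypothetical placeholders, since the underlying schema is not described in this report.

```python
import pandas as pd

# Hypothetical input: one row per company with precomputed summary fields.
companies = pd.read_csv("companies_2021.csv", parse_dates=["created_at"])

# Keep companies where >80% of customer records have an associated
# Job, Payment, Invoice, or Case, and that existed before January 1, 2021.
selected = companies[
    (companies["pct_customers_with_linked_event"] > 0.80)
    & (companies["created_at"] < "2021-01-01")
]

# Drop the top 10% of companies by unusually high-frequency lifts in
# customer acquisition or review count (the spike-rate columns are hypothetical).
customer_cap = selected["customer_lift_spike_rate"].quantile(0.90)
review_cap = selected["review_lift_spike_rate"].quantile(0.90)
selected = selected[
    (selected["customer_lift_spike_rate"] <= customer_cap)
    & (selected["review_lift_spike_rate"] <= review_cap)
]
```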

Relevant distributions of the selected data are shown below.

[Figure: review and customer count distributions]

Causal Impact Estimation

We translate this problem into a causal estimation problem by treating the beginning-of-quarter review count as the treatment variable and the intra-quarter customer lift as the outcome.

Definitions

The following variables will be used for the rest of the discussion.

  • \(R\): The Beginning-of-Quarter Review Count
  • \(R_{log}\): The base-10 logarithm of \(R\)
  • \(S\): The Beginning-of-Quarter Customer Count
  • \(L\): The Lift in Customers at End-of-Quarter
  • \(T\): Seasonality Effect (estimated by stratifying over Quarter)
  • \(U_R\): Unobserved external effects on \(R\)
  • \(U_L\): Unobserved external effects on \(L\)

We use a logarithmic scale for the review count based on a goodness-of-fit comparison of several transformations of \(R\) against their correlation with \(L\). The correlation coefficient between \(R_{log}\) and \(L\) is r = 0.36 (p < .000001), indicating a weak-to-moderate but highly significant relationship between review count and customer lift. However, this correlation alone does not account for confounders.
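
As a sketch of that transformation comparison, assuming a per-company, per-quarter table with columns `R` and `L` (the column names and candidate transforms here are ours):

```python
import numpy as np
import pandas as pd
from scipy import stats

def compare_transforms(df: pd.DataFrame) -> dict:
    """Pearson correlation of L with several candidate transformations of R."""
    candidates = {
        "identity": df["R"].astype(float),
        "log10": np.log10(df["R"]),
        "sqrt": np.sqrt(df["R"]),
    }
    return {
        name: stats.pearsonr(values, df["L"])  # (correlation, p-value)
        for name, values in candidates.items()
    }
```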

Causal Analysis

The causal structure assumed by our analysis is given below. Each edge represents a linear causal influence between two variables.

[Figure: assumed causal model]

This causal structure assumes that \(S\) is a confounder of \(R_{log}\) and \(L\). In reality, size is probably not a direct confounder, but a proxy of several other confounders for which we don’t have data, such as marketing budget and outreach. We use backdoor adjustment to account for the effect of \(S\). Conditional effects of seasonality on \(L\) are calculated and aggregated by conditioning on \(T\).
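
Under the linearity assumption, one common way to implement this backdoor adjustment is an ordinary least-squares regression of \(L\) on \(R_{log}\) with \(S\) as a covariate and quarter fixed effects for \(T\). The sketch below uses statsmodels and hypothetical column names; it illustrates the adjustment rather than the exact estimator used for this report.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical table: one row per company per quarter with columns
# R (BOQ review count), S (BOQ customer count), L (customer lift), quarter.
df = pd.read_csv("company_quarters_2021.csv")
df["R_log"] = np.log10(df["R"])

# Backdoor adjustment for S under linearity: include S as a covariate.
# Seasonality (T) is handled via quarter fixed effects.
model = smf.ols("L ~ R_log + S + C(quarter)", data=df).fit()

print(model.params["R_log"])          # the L-coefficient (direct effect estimate)
print(model.conf_int().loc["R_log"])  # uncertainty interval around that estimate
```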

Our estimate for the direct causal effect of \(R_{log}\) on \(L\) is 11.06 ± 1.64, which indicates that a one-unit increase in \(R_{log}\) (i.e., a tenfold increase in review count) produces an estimated increase of 11.06 in customer lift over one quarter.

We’ll call this 11.06 number the L-Coefficient. This gives us the formula described in Section 1.1:

\[ L = 11.06 \cdot \log_{10}\left(\frac{R_{current} + R_{extra}}{R_{current}}\right) \]

By Company Size

If we apply our methodology across company sizes while assuming company size does not impact review count (i.e., there is no \(S \rightarrow R_{log}\) edge in the causal diagram), we get the following coefficients:

| Company Size (# of Customers) | L-Coefficient |
| --- | --- |
| < 92 | 10.25 |
| 93 - 151 | 10.59 |
| 152 - 218 | 10.91 |
| 219 - 328 | 11.36 |
| > 328 | 12.33 |

This indicates that the effect may strengthen as a company grows in size. However, because company size is assumed to be a confounder of review count and customer acquisition, the L-coefficient does not necessarily represent the direct impact of new reviews. In this scenario, it is difficult to disentangle the effect of new reviews from the natural “inertia” of larger companies in their baseline customer acquisition rate. As a result, it is possible that the overall effect is slightly over-estimated for larger companies and slightly under-estimated for smaller companies.
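
A sketch of this size-stratified variant, reusing the hypothetical table from the regression sketch in the Causal Analysis section and dropping the \(S\) adjustment within each size bucket (bucket edges follow the table above):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("company_quarters_2021.csv")  # same hypothetical table as before
df["R_log"] = np.log10(df["R"])

# Size buckets approximating the table above.
bins = [0, 92, 151, 218, 328, np.inf]
labels = ["< 92", "93 - 151", "152 - 218", "219 - 328", "> 328"]
df["size_bucket"] = pd.cut(df["S"], bins=bins, labels=labels)

# Assuming no S -> R_log edge within a bucket, S is dropped as a covariate.
for bucket, group in df.groupby("size_bucket", observed=True):
    fit = smf.ols("L ~ R_log + C(quarter)", data=group).fit()
    print(bucket, round(fit.params["R_log"], 2))
```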

Limitations

Selection Bias

Due to incomplete or unreliable data, several filtering mechanisms were used to extract meaningful data points for this analysis. However, this leads to a high degree of selection bias.

Over-Representation of Certain Industries

Due to the selection criteria, many businesses which acquire customers more rapidly (restaurants, retail) could not be adequately considered. In order to provide a robust estimate of the causal impact of reviews for all clients, we would need a reliable and consistent measure of growth for these clients.

COVID Lockdowns

The data was collected for the year 2021 as a compromise between completeness and the size of the available dataset, but this was an unusual year for many businesses due to COVID restrictions and lockdowns. It is unclear whether this had a meaningful impact on the direct causal effect of Review Count on Customer Acquisition, or what the nature of that relationship might be.

Seasonality Control

The quarter-sized time block was chosen to balance the amount of data available in a particular time slice against the need to control for seasonal effects. A drawback of this approach is that we have just 4 discrete time blocks. Although this could theoretically pose problems for the analysis, we find the impact estimate to be stable across all four quarters (the \(L\)-coefficient varies by less than 1), which suggests that seasonality has been adequately controlled for in the review impact estimation.
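
One way to check this stability, sketched under the same hypothetical schema as the regression sketch in the Causal Analysis section:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("company_quarters_2021.csv")
df["R_log"] = np.log10(df["R"])

# Fit the adjusted model separately per quarter and compare the coefficients.
per_quarter = {
    quarter: smf.ols("L ~ R_log + S", data=group).fit().params["R_log"]
    for quarter, group in df.groupby("quarter")
}
print(per_quarter)
print(max(per_quarter.values()) - min(per_quarter.values()))  # spread across quarters
```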

Missing Confounders and Mediators

Causal estimation makes several strong assumptions about confounding variables. We have implicitly assumed that there are no confounders of the effect of \(R_{log}\) on \(L\) other than \(S\), but such confounders likely do exist (e.g., average employee performance, customer outreach investment). If the confounding effect of these unobserved variables is strong, the true causal impact may have been overestimated.

Additionally, the causal model we use presumes (out of necessity) that there are no mediating effects of review count on customer acquisition. If there are significant mediators, the total effect of reviews on customer acquisition should not meaningfully change, but the direct effect could have been overestimated or underestimated, depending on the effect of the mediators.

Causal Relationship Linearity

The interactions between associated variables in this model are assumed to be linear (the logarithmic relationship is a consequence of transforming \(R\) as described in Section 2.2). This makes the results much easier to interpret, but fails to account for complex interactions between variables. A more sophisticated estimate could be provided at the expense of explanatory value.

Unaccounted Review Recency

Review recency was not considered at all in this analysis. This choice was deliberate, as it simplified the causal model, provided access to more data, and allowed more interpretable estimates of the direct effect. However, in reality it may be a significant contributing factor to new customer acquisition, and further investigation is warranted to identify the importance of review recency.

All Review Mediums Created Equal

We use total review count as a single figure rather than stratifying across the different review mediums (Facebook, Google, etc.), which implicitly assumes that all mediums are equal with regard to their impact. Conditioning on the medium may be a worthwhile step in understanding how review impact differs, and is an excellent direction for further analysis if it is warranted. However, the key limitation here is simply the amount of data available given our selection criteria. In order to perform robust analysis at this level of granularity, we need more reliable data from which to draw conclusions; specifically, more reliable customer counts for our clients.

Appendix

Selected Data Distribution Summary

Distribution Summaries of Selected Data (supplementary to Selection and Cleaning).

| Statistic | BOY Customer Count | BOY Review Count |
| --- | --- | --- |
| # of Data Points in Q1 | 507 | 507 |
| Mean | 150 | 263 |
| Standard Deviation | 99 | 248 |
| Minimum Value | 3 | 4 |
| 25th Percentile | 72 | 107 |
| 50th Percentile | 128 | 192 |
| 75th Percentile | 209 | 321 |
| Maximum Value | 470 | 2077 |