AI Tools User Research Report
Key Findings
This wave collected 1,200 valid responses from users of AI coding tools between September and November 2024, using stratified sampling across four user segments. The four metrics below summarize the overall health of the user base.
Satisfaction Trends
Monthly Satisfaction Score (last 6 months)
Overall satisfaction has climbed steadily since May, reaching 78% in October — the highest score recorded across all three research waves. The 70% baseline is the industry benchmark for this category.
Feature Satisfaction Scores
Code completion and intelligent suggestions score highest; batch processing and export capabilities score lowest and represent the clearest improvement opportunities.
Respondent Profile
Respondents were segmented by role and organizational context. Enterprise developers report the highest satisfaction; students show the highest churn risk.
| Segment | Count | Share | Avg. Satisfaction |
|---|---|---|---|
| Enterprise Developers | 396 | 33% | 84% |
| Freelance / Independent Devs | 312 | 26% | 80% |
| Academic Researchers | 276 | 23% | 75% |
| Students | 216 | 18% | 69% |
| Total | 1,200 | 100% | 78% |
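As a consistency check, the overall 78% figure in the table can be reproduced as the count-weighted average of the per-segment scores. A quick sketch (the dictionary below simply transcribes the table):

```python
# Segment counts and average satisfaction scores from the table above
segments = {
    "Enterprise Developers": (396, 84),
    "Freelance / Independent Devs": (312, 80),
    "Academic Researchers": (276, 75),
    "Students": (216, 69),
}

total = sum(count for count, _ in segments.values())
weighted = sum(count * sat for count, sat in segments.values()) / total

print(total)            # 1200
print(round(weighted))  # 78
```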
Top 5 Improvement Requests
- Faster response times at peak hours — cited by 68% of power users as the top friction point
- Higher batch processing limits — freelancers and enterprise teams need to process 50+ files at once
- Richer export formats — support for Excel, JSON, and Notion integration was frequently requested
- Better conversation history management — search and categorization across past sessions
- More transparent pricing tiers — lightweight users feel the entry plan does not reflect their actual usage
Data Sources
- In-app survey distributed via push notification and email invitation
- Product telemetry (anonymized usage logs, session duration, feature activation)
- Five focus-group sessions (n = 8 per session) conducted over video call
- Support ticket tagging and NLP classification (5,200 tickets, Sep–Oct 2024)
- Publicly available developer sentiment data from Stack Overflow and GitHub Discussions
Research Timeline
The study followed a standard UX research process from study design through report delivery over approximately ten weeks.
1. Study design: The research team drafted objectives based on the Wave 2 retrospective and the current product roadmap, defined key research questions, and selected a mixed-methods approach (survey + focus groups).
2. Questionnaire development: Designed a 38-question structured questionnaire covering satisfaction scales, feature ratings, and open-ended questions. Pilot-tested with 20 internal volunteers; revised 6 items.
3. Recruitment: Distributed invitations via in-app notification and email to active users. Applied stratified quota sampling to match target segment ratios. Recruited 1,347 respondents.
4. Data collection: Survey open for two weeks. Collected 1,347 raw responses; completion rate 89.1% (vs. 83.4% in Wave 2). Focus groups ran in parallel across the same window.
5. Data cleaning: Removed 147 invalid responses (straight-liners, completion time < 90 s, > 30% missing). Final valid sample: 1,200 (valid rate 89.1%). Normalized Likert scales and encoded segment labels.
6. Analysis and reporting: Completed descriptive statistics, variance analysis, and cluster modeling. Report reviewed by Product, Engineering, and Strategy leads; finalized and distributed to all stakeholders.
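The valid rate quoted for the cleaning stage follows directly from the counts above; a one-line check:

```python
raw = 1347      # raw responses collected
invalid = 147   # responses removed during cleaning
valid = raw - invalid
valid_rate = valid / raw * 100

print(valid)                 # 1200
print(f"{valid_rate:.1f}%")  # 89.1%
```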
Research Workflow
The diagram below maps the full methodological lifecycle, from study design through final report delivery. Each stage has a hard dependency on the one before it.
Analysis Code
The script below is the actual data cleaning pipeline used in this study. It handles invalid-response removal, Likert-scale normalization, weighted index computation, IQR-based outlier removal, segment encoding, and per-segment aggregation.
```python
import pandas as pd

# Load raw survey responses
df = pd.read_csv("survey_raw_wave3.csv", encoding="utf-8-sig")

# ── 1. Remove invalid responses ────────────────────────────────
# Drop straight-liners (zero variance across all Likert items)
likert_cols = [c for c in df.columns if c.startswith("q_sat_")]
df = df[df[likert_cols].std(axis=1) > 0]

# Drop responses completed in under 90 seconds (speed-clicking)
df = df[df["completion_seconds"] >= 90]

# Drop rows missing more than 30% of questions
threshold = int(df.shape[1] * 0.30)
df = df.dropna(thresh=df.shape[1] - threshold)

print(f"Valid responses after cleaning: {len(df)}")  # → 1200

# ── 2. Normalize Likert scales (1–5 → 0–100%) ─────────────────
df[likert_cols] = (df[likert_cols] - 1) / 4 * 100

# ── 3. Compute weighted satisfaction index ─────────────────────
weights = {
    "q_sat_feature": 0.40,
    "q_sat_support": 0.30,
    "q_sat_overall": 0.30,
}
df["satisfaction_index"] = sum(df[col] * w for col, w in weights.items())

# ── 4. Outlier removal via IQR ─────────────────────────────────
Q1 = df["satisfaction_index"].quantile(0.25)
Q3 = df["satisfaction_index"].quantile(0.75)
IQR = Q3 - Q1
df = df[
    (df["satisfaction_index"] >= Q1 - 1.5 * IQR)
    & (df["satisfaction_index"] <= Q3 + 1.5 * IQR)
]

# ── 5. Encode segment labels ───────────────────────────────────
segment_map = {
    1: "Enterprise Developers",
    2: "Freelance Developers",
    3: "Academic Researchers",
    4: "Students",
}
df["segment_label"] = df["user_segment"].map(segment_map)

# ── 6. Export clean dataset ────────────────────────────────────
df.to_csv("survey_clean_wave3.csv", index=False, encoding="utf-8-sig")
print("Data cleaning complete. Output: survey_clean_wave3.csv")

# ── 7. Per-segment summary ─────────────────────────────────────
summary = (
    df.groupby("segment_label")["satisfaction_index"]
    .agg(["mean", "std", "count"])
    .round(2)
)
print(summary)
```
Limitations & Recommendations
All reported satisfaction scores are based on the clean sample (n = 1,200). The sampling error at a 95% confidence level is ±2.8 percentage points. All statistical tests use two-tailed hypotheses with a significance threshold of α = 0.05.
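The ±2.8-point figure matches the standard conservative margin-of-error formula with p = 0.5 at n = 1,200. An illustrative check, not the team's actual calculation:

```python
import math

n = 1200  # valid sample size
p = 0.5   # conservative proportion assumption (maximizes variance)
z = 1.96  # z-score for a 95% confidence level

moe = z * math.sqrt(p * (1 - p) / n) * 100
print(f"±{moe:.1f} percentage points")  # ±2.8 percentage points
```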
Enrich the analysis by fusing survey responses with behavioral telemetry (session logs, feature activation). This will reduce self-report bias and enable longitudinal tracking of satisfaction trajectories across waves.
Only active users were invited to participate; churned users are not represented. Overall satisfaction figures likely overstate the true user-base average. Always disclose this limitation when citing these results externally.
All raw survey responses are protected under GDPR and applicable privacy law. Individual-level data has been anonymized. Any re-identification attempt is a compliance violation. Access to raw data requires a formal request to the Data Governance Committee and written approval.
Survey Design
The online survey used a responsive design optimized for both desktop and mobile. Each of the five pages contained eight to ten questions, with branching logic at key decision points to ensure each respondent received questions relevant to their usage scenario.
The median completion time was 7 minutes 52 seconds. Introduction of a progress indicator and auto-save reduced the mid-survey drop-off rate from 18.3% (Wave 2) to 9.7% (Wave 3), contributing to the record-high valid sample count.
Executive Summary
Across 1,200 valid responses, overall satisfaction with AI coding tools reached 78% — a new high for the three-wave study series. NPS rose to 42, and daily AI feature adoption climbed to 64%. These gains are most pronounced among enterprise developers and independent freelancers, both of whom report satisfaction above the overall average and strong retention intent.
Two structural constraints cap further satisfaction growth. First, system latency during peak hours is the top-cited pain point regardless of user segment. Second, batch-processing limits and limited export formats prevent heavy users from integrating the tool into professional workflows. Churn risk among students remains elevated at 18%, requiring a dedicated product and pricing strategy.
"If the batch limit went from 10 files to 50, I would upgrade to the Pro plan immediately. For our team, that is a genuine productivity leap — not a nice-to-have."
— Enterprise Developer, Wave 3 focus group, October 2024
Key Conclusions & Action Items
- Prioritize technical stability
  - Peak-hour latency must be addressed through capacity planning and caching before Q1 2025
  - Hallucination-rate targets should be added as a core KPI in the next model evaluation cycle
- Raise the ceiling for power users
  - Increase the batch processing limit to 50 files on the Pro plan; remove the cap on Enterprise
  - Add export support for Excel, JSON, and Notion to reduce workflow integration friction
- Develop a student retention strategy
  - Introduce an academic-verified pricing tier with a lower paid conversion threshold
  - Elevate learning-focused features (essay assistance, knowledge organization) on the product roadmap