AI Tools User Research Report
Key Findings
This wave collected 1,200 valid responses from users of AI coding tools between September and November 2024, using stratified sampling across four user segments. The four metrics below summarize the overall health of the user base.
Satisfaction Trends
Monthly Satisfaction Score (last 6 months)
Overall satisfaction has climbed steadily since May, reaching 78% in October — the highest score recorded across all three research waves. The 70% baseline is the industry benchmark for this category.
Feature Satisfaction Scores
Code completion and intelligent suggestions score highest; batch processing and export capabilities score lowest and represent the clearest improvement opportunities.
Respondent Profile
Respondents were segmented by role and organizational context. Enterprise developers report the highest satisfaction; students show the highest churn risk.
| Segment | Count | Share | Avg. Satisfaction |
|---|---|---|---|
| Enterprise Developers | 396 | 33% | 84% |
| Freelance / Independent Devs | 312 | 26% | 80% |
| Academic Researchers | 276 | 23% | 75% |
| Students | 216 | 18% | 69% |
| Total | 1,200 | 100% | 78% |
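As a consistency check, the overall 78% figure in the table can be reproduced as the count-weighted average of the per-segment scores. A quick sketch (the dictionary below simply transcribes the table):

```python
# Segment counts and average satisfaction scores from the table above
segments = {
    "Enterprise Developers": (396, 84),
    "Freelance / Independent Devs": (312, 80),
    "Academic Researchers": (276, 75),
    "Students": (216, 69),
}

total = sum(count for count, _ in segments.values())
weighted = sum(count * sat for count, sat in segments.values()) / total

print(total)            # 1200
print(round(weighted))  # 78
```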
Top 5 Improvement Requests
- Faster response times at peak hours — cited by 68% of power users as the top friction point
- Higher batch processing limits — freelancers and enterprise teams need to process 50+ files at once
- Richer export formats — support for Excel, JSON, and Notion integration was frequently requested
- Better conversation history management — search and categorization across past sessions
- More transparent pricing tiers — lightweight users feel the entry plan does not reflect their actual usage
Data Sources
- In-app survey distributed via push notification and email invitation
- Product telemetry (anonymized usage logs, session duration, feature activation)
- Five focus-group sessions (n = 8 per session) conducted over video call
- Support ticket tagging and NLP classification (5,200 tickets, Sep–Oct 2024)
- Publicly available developer sentiment data from Stack Overflow and GitHub Discussions
Research Timeline
The study followed a standard UX research process from study design through report delivery over approximately ten weeks.
1. Study design: The research team drafted objectives based on the Wave 2 retrospective and the current product roadmap, defined key research questions, and selected a mixed-methods approach (survey + focus groups).
2. Questionnaire development: Designed a 38-question structured questionnaire covering satisfaction scales, feature ratings, and open-ended questions. Pilot-tested with 20 internal volunteers; revised 6 items.
3. Recruitment: Distributed invitations via in-app notification and email to active users. Applied stratified quota sampling to match target segment ratios. Recruited 1,347 respondents.
4. Data collection: Survey open for two weeks. Collected 1,347 raw responses; completion rate 89.1% (vs. 83.4% in Wave 2). Focus groups ran in parallel across the same window.
5. Data cleaning: Removed 147 invalid responses (straight-liners, completion time < 90 s, > 30% missing). Final valid sample: 1,200 (valid rate 89.1%). Normalized Likert scales and encoded segment labels.
6. Analysis and reporting: Completed descriptive statistics, variance analysis, and cluster modeling. Report reviewed by Product, Engineering, and Strategy leads; finalized and distributed to all stakeholders.
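The valid rate quoted for the cleaning stage follows directly from the counts above; a one-line check:

```python
raw = 1347      # raw responses collected
invalid = 147   # responses removed during cleaning
valid = raw - invalid
valid_rate = valid / raw * 100

print(valid)                 # 1200
print(f"{valid_rate:.1f}%")  # 89.1%
```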
Research Workflow
The diagram below maps the full methodological lifecycle, from study design through final report delivery. Each stage has a hard dependency on the one before it.
Analysis Code
The script below is the actual data cleaning pipeline used in this study. It handles invalid-response removal, Likert-scale normalization, weighted index computation, IQR-based outlier removal, segment encoding, and per-segment aggregation.
```python
import pandas as pd

# Load raw survey responses
df = pd.read_csv("survey_raw_wave3.csv", encoding="utf-8-sig")

# ── 1. Remove invalid responses ────────────────────────────────
# Drop straight-liners (zero variance across all Likert items)
likert_cols = [c for c in df.columns if c.startswith("q_sat_")]
df = df[df[likert_cols].std(axis=1) > 0]

# Drop responses completed in under 90 seconds (speed-clicking)
df = df[df["completion_seconds"] >= 90]

# Drop rows missing more than 30% of questions
threshold = int(df.shape[1] * 0.30)
df = df.dropna(thresh=df.shape[1] - threshold)

print(f"Valid responses after cleaning: {len(df)}")  # → 1200

# ── 2. Normalize Likert scales (1–5 → 0–100%) ─────────────────
df[likert_cols] = (df[likert_cols] - 1) / 4 * 100

# ── 3. Compute weighted satisfaction index ─────────────────────
weights = {
    "q_sat_feature": 0.40,
    "q_sat_support": 0.30,
    "q_sat_overall": 0.30,
}
df["satisfaction_index"] = sum(df[col] * w for col, w in weights.items())

# ── 4. Outlier removal via IQR ─────────────────────────────────
Q1 = df["satisfaction_index"].quantile(0.25)
Q3 = df["satisfaction_index"].quantile(0.75)
IQR = Q3 - Q1
df = df[
    (df["satisfaction_index"] >= Q1 - 1.5 * IQR)
    & (df["satisfaction_index"] <= Q3 + 1.5 * IQR)
]

# ── 5. Encode segment labels ───────────────────────────────────
segment_map = {
    1: "Enterprise Developers",
    2: "Freelance Developers",
    3: "Academic Researchers",
    4: "Students",
}
df["segment_label"] = df["user_segment"].map(segment_map)

# ── 6. Export clean dataset ────────────────────────────────────
df.to_csv("survey_clean_wave3.csv", index=False, encoding="utf-8-sig")
print("Data cleaning complete. Output: survey_clean_wave3.csv")

# ── 7. Per-segment summary ─────────────────────────────────────
summary = (
    df.groupby("segment_label")["satisfaction_index"]
    .agg(["mean", "std", "count"])
    .round(2)
)
print(summary)
```
Limitations & Recommendations
All reported satisfaction scores are based on the clean sample (n = 1,200). The sampling error at a 95% confidence level is ±2.8 percentage points. All statistical tests use two-tailed hypotheses with a significance threshold of α = 0.05.
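The ±2.8-point figure matches the standard conservative margin-of-error formula with p = 0.5 at n = 1,200. An illustrative check, not the team's actual calculation:

```python
import math

n = 1200  # valid sample size
p = 0.5   # conservative proportion assumption (maximizes variance)
z = 1.96  # z-score for a 95% confidence level

moe = z * math.sqrt(p * (1 - p) / n) * 100
print(f"±{moe:.1f} percentage points")  # ±2.8 percentage points
```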
Enrich the analysis by fusing survey responses with behavioral telemetry (session logs, feature activation). This will reduce self-report bias and enable longitudinal tracking of satisfaction trajectories across waves.
Only active users were invited to participate; churned users are not represented. Overall satisfaction figures likely overstate the true user-base average. Always disclose this limitation when citing these results externally.
All raw survey responses are protected under GDPR and applicable privacy law. Individual-level data has been anonymized. Any re-identification attempt is a compliance violation. Access to raw data requires a formal request to the Data Governance Committee and written approval.
Survey Design
The online survey used a responsive design optimized for both desktop and mobile. Each of the five pages contained eight to ten questions, with branching logic at key decision points to ensure each respondent received questions relevant to their usage scenario.
The median completion time was 7 minutes 52 seconds. Introduction of a progress indicator and auto-save reduced the mid-survey drop-off rate from 18.3% (Wave 2) to 9.7% (Wave 3), contributing to the record-high valid sample count.
Executive Summary
Across 1,200 valid responses, overall satisfaction with AI coding tools reached 78% — a new high for the three-wave study series. NPS rose to 42, and daily AI feature adoption climbed to 64%. These gains are most pronounced among enterprise developers and independent freelancers, both of whom report satisfaction above the overall average and strong retention intent.
Two structural constraints cap further satisfaction growth. First, system latency during peak hours is the top-cited pain point regardless of user segment. Second, batch-processing limits and limited export formats prevent heavy users from integrating the tool into professional workflows. Churn risk among students remains elevated at 18%, requiring a dedicated product and pricing strategy.
"If the batch limit went from 10 files to 50, I would upgrade to the Pro plan immediately. For our team, that is a genuine productivity leap — not a nice-to-have."
— Enterprise Developer, Wave 3 focus group, October 2024
Key Conclusions & Action Items
- Prioritize technical stability
  - Peak-hour latency must be addressed through capacity planning and caching before Q1 2025
  - Hallucination-rate targets should be added as a core KPI in the next model evaluation cycle
- Raise the ceiling for power users
  - Increase the batch processing limit to 50 files on the Pro plan; remove the cap on Enterprise
  - Add export support for Excel, JSON, and Notion to reduce workflow integration friction
- Develop a student retention strategy
  - Introduce an academic-verified pricing tier with a lower paid conversion threshold
  - Elevate learning-focused features (essay assistance, knowledge organization) on the product roadmap