Google Search Console API Query Builder
Build GSC API requests with dimensions, filters, and regex. Learn the 5K row bug, searchAppearance limitations, and undocumented quirks.
The GSC API lets you filter and group search performance data by dimensions like page, query, country, and device. Understanding how to build queries, and their undocumented bugs, is critical for reliable data extraction.
Implementation Examples
const response = await fetch(
  `https://searchconsole.googleapis.com/webmasters/v3/sites/${encodeURIComponent(siteUrl)}/searchAnalytics/query`,
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${accessToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      startDate: '2025-01-01',
      endDate: '2025-01-31',
      dimensions: ['query', 'page'],
      dimensionFilterGroups: [{
        filters: [{
          dimension: 'country',
          operator: 'equals',
          expression: 'usa'
        }]
      }],
      rowLimit: 25000,
      startRow: 0
    })
  }
)
import requests
from urllib.parse import quote
from datetime import date, timedelta

def query_gsc(site_url, access_token, dimensions=('query',), start_days_ago=7):
    """Fetch GSC data for the last N days."""
    end_date = date.today() - timedelta(days=3)  # Account for 2-3 day lag
    start_date = end_date - timedelta(days=start_days_ago)
    # Property IDs like sc-domain:example.com must be percent-encoded in the URL
    url = f'https://searchconsole.googleapis.com/webmasters/v3/sites/{quote(site_url, safe="")}/searchAnalytics/query'
    payload = {
        'startDate': start_date.isoformat(),
        'endDate': end_date.isoformat(),
        'dimensions': list(dimensions),
        'rowLimit': 25000
    }
    headers = {
        'Authorization': f'Bearer {access_token}',
        'Content-Type': 'application/json'
    }
    response = requests.post(url, json=payload, headers=headers)
    response.raise_for_status()
    return response.json()

# Usage
data = query_gsc(
    site_url='sc-domain:example.com',
    access_token='ya29.a0Ae...',
    dimensions=['page', 'query'],
    start_days_ago=28
)
Key fields:
- startDate (YYYY-MM-DD) - Start date for data collection.
- endDate (YYYY-MM-DD) - End date (inclusive).
- dimensions - Array of grouping keys (e.g., query, page, date).
- dimensionFilterGroups - Nested filters for narrowing results (AND/OR logic).
- rowLimit - Max rows per response (up to 25,000).
- startRow - Zero-based index for pagination.
Dimensions Explained
Available Dimensions
| Dimension | Description | Example Values |
|---|---|---|
| date | Daily breakdown | 2025-01-27 |
| query | Search keywords | "gsc api query" |
| page | Landing page URL | "https://example.com/article" |
| country | User country (ISO 3166-1 alpha-3) | "usa", "gbr", "jpn" |
| device | Device type | "DESKTOP", "MOBILE", "TABLET" |
| searchAppearance | SERP feature type | "VIDEO", "RICH_RESULT" |
Dimension Combinations
You can request up to 7 dimensions in a single query, but some combinations have hidden costs:
Safe combinations (no data loss):
- date only
- date + country
- date + device
- query only
- page only
Lossy combinations (Google drops data):
- page + query - ~66% impression loss on large sites (Google's documented behavior)
- date + query - Triggers the 5K row bug (see below)
- Any combination with searchAppearance - it must be the ONLY dimension
The 5K Row Bug
When querying with date + query dimensions, the API returns only 5,000 rows despite setting rowLimit: 25000.
Affected query:
{
dimensions: ['date', 'query'],
rowLimit: 25000 // Ignored! Returns 5,000 max
}
Workaround: Query by query alone, then make separate date-range queries per keyword:
// Step 1: Get top queries (works fine)
const queries = await queryGSC({
  dimensions: ['query'],
  rowLimit: 25000
})

// Step 2: Loop queries, get daily breakdown
for (const q of queries.rows) {
  const dailyData = await queryGSC({
    dimensions: ['date'],
    dimensionFilterGroups: [{
      filters: [{
        dimension: 'query',
        operator: 'equals',
        expression: q.keys[0]
      }]
    }]
  })
}
Source: Developer forums and API users consistently report this truncation issue when multiple dimensions are present. Google recommends BigQuery for datasets of this scale.
Filters Deep Dive
Filter Operators
| Operator | Behavior | Use Case |
|---|---|---|
| equals | Exact match | country = "usa" |
| notEquals | Exclude exact match | device != "TABLET" |
| contains | Substring match | page contains "/blog/" |
| notContains | Exclude substring | query not contains "brand" |
| includingRegex | RE2 regex match | page matches "\/2025\/" |
| excludingRegex | Exclude regex match | query excludes "^brand" |
Regex Filtering
Google officially added RE2 regex support to Search Console in April 2021, and to the API in October 2021. Syntax follows the RE2 spec, which notably lacks lookahead, lookbehind, and backreferences.
Example: Match blog posts from 2025:
{
dimensionFilterGroups: [{
filters: [{
dimension: 'page',
operator: 'includingRegex',
expression: '\\/blog\\/2025\\/[0-9]{2}\\/'
}]
}],
dimensions: ['page']
}
Example: Exclude branded queries:
{
'dimensionFilterGroups': [{
'filters': [{
'dimension': 'query',
'operator': 'excludingRegex',
'expression': '^(brand|company|product)'
}]
}],
'dimensions': ['query']
}
Regex is powerful, but patterns are not anchored by default: a pattern matches anywhere in the string unless you add ^ (start) and $ (end) explicitly.
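The anchoring difference is easy to check locally; a minimal sketch using Python's re module, whose behavior matches RE2 for simple patterns like these:

```python
import re

# Python's re engine accepts the simple RE2-compatible patterns used here.
queries = ["brand shoes", "best brand shoes", "brand"]

# Unanchored: "brand" matches anywhere in the string.
unanchored = [q for q in queries if re.search(r"brand", q)]

# Anchored: only queries that START with "brand" match.
anchored = [q for q in queries if re.search(r"^brand", q)]
```

Here the unanchored pattern matches all three queries, while the anchored one drops "best brand shoes".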
searchAppearance Filter Bug
The searchAppearance dimension has two critical bugs that remain unresolved as of March 2026:
Bug 1: Must be ONLY dimension
This fails:
{
dimensions: ['searchAppearance', 'page'], // ERROR
}
searchAppearance cannot be combined with page, query, country, or device. Query it alone, then filter the other dimensions in separate requests.
Bug 2: notContains/notEquals return OPPOSITE results
When filtering searchAppearance with notContains or notEquals, the API returns the opposite of what you requested (returning only the rows you tried to exclude).
// Request: Exclude VIDEO results
{
dimensionFilterGroups: [{
filters: [{
dimension: 'searchAppearance',
operator: 'notEquals',
expression: 'VIDEO'
}]
}]
}
// Bug: Returns ONLY VIDEO results instead
Workaround: Request the data unfiltered and exclude the unwanted rows client-side. Google acknowledged this bug in early 2025, but Search Engine Roundtable confirmed it remains "under investigation" with no fix date.
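The client-side workaround is a one-line filter; a sketch (the exclude_appearance name and the sample rows are my own, assuming searchAppearance was queried alone so keys[0] holds the value):

```python
def exclude_appearance(rows, excluded):
    """Client-side replacement for the broken notEquals filter.

    Assumes searchAppearance was queried as the only dimension,
    so row["keys"][0] holds the appearance value.
    """
    return [row for row in rows if row["keys"][0] != excluded]

# Hypothetical API rows
rows = [
    {"keys": ["VIDEO"], "clicks": 120, "impressions": 4000},
    {"keys": ["RICH_RESULT"], "clicks": 45, "impressions": 900},
]
non_video = exclude_appearance(rows, "VIDEO")
```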
Advanced Filter Patterns
Multiple Filters (AND Logic)
Filters in the same group are ANDed:
{
dimensionFilterGroups: [{
filters: [
{ dimension: 'country', operator: 'equals', expression: 'usa' },
{ dimension: 'device', operator: 'equals', expression: 'MOBILE' }
]
}]
}
// Returns: USA AND Mobile traffic only
Multiple Filter Groups (OR Logic)
Separate groups are ORed:
{
dimensionFilterGroups: [
{
filters: [
{ dimension: 'country', operator: 'equals', expression: 'usa' }
]
},
{
filters: [
{ dimension: 'country', operator: 'equals', expression: 'gbr' }
]
}
]
}
// Returns: USA OR UK traffic
Combining AND + OR
{
'dimensionFilterGroups': [
{
'filters': [
{'dimension': 'country', 'operator': 'equals', 'expression': 'usa'},
{'dimension': 'device', 'operator': 'equals', 'expression': 'MOBILE'}
]
},
{
'filters': [
{'dimension': 'country', 'operator': 'equals', 'expression': 'gbr'},
{'dimension': 'device', 'operator': 'equals', 'expression': 'DESKTOP'}
]
}
]
}
# Returns: (USA AND Mobile) OR (UK AND Desktop)
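A small helper makes the AND/OR nesting harder to get wrong; a sketch (the filter_group name and tuple format are my own, not part of the API):

```python
def filter_group(*filters):
    """Build one dimensionFilterGroup; filters inside a group are ANDed."""
    return {
        "filters": [
            {"dimension": d, "operator": op, "expression": expr}
            for d, op, expr in filters
        ]
    }

# (USA AND Mobile) OR (UK AND Desktop): separate groups are ORed.
payload = {
    "dimensions": ["page"],
    "dimensionFilterGroups": [
        filter_group(("country", "equals", "usa"), ("device", "equals", "MOBILE")),
        filter_group(("country", "equals", "gbr"), ("device", "equals", "DESKTOP")),
    ],
}
```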
Data Loss Warning
Google's deep dive blog states:
"When you group by page and/or query, the system may drop some data to reduce cardinality."
Translation: Large sites lose ~66% of impression data when querying page + query together.
Why? Google pre-aggregates data to reduce storage. When you cross-reference page × query, the result set explodes (millions of combinations). Google drops low-traffic combinations.
Impact example:
// Query 1: Pages alone
{ dimensions: ['page'] }
// Returns: 10M total impressions
// Query 2: Pages + Queries
{ dimensions: ['page', 'query'] }
// Returns: 3.4M total impressions (66% data loss)
Workaround: Query dimensions separately when precision matters:
- Get top pages:
dimensions: ['page'] - Get top queries:
dimensions: ['query'] - For specific page, get queries:
dimensionFilterGroups: [{ filters: [{ dimension: 'page', ... }] }]
This avoids cross-dimensional data loss.
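To quantify how much a lossy combination costs on your own property, compare the impression totals of a lossless query and a lossy query over the same date range; a sketch with hypothetical totals matching the example above:

```python
def impression_loss(lossless_rows, lossy_rows):
    """Fraction of impressions dropped by the cross-dimensional query."""
    total = sum(r["impressions"] for r in lossless_rows)
    crossed = sum(r["impressions"] for r in lossy_rows)
    return (total - crossed) / total if total else 0.0

# Hypothetical totals: 10M impressions from ['page'], 3.4M from ['page', 'query']
pages_only = [{"impressions": 10_000_000}]
pages_and_queries = [{"impressions": 3_400_000}]
loss = impression_loss(pages_only, pages_and_queries)  # 0.66
```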
Sorting and Limits
No Sort Parameter
The API does not support a sort parameter; results are always sorted by clicks descending. You cannot sort by impressions, CTR, or position server-side, so sort client-side after fetching.
Source: AnalyticsEdge documentation on GSC API limitations
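Client-side re-sorting is a one-liner; a sketch (the sort_rows helper and sample rows are illustrative):

```python
def sort_rows(rows, metric="impressions", descending=True):
    """Re-sort API rows client-side; the API itself only sorts by clicks."""
    return sorted(rows, key=lambda r: r[metric], reverse=descending)

# Hypothetical rows: "a" wins on clicks, "b" wins on impressions
rows = [
    {"keys": ["a"], "clicks": 90, "impressions": 1000},
    {"keys": ["b"], "clicks": 10, "impressions": 5000},
]
by_impressions = sort_rows(rows)
```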
Pagination
Max 25,000 rows per request. For larger datasets, use startRow:
// Page 1
{ rowLimit: 25000, startRow: 0 }
// Page 2
{ rowLimit: 25000, startRow: 25000 }
// Page 3
{ rowLimit: 25000, startRow: 50000 }
Daily limit: 50,000 rows per property, so two full 25,000-row requests exhaust the quota.
Anonymized Queries
GSC hides queries searched by only a small number of users over a 2-3 month window (the exact threshold is undocumented).
These "anonymized queries" are:
- Included in totals (clicks/impressions aggregated)
- Excluded from query dimension (missing from results)
Example:
{ dimensions: ['query'] }
// Returns: 500 queries, 10K clicks
// But totals show:
// 12K clicks (2K from anonymized queries)
You cannot retrieve anonymized queries via API. They exist only in aggregate metrics.
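Since a request with no dimensions returns a single aggregate row, you can estimate the anonymized share by subtraction; a sketch (the anonymized_clicks helper and the numbers are illustrative, matching the example above):

```python
def anonymized_clicks(aggregate_row, query_rows):
    """Estimate clicks hidden in anonymized queries.

    aggregate_row: the single row returned when no dimensions are requested.
    query_rows: rows from the same date range grouped by ['query'].
    """
    visible = sum(r["clicks"] for r in query_rows)
    return aggregate_row["clicks"] - visible

# 12K aggregate clicks, 10K attributed to visible queries -> 2K anonymized
hidden = anonymized_clicks({"clicks": 12000}, [{"clicks": 6000}, {"clicks": 4000}])
```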
gscdump Query Builder
gscdump stores GSC data in SQLite (D1) and exposes Drizzle-style query syntax:
// Native GSC API (complex)
await fetch('https://searchconsole.googleapis.com/...', {
body: JSON.stringify({
startDate: '2025-01-01',
endDate: '2025-01-31',
dimensions: ['query'],
dimensionFilterGroups: [{
filters: [{
dimension: 'query',
operator: 'includingRegex',
expression: '^mcp'
}]
}]
})
})
-- gscdump MCP (simple)
SELECT query, SUM(clicks) as clicks
FROM gsc_keywords
WHERE date BETWEEN '2025-01-01' AND '2025-01-31'
AND query LIKE 'mcp%'
GROUP BY query
ORDER BY clicks DESC
gscdump removes:
- 25k row limit (query unlimited historical data)
- 5K row bug (dimensions work correctly)
- searchAppearance bugs (data stored correctly)
- Data loss (no pre-aggregation)
- Rate limits (query your own DB)
Common Query Patterns
Top Keywords by Clicks
{
startDate: '2025-01-01',
endDate: '2025-01-31',
dimensions: ['query'],
rowLimit: 100
}
Pages Losing Traffic (Month-over-Month)
# Assumes a query_gsc variant that accepts explicit start_date/end_date
# instead of start_days_ago
current = query_gsc(dimensions=['page'], start_date='2025-01-01', end_date='2025-01-31')
previous = query_gsc(dimensions=['page'], start_date='2024-12-01', end_date='2024-12-31')

def find_page_clicks(data, page_url):
    """Look up a page's clicks in a result set (0 if the page is absent)."""
    for row in data.get('rows', []):
        if row['keys'][0] == page_url:
            return row['clicks']
    return 0

# Compare client-side
for page in current['rows']:
    prev_clicks = find_page_clicks(previous, page['keys'][0])
    diff = page['clicks'] - prev_clicks
    if diff < -100:
        print(f"Declining: {page['keys'][0]} ({diff} clicks)")
Mobile vs Desktop Performance
// Mobile
{
dimensions: ['page'],
dimensionFilterGroups: [{
filters: [{ dimension: 'device', operator: 'equals', expression: 'MOBILE' }]
}]
}
// Desktop (separate query)
{
dimensions: ['page'],
dimensionFilterGroups: [{
filters: [{ dimension: 'device', operator: 'equals', expression: 'DESKTOP' }]
}]
}
Striking Distance Keywords (Position 4-15)
GSC API doesn't filter by position directly. Fetch all, filter client-side:
data = query_gsc(dimensions=['query'])
striking_distance = [
    row for row in data.get('rows', [])
    if 4 <= row['position'] <= 15 and row['impressions'] > 100
]
# Sort by impressions (opportunity size)
striking_distance.sort(key=lambda x: x['impressions'], reverse=True)
Brand vs Non-Brand Traffic
// Brand queries
{
dimensions: ['query'],
dimensionFilterGroups: [{
filters: [{
dimension: 'query',
operator: 'includingRegex',
expression: '(brand|company|product)'
}]
}]
}
// Non-brand (query separately, subtract)
{
dimensions: ['query'],
dimensionFilterGroups: [{
filters: [{
dimension: 'query',
operator: 'excludingRegex',
expression: '(brand|company|product)'
}]
}]
}
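Instead of two filtered API calls, you can also split a single unfiltered ['query'] result client-side; a sketch (the brand terms and split_brand helper are placeholders, and this only sees non-anonymized queries):

```python
import re

BRAND_RE = re.compile(r"(brand|company|product)")  # placeholder brand terms

def split_brand(rows):
    """Partition query rows into brand and non-brand buckets client-side."""
    brand, nonbrand = [], []
    for row in rows:
        (brand if BRAND_RE.search(row["keys"][0]) else nonbrand).append(row)
    return brand, nonbrand

# Hypothetical rows from a single unfiltered ['query'] request
rows = [
    {"keys": ["brand shoes"], "clicks": 50},
    {"keys": ["running tips"], "clicks": 30},
]
brand_rows, nonbrand_rows = split_brand(rows)
```

This halves the API calls and guarantees the two buckets partition the same result set.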
Best Practices
1. Account for 2-3 day lag
Don't query today's date. Always subtract 3 days:
const endDate = new Date()
endDate.setDate(endDate.getDate() - 3)
2. Avoid lossy combinations
Never query page + query for accurate totals. Query separately.
3. Use regex to consolidate filters
A single includingRegex filter can match several patterns at once via alternation, replacing a chain of separate contains filters:
// Instead of separate contains filters per site section
{ operator: 'includingRegex', expression: '/blog/|/news/' }
4. Paginate large results
Don't assume <25k rows. Always implement pagination:
# Assumes a query_gsc variant exposing row_limit and start_row
all_rows = []
start_row = 0
while True:
    data = query_gsc(row_limit=25000, start_row=start_row)
    rows = data.get('rows', [])
    if not rows:
        break
    all_rows.extend(rows)
    start_row += 25000
    if len(rows) < 25000:  # Last page
        break
5. Cache aggressively
GSC data updates once daily. Cache responses for 24 hours:
const cacheKey = `gsc:${siteUrl}:${hash(query)}`
const cached = await cache.get(cacheKey)
if (cached) return cached
const data = await queryGSC(...)
await cache.set(cacheKey, data, { ttl: 86400 }) // 24h
Limitations Summary
| Issue | Impact | Workaround |
|---|---|---|
| 5K row bug with date+query | Returns 5k instead of 25k | Query dimensions separately |
| searchAppearance bugs | notContains returns opposite | Filter client-side |
| Data loss with page+query | 66% impressions missing | Query dimensions separately |
| No sort parameter | Always sorted by clicks | Sort client-side |
| 25k row limit | Large sites need pagination | Use startRow + loop |
| Anonymized queries | Missing from results | Accept data gap |
Next Steps
- Rate Limits - Understand API quotas and 429 errors
- Authentication - OAuth setup for API access
- MCP Server - Query GSC data with AI
Why gscdump Exists
The GSC API's bugs, limits, and data loss make reliable querying difficult. gscdump syncs your full dataset daily, stores it without limits, and fixes API quirks:
- No 5K row bug (query any dimension combination)
- No 25K row limit (query millions of rows)
- No data loss (raw data stored before aggregation)
- No rate limits (query your own database)
Try gscdump free: gscdump.com