AWS CloudFront
CloudFront access logs record every request to your site, including the user-agent string that identifies AI crawlers like GPTBot, ClaudeBot, and PerplexityBot. Once you give Sitefire read-only access to these logs, it can show you exactly which AI bots visit which pages, how often, and how that changes over time.
Time needed: ~15 minutes. Two steps: enable logging, create an IAM role.
How It Works
CloudFront writes a gzip-compressed log file to S3 for every batch of requests. Each line includes the URL path, timestamp, and user-agent. Sitefire assumes a read-only IAM role in your account, syncs new log files, filters for AI bot user-agents, and surfaces the insights in your dashboard.
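To make the filtering step concrete, here is a minimal sketch of counting AI bot requests in one gzip-compressed W3C log file. It is not Sitefire's actual parser. The bot list, the function name count_ai_bot_hits, and the assumption that data rows are tab-separated with percent-encoded user-agent values are illustrative.

```python
import gzip
import io
from collections import Counter
from urllib.parse import unquote

# Illustrative list only; the tokens Sitefire actually matches are not documented here.
AI_BOT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def count_ai_bot_hits(log_bytes: bytes) -> Counter:
    """Count (bot, path) pairs in one gzip-compressed W3C log file."""
    hits = Counter()
    fields = []
    with gzip.open(io.BytesIO(log_bytes), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.rstrip("\r\n")
            if line.startswith("#Fields:"):
                # The header names the columns, separated by spaces.
                fields = line[len("#Fields:"):].split()
                continue
            if not line or line.startswith("#") or not fields:
                continue
            # Assumption: data rows are tab-separated, in header order.
            row = dict(zip(fields, line.split("\t")))
            # Assumption: the user-agent may be percent-encoded (%20 for spaces).
            user_agent = unquote(row.get("cs(User-Agent)", "")).lower()
            for bot in AI_BOT_TOKENS:
                if bot.lower() in user_agent:
                    hits[(bot, row.get("cs-uri-stem", ""))] += 1
                    break
    return hits
```

Calling count_ai_bot_hits on the bytes of a downloaded .gz file returns a Counter keyed by (bot, URL path), which is roughly the shape of data a per-page crawler report needs.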
This is the same cross-account IAM pattern used by Datadog, New Relic, and other SaaS tools. No credentials are shared. You stay in full control.
Step 1: Enable CloudFront Logging
If your distribution already has logging enabled, skip to Step 2.
New setup (v2)
Standard Logging v2 (launched November 2024) is the recommended option for new setups. It delivers logs to S3 without requiring bucket ACLs, and the console handles all permissions automatically.
Open the Logging tab
Go to CloudFront > Distributions > select your distribution > Logging tab > click Add.
Configure S3 delivery
- Select Amazon S3 as the destination
- Choose or create an S3 bucket (e.g., yourcompany-cf-logs)
- Optionally set a prefix
- Output format: select W3C (our parser requires this format)
- Field selection: make sure cs(User-Agent) is included (it is by default)
The console automatically creates the required S3 bucket policy. No manual permission setup needed.
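If you want to confirm the field selection from a delivered log file rather than from the console, a check along these lines may help. It is a sketch: the set of required fields is an assumption based on the fields this guide mentions (URL path, timestamp, user-agent), and missing_fields is a hypothetical helper name.

```python
# Assumption: these are the columns a crawler report needs. Only
# cs(User-Agent) is explicitly called out as required in this guide.
REQUIRED_FIELDS = frozenset({"date", "time", "cs-uri-stem", "cs(User-Agent)"})


def missing_fields(header_line: str, required=REQUIRED_FIELDS) -> set:
    """Return the required field names absent from a W3C '#Fields:' header."""
    if not header_line.startswith("#Fields:"):
        raise ValueError("expected a line starting with '#Fields:'")
    present = set(header_line[len("#Fields:"):].split())
    return set(required) - present
```

An empty result means every assumed field is present. A non-empty result names the fields to re-enable in the logging configuration.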
Save
Logs start appearing in your bucket within a few minutes.
Verify logs are flowing
Wait 5 minutes, then check that files are appearing in your bucket:
aws s3 ls s3://YOUR-BUCKET/YOUR-PREFIX/ --recursive --summarize | tail -3
The summary should show a non-zero Total Objects count. Drop | tail -3 to list the individual .gz files. If the bucket is empty, double-check that logging is enabled on the correct distribution.
Step 2: Create an IAM Role for Sitefire
This role grants Sitefire read-only access to your log bucket - nothing else.
The Sitefire Account ID and your External ID are shown in the Sitefire app. Go to Crawler Analytics → Connect CDN → AWS CloudFront to find them.
Create a new IAM role
Go to IAM > Roles > Create Role > select Custom trust policy.
Paste the following trust policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::SITEFIRE_ACCOUNT_ID:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "UNIQUE_EXTERNAL_ID"
        }
      }
    }
  ]
}
Replace SITEFIRE_ACCOUNT_ID and UNIQUE_EXTERNAL_ID with the values shown in the Sitefire setup wizard.
Click Next.
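If you would rather generate the trust policy than edit the placeholders by hand, a small helper like the following could fill them in. This is a sketch, not part of the Sitefire tooling: build_trust_policy is a hypothetical name, and the 12-digit check reflects the standard AWS account ID format.

```python
import json


def build_trust_policy(sitefire_account_id: str, external_id: str) -> str:
    """Render the trust policy above with the two placeholders filled in."""
    if not (sitefire_account_id.isascii()
            and sitefire_account_id.isdigit()
            and len(sitefire_account_id) == 12):
        raise ValueError("AWS account IDs are exactly 12 digits")
    if not external_id:
        raise ValueError("the External ID must not be empty")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{sitefire_account_id}:root"},
                "Action": "sts:AssumeRole",
                # Only callers that present this External ID may assume the role.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

Printing the result gives JSON that can be pasted into the Custom trust policy editor.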
Attach a permission policy
Click Create policy (opens in a new tab), switch to the JSON editor, and paste:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::YOUR-LOG-BUCKET",
        "arn:aws:s3:::YOUR-LOG-BUCKET/*"
      ]
    }
  ]
}
Replace YOUR-LOG-BUCKET with your bucket name from Step 1.
Name the policy something like sitefireLogReaderPolicy, then save it.
Go back to the role creation tab, refresh the policy list, and attach the policy you just created. Click Next.
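To double-check that the policy you attached grants nothing beyond read access to the log bucket, a conservative audit can be sketched as follows. The helper name policy_is_read_only is hypothetical. The check compares literal strings only, so wildcard actions such as s3:Get* are reported as not read-only even though they might be.

```python
import json

# The two actions used in the permission policy above.
READ_ONLY_ACTIONS = {"s3:GetObject", "s3:ListBucket"}


def _as_list(value):
    """IAM allows a single string where a list is expected; normalize it."""
    return [value] if isinstance(value, (str, dict)) else list(value)


def policy_is_read_only(policy_json: str, bucket: str) -> bool:
    """True if every Allow statement uses only the read actions on this bucket."""
    allowed_resources = {f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"}
    policy = json.loads(policy_json)
    for statement in _as_list(policy["Statement"]):
        if statement.get("Effect") != "Allow":
            continue
        # NotAction / NotResource invert the match, so treat them as unsafe.
        if "NotAction" in statement or "NotResource" in statement:
            return False
        if not set(_as_list(statement.get("Action", []))) <= READ_ONLY_ACTIONS:
            return False
        if not set(_as_list(statement.get("Resource", []))) <= allowed_resources:
            return False
    return True
```

A False result does not prove the policy is dangerous; it only means the policy differs from the one shown in this guide and deserves a second look.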
Name and create the role
Name the role sitefireLogReader (or similar) and click Create role.
Enter details in Sitefire
Go back to the Sitefire setup wizard and click I’ve created the IAM role. Enter:
- The Role ARN from the role summary page (e.g., arn:aws:iam::123456789012:role/sitefireLogReader)
- Your S3 bucket name and prefix (if any)
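A copy-paste slip in the Role ARN is an easy mistake to make, so a quick format check before submitting can save a failed validation. The sketch below assumes the standard aws partition shown in the example ARN (not aws-cn or aws-us-gov), and parse_role_arn is a hypothetical helper, not something the Sitefire wizard exposes.

```python
import re

# Assumption: standard "aws" partition, 12-digit account ID, and a role name
# (optionally preceded by a path) built from the characters IAM permits.
ROLE_ARN_PATTERN = re.compile(
    r"^arn:aws:iam::(?P<account_id>\d{12}):role/(?P<role>[\w+=,.@/-]+)$",
    re.ASCII,
)


def parse_role_arn(arn: str) -> dict:
    """Split an IAM role ARN into account ID and role (path plus name)."""
    match = ROLE_ARN_PATTERN.match(arn.strip())
    if match is None:
        raise ValueError(f"not a valid IAM role ARN: {arn!r}")
    return match.groupdict()
```

For the example above, the account ID comes back as 123456789012 and the role as sitefireLogReader. The account ID should be your own AWS account, not the Sitefire account used in the trust policy.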
Click Connect & Import to validate the connection and start syncing.
That’s it. Once the connection is validated, Sitefire imports the last 7 days of AI bot traffic, and data appears in your dashboard within minutes.