Square is blocking sites from being indexed for search
In my robots.txt file, the following rules are present:
```
User-agent: Googlebot
Disallow:

User-agent: Googlebot-Image
Disallow:
```
The first rule will block the Google search bot from indexing ANY page on the site, and the second will block it from indexing ANY image on the site.
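For anyone who wants to test how those directives evaluate, here is a minimal sketch using Python's standard urllib.robotparser (the example.com URLs are just placeholders). One hedge worth stating: under the robots.txt convention, an empty Disallow: value actually permits crawling, while Disallow: / is the form that blocks everything; the replies below dig into exactly this point.

```python
from urllib import robotparser

# The rules exactly as quoted above; paste in your own robots.txt to test.
rules = """\
User-agent: Googlebot
Disallow:

User-agent: Googlebot-Image
Disallow:
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# can_fetch() reports whether the named agent may crawl the given URL.
print(rp.can_fetch("Googlebot", "https://www.example.com/any-page"))
print(rp.can_fetch("Googlebot-Image", "https://www.example.com/photo.jpg"))
```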
Search engine visibility is set to "visible to search engines" on the SEO settings page in the dashboard.

I was curious, so I checked the robots.txt of a few other Square sites, and I'm seeing the same thing. I've talked to support, but they don't seem to understand and simply tell me to talk to Google. Please review your own robots.txt file so we can see whether this is a widespread global issue. You can view the file by going to www.yourdomain.com/robots.txt in any web browser. Please post your results.
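If clicking through a browser gets tedious, here is a minimal sketch (Python 3 standard library only; the domain is a placeholder, swap in your own) that fetches and prints a site's robots.txt:

```python
import urllib.request

# Placeholder domain; replace with the site you want to inspect.
url = "https://www.yourdomain.com/robots.txt"

with urllib.request.urlopen(url) as resp:
    print(resp.read().decode("utf-8", errors="replace"))
```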
Hi @ZaraJo,
So I am no expert, or even a novice for that matter, when it comes to code. I did just do a quick Google search, though, and pulled this from Google's developer docs.

If I'm understanding Google correctly, there actually isn't anything wrong with your robots.txt file, since there is no path such as /nogooglebot/ after the Disallow: directive.
Example from Google of a bot that is not allowed to crawl:

```
User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml
```
Meaning of the code above (see the sketch after this list):
- The user agent named Googlebot is not allowed to crawl any URL that starts with https://example.com/nogooglebot/.
- All other user agents are allowed to crawl the entire site. This could have been omitted and the result would be the same; the default behavior is that user agents are allowed to crawl the entire site.
- The site's sitemap file is located at https://www.example.com/sitemap.xml.
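To make Google's example concrete, here is a small sketch that feeds those exact rules to Python's standard urllib.robotparser and checks a few URLs (example.com is Google's placeholder domain):

```python
from urllib import robotparser

# Google's example rules, verbatim.
rules = """\
User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "https://www.example.com/nogooglebot/page"))  # False: under the blocked path
print(rp.can_fetch("Googlebot", "https://www.example.com/"))                  # True: rest of the site is allowed
print(rp.can_fetch("Bingbot", "https://www.example.com/nogooglebot/page"))    # True: the rule only names Googlebot
```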
Dan,
In the example you provided, the robots.txt file instructs Google not to crawl only URLs under the specific path https://example.com/nogooglebot/; however, the next rule allows Google, or any other bot, to crawl the rest of the domain.
This screenshot is taken from my Google Search Console page. The page is only listed as indexed because I uploaded the sitemap to Google; Search Console is clearly showing that Googlebot is blocked by the robots.txt.
Another good test: open a browser window, go to any site, and view the contents of its robots.txt file. You won't find explicit disallow rules for Googlebot (a scripted version of this check is sketched below).

**For obvious reasons, the site can't be a social media site or a Google competitor, or it will have disallow rules for Googlebot.
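If you'd rather script that spot-check than click through sites by hand, here is a minimal sketch using Python's standard urllib.robotparser; the two sites listed are arbitrary examples, substitute any you like:

```python
from urllib import robotparser

# Arbitrary example sites; substitute any you like.
sites = ["https://www.wikipedia.org", "https://www.python.org"]

for site in sites:
    rp = robotparser.RobotFileParser()
    rp.set_url(site + "/robots.txt")
    rp.read()  # fetches the file; note some hosts reject Python's default user agent
    allowed = rp.can_fetch("Googlebot", site + "/")
    print(f"{site} -> Googlebot may crawl the homepage: {allowed}")
```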
Ahh okay, I see now! Thanks @ZaraJo. Yeah, like I mentioned, I'm definitely no expert, haha. So that is strange, and it's probably the one thing I don't like about sites that host and "maintain" your site: the limited access to the back end.

Have you tried changing your visibility to "off", saving, then toggling it back on, saving again, and running a crawl to see if that fixes it?
Yes, I've tried to do just that: changed it to off, saved, and published; toggled it back on, saved, and re-published. Same issue. My suspicion is that changing that setting should update the robots.txt file; there is most likely a job/task/service running in the background that should perform the update, and it's broken. Though, since I've checked the robots.txt file for a handful of other Square sites and I'm seeing the same issue, it could be a widespread problem with the robots.txt file. From searching through the archives in the Weebly group, I've seen that there have been global issues before with uploads being blocked via robots.txt.
Most people who are using the platform aren't very technical and may not know that Googlebot is being blocked from crawling their site. People are potentially losing out on tons of traffic, and losing dollars.
@AdamB, do you happen to have any insight here into why this may be happening and not allowing Googlebot to crawl?