1,439 questions
1
vote
1
answer
45
views
Show or hide app-root depending on ngIf on Angular 12+
I am trying to show/hide a development page from most people.
I changed the index.html like:
<!DOCTYPE html>
<html lang="en">
<head>
</head>
<body>
<app-...
0
votes
1
answer
45
views
How do I write robots.txt with specific parameters?
I want Google to ignore (disallow) URLs like this:
(Disallow) http://example.com/*/?page=2
But if the URL contains "article",
I don't want to exclude it; allow URLs like this:
(Allow) ...
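A minimal sketch of what such rules might look like, assuming Google's wildcard support and its rule that the longest (most specific) matching rule wins; the path pattern is an assumption about the site's URL layout:

User-agent: *
Disallow: /*/?page=
Allow: /*article*/?page=

Because the Allow pattern is longer than the Disallow pattern, paginated URLs whose path contains "article" would stay crawlable.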
0
votes
0
answers
33
views
How to test a robots.txt that disallows URLs containing somekeyword in an ASP.NET MVC web app
I have an ASP.NET MVC web application. I added robots.txt to disallow a URL that contains the key "somewords".
Please advise how I can test that a crawler is not allowed to access the page in ...
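One way to test this locally is Python's urllib.robotparser pointed at the dev server; the localhost URL, port, and user-agent string below are hypothetical:

import urllib.robotparser

# Fetch and parse the robots.txt served by the local ASP.NET MVC app
rp = urllib.robotparser.RobotFileParser("http://localhost:5000/robots.txt")
rp.read()

# False means a crawler identifying as "SomeBot" may not fetch this page
print(rp.can_fetch("SomeBot", "http://localhost:5000/somewords/page"))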
1
vote
0
answers
30
views
Content of the Nuxt3 public folder is modified but not taking effect?
Please help. I developed a website using Nuxt3. After running it with Node.js, I modified the content of the robots.txt file under my public folder, but refreshing my site did not ...
0
votes
0
answers
17
views
How to Implement noindex tags at scale in Magento (particularly with Wildcards)
Good afternoon,
I am wondering if there is a way to implement noindex tags at scale in Magento, much like you would in robots.txt with wildcard selectors. Example:
example.com/catalogsearch/*
Whereby ...
0
votes
0
answers
22
views
Issues with Googlebot crawling and negative number pages
I am dealing with very heavy load from Googlebot. I have hundreds of sites running on multiple IPs, all on the same server, and Googlebot is constantly crawling these sites, causing a lot of availability ...
-1
votes
1
answer
154
views
Query parameter causing error on Semrush SEO
My requirement was to pre-fill one specific field of a webform using a query parameter, and I have done it, but due to the query parameter Semrush (SEO) is claiming hundreds of pages as duplicates, for example some ...
0
votes
0
answers
60
views
How to solve 'Alternate page with proper canonical tag' on Search Console - all errors related to query parameters?
I have found multiple 'Alternate page with proper canonical tag' errors in Search Console. How do I disallow those query parameters in the robots.txt file?
Error on these pages:
https://example.com/es/ads/...
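If the flagged pages really are just query-parameter variants, a hedged robots.txt sketch would be (the parameter names here are hypothetical):

User-agent: *
Disallow: /*?sort=
Disallow: /*?filter=

Note that 'Alternate page with proper canonical tag' is normally informational rather than a defect: it means Google saw the duplicate and honored the canonical.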
-1
votes
1
answer
43
views
robots.txt working locally but not on UAT server
I am trying to access the robots.txt on my UAT server. I am able to access the robots.txt locally with localhost:9001/food/robots.txt.
However, I am not able to access it when it is on my UAT server with ...
0
votes
0
answers
122
views
Is automatic scraping forbidden on EURLEX?
Is it possible that the EU has removed the possibility to scrape the EURLEX website?
For a university course I am required to scrape the following website for all the listed documents in an html ...
1
vote
1
answer
310
views
Something is blocking Ahrefs bot on my WordPress site
I have a client with a WordPress site and I'm working with her to improve it overall. We've redesigned it, revamped content, relaunched it and we're getting more traffic.
However, Ahrefs cannot load ...
0
votes
1
answer
236
views
Why is my Astro.js website blocked from indexing?
I'm hosting an Astro.js website on Netlify. I keep getting a low Lighthouse score for SEO, with the reason being that the page is blocked from indexing. The image is attached below.
I'm following the ...
1
vote
1
answer
64
views
Can I write a robots.txt rule to forbid crawling URLs with an anchor part (using the hash character, #)?
I have a table of contents plugin on my site which made some URLs with # at the beginning,
like https://example.com/#how_to_do_something, which links to that header's part in the content.
It's like it ...
2
votes
1
answer
160
views
How to disallow specific fragment URLs using robots.txt?
How do you disallow dynamic URLs that have the following path in the robots.txt file?
https://example.com/blogs/news/nameofthepost#How-Much-do-they-cost
The posts do have canonical set but that didn't ...
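A step worth making explicit for both fragment questions above: the part after # is kept client-side and never sent to the server, so robots.txt has nothing to match against. Python's standard library shows the split:

from urllib.parse import urlsplit

parts = urlsplit("https://example.com/blogs/news/nameofthepost#How-Much-do-they-cost")
print(parts.path)      # /blogs/news/nameofthepost  (what the crawler actually requests)
print(parts.fragment)  # How-Much-do-they-cost  (never part of the HTTP request)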
2
votes
2
answers
327
views
Generate a video sitemap for Next.js
I'm currently working on a project where I can successfully generate a sitemap. Among the various sitemaps I've created, one is specifically for "videos". After conducting research, I ...
0
votes
1
answer
93
views
Why is a robots.txt file that is supposed to block a subfolder also blocking some random files?
I was getting some strange URLs indexed for my site, formed by treating files as folders. One sample URL is here:
https://www.plus2net.com/python/tkinter-scale.php/math.php
I have a file tkinter-scale.php but ...
1
vote
0
answers
972
views
Block GeedoProductSearch using robots.txt
I have robots.txt blocking the user agent GeedoProductSearch, but when I check the logs I still see Geedo accessing my page:
# START YOAST BLOCK
# ---------------------------
User-agent: *
Crawl-delay: 100000
...
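Since robots.txt compliance is voluntary and evaluated per user-agent group, a dedicated group is more likely to work than a Crawl-delay under the catch-all, assuming Geedo honors robots.txt at all; a sketch:

User-agent: GeedoProductSearch
Disallow: /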
0
votes
1
answer
314
views
Sitemap URL not defined in robots.txt Websiteplanet
This is what my robots.txt looks like:
User-Agent: *
Disallow: /info/privacy
Disallow: /info/cookies
Allow: /
Allow : /search/Jaén
Allow : /search/Tarragona
Allow : /search/Rioja
Sitemap: https://www....
0
votes
1
answer
97
views
Protect Laravel website from HTTrack
I have added the below rules to my Laravel website's https://example.com/robots.txt:
User-agent: HTTrack
Disallow: /
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
User-agent: *
Disallow:
But ...
3
votes
1
answer
3k
views
How to disallow robots in .htaccess and robots.txt?
I tried to disallow Amazonbot to my website, and I tried to use robots.txt by adding these lines:
User-agent: Amazonbot
Disallow: /
After several hours I noticed this robot did not follow robots.txt ...
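When a bot ignores robots.txt, .htaccess can refuse the requests at the server instead; a minimal Apache sketch, assuming mod_rewrite is enabled:

RewriteEngine On
# Return 403 Forbidden to any request whose User-Agent contains "Amazonbot"
RewriteCond %{HTTP_USER_AGENT} Amazonbot [NC]
RewriteRule .* - [F,L]

A bot that spoofs its User-Agent header will still get through, so this is containment rather than a guarantee.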
1
vote
0
answers
179
views
Do LLM crawlers respect the robots meta tag?
It is currently possible to use robots.txt to disallow Large Language Model crawlers via user-agent strings:
User-agent: GPTBot
Disallow: /
But this approach is very broad and while it works for site ...
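For comparison, the page-level equivalents look like this; Google documents per-crawler meta names such as googlebot, but whether LLM crawlers honor a robots meta tag is exactly the open question here, so this is a sketch rather than a guarantee:

<meta name="robots" content="noindex">
<meta name="googlebot" content="noindex">

No equivalent per-crawler meta name appears to be documented for GPTBot; its published controls are robots.txt based.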
0
votes
1
answer
2k
views
Allowing all access to robots.txt in nginx?
I have a website with what I think is an uncomplicated setup. In my main location / {...} block, I have a bunch of deny directives for specific IP addresses that seem to be malicious in some way. I ...
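In nginx, an exact-match location takes precedence over the prefix location / where the deny directives live, so a minimal sketch is:

location = /robots.txt {
    # Exact match wins over "location /", so the IP deny list there never applies here
    allow all;
}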
0
votes
0
answers
529
views
Google Search Console - Excluded by noindex tag
I have a WordPress site with pages that Google Search Console (GSC) reports as "Excluded by ‘noindex’ tag"
The robots.txt file accessed by the Yoast plugin is as follows though:
User-Agent: *...
-1
votes
1
answer
40
views
What does '.action' mean in a robots.txt file?
I'm looking at a robots.txt file that includes a line Disallow: /path/p.action. I would understand if it said /path/p*, forbidding access to any page that begins with p. But what does .action refer to?...
0
votes
1
answer
152
views
WP Site's Live Robots.txt Differs from local Robots.txt accessed by SFTP
I have a WordPress site, hosted on WPEngine, that serves as a CMS for our website through an endpoint.
On the WordPress site, I have installed the YoastSEO plugin, and have edited the robots.txt file to ...
0
votes
0
answers
112
views
Check URL in robots.txt
I need to check if a certain URL in the robots.txt file is available for crawling by a certain agent.
I'm using
import urllib.robotparser
rp = urllib.robotparser.RobotFileParser("https://rus-...
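The truncated snippet presumably continues along these lines; the domain is cut off above, so the URL and agent below are stand-ins:

rp.read()  # fetch and parse the robots.txt
# can_fetch() returns True if the given agent may crawl the given URL
print(rp.can_fetch("SomeBot", "https://example.com/some/page"))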
0
votes
1
answer
49
views
How to disallow nested folders with robots.txt?
In my robots.txt I have a disallowing rule like:
User-agent: *
Disallow: /_hcms/
I want to disallow anything placed inside of _hcms and in its nested folders, like /_hcms/a/ or /_hcms/b/.
Should I ...
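robots.txt rules are prefix matches, so a single directory rule already covers every nested path; no wildcards are needed:

User-agent: *
Disallow: /_hcms/
# also blocks /_hcms/a/, /_hcms/b/, /_hcms/a/deep/file.html, etc.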
2
votes
1
answer
2k
views
How to implement robots.txt in nuxt.js
I have looked at different documentation and it is still not clear to me if robots.txt is put in the root of the server or in the root of the code itself.
And I would like to know how it is ...
2
votes
3
answers
1k
views
Getting Unauthorized 401 for my Next.js 13 app on Search Console
I built out a Next.js 13 web app using Clerk for auth. The issue is that the homepage of the web app isn't discoverable on Google. When I test the live URL, this is how it looks: ...
0
votes
1
answer
2k
views
Disable web crawling for a Next.js app via robots.txt for a particular sub-domain only
I have my website deployed on Vercel. The site is a Next.js application (not using nginx or any other web server) deployed directly on Vercel. There are two domains assigned to the same ...
1
vote
1
answer
958
views
How do I scrape my correct submissions on LeetCode?
I am looking into how I can scrape my correct LeetCode submissions and upload them to Github. Being a beginner to web scraping, I read through a few blogs and I understand that we can use python ...
0
votes
2
answers
1k
views
Unable to find robots.txt file
In a Vue (3.4) app, I have created a robots.txt file at the root of my folder. I have deployed my website with the robots.txt file and I am unable to find it when typing the URL https://www.example....
1
vote
1
answer
423
views
Can you rename robots.txt and favicon?
I want the following names in my server like this:
(so all server setup and crawler files start with a . and show up first in the file list, with my webpage files after them.)
....
0
votes
3
answers
1k
views
Does disallowing /wp-includes and /wp-content/plugins (cache and themes) affect my SEO score?
I want a clean SEO analysis for my website, and I want to be sure that disallowing some WP folders doesn't ...
0
votes
2
answers
62
views
How to program the serving of two pages, accessible under the same URL, with only one of them getting indexed
Let's say you have a website hosted on example.org. That website has a single page, whose content is static if the requesting client is not logged in, but dynamic (according to the logged-in client) if ...
0
votes
0
answers
266
views
How to stop the admin from being crawled with Strapi and Nuxt
I have a project created with Strapi and Nuxt. When I try to block admin.mywebsite.com from being crawled in the live version with robots.txt, it doesn't work. I tried to disallow the /...
0
votes
0
answers
71
views
Why is my WordPress website not getting indexed on Google Search Console despite using Yoast SEO and manually creating a sitemap and robots.txt?
I have added this website to my Google Search Console but it's not getting indexed. It's a WordPress website and I'm using Yoast SEO, but since that didn't work out, I had to create the sitemap manually and ...
0
votes
0
answers
39
views
Is a Firebase domain.web.app domain ignored by Google robots? [duplicate]
I have 2 SPAs in React and 1 static website deployed on Firebase.
I typed the site address domain.web.app in the browser and my site didn't appear.
Do I have to add a custom domain to fix this?
One of my SPAs was ...
1
vote
2
answers
2k
views
Using robots.txt to exclude one specific user-agent and allowing all others?
It sounds like a simple question: exclude the Wayback Machine crawler (ia_archiver) and allow all other user agents.
So I set up the robots.txt as follows:
User-agent: *
Sitemap: https://www.example....
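Crawlers pick the single most specific User-agent group that matches them, so the usual layout keeps the two groups separate:

User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:

ia_archiver reads only its own group; every other crawler falls through to the empty Disallow, which permits everything.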
1
vote
1
answer
441
views
Can a robots.txt disallow use an asterisk for product id wildcard?
Is the following valid in my robots.txt file?
Disallow: /*?action=addwishlist&product_id=*
rather than writing individually for every product like below:
Disallow: /*?action=addwishlist&...
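Yes, the wildcard form is valid for major crawlers; and since matching is prefix-based, the trailing asterisk is redundant, so this shorter rule already covers every product id:

User-agent: *
Disallow: /*?action=addwishlist&product_id=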
1
vote
1
answer
394
views
Using X-Robots-Tag in the .htaccess file to deindex query string URLs from Google
I am looking for a solution to deindex all the URLs with the query string ?te= from Google. For example, I want to deindex all the URLs https://example.com/?te= from Google.
Google has currently indexed ...
2
votes
2
answers
894
views
Google Search Console live indexing failed with server error (5xx)
My website was being crawled by Google perfectly until this last February, 2023. The website didn't have any robots.txt until now. Suddenly the Page Indexing live test is failing due to this error:
Failed: ...
2
votes
0
answers
558
views
Multi-Page React App - Robot.txt is not valid
I have a multi-page React app which contains a robot.txt. After deploying the site I have noticed an error that wasn't present during development which is hindering the SEO. The error states that the ...
0
votes
1
answer
362
views
FTP domain indexed by google
We are facing a very strange issue where Google is indexing our FTP domain ftp.example.com.
We do not have that as a subdomain and there is no root folder or any other files for it.
So I am not sure ...
2
votes
1
answer
1k
views
Robots.txt - blocking bots from adding to cart in WooCommerce
I'm not sure how good Google's robots.txt tester is, and I'm wondering if the following example of my robots.txt for my WooCommerce site will actually do the trick for blocking bots from adding to cart and ...
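For reference, a commonly seen sketch for stores on default WooCommerce permalinks; the exact paths depend on the store's settings, so treat them as assumptions:

User-agent: *
Disallow: /*add-to-cart=*
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/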
-1
votes
2
answers
2k
views
Add robots.txt for Laravel 9+
I want to add robots.txt to my Laravel project, but the robots.txt packages I found are not compatible with Laravel 9+, so if you know of a tutorial or package for the latest version of Laravel, please ...
1
vote
0
answers
96
views
How to make robots.txt and sitemap.xml accessible only to search engine bots
Several websites like Quora and Stack Exchange, including Stack Overflow (https://stackoverflow.com/sitemap.xml), only allow access through search engine crawlers (Google, Yahoo, Bing, etc.).
How can I do ...
-3
votes
1
answer
590
views
Robots.txt file and Googlebot crawlability
Will this robots.txt allow Googlebot to crawl my site or not?
Disallow: /
User-agent: Robozilla
Disallow: /
User-agent: *
Disallow:
Disallow: /cgi-bin/
Sitemap: https://koyal.pk/sitemap/sitemap.xml
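One detail worth noting: the leading Disallow: / appears before any User-agent line, so it belongs to no group and strict parsers ignore it. Assuming that line is unintentional, the cleaned-up file reads:

User-agent: Robozilla
Disallow: /

User-agent: *
Disallow: /cgi-bin/

Sitemap: https://koyal.pk/sitemap/sitemap.xml

Under that reading, Googlebot matches the * group and can crawl everything except /cgi-bin/.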
0
votes
0
answers
297
views
How do I disallow URLs for specific countries in a robots.txt file?
I have a multi-site in WordPress and have a setup like this:
domain.com (main version for Australia)
domain.com/us
domain.com/eu
...
I want that, if Australian users search on Google, it should only show ...
0
votes
1
answer
1k
views
Disallow URLs with query params in Robots.txt
My site was hacked and Google crawled some weird URLs. For example:
www.tuppleapps.com/?andsd123
www.tuppleapps.com/?itlq7433
www.tuppleapps.com/?copz656
I want to disallow these URLs with query params ...
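Since the junk URLs share the pattern of a query string on the root, a minimal sketch is the following; note it blocks crawling of any URL with a query string, which may be broader than intended:

User-agent: *
Disallow: /*?

Crawling and indexing are separate concerns, though: URLs Google has already indexed may also need a noindex or a removal request to drop out of results.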