1,439 questions
1
vote
1
answer
45
views
Show or hide app-root depending on ngIf on Angular 12+
I am trying to show/hide a development page from most people.
I changed the index.html like:
<!DOCTYPE html>
<html lang="en">
<head>
</head>
<body>
<app-...
0
votes
1
answer
45
views
How do I write robots.txt with specific parameters?
I want Google to ignore (disallow) URLs like this:
(Disallow) http://example.com/*/?page=2
But if the URL contains "article",
I don't want to exclude it; allow URLs like this:
(Allow) ...
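A minimal sketch of what such rules might look like, assuming Google's wildcard support and its rule that the longest (most specific) matching rule wins; the path pattern is an assumption about the site's URL layout:

User-agent: *
Disallow: /*/?page=
Allow: /*article*/?page=

Because the Allow pattern is longer than the Disallow pattern, paginated URLs whose path contains "article" would stay crawlable.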
0
votes
0
answers
33
views
How to test a robots.txt that disallows URLs containing somekeyword in an ASP.NET MVC web app
I have an ASP.NET MVC web application. I added robots.txt to disallow a URL that contains the key "somewords".
Please advise how I can test that a crawler is not allowed to access the page in ...
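One way to test this locally is Python's urllib.robotparser pointed at the dev server; the localhost URL, port, and user-agent string below are hypothetical:

import urllib.robotparser

# Fetch and parse the robots.txt served by the local ASP.NET MVC app
rp = urllib.robotparser.RobotFileParser("http://localhost:5000/robots.txt")
rp.read()

# False means a crawler identifying as "SomeBot" may not fetch this page
print(rp.can_fetch("SomeBot", "http://localhost:5000/somewords/page"))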
1
vote
0
answers
30
views
Content of the Nuxt3 public folder is modified but not taking effect?
Please help. I developed a website using Nuxt3. After running it with Node.js, I modified the content of the robots.txt file under my public folder, but refreshing my site did not ...
0
votes
0
answers
17
views
How to Implement noindex tags at scale in Magento (particularly with Wildcards)
Good afternoon,
I am wondering if there is a way to implement noindex tags at scale in Magento, much like you would in robots.txt with wildcard selectors. Example:
example.com/catalogsearch/*
Whereby ...
0
votes
0
answers
22
views
Issues with Googlebot crawling and negative number pages
I am dealing with very heavy load from Googlebot. I have hundreds of sites running on multiple IPs, all on the same server, and Googlebot is constantly crawling these sites, causing a lot of availability ...
-1
votes
1
answer
154
views
Query parameter causing error on Semrush SEO
My requirement was to pre-fill one specific field of a webform using a query parameter, and I have done it, but due to the query parameter Semrush (SEO) is claiming hundreds of pages as duplicates, for example some ...
0
votes
0
answers
60
views
How to solve 'Alternate page with proper canonical tag' on Search Console - all errors related to query parameters?
I have found multiple 'Alternate page with proper canonical tag' errors in Search Console. How do I disallow those query parameters in the robots.txt file?
Error on these pages:
https://example.com/es/ads/...
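If the flagged pages really are just query-parameter variants, a hedged robots.txt sketch would be (the parameter names here are hypothetical):

User-agent: *
Disallow: /*?sort=
Disallow: /*?filter=

Note that 'Alternate page with proper canonical tag' is normally informational rather than a defect: it means Google saw the duplicate and honored the canonical.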
-1
votes
1
answer
43
views
robots.txt working locally but not on UAT server
I am trying to access the robots.txt on my UAT server. I am able to access the robots.txt locally with localhost:9001/food/robots.txt.
However, I am not able to access it when it is on my UAT server with ...
0
votes
0
answers
122
views
Is automatic scraping forbidden on EURLEX?
Is it possible that the EU has removed the possibility to scrape the EURLEX website?
For a university course I am required to scrape the following website for all the listed documents in an html ...
1
vote
1
answer
310
views
Something is blocking Ahrefs bot on my WordPress site
I have a client with a WordPress site and I'm working with her to improve it overall. We've redesigned it, revamped content, relaunched it and we're getting more traffic.
However, Ahrefs cannot load ...
0
votes
1
answer
236
views
Why is my Astro.js website blocked from indexing?
I'm hosting an Astro.js website on Netlify. I keep getting a low Lighthouse score for SEO, with the reason being that the page is blocked from indexing. The image is attached below.
I'm following the ...
1
vote
1
answer
64
views
Can I write a robots.txt rule to forbid crawling URLs with an anchor part (using the hash character, #)?
I have a table of contents plugin on my site which made some URLs with # at the beginning,
like https://example.com/#how_to_do_something, which links to that header's part in the content.
It's like it ...
2
votes
1
answer
160
views
How to disallow specific fragment URLs using robots.txt?
How do you disallow dynamic URLs that have the following path in the robots.txt file?
https://example.com/blogs/news/nameofthepost#How-Much-do-they-cost
The posts do have canonical set but that didn't ...
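A step worth making explicit for both fragment questions above: the part after # is kept client-side and never sent to the server, so robots.txt has nothing to match against. Python's standard library shows the split:

from urllib.parse import urlsplit

parts = urlsplit("https://example.com/blogs/news/nameofthepost#How-Much-do-they-cost")
print(parts.path)      # /blogs/news/nameofthepost  (what the crawler actually requests)
print(parts.fragment)  # How-Much-do-they-cost  (never part of the HTTP request)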
2
votes
2
answers
327
views
Generate a video sitemap for Next.js
I'm currently working on a project where I can successfully generate a sitemap. Among the various sitemaps I've created, one is specifically for "videos". After conducting research, I ...
0
votes
1
answer
93
views
Why is a robots.txt file that is supposed to block a subfolder also blocking some random files?
I was getting some strange URLs indexed for my site, formed by treating files as folders. One sample URL is here:
https://www.plus2net.com/python/tkinter-scale.php/math.php
I have a file tkinter-scale.php but ...
1
vote
0
answers
972
views
Block GeedoProductSearch using robots.txt
I have robots.txt blocking the user agent GeedoProductSearch, but when I check the logs I still see Geedo accessing my page:
# START YOAST BLOCK
# ---------------------------
User-agent: *
Crawl-delay: 100000
...
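Since robots.txt compliance is voluntary and evaluated per user-agent group, a dedicated group is more likely to work than a Crawl-delay under the catch-all, assuming Geedo honors robots.txt at all; a sketch:

User-agent: GeedoProductSearch
Disallow: /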
0
votes
1
answer
314
views
Sitemap URL not defined in robots.txt Websiteplanet
This is what my robots.txt looks like:
User-Agent: *
Disallow: /info/privacy
Disallow: /info/cookies
Allow: /
Allow : /search/Jaén
Allow : /search/Tarragona
Allow : /search/Rioja
Sitemap: https://www....
0
votes
1
answer
97
views
Protect Laravel website from HTTrack
I have added the below rules to my Laravel website's https://example.com/robots.txt:
User-agent: HTTrack
Disallow: /
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
User-agent: *
Disallow:
But ...
3
votes
1
answer
3k
views
How to disallow robots in .htaccess and robots.txt?
I tried to disallow Amazonbot to my website, and I tried to use robots.txt by adding these lines:
User-agent: Amazonbot
Disallow: /
After several hours I noticed this robot did not follow robots.txt ...
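When a bot ignores robots.txt, .htaccess can refuse the requests at the server instead; a minimal Apache sketch, assuming mod_rewrite is enabled:

RewriteEngine On
# Return 403 Forbidden to any request whose User-Agent contains "Amazonbot"
RewriteCond %{HTTP_USER_AGENT} Amazonbot [NC]
RewriteRule .* - [F,L]

A bot that spoofs its User-Agent header will still get through, so this is containment rather than a guarantee.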
1
vote
0
answers
179
views
Do LLM crawlers respect the robots meta tag?
It is currently possible to use robots.txt to disallow Large Language Model crawlers via user-agent strings:
User-agent: GPTBot
Disallow: /
But this approach is very broad and while it works for site ...
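For comparison, the page-level equivalents look like this; Google documents per-crawler meta names such as googlebot, but whether LLM crawlers honor a robots meta tag is exactly the open question here, so this is a sketch rather than a guarantee:

<meta name="robots" content="noindex">
<meta name="googlebot" content="noindex">

No equivalent per-crawler meta name appears to be documented for GPTBot; its published controls are robots.txt based.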
0
votes
1
answer
2k
views
Allowing all access to robots.txt in nginx?
I have a website with what I think is an uncomplicated setup. In my main location / {...} block, I have a bunch of deny directives for specific IP addresses that seem to be malicious in some way. I ...
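In nginx, an exact-match location takes precedence over the prefix location / where the deny directives live, so a minimal sketch is:

location = /robots.txt {
    # Exact match wins over "location /", so the IP deny list there never applies here
    allow all;
}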
0
votes
0
answers
529
views
Google Search Console - Excluded by noindex tag
I have a WordPress site with pages that Google Search Console (GSC) reports as "Excluded by ‘noindex’ tag"
The robots.txt file accessed by the Yoast plugin is as follows though:
User-Agent: *...
-1
votes
1
answer
40
views
What does '.action' mean in a robots.txt file?
I'm looking at a robots.txt file that includes a line Disallow: /path/p.action. I would understand if it said /path/p*, forbidding access to any page that begins with p. But what does .action refer to?...
0
votes
1
answer
152
views
WP Site's Live Robots.txt Differs from local Robots.txt accessed by SFTP
I have a WordPress site, hosted on WPEngine, that serves as a CMS for our website through an endpoint.
On the WordPress site, I have installed the YoastSEO plugin, and have edited the robots.txt file to ...
0
votes
0
answers
112
views
Check URL in robots.txt
I need to check if a certain URL in the robots.txt file is available for crawling by a certain agent.
I'm using
import urllib.robotparser
rp = urllib.robotparser.RobotFileParser("https://rus-...
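The truncated snippet presumably continues along these lines; the domain is cut off above, so the URL and agent below are stand-ins:

rp.read()  # fetch and parse the robots.txt
# can_fetch() returns True if the given agent may crawl the given URL
print(rp.can_fetch("SomeBot", "https://example.com/some/page"))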
0
votes
1
answer
49
views
How to disallow nested folders with robots.txt?
In my robots.txt I have a disallowing rule like:
User-agent: *
Disallow: /_hcms/
I want to disallow anything placed inside of _hcms and in its nested folders, like /_hcms/a/ or /_hcms/b/.
Should I ...
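robots.txt rules are prefix matches, so a single directory rule already covers every nested path; no wildcards are needed:

User-agent: *
Disallow: /_hcms/
# also blocks /_hcms/a/, /_hcms/b/, /_hcms/a/deep/file.html, etc.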
2
votes
1
answer
2k
views
How to implement robots.txt in nuxt.js
I have looked at different documentation and it is still not clear to me if robots.txt is put in the root of the server or in the root of the code itself.
And I would like to know how it is ...
2
votes
3
answers
1k
views
Getting Unauthorized 401 for my Next.js 13 app on Search Console
I built out a Next.js 13 web app using Clerk for auth. The issue is that the homepage of the web app isn't discoverable on Google. When I test the live URL, this is how it looks: ...
0
votes
1
answer
2k
views
Disable web crawling for a Next.js app via robots.txt for a particular sub-domain only
I have my website deployed on Vercel. The site is a Next.js application (not using nginx or any other web server) deployed directly on Vercel. There are two domains assigned to the same ...
1
vote
1
answer
958
views
How do I scrape my correct submissions on LeetCode?
I am looking into how I can scrape my correct LeetCode submissions and upload them to Github. Being a beginner to web scraping, I read through a few blogs and I understand that we can use python ...
0
votes
2
answers
1k
views
Unable to find robots.txt file
In a Vue (3.4) app, I have created a robots.txt file at the root of my folder. I have deployed my website with the robots.txt file and I am unable to find it when typing the URL https://www.example....
1
vote
1
answer
423
views
Can you rename robots.txt and favicon?
I want the following names in my server like this:
(so all server setup and crawler files start with a . and show up first in the file list, with my webpage files after them.)
....
0
votes
3
answers
1k
views
Does disallowing /wp-includes and /wp-content/plugins (cache and themes) affect my SEO score?
I want a clean SEO analysis for my website, and I want to be sure that disallowing some WP folders doesn't ...
0
votes
2
answers
62
views
How to program the serving of two pages, accessible under the same URL, with only one of them getting indexed
Let's say you have a website hosted on example.org. That website has a single page, whose content is static if the requesting client is not logged in, but dynamic (according to the logged-in client) if ...
0
votes
0
answers
266
views
How to stop the admin from being crawled with Strapi and Nuxt
I have a project created with Strapi and Nuxt. When I try to block admin.mywebsite.com from being crawled in the live version with robots.txt, it doesn't work. I tried to disallow the /...
0
votes
0
answers
71
views
Why is my WordPress website not getting indexed on Google Search Console despite using Yoast SEO and manually creating a sitemap and robots.txt?
I have added this website to my Google Search Console but it's not getting indexed. It's a WordPress website and I'm using Yoast SEO, but since that didn't work out, I had to create the sitemap manually and ...
0
votes
0
answers
39
views
Is a Firebase domain.web.app domain ignored by Google robots? [duplicate]
I have 2 SPAs in React and 1 static website deployed on Firebase.
I typed the site address domain.web.app in the browser and my site didn't appear.
Do I have to add a custom domain to fix this?
One of my SPAs was ...
1
vote
2
answers
2k
views
Using robots.txt to exclude one specific user-agent and allowing all others?
It sounds like a simple question: exclude the Wayback Machine crawler (ia_archiver) and allow all other user agents.
So I set up the robots.txt as follows:
User-agent: *
Sitemap: https://www.example....
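Crawlers pick the single most specific User-agent group that matches them, so the usual layout keeps the two groups separate:

User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:

ia_archiver reads only its own group; every other crawler falls through to the empty Disallow, which permits everything.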
1
vote
1
answer
441
views
Can a robots.txt disallow use an asterisk for product id wildcard?
Is the following valid in my robots.txt file?
Disallow: /*?action=addwishlist&product_id=*
rather than writing individually for every product like below:
Disallow: /*?action=addwishlist&...
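Yes, the wildcard form is valid for major crawlers; and since matching is prefix-based, the trailing asterisk is redundant, so this shorter rule already covers every product id:

User-agent: *
Disallow: /*?action=addwishlist&product_id=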
1
vote
1
answer
394
views
Using X-Robots-Tag in the .htaccess file to deindex query string URLs from Google
I am looking for a solution to deindex all the URLs with the query string ?te= from Google. For example, I want to deindex all the URLs https://example.com/?te= from Google.
Google has currently indexed ...
2
votes
2
answers
894
views
Google Search Console live indexing failed with server error (5xx)
My website was being crawled by Google perfectly until this last February, 2023. The website didn't have any robots.txt until now. Suddenly the Page Indexing live test is failing due to this error:
Failed: ...
2
votes
0
answers
558
views
Multi-Page React App - Robot.txt is not valid
I have a multi-page React app which contains a robot.txt. After deploying the site I have noticed an error that wasn't present during development which is hindering the SEO. The error states that the ...
0
votes
1
answer
362
views
FTP domain indexed by google
We are facing a very strange issue where Google is indexing our FTP domain ftp.example.com.
We do not have that as a subdomain and there is no root folder or any other files for it.
So I am not sure ...
2
votes
1
answer
1k
views
Robots.txt - blocking bots from adding to cart in WooCommerce
I'm not sure how good Google's robots.txt tester is, and I'm wondering if the following example of my robots.txt for my WooCommerce site will actually do the trick for blocking bots from adding to cart and ...
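For reference, a commonly seen sketch for stores on default WooCommerce permalinks; the exact paths depend on the store's settings, so treat them as assumptions:

User-agent: *
Disallow: /*add-to-cart=*
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/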
-1
votes
2
answers
2k
views
Add robots.txt for Laravel 9+
I want to add robots.txt to my Laravel project, but the robots.txt packages I found are not compatible with Laravel 9+, so if you know of a tutorial or package for the latest version of Laravel, please ...
1
vote
0
answers
96
views
How to make robots.txt and sitemap.xml accessible only to search engine bots
Several websites like Quora and Stack Exchange, including Stack Overflow (https://stackoverflow.com/sitemap.xml), only allow access through search engine crawlers (Google, Yahoo, Bing, etc.).
How can I do ...
-3
votes
1
answer
590
views
Robots.txt file and Googlebot crawlability
Will this robots.txt allow Googlebot to crawl my site or not?
Disallow: /
User-agent: Robozilla
Disallow: /
User-agent: *
Disallow:
Disallow: /cgi-bin/
Sitemap: https://koyal.pk/sitemap/sitemap.xml
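One detail worth noting: the leading Disallow: / appears before any User-agent line, so it belongs to no group and strict parsers ignore it. Assuming that line is unintentional, the cleaned-up file reads:

User-agent: Robozilla
Disallow: /

User-agent: *
Disallow: /cgi-bin/

Sitemap: https://koyal.pk/sitemap/sitemap.xml

Under that reading, Googlebot matches the * group and can crawl everything except /cgi-bin/.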
0
votes
0
answers
297
views
How do I disallow URLs for specific countries in a robots.txt file?
I have a multi-site in WordPress and have a setup like this:
domain.com (main version for Australia)
domain.com/us
domain.com/eu
...
I want that, if Australian users search on Google, it should only show ...
0
votes
1
answer
1k
views
Disallow URLs with query params in Robots.txt
My site was hacked and Google crawled some weird URLs. For example:
www.tuppleapps.com/?andsd123
www.tuppleapps.com/?itlq7433
www.tuppleapps.com/?copz656
I want to disallow these URLs with query params ...
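Since the junk URLs share the pattern of a query string on the root, a minimal sketch is the following; note it blocks crawling of any URL with a query string, which may be broader than intended:

User-agent: *
Disallow: /*?

Crawling and indexing are separate concerns, though: URLs Google has already indexed may also need a noindex or a removal request to drop out of results.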