1 vote
1 answer
45 views

Show or hide app-root depending on ngIf in Angular 12+

I am trying to show/hide a development page from most people. I changed the index.html like: <!DOCTYPE html> <html lang="en"> <head> </head> <body> <app-...
Tito • 794
0 votes
1 answer
45 views

How do I write robots.txt with specific parameters?

I want Google to ignore (disallow) URLs like this: (Disallow) http://example.com/*/?page=2 But if the URL contains "article", I don't want to exclude it; (allow) URLs like this: (Allow) ...
ishiodori
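A minimal sketch of one way to express this, assuming a crawler with Google-style wildcard support where the longest (most specific) matching rule wins; the patterns are illustrative, not taken from the asker's real site:

    User-agent: *
    # Block any URL with a page= query parameter
    Disallow: /*?page=
    # The longer match wins for Google-style parsers, so paginated
    # URLs that contain "article" stay crawlable
    Allow: /*article*?page=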
0 votes
0 answers
33 views

How to test a robots.txt that disallows URLs containing a keyword in an ASP.NET MVC web app

I have an ASP.NET MVC web application. I added a robots.txt to disallow URLs that contain the key "somewords". Please advise how I can test that a crawler is not allowed to access the page in ...
user3497702
1 vote
0 answers
30 views

The content of the public folder in Nuxt3 is modified but the change doesn't take effect?

Please help. I developed a website using Nuxt3. I found that after running it with Node.js, I modified the content of the robots.txt file under my public folder, but refreshing my site did not ...
蜀云泉
0 votes
0 answers
17 views

How to implement noindex tags at scale in Magento (particularly with wildcards)

Good afternoon, I am wondering if there is a way to implement noindex tags at scale in Magento, much like you would in robots.txt with wildcard selectors. Example: example.com/catalogsearch/* Whereby ...
Jack Holliday
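Magento aside, one generic way to get wildcard-style noindex coverage is an X-Robots-Tag response header set at the web server. A minimal sketch assuming Apache with mod_headers enabled, placed in the vhost configuration; the /catalogsearch/ path comes from the question:

    # Send a noindex header for every URL under /catalogsearch/
    <LocationMatch "^/catalogsearch/">
        Header set X-Robots-Tag "noindex, follow"
    </LocationMatch>

Unlike a robots.txt Disallow, this lets crawlers fetch the pages but tells them not to index them, which is what noindex requires.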
0 votes
0 answers
22 views

Issues with Googlebot crawling and negative number pages

I am dealing with very heavy load from Googlebot. I have hundreds of sites running on multiple IPs, all on the same server, and Googlebot is constantly crawling these sites, causing a lot of availability ...
Julien • 2,309
-1 votes
1 answer
154 views

Query parameter causing errors in Semrush SEO

My requirement was to pre-fill one specific field of a webform using a query parameter, and I have done it, but due to the query parameter Semrush (SEO) is claiming hundreds of pages as duplicates, for example some ...
syed1234 • 835
0 votes
0 answers
60 views

How to solve "Alternate page with proper canonical tag" errors in Search Console related to query parameters?

I have found multiple "Alternate page with proper canonical tag" errors in Search Console. How can I disallow those query parameters in the robots.txt file? Error on these pages: https://example.com/es/ads/...
Avjot Thakur
-1 votes
1 answer
43 views

robots.txt works locally but not on the UAT server

I am trying to access the robots.txt on my UAT server. I am able to access the robots.txt locally at localhost:9001/food/robots.txt. However, I am not able to access it when it is on my UAT server with ...
JJJJJJ • 63
0 votes
0 answers
122 views

Is automatic scraping forbidden on EURLEX?

Is it possible that the EU has removed the possibility to scrape the EURLEX website? For a university course I am required to scrape the following website for all the listed documents in an html ...
Jonas Bäumer
1 vote
1 answer
310 views

Something is blocking Ahrefs bot on my WordPress site

I have a client with a WordPress site and I'm working with her to improve it overall. We've redesigned it, revamped content, relaunched it and we're getting more traffic. However, Ahrefs cannot load ...
MaseBase • 820
0 votes
1 answer
236 views

Why is my astrojs website blocked from indexing?

I'm hosting an astrojs website on Netlify. I keep getting a low Lighthouse score for SEO, with the reason being that the page is blocked from indexing. The image is attached below. I'm following the ...
Alinferno
1 vote
1 answer
64 views

Can I write a robots.txt rule to forbid crawling URLs with an anchor part (using the hash character, #)?

I have a table of contents plugin on my site which created some URLs with # in them, like https://example.com/#how_to_do_something, which links to that header's part in the content. It's like it ...
Kimia Houshidari
2 votes
1 answer
160 views

How to disallow specific fragment URLs using robots.txt?

How do you disallow dynamic URLs that have the following path in the robots.txt file? https://example.com/blogs/news/nameofthepost#How-Much-do-they-cost The posts do have a canonical set, but that didn't ...
user1683449
2 votes
2 answers
327 views

Generate a video sitemap for Next.js

I'm currently working on a project where I can successfully generate a sitemap. Among the various sitemaps I've created, one is specifically for "videos". After conducting research, I ...
Lhony • 98
0 votes
1 answer
93 views

Why is a robots.txt rule that is supposed to block a subfolder also blocking some random files?

I was getting some strange URLs indexed for my site that append files as if they were folders. One sample URL: https://www.plus2net.com/python/tkinter-scale.php/math.php I have a file tkinter-scale.php, but ...
Mamata Mohapatra
1 vote
0 answers
972 views

Block GeedoProductSearch using robots.txt

I have robots.txt blocking the user agent GeedoProductSearch, but when I check my logs I still see Geedo accessing my pages. # START YOAST BLOCK # --------------------------- User-agent: * Crawl-delay: 100000 ...
kasper • 301
0 votes
1 answer
314 views

Sitemap URL not defined in robots.txt (Websiteplanet)

This is what my robots.txt looks like: User-Agent: * Disallow: /info/privacy Disallow: /info/cookies Allow: / Allow : /search/Jaén Allow : /search/Tarragona Allow : /search/Rioja Sitemap: https://www....
Millard Ziadie Dos Santos Leca
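One detail worth noting in the quoted file: the "Allow :" lines have a space before the colon, which is not the standard "Field: value" form, so some validators may not recognize those lines or anything grouped after them. A normalized sketch (the sitemap URL is truncated in the question, so example.com stands in):

    User-agent: *
    Disallow: /info/privacy
    Disallow: /info/cookies
    Allow: /
    Allow: /search/Jaén
    Allow: /search/Tarragona
    Allow: /search/Rioja

    Sitemap: https://www.example.com/sitemap.xml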
0 votes
1 answer
97 views

Protect Laravel website from HTTrack

I have added the rules below in my Laravel website's https://example.com/robots.txt: User-agent: HTTrack Disallow: / User-agent: Googlebot Allow: / User-agent: Bingbot Allow: / User-agent: * Disallow: But ...
Loveyatri
3 votes
1 answer
3k views

How to disallow robots in .htaccess and robots.txt?

I tried to disallow Amazonbot from my website, and I tried to use robots.txt by adding these lines: User-agent: Amazonbot Disallow: / After several hours I noticed this robot did not follow robots.txt ...
Sallar Rabiei
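Since robots.txt is purely advisory, a bot that ignores it has to be blocked server-side. A minimal .htaccess sketch, assuming Apache with mod_rewrite and that Amazonbot identifies itself in the User-Agent header:

    RewriteEngine On
    # Return 403 Forbidden to any client whose User-Agent
    # contains "Amazonbot" (case-insensitive)
    RewriteCond %{HTTP_USER_AGENT} Amazonbot [NC]
    RewriteRule .* - [F,L]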
1 vote
0 answers
179 views

Do LLM crawlers respect the robots meta tag?

It is currently possible to use robots.txt to disallow Large Language Model crawlers via user-agent strings: User-agent: GPTBot Disallow: / But this approach is very broad and while it works for site ...
MediaFormat
0 votes
1 answer
2k views

Allowing all access to robots.txt in nginx?

I have a website with what I think is an uncomplicated setup. In my main location / {...} block, I have a bunch of deny directives for specific IP addresses that seem to be malicious in some way. I ...
user9219182
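One common shape for this: give robots.txt its own exact-match location, which nginx prefers over the prefix location / block, so the deny directives there never apply to it. A sketch, assuming the denies live only inside location /:

    location = /robots.txt {
        # Exact match beats the "location /" prefix block,
        # so its IP deny rules are bypassed for this one file
        allow all;
        try_files $uri =404;
    }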
0 votes
0 answers
529 views

Google Search Console - Excluded by noindex tag

I have a WordPress site with pages that Google Search Console (GSC) reports as "Excluded by ‘noindex’ tag" The robots.txt file accessed by the Yoast plugin is as follows though: User-Agent: *...
Dr Madvibe
-1 votes
1 answer
40 views

What does '.action' mean in a robots.txt file?

I'm looking at a robots.txt file that includes the line Disallow: /path/p.action. I would understand if it said /path/p* as forbidding access to any page that begins with p. But what does action refer to?...
snapcrack • 1,791
0 votes
1 answer
152 views

WP site's live robots.txt differs from local robots.txt accessed via SFTP

I have a WordPress site, hosted on WPEngine, that serves as a CMS for our website through an endpoint. On the WordPress site I have installed the Yoast SEO plugin and have edited the robots.txt file to ...
Suchi • 1
0 votes
0 answers
112 views

Check URL in robots.txt

I need to check if a certain URL in the robots.txt file is available for crawling by a certain agent. I'm using import urllib.robotparser rp = urllib.robotparser.RobotFileParser("https://rus-...
Максим Акулов
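For reference, a runnable sketch of the urllib.robotparser flow the question starts (the real robots.txt address is truncated in the excerpt, so example.com is a stand-in):

    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser("https://example.com/robots.txt")
    rp.read()  # fetch and parse the file

    # can_fetch(useragent, url) -> True if that agent may crawl the URL
    print(rp.can_fetch("Googlebot", "https://example.com/some/page.html"))
    print(rp.can_fetch("*", "https://example.com/private/"))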
0 votes
1 answer
49 views

How to disallow nested folders with robots.txt?

In my robots.txt I have a disallowing rule like: User-agent: * Disallow: /_hcms/ I want to disallow anything placed inside of _hcms and in its nested folders, like /_hcms/a/ or /_hcms/b/. Should I ...
Evgeniy • 2,575
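Short answer, assuming a standards-conforming parser: a Disallow value is a path prefix, so /_hcms/ already covers every nested path and no extra rules are needed:

    User-agent: *
    # Prefix match: also blocks /_hcms/a/, /_hcms/b/, /_hcms/a/b/c.html, ...
    Disallow: /_hcms/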
2 votes
1 answer
2k views

How to implement robots.txt in nuxt.js

I have looked at different documentation and it is still not clear to me if robots.txt is put in the root of the server or in the root of the code itself. And I would like to know how it is ...
tomasraigal
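The usual answer for Nuxt: robots.txt is a static asset, so it goes in the directory Nuxt serves from the site root, not anywhere on the server by hand. A sketch of the layout, assuming Nuxt 3 (Nuxt 2 uses static/ instead of public/):

    my-app/
    ├── nuxt.config.ts
    └── public/
        └── robots.txt    # served at https://your-site.example/robots.txt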
2 votes
3 answers
1k views

Getting Unauthorised 401 for my nextjs 13 app on Search Console

I built out a Next.js 13 web app using Clerk for auth. The issue is that the homepage of the web app isn't discoverable on Google. When I test the live URL, this is how it looks: ...
goyashy • 33
0 votes
1 answer
2k views

Disable web crawling for Nextjs app from robots.txt for particular sub-domain only

I have my website deployed on Vercel. The site is a Next.js application (not using nginx or any other web server) deployed directly on Vercel. There are two domains assigned to the same ...
Anand Yadav
1 vote
1 answer
958 views

How do I scrape my correct submissions on LeetCode?

I am looking into how I can scrape my correct LeetCode submissions and upload them to GitHub. Being a beginner at web scraping, I read through a few blogs and I understand that we can use Python ...
Nagavel Rajasekaran
0 votes
2 answers
1k views

Unable to find robots.txt file

In a Vue (3.4) app, I have created a robots.txt file at the root of my folder. I have deployed my website with the robots.txt file and I am unable to find it when typing the URL https://www.example....
g4rf4z • 67
1 vote
1 answer
423 views

Can you rename robots.txt and favicon?

I want the following names in my server like this: (so all server setup and crawler stuff starts with a . to show up first in the list of files, and then my webpage files after in the list of files.) ....
Antarctica-UFO
0 votes
3 answers
1k views

Does disallowing /wp-includes and /wp-content/plugins (cache and themes) affect my SEO score?

Does disallowing /wp-includes and /wp-content/plugins (cache and themes) affect my SEO score? I want a clear SEO analysis for my website, and I want to be sure that disallowing some WP folders doesn't ...
odoojs • 21
0 votes
2 answers
62 views

How to program the serving of two pages, accessible under the same URL, with only one of them getting indexed

Let's say you have a website hosted on example.org. That website has a single page, whose content is static if the client requesting is not logged in, but dynamic (according to the logged client) if ...
DevelJoe • 1,262
0 votes
0 answers
266 views

How to stop the admin from being crawled with Strapi and Nuxt

I have a project created with Strapi and Nuxt. When I try to block the link admin.mywebsite.com from being crawled in the live version with the robots.txt, it doesn't work. I tried to disallow the /...
maipo89
0 votes
0 answers
71 views

Why is my WordPress website not getting indexed on Google Search Console despite using Yoast SEO and manually creating a sitemap and robots.txt?

I have added this website in my Google Search Console but it's not getting indexed. It's a WordPress website and I'm using Yoast SEO, but since that didn't work out, I had to create the sitemap manually and ...
pkguy • 13
0 votes
0 answers
39 views

Is a Firebase domain.web.app ignored by Google robots? [duplicate]

I have 2 SPAs in React and 1 static website deployed on Firebase. I typed the site domain.web.app in the browser and my site didn't appear. Do I have to add a custom domain to fix this? One of my SPAs was ...
NewHorse
1 vote
2 answers
2k views

Using robots.txt to exclude one specific user-agent and allowing all others?

It sounds like a simple question. Exclude the Wayback Machine crawler (ia_archiver) and allow all other user agents. So I set up the robots.txt as follows: User-agent: * Sitemap: https://www.example....
Avatar • 15.1k
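A sketch of the standard shape for this: crawlers obey the group whose User-agent line matches them most specifically, so ia_archiver needs its own group while the catch-all stays open (the sitemap URL is truncated in the question, so example.com stands in):

    # Wayback Machine crawler only
    User-agent: ia_archiver
    Disallow: /

    # Everyone else: no restrictions
    User-agent: *
    Disallow:

    Sitemap: https://www.example.com/sitemap.xml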
1 vote
1 answer
441 views

Can a robots.txt disallow use an asterisk for product id wildcard?

Is the following valid in my robots.txt file? Disallow: /*?action=addwishlist&product_id=* rather than writing individually for every product like below: Disallow: /*?action=addwishlist&...
Jimil • 650
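Assuming a crawler with Google-style wildcard support: yes, and the trailing asterisk is even redundant, because rules are prefix matches. One pattern covers every product ID:

    User-agent: *
    # Prefix match: product_id=1, product_id=42, ... are all covered
    Disallow: /*?action=addwishlist&product_id=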
1 vote
1 answer
394 views

Using X-Robots-Tag in the .htaccess file to deindex query-string URLs from Google

I am looking for a solution to deindex all the URLs with the query string ?te= from Google. For example, I want to deindex all the URLs https://example.com/?te= from Google. Google has currently indexed ...
mehtab fatima
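A minimal sketch of the X-Robots-Tag approach, assuming Apache 2.4+ with mod_headers (the te parameter is the one named in the question). Note that the URLs must stay crawlable in robots.txt, otherwise Google never sees the header:

    # Mark any response whose query string contains a te= parameter as noindex
    <If "%{QUERY_STRING} =~ /(^|&)te=/">
        Header set X-Robots-Tag "noindex"
    </If>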
2 votes
2 answers
894 views

Google Search Console live indexing failed with server error (5xx)

My website was being crawled by Google perfectly until this last February, 2023. The website didn't have any robots.txt until now. Suddenly the page indexing live test is failing due to this error: Failed: ...
Smith Dwayne • 2,777
2 votes
0 answers
558 views

Multi-page React app - robots.txt is not valid

I have a multi-page React app which contains a robots.txt. After deploying the site I noticed an error that wasn't present during development and which is hindering the SEO. The error states that the ...
Jeremiah • 101
0 votes
1 answer
362 views

FTP domain indexed by google

We are facing a very strange issue where Google is indexing our FTP domain ftp.example.com. We do not have that as a subdomain and there is no root folder or any other files for it. So I am not sure ...
Parth Parikh
2 votes
1 answer
1k views

Robots.txt - blocking bots from adding to cart in WooCommerce

I'm not sure how good Google's robots.txt tester is, and I'm wondering if the following example of my robots.txt for my WooCommerce site will actually do the trick for blocking bots from adding to cart and ...
dubpir8 • 21
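For reference, the commonly suggested rule for this, assuming WooCommerce's standard add-to-cart query parameter and a wildcard-aware crawler:

    User-agent: *
    # Matches any URL carrying the add-to-cart parameter,
    # e.g. /?add-to-cart=123 or /shop/?add-to-cart=45
    Disallow: /*add-to-cart=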
-1 votes
2 answers
2k views

Add robots.txt for Laravel 9+

I want to add robots.txt to my Laravel project, but the robots.txt packages I found are not compatible with Laravel 9+, so if you know of any tutorial or package for the latest version of Laravel, please ...
Leslie Joe
1 vote
0 answers
96 views

How to make robots.txt and sitemap.xml only accessible to search engine bots

Several websites like Quora and Stack Exchange, including Stack Overflow (https://stackoverflow.com/sitemap.xml), only allow access to search engine crawlers (Google, Yahoo, Bing, etc.). How can I do ...
Mehul Kumar
-3 votes
1 answer
590 views

Robots.txt file and Googlebot crawlability

Will this robots.txt allow Googlebot to crawl my site or not? Disallow: / User-agent: Robozilla Disallow: / User-agent: * Disallow: Disallow: /cgi-bin/ Sitemap: https://koyal.pk/sitemap/sitemap.xml
Stream Koyal
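A hedged reading of the quoted file, annotated: conforming parsers typically ignore a Disallow with no preceding User-agent line, so Googlebot falls into the * group and may crawl everything except /cgi-bin/:

    Disallow: /             # no User-agent above it: orphaned, typically ignored

    User-agent: Robozilla
    Disallow: /             # Robozilla is blocked entirely

    User-agent: *
    Disallow:               # empty value: allow everything...
    Disallow: /cgi-bin/     # ...except /cgi-bin/

    Sitemap: https://koyal.pk/sitemap/sitemap.xml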
0 votes
0 answers
297 views

How do I disallow URLs for specific countries in a robots.txt file?

I have a multi-site setup in WordPress like this: domain.com (main version for Australia), domain.com/us, domain.com/eu ... I want that when Australian users search on Google, it should only show ...
Adeel Arshad
0 votes
1 answer
1k views

Disallow URLs with query params in Robots.txt

My site was hacked and Google crawled some weird URLs, e.g. www.tuppleapps.com/?andsd123 www.tuppleapps.com/?itlq7433 www.tuppleapps.com/?copz656 I want to disallow these URLs with query params ...
Suraj • 586
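Since the junk URLs are all the homepage plus an arbitrary query string, a prefix rule may be enough; a sketch assuming that shape:

    User-agent: *
    # Blocks /?andsd123, /?itlq7433, etc., but not the plain homepage /
    Disallow: /?

    # Broader alternative (wildcard-aware crawlers): any URL with a query string
    # Disallow: /*?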
