115

Getting the subdomain from a URL sounds easy at first.

http://www.domain.example

Scan for the first period then return whatever came after the "http://" ...

Then you remember

http://super.duper.domain.example

Oh. So then you think, okay, find the last period, go back a word and get everything before!

Then you remember

http://super.duper.domain.co.uk

And you're back to square one. Anyone have any great ideas besides storing a list of all TLDs?

7
  • 1
    This question has already been asked here: Getting Parts of a URL. Edit: A similar question has been asked here :)
    – jb.
    Commented Nov 14, 2008 at 0:02
  • 1
    Can you clarify what you want? It seems that you're after the "official" domain part of the URL (i.e. domain.co.uk), regardless of how many DNS labels appear before it?
    – Alnitak
    Commented Nov 14, 2008 at 0:15
  • 1
    I don't think it's the same question - this seems to be more about the administrative cuts in the domain name which can't be worked out just by looking at the string
    – Alnitak
    Commented Nov 14, 2008 at 0:21
  • 1
    I agree. Expand more on what your end goal is.
    – BuddyJoe
    Commented Nov 14, 2008 at 3:41
  • 1
    See this answer : stackoverflow.com/a/39307593/530553 Commented Sep 3, 2016 at 14:49

18 Answers

78

Anyone have any great ideas besides storing a list of all TLDs?

No, because each TLD differs on what counts as a subdomain, second level domain, etc.

Keep in mind that there are top level domains, second level domains, and subdomains. Technically speaking, everything except the TLD is a subdomain.

In the domain.co.uk example, "domain" is a subdomain, "co" is a second-level domain, and "uk" is the TLD.

So the question remains more complex than at first blush, and it depends on how each TLD is managed. You'll need a database of all the TLDs that include their particular partitioning, and what counts as a second level domain and a subdomain. There aren't too many TLDs, though, so the list is reasonably manageable, but collecting all that information isn't trivial. There may already be such a list available.

Looks like http://publicsuffix.org/ is one such list—all the common suffixes (.com, .co.uk, etc) in a list suitable for searching. It still won't be easy to parse it, but at least you don't have to maintain the list.

A "public suffix" is one under which Internet users can directly register names. Some examples of public suffixes are ".com", ".co.uk" and "pvt.k12.wy.us". The Public Suffix List is a list of all known public suffixes.

The Public Suffix List is an initiative of the Mozilla Foundation. It is available for use in any software, but was originally created to meet the needs of browser manufacturers. It allows browsers to, for example:

  • Avoid privacy-damaging "supercookies" being set for high-level domain name suffixes
  • Highlight the most important part of a domain name in the user interface
  • Accurately sort history entries by site

Looking through the list, you can see it's not a trivial problem. I think a list is the only correct way to accomplish this...
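
For example, here is a quick Python sketch using the third-party tldextract package (not mentioned in the answer above; it bundles a snapshot of the Public Suffix List and can refresh it), so you never have to maintain the list yourself:

# pip install tldextract   (third-party package built on the Public Suffix List)
import tldextract

ext = tldextract.extract("http://super.duper.domain.co.uk")
print(ext.subdomain)           # 'super.duper'
print(ext.domain)              # 'domain'
print(ext.suffix)              # 'co.uk'
print(ext.registered_domain)   # 'domain.co.uk'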

6
  • Mozilla has code that uses this service. The project was spun off because the original cookie spec had linked TLDs to trust in cookies, but never worked. The "Cookie Monster" bug was the first problem, and the architecture was never fixed or replaced.
    – benc
    Commented Nov 25, 2008 at 7:23
  • The preferred language to solve this in isn't listed, but there is an opensource project that uses this list in C# code here: code.google.com/p/domainname-parser Commented May 18, 2009 at 5:27
  • 1
    Whether a domain is a "public suffix" or not should really be made available via the DNS protocol itself, perhaps via an EDNS flag. In that case the owner can set it, and there is no need to maintain a separate list. Commented Sep 21, 2013 at 21:01
  • @PieterEnnes EDNS is for "transport related" flags, and can't be used for content-related metadata. I do agree that this information would be best placed in the DNS itself. ISTR there's plans for a "BoF session" at the upcoming IETF in Vancouver to discuss this.
    – Alnitak
    Commented Oct 1, 2013 at 15:00
  • Thanks (+1) for link to http://publicsuffix.org, I've posted some shell and bash function based on your answer: stackoverflow.com/a/63761712/1765658 Commented Sep 7, 2020 at 17:14
25

As Adam says, it's not easy, and currently the only practical way is to use a list.

Even then there are exceptions - for example in .uk there are a handful of domains that are valid immediately at that level that aren't in .co.uk, so those have to be added as exceptions.

This is currently how mainstream browsers do this - it's necessary to ensure that example.co.uk can't set a Cookie for .co.uk which would then be sent to any other website under .co.uk.

The good news is that there's already a list available at http://publicsuffix.org/.

There's also some work in the IETF to create some sort of standard to allow TLDs to declare what their domain structure looks like. This is slightly complicated though by the likes of .uk.com, which is operated as if it were a public suffix, but isn't sold by the .com registry.
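
As a rough illustration of the cookie check described above, here is a Python sketch; the tldextract package and the function name are my own choices, not part of this answer, and a real browser does considerably more:

# pip install tldextract  -- uses the Public Suffix List under the hood
import tldextract

def cookie_domain_allowed(request_host, cookie_domain):
    cookie_domain = cookie_domain.lstrip(".")
    ext = tldextract.extract(cookie_domain)
    if not ext.domain:            # the cookie domain is itself a public suffix, e.g. "co.uk"
        return False
    # the cookie domain must also cover the host that set it
    return request_host == cookie_domain or request_host.endswith("." + cookie_domain)

print(cookie_domain_allowed("example.co.uk", ".co.uk"))             # False: would be a "supercookie"
print(cookie_domain_allowed("www.example.co.uk", "example.co.uk"))  # True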

3
  • 1
    Eugh, the IETF should know better than to let their URLs die. The draft (last updated in Sept 2012) can now be reached here: tools.ietf.org/html/draft-pettersen-subtld-structure
    – IMSoP
    Commented Sep 30, 2013 at 22:00
  • The IETF working group on the subject (DBOUND) has been closed. Commented Nov 20, 2017 at 23:27
  • Note that since I wrote this the .uk domain registry now permits registrations directly at the second level. This is reflected accordingly in the PSL.
    – Alnitak
    Commented Sep 5, 2018 at 10:08
23

Publicsuffix.org seems to be the way to go. There are plenty of implementations out there that parse the contents of the publicsuffix data file easily.
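
If you would rather see what such an implementation does under the hood, here is a simplified Python sketch of the Public Suffix List matching rules (exception rules, wildcard rules, longest match). The function names are mine, and it ignores IDNA/punycode handling and other edge cases, so treat it as an illustration rather than a drop-in parser:

import urllib.request

PSL_URL = 'https://publicsuffix.org/list/public_suffix_list.dat'

def load_rules(url=PSL_URL):
    """Return the set of PSL rules, skipping comments and blank lines."""
    text = urllib.request.urlopen(url).read().decode('utf-8')
    return {line.strip() for line in text.splitlines()
            if line.strip() and not line.startswith('//')}

def split_domain(host, rules):
    """Return (subdomain, registrable_domain, public_suffix) for a hostname."""
    labels = host.lower().split('.')
    suffix_len = 1                      # default rule "*": an unknown TLD is the suffix
    for i in range(len(labels)):
        candidate = '.'.join(labels[i:])
        wildcard = '.'.join(['*'] + labels[i + 1:])
        if '!' + candidate in rules:    # exception rule: suffix is one label shorter
            suffix_len = len(labels) - i - 1
            break
        if candidate in rules or wildcard in rules:
            suffix_len = len(labels) - i
            break
    suffix = '.'.join(labels[-suffix_len:])
    if len(labels) <= suffix_len:       # the host itself is a public suffix
        return None, None, suffix
    domain = '.'.join(labels[-suffix_len - 1:])
    subdomain = '.'.join(labels[:-suffix_len - 1]) or None
    return subdomain, domain, suffix

rules = load_rules()
print(split_domain('super.duper.domain.co.uk', rules))
# ('super.duper', 'domain.co.uk', 'co.uk')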

4
  • 2
    But remember it is not just a matter of parsing! This list at Publicsuffix.org is an unofficial project, which is incomplete (eu.org is missing, for instance), does NOT automatically reflect the policies of TLDs, and may become unmaintained at any time.
    – bortzmeyer
    Commented Jun 9, 2009 at 7:32
  • Also, Ruby: github.com/weppos/public_suffix_service
    – fractious
    Commented Jun 6, 2011 at 15:30
  • 8
    The list at publicsuffix.org is not "unofficial" any more than anything else Mozilla does. Given that Mozilla, Opera and Chrome use it, it is unlikely to become unmaintained. As for being incomplete, any operator of a domain like eu.org can apply for inclusion if they want to, and they understand the consequences of doing so. If you want a domain added, get the owner to apply. Yes, it does not automatically reflect TLD policy, but then nothing does - there is no programmatic source of that information. Commented Aug 22, 2011 at 10:31
  • 1
    dagger/android: okhttp will give you topPrivateDomain Commented Jul 17, 2019 at 20:48
9

As already said by Adam and John, publicsuffix.org is the correct way to go. But if, for any reason, you cannot use this approach, here's a heuristic based on an assumption that works for 99% of all domains:

There is one property that distinguishes (not all, but nearly all) "real" domains from subdomains and TLDs and that's the DNS's MX record. You could create an algorithm that searches for this: Remove the parts of the hostname one by one and query the DNS until you find an MX record. Example:

super.duper.domain.co.uk => no MX record, proceed
duper.domain.co.uk       => no MX record, proceed
domain.co.uk             => MX record found! assume that's the domain

Here is an example in PHP:

function getDomainWithMX($url) {
    //parse hostname from URL 
    //http://www.example.co.uk/index.php => www.example.co.uk
    $urlParts = parse_url($url);
    if ($urlParts === false || empty($urlParts["host"])) 
        throw new InvalidArgumentException("Malformed URL");

    //find first partial name with MX record
    $hostnameParts = explode(".", $urlParts["host"]);
    do {
        $hostname = implode(".", $hostnameParts);
        if (checkdnsrr($hostname, "MX")) return $hostname;
    } while (array_shift($hostnameParts) !== null);

    throw new DomainException("No MX record found");
}
5
  • Is that what IETF is also suggesting here?
    – Ellie K
    Commented Nov 27, 2016 at 20:56
  • 1
    Even publicsuffix.org says (see sixth paragraph) that the proper way to do this is through the DNS, just like you said in your answer!
    – Ellie K
    Commented Nov 27, 2016 at 21:08
  • 1
    Except that you can completely have a domain without a MX record. And that the algorithm will be fooled by wildcard records. And on the opposite side you have TLDs that have MX records (like .ai or .ax to just name a few). Commented Apr 5, 2019 at 18:58
  • @patrick: I totally agree; like I said in the introduction this algorithm is not bullet-proof, it's just a heuristic that works surprisingly well. Commented Apr 8, 2019 at 15:27
  • The algorithm should return the shortest hostname having an MX record. There are domains that accept mail at subdomains. Typically mailing lists ([email protected]), but some large organizations also used to have separate servers for some departments.
    – Ale
    Commented May 21, 2021 at 16:56
2

For a C library (with data table generation in Python), I wrote http://code.google.com/p/domain-registry-provider/ which is both fast and space efficient.

The library uses ~30kB for the data tables and ~10kB for the C code. There is no startup overhead since the tables are constructed at compile time. See http://code.google.com/p/domain-registry-provider/wiki/DesignDoc for more details.

To better understand the table generation code (Python), start here: http://code.google.com/p/domain-registry-provider/source/browse/trunk/src/registry_tables_generator/registry_tables_generator.py

To better understand the C API, see: http://code.google.com/p/domain-registry-provider/source/browse/trunk/src/domain_registry/domain_registry.h

2
  • 1
    I also have a C/C++ library that has its own list although it is checked against the publicsuffix.org list as well. It's called the libtld and works under Unix and MS-Windows snapwebsites.org/project/libtld Commented Aug 24, 2013 at 3:30
  • There is an archived copy of DesignDoc. A simplified implementation following the same design (but not requiring Python) is here (it is in the form of a single test.c file)
    – Ale
    Commented May 25, 2021 at 11:30
2

As already said, the Public Suffix List is the only reliable way to parse a domain correctly. For PHP you can try TLDExtract. Here is sample code:

$extract = new LayerShifter\TLDExtract\Extract();

$result = $extract->parse('super.duper.domain.co.uk');
$result->getSubdomain(); // will return (string) 'super.duper'
$result->getSubdomains(); // will return (array) ['super', 'duper']
$result->getHostname(); // will return (string) 'domain'
$result->getSuffix(); // will return (string) 'co.uk'
1

Just wrote a program for this in clojure based on the info from publicsuffix.org:

https://github.com/isaksky/url_dom

For example:

(parse "sub1.sub2.domain.co.uk") 
;=> {:public-suffix "co.uk", :domain "domain.co.uk", :rule-used "*.uk"}
1

Shell and bash versions

In addition to Adam Davis's correct answer, I would like to post my own solution for this operation.

As the list is big, here are three of the many different solutions I tested...

First, prepare your TLD list this way:

wget -O - https://publicsuffix.org/list/public_suffix_list.dat |
    grep '^[^/]' |
    tac > tld-list.txt

Note: tac reverses the list so that .co.uk is tested before .uk.

shell version

splitDom() {
    local tld
    while read tld;do
        [ -z "${1##*.$tld}" ] &&
            printf "%s : %s\n" $tld ${1%.$tld} && return
    done <tld-list.txt
}

Tests:

splitDom super.duper.domain.co.uk
co.uk : super.duper.domain

splitDom super.duper.domain.com
com : super.duper.domain

bash version

In order to reduce forks (avoiding the myvar=$(function ...) syntax), I prefer to set variables instead of dumping output to stdout in bash functions:

tlds=($(<tld-list.txt))
splitDom() {
    local tld
    local -n result=${2:-domsplit}
    for tld in ${tlds[@]};do
        [ -z "${1##*.$tld}" ] &&
            result=($tld ${1%.$tld}) && return
    done
}

Then:

splitDom super.duper.domain.co.uk myvar
declare -p myvar
declare -a myvar=([0]="co.uk" [1]="super.duper.domain")

splitDom super.duper.domain.com
declare -p domsplit
declare -a domsplit=([0]="com" [1]="super.duper.domain")

Quicker version:

With same preparation, then:

declare -A TLDS='()'
while read tld ;do
    if [ "${tld##*.}" = "$tld" ];then
        TLDS[${tld##*.}]+="$tld"
      else
        TLDS[${tld##*.}]+="$tld|"
    fi
done <tld-list.txt

This preparation step is significantly slower, but the splitDom function becomes a lot quicker:

shopt -s extglob 
splitDom() {
    local domsub=${1%%.*(${TLDS[${1##*.}]%\|})}
    local -n result=${2:-domsplit}
    result=(${1#$domsub.} $domsub)
}

Tests on my Raspberry Pi:

Both scripts were tested with:

for dom in dom.sub.example.{,{co,adm,com}.}{com,ac,de,uk};do
    splitDom $dom myvar
    printf "%-40s %-12s %s\n" $dom ${myvar[@]}
done

The POSIX version was tested with a more detailed for loop, but all test scripts produce the same output:

dom.sub.example.com                      com          dom.sub.example
dom.sub.example.ac                       ac           dom.sub.example
dom.sub.example.de                       de           dom.sub.example
dom.sub.example.uk                       uk           dom.sub.example
dom.sub.example.co.com                   co.com       dom.sub.example
dom.sub.example.co.ac                    ac           dom.sub.example.co
dom.sub.example.co.de                    de           dom.sub.example.co
dom.sub.example.co.uk                    co.uk        dom.sub.example
dom.sub.example.adm.com                  com          dom.sub.example.adm
dom.sub.example.adm.ac                   ac           dom.sub.example.adm
dom.sub.example.adm.de                   de           dom.sub.example.adm
dom.sub.example.adm.uk                   uk           dom.sub.example.adm
dom.sub.example.com.com                  com          dom.sub.example.com
dom.sub.example.com.ac                   com.ac       dom.sub.example
dom.sub.example.com.de                   com.de       dom.sub.example
dom.sub.example.com.uk                   uk           dom.sub.example.com

The full script, containing the file read and the splitDom loop, takes ~2m with the POSIX version, ~1m29s with the first bash script based on the $tlds array, but ~22s with the last bash script based on the $TLDS associative array (times below are in seconds).

                Posix version     $tldS (array)      $TLDS (associative array)
File read   :       0.04164          0.55507           18.65262
Split loop  :     114.34360         88.33438            3.38366
Total       :     114.34360         88.88945           22.03628

So although populating the associative array is a heavier job up front, the splitDom function becomes a lot quicker!

0

This doesn't work it out exactly, but you could maybe get a useful answer by trying to fetch the domain piece by piece and checking the response, i.e., fetch 'http://uk', then 'http://co.uk', then 'http://domain.co.uk'. When you get a non-error response you've got the domain and the rest is subdomain.

Sometimes you just gotta try it :)

Edit:

Tom Leys points out in the comments that some domains are set up only on the www subdomain, which would give us an incorrect answer in the above test. Good point! Maybe the best approach would be to check each part with 'http://www' as well as 'http://', and count a hit to either as a hit for that section of the domain name? We'd still be missing some 'alternative' arrangements such as 'web.domain.com', but I haven't run into one of those for a while :)
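
A rough Python sketch of this probing idea (the requests package, the function name and the timeout are my own choices; as the comments below point out, it fails for domains with no web server, for TLDs like .dk that resolve directly, and for wildcard setups):

import requests

def guess_domain_by_probing(hostname):
    labels = hostname.split(".")
    # probe progressively longer suffixes: "uk", "co.uk", "domain.co.uk", ...
    for i in range(len(labels) - 1, -1, -1):
        candidate = ".".join(labels[i:])
        for probe in (candidate, "www." + candidate):
            try:
                requests.head("http://" + probe, timeout=3, allow_redirects=True)
                return candidate          # first piece that answers is assumed to be the domain
            except requests.RequestException:
                pass
    return None

print(guess_domain_by_probing("super.duper.domain.co.uk"))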

4
  • There is no guarantee that x.com points to a webserver at port 80 even if www.x.com does. www is a valid subdomain in this case. Perhaps an automated whois would help here.
    – Tom Leys
    Commented Nov 14, 2008 at 1:48
    Good point! A whois would clear it up, although maintaining a list of which whois servers to use for which TLD/2nd level would mean solving the same problem for edge cases.
    – jTresidder
    Commented Nov 14, 2008 at 2:24
  • you are assuming that every domain runs an HTTP server Commented Nov 11, 2013 at 7:58
  • Will not work for .DK and some others, as http://dk/ works as is. This kind of heuristic is not the way to go... Commented Aug 29, 2018 at 22:45
0

Use the URIBuilder, then get the URIBuilder.host attribute and split it into an array on ".". You now have an array with the domain split out.
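
The equivalent in Python (names here are mine; the answer above refers to Java's URIBuilder) makes the limitation obvious: you get labels, not the administrative boundary the question is asking about:

from urllib.parse import urlparse

host = urlparse("http://super.duper.domain.co.uk/path").hostname
print(host.split("."))   # ['super', 'duper', 'domain', 'co', 'uk']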

0
echo tld('http://www.example.co.uk/test?123'); // co.uk

/**
 * http://publicsuffix.org/
 * http://www.alandix.com/blog/code/public-suffix/
 * http://tobyinkster.co.uk/blog/2007/07/19/php-domain-class/
 */
function tld($url_or_domain = null)
{
    $domain = $url_or_domain ?: $_SERVER['HTTP_HOST'];
    preg_match('/^[a-z]+:\/\//i', $domain) and 
        $domain = parse_url($domain, PHP_URL_HOST);
    $domain = mb_strtolower($domain, 'UTF-8');
    if (strpos($domain, '.') === false) return null;

    $url = 'http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1';

    if (($rules = file($url)) !== false)
    {
        $rules = array_filter(array_map('trim', $rules));
        array_walk($rules, function($v, $k) use(&$rules) { 
            if (strpos($v, '//') !== false) unset($rules[$k]);
        });

        $segments = '';
        foreach (array_reverse(explode('.', $domain)) as $s)
        {
            $wildcard = rtrim('*.'.$segments, '.');
            $segments = rtrim($s.'.'.$segments, '.');

            if (in_array('!'.$segments, $rules))
            {
                $tld = substr($wildcard, 2);
                break;
            }
            elseif (in_array($wildcard, $rules) or 
                    in_array($segments, $rules))
            {
                $tld = $segments;
            }
        }

        if (isset($tld)) return $tld;
    }

    return false;
}
0
0

You can use the lib tld.js: a JavaScript API to work against complex domain names, subdomains and URIs.

tldjs.getDomain('mail.google.co.uk');
// -> 'google.co.uk'

If you need the root domain in the browser, you can use the lib AngusFu/browser-root-domain.

var KEY = '__rT_dM__' + (+new Date());
var R = new RegExp('(^|;)\\s*' + KEY + '=1');
var Y1970 = (new Date(0)).toUTCString();

module.exports = function getRootDomain() {
  var domain = document.domain || location.hostname;
  var list = domain.split('.');
  var len = list.length;
  var temp = '';
  var temp2 = '';

  while (len--) {
    temp = list.slice(len).join('.');
    temp2 = KEY + '=1;domain=.' + temp;

    // try to set cookie
    document.cookie = temp2;

    if (R.test(document.cookie)) {
      // clear
      document.cookie = temp2 + ';expires=' + Y1970;
      return temp;
    }
  }
};

Using cookies is tricky.

0

If you're looking to extract subdomains and/or domains from an arbitrary list of URLs, this Python script may be helpful. Be careful though, it's not perfect. This is a tricky problem to solve in general, and it's very helpful if you have a whitelist of domains you're expecting.

  1. Get top level domains from publicsuffix.org
import requests

url = 'https://publicsuffix.org/list/public_suffix_list.dat'
page = requests.get(url)

domains = []
for line in page.text.splitlines():
    if line.startswith('//'):
        continue
    else:
        domain = line.strip()
        if domain:
            domains.append(domain)

domains = [d[2:] if d.startswith('*.') else d for d in domains]
print('found {} domains'.format(len(domains)))
  2. Build regex
import re

# escape regex metacharacters (the dots) and join all suffixes into one alternation,
# avoiding a trailing '|' that would allow an empty match
_regex = '|'.join(re.escape(domain) for domain in domains)

subdomain_regex = r'/([^/]*)\.[^/.]+\.({})/.*$'.format(_regex)
domain_regex = r'([^/.]+\.({}))/.*$'.format(_regex)
  3. Use regex on list of URLs
FILE_NAME = ''   # put CSV file name here
URL_COLNAME = '' # put URL column name here

import pandas as pd

df = pd.read_csv(FILE_NAME)
urls = df[URL_COLNAME].astype(str) + '/' # note: adding / as a hack to help regex

df['sub_domain_extracted'] = urls.str.extract(pat=subdomain_regex, expand=True)[0]
df['domain_extracted'] = urls.str.extract(pat=domain_regex, expand=True)[0]

df.to_csv('extracted_domains.csv', index=False)
0

To accomplish this, I wrote a bash function which depends on publicsuffix.org data and a simple regex.

Install publicsuffix.org client on Ubuntu 18:

sudo apt install psl

Get the domain suffix (longest suffix):

domain=example.com.tr
output=$(psl --print-unreg-domain $domain)

output is:

example.com.tr: com.tr

The rest is simple bash. Extract the suffix (com.tr) from the domain and test whether what remains still has more than one dot.

# split output by colon
arr=(${output//:/ })
# remove the suffix (arr[1]) from the domain
name=${domain/${arr[1]}/}
# test: if what remains still contains two dots, it is a subdomain
if [[ $name =~ \..*\. ]]; then
  echo "Yes, it is a subdomain."
fi

Everything together in a bash function:

is_subdomain() {
  local output=$(psl --print-unreg-domain $1)
  local arr=(${output//:/ })
  local name=${1/${arr[1]}/}
  [[ $name =~ \..*\. ]]
}

Usage:

d=example.com.tr
if is_subdomain $d; then
  echo "Yes, it is."
fi
0
private String getSubDomain(Uri url) throws Exception {
    String host = url.getHost();          // e.g. "super.duper.domain.co.uk"
    String[] labels = host.split("\\.");  // split the host on "."
    return labels[0];                     // first label
}

The first index will always be the subdomain.

0

This snippet (using Guava's InternetDomainName, which relies on the Public Suffix List) returns the correct domain name.

InternetDomainName foo = InternetDomainName.from("foo.item.shopatdoor.co.uk").topPrivateDomain();
System.out.println(foo.topPrivateDomain());
-1

Keep a list of common suffixes (.co.uk, .com, et cetera) to strip out along with the http://, and then you'll only have "sub.domain" to work with instead of "http://sub.domain.suffix", or at least that's what I'd probably do.

The biggest problem is the list of possible suffixes. There's a lot, after all.

-3

Having taken a quick look at the publicsuffix.org list, it appears that you could make a reasonable approximation by removing the final three segments ("segment" here meaning a section between two dots) from domains where the final segment is two characters long, on the assumption that it's a country code and will be further subdivided. If the final segment is "us" and the second-to-last segment is also two characters, remove the last four segments. In all other cases, remove the final two segments. e.g.:

"example" is not two characters, so remove "domain.example", leaving "www"

"example" is not two characters, so remove "domain.example", leaving "super.duper"

"uk" is two characters (but not "us"), so remove "domain.co.uk", leaving "super.duper"

"us" is two characters and is "us", plus "wy" is also two characters, so remove "pvt.k12.wy.us", leaving "foo".

Note that, although this works for all examples that I've seen in the responses so far, it remains only a reasonable approximation. It is not completely correct, although I suspect it's about as close as you're likely to get without making/obtaining an actual list to use for reference.
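
A minimal Python sketch of this approximation, under the assumptions spelled out above (the function name is mine; as the comments note, it breaks on flat ccTLDs such as modern .us registrations and on multi-label gTLD setups):

def split_by_heuristic(host):
    labels = host.split(".")
    if len(labels[-1]) == 2:
        if labels[-1] == "us" and len(labels[-2]) == 2:
            cut = 4          # e.g. pvt.k12.wy.us
        else:
            cut = 3          # e.g. domain.co.uk
    else:
        cut = 2              # e.g. domain.example
    return ".".join(labels[:-cut]), ".".join(labels[-cut:])

print(split_by_heuristic("super.duper.domain.co.uk"))  # ('super.duper', 'domain.co.uk')
print(split_by_heuristic("www.domain.example"))        # ('www', 'domain.example')
print(split_by_heuristic("foo.pvt.k12.wy.us"))         # ('foo', 'pvt.k12.wy.us')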

3
  • 3
    There are lots of fail cases. This is the sort of algorithm browsers used to try and use. Don't do that, use the PSL - it works, and there are libraries to help you. Commented Aug 22, 2011 at 10:33
  • Nothing prohibits gTLDs from being "segmented" as well; this was the case at the beginning of .NAME, for example, when you could buy only firstname.lastname.name domain names. And in the opposite direction, .US is now also flat, so you can have x.y.z.whatever.us by just purchasing whatever.us at the registry, and then your algorithm will fail on it. Commented Aug 29, 2018 at 22:46
  • 1
    Also about ("segment" here meaning a section between two dots) : this is called a label in the DNS world, no need to invent a new name. Commented Aug 29, 2018 at 22:49
