How do I parse the start date and end date from the following HTML using BeautifulSoup?

<h2 name="PRM-013113-21017-0FSNS" class="pointer">
    <a name="PRM-013113-21017-0FSNS">Chinese New Year Sale<br>
       <span>February 8, 2013 - February 10, 2013</span>
    </a>
</h2>
  • I want the output to be date_start = February 8, 2013 and date_end = February 10, 2013. How do I do that?
    – user683742
    Commented Feb 4, 2013 at 8:25

1 Answer

Something like this should work:

import re
from BeautifulSoup import BeautifulSoup

html = '<h2 name="PRM-013113-21017-0FSNS" class="pointer"><a name="PRM-013113-21017-0FSNS">Chinese New Year Sale<br><span>February 8, 2013 - February 10, 2013</span></a></h2>'
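# Locate the first <span> inside the first <h2 class="pointer">
date_span = BeautifulSoup(html).findAll('h2', {'class' : 'pointer'})[0].findAll('span')[0]
# Extract the text between the <span> tags with a regex
date = re.findall(r'<span>(.+?)</span>', str(date_span))[0]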

(PS: instead of using a regex, you can also pass text=True to findAll to get the text, as follows.)

from BeautifulSoup import BeautifulSoup

html = '<h2 name="PRM-013113-21017-0FSNS" class="pointer"><a name="PRM-013113-21017-0FSNS">Chinese New Year Sale<br><span>February 8, 2013 - February 10, 2013</span></a></h2>'
date = BeautifulSoup(html).findAll('h2', {'class' : 'pointer'})[0].findAll('span')[0]
date = date.findAll(text=True)[0]
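
Note that `from BeautifulSoup import BeautifulSoup` is the BeautifulSoup 3 import, which only runs on Python 2. A minimal sketch of the same lookup with the newer bs4 package follows, assuming it is installed (`pip install beautifulsoup4`). The choice of the built-in "html.parser" backend is my assumption; any parser bs4 supports should give the same result on this snippet.

```python
from bs4 import BeautifulSoup

html = ('<h2 name="PRM-013113-21017-0FSNS" class="pointer">'
        '<a name="PRM-013113-21017-0FSNS">Chinese New Year Sale<br>'
        '<span>February 8, 2013 - February 10, 2013</span></a></h2>')

# "html.parser" ships with Python, so no extra parser needs to be installed
soup = BeautifulSoup(html, "html.parser")

# First <span> inside the first <h2 class="pointer">
date_span = soup.find("h2", class_="pointer").find("span")

# get_text() returns the text inside the tag; strip=True trims whitespace
date = date_span.get_text(strip=True)
print(date)  # February 8, 2013 - February 10, 2013
```

Here find() returns None when nothing matches, so on real pages it may be worth checking the result before chaining the second find().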

Update:

To get the start and end dates as separate variables, you can simply split the date variable as follows:

from BeautifulSoup import BeautifulSoup

html = '<h2 name="PRM-013113-21017-0FSNS" class="pointer"><a name="PRM-013113-21017-0FSNS">Chinese New Year Sale<br><span>February 8, 2013 - February 10, 2013</span></a></h2>'
date = BeautifulSoup(html).findAll('h2', {'class' : 'pointer'})[0].findAll('span')[0]
date = date.findAll(text=True)[0]
# Get start and end date separately
date_start, date_end = date.split(' - ')

Now date_start contains the start date and date_end contains the end date.
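
The question only asks for the two strings, but if real date objects are needed later (for comparisons or arithmetic), the split strings can be parsed with the standard library. This is a rough sketch, assuming the dates always look like "February 8, 2013" with English month names; the names DATE_FORMAT, start and end are mine, not from the question.

```python
from datetime import datetime

# Text taken from the <span>, as extracted above
date = "February 8, 2013 - February 10, 2013"
date_start, date_end = date.split(" - ")

# %B = full month name, %d = day of month, %Y = four-digit year
DATE_FORMAT = "%B %d, %Y"
start = datetime.strptime(date_start.strip(), DATE_FORMAT).date()
end = datetime.strptime(date_end.strip(), DATE_FORMAT).date()

print(start, end)          # 2013-02-08 2013-02-10
print((end - start).days)  # 2
```

strptime raises ValueError if a string does not match the format, so a page that uses a different date layout would need its own format string.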

  • Thanks @Amyth, but I want each date as a separate output, i.e. date_start = February 8, 2013 and date_end = February 10, 2013.
    – user683742
    Commented Feb 4, 2013 at 8:28
  • How about simply splitting the date output on ` - `? Check the updated answer.
    – Amyth
    Commented Feb 4, 2013 at 8:34
