Less painful way to parse an RSS feed with lxml?

Question

I need to display RSS feeds with Python, mostly Atom. Coming from PHP, where I could get values quickly with $entry->link, I find lxml to be much more precise and faster, albeit complicated. After hours of probing, I got this working with the Ars Technica feed:

from urllib.request import urlopen
from lxml import etree

def GetRSSFeed(url):
    out = []
    feed = urlopen(url)
    feed = etree.parse(feed)
    feed = feed.getroot()
    for element in feed.iterfind(".//item"):
        # relies on <title> and <link> being the first two children of <item>
        meta = element.getchildren()
        title = meta[0].text
        link = meta[1].text
        for subel in element.iterfind(".//description"):
            desc = subel.text
            entry = [title,link,desc]
            out.append(entry)
    return out

Could this be done more easily? How can I access tags directly? Feedparser gets the job done with one line of code! Why?

Why are you using lxml instead of feedparser, then?
– bgporter
Commented Jun 22, 2012 at 14:24

guyrt · Accepted Answer · 2012-06-22 14:37:30Z

9

Look at the feedparser library. It gives you a nicely formatted RSS object.

> import feedparser
> feed = feedparser.parse('http://feeds.marketwatch.com/marketwatch/marketpulse/')
> print(feed.keys())
['feed',
 'status',
 'updated',
 'updated_parsed',
 'encoding',
 'bozo',
 'headers',
 'etag',
 'href',
 'version',
 'entries',
 'namespaces']

>  len(feed.entries)
    30


Thanks for the answer. I mentioned feedparser in the original post. I tested it against lxml, which came out much faster. All I want to do now is select children by their tag name, like rss.item.description.text. Is that impossible?
– reinhardt
Commented Jun 22, 2012 at 14:54

Is this more in line with what you want? It finds all descriptions nested under items: feed.findall('.//item//description')
– guyrt
Commented Jun 22, 2012 at 15:16
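
Building on the findall suggestion above, here is a minimal sketch of looking up each child of an item by tag name instead of by position. The sample feed, the SAMPLE_RSS name, and the parse_rss_items helper are made up for illustration; the stdlib fallback import is only there so the sketch still runs where lxml is not installed, since both modules offer iterfind and findtext.

```python
try:
    from lxml import etree  # preferred, as in the question
except ImportError:
    # Fallback: the stdlib module offers the same iterfind/findtext calls.
    import xml.etree.ElementTree as etree

# Hypothetical sample feed, inlined so the sketch runs without network access.
SAMPLE_RSS = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <description>Summary of the first post</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
      <description>Summary of the second post</description>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_bytes):
    """Return [title, link, description] for every <item>, looked up by tag name."""
    root = etree.fromstring(xml_bytes)
    entries = []
    for item in root.iterfind(".//item"):
        # findtext returns the text of the first matching child,
        # or None when the item has no such child.
        entries.append([
            item.findtext("title"),
            item.findtext("link"),
            item.findtext("description"),
        ])
    return entries


items = parse_rss_items(SAMPLE_RSS)
print(items[0])
```

Unlike indexing into getchildren(), this does not depend on the order of the child elements, and an item without a description still produces an entry (with None in that slot).
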
This is not an answer to what is being asked. Can you do this the hard way, using lxml?
– Harshit
Commented May 15, 2013 at 13:46
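
For the attribute-style access asked about in the comments (rss.item.description.text), lxml's objectify module may be the closest fit. The following is only a sketch, assuming lxml is installed and a plain RSS 2.0 feed without namespaces; the sample feed and the SAMPLE_RSS name are made up for illustration. Note that the path goes through channel, so it reads rss.channel.item rather than rss.item.

```python
from lxml import objectify

# Hypothetical sample feed, inlined so the sketch runs without network access.
SAMPLE_RSS = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <description>Summary of the first post</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
      <description>Summary of the second post</description>
    </item>
  </channel>
</rss>"""

rss = objectify.fromstring(SAMPLE_RSS)

# Attribute access returns the first child element with that tag name.
first_description = rss.channel.item.description.text

# Iterating over an element obtained this way walks it and all of its
# siblings that share the same tag, so this visits every <item>.
entries = [
    [item.title.text, item.link.text, item.description.text]
    for item in rss.channel.item
]

print(first_description)
print(len(entries))
```

Accessing a child that does not exist raises AttributeError, so feeds with optional elements may need a guard, and namespaced formats such as Atom may need extra care.
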


rubayeet · Accepted Answer · 2014-04-20 04:12:28Z

3

You can try speedparser, an implementation of Universal Feed Parser built on lxml. It is still in beta, though.

edited Apr 20, 2014 at 4:12

answered Jun 19, 2013 at 6:32

