6

I'm trying to create an R API for StackOverflow. The output is gzipped. For example:

readLines("http://api.stackoverflow.com/0.9/stats/", warn=F)
[1] "\037‹\b"                                                                                                                                                                                                                                                                                         
[2] "\030\002úØÛy°óé½\036„iµXäË–[<üt—Zu[\\VmÎHî=ÜÛݹ×ýz’Í.äûû÷>ý´\a\177Ýh÷\017îÝÛÙwßÚáÿþ«¼þý\027ÅrÝæÔlgüÀëA±\017›ìŽï{M¤û.\020\037�Ë\"¿’\006³ì\032„Úß9¸ÿ`¼ç÷³*~ÿKêˆð¡\006v¦ð²ýô£�ñÃ�ì+ôU�_\026滽�]êt¼·?ÞûÈ4ù%\016~S0^>àe¶ÀG\037½n³éÛôKê缬®‚\016Êê¢úý×u‰fó¶]=º{·aΚŽ—y{·©î\026‹‹»h5^-/‚W1 |9[UŲõ^§�Ç"
[3] ":¬´¿1M\177ð\"0íö¹ñ…YÞLëbÕ*!~â\027\036§çU�®êê¢ÎˆµhòýæÅ´Zn\036S¶Z•ùv[­§óm´î�"                                                                                                                                                                                                                      
[4] "Í™t˪^d¥£·üÂ?¾ÿ\033'¿$ù\177"  

Is there a good way to gunzip this in R, short of writing the output to file, gunzip'ing it, and reading it back in?

2
  • I'm looking forward to the package that is bound to fall out the other end of this research!
    – JD Long
    Commented Jun 28, 2010 at 14:21
  • @JD: Absolutely. I'll post the google code page shortly and am happy to take on collaborators. But my initial feeling is that the SO API isn't very useful.
    – Shane
    Commented Jun 28, 2010 at 15:58

3 Answers 3

11

You could do:

conn <- gzcon(url("http://api.stackoverflow.com/0.9/stats/"))
data <- readLines(conn)
3
  • Thanks! Don't forget to close the connection when you're finished.
    – Shane
    Commented Jun 27, 2010 at 20:13
  • Why double readLines is needed? mbq answer works too.
    – Marek
    Commented Jun 28, 2010 at 8:19
  • @Marek: corrected. That was just me trying different things and I must have pasted some extra command. Thanks for pointing that out.
    – nico
    Commented Jun 28, 2010 at 11:08
5

Try:

p <- gzcon(url("http://api.stackoverflow.com/0.9/stats/"))
readLines(p)
4

Ideally we should tell the server that we can handle gzipped content, find out from the HTTP headers that the content is actually gzip encoded and then decompress only if it is. The Rcurl library can do this:

library(Rcurl)
getURL("http://api.stackoverflow.com/0.9/stats/",
       .opts=list(encoding="identity,gzip")
1

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.