
I just tried to import it into Excel, but Excel only accepts a little over a million rows of data. It's 3 columns of data, and all I want to do is make two graphs: column 1 against column 2, and column 1 against column 3.

I am thinking of making a grid preprocessor that divides the 2D landscape into cells and marks each cell as either containing or not containing a data point. There will be a fiddle factor: the cells must be small enough to discern information from the graph, yet large enough that fewer than 1 million cells are filled, so the result fits in Excel.

While I work on that (or something else), does anyone know how to graph all of the data easily?
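The grid preprocessor described above could be sketched in R roughly as follows. `grid_occupancy`, `nx` and `ny` are hypothetical names; `nx` and `ny` set the number of cells along each axis and play the role of the fiddle factor. Each point is snapped to a cell, and only one row per occupied cell is kept, located at the cell centre:

```r
# Hypothetical helper (not from the question): snap each (x, y) point to a
# cell of an nx-by-ny grid and return one row per occupied cell, placed at
# the cell centre. nx and ny are the "fiddle factor".
grid_occupancy <- function(x, y, nx = 1000, ny = 1000) {
  x0 <- min(x); y0 <- min(y)
  # Cell sizes; fall back to 1 if a column is constant, to avoid dividing by 0
  wx <- (max(x) - x0) / nx; if (wx == 0) wx <- 1
  wy <- (max(y) - y0) / ny; if (wy == 0) wy <- 1
  # Zero-based cell indices; the maximum value is clamped into the last cell
  ix <- pmin(floor((x - x0) / wx), nx - 1)
  iy <- pmin(floor((y - y0) / wy), ny - 1)
  # Encode each (ix, iy) pair as a single number, then keep each cell once
  key <- unique(ix * ny + iy)
  # Decode the keys back into cell centres in the original units
  data.frame(x = x0 + (key %/% ny + 0.5) * wx,
             y = y0 + (key %% ny + 0.5) * wy)
}
```

Calling it once with columns 1 and 2 and once with columns 1 and 3 gives the two graphs. With, say, nx = 800 and ny = 600, the output can never exceed 480,000 rows, under Excel's 1,048,576-row limit, and `write.csv(result, "cells.csv", row.names = FALSE)` writes a file Excel can open.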

  • Why not use a database like SQL or (shudder) Microsoft Access? Commented May 28, 2013 at 23:25
  • Lack of knowledge of their existence. I'll try it out. Commented May 28, 2013 at 23:27
  • @SwimBikeRun what format are the data in right now?
    – nhinkle
    Commented May 28, 2013 at 23:40
  • In Excel you can handle more than 1M rows of data in PivotTables and PivotCharts. Go through the import data dialogs, but in the last step, instead of storing the data in the Excel sheet, store it as a PivotChart. From there you can create your chart. I haven't tried it with 6M rows, but I imagine it should work. Good luck! Commented May 29, 2013 at 10:29
  • You should consider methods for aggregating your data. I don't think you'll be able to visually discern 6M rows (or even far fewer). Think about the medium you'll view the data on (screen or paper) and its resolution: you can't discern more than one data point per unit of resolution (e.g. per dot or pixel). For example, at 1200 dpi you'd need 5000 inches (about 417 feet) to display 6M data points.
    – dav
    Commented May 29, 2013 at 12:11
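dav's arithmetic can be checked in a few lines of R. `inches_needed` is a hypothetical helper name, and the 1920-pixel screen width is an assumed example rather than something from the comment:

```r
# Inches of paper needed if every point gets its own dot at `dpi` dots per inch
inches_needed <- function(n_points, dpi) n_points / dpi

inches_needed(6e6, 1200)        # 5000 inches
inches_needed(6e6, 1200) / 12   # 416.6667 feet, the "417 feet" above
# On an assumed 1920-pixel-wide screen, each pixel column would have to
# represent this many points on average:
6e6 / 1920                      # 3125
```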

3 Answers


You could also try sampling the data. Take only one row in ten (or one in a hundred) and try to plot the result. If your sampling is truly random, the graphs should be representative of the "population".
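A minimal R sketch of two ways to keep one row in ten, assuming a data frame called `data` (here filled with hypothetical random values standing in for the real three columns). Taking every tenth row is only representative if the row order carries no pattern; a random sample avoids that assumption:

```r
# Hypothetical stand-in for the real 3-column data
set.seed(1)
n <- 100000
data <- data.frame(A = runif(n), B = rnorm(n), C = rexp(n))

# Systematic thinning: rows 1, 11, 21, ... (fine only if row order is arbitrary)
every_tenth <- data[seq(1, nrow(data), by = 10), ]

# Random thinning: a 1-in-10 simple random sample without replacement
random_tenth <- data[sample.int(nrow(data), nrow(data) %/% 10), ]
```

Either subset can then be plotted as usual; for one in a hundred, change 10 to 100.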

  • +1 A visualization with 6 million data points is almost certainly no more helpful than one with a (few) thousand. The huge number of points may even obscure relationships in the data or overwhelm the viewer (or for that matter, the visualization application). Sampling is the way to go.
    – Excellll
    Commented Mar 11, 2015 at 18:25

Save it as a comma-separated (CSV) file and load it into R with the command

data <- read.csv('mybigfatfile.csv', header=TRUE)

(Here I assume the first row holds the column headers; if there are no headers, set header=FALSE, and R will name the columns V1, V2 and V3.) If the column names are A, B, and C, you can plot as

plot(data$A, data$B, col=rgb(100, 80, 0, 10, maxColorValue=255), pch=16)

Here the color is rgb(100, 80, 0) on a 0 to 255 scale (white would be rgb(255, 255, 255)), with an opacity (alpha) of 10 out of 255, so areas where many points overlap show up darker. Following momobo's answer, you can instead take a random sample if 6 million points take too long to display:

idx <- sample.int(nrow(data), 10000)
plot(data$A[idx], data$B[idx], col=rgb(100, 80, 0, 10, maxColorValue=255), pch=16)

This selects 10,000 distinct random row numbers, between 1 and the number of rows in the data.
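The question asks for two graphs: column 1 against column 2, and column 1 against column 3. A minimal sketch that draws both from one random sample, side by side in a single PDF, assuming the columns are named A, B and C as above. The stand-in data frame and the output path are hypothetical; replace them with the result of read.csv and a file name of your choice:

```r
# Hypothetical stand-in; use the data frame returned by read.csv instead
set.seed(42)
data <- data.frame(A = runif(1e5), B = rnorm(1e5), C = rexp(1e5))

idx  <- sample.int(nrow(data), 10000)            # one random subset for both plots
semi <- rgb(100, 80, 0, 10, maxColorValue = 255) # same translucent colour as above
out  <- file.path(tempdir(), "two_graphs.pdf")   # hypothetical output path

pdf(out, width = 10, height = 5)                 # draw to a PDF file
par(mfrow = c(1, 2))                             # two panels side by side
plot(data$A[idx], data$B[idx], col = semi, pch = 16, xlab = "A", ylab = "B")
plot(data$A[idx], data$C[idx], col = semi, pch = 16, xlab = "A", ylab = "C")
dev.off()                                        # close the file
```

Reusing the same idx for both panels means both graphs show the same sampled rows.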

To get help with an R command, type ? followed by the command name, e.g.,

?plot

R does have a steep learning curve, but this is one way to do it.

  • I did this and I am currently waiting on the plot command. It took about 30 seconds to import the data, but I have been waiting two minutes and it still hasn't plotted anything. I did a quick plot of the head of the data, and the plot command is right. Roughly how long should 6 million rows take to plot? More to the point, how do I speed this up? Is there a thinning function for R? Commented May 28, 2013 at 23:43
  • @SwimBikeRun, yes momobo has the right idea: take a random sample. I updated my answer.
    – Peon
    Commented May 29, 2013 at 18:12
  • It's amazing how simple it is to sample in R!
    – momobo
    Commented May 29, 2013 at 21:33

I faced the same problem. In the end I used MSChart with C#, loaded the data in code, and drew it on the chart.

I think this video would help: https://www.youtube.com/watch?v=82jnryBxsnI

You can also zoom into the chart.

  • You might as well post the code snippets now. A complete answer is always more likely to be helpful to someone.
    – Excellll
    Commented Mar 11, 2015 at 18:15

