40

I have a numpy array results that looks like

[ 0.  2.  0.  0.  0.  0.  3.  0.  0.  0.  0.  0.  0.  0.  0.  2.  0.  0.
  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.
  0.  1.  1.  0.  0.  0.  0.  2.  0.  3.  1.  0.  0.  2.  2.  0.  0.  0.
  0.  0.  0.  0.  0.  1.  1.  0.  0.  0.  0.  0.  0.  2.  0.  0.  0.  0.
  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  3.  1.  0.  0.  0.  0.  0.
  0.  0.  0.  1.  0.  0.  0.  1.  2.  2.]

I would like to plot a histogram of it. I have tried

import matplotlib.pyplot as plt
plt.hist(results, bins=range(5))
plt.show()

This gives me a histogram with the x-axis labelled 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0.

I would like the x-axis to be labelled 0 1 2 3 instead with the labels in the center of each bar. How can you do that?

4
  • I'm not sure what you want. Do you want the bins centered around 1, 2, 3 (so around the integers instead of the 1.5, 2.5 values)? Or do you want to label the bars with text or something? Because if I execute your command, my output is (array([ 4., 5., 1., 2.]), array([0, 1, 2, 3, 4])) (with different input values). So I get different bins, or am I missing something?
    – Mathias711
    Commented Apr 23, 2014 at 13:58
  • @Mathias711 The first bar is the number of 0s in results, the second the number of 1s (there are eleven of them), the third the number of 2s (there are eight of them) and the last one is the number of 3s (there are three of them). I would like the number 0 as a label under the middle of the first bar, the number 1 as a label under the middle of the second and so on. Is that clearer?
    – Simd
    Commented Apr 23, 2014 at 14:01
  • So there are no problems with the binning, you just want to add labels to the bins?
    – Mathias711
    Commented Apr 23, 2014 at 14:02
  • @Mathias711 Yes I want to get rid of the default labels and add the ones I described.
    – Simd
    Commented Apr 23, 2014 at 14:03

7 Answers

65

The other answers just don't do it for me. The benefit of using plt.bar over plt.hist is that bar can use align='center':

import numpy as np
import matplotlib.pyplot as plt

arr = np.array([ 0.,  2.,  0.,  0.,  0.,  0.,  3.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  2.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,
        0.,  0.,  0.,  0.,  2.,  0.,  3.,  1.,  0.,  0.,  2.,  2.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  0.,  0.,  0.,  0.,
        0.,  0.,  2.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  3.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  1.,  0.,  0.,  0.,  1.,  2.,  2.])

labels, counts = np.unique(arr, return_counts=True)
plt.bar(labels, counts, align='center')
plt.gca().set_xticks(labels)
plt.show()

centering labels in a histogram
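To see why this works, it may help to look at what np.unique hands to plt.bar. The following is only a small sketch on a made-up array (sample is a hypothetical stand-in, not the data from the question): the distinct values become the bar positions and tick locations, and the counts become the bar heights.

```python
import numpy as np

# Hypothetical sample data, much shorter than the array in the question
sample = np.array([0., 2., 0., 3., 1., 0., 2.])

# Distinct values (sorted) and how often each one occurs
labels, counts = np.unique(sample, return_counts=True)

print(labels)  # [0. 1. 2. 3.]
print(counts)  # [3 1 2 1]
```

Because the bars are drawn at the values themselves with align='center', calling set_xticks(labels) puts each tick under the middle of its bar.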

5
  • 6
    How has no one congratulated you on this? Definitely the most elegant and straightforward solution to this exact problem. This should be considered the best solution for visualizing a discrete distribution. Commented May 27, 2019 at 18:35
  • 1
    Indeed. This is so much simpler!
    – yatu
    Commented Apr 5, 2020 at 11:03
  • This should be the answer. tnx :) Commented Feb 13, 2021 at 0:30
  • Note that plt.bar takes align='center' (or 'edge'), whereas plt.hist takes align='mid' (or 'left' / 'right'). Commented Mar 14, 2022 at 16:36
  • if you have lengthy labels, you can use this plt.xticks(rotation=90) to turn them to 90 degrees. Commented Oct 15, 2022 at 3:09
24

The following alternative solution is compatible with plt.hist() (and this has the advantage, for instance, that you can call it after a pandas.DataFrame.hist()).

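import numpy as np
import matplotlib.pyplot as plt

def bins_labels(bins, **kwargs):
    # Width of one bin, assuming equally spaced bin edges
    bin_w = (max(bins) - min(bins)) / (len(bins) - 1)
    # One tick at the center of each bin, labelled with the bin's left edge
    plt.xticks(np.arange(min(bins)+bin_w/2, max(bins), bin_w), bins[:-1], **kwargs)
    plt.xlim(bins[0], bins[-1])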

(The last line is not strictly requested by the OP but it makes the output nicer)

This can be used as in:

import matplotlib.pyplot as plt
bins = range(5)
plt.hist(results, bins=bins)
bins_labels(bins, fontsize=20)
plt.show()

Result: success!
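As a cross-check on where the ticks end up, here is a small sketch that computes the bin centers directly as midpoints of neighbouring edges. This is a slightly different technique from the bin-width arithmetic above (it would also cope with unevenly spaced edges), and the names edges and centers are only illustrative.

```python
import numpy as np

edges = np.arange(5)  # bin edges 0, 1, 2, 3, 4, as in bins = range(5)

# Midpoint of each pair of neighbouring edges = center of each bin
centers = (edges[:-1] + edges[1:]) / 2

print(centers)     # [0.5 1.5 2.5 3.5]  -> tick positions
print(edges[:-1])  # [0 1 2 3]          -> tick labels (left edge of each bin)
```

There is one center per bin, so four tick positions pair up with the four labels 0 to 3.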

3
  • 1
    May you explain why you only show the bins 0,1,2,3 but use range(5) ?
    – Wikunia
    Commented Aug 1, 2017 at 16:11
  • @Wikunia : sure. Bin 0 covers from 0 to 1 in the plot, bin 1 covers from 1 to 2... and so on until bin 3, which covers from 3 to 4 in the plot. So the bins (left and right) borders must be the sequence [0, 1, 2, 3, 4]... which is precisely range(5). Strange, I know, but the only alternative I see (centering bin i going from i-1/2 to i+1/2) would be more complicated. Commented Aug 2, 2017 at 23:07
  • This answer is efficient in a more general case, if bins are redefined, e.g. as bins = np.arange(2, 7, .5)
    – Dalker
    Commented Nov 23, 2017 at 10:12
20

Here is a solution that only uses plt.hist(). Let's break this down into two parts:

  1. Have the x-axis labelled 0 1 2 3.
  2. Have the labels in the center of each bar.

To have the x-axis labelled 0 1 2 3 without .5 values, you can use the function plt.xticks() and provide as argument the values that you want on the x axis. In your case, since you want 0 1 2 3, you can call plt.xticks(range(4)).

To have the labels in the center of each bar, you can pass the argument align='left' to the plt.hist() function. Below is your code, minimally modified to do that.

import matplotlib.pyplot as plt

results = [0,  2,  0,  0,  0,  0,  3,  0,  0,  0,  0,  0,  0,  0,  0,  2,  0,  0,
           0,  0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  1,  0,  0,  0,  0,  0,  0,
           0,  1,  1,  0,  0,  0,  0,  2,  0,  3,  1,  0,  0,  2,  2,  0,  0,  0,
           0,  0,  0,  0,  0,  1,  1,  0,  0,  0,  0,  0,  0,  2,  0,  0,  0,  0,
           0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  3,  1,  0,  0,  0,  0,  0,
           0,  0,  0,  1,  0,  0,  0,  1,  2,  2]

plt.hist(results, bins=range(5), align='left')
plt.xticks(range(4))
plt.show()

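If you want to convince yourself that align='left' really centers the bars on the integers, you can inspect the rectangles that hist returns. This is a rough sketch on made-up data (sample is hypothetical), using the non-interactive Agg backend so that it runs without opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical sample data, not the array from the question
sample = [0, 2, 0, 3, 1, 0, 2]

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(sample, bins=range(5), align='left')

# Horizontal center of each drawn bar: left side plus half the width
centers = [p.get_x() + p.get_width() / 2 for p in patches]
print(centers)
```

With align='left' each bar is centered on the left edge of its bin, so the centers coincide with the tick positions given by plt.xticks(range(4)).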

3
  • Variant of the currently selected answer, no addition.
    – mins
    Commented Dec 18, 2020 at 11:57
  • 1
    @mins How so? This uses plt.hist instead of plt.bar for plotting a histogram, which seems to be the correct thing to do.
    – rvf
    Commented Aug 9, 2022 at 8:37
  • @rvf. My comment in 2020 wasn't referring to plt.bar in Jarad's answer. Jarad's answer has been selected only recently (cf. "This should be the answer" posted in 2021)
    – mins
    Commented Aug 9, 2022 at 11:34
11

You can build a bar plot out of np.histogram.

Consider this:

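import numpy as np
import matplotlib.pyplot as plt

# a is the data array (results in the question)
his = np.histogram(a, bins=range(5))  # his[0]: counts, his[1]: bin edges
fig, ax = plt.subplots()
offset = .4
# Anchor each bar at the left edge of its bin; the default bar width is 0.8
ax.bar(his[1][:-1], his[0], align='edge')
ax.set_xticks(his[1][:-1] + offset)
ax.set_xticklabels(('0', '1', '2', '3'))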


EDIT: in order to get the bars touching one another, one has to play with the width parameter.

fig, ax = plt.subplots()
offset = .5
# width=1 makes each bar span its whole bin, so neighbouring bars touch
ax.bar(his[1][:-1], his[0], width=1, align='edge')
ax.set_xticks(his[1][:-1] + offset)
ax.set_xticklabels(('0', '1', '2', '3'))

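For reference, this is roughly what np.histogram returns for integer bins. The sketch uses a short made-up array (a here is a hypothetical stand-in for your data): the first element of the result holds the counts, the second the bin edges, and there is always one more edge than there are counts.

```python
import numpy as np

# Hypothetical sample data standing in for the real array
a = np.array([0, 2, 0, 3, 1, 0, 2])

counts, edges = np.histogram(a, bins=range(5))

print(counts)  # [3 1 2 1]
print(edges)   # [0 1 2 3 4]
```

The count for the value 0 belongs to the bin that starts at edge 0, which is why the left edges (all edges except the last) are the natural bar positions and labels.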

5
  • Thanks! How do you get rid of the spaces between the bars?
    – Simd
    Commented Apr 23, 2014 at 14:13
  • Ahh thanks. I was also working on something like this, but couldn't get the xticks working. Thanks for clarifying.
    – Mathias711
    Commented Apr 23, 2014 at 14:15
  • @eleanora, bars have been fixed.
    – Acorbe
    Commented Apr 23, 2014 at 14:18
  • Also, how do I create ('0', '1', '2','3') if I wanted it to go from 0 to 100, say? tuple(str(i) for i in range(101)) ?
    – Simd
    Commented Apr 23, 2014 at 14:18
  • 2
    @eleanora That works, but I would use np.arange(1, 101).astype(str), or without numpy: map(str, range(1, 101))
    – askewchan
    Commented Apr 23, 2014 at 16:40
0

As Jarad pointed out in his answer, a bar plot is a neat way to do it. Here's a short way of plotting a bar plot using pandas.

import pandas as pd
import matplotlib.pyplot as plt

arr = [ 0.,  2.,  0.,  0.,  0.,  0.,  3.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  2.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,
        0.,  0.,  0.,  0.,  2.,  0.,  3.,  1.,  0.,  0.,  2.,  2.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  0.,  0.,  0.,  0.,
        0.,  0.,  2.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  3.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  1.,  0.,  0.,  0.,  1.,  2.,  2.]

col = 'name'
pd.DataFrame({col : arr}).groupby(col).size().plot.bar()
plt.show()
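To make the pandas one-liner less opaque, here is a small sketch of the intermediate result on made-up data (sample is hypothetical). It also shows value_counts().sort_index() as one possible alternative that should give the same counts without building a DataFrame.

```python
import pandas as pd

# Hypothetical sample data, much shorter than the array above
sample = [0., 2., 0., 3., 1., 0., 2.]

# Number of rows per distinct value, indexed by that value
sizes = pd.DataFrame({'name': sample}).groupby('name').size()

# Alternative: count values on a Series, then sort by the value itself
counts = pd.Series(sample).value_counts().sort_index()

print(sizes.tolist())   # [3, 1, 2, 1]
print(counts.tolist())  # [3, 1, 2, 1]
```

Either result can be passed on to .plot.bar(), which draws one bar per index entry and labels it with that entry.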
3
  • "barplot is a neat way to do it". In some cases only. How would you do for these cases?
    – mins
    Commented Dec 18, 2020 at 12:00
  • Plot on the right could be created as Ted shows in his answer. To get plot on the left, size() should be used instead of sum(), i.e. df.groupby(df.sold // 10 * 10).size().plot.bar(). But I guess it's worth comparing results with other approaches. Commented Dec 21, 2020 at 12:39
  • 1
    I meant the difficulty in the linked case was about the use of hist with weight option. Cannot be replaced easily by barplot as it hasn't an equivalent possibility.
    – mins
    Commented Dec 21, 2020 at 15:44
0

To center the labels on a matplotlib histogram of discrete values, it is enough to define the "bins" as a list of bin boundaries.

import matplotlib.pyplot as plt
# In a Jupyter notebook, also run: %matplotlib inline

example_data = [0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,1]

fig = plt.figure(figsize=(5,5))
ax1 = fig.add_subplot()
ax1_bars = [0,1]                           
ax1.hist( 
    example_data, 
    bins=[x for i in ax1_bars for x in (i-0.4,i+0.4)], 
    color='#404080')
ax1.set_xticks(ax1_bars)
ax1.set_xticklabels(['class 0 label','class 1 label'])
ax1.set_title("Example histogram")
ax1.set_yscale('log')
ax1.set_ylabel('quantity')

fig.tight_layout()
plt.show()


How does this work?

  • The histogram bins parameter can be a list defining the boundaries of the bins. For a class that can assume the values 0 or 1, the code above generates the boundaries [-0.4, 0.4, 0.6, 1.4], which loosely translates as: "bin 0" is from -0.4 to 0.4 and "bin 1" is from 0.6 to 1.4 (the narrow bin from 0.4 to 0.6 between them stays empty and shows up as a gap). Since the middle of each of those ranges is the discrete value itself, the label will be in the expected place.

  • The expression [x for i in ax1_bars for x in (i-0.4,i+0.4)] is just a way to generate the list of boundaries for a list of values (ax1_bars).

  • The call ax1.set_xticks(ax1_bars) is important: it places ticks only at the discrete values.

  • The rest should be self explanatory.
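The boundary list can be checked without plotting anything. The sketch below uses np.histogram on a short made-up sample (sample and bars are hypothetical names) to show the generated edges and how the values fall into the bins.

```python
import numpy as np

bars = [0, 1]             # the discrete values to show
sample = [0, 0, 1, 0, 1]  # hypothetical data

# Two boundaries per value: 0.4 to the left and 0.4 to the right of it
edges = [x for i in bars for x in (i - 0.4, i + 0.4)]

counts, _ = np.histogram(sample, bins=edges)

print(edges)   # approximately [-0.4, 0.4, 0.6, 1.4]
print(counts)  # [3 0 2]
```

The middle count is always zero for integer data: it belongs to the narrow bin between 0.4 and 0.6, which is what produces the visible gap between the two bars.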

0

Use numpy to have bins centered at your requested values:

import matplotlib.pyplot as plt
import numpy as np
plt.hist(results, bins=np.arange(-0.5, 5))
plt.show()
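A quick way to check which values the bins are centered on is to compute the midpoints of the edges. This is only a sketch; it uses np.arange(-0.5, 4) on the assumption that the data contain just the values 0 to 3, whereas np.arange(-0.5, 5) as above adds one more (empty) bin centered on 4.

```python
import numpy as np

# Edges half a unit to either side of each integer from 0 to 3
edges = np.arange(-0.5, 4)  # [-0.5, 0.5, 1.5, 2.5, 3.5]

# Midpoint of each bin
centers = (edges[:-1] + edges[1:]) / 2

print(centers)  # [0. 1. 2. 3.]
```

Since the bins are centered on the integers, the default integer ticks (or plt.xticks(range(4))) sit under the middle of each bar.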
