Sci Vis) 1. Rotated histogram

2021-12-02

2021-12-27

Nicolas P. Rougier, matplotlib, python, visualization

Scientific Visualization: Python + Matplotlib 책에는 최고 수준의 시각화 기술이 담겼습니다.
높은 수준에 비해 설명이 다소 짧아 초심자에게 어려워 보였습니다.
저자인 Nicolas P. Rougier 박사님의 허가 하에 일부를 해설합니다.
이 시리즈는 해외 독자를 위해 영어를 회색 상자 안에 병기합니다.

Scientific Visualization: Python + Matplotlib contains cutting-edge visualization skills and examples of Matplotlib and Python.
Despite its high quality content, relatively short explanations in the book may prevent the readers from fully grasping all of its contents.
With the permission of the author, Dr. Nicolas P. Rougier, I'll try to help the beginners understand the more complex concepts within the book by providing explanations and commentary where needed.
This series will be posted in English as well as Korean, for international readers.

영어 감수: EY한영회계법인 이지수 시니어

Scientific Visualization: Python + Matplotlib
github: Nicolas P. Rougier

1. Rotated Histogram

transforms-hist.py

이번에 살펴볼 그림은 scatter plot에 histogram이 겹친 모양입니다.
데이터의 PC1 축 방향으로 돌려 놓인 histogram은 PC2 방향의 데이터 분포를 나타내고 있습니다.

We are going to investigate a scatter plot with a histogram.
The histogram is rotated and placed along PC1 axis, representing data distribution along the PC2's direction.

2. Matplotlib coding

저자는 Matplotlib 자체 기능만을 사용해서 그렸습니다.
저자의 코드를 따라가며 어떻게 그렸는지 확인해 보겠습니다.

The author created the figure with Matplotlib only.
We'll see the author's code is working by following every step.

2.1. import libraries

코드는 라이브러리를 불러오는 것부터 시작합니다.
Polygon, Affine2D, floating_axes처럼 전에 보기 힘들었던 기능을 불러옵니다.
numpy random seed를 설정하여 재현성을 확보합니다.

We start by importing the libraries.
Unlike "normal" visualizations, the modules named Polygon, Affine2D, floating_axes are imported.
Reproducibality is secured by setting numpy random seed.

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
from matplotlib.transforms import Affine2D
import mpl_toolkits.axisartist.floating_axes as floating_axes

# Reroducibility seed
np.random.seed(123)

2.2. Affine transformation

Affine Transformation

Affine 변환은 공선성과 거리 비율을 유지하는 모든 종류의 형태 변환을 말합니다.
Matplotlib은 2차원 Affine 변환을 구현하는 Affine2D 함수를 지원합니다.
본격적으로 코드를 들여다보기 전에 Affine2D 용법을 확인합니다.

An affine transformation is any transformation that preserves collinearity (i.e., all points lying on a line initially still lie on a line after transformation) and ratios of distances (e.g., the midpoint of a line segment remains the midpoint after transformation).
Matplotlib provides Affine2D function in which 2-dimensional Affine transformation is implemented.
Before diving into the original code, we need to learn how to use Affine2D.

7개의 점으로 이루어진 예제 데이터 S0을 만듭니다. 각 점은 x, y 좌표가 있으므로 형상은 (7, 2)가 됩니다.
각 점마다 다른 색을 입히기 위해 0부터 1까지 7개의 값 데이터 Sc를 만듭니다. numpy.linspace()를 사용합니다.
Affine2D().rotate_deg(30).transform(S0) 명령으로 S0를 30도 회전해 S1을 얻습니다. 방향은 반시계 방향입니다.

S0, sample data consisted of 7 points, is prepared on cartesian coordinate system. Each point has x and y coordinate, therefore shape of S0 is (7, 2).
Sc, a dataset with 7 values are prepared by using numpy.linspace() to paint each point with a differnt color.
S1 is obtained by rotating S0 by 30 degree in a counterclockwise direction by using Affine2D().rotate_deg(30).transform(S0).

S0과 S1를 그려 데이터가 의도대로 변환되었음을 확인합니다.

Successful rotational transform is confirmed by visualization of S0 and S1.

# numpy array
S0 = np.array([[0, 0], [1, 0], [3, 0], [0, 1], [1, 1], [0, 3], [3, 3]])
Sc = np.linspace(0, 1, len(S0))
S1 = Affine2D().rotate_deg(30).transform(S0)

# Visualize
fig_s, axs_s = plt.subplots(ncols=2, figsize=(6, 3), sharex=True, sharey=True)

for ax_s, S, t in zip(axs_s, [S0, S1], [0, 30]):
    ax_s.scatter(S[:, 0], S[:, 1], c=Sc, s=100, ec="k")
       
    ax_s.set_aspect(1)
    ax_s.set_ylim(-0.5, 4.5)
    ax_s.set_title(f"rotation = {t}$^o$", fontsize="large")
    
fig_s.tight_layout()

다른 방식으로도 변환할 수 있습니다. Matplotlib Rectangle 객체 R0을 대상으로 합니다.
Rt라는 이름의 변환을 선언하고 R0.set_transform(Rt)를 적용하면 R0의 좌표가 바뀝니다.

We can get the same result by using a different method. We will rotate R0, a Matplotlib Rectangle object.
Declerating Rt as the designation and applying R0.set_transform(Rt) to R0 changes the coordinates of itself.

from matplotlib.patches import Rectangle

R0 = Rectangle((0, 0), 3, 3)

for ax_s in axs_s:
    R0 = Rectangle((0, 0), 3, 3, fc="none", ec="c", zorder=-1)
    if ax_s == axs_s[1]:
        Rt = Affine2D().rotate_deg(30) + axs_s[1].transData
        R0.set_transform(Rt)
    ax_s.add_patch(R0)
    
fig_s.tight_layout()
display(fig_s)

(150, 2) 데이터 Z0를 생성하고 같은 방식으로 원본 데이터를 35도 회전해 Z1를 만듭니다.
Z1의 x, y 평균([ 0.13746892, -0.02793329])을 구해 Zm에 저장합니다.

Create (150, 2) points data Z0 then rotate 35 degree in the same manner to generate Z1.
Calculate mean of Z1 ([ 0.13746892, -0.02793329]) and store it as Zm.

# Generate some data
Z0 = np.random.normal(0, (1.25, 0.75), (150, 2))  # Z0: 2D random points array
Z1 = Affine2D().rotate_deg(35).transform(Z0)      # Z1: rotated Z0
Zm = Z1.mean(axis=0)                              # mean of Z1. Zm = np.array([ 0.13746892, -0.02793329])

# Visualize
fig_r, axs_r = plt.subplots(ncols=2, figsize=(6, 3), sharex=True, sharey=True)
for Z, ax, t in zip([Z0, Z1], axs_r, [0, 35]):
    ax.scatter(Z[:,0], Z[:,1], c=np.linspace(0, 1, Z.shape[0]))
    ax.axhline(0, c="k", zorder=-1)
    ax.axvline(0, c="k", zorder=-1)
    ax.set_title(f"rotation = {t}$^o$", fontsize="large")

Zall = np.concatenate([Z0, Z1])    
xmin, xmax = Zall[:,0].min(), Zall[:,0].max()
ymin, ymax = Zall[:,1].min(), Zall[:,1].max()

axs_s[0].set_xlim(xmin, xmax)

fig_r.tight_layout()

2.3. Principal Component Analysis (PCA)

numpy.linalg.eig를 사용해 PCA를 수행합니다.
eigenvalue V와 eigenvector W로부터 PC1과 PC2를 구하고 각각 T(PC1)와 O(PC1과 직각. PC2)으로 복사합니다.

Perform PCA using numpy.linalg.eig().
Get PC1 and PC2 from the eigenvalue V and eigenvector W, and copy them as T(tangent to PC1) and O(orthogonal to PC1, equivalent to PC2).

# Principal components analysis
# Note that for some seeds, the PC1 and PC2 needs to be inverted
# It could be fixed by looking at the orientation but I'm lazy
W, V = np.linalg.eig(np.cov(Z1.T))                 # W: eigenvalues, V: eigenvectors
PC1, PC2 = V[np.argsort(abs(W))]                   # PC1, PC2: 1st and 2nd Principal components
if PC2[1] < 0:                                     # to make PC2 "upwards"
    PC2 = -PC2
    
rotation = 180 * np.arctan2(*PC1) / np.pi
T = np.array([PC1[0], PC1[1]])                     # tangent vector of PC1 (a deep copy of PC1)
O = np.array([PC2[0], PC2[1]])                     # orthogonal vector of PC1 (a deep copy of PC2)

# visualize
fig, ax = plt.subplots(figsize=(4, 4))
ax.scatter(Z1[:,0], Z1[:,1], c="lightgray")

for pc, c in zip(["PC1", "PC2"], ["b", "r"]):
    ax.arrow(0, 0, *eval(pc), width=0.1, length_includes_head=True, ec="none", fc=c)
    ax.text(*eval(pc), pc, ha="center", va="bottom", fontsize="large", color=c)

ax.set_aspect(1)

rotation, T, O를 확인합니다.

Let's take a look at rotation, T, and O.

1
2
3

print(f"rotation = {rotation}")
print(f"tangent vector = {T}")
print(f"orthogonal vector = {O}")

실행 결과

output

1
2
3

rotation = 37.89555000213857
tangent vector = [0.61422391 0.78913179]
orthogonal vector = [-0.78913179  0.61422391]

2.4. scatter plot and PC1 axis

이제 본격적으로 시각화를 수행합니다.
scatter plot을 그리고 x, y 축의 범위와 값을 지정합니다.
PC1 축을 따라 점선을 긋습니다.

Finally data visualization is about to begin.
Draw a scatter plot; subsidiary x, y ticks and ticklabels are controlled.
PC1 axis is marked as a dashed line.

# Draw
fig = plt.figure(figsize=(5, 5))
ax1 = fig.add_axes([0.05, 0.05, 0.9, 0.9], aspect=1)

# Main scatter plot
ax1.scatter(Z1[:, 0], Z1[:, 1], s=50, fc="C0", ec="white", lw=0.75)
ax1.set_xlim([-3, 6])
ax1.set_xticks([-3, 6])
ax1.set_xticklabels([])
ax1.set_ylim([-3, 6])
ax1.set_yticks([-3, 6])
ax1.set_yticklabels([])
ax1.spines["top"].set_visible(False)
ax1.spines["right"].set_visible(False)


# Draw main PCA axis
P0 = np.vstack([Zm - T * 10, Zm + T * 10])         # P0 : endpoints of a long line along PC1
ax1.plot(P0[:, 0], P0[:, 1], color="black", linestyle="--", linewidth=0.75, zorder=10)

2.5. Bounding box along PCA axes

데이터 전체를 감싸는 bounding box를 그릴 차례입니다.
데이터 Z1을 x축을 따라 역회전시킨 후 y 절대값의 최대치로부터 box 높이의 절반 d를 구합니다.
O, T, d로부터 bounding box의 네 꼭지점을 구해 P2에 저장합니다.
마지막으로 P2로부터 Polygon 객체를 생성해 ax1.add_patch()로 그림에 붙입니다.

It's time to draw bounding box around the points.
Rotate the Z1 along its X-axis, then find d which is half of the box's abolute y value.
Get 4 points of the bounding box from O, T and d, store as P2.
Finally, create Polygon object from P2 and add it by using ax1.add_patch().

# Compute the width of the distribution along orthogonal direction to the PCA
# main axis. This is made by rotating points and taking max on the Y axis.
transform = Affine2D().rotate_deg(-rotation)
P1 = transform.transform(Z1 - Z1.mean(axis=0))     # P1 : rotated Z1, along x-axis
d = np.abs(P1[:, 1]).max()                         # d  : max. distance between P0 and Z1

# Draw a rectangle surrounding the distribution & oriented along PCA main axis
P2 = np.vstack(                                    # P2 : rectangle surrounding Z1
    [
        Zm - 10 * T - d * O,
        Zm + (6 - d) * T - d * O,
        Zm + (6 - d) * T + d * O,
        Zm - 10 * T + d * O,
    ]
)
ax1.add_patch(
    Polygon(
        P2,
        closed=True,
        fill=True,
        edgecolor="None",
        facecolor="C0",
        alpha=0.1,
        zorder=-50,
    )
)

bounding box에 윤곽선을 추가하고 축 위에 평균 Zm을 표시하는 삼각형 마커를 추가합니다.
Zm, d, T, O로부터 윤곽선 좌표 P3과 P4를 구해 ax1.plot()으로 조금 진한 윤곽선을 입힙니다.

Boundary of the bounding box should be more clear, and the triangle markers denoting means Zm should be visible on each axis.
Get the coordinates of the outline, P3 and P4, from Zm, d, T and O, then apply a slightly darker outline by using the command ax1.plot().

# P3, P4 : edges of P2 parallel to PC1
P3 = np.vstack([Zm - 10 * T, Zm + (6 - d) * T]) - d * O  
plt.plot(P3[:, 0], P3[:, 1], color="C0", linestyle="-", linewidth=0.75, alpha=0.25)
P4 = np.vstack([Zm - 10 * T, Zm + (6 - d) * T]) + d * O
plt.plot(P4[:, 0], P4[:, 1], color="C0", linestyle="-", linewidth=0.75, alpha=0.25)

# Some markers on the axis to show the mean (we could compute exactly the delta
# for placing the marker but it is not the point of this example)
ax1.scatter(Zm[0], -2.85, s=50, color="black", marker="v", clip_on=False)
ax1.scatter(-2.85, Zm[1], s=50, color="black", marker="<", clip_on=False)

2.6. Add Axes for rotated histogram

이번 시각화의 하일라이트, 회전된 histogram을 붙일 차례입니다.
회전된 histogram을 붙이기 위해서는 정확한 자리에 정확한 각도와 크기로 새 Axes를 형성하고
이 Axes 위에 histogram을 회전시켜 붙여야 합니다.

The next step is the grand finale of this visualization, the rotated histogram.
In order to add rotated histogram, new Axes should be formed at the exact site with the exact angle and size.
And then histogram is added with the rotation.

정사각형 모양의 공간을 PC1 방향의 연장선에 형성할 것입니다.
높이와 너비는 P2로 만들어진 Polygon의 높이와 동일합니다.
histogram이 놓일 공간의 중심 좌표 C를 구합니다.

We are now going to form a square shaped space in the direction, and as an extension, of PC1.
The height and width of the space is same as the height of Polygon based on P2.
The center coordinate C of the histogram space is calculated.

1 2	# 1. Compute the center of the histogram C = Zm + 6 * T

1
2
3

# visualization
ax1.scatter(*C, s=100, c="orange", marker="X", zorder=1)
display(fig)

histogram이 놓일 공간의 높이 h0와 너비 w0를 구합니다.

Calculate the height h0 and width w0 of the histogram space.

# 2. Compute the coordinate and the size in normalized figure coordinates
x, y = fig.transFigure.inverted().transform(ax1.transData.transform(C))
xo, yo = fig.transFigure.inverted().transform(ax1.transData.transform(C + 2 * d * O))
h0 = w0 = np.sqrt((xo - x) ** 2 + (yo - y) ** 2)    # preparation of the histogram Axes

#visualize
import matplotlib.lines as lines

text_xy = fig.text(x, y, "(x, y)", c="r", zorder=1,
                   ha="center", va="center", fontsize="large")
text_xyo = fig.text(xo, yo, "(xo, yo)", c="r", zorder=1,
                    ha="center", va="center", fontsize="large")
line_hw = fig.add_artist(lines.Line2D([x, xo], [y, yo], c="orange", ls=":"))
text_hw = fig.text((x+xo)/2, (y+yo)/2, "h = w", c="r", 
                   ha="center", va="center", fontsize="large")
display(fig)

matplotlib: mpl_toolkits.axisartist.floating_axes

이 치수를 바탕으로 histogram이 놓일 Axes ax2를 생성합니다.
불규칙한 Axes를 만들 때 floating_axes를 사용합니다.

Based on this dimensions Axes for histogram, ax2 is generated.
floating_axes is used for irregular Axes.

# 3. Create the secondary axis
#    Warning: it must be squared, ie. xmax-xmin = ymax-ymin
#    It is possible to have non squared axis, but it would complicate things.
xmin, xmax = -16, 16
ymin, ymax = 0, xmax - xmin
transform = Affine2D().rotate_deg(-rotation)
helper = floating_axes.GridHelperCurveLinear(transform, (xmin, xmax, ymin, ymax))
ax2 = floating_axes.FloatingSubplot(fig, 111, grid_helper=helper, zorder=0)

# 4. We know the size of the axis we want but it is rotated. When we specify
#    the size and position, it related to the non-rotate axis and we thus need
#    to compute the bounding box. To do that, we rotate the four coordinates
#    from which we deduce the bounding box coordinates.
transform = Affine2D().rotate_deg(-rotation)
R = transform.transform(                            # outline of the histogram Axes
    [
        (x - w0 / 2, y - h0 / 2),
        (x + w0 / 2, y - h0 / 2),
        (x - w0 / 2, y + h0 / 2),
        (x + w0 / 2, y + h0 / 2),
    ]
)
w1 = R[:, 0].max() - R[:, 0].min()                  # width of the histogram Axes
h1 = R[:, 1].max() - R[:, 1].min()                  # height of the histogram Axes
ax2.set_position((x - w1 / 2, y - h1 / 2, w1, h1))
fig.add_subplot(ax2)                                # Adding histogram Axes

# visualization
for text in fig.texts:
    text.set_zorder(1)
ax2.set_zorder(3)

display(fig)

2.7. Add Rotated histogram and axis settings

이제 공간이 마련되었으니 histogram을 붙일 차례입니다.
회전된 공간에 데이터를 넣으려면 ax2.get_aux_axes(transform)을 사용해 별도의 Axes를 만들어야 합니다.
PC2 방향의 histogram을 얻기 위해 numpy.histogram(-Z @ PC1)을 사용했습니다. PC1에 수직 방향의 빈도를 구하라는 뜻입니다.

Since the new Axes is prepared, it is time to add the histogram.
To visualize data on a rotated space, another Axes should be generated using ax2.get_aux_axes(transform).
numpy.histogram(-Z1 @ PC1) is used to obtain histogram along the PC2 axis. The code means return data frequences orthogonal to PC1 axis.

# 5. Some decoration the secondary axis
ax2.axis["left"].major_ticklabels.set_visible(False)
ax2.axis["bottom"].major_ticklabels.set_visible(False)
ax2.axis["bottom"].major_ticks.set_tick_out(True)
ax2.axis["left"].set_visible(False)
ax2.axis["right"].set_visible(False)
ax2.axis["top"].set_visible(False)
ax2.set_xticks([0, 1])
ax2.patch.set_visible(False)

# 6. Display the histogram, taking care of the extents of the X axis
counts, bins = np.histogram(-Z1 @ PC1, bins=12)     # histogram of -Z1 orthogonal to PC1 direction with 12 bins
X0 = (bins - bins[0]) / (bins[-1] - bins[0])        # X0 : normalized bins range [0, 1]
X1 = xmin + (xmax - xmin) * X0                      # X1 : stretched bins range [xmin, xmax] = [-16, 16]
Y = np.array(counts)

# This auxiliary axis is necessary to draw stuff (no real idea why)
ax2_aux = ax2.get_aux_axes(transform)

# Plotting histogram 
ax2_aux.hist(X1[:-1], X1, weights=Y, facecolor="C0", edgecolor="white", linewidth=0.25)

text_xy.remove()
text_xyo.remove()
text_hw.remove()
line_hw.remove()

display(fig)

마지막으로 히스토그램 숫자를 기입합니다.

At last, put numbers on top of the bars of histogram.

# 7. Adding some labels
dx, dy = (X1[1] - X1[0]) / 2, 0.75
for x, y in zip(X1, Y):
    ax2_aux.text(
        x + dx,
        y + dy,
        "%d" % y,
        ha="center",
        va="center",
        size=8,
        rotation=-rotation,
    )
    
display(fig)

3. Conclusion

코드를 여러 부분으로 나누어 그림이 만들어지는 과정을 코드와 함께 살펴봤습니다.
리뷰 과정에서 원본을 일부 수정한 Pull Request가 적용되었습니다. 이 글은 수정본을 기반으로 작성되었습니다.
Matplotlib은 표현력이 매우 강한 라이브러리입니다.
여러분의 상상을 실현하는 데, Nicolas의 훌륭한 코드를 이해하는 데 도움이 되시길 바랍니다.

In this article we divide the source code into many parts, and looked through how the final figure was created.
A couple of changes were suggested as a form of Pull Request to correct small errors, and accepted. This article is written based on the revised version of mine.
Matplotlib is a library with a strong power of expression.
I hope this article to be an assistant to realize your imagination, as well as help you understand Nicolas's excellent code.

도움이 되셨나요? 카페인을 투입하시면 다음 포스팅으로 변환됩니다

PythonData Science

1. Rotated Histogram

2. Matplotlib coding

2.1. import libraries

2.2. Affine transformation

2.3. Principal Component Analysis (PCA)

2.4. scatter plot and PC1 axis

2.5. Bounding box along PCA axes

2.6. Add Axes for rotated histogram

2.7. Add Rotated histogram and axis settings

3. Conclusion