0
\$\begingroup\$

Can someone provide me an explanation of how the lookAt matrix works?

+----+----+----+----+
| Xx | Xy | Xz |  0 |  <- x axis
+----+----+----+----+
| Yx | Yy | Yz |  0 |  <- y axis
+----+----+----+----+
| Zx | Zy | Zz |  0 |  <- z axis
+----+----+----+----+
| Tx | Ty | Tz |  1 |  <- camera position
+----+----+----+----+

From my current understanding, using the diagram above from this tutorial, the zAxis is created by subtracting the camera position from the target position. You then create two more perpendicular vectors: xAxis = zAxis x up and yAxis = xAxis x zAxis.

However, I don't understand how these exact operations result in the camera facing the target. Why are these exact calculations needed? Why could it not have been zAxis = camPosition - targetPosition? Or xAxis = zAxis x down instead of zAxis x up? Or xAxis = targetPosition - camPosition instead of the zAxis?

\$\endgroup\$
1

2 Answers

1
\$\begingroup\$

The positive zAxis actually points away from the target. Thus what I wrote previously was incorrect; it should instead be zAxis = norm(camPosition - targetPosition). This makes the positive zAxis point out of the screen from the camera's perspective, and the negative zAxis point into the screen towards the target. The target can then be rendered because it lies on the camera's negative zAxis (remember that, in this convention, the camera only renders vertices on the negative zAxis).

That is why the zAxis for the lookAt matrix is actually zAxis = norm(camPosition - targetPosition).
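To make this concrete, here is a minimal Python sketch of that construction. It assumes a right-handed convention in which the camera looks down its own negative z axis, and the helper names (`camera_basis`, `to_view_space`) are made up for illustration rather than taken from any library. It builds the three axes and then expresses the target in camera coordinates.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def camera_basis(cam_position, target_position, up=(0.0, 1.0, 0.0)):
    # Positive z points from the target back to the camera (out of the screen).
    z_axis = norm(sub(cam_position, target_position))
    # Assumes `up` is not parallel to z_axis, otherwise the cross product is zero.
    x_axis = norm(cross(up, z_axis))
    # z_axis and x_axis are unit length and perpendicular, so y_axis is too.
    y_axis = cross(z_axis, x_axis)
    return x_axis, y_axis, z_axis

def to_view_space(point, cam_position, basis):
    # Project the point, taken relative to the camera, onto each camera axis.
    rel = sub(point, cam_position)
    return tuple(dot(axis, rel) for axis in basis)
```

For a camera at (3, 4, 5) looking at the origin, `to_view_space` puts the target at x = 0, y = 0 and a negative z equal to minus the camera-to-target distance, which is exactly the "target sits on the negative zAxis" condition.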

\$\endgroup\$
0
\$\begingroup\$

You are creating an orthogonal coordinate system with the operations described that moves objects from world space into view space.

The vector target - pos gives you the Z axis; it points from the eye in the direction you are looking. The cross product Z x Up then creates the X vector, which points left or right depending on the handedness of the coordinate system. Finally, the Y vector is created by X x Z, giving you your orthogonal basis vectors (normalize each one to unit length). These are then offset by the camera's position.

The reason we typically do the math in that order is convention: Z typically points into the screen in a left-handed system and out of the screen in a right-handed system. X then matches the horizontal of the screen and Y the vertical, leaving Z to represent depth.
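As a rough illustration of why the order and the choice of up matter, the Python sketch below evaluates the cross products for one assumed setup: a camera at (0, 0, 5) looking at the origin, so that target - pos points along -Z. The variable names are illustrative only. Swapping up for down, or swapping the operand order, does not break the math; it simply mirrors the X axis.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

# Assumed setup: camera at (0, 0, 5) looking at the origin.
forward = (0.0, 0.0, -1.0)   # normalized target - pos
up = (0.0, 1.0, 0.0)
down = (0.0, -1.0, 0.0)

x_with_up = cross(forward, up)       # points along +X (screen right)
x_with_down = cross(forward, down)   # points along -X (mirrored)
x_swapped = cross(up, forward)       # also -X: a x b == -(b x a)
y_axis = cross(x_with_up, forward)   # points along +Y (screen up)
```

So using down, or reversing the operands, still gives an orthogonal basis, but the image ends up mirrored horizontally relative to the usual convention.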

\$\endgroup\$
1
  • \$\begingroup\$ "that moves objects from world space into view space": this is technically not correct. The matrix created from the operations is a camToWorld matrix which, given an object's position relative to the created camera coordinate system, gives its position in world coordinates. To bring an object from world to camera coordinates you need the inverse of that matrix, i.e. worldToCam. \$\endgroup\$
    – PentaKon
    Commented Dec 6, 2021 at 11:50
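The distinction raised in this comment can be sketched in Python. The sketch assumes the row layout from the question's diagram (axes in the first three rows, camera position in the last row, points treated as row vectors) and an orthonormal set of axes; the function names are hypothetical. For such a matrix, the inverse is the transposed rotation plus a translation of minus the camera position projected onto each axis.

```python
def dot3(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cam_to_world(x_axis, y_axis, z_axis, position):
    # Rows are the camera axes and the camera position, as in the diagram.
    return [list(x_axis) + [0.0],
            list(y_axis) + [0.0],
            list(z_axis) + [0.0],
            list(position) + [1.0]]

def world_to_cam(x_axis, y_axis, z_axis, position):
    # Inverse of cam_to_world, valid only when the axes are orthonormal.
    return [
        [x_axis[0], y_axis[0], z_axis[0], 0.0],
        [x_axis[1], y_axis[1], z_axis[1], 0.0],
        [x_axis[2], y_axis[2], z_axis[2], 0.0],
        [-dot3(position, x_axis), -dot3(position, y_axis),
         -dot3(position, z_axis), 1.0],
    ]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform_point(point, m):
    # Row vector (x, y, z, 1) times a 4x4 matrix; returns the xyz part.
    row = (point[0], point[1], point[2], 1.0)
    return tuple(sum(row[k] * m[k][j] for k in range(4)) for j in range(3))
```

With these definitions, `transform_point((0, 0, 0), cam_to_world(...))` returns the camera's world position, while `world_to_cam(...)` is the matrix you would apply to world-space vertices before projection.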
