Multispectral image alignment plays a crucial role in exploiting the complementary information between images of different spectra. Homography-based image alignment can be a practical solution given the tradeoff between runtime and accuracy. Existing methods, however, either struggle with multispectral images because of the additional spectral gap or require expensive human labels to train models. To address these problems, this paper presents a comprehensive study of unsupervised multispectral homography estimation. We propose a curriculum data augmentation that helps models learn spectrum-agnostic representations by providing diverse input pairs. We also propose a phase congruency loss that explicitly measures the reconstruction error between images based on low-level structural information in the frequency domain. To encourage multispectral alignment research, we introduce a novel FLIR correspondence dataset with manually labeled local correspondences between multispectral images. Our model achieves state-of-the-art alignment performance on the proposed FLIR correspondence dataset among supervised and unsupervised methods while running at 151 FPS. Furthermore, our model generalizes well to the M3FD dataset without fine-tuning.