Welcome To Crax.Pro Forum!

Check our new Marketplace at Crax.Shop

   Login! SignUp Now!
A vision transformer (ViT) is a transformer designed for computer vision. A ViT breaks down an input image into a series of patches (rather than breaking up text into tokens), serialises each patch into a vector, and maps it to a smaller dimension with a single matrix multiplication. These vector embeddings are then processed by a transformer encoder as if they were token embeddings.
ViT has found applications in image recognition, image segmentation, and autonomous driving.

View More On Wikipedia.org
Top Bottom