Visual Search

There’s a very good reason why the cave paintings at Lascaux don’t have captions: apart from the fact that they were made about 11,000 years before the first writing systems, they’re an obvious reminder that we’re a visual species. We perceive and recognize pictures almost immediately, whereas reading and understanding written or typed text usually takes significantly more time. For the longest time, this truth has been at odds with the largely text-based Internet; however, advances over the past decade in camera quality, mobile connectivity, processing power, and machine learning are changing how we can use the Internet to better meet our needs. Seeing is believing, so let’s take a look at today’s term, visual search.

As a term, visual search involves the use of an image to search for related results via a search engine – much like keywords, only visual. Breaking the term down into its components, visual comes originally from the Latin videre, meaning ‘to see’, and first appears in English in Philemon Holland’s 1603 translation of Plutarch’s Morals, which reads: “As the one [sc. the sun] kindles, bringeth forth and stirreth up the visual power and virtue of the sense.” Search, being in this sense a shortened form of “search engine”, comes from the Old French cerchier; in this sense it first appears in a 1984 issue of American Libraries Magazine, which states: “Notice the trend toward associative hardware search engines, e.g., GESCAN 2, a General Electric computer built specifically for searching rather than for general purposes.” The first applicable mention of the full term (that can be found thus far) is in a paper by Christian Schulte presented at the 1997 International Conference on Principles and Practice of Constraint Programming, entitled Programming Constraint Inference Engines, where he states that: “Using computation spaces, the paper covers several inference engines ranging from standard search strategies to techniques new to constraint programming, including limited discrepancy search, visual search, and saturation.”

Unlike traditional image search, which uses a text-based query to find images, visual search is based solely on images. In other words, rather than relying on text metadata to define what an image is, visual search rests on the idea of teaching computers, via machine learning, to visually understand what they are “seeing”. Once the computer begins to “understand” both what it is seeing and the context in which it appears, through training sets, scaling, and cross-referencing, it can then begin to work out how and where images are best used.
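
One common way to make this concrete (an assumption here, since no specific method is named above) is to represent each image as a numeric feature vector and return the stored images whose vectors lie closest to the query image’s vector. The sketch below uses hand-made three-number vectors in place of the output of a trained model; the file names, the INDEX dictionary, and the visual_search function are all hypothetical and exist only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def visual_search(query_vector, index, top_k=2):
    """Rank indexed images by similarity to the query and keep the best top_k."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical feature vectors. In a real system these would come from a
# trained vision model, not be written by hand.
INDEX = {
    "red_sneaker.jpg": [0.9, 0.1, 0.0],
    "red_boot.jpg": [0.8, 0.3, 0.1],
    "blue_teapot.jpg": [0.0, 0.2, 0.9],
}

# Stand-in for the features of a photo the user has just taken.
query = [0.85, 0.15, 0.05]
results = visual_search(query, INDEX)

for name, score in results:
    print(f"{name}: {score:.3f}")
```

Note that no text metadata is consulted at any point: the two shoe images rank above the teapot purely because their vectors point in nearly the same direction as the query’s.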

Though we are just beginning to see what is possible with visual search, some companies have already made significant leaps into the field. Unsurprisingly, Google (Google Lens, launched in 2017) is at the forefront, applying the technology to its Photos app, Google Search app, and Google Assistant, allowing users to classify photos, get information and reviews on a place from a simple photo, and translate images of text in real time. Pinterest, which touts 600 million visual searches every month, allows brands to target over 5,000 categories through visual search and expects to see USD 1 billion in visual search ad revenues this year. Finally, Amazon’s release of products such as the Echo Show and Echo Spot, the ability to visually search for its products from Snapchat and Instagram, and the ability to shop in its app via Camera Search functionality all demonstrate a substantial investment in visual search.

When it comes to visual search, to quote an overused phrase, “I’ve seen the future, and the future is now.”
