This talk was recorded at NDC London in London, England.

Many real-world problems are inherently multimodal, from the communicative modalities humans use, such as spoken language and gestures, to the force, proprioception, and visual sensors ubiquitous in robotics. For machine learning models to address these problems, interact more naturally and holistically with the world around them, and ultimately become more general and powerful reasoning engines, they need to understand data across all of its corresponding image, video, text, audio, and tactile representations.

In this talk, Zain Hasan will discuss how open-source multimodal models (such as https://github.com/facebookresearch/ImageBind) that can see, hear, read, and even feel data can be used to perform cross-modal search (searching audio with images, videos with text, and so on) at the billion-object scale with the help of open-source vector databases. He will also demonstrate, with live code demos and large-scale datasets, how performing this cross-modal retrieval in real time can help users add natural search interfaces to their apps. The talk will cover how the usage of multimodal embedding models was scaled in production and how you can add cross-modal search to your own apps.
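The core idea the talk builds on can be sketched in a few lines: every modality is encoded into the same vector space, and cross-modal search is then just nearest-neighbour retrieval over those vectors. The snippet below is a minimal illustration, not the speaker's code; the embed() function is a hypothetical stand-in for a multimodal encoder such as ImageBind (it returns random unit vectors so the sketch runs without model weights), and a real deployment would store the vectors in a vector database rather than an in-memory NumPy array.

```python
import numpy as np

# Minimal sketch of cross-modal retrieval in a shared embedding space.
D = 1024
rng = np.random.default_rng(0)

def embed(item: str, modality: str) -> np.ndarray:
    """Placeholder: a real system would run a multimodal model on `item` here."""
    vec = rng.standard_normal(D)
    return vec / np.linalg.norm(vec)  # normalize so dot product = cosine similarity

# Index a small collection of images by their normalized embeddings.
# In production these vectors would live in a vector database (e.g. an ANN index).
image_paths = ["dog.jpg", "beach.jpg", "guitar.jpg"]
index = np.stack([embed(p, "vision") for p in image_paths])

# Cross-modal query: embed a *text* query into the same space and
# rank the images by cosine similarity.
query = embed("a dog playing on the beach", "text")
scores = index @ query
for path, score in sorted(zip(image_paths, scores), key=lambda s: -s[1]):
    print(f"{path}: {score:.3f}")
```

With a shared embedding space, the same pattern works in any direction: audio queries against video collections, images against text, and so on, since everything is compared as vectors in the same space.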