publications
2024
- Evaluating Multiview Object Consistency in Humans and Image Models. Tyler Bonnen, Stephanie Fu, Yutong Bai, Thomas O’Connell, Yoni Friedman, Nancy Kanwisher, Joshua B. Tenenbaum, and Alexei A. Efros. 2024.
- A Vision Check-up for Language Models. Pratyusha Sharma, Tamar Rott Shaham, Manel Baradad, Stephanie Fu, Adrian Rodriguez-Munoz, Shivam Duggal, Phillip Isola, and Antonio Torralba. 2024.
- OpenStreetView-5M: The Many Roads to Global Visual Geolocation. Guillaume Astruc, Nicolas Dufour, Ioannis Siglidis, Constantin Aronssohn, Nacim Bouia, Stephanie Fu, Romain Loiseau, Van Nguyen Nguyen, Charles Raude, Elliot Vincent, Lintao XU, Hongyu Zhou, and Loic Landrieu. 2024.
- FeatUp: A Model-Agnostic Framework for Features at Any Resolution. Stephanie Fu, Mark Hamilton, Laura Brandt, Axel Feldman, Zhoutong Zhang, and William T. Freeman. 2024.
2023
- DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data. Stephanie Fu*, Netanel Tamir*, Shobhita Sundaram*, Lucy Chai, Richard Zhang, Tali Dekel, and Phillip Isola. arXiv:2306.09344, 2023.
Current perceptual similarity metrics operate at the level of pixels and patches. These metrics compare images in terms of their low-level colors and textures, but fail to capture mid-level similarities and differences in image layout, object pose, and semantic content. In this paper, we develop a perceptual metric that assesses images holistically. Our first step is to collect a new dataset of human similarity judgments over image pairs that are alike in diverse ways. Critical to this dataset is that judgments are nearly automatic and shared by all observers. To achieve this we use recent text-to-image models to create synthetic pairs that are perturbed along various dimensions. We observe that popular perceptual metrics fall short of explaining our new data, and we introduce a new metric, DreamSim, tuned to better align with human perception. We analyze how our metric is affected by different visual attributes, and find that it focuses heavily on foreground objects and semantic content while also being sensitive to color and layout. Notably, despite being trained on synthetic data, our metric generalizes to real images, giving strong results on retrieval and reconstruction tasks. Furthermore, our metric outperforms both prior learned metrics and recent large vision models on these tasks.
2021
- Axiomatic Explanations for Visual Search, Retrieval, and Similarity Learning. Mark Hamilton, Scott Lundberg, Lei Zhang, Stephanie Fu, and William T. Freeman. International Conference on Learning Representations (ICLR), 2021.
Visual search, recommendation, and contrastive similarity learning power technologies that impact billions of users worldwide. Modern model architectures can be complex and difficult to interpret, and there are several competing techniques one can use to explain a search engine’s behavior. We show that the theory of fair credit assignment provides a unique axiomatic solution that generalizes several existing recommendation- and metric-explainability techniques in the literature. Using this formalism, we show when existing approaches violate "fairness" and derive methods that sidestep these shortcomings and naturally handle counterfactual information. More specifically, we show existing approaches implicitly approximate second-order Shapley-Taylor indices and extend CAM, GradCAM, LIME, SHAP, SBSM, and other methods to search engines. These extensions can extract pairwise correspondences between images from trained opaque-box models. We also introduce a fast kernel-based method for estimating Shapley-Taylor indices that requires orders of magnitude fewer function evaluations to converge. Finally, we show that these game-theoretic measures yield more consistent explanations for image similarity architectures.
- MosAIc: Finding Artistic Connections across Culture with Conditional Image Retrieval. Mark Hamilton, Stephanie Fu, Mindren Lu, Johnny Bui, Darius Bopp, Zhenbang Chen, Felix Tran, Margaret Wang, Marina Rogers, Lei Zhang, Chris Hoder, and William T. Freeman. In Proceedings of the NeurIPS 2020 Competition and Demonstration Track, 06–12 Dec 2021.
We introduce MosAIc, an interactive web app that allows users to find pairs of semantically related artworks that span different cultures, media, and millennia. To create this application, we introduce Conditional Image Retrieval (CIR) which combines visual similarity search with user supplied filters or “conditions”. This technique allows one to find pairs of similar images that span distinct subsets of the image corpus. We provide a generic way to adapt existing image retrieval data-structures to this new domain and provide theoretical bounds on our approach’s efficiency. To quantify the performance of CIR systems, we introduce new datasets for evaluating CIR methods and show that CIR performs non-parametric style transfer. Finally, we demonstrate that our CIR data-structures can identify “blind spots” in Generative Adversarial Networks (GAN) where they fail to properly model the true data distribution.
- Digital electronics in fibres enable fabric-based machine-learning inference. Gabriel Loke, Tural Khudiyev, Brian Wang, Stephanie Fu, Syamantak Payra, Yorai Shaoul, Johnny Fung, Ioannis Chatziveroglou, Pin-Wen Chou, Itamar Chinn, Wei Yan, Anna Gitelson-Kahn, John Joannopoulos, and Yoel Fink. Nature Communications, Jun 2021.
Digital devices are the essential building blocks of any modern electronic system. Fibres containing digital devices could enable fabrics with digital system capabilities for applications in physiological monitoring, human-computer interfaces, and on-body machine-learning. Here, a scalable preform-to-fibre approach is used to produce tens of metres of flexible fibre containing hundreds of interspersed, digital temperature sensors and memory devices with a memory density of ~7.6 × 10^5 bits per metre. The entire ensemble of devices are individually addressable and independently operated through a single connection at the fibre edge, overcoming the perennial single-fibre single-device limitation and increasing system reliability. The digital fibre, when incorporated within a shirt, collects and stores body temperature data over multiple days, and enables real-time inference of wearer activity with an accuracy of 96% through a trained neural network with 1650 neuronal connections stored within the fibre. The ability to realise digital devices within a fibre strand which can not only measure and store physiological parameters, but also harbour the neural networks required to infer sensory data, presents intriguing opportunities for worn fabrics that sense, memorise, learn, and infer situational context.