Semantic Image Representation for Visual Recognition. Nikhil Rasiwasia, Nuno Vasconcelos

Semantic Image Representation for Visual Recognition
Nikhil Rasiwasia, Nuno Vasconcelos
Statistical Visual Computing Laboratory
University of California, San Diego
Thesis Defense

I'll pause for a few moments so that you all can finish reading this. (© Bill Watterson)

Visual Recognition
Human brains can perform recognition with astonishing speed and accuracy [Thorpe'96].
Can we make computers perform the recognition task with astonishing speed and accuracy? :)
Several applications.

Why?
Internet in numbers:
5,000,000,000 – Photos hosted by Flickr (Sept. 2010).
3000+ – Photos uploaded per minute to Flickr.
3,000,000,000 – Photos uploaded per month to Facebook.
20,000,000 – Videos uploaded to Facebook per month.
2,000,000,000 – Videos watched per day on YouTube.
35 – Hours of video uploaded to YouTube every minute.
Source: http://www.cbsnews.com/8301-501465-162-20028418-501465.html
Several other sources of visual content: printed media, surveillance, medical imaging, movies, robots, other automated machines, etc.
Manual processing of the visual content is prohibitive.

Challenges
Multiple viewpoints, occlusions, clutter, etc.
Multiple illuminations
Semantic gap
Multiple interpretations
Role of context, etc.

Outline
Semantic Image Representation
  Appearance-Based Image Representation
  Semantic Multinomial [Contribution]
Benefits for Visual Recognition
  Abstraction: Bridging the Semantic Gap (QBSE) [Contribution]
  Sensory Integration: Cross-modal Retrieval [Contribution]
  Context: Holistic Context Models [Contribution]
Connections to the literature
  Topic Models: Latent Dirichlet Allocation
  Text vs Images
  Importance of Supervision: Topic-supervised Latent Dirichlet Allocation (ts LDA) [Contribution]

Current Approach
Identify classes of interest.
Design a set of "appearance"-based features: pixel intensity, color, edges, texture, frequency spectrum, etc.
Postulate an architecture for their recognition: generative models, discriminative models, etc.
Learn optimal recognizers from training data: Expectation Maximization, convex optimization, variational learning, Markov chain Monte Carlo, etc.
Reasonably successful in addressing multiple viewpoints, clutter, and occlusions, and, to an extent, illumination.
But: semantic gap, multiple interpretations, role of context.

Image Representation
Bag-of-features: localized patch-based descriptors; spatial relations between features are discarded.
Image: $\mathcal{I} = \{x_1, \ldots, x_N\}$, where $x_i \in \mathcal{X}$ are N feature vectors defined on the space $\mathcal{X}$ of low-level appearance features.
Several feature spaces $\mathcal{X}$ have been proposed in the literature.
Assume each image is a class determined by $Y$ and induces a probability $P_{X|Y}(x|y)$ on $\mathcal{X}$.

Bag-of-features: Mixtures Approach
[Figure: image → feature transformation into the appearance feature space → bag of features → Gaussian Mixture Model fit by Expectation Maximization]
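In the mixtures approach each class (here, each image) gets a Gaussian mixture fit to its bag of features with Expectation Maximization. Below is a minimal pure-Python sketch, assuming diagonal covariances, a deterministic farthest-point initialization, and a variance floor; these choices and the helper names (`fit_gmm_em`, `bag_log_likelihood`) are illustrative assumptions, not the thesis implementation. It also scores a bag under the fitted model, treating its features as independent draws.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log-density of vector x under a Gaussian with diagonal covariance `var`."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def log_sum_exp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def fit_gmm_em(bag, k, n_iter=50, var_floor=1e-3):
    """Fit a k-component diagonal-covariance GMM to a bag of feature vectors with EM."""
    d = len(bag[0])
    # Farthest-point initialization (deterministic; an illustrative choice).
    means = [list(bag[0])]
    while len(means) < k:
        far = max(bag, key=lambda x: min(sum((a - b) ** 2 for a, b in zip(x, m)) for m in means))
        means.append(list(far))
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibilities resp[n][j] = P(component j | x_n).
        resp = []
        for x in bag:
            logs = [math.log(weights[j]) + log_gauss_diag(x, means[j], variances[j]) for j in range(k)]
            norm = log_sum_exp(logs)
            resp.append([math.exp(lv - norm) for lv in logs])
        # M-step: re-estimate mixing weights, means and (floored) diagonal variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # avoid division by zero for empty components
            weights[j] = nj / len(bag)
            means[j] = [sum(r[j] * x[t] for r, x in zip(resp, bag)) / nj for t in range(d)]
            variances[j] = [
                max(var_floor, sum(r[j] * (x[t] - means[j][t]) ** 2 for r, x in zip(resp, bag)) / nj)
                for t in range(d)
            ]
    return weights, means, variances

def bag_log_likelihood(bag, gmm):
    """log P(I | y): sum over features (assumed i.i.d.) of the log mixture density."""
    weights, means, variances = gmm
    return sum(
        log_sum_exp([math.log(w) + log_gauss_diag(x, m, v) for w, m, v in zip(weights, means, variances)])
        for x in bag
    )
```

Usage: `gmm = fit_gmm_em(bag, k=8)` fits a per-image model, and `bag_log_likelihood(other_bag, gmm)` measures how well another bag matches it.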

Bag-of-words
Quantize the feature space into unique bins, usually by K-means clustering.
Each bin, represented by its centroid, is called a visual word.
A collection of visual words forms a codebook.
Each feature vector is mapped to its closest visual word.
An image is represented as a collection of visual words, or equivalently as a frequency count over the visual-word codebook.
E.g., image retrieval.

Pause for a moment – The Human Perspective
What is this? An image of: buildings, street, cars, sky, flowers; a city scene.
Some concepts are more prominent than others. (The image is from the 'Street' class!)
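The bag-of-words pipeline described in the Bag-of-words slide (K-means codebook, nearest-visual-word quantization, frequency count) can be sketched in plain Python. The function names are hypothetical, and the centroids are initialized with the first K training features only to keep the example deterministic; practical systems use better initializations and far larger codebooks.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=20):
    """Lloyd's K-means; centroids start at the first k points (illustrative choice)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def quantize(features, codebook):
    """Map each feature vector to the index of its closest visual word."""
    return [min(range(len(codebook)), key=lambda c: sq_dist(f, codebook[c])) for f in features]

def bow_histogram(features, codebook):
    """Frequency count of visual words for one image."""
    counts = [0] * len(codebook)
    for word in quantize(features, codebook):
        counts[word] += 1
    return counts
```

The codebook is learned once from features pooled over training images; each image is then reduced to one fixed-length histogram, which is what makes retrieval by histogram comparison possible.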

Human understanding of images suggests that they are “visual representations” of certain “meaningful” semantic concepts. There can be several concepts represented by an image.
But it is practically impossible to enlist all the possible concepts represented, so define a ‘vocabulary’ of concepts and assign weights to the concepts based on their prominence in the image.

An Image – An Intuition
{buildings, street, sky, clouds, tree, cars, people, window, footpath, flowers, poles, wires, tires, ...}
Semantic gap: this has buildings and not forest.
Multiple semantic interpretations: buildings, inside city.
Context: inside city, street, highway, and buildings co-occur.

Semantic Image Representation
Builds upon the bag-of-features representation.
Given a vocabulary of concepts $\mathcal{L} = \{w_1, \ldots, w_L\}$, images are represented as vectors of concept counts $\mathbf{c} = (c_1, \ldots, c_L)$, where $c_i$ is the number of low-level features drawn from the $i$th concept.
The count vector for the $y$th image is drawn from a multinomial with parameters $\boldsymbol{\pi}^y = (\pi^y_1, \ldots, \pi^y_L)$.
The probability vector $\boldsymbol{\pi}^y$ is denoted the Semantic Multinomial (SMN).
The SMN can be seen as a feature transformation from $\mathcal{X}^N$ to the $L$-dimensional probability simplex $\mathcal{S}$, denoted the Semantic Space.

Semantic Multinomial [figure]
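If every low-level feature of an image came labeled with the concept that generated it, the count vector would be directly observable, with $c_i$ the number of features from concept $i$ and $N = \sum_i c_i$, and the maximum-likelihood estimate of the SMN would be $\pi_i = c_i / N$. The sketch below assumes such hypothetical per-feature concept labels; in the actual system the concepts are not observed, and the SMN is obtained from the semantic labeling system's posterior probabilities instead.

```python
from collections import Counter

def concept_counts(feature_concepts, vocabulary):
    """Count vector c: c_i = number of features drawn from the i-th concept."""
    tally = Counter(feature_concepts)
    return [tally[w] for w in vocabulary]  # Counter returns 0 for absent concepts

def semantic_multinomial_ml(counts):
    """Maximum-likelihood multinomial parameters: pi_i = c_i / N (a point on the simplex)."""
    n = sum(counts)
    return [c / n for c in counts]
```

For a four-concept vocabulary `["buildings", "street", "sky", "tree"]` and feature labels `["street", "buildings", "street", "sky"]`, the counts are `[1, 2, 1, 0]` and the SMN is `[0.25, 0.5, 0.25, 0.0]`.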

Semantic Labeling System
[Figure: appearance-based class model for concept $w_i$ = street: a GMM learned by efficient hierarchical estimation]
“Formulating Semantic Image Annotation as a Supervised Learning Problem” [G. Carneiro, IEEE Trans. PAMI, 2007]

Semantic Labeling System
[Figure: image → likelihood under the various appearance-based concept models → posterior probabilities over the concepts]

Semantic Image Representation
[Figure: image → semantic labeling system → semantic multinomial, a point in the semantic space]
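The labeling system turns per-concept likelihoods into posterior probabilities, which form the SMN. A minimal sketch via Bayes' rule, assuming the image log-likelihoods $\log P(\mathcal{I} \mid w_i)$ have already been computed under each appearance-based concept model, and defaulting to a uniform concept prior (an assumption). The normalization uses log-sum-exp so that very negative log-likelihoods do not underflow.

```python
import math

def posterior_smn(log_likelihoods, priors=None):
    """Posterior P(w | I) over concepts from log P(I | w) and prior P(w)."""
    num_concepts = len(log_likelihoods)
    if priors is None:
        priors = [1.0 / num_concepts] * num_concepts  # uniform prior (assumption)
    logs = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    top = max(logs)  # subtract the max before exponentiating (log-sum-exp trick)
    unnorm = [math.exp(lv - top) for lv in logs]
    z = sum(unnorm)
    return [u / z for u in unnorm]
```

Because an image log-likelihood sums over many features, differences of tens of nats between concepts are common, and the resulting posterior can be nearly one-hot; this is one motivation for the "soft" decisions and regularization discussed under robust estimation of the SMN.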

Semantic Multinomial [figure slides]

Was alone, not anymore!
Learning visual attributes, by Ferrari, V., Zisserman, A. (NIPS 2007)
Describing objects by their attributes, by Farhadi, A., Endres, I., Hoiem, D., Forsyth, D. (CVPR 2009)
Learning to detect unseen object classes by between-class attribute transfer, by Lampert, C. H., Nickisch, H., Harmeling, S. (CVPR 2009)
Joint learning of visual attributes, object classes and visual saliency, by Wang, G., Forsyth, D. A. (ICCV 2009)
Attribute-centric recognition for cross-category generalization, by Farhadi, A., Endres, I., Hoiem, D. (CVPR 2010)
A discriminative latent model of object classes and attributes, by Wang, Y., Mori, G. (ECCV 2010)
Recognizing human actions by attributes, by Liu, J., Kuipers, B., Savarese, S. (CVPR 2011)
Interactively building a discriminative vocabulary of nameable attributes, by Parikh, D., Grauman, K. (CVPR 2011)
Sharing features between objects and their attributes, by Hwang, S. J., Sha, F., Grauman, K. (CVPR 2011)

Outline
Semantic Image Representation
  Appearance-Based Image Representation
  Semantic Multinomial [Contribution]
Benefits for Visual Recognition
  Abstraction: Bridging the Semantic Gap (QBSE) [Contribution]
  Sensory Integration: Cross-modal Retrieval [Contribution]
  Context: Holistic Context Models [Contribution]
Connections to the literature
  Topic Models: Latent Dirichlet Allocation
  Text vs Images
  Importance of Supervision: Topic-supervised Latent Dirichlet Allocation (ts LDA) [Contribution]

Abstraction: QBSE vs. QBVE
[Figure: query by visual example (QBVE) matches “whitish + darkish”; query by semantic example (QBSE) matches “train + railroad”, a higher level of abstraction]

QBVE vs. QBSE: Out of Vocabulary Generalization
[Figure: QBVE vs. QBSE retrieval, class “Commercial Construction”; top concepts of the semantic multinomials shown:]
SMN 1: People 0.09, Buildings 0.07, Street 0.07, Statue 0.05, Tables 0.04, Water 0.04, Restaurant 0.04
SMN 2: Buildings 0.06, People 0.06, Street 0.06, Statue 0.04, Tree 0.04, Boats 0.04, Water 0.03
SMN 3: People 0.08, Statue 0.07, Buildings 0.06, Tables 0.05, Street 0.05, Restaurant 0.04, House 0.03
SMN 4: People 0.12, Restaurant 0.07, Sky 0.06, Tables 0.06, Street 0.05, Buildings 0.05, Statue 0.05
SMN 5: People 0.1, Statue 0.08, Buildings 0.07, Tables 0.06, Street 0.06, Door 0.05, Restaurant 0.04

Robust Estimation of SMN
Regularization of the semantic multinomials using a conjugate prior: a Dirichlet distribution with parameter $\alpha$.
Semantic labeling systems should have “soft” decisions.

The Semantic Gain
Is the gain really due to the semantic structure of the semantic space?
Tested by building semantic spaces with no semantic structure: random image groupings.
With random groupings, performance is quite poor, indeed worse than QBVE.
There seems to be an intrinsic gain in relying on a space where the features are semantic.
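A sketch of SMN regularization followed by QBSE-style ranking. It assumes the posterior-mean estimate under a symmetric Dirichlet prior, $\pi_i = (c_i + \alpha) / (N + L\alpha)$, where $c_i$ counts features of concept $i$, $N = \sum_i c_i$ and $L$ is the vocabulary size, and it ranks database images by the Kullback-Leibler divergence from the query SMN. Both the estimator and the similarity function are assumptions for illustration; the thesis's exact choices may differ.

```python
import math

def dirichlet_smoothed_smn(counts, alpha):
    """Posterior-mean multinomial under a symmetric Dirichlet(alpha) prior.

    pi_i = (c_i + alpha) / (N + L * alpha); no entry is exactly zero when alpha > 0.
    """
    n, num_concepts = sum(counts), len(counts)
    return [(c + alpha) / (n + num_concepts * alpha) for c in counts]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi <= 0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log(pi / qi)
    return total

def qbse_rank(query_smn, database_smns):
    """Database indices sorted from most to least similar (smallest KL first)."""
    return sorted(range(len(database_smns)),
                  key=lambda i: kl_divergence(query_smn, database_smns[i]))
```

The smoothing matters for a divergence-based ranking: if a database SMN assigns zero probability to a concept present in the query, the KL divergence is infinite, which is one concrete reason to prefer soft, regularized multinomials.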

Outline
Semantic Image Representation
  Appearance-Based Image Representation
  Semantic Multinomial [Contribution]
Benefits for Visual Recognition
  Abstraction: Bridging the Semantic Gap (QBSE) [Contribution]
  Sensory Integration: Cross-modal Retrieval [Contribution]
  Context: Holistic Context Models [Contribution]
Connections to the literature
  Topic Models: Latent Dirichlet Allocation
  Text vs Images
  Importance of Supervision: Topic-supervised Latent Dirichlet Allocation (ts LDA) [Contribution]

Sensory Integration
Recognition systems that are transparent to different information modalities: text, images, music, video, etc.
Cross-modal retrieval: systems that operate across multiple modalities, e.g.:
cross-modal text queries, such as retrieval of images from photoblogs using text;
finding images to go along with a text article;
finding music to enhance videos and slide shows;
image positioning;
text summarization based on images;
and much more.

Cross-modal Retrieval
Current retrieval systems are predominantly uni-modal: the query and the retrieved results are from the same modality.
Cross-modal retrieval: given a query from modality A, retrieve results from modality B.
The query and retrieved items are not required to share a common modality.

Questions? (© Bill Watterson)

An Idea
Learn mappings $P_T: \mathcal{R}^T \to \mathcal{U}_T$ and $P_I: \mathcal{R}^I \to \mathcal{U}_I$ that map the different modalities into intermediate spaces $\mathcal{U}_T$ and $\mathcal{U}_I$ that have a natural and invertible correspondence $M: \mathcal{U}_T \to \mathcal{U}_I$.
Given a text query $T_q \in \mathcal{R}^T$, cross-modal retrieval reduces to finding the nearest neighbor of $M(P_T(T_q))$ in $\mathcal{U}_I$.
Similarly, for an image query $I_q \in \mathcal{R}^I$: find the nearest neighbor of $M^{-1}(P_I(I_q))$ in $\mathcal{U}_T$.
The task now is to design these mappings.

[Example text documents:]
Like most of the UK, the Manchester area mobilised extensively during World War II. For example, casting and machining expertise at Beyer, Peacock and Company’s locomotive works in Gorton was switched to bomb making; Dunlop’s rubber works in Chorlton-on-Medlock made barrage balloons;
Martin Luther King’s presence in Birmingham was not welcomed by all in the black community. A black attorney was quoted in “Time” magazine as saying, “The new administration should have been given a chance to confer with the various groups interested in change.
In 1920, at the age of 20, Coward starred in his own play, the light comedy “I’ll Leave It to You”. After a tryout in Manchester, it opened in London at the New Theatre (renamed the Noël Coward Theatre in 2006), his first full-length play in the West End. Thaxter, John. British Theatre Guide, 2009. Neville Cardus’s praise in “The Manchester Guardian”
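Once the mappings exist, retrieval is a nearest-neighbor search in the target modality's intermediate space. The sketch below assumes the learned mappings from each modality into its intermediate space have already been applied, takes the correspondence between the two spaces (or its inverse, for image queries) as a function argument that defaults to the identity, as when both modalities are mapped into one shared space, and ranks by cosine similarity. The function names and the similarity measure are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def cross_modal_retrieve(query_proj: Vector,
                         target_projs: List[Vector],
                         correspondence: Callable[[Vector], Vector] = lambda u: u,
                         top_k: int = 5) -> List[int]:
    """Carry the projected query into the target space, then return the indices
    of the top_k most similar target items (highest cosine similarity first)."""
    q = correspondence(query_proj)
    order = sorted(range(len(target_projs)), key=lambda i: -cosine(q, target_projs[i]))
    return order[:top_k]
```

For a text query projected to `[0.7, 0.2, 0.1]` and image projections `[[0.1, 0.1, 0.8], [0.6, 0.3, 0.1], [0.3, 0.4, 0.3]]`, `cross_modal_retrieve(query, images, top_k=2)` returns `[1, 2]`.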

Perry, Bill News Director

Perry, Bill is from Thatcher, United States, belongs to KWRQ-FM, and is related to this particular journal. Perry, Bill deals with subjects such as International News, Local News, and Regional News.

Journal Ratings by Holy Names College

This particular journal was reviewed and rated by Holy Names College (short form: US), which gave it an Excellent rating.