
In our work we are interested in locating firearms, particularly in scenes where people are holding them. Searching for a particular object in a scene can be extremely difficult, as one has to consider all the views in which the object can appear. Models that integrate bottom-up and top-down attention are natural in human vision, but are difficult to define for computer vision applications. In generic scene analysis, the two processes are integrated for a faster visual search. Bottom-up processing is a primitive function of the human visual system and responds to stimuli such as intensity, color, and orientation. In the top-down process, attention detects salient areas through understanding and recognition mechanisms. The attention process selects visual information on the basis of both saliency in the image (a bottom-up, task-independent process) and prior knowledge about the context and the objects in the scene (a top-down, task-dependent process). The human visual system easily detects an interesting object in natural scenes through the selective attention mechanism, which discards useless information and selects the most relevant information for higher-level cognitive processing.
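
As a rough illustration of how a bottom-up saliency map and a top-down prior might be combined, the following minimal sketch may help. It is not the method used in this work: the center-surround box filters, the multiplicative combination, the toy image, and all names (`box_blur`, `bottom_up_saliency`, `combine`, `prior`) are assumptions made for the example.

```python
import numpy as np

def box_blur(img, k):
    """Mean filter with a k x k window (k odd), using edge padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def bottom_up_saliency(intensity, center=3, surround=9):
    """Task-independent saliency: center-surround contrast on intensity."""
    contrast = np.abs(box_blur(intensity, center) - box_blur(intensity, surround))
    peak = contrast.max()
    return contrast / peak if peak > 0 else contrast

def combine(saliency, prior):
    """Weight bottom-up saliency by a top-down prior (element-wise product),
    then normalize so the attention map sums to 1."""
    attention = saliency * prior
    total = attention.sum()
    return attention / total if total > 0 else attention

# Toy scene (hypothetical): two equally bright patches on a dark background.
img = np.zeros((32, 32))
img[4:8, 4:8] = 1.0        # distractor, top-left
img[20:24, 20:24] = 1.0    # target, bottom-right

# Hypothetical top-down prior, e.g. a region near a detected person.
prior = np.full((32, 32), 0.1)
prior[16:28, 16:28] = 1.0

sal = bottom_up_saliency(img)
att = combine(sal, prior)
peak_y, peak_x = np.unravel_index(np.argmax(att), att.shape)
```

In this toy case both patches are equally salient from the bottom-up view alone, and the prior is what moves the attention peak to the bottom-right patch. A product is only one possible way to combine the two maps; a weighted sum would be another.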
