A technique is disclosed for determining the portion of a document that corresponds to a captured image. A user employs a pen to create a stroke in a document, and a camera mounted on the pen captures a sequence of images. The locations of some of the images are determined by, for example, analyzing a pattern on the document that is captured in the image, or by a pixel-by-pixel comparison of the image with the document. The locations of the other images are determined by segmenting the sequence of images into groups corresponding to the shape of the stroke. Information relating to a located image in a segment is employed to determine the position of an unlocated image in that segment. This determined position is in turn used to obtain further information for determining the position of another unlocated image in the segment, and so on, until every image in the segment has been processed.
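
The propagation within a single segment may be illustrated by the following minimal sketch, which is not the disclosed implementation. It assumes that each image in a segment is represented either by a known (x, y) position or by None, and that a hypothetical callback, here named refine, attempts to locate an image given the position of an adjacent located image. The names locate_segment, refine, toy_refine, and the distance limit of 1.5 are illustrative assumptions only.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


def locate_segment(
    known: List[Optional[Point]],
    refine: Callable[[int, Point], Optional[Point]],
) -> List[Optional[Point]]:
    """Fill in unlocated images of one segment by propagating from located ones.

    known  -- per-image position, or None if the image is not yet located
    refine -- hypothetical local search: given an image index and the position
              of an adjacent located image, return a position or None
    """
    located = list(known)
    progress = True
    while progress:  # repeat until a full pass locates nothing new
        progress = False
        for i, pos in enumerate(located):
            if pos is not None:
                continue
            # Seed the search from the previous image, then from the next one.
            for j in (i - 1, i + 1):
                if 0 <= j < len(located) and located[j] is not None:
                    found = refine(i, located[j])
                    if found is not None:
                        located[i] = found
                        progress = True
                        break
    return located


# Toy data: five images along a straight stroke, only the middle one located.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
known = [None, None, (2.0, 0.0), None, None]


def toy_refine(i: int, seed: Point) -> Optional[Point]:
    # Stand-in for a real local search: succeeds only near the seed position.
    tx, ty = truth[i]
    if abs(tx - seed[0]) <= 1.5 and abs(ty - seed[1]) <= 1.5:
        return truth[i]
    return None


result = locate_segment(known, toy_refine)
```

In this toy example each newly located image becomes the seed for its neighbor, so the single known position in the middle of the segment is sufficient to locate the remaining four images. A segment containing no located image is returned unchanged.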