Academia.eduAcademia.edu

POSTGRESQL-IE: An Image-handling Extension for PostgreSQL

2009, Journal of Digital Imaging

The last decade witnessed a growing interest in research on content-based image retrieval (CBIR) and related areas. Several systems for managing and retrieving images have been proposed, each one tailored to a specific application. Functionalities commonly available in CBIR systems include: storage and management of complex data, development of feature extractors to support similarity queries, development of index structures to speed up image retrieval, and design and implementation of an intuitive graphical user interface tailored to each application. To facilitate the development of new CBIR systems, we propose an image-handling extension to the relational database management system (RDBMS) PostgreSQL. This extension, called PostgreSQL-IE, is independent of the application and provides the advantage of being open source and portable. The proposed system extends the functionalities of the structured query language SQL with new functions that are able to create new feature extraction procedures, new feature vectors as combinations of previously defined features, and new access methods, as well as to compose similarity queries. PostgreSQL-IE makes available a new image data type, which permits the association of various images with a given unique image attribute. This resource makes it possible to combine visual features of different images in the same feature vector. To validate the concepts and resources available in the proposed extended RDBMS, we propose a CBIR system applied to the analysis of mammograms using PostgreSQL-IE.

POSTGRESQL-IE: An Image-handling Extension for PostgreSQL Denise Guliato,1 Ernani V. de Melo,1 Rangaraj M. Rangayyan,2 and Robson C. Soares1 The last decade witnessed a growing interest in research on content-based image retrieval (CBIR) and related areas. Several systems for managing and retrieving images have been proposed, each one tailored to a specific application. Functionalities commonly available in CBIR systems include: storage and management of complex data, development of feature extractors to support similarity queries, development of index structures to speed up image retrieval, and design and implementation of an intuitive graphical user interface tailored to each application. To facilitate the development of new CBIR systems, we propose an image-handling extension to the relational database management system (RDBMS) PostgreSQL. This extension, called PostgreSQL-IE, is independent of the application and provides the advantage of being open source and portable. The proposed system extends the functionalities of the structured query language SQL with new functions that are able to create new feature extraction procedures, new feature vectors as combinations of previously defined features, and new access methods, as well as to compose similarity queries. PostgreSQL-IE makes available a new image data type, which permits the association of various images with a given unique image attribute. This resource makes it possible to combine visual features of different images in the same feature vector. To validate the concepts and resources available in the proposed extended RDBMS, we propose a CBIR system applied to the analysis of mammograms using PostgreSQL-IE. KEY WORDS: Database management systems, digital mammography, digital image management, image database, image retrieval, information storage and retrieval, information system INTRODUCTION C ontent-based image retrieval (CBIR) has emerged as an important area in computer vision and multimedia computing. A large number of new techniques to derive image features1–7 and Journal of Digital Imaging, Vol 22, No 2 (April), 2009: pp 149Y165 several CBIR systems8–14 have been developed and used in a vast range of applications. An important subject related to the development of CBIR systems is searching a database using similarity queries with high performance. Queries of this kind search the database for objects that are similar to one given as the reference, according to a similarity measure given by a distance function.15 The most wellknown strategies to implement similarity queries are: K-nearest neighbor (KNN), range search, and similarity join.16,17 With the aim of increasing the efficiency of similarity searches, several index structures have been proposed.18–22 However, few works have been proposed to extend the structured query language SQL to provide support for similarity queries and are restricted to predefined access methods and a limited number of feature extraction procedures.15,25,26 A CBIR system developer is required to be involved with several complex tasks such as: understanding the domain of the application, organizing the data of the application and storing 1 From the Faculdade de Computação, Universidade Federal de Uberlândia, Av. João Naves de Ávila, 2121, 38.400-902, Minas Gerais, Brazil. 2 From the Department of Electrical and Computer Engineering, Schulich School of Engineering, University of Calgary, 2500 University Drive N. W., Calgary, Alberta, Canada T2N 1N4. Correspondence to: Denise Guliato, Faculdade de Computação, Universidade Federal de Uberlândia, Av. João Naves de Ávila, 2121, 38.400-902, Minas Gerais, Brazil; e-mail: [email protected] Copyright * 2008 by by Society for Imaging Informatics in Medicine Online publication 23 January 2008 doi: 10.1007/s10278-007-9097-5 149 150 them using a database management system (DBMS) or a repository as the folders of an operating system, designing a layout and developing an intuitive graphical user interface (GUI), developing feature extraction algorithms, access methods, and similarity operators, and finally, defining and implementing similarity query strategies. However, many of these tasks are common to the majority of CBIR systems and could be available in an extendable and configurable application-independent framework. With the aim of providing facilities for the development of CBIR systems, we propose an extension to the architecture of PostgreSQL,27 termed PostgreSQL with Image-handling Extension (PostgreSQL-IE).13 The proposed system is independent of the application, makes use of the powerful tools available with PostgreSQL, while including more flexibility with respect to the creation of new feature descriptors and the definition of new feature vectors, without extra cost. At the current stage of its development, PostgreSQL-IE makes available two similarity search procedures: KNN and range search. Twelve feature extraction procedures are available, with one of them related to color: histogram, and the remainder related to shape factors: Fourier descriptors (FDs), area, perimeter, compactness, spiculation index (SI), concavity index,6 index of spiculation based on the turning angle function, fractal dimension, index of the presence of concavity, index of the presence of convexity, and index of convexity.3,4,28 To illustrate and validate the power of the resources available in PostgreSQL-IE, we have developed a research system that supports contentbased retrieval of mammograms. The research system allows the user to retrieve information from a mammographic database by combining conventional and visual data, using a friendly graphical Web interface. The remainder of this paper is structured as follows: “Review of Related Works on CBIR” presents a review of related works. “PostgreSQL with Image-handling Extension” presents the fundamental concepts related to PostgreSQL-IE, including the data structure of the new image data type and the extension of the database catalog.13 “A Breast Cancer Database Application” provides a description of the dataset used in the examples. “The Extended SQL: SQL-IE” gives an overview of the extended SQL, including a description of the main commands. “SISPRIM—A System Design GULIATO ET AL. for CBIR Applied to Mammograms” describes SISPRIM—Sistema de Pesquisa para Recuperação de Imagens Mamográficas por Conteúdo,29 a research system designed using PostgreSQL-IE that supports CBIR applied to the analysis of mammograms. Finally, “Conclusions” gives the conclusions of this paper and suggestions for future works. REVIEW OF RELATED WORKS ON CBIR To support the development of CBIR systems, commercial extensions are available to relational DBMS (RDBMS) that are able to store images as a special data type and to include strategies for searching by similarity, restricted to KNN and range search.16,17 Examples of such extended RDBMS (XRDBMS) are DB2 with Image Extender,30 Informix with Excalibur Image DataBlade,31 Informix with Image Foundation DataBlade (IFDB),32 and Oracle with interMedia.33 However, these XRDBMS have a private architecture, are expensive, and do not include facilities for extension or modification. In general, these systems possess similar functionalities but support different sets of image data type and include different feature extraction procedures, access methods (if available), and image manipulation commands. With the aim of standardizing the resources and the image data type of XRDBMS, the International Organization for Standardization proposed SQL/ MM Part 5: Still Image. 26,34 The proposed standardization provides a description for an image data type, termed SI_StillImage, that should encapsulate: the binary image, its dimension, format, width, and height, a set of five image manipulation functions that includes scaling, rotation, cropping, color reduction, and sharpening and blurring, and a set of four feature extraction procedures, of which one is based on texture and the remainder are based on color (global color average, local color average, and color histogram). Each feature, derived from the execution of a feature extraction procedure, can be associated with a feature vector data type. A special feature vector data type called SI_FeatureList makes it possible to combine different features, using different weights, in the same feature vector. However, SQL/MM Part 5: Still Image does not cover aspects related to the query image by similarity search and POSTGRESQL-IE: AN IMAGE-HANDLING EXTENSION FOR POSTGRESQL does not provide flexibilities to include new feature extraction procedures. Notwithstanding the efforts, none of the available commercial XRDBMS is completely aligned with the standardization proposed by SQL/MM Part 5: Still Image, except Oracle interMedia—version 10, which includes the SQL/MM Part 5: Still Image module besides its own previously defined functionalities. Oracle interMedia33 supports two image data types, StillImage and ORDImage, and is limited to five feature extraction procedures (global color, local color, texture, structure, and facial). The image data type available in DB2 with Image Extender30 is defined as a userdefined type (UDT), termed DB2Image, and is limited to four feature extraction procedures (color average, color histogram, positional color, and texture). Informix Excalibur Image DataBlade Module31 supports a new image data type, defined as a UDT, called IfdImageDesc, and makes available six feature extraction procedures (color content, color structure, texture, shape, brightness structure, and aspect ratio). IFDB32 defines another image data type, termed IfxImage; the image is stored as a binary large object (BLOB) type attribute, and no feature extraction function is available in this system. The system proposed by Baroni et al.15 does not allow execution of the feature extraction procedures as part of the system and does not include a special data type to store image data. The feature vector is comprised of features previously obtained by the user and is stored in the database as a special data type termed PARTICULATE. The system makes it possible to create new metrics associated with the PARTICULATE data type. The similarity query is carried out by taking into account the metric associated with the feature vector defined by the user and the similarity operators KNN, Range, and Join. Although this system extends the SQL to pose similarity queries, the approach is different from that proposed in the present work. Because PostgreSQL-IE is closer to the commercial RDBMS, we restrict further discussions and comparisons to the related context. The features supported for CBIR in all available commercial XRDBMS are mostly restricted to color and texture. The Informix Excalibur Image DataBlade and Oracle RDBMS go beyond of the standardized features and implement a shape-based feature. However, additional specialized features would be desirable for specific applications.35 151 All of the commercial systems described above are able to store only one image in a given image data type attribute (for short, image attribute) and make it possible to associate only one feature vector with each image attribute. The feature vector may be composed as a weighted combination of all of the features available in the system: No other user configuration is permitted. The systems do not allow the inclusion of new functionalities, new feature extraction procedures, or new access methods. POSTGRESQL WITH IMAGE-HANDLING EXTENSION To develop CBIR systems, it is desirable that the XRDBMS provides flexibility to store images, to define new feature extraction procedures, and to combine the features into vectors in new manners that are not predefined. To address these requirements, we propose PostgreSQL-IE, an extension designed to support similarity queries in a straightforward manner. To manage the new functionalities, PostgreSQL-IE extends the PostgreSQL database catalog and includes the definition of a new image data type, as described in the following sections. The Extended Database Catalog The extended database catalog for PostgreSQLIE is a database, called extended_template1, created using template1 as the template, with a set of 12 internal functions written in the C language, a set of 16 functions written in plpgsql,36 which extends the SQL commands, whose details are described in “The Extended SQL: SQL-IE,” and a set of six tables that manages all of the information related to image handling, whose scheme is shown in Figure 1. The tables pge_extractor, pge_vector, and pge_accessmethod keep track of all new feature extraction procedures, new feature vectors, and new access methods added to the system by authorized users, respectively. The table pge_vector_ extractor relates each feature vector to a respective set of features that compose the vector. Because of this aspect, PostgreSQL-IE makes it possible to combine one or more features in the same feature vector, according to the needs of the application. To 152 GULIATO ET AL. pge_regimage pge_vector_extractor imgid:int imgclass: varchar vector_id: int (FK) ext_id: int (FK) imgsize: bigint imgwidht: bigint imgheight: bigint imgvalue: oid imgformat: varchar tabname: varchar colname:varchar imgclass: varchar pge_vector vector_id: int ext_id: int name: varchar func: varchar path: varchar rtype: varchar param: varchar rnum: int name: varchar pge_tabvector pge_accessmethod ix_id: int vector_id: int (FK) maccess: int (FK) maccess: int ix_name: varchar tabname: varchar colname: varchar pge_extractor name: varchar path: varchar Fig 1. Scheme of the extended database catalog for PostgreSQL-IE. PostgreSQL-IE implements a new image data type as a UDT, termed PGImage, according to the plpgsql command: This is possible by attributing different classes to different images of the same attribute. This novel approach is helpful in developing applications where each image attribute is composed of a set of images such as computed tomography (CT), magnetic resonance imaging (MRI), mammography, or architectural plans. The PGImage data type encapsulates an identifier of the instance of a given image set (imgid) and the classes associated with each image of this image set (imgclass). The identifier imgid is a key of the pge_regimage table, which stores information about each image stored in the database, such as image dimension, width, height, format, the referred table name, the referred column, and the binary image stored as a BLOB. This new image data type includes the capability to store more than one image in the same image attribute, different from all other image data types reviewed in “Review of Related Works on CBIR.” The new data type PGImage allows the modeling of a database application in a simpler way, in comparison to other XRDBMS. For example, consider an application that involves a radiological speed up the answer to a query, an access method can be associated with an available feature vector: The table pge_tabvector manages this relationship. Finally, the table pge_regimage maintains information about all of the images stored in a given database, which is accessed directly from the application table, as described in the following sections. New databases created using extended_template1 as the template inherit all the resources to manage and handle CBIR operations. A New Image Data Type POSTGRESQL-IE: AN IMAGE-HANDLING EXTENSION FOR POSTGRESQL electronic report that includes a short clinical history of each patient and her set of related exams obtained at different instants of time. Each exam set can be composed of X-ray, CT, and MRI images (the number of slices in CT and MRI may vary between patients). Figure 2 illustrates the database scheme using DB2 and PostgreSQL-IE XRDBMS. Because PGImage allows the storage of more than one image in the same attribute, the application model using PostgreSQL-IE is simpler and can be modeled with fewer tables, as compared to DB2. By modeling the database using PostgreSQL-IE, the visual information related to all exams carried out for a patient at a given instant of time can be combined in the same similarity query. This is not possible using any commercial XRDBMS available at present. 153 images: a mammographic image whose class is named mammography and the lesion contour whose name is lesion_contour, drawn by an expert radiologist. Figure 3 shows an instance for the patient_ case table and its respective pge_regimage table with the aim of illustrating the management for the PGImage data type. THE EXTENDED SQL: SQL-IE SQL-IE is composed of a set of 16 new functions that includes commands to create new feature extraction procedures, new feature vectors as combinations of previously defined features, and new access methods. The details about the syntax of the SLQ-IE commands using the functions to be described in the following sections can be found in Appendix A. A BREAST CANCER DATABASE APPLICATION Data Definition Functions To illustrate the use of resources available in PostgreSQL-IE and the commands for the extended SQL, to be presented in the following sections, we will use a breast cancer database application. The mammographic database, called mammo_ database, is modeled by two tables: patient that contains information about the patients and patient_ case that contains information about the mammographic exams carried out on the patients at a given instant of time. Each mammogram is associated with the contour of the lesion, drawn by an expert radiologist. The command creates the patient case table. The attribute c_mammo is a PGImage data type and stores two different In this section, we describe three of the six functions available in SQL-IE for data manipulation. The others three functions termed Delete_ Extractor, Delete_Feature_Vector, and Delete_ Accessmethod are used for maintenance of the database catalog.  Create_Extractor —This function makes it possible to include a new feature in PostgreSQLIE, by inserting specific information about the feature extraction procedure in the pge_extractor table and by creating internal control functions, which are able to access the function in the appropriate library. 154 GULIATO ET AL. a b patient patient patient_case p_id:int p_gender:char p_bday:date ct_images c_id:int c_patient:int(FK) p_id:int c_exam:date c_diagnosis:varchar c_ct_id:int c_mri_id:int p_gender:char p_bday:date patient_case t_id:int t_ct_id:int c_id:int c_patient:int(FK) mri_images t_slice_number:int t_slice_ct:DB2Image r_id:int r_mri_id:int(FK) r_slice_number:int r_slice_mri:DB2Image c_exam:date c_diagnosis:varchar c_images:PGImage Fig 2. Two possible database schemes: a for DB2 with Image Extender, b for PostgreSQL with Image-handling Extension. To include a new feature, it is required to convert the corresponding function’s code to the library format dll (dynamic linked library) or so (shared object), depending on the operating system, and to store it in the lib directory of PostgreSQL. At present, PostgreSQL-IE makes available four different libraries in dll and so: ExtShape with six feature extraction procedures proposed by Rangayyan et al.37 (SI, FDs, fractional a patient_case table c_id c_patient c_examdate c_tumor c_mammo 1 1 2000-04-05 B (1, "mammography, lesion_contour") 2 2 2004-08-01 B (2, "mammography, lesion_contour") 3 3 2004-10-10 M (3, "mammography, lesion_contour") b pge_regimage table imgid imagclass imgsize imgwidth imgheight imgformat imgvalue tabname colname 1 lesion_contour 4194304 2048 2048 bnd 140338 patient_case c_mammo 1 mammography 4194304 2048 2048 bmp 140337 patient_case c_mammo 2 lesion_contour 1048576 1024 1024 bnd 140346 patient_case c_mammo 2 mammography 1048576 1024 1024 bmp 140345 patient_case c_mammo 3 lesion_contour 1048576 1024 1024 bnd 140352 patient_case c_mammo 3 mammography 1048576 1024 1024 bmp 140351 patient_case c_mammo Fig. 3. a An instance of the patient case relation scheme. b The corresponding pge regimage table. POSTGRESQL-IE: AN IMAGE-HANDLING EXTENSION FOR POSTGRESQL 155 concavity, perimeter, area, and compactness), ExtColor with a feature extraction procedure based on the global histogram, ExtShapePolyModelTAF with five feature extraction procedures,2,28 and ExtShapeTAFModeling with five feature extraction procedures.3,4 The last two libraries include the same set of feature extraction procedures, derived from the turning angle function of a given contour: index of spiculation, fractal dimension, presence of concavity, presence of convexity, and convexity index; they differ from each other with respect to the contour approximation approach used to remove noise and artifacts from the original contour. The commands given below illustrate the use of the Create_ Extractor function to include new feature extraction procedures in PostgreSQL-IE. In the example, we use the functions getFourierDescriptors, getSpiculationIndex, and getHistogram available in the ExtShape and ExtColor libraries.  Create_AccessMethod—This function allows to create a new access method. The similarity operators are functions defined within the access method, with a specific structure, as shown in Appendix A1. The table pge_ accessmethod of the extended database catalog is updated to support the new access method. At present, PostgreSQL-IE does not include any multidimensional access method. The queries are carried out by scanning all the tuples in a specified table. This procedure is referred to in this paper as the StandardAccess-Method (Scan) and includes KNN search and range search. The similarity operators allow to associate weights with each feature of a feature vector being used in the query. To carry out a KNN or range search, the difference between the normalized weighted sum of the feature vector associated with the reference image and all the images in the referenced database table are computed. The images are ranked in in-  Define_Feature_Vector—This function defines a new feature vector as a combination of features derived from the feature extraction procedures previously created. The tables pge_ vector and pge_vector_extractor of the extended database catalog keep information about all feature vectors that have already been defined and about their elements, respectively. The following commands illustrate the definition of three different feature descriptors: Shape_SI that is composed of SI, Shape_SI_FD that is composed of SI and FDs, and finally, Hist_SI that is composed of SI and histogram feature extractors. Note that the last feature vector mentioned above combines features of images from different classes (lesion_contour and mammography in our example). creasing order of the resulting distances, and the similarity operations are appropriately processed. Data Manipulation Functions In this section, we present the main functions to manipulate data in the XRDBMS. Other data manipulation functions include Width_IE, Show_Image, UnSet_Feature_Vector, Show_Extractor, Update_ Image_Attr, Replace_Image_Attr, Height_IE, and Value_Extractor.  Insert_Image—This function is used as part of the conventional SQL Insert command. The 156 GULIATO ET AL. Insert_Image function allows the user to insert images into an image attribute. If there are one or more feature vectors already associated with the image attribute of the table being updated, the database system launches the execution of all of the feature extractors related to the feature vectors. Additional information such as the width, height, and format of the images are also saved. The command are executed for these instances. To avoid allocating extra storage space and to avoid repeated calls to execute an extractor function for the same attribute, PostgreSQL-IE reuses the results of the feature extractors that have already been executed before. An internal table named with the result of the concatenation of the referred database table name and the attribute nameis dynamically created, with only one column for each  illustrates the insertion of one tuple into the table patient_case. The pge_regimage table of the extended database catalog is updated to reflect this insertion (see Fig. 3a, an instance for the patient case table).  Set_Feature_Vector—This function associates an image attribute of a given database table with a previously defined feature vector and creates an index structure if an access method is explicitly specified in the command (if not, a scan of all the tuples in the database is executed). More than one feature vector can be associated with the same image attribute of a table, as shown below. feature extractor. Note that although the SI and Fourier feature extractors comprise two different feature vectors (Index SI and Index SI FD), they appear only once in the internal table. The proposed extended SQL (SQL-IE) supports similarity query in a simple way. The similarity operators are implemented as functions enclosed within the access method and are used as subqu-  Once the feature vectors have been associated with a given attribute with instances already inserted, all of the feature extraction functions associated with the feature vectors eries of the WHERE clause of a conventional SQL command. The answer to the query is ranked in increasing order, according to the similarity of the objects in the database and an object given by the SQL-IE Commands to Search an Image Database by Similarity POSTGRESQL-IE: AN IMAGE-HANDLING EXTENSION FOR POSTGRESQL 157 reference. The follow examples illustrate queries using PostgreSQL-IE. of mammograms for computer-aided diagnosis of breast cancer. The system is composed of a Because SQL-IE allows the searching of the database by combining similarity operators connected by logical operators, as shown in example 2 above, the explicit statement of a score name in the Score IE function, in the SELECT clause, makes it possible to rank the answer to the query according to the score given by the specified similarity operator. mammographic database modeled using PostgreSQL-IE, a research engine developed in PHP, and a Web GUI. SISPRIM—A SYSTEM DESIGN FOR CBIR APPLIED TO MAMMOGRAMS This section presents SISPRIM,13,29 a research system that supports CBIR, applied to the analysis The Mammographic Database The mammographic database was modeled using PostgreSQL-IE and stores, for each patient, her historical clinical information, some relevant aspects of her life style, and the mammographic exams performed at different instants of time.38 For each mammographic exam, the following are stored: the four standard views (two views of each breast: cranio-caudal and medio-lateral oblique or 158 MLO), complementary exams, information about the presence of abnormalities, such as architectural distortion, asymmetric density, palpable lump, and calcifications, and the diagnosis, as applicable, according to the Breast Imaging—Reporting and Data System (BI-RADS) classification system.39 The mammographic database associates, with each mammographic exam, the report given by the radiologist and links each mammogram of each exam with the contour of the breast, the boundary of the pectoral muscle (MLO views only), the contours of masses (if present), the regions of clusters of calcifications (if present), the number of calcifications (if present), and the locations and details of any other features of interest. The contours of masses and regions of clusters of calcifications may be drawn and entered into the system by an authorized expert radiologist. In the current stage of development, the mammographic database of SISPRIM is composed of contours and regions of interest (ROIs) of breast masses obtained in two preceding studies; one set of images was derived from mammograms of 20 cases obtained from Screen Test: the Alberta Program for the Early Detection of Breast Cancer.14,40,41 The mammograms were digitized using the Lumiscan 85 scanner at a resolution of 50μm with 12b/pixel. The set includes 57 ROIs, of which 37 are related to benign masses and 20 are related to malignant tumors. Another set of images was obtained from the Mammographic Image Analysis Society (MIAS, UK) database42 and the teaching library of the Foothills Hospital (Calgary).43 The MIAS images were digitized at a resolution of 50μm; the Foothills Hospital images were digitized at a resolution of 62μm. This set includes smooth, lobulated, and spiculated contours in both the benign (28) and malignant (26) categories. The contour of each lesion was manually drawn by an expert radiologist specialized in mammography. The combined dataset has 111 contours, including both typical and atypical shapes of benign masses (65) and malignant tumors (46). The development of an additional mammographic database is in progress: The new database is modeled to contain cases from Digital Database for Screening Mammography44 with images in a well-supported format. The radiological findings of each case stored in the database are being extended by expert radiologists GULIATO ET AL. at the Clinical Hospital of Federal University of Uberlândia, Brazil. The Research Engine The research engine is the heart of the research system: It manages the actions of the user and accesses the mammographic database as required to answer queries. The proposed research system can answer queries such as return five images that are similar to the given reference image and the patient takes antidepressive medication or return the identification of the patient and the diagnosis associated with each one of ten images similar to the reference image, and the density of parenchyma is heterogeneous, and the age at menarche is 12 years. To handle queries as above, the research engine requires that each mammogram in the database be associated with a set of features combined into one or more feature vectors and that the data manipulation language possesses resources to combine visual and conventional information in the condition clause of the query. As the mammographic database used by SISPRIM was modeled using PostgreSQL-IE, all requirements to support CBIR are available. To facilitate easy access to the information stored in the mammographic database and to access the resources available in the XRDBMS, the research engine manages a Web GUI that incorporates facilities to support two kinds of users: administrators and researchers. The interface for an administrator guides the user in the creation of a new feature extractor and/or a new feature vector and in the association of a given feature vector with an image attribute, including an access method (optional), as shown in Figure 4. The interface for a researcher, shown in Figure 5, guides the user through the resources previously configured by an administrator. A researcher can interactively configure a query, combining conventional and visual data, as desired. Figure 6 shows the results of the query as posed in Figure 5. The results of the queries indicate the capabilities of the proposed system in CBIR: The specific query illustrated shows CBIR based upon the shape of masses with a condition on the age of the patient. Further work is in progress to include into SISPRIM strategies to improve the retrieval performance using relevance feedback.45–47 POSTGRESQL-IE: AN IMAGE-HANDLING EXTENSION FOR POSTGRESQL 159 Fig 4. The administrator–user interface to configure the research system. CONCLUSIONS We have presented two important contributions. The first one is PostgreSQL with Image-handling Extension, an XRDBMS that supports the development of CBIR systems in a flexible manner. The system is open source, portable, extendable, easy to be installed, and available via the Web for Windows and Linux operating systems. At present, the system includes 12 different feature extraction procedures organized in four libraries. Detailed discussions on the accuracy of the features used in the present paper can be found in our previous works.2–4,6,28 PostgreSQL-IE supports a new image data type attribute termed PGImage, which permits the user to model a relational scheme by storing various images of different classes in the same attribute. Because of this image attribute, the proposed system makes it possible to design the conceptual model for a given database application with fewer relation schemes than those required in currently available commercial RDBMS. The new image data type also makes it possible to combine features derived from different images, stored in the same image attribute, in the same feature vector. To date, PostgreSQL-IE makes available two conventional similarity operators (KNN and range); further studies are being conducted to extend PostgreSQL-IE with two new similarity search procedures based on fuzzy sets to take into 160 GULIATO ET AL. Fig 5. The researcher graphical Web interface to define a query. account the uncertainty present in medical applications of CBIR. New feature extraction procedures and new access methods can be easily included in the system. The second contribution is SISPRIM, a research system designed by using PostgreSQL-IE. The research system accesses a mammographic database that includes complete information about the patient (clinical history and life style), her mammographic exams, complementary exams, and the report of each exam associated with the radiological findings of each mammogram, according to the BI-RADS classification system. The information about the clinical history and lifestyle includes details such as the use of alcohol, tobacco, antidepressive medication, and hormone replacement medication, age at menarche, age at first pregnancy, and menopausal status. The radiological findings include asymmetric density, architectural distortion, parenchymal type, and descriptions of the characteristics of masses and calcifications (as applicable). With information as above, it is possible to compute statistical measures and correlation coefficients to investigate and derive relationships between the incidence of breast cancer and the lifestyle of the patient. The system also permits temporal analysis of the evolution of the breast, which is useful in un- POSTGRESQL-IE: AN IMAGE-HANDLING EXTENSION FOR POSTGRESQL 161 Fig 6. The results of the search as defined in Figure 5. derstanding the natural changes of the breast and interval cancer. SISPRIM makes available a graphical Web interface that helps the user to configure and execute queries. To illustrate the power and facilities of SISPRIM and PostgreSQL-IE, we have modeled a mammographic database with 111 images and carried out queries using the KNN operator. The development of an additional mammographic database is in progress. The relevance of the results obtained by the proposed system depends on the feature extractors and the feature vectors used. To improve the performance of the CBIR system, strategies of relevance feedback that take into account the perception of the user are being developed. SISPRIM is part of An Indexed Atlas of Digital Mammograms38,48,49 and can be accessed at www. lcc.ufu.br/amdi. The script to install PostgreSQL-IE is available at www.lcc.ufu.br/pdi/downloads. ACKNOWLEDGMENT This work was supported by the Conselho Nacional Desenvolvimento Científico e Tecnológico, Brazil, Universidade Federal de Uberlândia, Brazil, and Research Services, University of Calgary, Canada. 162 APPENDIX A A1—Syntax for SQL-IE Data Definition Functions In this Appendix, we present the detailed syntax of the definition functions used in “Data Definition Functions.”  The Create_Extractor function  The Define_Feature_Vector function  The Create_AccessMethod function—The access method must be developed in the C programming language, converted to library format (dll or so), and has to include the following functions: The input parameters for the similarity operators are: the score name, the value of the neighborhood for KNN and the value of ratio for the RANGE operators, the name of the index structure, the image class, and the file path of the reference image for the similarity query. GULIATO ET AL. POSTGRESQL-IE: AN IMAGE-HANDLING EXTENSION FOR POSTGRESQL 163 A2—Syntax for SQL-IE Manipulation Commands In this Appendix, we present the detailed syntax of the manipulation functions used in “Data Manipulation Functions.”  The Insert_Image function  The Set_Feature_Vector function REFERENCES 1. Datta R, Li J, Wang JZ: Content-based image retrieval— approaches and trends of the new age. In: Proceedings of the ACM International Workshop on Multimedia Information Retrieval, ACM Multimedia, Singapore, November, 2005, pp 253–261 2. Guliato D, Rangayyan RM, Carvalho JD, Santiago SA: Spiculation-preserving polygonal modeling of contours of breast tumors. In: Proceedings of the 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, New York City, NY, September, 2006, pp 2791–2794 3. Rangayyan RM, Guliato D, Carvalho JD, Santiago SA: Feature extraction from the turning angle function for the classification of breast tumors. In Proceedings of the International Special Topics Conference on Information Technology in Biomedicine—IEEE ITAB2006, Ioannina, Greece, October, 2006 (6 pages on CDROM) 4. Carvalho JD, Rangayyan RM, Guliato D, Santiago SA: Polygonal modeling of contours using the turning angle function. In 20th IEEE Canadian Conference on Electrical and Computer Engineering, Vancouver, BC, April, 2007, pp 1090– 1267 5. Chen Y, Wang JZ: A region-based fuzzy feature matching approach to content-based image retrieval. IEEE Trans Pattern Anal Mach Intell 24(9):1252–1267, 2002 6. Mudigonda NR, Rangayyan RM, Desautels JEL: Detection of breast masses in mammograms by density slicing and texture flow-field analysis. IEEE Trans Med Imag 20(12):1215– 1227, 2001 7. Rangayyan RM, Nguyen TM: Fractal analysis of contours of breast masses in mammograms. J Digit Imaging 20(3):223– 237, 2007 (September) 8. Veltkamp RC, Tanase M: Content-based Image and Video Retrieval, Norwell, MA: Kluwer, 2002 9. Csillaghy A, Hinterberger H, Benz AO: Content-based image retrieval in astronomy. Inf Retr 3(3):229–241, 2000 10. Painter TH, Dozier J, Roberts DA, Davis RE, Green RO: Retrieval of subpixel snowcovered area and grain size from imaging spectrometer data. Remote Sens Environ 85(1):64–77, 2003 11. Schroder M, Rehrauer H, Seidel K, Datcu M: Interactive learning and probabilistic retrieval in remote sensing images archives. IEEE Trans Geosci Remote Sens 38(5):2288–2298, 2000 12. Wang JZ, Li J, Wiederhold J: SIMPLIcity: semantics sensitive integrated matching for picture libraries. IEEE Trans Pattern Anal Mach Intell 23(9):947–963, 2001 13. Guliato D, Rangayyan RM, Melo EV, Soares RC: A system for content-based image retrieval and analysis of mammograms using PostgreSQL with image-handling extension. In: Proceedings of the Fifth IASTED International Conference on Biomedical Engineering, Innsbruck, Austria, February, 2007, pp 402–408 164 14. Alto H, Rangayyan RM, Paranjape RB, Desautels JEL, Bryant H: An indexed atlas of digital mammograms for computer-aided diagnosis of breast cancer. Ann Telecommun 58(5–6):820–835, 2003 15. Baroni MCN, Rezende HL, Traina-Jr C, and Traina AJM: Queryng complex objects by similarity in SQL*. In: Proceedings of Brazilian Symposium on Databases—SBBD2005, Uberlândia, MG, Brazil, October–November, 2005, pp 1–5 16. Böhm C, Berchtold S, Keim DA: Searching in highdimensional spaces—index structures for improving the performance of multimedia databases. ACM Comput Surv 33(3):322– 373, 2001 17. Chávez E, Navarro G, Baeza-Yates R, Marroquím J: Searching in metric space. ACM Comput Surv 33(3):273–321, 2001 18. Ciaccia P, Patella M: M-tree: an efficient access method for similarity search in metric spaces. In: Proceedings of International Conference on Very Large Data Bases (VLDB), Athens, Greece, 1997, pp 426–435 19. Traina Jr, C, Traina AJM, Faloutsos C, Seeger B: Fast indexing and visualization of metric datasets using slim-trees. IEEE Trans Knowl Data Eng 14(2):244–260, 2002 20. Digout C, Nascimento M, Coman A: Similarity search and dimensionality reduction: not all dimensions are equally useful. In: Proceedings of Database Systems for Advanced Applications—9th International Conference, DASFAA 2004, Jeju Island, Korea, March 17–19. Springer, Berlin, Germany, 2004 (also in Lect Notes Comput Sci 2973:831–842) 21. Kailing K, Kriegel HP, Schönauer S, Seidl T: Efficient similarity search for hierarquical data in large databases. In: Proceedings of Advances in Database Technology—EDBT 2004—9th International Conference on Extending Database Technology, Heraklion, Crete, Greece, March. Springer, Berlin, Germany, 2004 (also in Lect Notes Comput Sci 2992:676–693) 22. Vieira MR, Chino F, Traina-Jr, C, Traina AJM: Dbmtree: a metric access method sensitive to local density data. In: Proceedings of Brazilian Symposium on Databases— SBBD2004, Brasilia, DF, Brazil, October, 2004, pp 163–177 23. Carey MJ, Kossmann D: On saying enough already in SQL. In: Proceedings of the 1997 ACM SIGMOD international conference on Management of data, Tucson, AZ, May, 1997, pp 219–230 24. Carey MJ, Kossmann D: Reducing the braking distance of an SQL query engine. In: Proceedings of the Conf. on Very Large Data Bases (VLDB), New York City, NY. VLDB Endowment, Saratoga, CA, 1998, pp 158–169 25. Gao L, Wang M, Wang XS, Padmanabhan S: Expressing and optimizing similarity queries in SQL. In: Proceedings of Conceptual Modeling—ER—23rd International Conference on Conceptual Modeling, Shanghai, China, November. Springer, Berlin, Germany, 2004 (also in Lect Notes Comput Sci 3288:464–478) 26. Melton J, Eisenberg A: SQL multimedia and application packages (SQL/MM). ACM SIGMOD Record 30(4):97–102, 2001 (December) 27. The POSTGRES Group: The POSTGRES Reference Manual, Berkeley, CA: Computer Science Division, University of California, 1993 (January) 28. Guliato D, Rangayyan RM, Carvalho JD, Santiago SA: Polygonal modeling of contours of breast tumors with the preservation of spicules. IEEE Trans Biomed Eng 55:14–20, 2008 GULIATO ET AL. 29. Melo EV, Guliato D, Rangayyan RM, Soares RS: SISPRIM —Sistema De Pesquisa Com Suporte Para Recupera, cão de imagens por conte’udo. In: Proceedings of WIM2006 - VI Workshop de Inform’atica M’edica, Vila Velha, ES, Brazil, June, 2006. 30. IBM: DB2 Universal Database Image, Audio, and Video Extenders Administration and Programming. 2000. http://www306.ibm.com/software/data/db2/extenders/index.html 31. Informix: Excalibur Image Datablade Module, Users Guide. 2000. http://informix.com.ua/answers/english/alpha.htm 32. Informix: Informix Image Foundation DataBlade Module, Users Guide, Version 2.0. 2000 (December) 33. Oracle: Oracle8i interMedia Audio, Image, and Video— Users Guide and Reference. 2005. http://download.oracle.com/ docs/pdf/A67296_01.pdf 34. ISO: ISO/IEC IS 13249-5:2001 SQL/MM, Information Technology Database Languages SQL Multimedia and Application Packages Part 5: Still Image. 2001. 35. Stolze K: Still image extensions in database systems—a product overview. In: Datenbank-Spektrum, 2002, pp 40–47 (February) 36. The POSTGRES Group. PostgreSQL 8.0.0 Documentation. 2005 37. Rangayyan RM, El-Faramawy NM, Desautels JEL, Alim OA: Measures of acutance and shape for classification of breast tumors. IEEE Trans Med Imag 16(6):799–810, 1997 38. Guliato D, Bôaventura RS, Melo EV, Rangayyan RM: AMDI: an indexed atlas of digital mammograms that integrates case studies, e-learning, and research systems via the web. In: Suri JS, Rangayyan RM Eds. Recent Advances in Breast Imaging, Mammography, and Computer-aided Diagnosis of Breast Cancer. Bellingham, WA: SPIE, 2006, pp. 529–555 39. American College of Radiology: Breast Imaging Reporting and Data System BI-RADS, 4th edition. Reston, VA: American College of Radiology, 2004 40. Screen Test: Alberta Program for the Early Detection of Breast Cancer—2001/03 Biennial Report. 2004. http://www. cancerboard.ab.ca/screentest 41. Alto H, Rangayyan RM, Desautels JEL: Content-based retrieval and analysis of mammographic masses. J Electron Imaging 14(2):023016, 2005 42. The Mammographic Image Analysis Society digital mammogram database. http://peipa.essex.ac.uk/info/mias.html, accessed October, 2006 43. Rangayyan RM, Mudigonda NR, Desautels JEL: Boundary modelling and shape analysis methods for classification of mammographic masses. Med Biol Eng Comput 38:487–496, 2000 44. Digital Database for Screening Mammography. http:// marathon.csee.usf.edu/Mammography/Database.html, accessed June, 2007. 45. Rui Y, Yang TS, Mehrotra S: Content-based Image Retrieval with Relevance Feedback in Mars. In: IEEE International Conference in Image Processing, volume 2, Santa Barbara, CA, 1997, pp 815–818 46. Rui Y, Yang TS, Ortega M, Mehrotra S: Relevance feedback: a powerful tool in interactivecontent-based image retrieval. IEEE Trans Circuits Syst Video Technol 8(5):644– 655, 1998 (September) POSTGRESQL-IE: AN IMAGE-HANDLING EXTENSION FOR POSTGRESQL 47. Triana AJM, Marques J, Traina Jr C. Fighting the semantic gap on CBIR system through new relevance feedback techniques. In: Proceedings of 19th IEEE Symposium on Computer-based Medical Systems (CBMS’06), 2006, pp 881–886 48. Guliato D, de Melo EV, Bôaventura RS, Janones FR, de Deus V, Rangayyan RM: AMDI: an atlas to integrate case studies, e-learning, and research systems via the Web. In: 165 Proceedings of the IASTED International Conference on Telehealth. Banff, AB, Canada, 2005, pp 69–74 49. Guliato D, Caetano M, Janones FR, de Deus V, Lima SC, Rangayyan RM, Bôaventura RS, and Marques PMA. AMDI: An indexed atlas of digital mammograms availablevia the Web. In: III Latin American Congress on Biomedical Engineering, IFMBE Proceedings, 5, 2004 (4 pages on CDROM)