NAME

UMLS::Interface README

SYNOPSIS

This package provides a Perl interface to the Unified Medical Language System (UMLS). The UMLS is a knowledge representation framework designed to support broad-scope biomedical research queries. There are three major sources in the UMLS: the Metathesaurus, which is a taxonomy of medical concepts; the Semantic Network, which categorizes the concepts in the Metathesaurus; and the SPECIALIST Lexicon, which contains a list of biomedical and general English terms used in the biomedical domain. The UMLS-Interface package is set up to access the Metathesaurus and the Semantic Network stored in a MySQL database.

CONFIGURATION

UMLS-Interface can extract information from the UMLS restricted to a specified set of sources and relations, which you define in a configuration file. The format of the configuration file is as follows:

    SAB :: include FMA, MSH
    REL :: include PAR, CHD, RB, RN

where SAB refers to the sources and REL refers to the relations. Another example can be found in the configuration file in the utils/ directory.

You can specify a single source, multiple sources, or the entire UMLS (using the UMLS_ALL option). Keep in mind that the more sources you include, the larger the search space, so obtaining path information between two concepts will take longer.

The names of the sources in the configuration file are expected to be in SAB (source abbreviation) form. A listing of the sources and their SABs can be found at:

You can specify any relations that exist in the set of sources you defined. However, only PAR/CHD and RB/RN are directional (hierarchical) relations. Other relations, such as RO and SIB, are not directional, so obtaining path information with them may take much longer than with the directional relations.
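Because the choice of relations has such a large effect on path searches, it can help to check a configuration file before handing it to the module. The following is a minimal Perl sketch, not the parser UMLS-Interface itself uses: the helpers parse_config and non_directional_relations are hypothetical, and the sketch assumes every non-blank line has the form "FIELD :: include ITEM, ITEM, ..." exactly as in the example above. It collects the listed sources and relations and flags any relation other than PAR, CHD, RB, and RN.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Relations this README lists as directional (hierarchical).
my %DIRECTIONAL = map { $_ => 1 } qw(PAR CHD RB RN);

# Hypothetical helper (not part of UMLS::Interface): parse lines of the
# assumed form "FIELD :: MODE ITEM, ITEM, ..." into a nested hash, e.g.
#   { SAB => { include => ['FMA', 'MSH'] }, REL => { include => [...] } }
sub parse_config {
    my ($text) = @_;
    my %config;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;    # skip blank lines
        my ($field, $mode, $list) =
            $line =~ /^\s*(\w+)\s*::\s*(\w+)\s+(.+?)\s*$/
            or die "Unrecognized configuration line: $line\n";
        push @{ $config{$field}{$mode} }, split /\s*,\s*/, $list;
    }
    return \%config;
}

# Hypothetical helper: return the included relations that are not
# directional (anything other than PAR, CHD, RB, RN).
sub non_directional_relations {
    my ($config) = @_;
    my $rels = $config->{REL}{include} || [];
    return grep { !$DIRECTIONAL{$_} } @$rels;
}

my $example = <<'END';
SAB :: include FMA, MSH
REL :: include PAR, CHD, RB, RN, RO
END

my $config = parse_config($example);
print "Sources:   @{ $config->{SAB}{include} }\n";
print "Relations: @{ $config->{REL}{include} }\n";
for my $rel (non_directional_relations($config)) {
    print "Note: $rel is not directional; path searches may take longer.\n";
}
```

Run on the example, the sketch lists FMA and MSH as sources, lists the five relations, and flags RO as non-directional. The module reads the configuration file itself; this check only illustrates the format and the directional/non-directional distinction.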
A listing of the different relations can be found here (scroll down to the REL table):

If you plan to use multiple sources or the entire UMLS, we advise using the --realtime option, which is explained below, in the Interface.pm documentation, and in the path programs in the utils/ directory.

INSTALL

To install the module, run the following magic commands:

    perl Makefile.PL
    make
    make test
    make install

This will install the module in the standard location. You will most probably require root privileges to install in standard system directories. To install in a non-standard directory, specify a prefix during the 'perl Makefile.PL' stage:

    perl Makefile.PL PREFIX=/home/programs

It is possible to modify other parameters during installation. The details can be found in the ExtUtils::MakeMaker documentation. However, we strongly recommend not changing other parameters unless you know what you are doing.

DATABASE SETUP

The interface assumes that the UMLS is stored in a MySQL database. The name of this database can be passed as a configuration option at initialization. If it is not provided, the default value is used: the UMLS database is called 'umls'.

The UMLS database must contain seven tables:

    1. MRREL
    2. MRCONSO
    3. MRSAB
    4. MRDOC
    5. MRDEF
    6. SRDEF
    7. MRSTY

All other tables in the database are ignored, and an error is raised if any of these tables is missing.

The MySQL server can be on the same machine as the module or on a remotely accessible machine. The location of the server can be provided during initialization of the module.

INITIALIZING THE MODULE

To create an instance of the interface object using default values for all configuration options:

    use UMLS::Interface;
    my $interface = UMLS::Interface->new();

The database configuration options can be included in the MySQL my.cnf file, which is the preferred approach. The directions for this are in the INSTALL file.
They are given in Stage 5, Step D.

The following configuration options are also provided:

'driver' -> Default value 'mysql'. Specifies the Perl DBD driver used to access the database. This implies that another DBMS (such as PostgreSQL) could also be used, as long as a Perl DBD driver exists to access it.

'umls' -> Default value 'umls'. Specifies the name of the UMLS database.

'hostname' -> Default value 'localhost'. The name or IP address of the machine on which the database server is running.

'socket' -> Default value '/tmp/mysql.sock'. The socket that the database server is using.

'port' -> The port number on which the database server accepts connections.

'username' -> The username used to connect to the database server. If not provided, the module attempts to connect as an anonymous user.

'password' -> The password for access to the database server. If not provided, the module attempts to access the server without a password.

'forcerun' -> Bypasses any command prompts, such as the prompt asking whether you would like to continue with index creation.

'realtime' -> Does not create a database of path information (what we refer to as the index); instead, path information about a concept is obtained on the fly.

'cuilist' -> A file containing the list of CUIs for which path information should be stored. Path information for a CUI that is not on the list is not stored.

'verbose' -> Prints the table information to a config file in the UMLSINTERFACECONFIG directory.

USING THE MODULE

Once the module object has been created as described in the previous section, the following methods can be called on it:

getError() -- Returns the error code and error string from the last method call on the object.
root() -- Returns the concept ID of the root of the tree.
depth() -- Returns the depth of the tree.
version() -- Returns the version of the UMLS.
exists() -- Determines whether a CUI exists.
validCui() -- Checks whether a CUI is a valid concept.
getSab() -- Returns the list of sources the concept exists in.
getConceptList() -- Returns the list of all concept IDs for a given term in a specified set of sources.
getTermsList() -- Returns the list of terms and their sources for a given concept ID.
getAllTerms() -- Returns the list of terms corresponding to a given concept ID across all sources.
getParents() -- Returns the parents of a given CUI.
getChildren() -- Returns the children of a given CUI.
getRelated() -- Returns the concepts related to a given CUI through a given relation.
getRelations() -- Returns all of the relations associated with a given CUI in a given source.
pathsToRoot() -- Returns a list of concept IDs that denote the path from the input concept ID to the root concept of the taxonomy.
findShortestPath() -- Returns the shortest path between two CUIs.
findLeastCommonSubsumer() -- Returns the least common subsumer of two CUIs.
getCuiDef() -- Returns the definition(s) of a given CUI.
dropTable() -- Drops the temporary table of path information that the UMLS-Interface module created for a specified set of sources.
findMinimumDepth() -- Returns the minimum depth of a given CUI in the current view of the UMLS.
findMaximumDepth() -- Returns the maximum depth of a given CUI in the current view of the UMLS.
getSts() -- Returns the TUI(s) of the semantic type(s) associated with a given CUI.
getStAbr() -- Returns the abbreviation of a semantic type given its corresponding TUI.
getStString() -- Returns the name of a semantic type given its corresponding abbreviation.
getStDef() -- Returns the definition of a semantic type given its corresponding abbreviation.
checkConceptExists() -- Returns true (1) if a concept exists in the current view of the UMLS, and false (0) otherwise.
returnTableNames() -- Returns the names of the tables, in human-readable and hex form, that the package created for a given configuration.
getIC() -- Returns the information content of a CUI.
getFreq() -- Returns the propagation count of a CUI.

These methods expose the interface required by the UMLS::Similarity modules, which require that any interface to a taxonomy provide the above methods.

REFERENCING

If you write a paper that has used UMLS-Interface in some way, we'd certainly be grateful if you sent us a copy and referenced UMLS-Interface. We have a published paper that provides a suitable reference:

    @inproceedings{McInnesPP09,
      title={{UMLS-Interface and UMLS-Similarity : Open Source Software for Measuring Paths and Semantic Similarity}},
      author={McInnes, B.T. and Pedersen, T. and Pakhomov, S.V.},
      booktitle={Proceedings of the American Medical Informatics Association (AMIA) Symposium},
      year={2009},
      month={November},
      address={San Francisco, CA}
    }

This paper is also found in or

CONTACT US

If you have any trouble installing and using UMLS-Interface, please contact us via the users mailing list: umls-similarity@yahoogroups.com

You can join this group by going to:

You may also contact us directly if you prefer:

    Bridget T. McInnes: bthomson at cs.umn.edu
    Ted Pedersen: tpederse at d.umn.edu

SOFTWARE COPYRIGHT AND LICENSE

Copyright (C) 2004-2009 Bridget T McInnes, Siddharth Patwardhan, Serguei Pakhomov and Ted Pedersen

This suite of programs is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

Note: The text of the GNU General Public License is provided in the file 'GPL.txt' that you should have received with this distribution.