README.zxid
###########

1 Who needs this?
=================

The ZXID project currently (Sept 2006) has four main outputs

libzxid::   A C library for supporting SAML 2.0, including federated Single Sign-On
zxid::      A C program that implements a SAML Service Provider (SP) as a CGI script
Net::SAML:: A Perl module wrapping libzxid. Also supplied: zxid.pl, which implements an SP in a mod_perl environment.
php_zxid::  A PHP extension that wraps libzxid. Also supplied: zxid.php, which implements an SP in a mod_php environment.

*You need this if you are*

Web Master:: You want to enable SAML based Single Sign-On (SSO) to your web
    site. In this case you would use the zxid SP CGI script directly, only
    configuring it slightly. Or you can hint to your PHP or Perl developer
    that this functionality is available and that you want it.

Perl Developer:: You can use the Net::SAML module to integrate SSO into your
    application and web site. Given the direct Perl support, this is easier
    than fully understanding the C interface. Both mod_perl and Perl as CGI
    are supported.

PHP Developer:: You can use ~dl("php_zxid.so")~ to load the module and access
    the high level functionality, such as SAML 2.0 SSO. We support
    functionality roughly equivalent to the Perl Net::SAML module. The PHP
    module is fully ready to use for SSO, but we expect to add a lot more,
    such as WSC, in the future. Both mod_php5 and PHP as CGI are supported.
    php4 should also work.

Web Developer:: You want to integrate SAML based SSO into your web site tool
    or product so that your customers can enjoy SSO enabled web sites. In this
    case you would study zxid.c for examples and use libzxid.a to implement
    the functionality in your own program.

Identity Management hacker:: You need some building blocks: you will study
    libzxid and add to it, contributing to the project.

The ZXID Project has vastly more ambitious goals. See the ZXID Project chapter
later in this document.

2 Installing
============

If you want to try ZXID out immediately, we recommend compiling the library
and examples and installing one of the examples as a CGI script in an existing
web server. See later chapters for more details.

  tar xvzf zxid-0.7.tgz
  cd zxid-0.7
  # N.B. There is no configure script. The Makefile works for all
  # supported platforms as is.
  # N.B2: We distribute some generated files. If they are missing you need
  # to regenerate them: do  make cleaner; make dep ENA_GEN=1
  make
  make samlmod             # optional
  make samlmod_install     # optional: install Net::SAML perl module
  make phpzxid             # optional
  make phpzxid_install     # optional: install php_zxid.so PHP extension
  cp zxid /
  # configure your web server to recognize zxid as a CGI, e.g.
  mini_httpd -p 8443 -c zxid -S -E zxid.pem
  # Edit your /etc/hosts to contain
  127.0.0.1 localhost sp1.zxidcommon.org sp1.zxidsp.org
  # Point your browser to
  https://sp1.zxidsp.org:8443/zxid?o=E
  https://sp1.zxidsp.org:8443/zxid.pl?o=E    # Perl version
  # Find an IdP to test with and configure it...

2.1 Prerequisites
-----------------

This software depends on the following packages:

1. zlib from zlib.net. Generally whatever comes with your distro is sufficient.
2. openssl-0.9.8c or later. See www.openssl.org. Generally the openssl
   libraries distributed with most Linux distros are sufficient.
3. libcurl from http://curl.haxx.se/. I used version 7.15.5, but probably
   whatever ships with your distribution is fine. libcurl is needed for SOAP
   bindings and for fetching metadata. It needs to be compiled to support HTTPS.
4. HTTPS capable web server. For most trivial testing CGI support is needed.
We recommend mini_httpd(8) available from http://www.acme.com/software/mini_httpd/ Following additional packages are needed by developers who wish to build from scratch, including the code generation (the standard distribution includes the output of the code generation, so most people do not need these). a. gperf from gnu.org (only for build process when generating code) b. swig from swig.org (only for build process and only if you want scripting interfaces) c. perl from cpan.org (only for build process and only if you want to generate code from .sg) d. plaindoc from http://mercnet.pt/plaindoc/pd.html (only for build process, for code generation from .sg, and for documentation) Although technically not needed to build zxid, you will need an IdP to test against. We do not, at the time, supply one so you will need to find a third party, perhaps a free download of one of the commercial ones like http://symlabs.com/Products/SFIAM.html. 2.2 Canned Tutorial: Running ZXID as CGI under mini_httpd --------------------------------------------------------- While zxid will run easily under Apache httpd (see <>), for sake of simplicity we first illustrate running it with mini_httpd(8), a very simple SSL capable web server by Jef Poskanzer. 2.2.1 Getting and installing mini_httpd ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ You can download the source for mini_httpd from http://www.acme.com/software/mini_httpd/ You should already have installed OpenSSL, or quite probably OpenSSL shipped with your distribution. If it is not located at /usr/local/ssl, the you need to edit the mini_httpd ~Makefile~ to indicate where it is. At any rate you need to uncomment all lines that start by SSL_ in the ~Makefile~. Then say make Now copy the mini_httpd binary somewhere in your path. 2.2.2 Running mini_httpd ~~~~~~~~~~~~~~~~~~~~~~~~ After building zxid, in zxid directory, run mini_httpd -p 8443 -c zxid -S -E zxid.pem where -p 8443 specifies the port to listen to -c zxid specifies that URL paths ending in "zxid" are CGI scripts -S specifies that https is to be used -E zxid.pem specifies the SSL certificate to use See <> for alternative that avoids mini_httpd, but is more complicated otherwise. > N.B. The zxid.pem certificate and private key combo is shipped with zxid > for demonstration purposes. Obviously everybody who downloads zxid > has that private key, so there is no real security what-so-ever. For > production use, you must generate, or acquire, your own private > key-certificate pair (and keep the private key secret). See Certificates > chapter for further info. 2.2.3 Accessing ZXID ~~~~~~~~~~~~~~~~~~~~ Edit your /etc/hosts file so that the definition of localhost also includes sp1.zxidcommon.org and sp1.zxidsp.org domain names, e.g: 127.0.0.1 localhost sp1.zxidcommon.org sp1.zxidsp.org Point your browser to > https://sp1.zxidsp.org:8443/zxid or if you do not want the common domain cookie check > https://sp1.zxidsp.org:8443/zxid?o=E 2.2.4 Setting up an IdP ~~~~~~~~~~~~~~~~~~~~~~~ Currently zxid does not ship with an IdP (though the necessary protocol encoders and decoders are latently available in libzxid, should anyone wish to make an attempt to hack an IdP together). For you to test zxid, you will need to acquire an IdP from somewhere - any vendor whose product is SAML 2.0 certified will do. One possible source is http://symlabs.com/Products/SFIAM.html who have a free download. 
If you do not want to install an IdP yourself (even for testing), find someone who already runs one and ask if they would be willing to load the metadata of your zxid SP. If you do this, you will need to get externally visible domain names. This canned tutorial uses /etc/hosts (see previous step) which is only visible on your own machine. Once you get your IdP up and running, you need to make sure it accepts the zxid SP in its Circle of Trust (CoT). This is done by placing the metadata of the SP in right place in the IdP product configuration. If your IdP supports automatic CoT management, just turn it on and chances are you are done.<> If not, you can obtain the zxid SP metadata (which is slightly different for each install so you can't just copy it from existing install) from > https://sp1.zxidsp.org:8443/zxid?o=B This URL is the +well known location method+ metadata URL. It is also the SP +Entity ID+ or Provider ID, should the IdP product ask for this in its configuration. If the IdP product needs you to supply the metadata manually as an xml file, just point your web browser to the above URL and save to file. zxid SP, by default, has automatic fetching of IdP metadata enabled so there is no manual configuration step needed, provided that the IdP supports the well known location method. All SAML 2.0 certified IdP implementations must support it (but you may still need to enable it in configuration). However, you will need the Entity ID (Provider ID) of the IdP. This is the URL that the IdP uses for well known location method of metadata sharing. You may need to dig the IdP documentation or GUI for a while to find it. If you already have the IdP metadata as an xml file, open it and look for EntityDescriptor/entityID. If you already have the file, you can also import it manually by running following command ./zxid -import file:///path/to/idp-meta.xml But the preferred method still is just let the automatic method do its job. 2.2.5 Your first SSO ~~~~~~~~~~~~~~~~~~~~ 1. Start at > https://sp1.zxidsp.org:8443/zxid or > https://sp1.zxidsp.org:8443/zxid?o=E If you had common domin cookie already in place, and you are already logged in the IdP, the SSO may happen automatically (go to step 3). The automatic experience will be typical when you use SSO regularily for more than one web site (i.e. SP). However, if you get a screen titled "ZXID SP SSO", you need to paste the IdP's Entity ID to the supplied field and click "Login". If zxid SP already obtained the metadata for the IdP, you may also see a button specific for your IdP (and in this case there is no need to know the Entity ID anymore or paste anything). 2. Next step depends on the IdP product you are using. Usually a login screen will appear asking for user name and password. Supply these and login. You will need an account at the IdP. 3. For more slick IdPs, that's all you need to do and you will land right back at the zxid SP page titled "ZXID SP Management". > Congratulations, you have made your first SSO! However, some IdPs will pester you with additional questions and you will have to jump through their hoops. A typical question is whether you want to accept a federation. You do. Sometimes the federation question does not appear automatically and you need to figure out a way to create a federation in their user interface and how to get them to send you back to SP. 
Sometimes the word used is "account linking" instead of federation.<> 3 Configuring and Running ========================= ZXID ships with working demo configuration so you can run it right away and once you are familiar with the concepts, you can return to this chapter. ZXID uses a configuration file in hardwired path<> /var/zxid/zxid.conf for figuring out its parameters. If this file is not present, built-in default configuration is used. The built-in configuration will allow you to test features of ZXID, but should not be used in production because it uses default certificates and private keys. Obviously the demo private key is of public knowledge since it is distributed with the ZXID package, and as such it provides no privacy protection what-so-ever. For production use you MUST generate your own certificate and private key. Usually configuration of a system involves following tasks 1. Configure web server (see your web server documentation) a. HTTPS operation and TLS certificate. In the minimum you need the main site, but you may want to configure the Common Domain Cookie virtual host as well. b. Arrange for ZXID to be invoked. This could mean configuring zxid.x or zxid.pl to be recognized as a CGI script, or it could mean setting up your ~mod_perl~ or ~mod_php~ system to call ZXID at the appropriate place. 2. Configure ZXID, including signing certificate and CoT with peer metadata a. generate or acquire certificate b. Obtain peer metadata (from their well known location) or enable +Instant CoT+ feature. 3. Configure CoT peers with your metadata. They can download your metadata from your well known location (which is the URL that is your entity ID). For this to happen you need to have web server and ZXID up and running. 3.1 Configuration Parameters ---------------------------- 3.1.1 zxidroot ~~~~~~~~~~~~~~ The root directory of ZXID configuration files and directories. By default this is /var/zxid and has following directories and files in it /var/zxid/ | +-- zxid.conf Main configuration file +-- pem/ Our certificates +-- cot/ Metadata of CoT partners (metadata cache) +-- ses/ Sessions `-- log/ Log files, pid files, and the like 3.1.2 pem ~~~~~~~~~ Directory that holds various certificates. The certificates have hardwired names that are not configurable. ca.pem:: Certification Authority certificates. These are used for validating any certificates received from peers (other sites on the CoT). The CA certificates may also be shipped to the peers to facilitate them validating our signatures. This is especially relevant if the certificate is issued by multilayer CA hierarchy where the peer may not have the intermediate CA certificates. sign-nopw-cert.pem:: The signing certificate AND private key (concatenated in one file). The private key MUST NOT be encrypted (there will not be any opportunity to supply decryption password). enc-nopw-cert.pem:: The encryption certificate AND private key (concatenated in one file). The private key MUST NOT be encrypted (there will not be any opportunity to supply decryption password). The signing certificate can be used as the encryption certificate. If encryption certificate is not specified it will default to signing certificate. In addition to the above certificates and private keys, you will need to configure your web server to use TLS or SSL certificates for the main site and the Common Domain site. We suggest the following naming ssl-nopw-cert.pem:: SSL or TLS certificate for main site. 
In order to avoid browser warnings, the CN field of this certificate should
    match the domain name of the site. The SSL certificate can be the same as
    the signing or encryption certificate.

cdc-nopw-cert.pem:: SSL or TLS certificate for the Common Domain Cookie
    introduction site. In order to avoid browser warnings, the CN field of
    this certificate should match the domain name of the site. The SSL
    certificate can be the same as the signing or encryption certificate.

3.1.3 cot
~~~~~~~~~

Directory that holds the metadata of the Circle of Trust (CoT) partners. If
+Instant CoT+ is enabled, this directory needs to be writable at run time.

4 Compilation for Experts
=========================

  make cleaner
  make dep ENA_GEN=1
  make

4.1 Build Process
-----------------

The build process of ZXID relies heavily on code generation techniques that
are not for the faint of heart. Some of these techniques, like xsd2sg.pl, were
innovated for this project, while others, like SWIG and gperf, are existing
software. Here and there some additional perl(1) and sed(1) scripts are run to
fix a thing or two.

[Diagram: the .sg schema grammars are run through xsd2sg.pl to produce C
headers and gperf input; gperf and gen-consts-from-gperf-output.pl refine the
headers; gcc builds libzxid from the generated C; swig produces the Perl and
PHP wrapper sources; ld links libzxid into zxid, Net::SAML, and php_zxid.]

Carefully study the Makefile and this should all start to make sense.

4.2 Special or embedded compile (reduced functionality)
-------------------------------------------------------

libzxid contains thousands of functions and any given application is unlikely
to use them all. Thus the easiest and safest way to reduce the footprint,
without any loss of functionality, is to simply enable compiler and linker
flags that support dead function elimination.

If you need to squeeze zxid into as minimal a space as possible, some
functionality tradeoffs are supported. I stress that you should only attempt
these tradeoffs once you are familiar with zxid and know what you are doing.
The canned install instructions and tutorial walk-throughs stop working if you
omit significant functionality.

4.2.1 Compilation without OpenSSL
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Comment out the -DUSE_OPENSSL flag from CFLAGS in the Makefile and recompile.
This will cripple zxid from a security perspective because it will no longer
be able to verify or generate digital signatures. Unless your environment does
not need trust and security, or you understand thoroughly how to provide trust
and security by other means, it is a very bad idea to compile without OpenSSL.

N.B. Compiling zxid with OpenSSL, or not, does not affect whether your web
server will use SSL or TLS. Unless you know what you are doing, you should be
using SSL at the web server layer. Given that SSL is used at the web server
layer, the savings you would gain from compiling zxid without OpenSSL may be
negligible if you use dynamic linking.

4.2.2 Compilation without libcurl
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Comment out the -DUSE_CURL flag from CFLAGS in the Makefile and recompile.
Disabling libcurl does not have adverse security implications: you only lose
some functionality, and depending on your situation you may well be able to
live without it.

1. Without libcurl, zxid can not act as a SOAP client. This has a few
   consequences
   a. The Artifact profile for SSO is not supported because it needs SOAP to
      resolve the artifact. In most cases a perfectly viable alternative is to
      use the POST profile for SSO.
   b. SOAP profiles for Single Logout and NameID management (aka defederation)
      are not supported. You can use the redirect profiles and get mostly the
      same functionality.

2. Automatic CoT metadata fetching using the well known location method is not
   supported without libcurl. You can fetch the metadata manually, e.g. using
   a web browser, and place it in the /var/zxid/cot directory. If you want to
   manually control your Circle of Trust relationships, you probably want to
   do this anyway, so the loss of automatic functionality is a nonissue.

3. Web Services Client (WSC) functionality is not supported without libcurl.
   Effectively this is just another case of SOAP being needed. If you have
   your own SOAP implementation, you may, with less automation, achieve much
   of the same functionality by calling the encoder and decoder functions
   manually.

4.2.3 Compiling without zlib (not supported)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

zlib is used mainly in the redirect profiles. Since the zlib footprint is
small, we have made no supported provision to compile without it. If you hack
something together, let us know.

4.3 Choosing Which Standards to Compile in (default: all)
----------------------------------------------------------

On space constrained systems you may shed additional weight by only compiling
in the IdM standards you actually use. Of course, if you do not use them the
dead function elimination should take care of them, but sometimes you can gain
additional savings in space and especially in compile time. Another reason
could be, in the land of the free, that if some modules are covered by a
software patent, you may want to compile a binary without the contested
functionality.

You can tweak the flags, shown in the accompanying table, in the Makefile or
by supplying new values on the command line. For example

  make TARGET=sol8 ENA_SAML2=0

would disable SAML 2.0 (and trigger a build for Sparc Solaris 8).

4.4 localconf.mk
----------------

You can use localconf.mk to remember your own make options, such as TARGET and
the different ENA flags, without editing the distributed Makefile. One useful
option to put in localconf.mk is ENA_GEN, which will turn on the dependencies
that trigger generation of the files in the zxid/c directory. For example

  echo 'ENA_GEN=1' >>localconf.mk
  make

5 Net::SAML Perl Module
=======================

* perl CGI example: zxid.pl
* using with mod_perl

After building the main zxid tree, you can

  cd Net
  perl Makefile.PL
  make
  make test      # Tests are extremely sparse at the moment
  make install

This assumes you use the pregenerated Net/SAML_wrap.c and Net/SAML.pm files
that we distribute. If you wish to generate these files from scratch, you need
to have SWIG installed and then say, in the main zxid directory,

  make perlmod   # Makes all available perl modules (including heavy low level ones)
  make samlmod   # Only makes Net::SAML (much faster)
  make wsfmod    # Only makes Net::WSF (much faster)

> WARNING: The low level interface is baroque, and consequently it
> will take a lot of disk space, RAM and CPU to build: 100 MB
> would not be an exaggeration, and over an hour (on a 1GHz CPU). Build
> time memory consumption of a single cc1 process will be over
> 256 MB of RAM. You have been warned.
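Once the module is installed, a quick sanity check is simply to load it and to
syntax-check the supplied SP script before wiring anything into a web server.
This is only a sketch and assumes perl and the module were installed in the
default locations:

  perl -MNet::SAML -e 'print "Net::SAML loaded\n"'
  perl -c zxid.pl      # check that the supplied SP script at least compiles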
5.1 Current major modules are
-----------------------------

* Net::SAML - The high level interfaces for Single Sign-On (SSO)
* Net::SAML::Raw - Low level assertion and protocol manipulation interfaces
* Net::SAML::Metadata - Low level metadata manipulation interfaces

5.2 Planned modules
-------------------

* Net::WSF - The high level interfaces for Web Services Frameworks (WSF)
* Net::WSF::Raw - The low level interfaces for WSF variants
* Net::WSF::WSC - The high level interfaces for Web Services Clients
* Net::WSF::WSC:Raw

5.3 Perl API Adaptations
------------------------

The perl APIs were generated from the C .h files using SWIG. Generally any C
functions and constants that start with zxid_, ZXID_, SAML2_, or SAML_ have
that prefix changed (the functions appear in the Net::SAML:: package instead).
Note, however, that the zx_ prefix is not stripped.

Since ZXID keeps strings in many places in a length + data representation,
namely as ~struct zx_str_s~, SWIG typemaps were used to make the conversion
happen automatically. Thus any C function that takes a ~struct zx_str_s*~
argument can take a perl string directly. Similarly, any C function that
returns such a pointer will return a perl string instead.

As a final goodie, any C function, such as

  struct zx_str_s* zx_ref_len_str(struct zx_ctx* c, int len, char* s);

that takes the ~length~ and the character data as explicit arguments, takes
only a single argument that is a perl string (the one argument automatically
satisfies the two C arguments, thanks to a typemap). The above could be called
like

  $a = Net::SAML::zx_ref_len_str(c, "foo");

First the "foo" satisfies both ~len~ and ~s~, and then the return value is
converted back to a perl string.

5.4 Testing Net::SAML and zxid.pl as CGI script
-----------------------------------------------

To test the perl module, you must restart mini_httpd(8) so that it recognizes
zxid.pl as a CGI script:

  mini_httpd -p 8443 -c zxid.pl -S -E zxid.pem

Then start browsing from

  https://sp1.zxidsp.org:8443/zxid.pl

or, if you want to avoid the common domain cookie check,

  https://sp1.zxidsp.org:8443/zxid.pl?o=E

5.5 Testing Net::SAML and zxid.pl under mod_perl
------------------------------------------------

You can run zxid.pl under mod_perl using the Apache::Registry module. See the
relevant chapter for how to compile Apache to support mod_perl. After
configuration it should work the same as the CGI approach.

5.6 Debugging Net::SAML with GDB
--------------------------------

As bizarre as it may sound, it is actually quite feasible to debug libzxid and
the SAML_wrap.c using GDB while in perl. For example

  cd zxid
  gdb /usr/local/bin/perl
  set env QUERY_STRING=o=E
  r ./zxid.pl

If the script crashes inside the C code, GDB will perfectly reasonably take
control, allowing you to see a stack back-trace (bt) and examine variables. Of
course it helps if openssl and perl were compiled with debug symbols (libzxid
is compiled with debug symbols by default), but even if they weren't you can
usually at least get some clue.

When preparing a perl module, the Makefile.PL mechanism generally causes the
same compilation flags to be used as were used to compile perl itself.
Generally this is good, but if libzxid was compiled with different flags,
mysterious errors can crop up. For example, I compile my libzxid against an
openssl that I have also compiled myself. However, I once had a bug where perl
had been compiled such that the Linux distribution's incompatible openssl
would be picked up by the perl compile flags, resulting in mystery crashes
deep inside the openssl ASN.1 decoder routines (c2i_ASN1_INTEGER() while in
d2i_X509() to be exact).
When I issued `info files' in GDB I finally realized that I was using the wrong openssl library. 6 PHP extension php_zxid.so =========================== The PHP integration is incomplete due to incomplete support in SWIG for php5. However, enough interface exists to get most high level API working and thus successfully run an SP. After building main zxid distribution, say make phpzxid You MUST have php-config(1) in path. If not, try make phpzxid PHP_CONFIG=/path/to/php-config If the extension built successfully, you can use it by copying it to a suitable place, e.g. make phpzxid_install The install again uses the php-config(1) to figure out where php(1) can find the module. Then add in your script dl("php_zxid.so"); // Load the module You may need to tweak the paths, or LD_LIBRARY_PATH, to get this to work. 7 Python Extension ================== TBD using SWIG 8 Java Native API ================= TBD using SWIG 9 Integration with Existing Web Sites ===================================== Single Sign-On is used to protect some useful resources. ZXID does not have any means of serving these resources, rather a normal web server or application server should do it. ZXID should just concentrate on verifying that a user has valid session, and if not, establishing the session by way of SSO. 9.1 Brief Overview of Control Flow ---------------------------------- The SAML 2.0 specifications mandate a wire protocol, and in order to speak the wire protocol, the SP application typically has to follow certain standard sequence of control flow. <> First a user<> tries to access a web site that acts in SP role. This triggers following sequence of events 1. User is redirected to URL in a common domain. This is so that we can read the Common Domain Cookie that indicates which IdP the user uses. Alternatively, if you started at https://sp1.zxidsp.org:8443/zxid?o=E, the CDC check is by-passed and flow 2b. happens. 2. After the CDC check, a Authentication Request (AuthnReq) is generated. The IdP may have been chosen automatically using CDC (2a), or there may have been some user interface interaction (not show in the diagram) to choose the IdP. 3. User is redirected to the IdP. The redirection carries as a query string a compressed and encoded form of the SAML 2.0 AuthnReq. 4. Once the IdP has authenticated the user, or observed that there already is a valid IdP session (perhaps from a cookie), the IdP redirects the user back to the SP. The AuthnResponse may be carried in this redirection in a number of alternate ways a. The redirect contains a special token called +artifact+. The artifact is a reference to the AuthnResponse and the SP needs to get the actual AuthnResponse by using a SOAP call (the 4bis step). b. The "redirect" is actually a HTML page with a form and little JavaScript that causes the form to be automatically posted to the SP. The AuthnResponse is carried as a form field. 5. After verifying that AuthnResponse indicated a success, the SP establishes a local session for the user (perhaps setting a cookie to indicate this). Depending on how the SP to web site integration is done the user is taken to the web site in one of the two ways a. Redirect to the content. This time the session is there, therefore the flow passes directly from check session to the web content. b. It is also possible to show the content directly without any intervening redirection. 
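To make the flow above concrete, here is a minimal sketch of the session check
an integrated web site might perform before handing the user off to the SP for
SSO. This is plain CGI pseudologic, not the actual zxid API: the cookie name
and the idea of checking it this way are illustrative assumptions.

  /* Hypothetical sketch: serve protected content if a session cookie is
   * present, otherwise redirect to the zxid SP endpoint (o=E skips the
   * Common Domain Cookie check, as described in step 1 above). */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  static int have_valid_session(const char* cookies) {
    /* Assumption: the integration marks a valid session with this cookie. */
    return cookies && strstr(cookies, "ZXIDSES=") != NULL;
  }

  int main(void) {
    const char* cookies = getenv("HTTP_COOKIE");
    if (have_valid_session(cookies)) {
      printf("Content-Type: text/html\r\n\r\n<h1>Protected content</h1>\n");
    } else {
      /* No local session: let the zxid SP run the SSO steps 1-4. */
      printf("Location: https://sp1.zxidsp.org:8443/zxid?o=E\r\n\r\n");
    }
    return 0;
  }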
9.2 Redirect Approach to Integration
------------------------------------

9.3 Pass-thru Approach to Integration
-------------------------------------

9.3.1 mod_perl pass-thru
~~~~~~~~~~~~~~~~~~~~~~~~

9.3.2 PHP pass-thru
~~~~~~~~~~~~~~~~~~~

9.3.3 mod_zxid pass-thru
~~~~~~~~~~~~~~~~~~~~~~~~

9.4 Proxy Approach to Integration
---------------------------------

10 Native C API
===============

The generated aspects of the native C API are in c/*-data.h, for example
c/saml2-data.h. Studying this file is very instructive.

10.1 C Data Structures
----------------------

From .sg a header (NN-data.h) is generated. This header contains structs that
represent the data of the elements. Each element and attribute generates its
own node. Even trivial nodes like strings have to be kept this way because the
nodes form the basis of remembering the ordering of the data. This ordering is
needed for exclusive XML canonicalization, and thus for signature
verification. Any missing data is represented by a NULL pointer. Any repeating
data is kept as a linked list, in reverse order of being seen in the data
stream. Simple elements and all attributes are represented by a simple string
node (even if they are booleans or integers).

*Example*

Consider an XML signature blob: a Signature containing a SignedInfo (with
CanonicalizationMethod, SignatureMethod, and a Reference carrying Transforms,
a DigestMethod, and a DigestValue such as "lNIzVMrp8CwTE=") followed by a
SignatureValue ("GeMp7LS...vnjn8=").

Decoding would produce the data structure shown in the accompanying figure
(omitted here): a tree of zx_ds_Signature_s, zx_ds_SignedInfo_s,
zx_ds_CanonicalizationMethod_s, zx_ds_SignatureMethod_s, zx_ds_Reference_s,
zx_ds_Transforms_s, zx_ds_Transform_s, zx_ds_DigestMethod_s, and
zx_ds_SignatureValue_s nodes, with the gg.kids and gg.g.wo links preserving
the wire order. You should also look at c/saml2-data.h to see the structs
involved in this example.

There are two pointer systems at play here.
The black solid arrows depict the logical structure of the XML document. For
each child element there is a struct field that simply points to the child. If
there are multiple occurrences of the child, as in
~sig->SignedInfo->Reference->Transforms->Transform~, the children are kept in
a linked list connected by gg.g.n (next) fields.

The +wire order+ structure, depicted by red dashed arrows, is maintained using
the gg.kids and gg.g.wo fields. For example
~sig->SignedInfo->Reference->Transforms~ keeps its kids, the ~zx_ds_Transform~
objects, in the original order hanging from the kids field and linked with the
wo field. As can be seen, the order kept with the wo fields can be different
from the one kept using the n (next) fields. What's more, the kids list can
contain dissimilar objects, witness ~sig->SignedInfo->Reference->gg.kids~.

The wire order representation is only captured when decoding a document and is
mainly useful for correctly canonicalizing the document for signature
verification. If you are building a data structure in your own program, you
typically will not set the gg.kids and gg.g.wo fields.

In the diagram, objects of type ~zx_str_s~ were collapsed to double quoted
strings. Superfluous gg.kids, gg.g.wo, and gg.g.n fields were omitted: they
exist in all structures, but are not shown when they are ~NULL~. The ~NULL~ is
depicted as zero (0).

10.1.1 Handling Namespaces
~~~~~~~~~~~~~~~~~~~~~~~~~~

An annoying feature of XML documents is that they can use variable namespace
prefixes. The namespace prefix for the unqualified elements is taken to be the
one specified in the target() directive of the .sg input. The name of an
element in C code is formed by prefixing the element with the namespace prefix
and an underscore. Attributes will only have a namespace prefix if one was
expressly specified in the .sg input.

When decoding, the actual namespace prefixes are recorded. The wire order
encoder knows to use these recorded prefixes so that an accurate
canonicalization for XMLDSIG can be produced. Even if the message on the wire
uses wrong namespaces, the wrong ones are remembered so that canonicalization
for signature validation will work irrespective. The ability to accept wrong
namespaces only works as long as there is no ambiguity as to which tag was
meant - there are some tags that need namespace information to be
distinguished. If you hit one of these, then either you get lucky and the one
that is arbitrarily picked by the decoder happens to be the correct one, or
you are stuck with no easy way to make it right. Of course the XML document
was wrong to start with, so theoretically this is not a concern. Generally,
the more schemas that are simultaneously generated into one package, the
greater the risk of collisions between tags.

The schema order encoder always uses the prefixes defined using the target()
directives in the .sg files.

The runtime notion of namespaces is handled by the ~ns_tab~ field of the
decoding and encoding context. It is initialized to contain all namespaces
known by virtue of the .sg declarations. The runtime assigned prefixes are
held in a linked list hanging from the next field of ~struct zx_ns_s~.
(*** more work needed here)

The code generation creates a file such as c/saml2-ns.c which contains the
initialization for the table. The main program should point the ns_tab field
of the context as follows:

  main {
    struct zx_ctx* ctx;
    ...
    ctx->ns_tab = zx_ns_tab;  /* Here zx_ is the chosen prefix */
  }

Consider the following evil contortion: an outer element E declares prefix e
for the namespace URI "uri" and contains elements H and B, which declare
prefixes h and b for the same URI. Inside B, element C reuses prefix e,
element D redeclares prefix e to mean a different URI, "iru", and element F
inside D redeclares prefix e to mean "uri" again.

Assuming the ~ns_tab~ assigns some other prefix to the namespace URI "uri", we
would have the data structure shown in the accompanying figure (omitted here)
as a result of a decode: each runtime prefix gets an "alias" node chained to
the ns_tab entry for its URI, and each element points to the alias node it
used.

The red thin arrows in that figure indicate how the elements reference the
namespaces. Since none of the elements used the prefix originally specified in
the schema grammar target() directive, we ended up allocating "alias" nodes
for the URI. However, since E and C use the same prefix, they share the alias
node. Things get interesting with D: it redefines the prefix e to mean a
different namespace URI, "iru", which happens to be an alias of prefix z.

Later, when the wire order canonical encode is done, the red thin arrows are
chased to determine the namespaces. However, we need to keep a separate "seen"
table to track whether a parent has already declared the prefix and URI. E
would declare xmlns:e="uri", but C would not because it had already been
"seen". However, F would have to declare it again because the xmlns:e="iru" in
D masks the declaration. The ~zx_ctx~ structure is used to track the
namespaces and the "seen" status throughout the decoders and encoders.

A second figure (omitted here) shows how the ~seen_n~ list, represented by
blue dotted arrows, is built: at the head of the list, ~ctx->seen_n~, is the
last seen prefix, namely b (because, although the meaning of e at F was
different, e as a prefix had already been seen earlier at E), followed by the
other prefixes in inverse order of first occurrence. The green dashed arrows
from e:uri to e:iru and then on to the second e:uri reflect the fact that
e:uri (the second one) was put on the list first (when we were at E), but
later, at D, a different meaning, iru, was given to prefix e. Finally at F we
again give a different meaning to e, thus pushing another node onto the "seen
stack". Although e at E and at F have the same namespace URI, "uri", we are
not able to use the same node because we need to keep the stack order. Thus we
are forced to allocate two identical nodes.

10.1.2 Handling any and anyAttribute
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since our aim is to be lax in what we accept, every element can handle
unexpected additional attributes as well as unexpected elements. Thus, whether
the schema specifies any or anyAttribute or not, we handle everything as if
they were there.
However, when attributes and elements are received outside of their expected
context, they are simply treated as strings with string names. This is true
even for those attributes and elements that would be recognizable in their
proper context.

The any extension points, as well as some bookkeeping data, are hidden inside
the ~ZX_ELEM_EXT~ macro. If you tinker with this macro, be sure you know what
you are doing. If you want to add your own specific fields to all structs,
redefining ~ZX_ELEM_EXT~ may be appropriate, but if you want to add more
fields only to some specific structures, you can define a macro of the form
TPF_EEE_EXT and put in it whatever fields you want. These fields will be
initialized to zero when the structure is created, but are not touched in any
other way by the generated code. In particular, if some of your fields are
pointers, it will be your responsibility to free them. The standard free
functions will not know how to free them. See the data structure walking
functions, below, for one way to accomplish this.

10.1.3 Root data structure
~~~~~~~~~~~~~~~~~~~~~~~~~~

The root data structure

  struct zx_root_s;

is a special structure that has a field for every top level recognizable
element.

10.1.4 Per element data structures
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

*** TBW

10.1.5 Memory Allocation
~~~~~~~~~~~~~~~~~~~~~~~~

After decoding, all string data points directly into the input buffer, i.e.
strings are NOT copied. Be sure not to free the input buffer until you are
done processing the data structure. If you need to take a copy of the strings,
you will need to walk the data structure as a post processing step and do your
copies. This can be done using

  void TPF_dup_strs_len_NS_EEE(struct zx_dec_ctx* c, struct TPF_NS_EEE_s* x);

The structures are allocated via the ZX_ZALLOC() macro, which by default calls
the zx_zalloc() function, which in turn uses system malloc(3). However, you
can redefine the macro to use whatever other allocation scheme you desire.

The generated libraries never free(3) memory. In many programming patterns
this is actually desirable: for example a CGI program can count on dying - the
process exit(2) will free all the memory. If you need to free(3) the data
structure, you will need to walk it using

  void TPF_free_len_NS_EEE(struct zx_dec_ctx* c, struct TPF_NS_EEE_s* x, int free_strings);
  void zx_free_any(struct zx_dec_ctx* c, struct zx_note_s* n, int free_strs);

The zx_free_any() works by having a gigantic switch statement that calls the
appropriate specific free function.

You can deep clone the data structure with

  void TPF_deep_clone_NS_EEE(struct zx_dec_ctx* c, struct TPF_NS_EEE_s* x, int dup_strings);
  struct zx_note_s* zx_clone_any(struct zx_dec_ctx* c, struct zx_note_s* n, int dup_strs);

The zx_clone_any() works by having a gigantic switch statement that calls the
appropriate specific clone function.

10.2 Decoder as Recursive Descent Parser
----------------------------------------

The entry point to the decoder is

  struct zx_root_s* zx_DEC_root(struct zx_dec_ctx* c, struct zx_ns_s* dummy, int n_decode);

The decoding context holds a pointer to the raw data and must be initialized
prior to calling the decoder. The third argument specifies how many recognized
elements are decoded before returning. Usually you would specify 1 to consume
one top level element from the stream.

The returned data structure, ~struct zx_root_s~, contains one pointer for each
type of top level element that can be recognized.
The ~tok~ field of the returned value identifies the last top level element recognized and can be used to dispatch to correct request handler: zx_prepare_dec_ctx(c, TPF_ns_tab, start_ptr, end_ptr); struct TPF_root_s* x = TPF_DEC_root(c, 0, 1); switch (x->gg.g.tok) { case TPF_NS_EEE_ELEM: return process_EEE_req(x->NN_EEE); } When processing responses, it is generally already known which type of response you are expecting, so you can simply check for NULLness of the respective pointer in the returned data structure. Internally zx_DEC_root() works much the same way: it scans a beginning of an element from the stream, looks up the token number corresponding to the element name, and switches on that, calling element specific decoder functions (see next section) to do the detailed processing. In the above code fragment, you should note the call to zx_prepare_dec_ctx() which initializes the decoder machinery. It takes +ns_tab+ argument, which specifies which namespaces will be recognized. This table MUST match the TPF_DEC_root() function you call (i.e. both must have been generated as part of the same xsd2sg.pl invocation). The other arguments are the start of the buffer to decode and pointer one past the end of the buffer to decode. 10.2.1 Element Decoders ~~~~~~~~~~~~~~~~~~~~~~~ For each recognizable element there is a function of form struct TPF_NS_EEE_s* zx_DEC_NS_EEE(struct zx_dec_ctx* c); where TPF is the prefix, NS is the namespace prefix, and EEE is the element name. For example: struct zx_se_Envelope_s* zx_DEC_se_Envelope(struct zx_ctx* c); These functions work much the same way as the root decoder. You should consult dec-templ.c for the skeleton of the decoder. Generally you should not be calling element specific decoders: they exist so that zx_DEC_root() can call them. They have somewhat nonintuitive requirtements, for example the opening <, the namespace prefix, and the element name must have already been scanned from the input stream by the time you call element specific decoder. 10.2.2 Decoder Extension Points ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The generated code is instrumented with following macros ZX_ATTR_DEC_EXT(ss):: Extension point called just after decoding known attribute ZX_XMLNS_DEC_EXT(ss):: Extension point called just after decoding xmlns attribute ZX_UNKNOWN_ATTR_DEC_EXT(ss):: Extension point called just after decoding unknown attr ZX_START_DEC_EXT(x):: Extension point called just after decoding element name and allocating struct, but before decoding any of the attributes. ZX_END_DEC_EXT(x):: Extension point called just after decoding the entire element. ZX_START_BODY_DEC_EXT(x):: Extension point called just after decoding element tag, including attributes, but before decoding the body of the element. ZX_PI_DEC_EXT(pi):: Extension point called just after decoding processing instruction ZX_COMMENT_DEC_EXT(comment):: Extension point called just after decoding comment ZX_CONTENT_DEC(ss):: Extension point called just after decoding string content ZX_UNKNOWN_ELEM_DEC_EXT(elem):: Extension point called just after decoding unknown element Following macros are available to the extension points TPF:: Type prefix (as specified by -p during code generation) EL_NAME:: Namespaceful element name (NS_EEE) EL_STRUCT:: Name of the struct that describes the element EL_NS:: Namespace prefix of the element (as seen in input schema) EL_TAG:: Name of the element without any namespace qualification. 
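As a condensed illustration of the decoder entry point described above, the
following sketch decodes one top level element from a buffer using the zx_
flavored functions named in this document. The context setup is simplified and
the ~Envelope~ field name follows the examples elsewhere in this README; a
real program may need additional context initialization.

  /* Sketch: decode one top level element and check what was recognized.
   * Assumes the generated zxid headers are included and that the caller has
   * allocated and zeroed the context; error handling is minimal. */
  int handle_buffer(struct zx_ctx* ctx, char* buf, int len)
  {
    struct zx_root_s* x;

    ctx->ns_tab = zx_ns_tab;                   /* namespace table from c/saml2-ns.c */
    zx_prepare_dec_ctx(ctx, zx_ns_tab, buf, buf + len);
    x = zx_DEC_root(ctx, 0, 1);                /* consume one top level element */
    if (!x)
      return -1;                               /* nothing recognizable was decoded */
    if (x->Envelope)                           /* one field per recognizable top level element */
      return 1;                                /* got a SOAP Envelope; dispatch as appropriate */
    return 0;
  }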
10.3 Exclusive Canonical Encoder
--------------------------------

The encoder receives a C data structure and generates a gigantic string
containing an XML document corresponding to the data structure and the input
schemata. The XML document conforms to the rules of exclusive XML
canonicalization and hence is useful as input to XMLDSIG.

One encoder is generated for each root node specified at code generation time.
Often these encoders share code for interior nodes.

The encoders allow two pass rendering. You can first use the length
computation method to calculate the amount of storage needed and then call one
of the rendering functions to actually render. Or, if you simply have a large
enough buffer, you can just render directly.

The encoders take as an argument the next free position in the buffer and
return a char pointer one past the last byte used. Thus you can discover the
length after rendering by subtracting the pointers. This is guaranteed to be
the same length as returned by the length computation method. You can also
call the next encoder with the return value of the previous encoder to render
back-to-back elements.

The XML namespace and XML attribute handling of the encoders is novel in that
the specified sort is done already at code generation time, i.e. the renderers
are already in the order that the sort mandates. For attributes we know the
sort order directly from the schema because [xml-c14n], sec 2.2, p.7,
specifies that they sort first by namespace URI and then by name, both of
which we know from the schema. For ~xmlns~ specifications the situation is
similarly easy in the schema order encoder case because we know the namespace
prefixes already at code generation time. However, the wire order encoder
actually needs a runtime sort because we can not control which namespace
prefixes get used. In both cases, though, we can make a pretty good guess
about which namespaces might need to be declared at any given element: the
element's own namespace and the namespaces of each of its attributes. That's
all, and it's all known at code generation time. At runtime we only need to
check whether the namespace has already been seen at an outer layer.

10.3.1 Length computation
~~~~~~~~~~~~~~~~~~~~~~~~~

Compute the length of an element (and its subelements). The XML attributes and
elements are processed in schema order.

  int TPF_LEN_SO_NS_EEE(struct zx_ctx* c, struct TPF_NS_EEE_s* x);

For example:

  int zx_LEN_SO_se_Envelope(struct zx_ctx* c, struct zx_se_Envelope_s* x);

Compute the length of an element (and its subelements). The XML namespaces and
elements are processed in wire order.

  int TPF_LEN_WO_NS_EEE(struct zx_ctx* c, struct TPF_NS_EEE_s* x);

For example:

  int zx_LEN_WO_se_Envelope(struct zx_ctx* c, struct zx_se_Envelope_s* x);

10.3.2 Encoding in schema order
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Render an element into a string. The XML elements are processed in schema
order. The xmlns declarations and XML attributes are always sorted per the
[xml-exc-c14n] rules. This is what you generally want for rendering a new data
structure to a string. The wo pointers are not used.

  char* TPF_ENC_SO_NS_EEE(struct zx_ctx* c, struct TPF_NS_EEE_s* x, char* p);

For example:

  char* zx_ENC_SO_se_Envelope(struct zx_ctx* c, struct zx_se_Envelope_s* x, char* p);

Since it is a very common requirement to allocate a correctly sized buffer and
then render an element, a helper function is provided to do this in one step.
  struct zx_str_s* zx_EASY_ENC_SO_se_Envelope(struct zx_ctx* c, struct zx_se_Envelope_s* x);

The returned string is allocated from the allocation arena described by ~zx_ctx~.

10.3.3 Encoding in wire order
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Render an element into a string. The XML elements are processed in wire order
by chasing the wo pointers. This is what you want for validating signatures on
other people's XML documents. If the wire representation was schema invalid,
e.g. elements were in the wrong order, the wire representation is still
respected, except for xmlns declarations and XML attributes, which are always
sorted, per exc-c14n rules. For each element a function is generated as follows

  char* TPF_ENC_WO_NS_EEE(struct zx_ctx* c, struct TPF_NS_EEE_s* x, char* p);

For example

  char* zx_ENC_WO_se_Envelope(struct zx_ctx* c, struct zx_se_Envelope_s* x, char* p);

A helper function is also available

  struct zx_str_s* zx_EASY_ENC_WO_se_Envelope(struct zx_ctx* c, struct zx_se_Envelope_s* x);

10.4 Signatures (XMLDSIG)
-------------------------

10.4.1 Signature Generation
~~~~~~~~~~~~~~~~~~~~~~~~~~~

*** TBW

10.4.2 Signature Validation
~~~~~~~~~~~~~~~~~~~~~~~~~~~

For signature validation you need to walk the decoded data structure to locate
the signature as well as the references, and pass them to zxsig_validate().
The validation involves wire order exclusive canonical encoding of the
referenced XML blobs, computation of SHA1 or MD5 checksums over them, and
finally computation of a SHA1 checksum over the SignedInfo element and
validation of the actual SignatureValue against that. The validation involves
public key decryption using the signer's certificate.

A nasty problem in exclusive canonicalization is that namespaces that are
needed in the blob may actually appear in the containing XML structures, thus
in order to know the correct meaning of a namespace prefix, we need to perform
the +seen+ computation for all elements outside and above the blob of interest.

To verify a signature, you have to do a certain amount of preparatory work to
locate the signature and the data that was signed. Generally what should be
signed will be evident from the protocol specifications or from the security
requirements of your application environment. Conversely, if there is a
signature but it does not reference the appropriate elements, it's worthless
and you might as well reject the document without even verifying the signature.

*Example*

  struct zxsig_ref refs[1];
  cf = zxid_new_conf("/var/zxid/");
  ent = zxid_get_ent_from_file(cf, "YV7HPtu3bfqW3I4W_DZr-_DKMP4.");
  refs[0].ref = r->Envelope->Body->ArtifactResolve
                   ->Signature->SignedInfo->Reference;
  refs[0].blob = (struct zx_elem_s*)r->Envelope->Body->ArtifactResolve;
  res = zxsig_validate(cf->ctx, ent->sign_cert,
                       r->Envelope->Body->ArtifactResolve->Signature,
                       1, refs);
  if (res == ZXSIG_OK) {
    D("sig vfy ok %d", res);
  } else {
    ERR("sig vfy failed due to(%d)", res);
  }

This code illustrates

1. You have to determine who signed and provide the entity object that
   corresponds to the signer. Often you would determine the entity from an
   issuer element somewhere inside the message. The entity is used for
   retrieving the signing certificate. Another alternative is that the
   signature itself contains a KeyInfo element and you extract the certificate
   from there. You would still need to have a way to know whether you trust
   the certificate.

2. You have to prepare the refs array. It contains Reference specifications
   paired with the actual elements that are signed. Generally the URI XML
   attribute of the Reference element points to the data that was signed.
   However, it is application dependent what type of ID XML attribute the URI
   actually references, or the URI could even reference something outside the
   document. It would be way too unreliable for zxsig_validate() to attempt to
   guess how to locate the signed data: therefore we push the responsibility
   to you. Your code will have to walk the data to locate all referenced bits
   and pieces. In the above example, locating the one signed bit was very
   easy: the specification says where it is (and this location is fixed, so
   there really is no need to check the URI either). You pass the length of
   the refs array and the array itself as the two last arguments to
   zxsig_validate().

3. You need to locate the Signature element in the document and pass it as an
   argument to zxsig_validate(). Usually a protocol specification will say
   where the element is to be found, so locating it is not difficult.

4. The return value will indicate the validation status. ZXSIG_OK, which has
   the numerical value of 0, indicates success. Other nonzero values indicate
   various kinds of failure.

10.4.3 Certificate Validation and Trust Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The trust models for TLS and signature validation are separate. In signature
validation the primary trust mechanism is that the entity's metadata specifies
the signing certificate and there is no Certification Authority check at all.
This model works well if you control the admission to your CoT. However, ZXID
ships by default with the automatic CoT feature turned on, thus anyone can get
added to the CoT, and therefore a signature with any certificate they declare
is "valid". This hardly is acceptable for anything involving money.

10.5 Data Accessor Functions
----------------------------

Simple read access to data should, in C, be done by simply referencing the
fields of the struct, e.g.

  if (!r->EntitiesDescriptor->EntityDescriptor) goto bad_md;

*** TBW

10.6 Memory Allocation and Free
-------------------------------

*** TBW

10.7 Walking the data structure
-------------------------------

*** TBW

10.9 Thread Safety
------------------

All generated libraries are designed to be thread safe, provided that the
underlying libc APIs, such as malloc(3), are thread safe.

11 Creating New Interfaces Using ZXID Methodology
=================================================

The ZXID code generation methodology can be used to create interfaces to any
XML document or protocol that can be described as a Schema Grammar (which
includes any document that can be expressed as XML Schema - XSD). The general
steps are

1. Convert the .xsd file to .sg, or write the .sg directly. For conversion,
   you would typically use a command like

     ~/pd/xsd2sg.pl foo.sg

2. Tweak and rationalize the resulting .sg file. In an ideal world any
   construct expressible as .xsd should be nicely representable, but in
   practice some work better than others, thus you can create a much nicer
   interface if you invest in some manual tweaking. Note that the tweaked .sg
   is still able to represent the same document as the original .xsd
   described, though often the tweaking causes some relaxation. The most
   common tweaks:

   a. If the .xsd is written so that the targeted namespace is also the
      default namespace, you should introduce a namespace prefix because this
      is needed during code generation to keep different C identifiers from
      clashing with each other. Ideally you should coordinate the namespace
      prefixes globally so that even two different projects will not clash.
   b. Where the choice construct is used, indicated by the pipe symbol (|) in
      the .sg file, you should refactor it into a sequence of zero-or-one
      occurrence (?) instances of the alternatives of the choice. This is
      needed because for the foreseeable future xsd2sg.pl has a limitation in
      its code generation. If the choice has maxOccurs="unbounded" you should
      use (*) instead.

   c. xml:lang and other similar attributes may need to be factored open to be
      just of type %xs:string. This is a bug in xsd2sg.pl.

3. "Connect" the schema to the bigger framework. Usually this means adding
   your schema grammar to the ZX_SG variable in zxid/Makefile and supplying
   additional -r flags in the ZX_ROOT variable. This allows your new schema to
   be visible at top level. If your schema is meant to extend leafs or
   interior nodes of the parse tree, such as the SOAP Body, you would edit the
   SOAP schema to accept your new protocol elements in the Body. Or the
   generic SOAP header could accept your specific header schemata, or the SAML
   attribute definitions could accept your kind of attributes - whatever makes
   sense in your context.

   The alternative to this is to create an entirely new monolithic encoder and
   decoder, i.e. instead of extending the existing ZXID project to accommodate
   your new protocol, you just start a new project that uses the same
   methodology. You should see how the SAML protocol part is separated from
   the SAML metadata parsing and from the WSF parsing in the existing project.

12 ZXID Project
===============

Immediate goal: build a SAML 2.0 SP and ID-WSF 1.1 WSC

Goals of the ZXID project include

* SOAP 1.1 support (done)
* SAML 2.0 compliance
  - SP role (done)
  - IdP role
* Liberty ID-FF 1.2 support
  - SP
  - IdP
  - SAML 1.1
* Liberty ID-WSF 1.1 support
  - Discovery bootstrap
  - Discovery WSC
  - ID-DAP WSC
  - ID-DAP WSP
* Liberty ID-WSF 2.0 support
  - Discovery bootstrap
  - Discovery WSC
  - ID-DAP WSC
  - ID-DAP WSP

12.1 Project Layout
-------------------

The following directory layout is used by the project. Many of the specified
directories are used by intermediate outputs that are not distributed in
tarball releases, but may or may not be present in CVS checkouts.

  zxid-0.xx
  |
  +-- Net     The Net::SAML perl module
  +-- xsd     XML schema descriptions of protocols (not distributed)
  +-- sg      Schema Grammar (.sg) descriptions of protocols
  +-- c       C code generated from the Schema Grammar descriptions
  +-- tex     Temporary files for document generation using PlainDoc (not distributed)
  +-- html    HTML documentation generated using PlainDoc
  +-- review  Publicly released announcements and documents (not distributed)
  +-- t       Test scripts and expected test outputs
  `-- tmp     Temporary files, such as actual test outputs

The Manifest file, which follows, explains each file in more detail.

12.2 Protocol Encoders and Decoders
-----------------------------------

The protocol encoders and decoders are generated automatically from the schema
grammar (.sg) descriptions. This ensures an accurate protocol implementation.
While the output is strictly schema driven and correct, the decoders have some
provisions to accept some deviations from the strict spec (e.g. out of order
elements are tolerated). However, one should note that XMLDSIG does not
tolerate very much deviation, thus even if the decoder accepts a slightly
ill-formed message, it is likely to fail in signature verification.

There are three outputs from the generation

1. Data structures describing the data (xx.h)
2. Encoder that linearizes the data structure to the wire protocol (xx-enc.c)
12 ZXID Project
===============

Immediate goal: build a SAML 2.0 SP and ID-WSF 1.1 WSC.

Goals of the ZXID project include

* SOAP 1.1 support (done)
* SAML 2.0 compliance
  - SP role (done)
  - IdP role
* Liberty ID-FF 1.2 support
  - SP
  - IdP
  - SAML 1.1
* Liberty ID-WSF 1.1 support
  - Discovery bootstrap
  - Discovery WSC
  - ID-DAP WSC
  - ID-DAP WSP
* Liberty ID-WSF 2.0 support
  - Discovery bootstrap
  - Discovery WSC
  - ID-DAP WSC
  - ID-DAP WSP

12.1 Project Layout
-------------------

The following directory layout is used by the project. Many of the specified directories are used by intermediate outputs that are not distributed in tarball releases, but may or may not be present in CVS checkouts.

  zxid-0.xx
   |
   +-- Net     The Net::SAML perl module
   +-- xsd     XML schema descriptions of protocols (not distributed)
   +-- sg      Schema Grammar (.sg) descriptions of protocols
   +-- c       C code generated from the Schema Grammar descriptions
   +-- tex     Temporary files for document generation using PlainDoc (not distributed)
   +-- html    HTML documentation generated using PlainDoc
   +-- review  Publicly released announcements and documents (not distributed)
   +-- t       Test scripts and expected test outputs
   `-- tmp     Temporary files, such as actual test outputs

The Manifest file, which follows, explains each file in more detail.

<> >>

12.2 Protocol Encoders and Decoders
-----------------------------------

The protocol encoders and decoders are generated automatically from the schema grammar (.sg) descriptions. This ensures an accurate protocol implementation. While the output is strictly schema driven and correct, the decoders have some provisions to accept deviations from the strict spec (e.g. out of order elements are tolerated). However, one should note that XMLDSIG does not tolerate very much deviation, thus even if the decoder accepts a slightly ill-formed message, it is likely to fail in signature verification.

There are three outputs from the generation

1. Data structures describing the data (xx.h)
2. Encoder that linearizes the data structure to the wire protocol (xx-enc.c)
3. Decoder that converts a wire protocol byte stream to a data structure (xx-dec.c)

12.3 Standards and Namespaces
-----------------------------

ZXID consistently uses the same namespace prefixes throughout the project. The generated encoders and decoders support the following schemas

<>

13 Code Generation Tools
========================

The main workhorse of code generation is xsd2sg.pl, which serves multiple purposes

1. Build hashes of all declarations in the .sg input. Each hash element consists of an array of elements and attributes, as well as groups and attribute groups. The type of each array element is determined from its prefix, per .sg rules.
2. Expand groups and attribute groups
3. Evaluate each element with respect to its type and generate
   a. C data structures
   b. Decoder grammar
   c. Token descriptions for perfect hash and lexical analyzer
   d. Encoder C code

The code to build the hashes is interwoven with the code that generates .xsd from .sg. The rest of the generation happens in a function called generate().

Typical command line (to generate the SAML 2.0 protocol engine)

  ~/plaindoc/xsd2sg.pl -d -gen saml2 -p zx_ \
      -r saml:Assertion -r se:Envelope \
      -S \
      sg/saml-schema-assertion-2.0.sg \
      sg/saml-schema-protocol-2.0.sg \
      sg/xmldsig-core.sg \
      sg/xenc-schema.sg \
      sg/soap11.sg \
      >/dev/null </dev/null

>>

To generate the SAML 2.0 Metadata engine you would issue

  ~/plaindoc/xsd2sg.pl -d -gen saml2md -p zx_ \
      -r md:EntityDescriptor -r md:EntitiesDescriptor \
      -S \
      sg/saml-schema-assertion-2.0.sg \
      sg/saml-schema-metadata-2.0.sg \
      sg/xmldsig-core.sg \
      sg/xenc-schema.sg \
      >/dev/null </dev/null

>>

13.1 Special Support for Specific Programming Languages
--------------------------------------------------------

While C code generation is the main output, and this can always be converted to other languages using SWIG, sometimes a more natural language-specific interface can be built by generating it directly. We plan to enhance the code generation to do something like this. At least direct hash-of-hashes-of-arrays-of-hashes type data structure generation for the benefit of some scripting languages is planned.

14 ZXID SP
==========

*** warning: not checked lately, may be wrong!

<>

*** add description of CGI fields

15 Certificates
===============

*** TBD - This chapter should be elaborated to be a certificate tutorial with the following contents:

* Intro to certs and private keys
* Generating a self signed cert
* Generating a certificate signing request and using it to obtain a commercially issued cert
* Installing root certs so you can recognize other people's certs
* Client TLS considerations

For the time being, the short answer is that ZXID uses OpenSSL and PEM format certificates. You can use the same techniques as you would use for Apache / mod_ssl for acquiring certificates.

You should NEVER password protect your private key. There will not be any opportunity to supply the password. You should instead protect your private key using Unix filesystem permissions. See the OpenSSL.org or modssl.org FAQs for further information, including how to remove a password if you accidentally enabled it.
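Until that tutorial gets written, here is one hedged way to create a self signed certificate and an unencrypted key for testing, and to combine them into a single PEM file in the same spirit as the zxid.pem demo file shipped with the distribution. The file names and the subject are just examples; check the shipped zxid.pem for the exact layout your version expects.

  # Generate an unencrypted 2048 bit RSA key and a self signed cert (testing only)
  openssl req -x509 -nodes -newkey rsa:2048 \
      -keyout sp-key.pem -out sp-cert.pem \
      -days 365 -subj "/CN=sp1.zxidsp.org"

  # Combine key and certificate into one PEM file, protected by file permissions
  cat sp-key.pem sp-cert.pem > zxid.pem
  chmod 600 zxid.pem sp-key.pem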
16 License
==========

Copyright (c) 2006 Sampo Kellomäki (sampo@iki.fi), All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

While the source distribution of ZXID does not contain SSLeay or OpenSSL code, if you use this code you will be using the OpenSSL library. Please give Eric Young and the OpenSSL team credit (as required by their licenses).

And remember, you, and nobody else but you, are responsible for auditing ZXID and the OpenSSL library for security problems, backdoors, and general suitability for your application.

17 FAQ
======

*** real user FAQs are still lacking. Maybe this stuff is perfect?

17.4 Vendor products
--------------------

17.4.1 Symlabs Federated Identity Access Manager (FIAM)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Metadata import to the IdP? What I usually do is

  cd /opt/SYMfiam/3.0.x/conf/symdemo-idpa
  echo 'sp: zxid-sp1$https://sp1.zxidsp.org:8443/zxid?o=B$$' >>cot.ldif

Double check with a text editor that the file is sensible. Note that the single quotes are essential, as the dollars are to be interpreted literally, as separators.

  cd pem
  wget https://sp1.zxidsp.org:8443/zxid?o=B >zxid-sp1.xml

Here the intent is to fetch the metadata from the SP and store it in a file whose name (without the .xml extension) matches the first component of the sp: line. I am not 100% sure of the wget syntax. You can also use a browser to fetch the metadata and simply Save As under the correct name.
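Since ZXID already depends on libcurl, curl may be the simpler tool for this fetch. The command below is a hedged alternative; the -k flag skips certificate verification, which is only appropriate with the demo certificate setup:

  curl -k -o zxid-sp1.xml 'https://sp1.zxidsp.org:8443/zxid?o=B'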
  /opt/SYMfiam/3.0.x/conf/symdemo-idpa/start.sh restart

This should restart the IdP server process and cause a refresh of the metadata it may have cached. You may want to

  tail -f /opt/SYMfiam/3.0.x/conf/symdemo-idpa/log/debug.log

to see if it is getting indigestion.

17.5 Known Bugs
---------------

The following are known limitations. We document them here because we do not plan to fix them in the foreseeable future.

1. Unknown XML attributes are not sorted according to the rules of exc-c14n. Instead they always appear +after+ known XML attributes, and in the order they happen to be in the linked list. *Work around:* Add the attribute to the schema (.sg) and regenerate and rebuild.

17.6 Mysterious Error Messages
------------------------------

"Random number generator not seeded!!!"

This warning indicates that randomize() was not able to read /dev/random or /dev/urandom, possibly because your system does not have them or they are differently named. You can still use SSL, but the encryption will not be as strong. Investigate setting up EGD (entropy gathering daemon) or PRNG (Pseudo Random Number Generator). Both are available on the net.

"msg 123: 1 - error:140770F8:SSL routines:SSL23_GET_SERVER_HELLO:unknown proto"

An SSLeay error string. The first number (123) is the PID, the second number (1) indicates the position of the error message in the SSLeay error stack. You often see a pile of these messages as errors cascade.

"msg 123: 1 - error:02001002::lib(2) :func(1) :reason(2)"

The same as above, but you didn't call load_error_strings() so SSLeay couldn't verbosely explain the error. You can still find out what it means with this command:

  /usr/local/ssl/bin/ssleay errstr 02001002

Password is being asked for private key

This is normal behaviour if your private key is encrypted. Either you have to supply the password or you have to use an unencrypted private key. Scan OpenSSL.org for the FAQ that explains how to do this.
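As a hedged pointer in the meantime, the usual OpenSSL way to strip a passphrase from an RSA key is along these lines (you are prompted for the old passphrase once; the file names are examples):

  openssl rsa -in protected-key.pem -out unprotected-key.pem
  chmod 600 unprotected-key.pem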
17.7 Author's Pet Peeves
------------------------

1. What is Schema Grammar (.sg) and why are you using it?

   * Schema Grammar is a compact formal description of XML documents. It is mostly bidirectionally convertible to XML Schema (XSD) and captures the useful essence of most XML schemas.
   * Schema Grammars are intuitive and compact, often allowing the essence to be understood at a glance, with even the most complex cases being only about 50% of the volume of the corresponding XSD.
   * We use Schema Grammar descriptions because they are more human readable than XSD and still equally amenable to automated code generation.
   * Schema Grammar descriptions are usually converted using xsd2sg.pl, which is part of the PlainDoc distribution.
   * See http://mercnet.pt/plaindoc
   * N.B. You do not need xsd2sg.pl or PlainDoc if you just want to compile and use ZXID.

2. What is PlainDoc (.pd)?

   * PlainDoc is a document preparation system that uses intuitive plain text files with minimal markup to generate PDF and HTML outputs.
   * We use PlainDoc because it makes it easy to maintain documentation.
   * See http://mercnet.pt/plaindoc
   * N.B. You do not need PlainDoc if you just want to compile and use ZXID.

3. How come zxid is so heavy to compile?

   * SAML 2.0 and related specs have a lot of functionality and detail, even if you really only need 1% of it. We do not wish to arbitrate which functionality is best or most needed, so we simply provide it all.
   * A lot of the code is generated, thus the input for the C compiler is well in excess of half a million lines of code (of which only about 6k were written by a human).
   * Some of the generated files are gigantic, e.g. Net/SAML/zxid_wrap.c is over 380k lines. The compiler has to process all of this as a single compilation unit.
   * gcc and GNU ld were, perhaps, not designed to process inputs this large efficiently. Often the implementation strategy of keeping everything in memory will cause smaller machines to swap.
   * My 1GHz CPU, 256 MB RAM machine definitely swaps and thus takes about 45 minutes to compile all this stuff.
   * I recommend at least 1GB RAM and a 3GHz CPU for a development machine. On such a machine, you should be able to build in about 10 min.

4. Why do you not use ./configure and GNU autoconf?

   * ~autoconf~ is not for everyone. The world does not stop without ~autoconf~. Or indeed need ~autoconf~. It is Yet Another Dependency I Do Not Need (YADIDNN).
   * I find the GNU ~autoconf~ stuff much more difficult to understand than my own ~Makefile~. Why should I debug ~autoconf~ when I could spend the time debugging my ~Makefile~ or the actual code?
   * I find resolving problems much easier at the source code and ~Makefile~ level than trying to debug a million line script generated by some system I do not understand (perhaps some hardcore ~autoconf~ advocate could try to convince and educate me, but I doubt it).
   * My policy is to only support systems I have first hand experience with, or where I have trustworthy friends to rely on. It does not help me to have a system that tries to guess +a gazillion irrelevant variables+ to an unpredictable state. It is much easier to stick to standards like POSIX and make sure you have predictable results from predictable inputs.
   * If the deterministic and predictable results are wrong, they can at least be debugged and fixed with a finite amount of work.
   * Supporting all relevant systems manually is not that much work. The inhabitants of the irrelevant systems can support themselves, probably learning a great deal on the side.

17.8 What does ZXID aim at - an answer
--------------------------------------

A recent conversation that touched on the aims of the ZXID project:

> So just generally, what are your goals for it, are you interested in making
> it work well with what other people are producing (e.g. SAML -> WSF
> cross-over), etc? I'm certainly assuming the answer's yes to that.

I aim at a full stack client side implementation: ID-FF, SAML 2.0, WSF (both versions). The generation technique I use will yield the encoders and decoders for both WSP and WSC, but the hand written higher level logic will at first be written only for the SP and WSC. It is an Apache licensed project, of course, so if someone contributes the IdP and WSP capabilities, I'll merge them into the distribution.

I am interested in having it work with other people's code at 3 levels:

1. Over-the-wire interoperability.
2. I have split the functionality of the SP from the WSC such that my SP could probably be used with someone else's WSC, and someone else's SP would reasonably easily be able to use my WSC.
3. Interfaces to the non IdM parts of the complete system, typically used to implement the application layer, shall be plentiful: C/C++ API, Net::SAML/mod_perl, php - whatever you can SWIGify.

One thing I am NOT interested in is a "layered" stack. I strongly believe it is better that each vertically integrated slice is implemented by one mind. Thus, except for the lowest HTTP, TLS, and TCP/IP layers, my SP, or WSC, handles the whole depth of the stack - SOAP, signature, and app interface layers (of course the actual app should be its own layer and is probably user written). That is by design. I have found in practice that if you attempt a layered stack, you have impedance mismatches between the modules at different layers because they were designed and written by different minds. By having vertical integration I avoid impedance mismatches. This is the reason why monolithic TCP/IP implementations tend to be better than explicitly layered ones, such as the streams approach.

Now, if someone else wanted to take my generated encoders and decoders and use them as a "layer" in their layered stack, I guess I would not have any issue. If you do that, please let me know, because I would have to commit to API stability at that layer. I am willing to do that once there are real projects that depend on it, but until then I still may redesign those APIs; after all, I am at revision 0.4 :-)

In the end, it seems that ZXID is actually a somewhat layered approach - what I mean by "vertical integration" is that all the layers are designed and controlled by the same mind.

> BTW, I gather that it's SAML 2.0 at the moment, which I can't offer any test
> capability for, but if you get to SAML 1.1, I'm happy to set up some kind of
> IdP test capability for that.

In the SSO world, SAML 1.1 and ID-FF 1.2 capabilities are definitely on the road map. In the ID-WSF world, I'll probably start with 2.0 DS-WSC (don't we all), followed by ID-DAP WSC, and then tackle 1.1 after that.

17.9 Annoyances and improvement ideas
-------------------------------------

There is a lot of commonality that is not leveraged, especially in the way service end points are chosen given the metadata. The descriptors are nearly identical, so casting them to a single type should work.

Many of the SAML2 responses are nearly identical. Rather than construct them fully formally, we could have just one "SAML any response" function.
Perhaps this could be supported by some schema grammar level aliasing feature: if an element derives from a base type without adding anything at all of its own, we might as well generate code only for the base type.

A namespace aliasing scheme would allow us to consider two versions of a schema to be the same. It seems to be fairly common that the schema changes are so minor that there is no justification for two different decoding engines.

98 Support
==========

98.1 Mailing list and forums
----------------------------

Mail the author until we get the list set up. Or volunteer a list :-)

98.2 Bugs
---------

Mail the author until we get bug tracking set up. Or volunteer.

98.3 Developer access
---------------------

We use CVS, but access needs to be manually configured and is not anonymous. If you contribute significantly, I will bother. Others can send patches (a good way to show you are worthy of CVS access) to me. I've heard some mixed experiences about open source sites like SourceForge. If you run such a site and want to host the ZXID Project, please contact me.

If you just always want the latest source: get the tar ball from the downloads section. Trust me, this is still so much in flux that only the tar ball snapshots are in any usable state. CVS access just to get the latest source would be pointless.

98.9 Commercial Support
-----------------------

The following companies provide consultancy and support contracts for ZXID:

* symlabs.com

99 Appendix: Schema Grammars
============================

Large parts of the ZXID code are generated from +schema grammars+, which are a convenient notation for describing XML schemata. This appendix contains the schema grammars that are currently implemented and distributed in the ZXID package.

<>

99.1 SAML 2.0
-------------

99.1.1 saml-schema-assertion-2.0 (sa)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<> >>

99.1.2 saml-schema-protocol-2.0 (sp)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<> >>

99.1.4 saml-schema-metadata-2.0 (md)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<> >>

99.2 SAML 1.1
-------------

99.2.1 oasis-sstc-saml-schema-assertion-1.1 (sa11)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<> >>

99.2.2 oasis-sstc-saml-schema-protocol-1.1 (sp11)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<> >>

99.3 Liberty ID-FF 1.2
----------------------

99.3.1 liberty-idff-protocols-schema-1.2 (ff12)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<> >>

99.3.2 liberty-metadata-v2.0 (m20)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<> >>

99.3.3 liberty-authentication-context-v2.0 (ac)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<> >>

99.4 Liberty ID-WSF 1.1
-----------------------

99.4.1 liberty-idwsf-soap-binding-v1.2 (b12)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<> >>

99.4.2 liberty-idwsf-security-mechanisms-v1.2 (sec12)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<> >>

99.4.3 liberty-idwsf-disco-svc-v1.2 (di12)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<> >>

99.4.5 liberty-idwsf-interaction-svc-v1.1 (is12)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<> >>

99.5 Liberty ID-WSF 2.0
-----------------------

99.5.1 liberty-idwsf-utility-v2.0 (lu)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<> >>

99.5.2 liberty-idwsf-soap-binding (no version, sbf)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<> >>

99.5.3 liberty-idwsf-soap-binding-v2.0 (b)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<> >>

99.5.4 liberty-idwsf-security-mechanisms-v2.0 (sec)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<> >>

99.5.5 liberty-idwsf-disco-svc-v2.0 (di)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<>
>>

99.5.6 liberty-idwsf-interaction-svc-v2.0 (is)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<> >>

99.6 SOAP 1.1 Processors
------------------------

99.6.1 saml20-soap11 (se)
~~~~~~~~~~~~~~~~~~~~~~~~~

<> >>

99.6.2 wsf-soap11 (e)
~~~~~~~~~~~~~~~~~~~~~

<> >>

99.6.3 ds-soap11 (dise)
~~~~~~~~~~~~~~~~~~~~~~~

<> >>

99.7 XML and Web Services Infrastructure
----------------------------------------

99.7.1 xmldsig-core (ds)
~~~~~~~~~~~~~~~~~~~~~~~~

<> >>

99.7.2 xenc-schema (xenc)
~~~~~~~~~~~~~~~~~~~~~~~~~

<> >>

99.7.3 ws-addr-1.0 (a)
~~~~~~~~~~~~~~~~~~~~~~

<> >>

99.7.4 wss-secext-1.0 (wsse)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<> >>

99.7.5 wss-util-1.0 (wsu)
~~~~~~~~~~~~~~~~~~~~~~~~~

<> >>

<>


>> <>