Installers Guide

This guide supports version 1.2 of the Dataverse Network application.

This installation guide is intended for anyone who chooses to install the Dataverse Network (DVN) software. If you encounter any problems during installation, maintenance or upgrades, please contact the DVN development team at dvn_support@help.hmdc.harvard.edu.

This guide describes the following:

Before You Begin Installation

Tools required for installation include a minimum of the following:

Before you install the DVN application, make sure that your installation platform meets the system requirements needed to support installation. System requirements for installation of the DVN software are:

Installation Checklist

Use the following checklist to plan installation of the DVN. Detailed descriptions of each entry are provided in the next topics:

  1. Prepare the system:
    1. Set up the server TCP ports.
    2. Install the required applications and tools: GlassFish, PostgreSQL, and JDBC driver.
    3. Configure GlassFish server.
  2. Install the basic DVN:
    1. Install the DVN core code.
    2. Install the DSB services.
  3. Set up optional components:
    1. Configure Handle.net to enable the use of UNF registration.
    2. Configure Z39.50 protocol to support additional access to your Network.
    3. Configure Google Analytics to enable use of web statistics.
  4. Set up optional clustering to provide greater resources for the Network.

Prepare the System

Deployment of the DVN core is dependent on a minimum installation of the following:

Deployment of the DSB component is dependent on the following minimum installation:

Note: Red Hat Linux is the recommended OS, because it is the system in use at the Project and is the distribution tested most thoroughly. However, after the required components are installed, the DSB component should work under any Linux or UNIX system. All four systems are standard and well-supported packages.

See the following topics for detailed information about how to prepare the system:

Set Up Server Ports

To install the DVN software, make sure that all necessary ports are open and unrestricted. If access is not available, installation or configuration can fail without obvious cause.

Ports that are required to be open locally on the server consist of the following:

  • TCP 2641
  • TCP 8000
  • TCP 8080
  • TCP 8081
  • TCP 8686
  • TCP 80 (for web service)
  • TCP 443 (for secure web service)
  • TCP 4848 (for GlassFish web admin)
  • TCP 5432, open only to hosts in the server room (for PostgreSQL)
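
The port list above can be checked before installation with a short script. This is a minimal sketch, not part of the DVN distribution: the function names are hypothetical, and it assumes that the nc (netcat) utility is installed and accepts the -z and -w options.

```shell
# Ports taken from the list above.
required_ports() {
    printf '%s\n' 80 443 2641 4848 5432 8000 8080 8081 8686
}

# check_port HOST PORT: succeeds if a TCP connection can be opened.
# Assumption: nc is installed and supports -z (scan only) and -w (timeout).
check_port() {
    nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# report_ports HOST: print one status line per required port.
report_ports() {
    for port in $(required_ports); do
        if check_port "$1" "$port"; then
            echo "open    $port"
        else
            echo "closed  $port"
        fi
    done
}

# Example: report_ports localhost
```

A port is reported as closed both when a firewall blocks it and when no service is listening on it yet, so run the check again after GlassFish and PostgreSQL are started.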

Install Required Components

Install the dependent applications:

Install GlassFish

To install GlassFish, go to https://glassfish.dev.java.net/public/downloadsindex.html to download the current application.

  1. Get the appropriate binary build installer for your platform.
    The DVN uses V2-UR1 at this time; therefore, for a Linux OS the installer to use is glassfish-installer-v2ur1-b09d-linux.jar.
  2. Set the JAVA_HOME variable to your JDK 6 directory, /usr/java/<jdk 1.6 directory>.
  3. To initiate the GlassFish installation, ensure that you have an X-Windows server on a host that can receive the licensing agreement. The recommended host is a VM running Linux and X-Windows on a PC.
    If you do not have an X-Windows server running, the installation cannot provide feedback.
  4. Type the command java -Xmx256m -jar <filename>.jar.
    For example, type java -Xmx256m -jar glassfish-installer-v2ur1-b09d-linux.jar.
  5. When the glassfish directory is created by the jar file, change to that directory.
    Type cd glassfish.
  6. Use the Ant tool to execute the setup script. You can use the version in the GlassFish distribution located in the folder glassfish/lib/ant.
    Type the following:
    export ANT_HOME=$PWD/lib/ant
    chmod -R +x lib/ant/bin
    lib/ant/bin/ant -f setup.xml
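
Because the installer depends on JAVA_HOME, it can help to confirm the variable before you run the jar file. The following is a minimal sketch; the function name check_java_home is hypothetical and is not part of GlassFish or the DVN.

```shell
# check_java_home: verify that JAVA_HOME is set and contains bin/java.
check_java_home() {
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "no executable java found under $JAVA_HOME/bin" >&2
        return 1
    fi
    echo "JAVA_HOME looks usable: $JAVA_HOME"
}

# Example: check_java_home && java -Xmx256m -jar glassfish-installer-v2ur1-b09d-linux.jar
```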

Install PostgreSQL

To install PostgreSQL, perform the short version of the installation instructions at the PostgreSQL website. On the Documentation page, select the Manuals option, and then choose the appropriate version to read.

  1. Open the pgAdmin tool that comes with PostgreSQL, /usr/local/pgsql/bin.
    Note: If you choose to use the graphical pgAdmin III tool, it requires access to port 5432 on the server or through a secure shell (SSH) tunnel. Instructions for this tool are not provided here.
  2. Log in as super user and create a new login role for the owner of the Dataverse Network's database.
    1. Type:
      su - postgres
      cd /usr/local/pgsql/bin
      ./createuser -lPE dvnApp
    2. Enter the password, and then enter it again.
    3. Respond no to the superuser prompt.
    4. Respond yes to the create databases prompt.
    5. Respond yes to the create new roles prompt.
  3. Create the Dataverse Network database with UTF8 encoding and the new login role (dvnApp) as the owner.
    For example, type the following: ./createdb dvnDb --owner=dvnApp --encoding=UTF8
    Note: If you create the database and user within the psql interactive shell, you must use quotes around the names to preserve case, as in "dvnApp".
  4. Configure PostgreSQL to listen on all available interfaces, to make it accessible from outside the server.
    Edit the file /usr/local/pgsql/data/postgresql.conf.
    Uncomment the lines listen_addresses and port, and change the first line to listen_addresses='*'.
    Be sure to save and exit the configuration file.
  5. Restart PostgreSQL for the new configuration value to take effect:
    1. Stop GlassFish.
      Type /usr/local/glassfish/bin/asadmin stop-domain domain1.
    2. Restart PostgreSQL.
      Type /usr/local/pgsql/bin/pg_ctl restart -D /usr/local/pgsql/data.
    3. To confirm external access to the PostgreSQL port (5432), telnet to <servername> 5432.
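
The edit to postgresql.conf in step 4 can also be scripted. This is a sketch only: the function name is hypothetical, and it assumes that the file still contains the commented-out listen_addresses and port lines of a default installation. It keeps a copy of the original file with a .bak suffix.

```shell
# enable_remote_listen CONF_FILE
# Sets listen_addresses to '*' and uncomments the port line.
enable_remote_listen() {
    conf=$1
    cp "$conf" "$conf.bak" || return 1
    sed -e "s/^#*[[:space:]]*listen_addresses[[:space:]]*=.*/listen_addresses = '*'/" \
        -e "s/^#[[:space:]]*port[[:space:]]*=/port =/" \
        "$conf.bak" > "$conf"
}

# Example: enable_remote_listen /usr/local/pgsql/data/postgresql.conf
```

Restart PostgreSQL afterward, as described in step 5, for the change to take effect.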

Install JDBC Driver

To install the JDBC driver:

  1. Put postgresql-8.3-603.jdbc4.jar in the <GlassFish directory>/lib folder.
  2. Start GlassFish to invoke the driver.
    Type /usr/local/glassfish/bin/asadmin start-domain domain1.

Configure GlassFish

Log in to the GlassFish admin console to configure GlassFish.
In a web browser, open http://<hostname>:4848. The default user name is admin, and the default password is adminadmin.

Perform each of the following configuration tasks. The order in which you configure these components is not important:

Configure JDBC Connections

Paths to JDBC configuration items are specified by using the menu tree on the left side of the admin console.

To configure JDBC connection pools and resources:

  1. Select the Resources menu JDBC submenu, and then choose the Connection Pools option.
    Add a new pool with the following characteristics:

    • Name - dvnDbPool
    • ResourceType - javax.sql.DataSource
    • Database Vendor - PostgreSQL
  2. Click Next and then edit or add the following parameters:
    • Edit DataSource ClassName - org.postgresql.ds.PGPoolingDataSource
    • Confirm ResourceType - javax.sql.DataSource
    • Additional Properties:
      • connectionAttributes - ;create=true
      • user - dvnApp
      • portNumber - 5432 (Port 5432 is the PostgreSQL default.)
      • password - <dvnApp's password>
      • databaseName - dvnDb
      • serverName - localhost
      • JDBC30DataSource - true

    Click Finish to complete the configuration.

  3. To verify connectivity to the database, click Ping on the General tab.
    If the Ping succeeds, you see the message Ping Succeeded.
    If the Ping does not succeed, you see an error message. Verify the pool configuration, confirm that the database and the user account were created correctly, and confirm access to port 5432.
  4. Select the Resources menu JDBC submenu, and then choose the JDBC Resources option.
    Add a new resource with the following characteristics:

    • JNDIName - jdbc/VDCNetDS
    • PoolName - dvnDbPool
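
The same pool and resource can probably be created from the command line with the asadmin utility instead of the admin console. The following sketch makes several assumptions: asadmin is installed in /usr/local/glassfish/bin, the function names are hypothetical, the password contains no characters that asadmin treats specially (such as colons), and the connectionAttributes and JDBC30DataSource properties are still set in the console.

```shell
# Build the colon-separated --property value that asadmin expects.
# $1 = password of the dvnApp database role.
jdbc_pool_properties() {
    printf 'user=dvnApp:password=%s:databaseName=dvnDb:serverName=localhost:portNumber=5432' "$1"
}

# create_dvn_pool PASSWORD: create the pool, ping it, then create the resource.
create_dvn_pool() {
    asadmin=${ASADMIN:-/usr/local/glassfish/bin/asadmin}
    "$asadmin" create-jdbc-connection-pool \
        --datasourceclassname org.postgresql.ds.PGPoolingDataSource \
        --restype javax.sql.DataSource \
        --property "$(jdbc_pool_properties "$1")" \
        dvnDbPool &&
    "$asadmin" ping-connection-pool dvnDbPool &&
    "$asadmin" create-jdbc-resource --connectionpoolid dvnDbPool jdbc/VDCNetDS
}

# Example: create_dvn_pool '<dvnApp password>'
```

The commands are chained with &&, so the JDBC resource is created only if the ping succeeds.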

Configure JMS Resources

To configure JMS resources for DSBIngest and indexing:

  1. Select the Resources menu JMS Resources option.
  2. Add a new Connection Factory for the DSB Queue.
    Set the following:

    • JNDI Name - jms/DSBQueueConnectionFactory
    • Type - javax.jms.QueueConnectionFactory
  3. Add a new Destination Resource for the DSB Queue.
    Set the following:

    • JNDI Name - jms/DSBIngest
    • Physical Destination Name - DSBIngest
    • Resource Type - javax.jms.Queue
  4. Add a new Connection Factory for the Index Message.
    Set the following:

    • JNDI Name - jms/IndexMessageFactory
    • Type - javax.jms.QueueConnectionFactory
  5. Add a new Destination Resource for the Index Message.
    Set the following:

    • JNDI Name - jms/IndexMessage
    • Physical Destination Name - IndexMessage
    • Resource Type - javax.jms.Queue
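
As a command-line alternative to the console steps above, the four JMS resources could be created with asadmin. This is a sketch under assumptions: the function name is hypothetical, asadmin is in /usr/local/glassfish/bin, and the Name property is used to set the physical destination name.

```shell
# create_dvn_jms: create both connection factories and both queues.
create_dvn_jms() {
    asadmin=${ASADMIN:-/usr/local/glassfish/bin/asadmin}
    for factory in jms/DSBQueueConnectionFactory jms/IndexMessageFactory; do
        "$asadmin" create-jms-resource \
            --restype javax.jms.QueueConnectionFactory "$factory" || return 1
    done
    # The queue JNDI names are jms/<physical destination name>.
    for queue in DSBIngest IndexMessage; do
        "$asadmin" create-jms-resource --restype javax.jms.Queue \
            --property "Name=$queue" "jms/$queue" || return 1
    done
}

# Example: create_dvn_jms
```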

Configure JavaMail

To configure JavaMail:

  1. Select the Resources menu JavaMail Sessions option, and then click the New button.
  2. Set JNDI Name to mail/notifyMailSession.
  3. Set the mail host to the mail host that you choose to use.
    The recommended setup is to install your mail server on the same machine as GlassFish and use localhost for this entry.
  4. Set Default User to dataversenotify (this does not need to be a real mail account).
  5. Set Default Return Address to do-not-reply@<your mail server hostname>.
  6. Leave the remaining fields set to the defaults.
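
The JavaMail session might also be created with asadmin. In this sketch the function name and its two arguments (mail host, and the host name used in the return address) are hypothetical; the other values come from the steps above.

```shell
# create_dvn_mail_session MAIL_HOST RETURN_ADDRESS_HOST
create_dvn_mail_session() {
    asadmin=${ASADMIN:-/usr/local/glassfish/bin/asadmin}
    "$asadmin" create-javamail-resource \
        --mailhost "$1" \
        --mailuser dataversenotify \
        --fromaddress "do-not-reply@$2" \
        mail/notifyMailSession
}

# Example: create_dvn_mail_session localhost '<your mail server hostname>'
```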

Configure JVM Options

Configure the following JVM options for the production machine:

  1. In the admin console, go to Application Server.
    Select the JVM Settings tab, and then select the JVM Options subtab.
  2. Change -client to -server.
  3. Remove the following settings. Click the settings' check box and then click the Delete button:
    • -Dsun.rmi.dgc.server.gcInterval=3600000
    • -Dsun.rmi.dgc.client.gcInterval=3600000
  4. Change -Xmx512m to whatever size you can allot for Java heap space.
    The maximum for a 2-gigabyte installation is -Xmx2048m.
  5. Click the Add JVM Option button to add the following:
    • Set the heap space minimum to the same value as the maximum. For a 2-gigabyte installation, add -Xms2048m.
    • Also add:
      -XX:MaxPermSize=128m or -XX:MaxPermSize=192m
      -XX:+AggressiveHeap
      -Xss128k
      -XX:+DisableExplicitGC
      -Dcom.sun.enterprise.server.ss.ASQuickStartup=false
    • To identify the location of the Search index files, add the following:
      -Ddvn.index.location=/usr/local/glassfish/domains/domain1/config
    • If you are installing on a multi-processor machine, also add the following:
      -XX:+UseParallelOldGC
    • To enable the Google Analytics option on the Network Options page and provide access to site usage reports, add the following:
      -Ddvn.googleanalytics.key=<googleanalyticsTrackingCode>
    • To customize the error logging level and add more information to your log files, add the following JVM option and edit the file to change WARNING to INFO:
      -Djava.util.logging.config.file=/usr/local/glassfish/domains/domain1/config/logging.properties
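
To avoid typing mistakes, the machine-dependent options can be generated and then pasted into the JVM Options page. This is an illustrative sketch: the function names are hypothetical, and the processor count is read with getconf _NPROCESSORS_ONLN, which is assumed to be available on your system.

```shell
# jvm_heap_options MB: print matching minimum and maximum heap options.
jvm_heap_options() {
    printf '%s\n' "-Xms${1}m" "-Xmx${1}m"
}

# jvm_gc_option: print the parallel GC option on multi-processor machines only.
jvm_gc_option() {
    cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    if [ "$cpus" -gt 1 ]; then
        echo "-XX:+UseParallelOldGC"
    fi
}

# Example: jvm_heap_options 2048; jvm_gc_option
```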

Configure JVM for Files and DSB

To configure JVM options for files locations and DSB communications:

  1. Using the admin console tool, go to the Application Server.
  2. Select the JVM Settings tab, and then select the JVM Options subtab.
  3. Enter the following JVM Options.
    Note: Add the -Dvdc.temp.file.dir, -Dvdc.export.log.dir, and -Dvdc.import.log.dir values as presented here. The four remaining values are examples only; make the right side of the entry specific to the file setup at your location.

    • Permanent File storage (data and documentation files uploaded to studies) - Set to the following:
      -Dvdc.study.file.dir=/nfs/iqss/DVN/data
      A minimal default setup on the server's local file system is /usr/local/glassfish/domains/domain1/config/data. This works because it is under the config directory and is in the path.
    • Temporary location used in file upload - Set to the following:
      -Dvdc.temp.file.dir=${com.sun.aas.instanceRoot}/config/files/temp
    • Calls to DSB (used for analysis and file upload) - Set to the following:
      -Dvdc.dsb.host=<DSB server hostname>
      This setting requires that the DSB server either is preconfigured or runs on an Apache web server.
      The hostname alone is adequate if using the default, port 80. Otherwise, set
      -Dvdc.dsb.port=<DSB server host port>.
      This setting must be configured for subsettable file uploads, downloads, and subsetting and analysis to work.
    • Export and Import logs (used for Harvesting, and for importing studies from VDC to DVN and synchronizing them back) - Set the following:
      • -Dvdc.export.log.dir=${com.sun.aas.instanceRoot}/logs/export
      • -Dvdc.import.log.dir=${com.sun.aas.instanceRoot}/logs/import
    • Import VDC studies to DVN (used only to import existing studies in the legacy VDC application to the DVN) - Set the following:
      • -Dvdc.legacy.file.dir=/nfs/mra/VDC/data/
      • -Dvdc.repository.url=http://vdc.hmdc.harvard.edu/VDC/Repository/0.1/

Configure HTTP

The values mentioned here are suggested defaults. These settings are very important.

If your server becomes so busy that it drops connections, adjust the Thread Counts to improve the performance.

There are no right values to recommend; the values depend on the specifics of your web traffic, how many requests you get, how long they take to process on average, and the hardware. For more information, refer to the Java Application Server Administration Guide, available at the Sun Microsystems Documentation website.

To configure HTTP and HTTP threads, use the admin console and set up the following:

  1. On the Configuration menu HTTP Service submenu, choose the HTTP Listeners option http-listener-1.
  2. In the Edit HTTP Listener tab set the Listener Port to 80.
  3. In the Advanced section, the recommended setting for the Acceptor Threads is the number of CPUs (or cores, if multi-core) on your server.
  4. On the Configuration menu HTTP Service submenu, choose the Request Processing option.
  5. Start with the recommended settings:
    • Thread Count - Twice the number of CPUs (cores) on your server
    • Initial Thread Count - The number of CPUs (cores)
    • Thread Increment - 1
  6. Click Save.

Restart the server to make the configuration take effect.
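
The recommended starting values depend only on the number of CPUs (cores), so they can be derived with a small helper. This is a sketch; the function name is hypothetical, and the output is meant to be copied into the admin console by hand.

```shell
# http_thread_settings CPUS: print the suggested starting values.
http_thread_settings() {
    cpus=$1
    echo "Acceptor Threads: $cpus"
    echo "Thread Count: $((cpus * 2))"
    echo "Initial Thread Count: $cpus"
    echo "Thread Increment: 1"
}

# Example for a 4-core server: http_thread_settings 4
```

For a 4-core server, this gives 4 acceptor threads, a thread count of 8, and an initial thread count of 4.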

Install Basic DVN

Installation of the core code in general consists of copying a .jar or .tar archive file to the server, and then extracting the archive. The installation script creates or installs to the correct file structure. After installation, you can remove the original .jar or .tar archive file.

General steps to install the DVN software are described in the following topics:

Install the DVN Core Code

Download the DVN Package

Before you can install the DVN core code, you must download the latest
DVN-EAR.ear package from the official DVN SourceForge.net project site.
Then, in a file browser, copy the files in <DVN download directory>/working_directory and paste them in <GlassFish install directory>/domains/domain1/config.

Install the DVN Package

To install the DVN core code:

  1. At the admin console (http://<hostname>:4848/), select the Applications menu Enterprise Applications option.
  2. Click Deploy to add a new application.
  3. In the Location section of the page, navigate to the location of DVN-EAR.ear to choose the file.
    Then, click the OK button on the upper-right side of the page.
  4. When the application is deployed, the required database tables are created in PostgreSQL.
    In the pgAdmin tool, open the database and use the query tool to load and execute the referenceData.sql script.
  5. Open the application at http://<hostname>/dvn.
    Log in by using networkAdmin as both the user name and password.
  6. Change the default password and default e-mail address.
    Click the networkAdmin name on the top-right corner of the main menu, and then click Update Account.
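
Steps 1 through 4 can probably be performed from a shell instead of the admin console and pgAdmin. The sketch below assumes the default install locations used in this guide and a hypothetical function name; asadmin might prompt for the admin user name and password, and psql might prompt for the dvnApp password.

```shell
# install_dvn_core EAR_FILE REFERENCE_SQL
# Deploys the application, then loads the reference data.
install_dvn_core() {
    asadmin=${ASADMIN:-/usr/local/glassfish/bin/asadmin}
    psql=${PSQL:-/usr/local/pgsql/bin/psql}
    "$asadmin" deploy "$1" &&
    "$psql" -U dvnApp -d dvnDb -f "$2"
}

# Example: install_dvn_core DVN-EAR.ear referenceData.sql
```

The reference data is loaded only if the deployment succeeds, because the deployment creates the tables that the script populates.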

To change any other default settings (banner, footer, about page, and so on), or to create dataverses and start uploading studies and data files, refer to the user guides available at http://thedata.org/guides.

Install the DSB Services

Download the DSB Package

Before you can install the DSB services, you must download the latest
DVN-DSB.rpm package from the official DVN SourceForge.net project site.

You also can download the package from the HMDC web repository at http://porkchop.hmdc.harvard.edu/dvn-dsb/.

Install the DSB Package

To install the DSB package bundle:

  1. Install the rpm.
    For example, type rpm -ivh DVN-DSB-1.1-16a.i386.rpm.
    The rpm installs automatically into /usr/local/VDC.
  2. Apache (httpd) must be configured to work with the DSB components installed.
    For ease of maintenance this configuration is isolated in a standalone file located in the main DSB installation tree. This configuration must be included in the main Apache configuration. The recommendation is that you create a file called
    /etc/httpd/conf.d/00-vdc.conf with the following line in it:
    Include /usr/local/VDC/etc/vdc.conf
    Then, restart httpd.
    A sample copy of 00-vdc.conf is included in the etc directory of the DVN distribution bundle.
  3. After the DSB components are installed and Apache is configured, verify that everything required is present and functioning properly by checking the DSB Diagnose verb.
    In a web browser, open http://<hostname>:[port]/VDC/DSB/1.0/Diagnose.
    The Diagnose page reports any missing or malfunctioning components.
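
Steps 2 and 3 can be sketched as two small helpers: one writes the include file, and the other builds the Diagnose URL. The function names are hypothetical, and the curl command in the example is an assumption about which HTTP client is installed.

```shell
# write_vdc_include CONF_DIR: create 00-vdc.conf with the Include line.
# On Red Hat, CONF_DIR is /etc/httpd/conf.d.
write_vdc_include() {
    echo "Include /usr/local/VDC/etc/vdc.conf" > "$1/00-vdc.conf"
}

# dsb_diagnose_url HOST [PORT]: print the Diagnose URL (port defaults to 80).
dsb_diagnose_url() {
    echo "http://$1:${2:-80}/VDC/DSB/1.0/Diagnose"
}

# Example:
#   write_vdc_include /etc/httpd/conf.d
#   service httpd restart
#   curl "$(dsb_diagnose_url '<DSB server hostname>')"
```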

Set Up Optional Components

You can set up the following optional components to use in the DVN:

Configure Handle.Net

This material is not developed yet.

Configure Z39.50

This capability is not yet packaged with the DVN core code. Check back after future releases to find out if it is part of the downloadable packages at that time.

Configure Google Analytics

Network Admins can use the Google Analytics tools to view DVN website usage statistics.
Note: It takes about 24 hours for Google Analytics to validate tracking of your website after you register the DVN with Google Analytics. Your data is not available until that validation takes place.

To enable the use of Google Analytics from the Network Options page:

  1. Go to the Google Analytics homepage at http://www.google.com/analytics/indexu.html.
  2. Set up a Google account to access the Google Analytics website.
  3. Set up a Google Analytics account to receive a tracking code for your DVN installation.
  4. Copy the tracking code you are assigned and paste it into the content already provided by the DVN software. This ensures that your installation is tracked.
    Use the Google Analytics Help Center to find your tracking code and determine where to copy that code into your DVN content.
  5. Make sure that the GlassFish server configuration includes the JVM option -Ddvn.googleanalytics.key, set to your tracking code. See Configure JVM Options for details.

Set Up Optional Clustering

This material is not developed yet.

Maintain Installation

Maintenance includes the following tasks:

Check GlassFish

Perform the following when needed:

  • Check the GlassFish logs periodically.
    Log files are located in <GlassFish directory>/domains/domain1/logs. The main log file is server.log.
  • Restart GlassFish.
    The asadmin utility is located in <GlassFish directory>/bin. You can use this utility to perform the following:

    • If you are in the top level of the GlassFish directory, the command to stop GlassFish is:
      bin/asadmin stop-domain domain1
    • To start GlassFish again, type:
      ulimit -n 32768
      bin/asadmin start-domain domain1
    • An initialization script is provided for installation on Red Hat.
      Save the script as /etc/rc.d/init.d/glassfish, and then type the following:
      chmod +x /etc/rc.d/init.d/glassfish
      chkconfig --add glassfish

      You then can stop and start GlassFish by using the service command as follows:
      service glassfish stop
      service glassfish start

      Note: If you installed GlassFish anywhere other than /usr/local/glassfish, you must change the following line in the GlassFish script to the location in which you installed the application:
      ASADMIN=/usr/local/glassfish/bin/asadmin
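
The initialization script itself is not reproduced in this guide. As an illustration only, and not the script shipped with the DVN, the core of such a script could look like the following function, which wraps the asadmin commands shown above. The function name is hypothetical; the ASADMIN variable mirrors the line mentioned in the note.

```shell
# glassfish_service {start|stop|restart}
glassfish_service() {
    asadmin=${ASADMIN:-/usr/local/glassfish/bin/asadmin}
    case "$1" in
        start)
            # Raise the open-file limit first, as in the manual start command.
            ulimit -n 32768 2>/dev/null || echo "could not raise open-file limit" >&2
            "$asadmin" start-domain domain1
            ;;
        stop)
            "$asadmin" stop-domain domain1
            ;;
        restart)
            glassfish_service stop
            glassfish_service start
            ;;
        *)
            echo "usage: glassfish_service {start|stop|restart}" >&2
            return 1
            ;;
    esac
}

# In an init script, the last line would be: glassfish_service "$1"
```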

Check PostgreSQL and Apache

Check the following components regularly:

  1. If you make changes directly in the PostgreSQL database, you must restart GlassFish afterward to refresh the cache.
    The Project plans to add an administration function that enables you to do that without restarting GlassFish, but that is not available at this time.
  2. If the DSB is down, make sure that Apache is running. Restart Apache if needed.
  3. Be sure to check the Apache server logs regularly to ensure reliable service to DVN data.

Back Up Data

The DVN software does not provide automatic backups of your application data in the database or of your files in the file system.

Plan to do regular backups of all data and files, to ensure that users have reliable access to current data.
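
A backup job could combine a database dump with an archive of the study files. The following is a minimal sketch under assumptions: the function name is hypothetical, the database and role names are the ones created earlier in this guide, and the study file directory defaults to the minimal local setup; set STUDY_DIR to your -Dvdc.study.file.dir value if it differs.

```shell
# backup_dvn DEST_DIR: dump the database and archive the study files.
backup_dvn() {
    dest=$1
    pg_dump=${PG_DUMP:-/usr/local/pgsql/bin/pg_dump}
    study_dir=${STUDY_DIR:-/usr/local/glassfish/domains/domain1/config/data}
    stamp=$(date +%Y%m%d)
    mkdir -p "$dest" || return 1
    # Plain SQL dump of the DVN database.
    "$pg_dump" -U dvnApp dvnDb > "$dest/dvnDb-$stamp.sql" || return 1
    # Compressed archive of everything under the study file directory.
    tar -czf "$dest/files-$stamp.tar.gz" -C "$study_dir" .
}

# Example: backup_dvn /backup/dvn
```

Running the function from cron is one way to make the backups regular. Store the results on a different machine or volume than the server itself.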

Upgrade Installation

To upgrade the full DVN installation, you must do each of the following:

Upgrade DVN Core Code

Upgrading to Version 1.2

If you upgrade from an existing installation of the DVN core code to version 1.2 of the code, check the following GlassFish configuration file:

<glassfish directory>/config/asenv.conf

Ensure that the AS_JAVA setting contains the <JVM V1.6 (JDK 6) install directory>.
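
The AS_JAVA check can be scripted. This sketch assumes that asenv.conf contains a line of the form AS_JAVA="<directory>" and that the JDK reports its version as 1.6.x; both function names are hypothetical.

```shell
# as_java_dir ASENV_CONF: print the directory assigned to AS_JAVA.
as_java_dir() {
    sed -n 's/^AS_JAVA="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$1"
}

# as_java_is_jdk6 ASENV_CONF: succeed if that JDK reports version 1.6.
as_java_is_jdk6() {
    dir=$(as_java_dir "$1")
    [ -n "$dir" ] && "$dir/bin/java" -version 2>&1 | grep -q 'version "1\.6'
}

# Example: as_java_is_jdk6 '<glassfish directory>/config/asenv.conf' && echo "JDK 6 is configured"
```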

Upgrading the DVN Core Code

To upgrade the DVN core code:

  1. Get the latest DVN-EAR.ear and buildupdate.sql database script from SourceForge.net.
  2. At the admin console, select the Applications menu Enterprise Applications option.
  3. Select Redeploy next to DVN-EAR in the Action column on the right.
    A file selection page is displayed, which is similar in function to the initial deployment page.
  4. Use the Location section to navigate to and choose the new DVN-EAR.ear file, and then click OK.
    The application is deployed. You can ignore SQL warnings stating that the required tables already exist in the database.
  5. Any required changes to the database are incorporated into the buildupdate.sql file.
    Open the database in pgAdmin, and then open and run buildupdate.sql in the query tool in the same way as referenceData.sql.
    After you run the script, you can log in to the homepage.

If there were no changes to the database, you do not need to run the buildupdate.sql script.

Upgrade DSB Services

Download the new DSB package RPM from SourceForge.net.

Then, follow the steps described in the section Install the DSB Services.

Upgrade Dependent Components

Install new versions of GlassFish, PostgreSQL, and JVM applications in your production environment only after you test them for the DVN build you are running. Some DVN builds might require a new version of a dependent application.