Prepare the System

Before you install the DVN software, prepare the environment as follows:

Set Up Server Ports

Make sure that all ports the DVN needs are open and unrestricted. If any of them are blocked, installation or configuration can fail without an obvious cause.

Configure the system firewall to open the following ports:

  • TCP 2641
  • TCP 8000
  • TCP 8080
  • TCP 8081
  • TCP 8686
  • TCP 80 (for web service)
  • TCP 443 (for secure web service)
  • TCP 4848 (for GlassFish web admin)
  • TCP 5432 (for PostgreSQL)
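If the server uses the standard RedHat iptables firewall, the port list above can be turned into firewall rules with a small script. The following is a minimal sketch, assuming iptables; the function name print_firewall_rules is hypothetical. It only prints the commands, so you can review them before running anything as root.

```shell
#!/bin/bash
# Sketch: print iptables commands that open the DVN TCP ports listed above.
# Assumes an iptables-based firewall; print_firewall_rules is a hypothetical name.
DVN_TCP_PORTS=(2641 8000 8080 8081 8686 80 443 4848 5432)

print_firewall_rules() {
  local port
  for port in "${DVN_TCP_PORTS[@]}"; do
    # Insert an ACCEPT rule for incoming TCP traffic on this port.
    echo "iptables -I INPUT -p tcp --dport ${port} -j ACCEPT"
  done
}
```

To apply the rules, review the output of print_firewall_rules, run print_firewall_rules | sh as root, and then type service iptables save so that the rules survive a reboot.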

Install Required Components

Install the dependent applications:

Install Java SE

Install the minimum Java platform version required by the DVN, Java SE 6 (JDK 1.6), on your installation system. Go to http://java.sun.com/javase/downloads/index.jsp to download and install this version.

The default directory in which the JDK is installed is <jdk>=/usr/local/jdk<version>.

Install GlassFish

To install GlassFish, go to https://glassfish.dev.java.net/public/downloadsindex.html to download the GlassFish installer.

  1. Get the appropriate binary build installer for your platform.
    The DVN uses V2-UR1 at this time; therefore, for a Linux OS the installer to use is glassfish-installer-v2ur1-b09d-linux.jar.
  2. Set the JAVA_HOME environment variable to your JDK 6 directory, /usr/java/<jdk 1.6 directory>.
    For example, type export JAVA_HOME=/usr/java/<jdk 1.6 directory>.
  3. The installer displays the GlassFish license agreement in a graphical window, so before you start the installation, make sure that an X-Windows server is available to display it. The recommended host is a VM running Linux and X-Windows on a PC.
    If no X-Windows server is running, the installer cannot display the license agreement or provide feedback.
  4. Type the command java -Xmx256m -jar <filename>.jar.
    For example, type java -Xmx256m -jar glassfish-installer-v2ur1-b09d-linux.jar.
  5. After the installer creates the glassfish directory, change to that directory.
    Type cd glassfish.
  6. Use the Ant tool to execute the setup script. You can use the version of Ant bundled with the GlassFish distribution, located in the folder glassfish/lib/ant.
    Type the following:
    export ANT_HOME=$(pwd)/lib/ant
    chmod -R +x lib/ant/bin
    lib/ant/bin/ant -f setup.xml
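The GlassFish steps above can be collected into one script. This is a sketch under assumptions: the function names check_glassfish_prereqs and install_glassfish are hypothetical, and the JDK 6 directory and installer jar are passed as arguments rather than taken from a fixed location. The check stops early if the JDK's java binary, the installer jar, or an X display (the DISPLAY variable) is missing.

```shell
#!/bin/bash
# Sketch of the GlassFish V2-UR1 installation steps.
# Arguments: $1 = JDK 6 directory, $2 = installer jar. Function names are hypothetical.

check_glassfish_prereqs() {
  local jdk_dir="$1" installer="$2"
  if [ ! -x "${jdk_dir}/bin/java" ]; then
    echo "No executable java found in ${jdk_dir}/bin" >&2
    return 1
  fi
  if [ ! -f "${installer}" ]; then
    echo "Installer not found: ${installer}" >&2
    return 1
  fi
  # The installer shows the license agreement in an X window.
  if [ -z "${DISPLAY}" ]; then
    echo "DISPLAY is not set; an X-Windows server is required" >&2
    return 1
  fi
}

install_glassfish() {
  local jdk_dir="$1" installer="$2"
  check_glassfish_prereqs "${jdk_dir}" "${installer}" || return 1
  export JAVA_HOME="${jdk_dir}"
  export PATH="${JAVA_HOME}/bin:${PATH}"
  # Unpacks the distribution into ./glassfish after the license is accepted.
  java -Xmx256m -jar "${installer}" || return 1
  (
    cd glassfish || exit 1
    # Run setup.xml with the Ant copy bundled in glassfish/lib/ant.
    export ANT_HOME="$(pwd)/lib/ant"
    chmod -R +x lib/ant/bin
    lib/ant/bin/ant -f setup.xml
  )
}
```

For example: install_glassfish /usr/java/<jdk 1.6 directory> glassfish-installer-v2ur1-b09d-linux.jar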

Install PostgreSQL

To install PostgreSQL, perform the short version of the installation instructions at the PostgreSQL website. On the Documentation page, select the Manuals option, and then choose the appropriate version to read.

  1. Use the command-line administration tools that come with PostgreSQL, located in /usr/local/pgsql/bin.
    Note: If you choose to use the graphical pgAdmin III tool instead, it requires access to port 5432 on the server, either directly or through a secure shell (SSH) tunnel. Instructions for this tool are not provided here.
  2. Log in as the PostgreSQL superuser (postgres) and create a new login role for the owner of the Dataverse Network database.
    1. Type:
      su - postgres
      cd /usr/local/pgsql/bin
      ./createuser -lPE dvnApp
    2. Enter the password, and then enter it again.
    3. Respond no to the superuser prompt.
    4. Respond yes to the create databases prompt.
    5. Respond yes to the create new roles prompt.
  3. Create the Dataverse Network database with UTF8 encoding and the new login role (dvnApp) as the owner.
    For example, type the following: ./createdb --encoding=UTF8 --template=template0 --owner=dvnApp dvnDb
    Note: If you create the database and user within the psql interactive shell, you must use quotes around the names to preserve case, as in "dvnApp".
  4. Configure PostgreSQL to listen on all available interfaces, to make it accessible from outside the server.
    Edit the file /usr/local/pgsql/data/postgresql.conf.
    Uncomment the lines listen_addresses and port, and change the first line to listen_addresses='*'.
    Be sure to save and exit the configuration file.
  5. Restart PostgreSQL for the new configuration values to take effect:
    1. Stop GlassFish.
      Type /usr/local/glassfish/bin/asadmin stop-domain domain1.
    2. Restart PostgreSQL. Run this command as the postgres user, because pg_ctl does not run as root.
      Type /usr/local/pgsql/bin/pg_ctl restart -D /usr/local/pgsql/data.
    3. To confirm external access to the PostgreSQL port (5432), run telnet <servername> 5432 from another host.
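Steps 4 and 5 can be partly scripted. The sketch below is not part of the DVN distribution: the function names enable_remote_listen and port_is_open are hypothetical, it assumes GNU sed (for in-place editing) and bash (its /dev/tcp redirection stands in for telnet), and it keeps a .bak copy of postgresql.conf before editing it.

```shell
#!/bin/bash
# Sketch: edit postgresql.conf for remote access and test a TCP port.
# Assumes GNU sed and bash; function names are hypothetical.

enable_remote_listen() {
  local conf="$1"
  cp "${conf}" "${conf}.bak" || return 1   # keep a backup of the original file
  sed -i \
    -e "s/^[#[:space:]]*listen_addresses[[:space:]]*=.*/listen_addresses = '*'/" \
    -e "s/^#[[:space:]]*port[[:space:]]*=/port =/" \
    "${conf}"
}

# Succeeds if a TCP connection to host:port opens within 5 seconds.
port_is_open() {
  local host="$1" port="$2"
  timeout 5 bash -c ": < /dev/tcp/${host}/${port}" 2>/dev/null
}
```

As the postgres user, run enable_remote_listen /usr/local/pgsql/data/postgresql.conf and restart PostgreSQL as described in step 5. Then, from another host, run port_is_open <servername> 5432 && echo open.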

Install R

To install the R packages:

  1. Obtain the R rpm.
    You can get it from the official CRAN web site at http://cran.r-project.org/ or from a mirror. If the version recommended by the Project is no longer available on CRAN, contact the DVN team. Make sure that you get the rpm that matches your architecture: most of the tasks performed by R are CPU-intensive, so on 64-bit hardware the native 64-bit version gives significant performance gains.
    Depending on your hardware, download one of the following rpms:

    • R-<version>-<subversion>.i386.rpm
    • R-<version>-<subversion>.i686.rpm
    • R-<version>-<subversion>.x86_64.rpm (or amd64)
  2. Install the rpm by typing the following command:
    rpm -ivh R-<version>-<subversion>.<architecture>.rpm
  3. Obtain and build all third-party modules.
    The DSB installation comes with a script that builds these modules through the R package's built-in mechanism for resolving such dependencies:

    1. Make sure your system has all the necessary build components installed, because building R modules involves compiling binaries from sources.
      The following packages are available as RedHat rpms. However, they are not installed with RedHat's standard server configuration, so make sure you obtain and install them:

      • tk-devel
      • tcl-devel
      • xorg-x11-devel
      • libpng-devel
      • gcc-g77 (Fortran 77)
    2. To install these packages on a RedHat 4 system, use the following command:
      up2date <build components>
      To install these packages on a RedHat 5 system, use the yum package manager instead:
      yum install <build components>
  4. Download and build the modules:
    cd /usr/local/VDC/R
    ./installR.sh

    This produces a significant amount of debugging output, which is also saved in the file /tmp/RINSTALL.<PID>.LOG. You will probably see a few warning messages about the following packages at the end of the output: rgenoud, anchors, and MCMCpack. You can ignore these specific warnings. If, however, you see other warnings or errors, report them to the DVN team.
    Note: For this release, you must install the VDCutil package manually, because it is not available through CRAN. An addendum README file that describes this installation is included in the SourceForge downloads.
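Two of the checks in this procedure can be scripted: finding build components that are not yet installed, and separating unexpected warnings in the installR.sh log from the expected ones. A sketch, with hypothetical function names; missing_build_deps assumes an rpm-based system, and unexpected_r_problems is a rough, line-based filter, so a multi-line R warning about one of the expected packages can still show up and should be checked in context.

```shell
#!/bin/bash
# Sketch: check R build prerequisites and scan the installR.sh log.
# Function names are hypothetical; package names come from the list above.

R_BUILD_DEPS=(tk-devel tcl-devel xorg-x11-devel libpng-devel gcc-g77)

# Print each build component that rpm does not report as installed.
missing_build_deps() {
  local pkg
  for pkg in "${R_BUILD_DEPS[@]}"; do
    rpm -q "${pkg}" >/dev/null 2>&1 || echo "${pkg}"
  done
}

# Packages whose warnings can be ignored, per the installation notes.
EXPECTED_WARNING_PKGS='rgenoud|anchors|MCMCpack'

# Print warning/error lines that do not mention an expected package.
# Returns non-zero (grep's status) when nothing unexpected is found.
unexpected_r_problems() {
  local log="$1"
  grep -Ei 'warning|error' "${log}" | grep -Ev "${EXPECTED_WARNING_PKGS}"
}
```

For example, run up2date or yum install with the output of missing_build_deps, and after installR.sh finishes, run unexpected_r_problems /tmp/RINSTALL.<PID>.LOG and report any lines it prints.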

The following is the default directory in which the R packages are installed:
<R>=/usr/local/VDC/R

Install RServe

Download the RServe service from the following RServe web site:
http://www.rforge.net/Rserve/files/

Refer to the installation documentation at http://www.rforge.net/Rserve/doc.html for detailed information about how to install the service.

Note: RServe and the DSB component must be installed on the same server. Also, do not install RServe from CRAN with the R install.packages command, because the version on CRAN is often out of date.

To install RServe:

  1. Download the Rserve_0.5-<version>.tar.gz package, place it in the current R directory, and execute the following command as the root user:
    R CMD INSTALL Rserve_0.5-<version>.tar.gz
  2. Create a configuration file that defines the following:
    • The directory in which RServe stores temporary files.
    • The communication port on which RServe accepts connections.
    • The user account under which the daemon runs.
      Make sure that this account has write permission in both the temp directory and the DSB temp directory tree, /tmp/VDC.
    • The password file, which holds the RServe access account and password.

    See the example /etc/Rserv.conf file in The Rserv.conf File, below.
    The file /etc/Rserv.pwd stores the RServe access username and password. These are not a UNIX login username and password: they are used only by RServe and cannot be used to open a shell on the system, even if they are compromised. An example /etc/Rserv.pwd file is:
    <account> <password>
    For more information on RServe configuration parameters, refer to the RServe documentation at http://www.rforge.net/Rserve/doc.html.

  3. Start the RServe daemon.
    Type the following:
    R CMD Rserve
  4. Create a startup file to make sure that RServe starts at boot time.

The Rserv.conf File

workdir /tmp/Rserv
pwdfile /etc/Rserv.pwd
remote enable
auth required
plaintext disable
fileio enable

port <port number>
maxinbuf 262144

maxsendbuf 0
uid 48
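Both files can also be written by a script. This is a sketch under assumptions: write_rserve_config is a hypothetical name, and the port, account, password, and uid are placeholders you supply. It writes Rserv.pwd with owner-only permissions and, when run as root, makes the file owned by the uid that RServe runs as, on the assumption that the daemon reads the file under that uid. On RedHat systems, uid 48 is normally the apache account. The workdir directory must already exist and be writable by that account.

```shell
#!/bin/bash
# Sketch: write Rserv.conf and Rserv.pwd in a target directory (normally /etc).
# Arguments: dir, port, account, password, uid. The function name is hypothetical.

write_rserve_config() {
  local dir="$1" port="$2" account="$3" password="$4" uid="$5"
  local pwd_file="${dir}/Rserv.pwd"
  cat > "${dir}/Rserv.conf" <<EOF
workdir /tmp/Rserv
pwdfile ${pwd_file}
remote enable
auth required
plaintext disable
fileio enable
port ${port}
maxinbuf 262144
maxsendbuf 0
uid ${uid}
EOF
  # One "<account> <password>" pair per line, readable only by its owner.
  ( umask 077; printf '%s %s\n' "${account}" "${password}" > "${pwd_file}" ) || return 1
  chmod 600 "${pwd_file}" || return 1
  # When run as root, hand the password file to the daemon's uid.
  if [ "$(id -u)" -eq 0 ]; then
    chown "${uid}" "${pwd_file}"
  fi
}
```

For example, as root: write_rserve_config /etc 6311 <account> <password> 48. Port 6311 is Rserve's default port; use whatever port you also set in -Dvdc.dsb.rserve.port.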

The RServe Startup File

#! /bin/sh
# chkconfig: 2345 99 01
# description: Rserve, /etc/init.d/rserve

# Start the Rserve daemon; Rserve reads /etc/Rserv.conf by default.
start() {
        echo -n "Starting Rserve daemon: "
        R CMD Rserve
        echo "."
}

# Send SIGKILL (signal 9) to every process named Rserve.
stop() {
        echo -n "Stopping Rserve daemon: "
        killall -s 9 Rserve
        echo "."
}

case "$1" in
  start)
        start
        ;;
  stop)
        stop
        ;;
  restart)
        stop
        start
        ;;
  *)
        echo "Usage: /etc/init.d/rserve {start|stop|restart}"
        exit 1
        ;;
esac

exit 0
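To make RServe start at boot time on a chkconfig-based RedHat system, the startup file has to be copied to /etc/init.d and registered; chkconfig reads the "# chkconfig: 2345 99 01" header line to decide the runlevels and start/stop order. A sketch, assuming the startup file is saved locally under some name of your choice; install_rserve_init is a hypothetical name, and the optional second argument exists only so the copy step can be tried in a scratch directory.

```shell
#!/bin/bash
# Sketch: install the RServe startup file and register it with chkconfig.
# Arguments: $1 = startup file, $2 = init directory (default /etc/init.d).

install_rserve_init() {
  local script="$1" init_dir="${2:-/etc/init.d}"
  # Copy the script as "rserve" with mode 755 so init can execute it.
  install -m 755 "${script}" "${init_dir}/rserve" || return 1
  # Register for boot only when installing into the real init directory.
  if [ "${init_dir}" = /etc/init.d ] && command -v chkconfig >/dev/null 2>&1; then
    chkconfig --add rserve && chkconfig rserve on
  fi
}
```

As root, run install_rserve_init <startup file>, and then type service rserve start to start the daemon without rebooting.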

Configure GlassFish

Use the GlassFish Admin Console to configure the server:

  1. Start the server.
    Type /usr/local/glassfish/bin/asadmin start-domain domain1.
  2. Open a browser on the server and go to the following URL:
    http://<hostname>:4848
  3. Log in to the console.
    The default user name is admin, and the default password is adminadmin.
  4. After you complete all configuration tasks, restart the GlassFish server for the configuration to take effect.

Perform the following configuration from the Admin Console. The order in which you configure these components is not important:

Configure Resources for JDBC Connections

Under Resources > JDBC, configure the following settings:

  1. Add a new Connection Pools entry:
    • Name: dvnDbPool
    • Resource Type: javax.sql.DataSource
    • Database Vendor: PostgreSQL
    • DataSource ClassName: org.postgresql.ds.PGPoolingDataSource
    • Additional Properties:
      • ConnectionAttributes: ;create=true
      • User: dvnApp
      • PortNumber: 5432 (Port 5432 is the PostgreSQL default port.)
      • Password: <DVN application database password>
      • DatabaseName: <your database name>
      • ServerName: localhost
      • JDBC30DataSource: true
  2. Configure a new JDBC Resources entry:
    • JNDI Name: jdbc/VDCNetDS
    • Pool Name: dvnDbPool
  3. To verify connectivity to the database, open the new connection pool and click Ping on its General tab.
    If the ping succeeds, you see the message Ping Succeeded.
    If the ping fails, you see an error message. Check the database configuration, confirm that the database and the user account exist, and confirm access to port 5432.
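The same JDBC setup can be done from the command line with GlassFish's asadmin tool. This is a sketch, not the Project's procedure: create_dvn_jdbc is a hypothetical name, the database name and password are arguments, and ConnectionAttributes and Database Vendor are left out, so set them in the Admin Console if you need them. Setting ASADMIN=echo prints the commands instead of running them. asadmin may prompt for the admin user name and password. The password must not contain a colon, because asadmin uses ':' to separate properties, and it is visible to other users in the process list while the command runs.

```shell
#!/bin/bash
# Sketch: asadmin equivalents of the JDBC console steps (GlassFish V2).
# Set ASADMIN=echo to print the commands instead of running them.
ASADMIN="${ASADMIN:-/usr/local/glassfish/bin/asadmin}"

create_dvn_jdbc() {
  local db_name="$1" db_password="$2"
  # Connection pool with the PostgreSQL data source properties listed above.
  ${ASADMIN} create-jdbc-connection-pool \
    --datasourceclassname org.postgresql.ds.PGPoolingDataSource \
    --restype javax.sql.DataSource \
    --property "User=dvnApp:Password=${db_password}:DatabaseName=${db_name}:ServerName=localhost:PortNumber=5432:JDBC30DataSource=true" \
    dvnDbPool || return 1
  # JNDI resource that the DVN application looks up.
  ${ASADMIN} create-jdbc-resource --connectionpoolid dvnDbPool jdbc/VDCNetDS || return 1
  # Command-line counterpart of the console's Ping button.
  ${ASADMIN} ping-connection-pool dvnDbPool
}
```

For example: create_dvn_jdbc dvnDb <DVN application database password>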

Configure Resources for JMS Resources

Under Resources > JMS Resources, configure the following:

  1. Add a new Connection Factory for the DSB Queue:
    • JNDI Name: jms/DSBQueueConnectionFactory
    • Resource Type: javax.jms.QueueConnectionFactory
  2. Add a new Connection Factory for the Index Message:
    • JNDI Name: jms/IndexMessageFactory
    • Resource Type: javax.jms.QueueConnectionFactory
  3. Add a new Destination Resource for the DSB Queue:
    • JNDI Name: jms/DSBIngest
    • Physical Destination Name: DSBIngest
    • Resource Type: javax.jms.Queue
  4. Add a new Destination Resource for the Index Message:
    • JNDI Name: jms/IndexMessage
    • Physical Destination Name: IndexMessage
    • Resource Type: javax.jms.Queue
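The four JMS resources can also be created with asadmin. A sketch, assuming GlassFish V2's asadmin; create_dvn_jms is a hypothetical name, and setting ASADMIN=echo prints the commands instead of running them. For the two queues, the Name property supplies the Physical Destination Name.

```shell
#!/bin/bash
# Sketch: asadmin equivalents of the JMS console steps (GlassFish V2).
# Set ASADMIN=echo to print the commands instead of running them.
ASADMIN="${ASADMIN:-/usr/local/glassfish/bin/asadmin}"

create_dvn_jms() {
  local factory
  # Connection factories for the DSB queue and the index messages.
  for factory in jms/DSBQueueConnectionFactory jms/IndexMessageFactory; do
    ${ASADMIN} create-jms-resource --restype javax.jms.QueueConnectionFactory "${factory}" || return 1
  done
  # Destination resources; Name sets the physical destination name.
  ${ASADMIN} create-jms-resource --restype javax.jms.Queue --property Name=DSBIngest jms/DSBIngest || return 1
  ${ASADMIN} create-jms-resource --restype javax.jms.Queue --property Name=IndexMessage jms/IndexMessage
}
```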

Configure Resources for JavaMail Sessions

Under Resources > JavaMail Sessions, add a new session with the following settings:

  • JNDI Name: mail/notifyMailSession
  • Mail Host: <your mail server>
    Note: The Project recommends that you install a mail server on the same
    machine as GlassFish and use localhost for this entry.
  • Default User: dataversenotify
    This does not need to be a real mail account.
  • Default Return Address: do-not-reply@<your mail server>
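The mail session can also be created with asadmin. A sketch, assuming GlassFish V2's asadmin; create_dvn_mail is a hypothetical name. It takes the mail host and, separately, the domain used in the do-not-reply return address, because with the recommended localhost mail host the return address would otherwise end in @localhost. Setting ASADMIN=echo prints the command instead of running it.

```shell
#!/bin/bash
# Sketch: asadmin equivalent of the JavaMail session settings (GlassFish V2).
# Set ASADMIN=echo to print the command instead of running it.
ASADMIN="${ASADMIN:-/usr/local/glassfish/bin/asadmin}"

create_dvn_mail() {
  local mail_host="$1" mail_domain="$2"
  # dataversenotify does not need to be a real mail account.
  ${ASADMIN} create-javamail-resource \
    --mailhost "${mail_host}" \
    --mailuser dataversenotify \
    --fromaddress "do-not-reply@${mail_domain}" \
    mail/notifyMailSession
}
```

For example: create_dvn_mail localhost <your mail domain>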

Configure Application Server for JVM Options

For the Application Server, open JVM Settings > JVM Options and delete, change, or add the following JVM options:

  1. Delete the following options:

    -Dsun.rmi.dgc.server.gcInterval=3600000
    -Dsun.rmi.dgc.client.gcInterval=3600000
  2. Change the following options’ settings:
    • Change -client to -server.
    • Change -Xmx512m to whatever size you can allot for the maximum Java heap space.
    • Set -Xms512m to the same value to which you set -Xmx512m.
  3. Add the following options:
    • Add all of the following:

      -XX:MaxPermSize=192m
      -XX:+AggressiveHeap
      -Xss128k
      -XX:+DisableExplicitGC
      -Dcom.sun.enterprise.ss.ASQuickStartup=false
      -Djhove.conf.dir=${com.sun.aas.instanceRoot}/config
      -Ddvn.inetAddress=<address of server on which DVN runs>
    • To install on a multi-processor machine, add the following:

      -XX:+UseParallelOldGC
    • To enable the optional Google Analytics option on the Network Options page and provide access to site usage reports, add the following (see Configure Google Analytics for details): 

      -Ddvn.googleanalytics.key=<googleAnalyticsTrackingCode>
    • To configure permanent file storage (data and documentation files uploaded to studies), add the following:

      -Dvdc.study.file.dir=${com.sun.aas.instanceRoot}/config/files/studies
    • To configure the temporary location used in file uploads add the following:
      -Dvdc.temp.file.dir=${com.sun.aas.instanceRoot}/config/files/temp
    • To configure export and import logs (harvesting and importing), add the following:
      -Dvdc.export.log.dir=${com.sun.aas.instanceRoot}/logs/export
      -Dvdc.import.log.dir=${com.sun.aas.instanceRoot}/logs/import
    • To manage calls to RServe and the R host (analysis and file upload), add the following:
      -Dvdc.dsb.host=<DSB server hostname>
      -Dvdc.dsb.rserve.user=<account>
      -Dvdc.dsb.rserve.pwrd=<password>
      -Dvdc.dsb.rserve.port=<port number>
      See Install RServe for information about configuring these values in the Rserv.conf and Rserv.pwd files.
      These settings must be configured for subsettable file uploads, downloads, subsetting, and analysis to work. The -Dvdc.dsb.host value must name either a preconfigured DSB server or an Apache web server that acts as the DSB server. If the DSB server uses the default port, 80, the hostname alone is adequate. Otherwise, also set the following:
      -Dvdc.dsb.port=<DSB server host port>
  4. The following options configure locations for files that you download when you install the DVN application. You can configure these settings before you download the files, but you must copy the configuration files into the appropriate location after download for these options to function:
    • To configure search index files set the following:

      -Ddvn.index.location=${com.sun.aas.instanceRoot}/config
    • To use the optional customized error logging and add more information to your log files, set the following: 

      -Djava.util.logging.config.file=${com.sun.aas.instanceRoot}/config/logging.properties
      Note: To customize the logging, edit the logging.properties file to change WARNING to INFO.
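Because the required options depend on a few values (heap size, server address, processor count), a script can print the final list to enter in the console. This is a sketch: dvn_jvm_options is a hypothetical name. It prints -server and the matching -Xmx/-Xms values from step 2, the required options from step 3, and -XX:+UseParallelOldGC only when getconf reports more than one online processor (this count includes hyperthreads). It does not cover the optional settings or the deletions in steps 1 and 2; make those separately.

```shell
#!/bin/bash
# Sketch: print the changed and required JVM options, one per line.
# Arguments: heap size (e.g. 2048m), server address, CPU count (optional).

dvn_jvm_options() {
  local heap="$1" address="$2"
  local cpus="${3:-$(getconf _NPROCESSORS_ONLN)}"
  # -Xms is set to the same value as -Xmx, as step 2 requires.
  printf '%s\n' \
    -server \
    "-Xmx${heap}" \
    "-Xms${heap}" \
    -XX:MaxPermSize=192m \
    -XX:+AggressiveHeap \
    -Xss128k \
    -XX:+DisableExplicitGC \
    -Dcom.sun.enterprise.ss.ASQuickStartup=false \
    '-Djhove.conf.dir=${com.sun.aas.instanceRoot}/config' \
    "-Ddvn.inetAddress=${address}"
  # Parallel old-generation GC only on multi-processor machines.
  if [ "${cpus}" -gt 1 ]; then
    printf '%s\n' -XX:+UseParallelOldGC
  fi
}
```

The single quotes keep ${com.sun.aas.instanceRoot} literal, so GlassFish, not the shell, expands it.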

Set Up Configuration for HTTP Service

The HTTP Service configuration settings described in this section are suggested defaults. These settings strongly affect performance, but there are no universally right values: the best values depend on your web traffic, how many requests you get, how long they take to process on average, and your hardware. For detailed information, refer to the Java Application Server Administration Guide, available at the Sun Microsystems Documentation web site at the following URL:
http://docs.sun.com/app/docs

Note: If your server becomes so busy that it drops connections, adjust the Thread Counts to improve performance.

To configure the GlassFish server’s HTTP Service, set the server’s Configuration options as follows:

  1. Configure the HTTP Service option for HTTP Listeners with the following settings for http-listener-1:
    • General Settings option for Listener Port - 80
    • Advanced option for Acceptor Threads - The number of CPUs (or cores, if multicore) on your server
  2. Configure the HTTP Service option for Request Processing with the following initial recommended settings:
    • Thread Count - Twice the number of CPUs (cores) on your server
    • Initial Thread Count - The number of CPUs (cores)
    • Thread Increment - 1
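The recommended starting values follow directly from the processor count, so they can be computed. A sketch; http_thread_settings is a hypothetical name. It uses getconf _NPROCESSORS_ONLN when no count is given, which counts online logical processors and can therefore exceed the number of physical cores on hyperthreaded CPUs; pass the core count explicitly if that matters.

```shell
#!/bin/bash
# Sketch: compute the suggested HTTP listener and request-processing values.
# Argument: number of CPUs/cores (optional; defaults to getconf's count).

http_thread_settings() {
  local cores="${1:-$(getconf _NPROCESSORS_ONLN)}"
  echo "Acceptor Threads: ${cores}"          # one per CPU/core
  echo "Thread Count: $((cores * 2))"        # twice the CPU/core count
  echo "Initial Thread Count: ${cores}"
  echo "Thread Increment: 1"
}
```

For example, on a 4-core server, http_thread_settings 4 suggests 4 acceptor threads, a thread count of 8, and an initial thread count of 4.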

Set Up Configuration for EJB Container

The EJB Container timer service can become corrupted if the Derby database files that store timer data get out of sync. To avoid this problem, use PostgreSQL to store the timer data.

To do so, set the following option in the GlassFish server's Configuration:

  1. Go to the EJB Container option for EJB Timer Service.
  2. Configure the Timer Datasource to the following:
    jdbc/VDCNetDS
  3. Save the configuration.