Cluster / Large Scale Computing

We’ve all been there. You know the moment: sitting at your computer, just wondering when it will finish the current task so you can move on with your life. If you have a new computer, this probably does not happen to you very often. That is, of course, unless you wish to run complex mathematical operations or simulations on your computer. To run such software effectively, one must rely upon the power of supercomputers.

Simply put, a supercomputer is a computer with access to vast amounts of resources. This could mean more processors, faster processors, more memory, or more disk space. Here at NC State, there are three main high performance computing (HPC) facilities, described in the sections that follow: OIT’s High Performance Computing services, the ECE HYDRA farm, and the CSC ARC cluster.

OIT High Performance Computing Services

The NC State University Office of Information Technology (OIT) provides High Performance Computing (HPC) services to all NC State faculty members. The HPC service includes compute resources, scratch storage on a high performance parallel file system, low latency networks, licensed system and application software, and scientific support for using the resources.

The primary computational resource is a Linux cluster with approximately 1,000 nodes and approximately 10,000 cores. Most nodes are purchased by faculty partners. Partners have a dedicated queue with access to the resources they have added, plus priority access to the general queues. When not in use by partners, partner nodes are available to the general community. There are a number of specialized compute nodes with GPUs or large memory (up to 512 GB). More than 4 petabytes of storage are attached to the cluster, including scratch, home, and archive storage areas. Scratch storage uses the high performance IBM Spectrum Scale file system (formerly GPFS). Archive and home storage is backed up to an off-campus location. Distributed memory applications use either 10 Gb/sec Ethernet or InfiniBand networks dedicated to message passing. A separate 10 Gb/sec network provides access to storage, resource management, and job scheduling.
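
Distributed memory applications on clusters like this typically use MPI for message passing over the dedicated networks described above. The following is a minimal, generic MPI sketch in C, not specific to this cluster; the compiler wrapper (mpicc) and the launch mechanism are assumptions that vary with the site’s software environment.

/* Minimal MPI example: every rank contributes a value and rank 0 prints the sum.
 * Generic sketch only; wrappers, modules, and queue names vary by site. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);                 /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */

    int local = rank + 1;                   /* trivial per-rank value */
    int total = 0;

    /* Combine the per-rank values onto rank 0 over the message-passing network. */
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%d ranks, sum of (rank+1) = %d\n", size, total);

    MPI_Finalize();
    return 0;
}

Such a program would normally be compiled with an MPI wrapper (for example, mpicc) and launched through the cluster’s scheduler rather than run directly on a login node.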

C, C++, and Fortran compilers from GNU, Portland Group, and Intel are available on the HPC Linux cluster, along with a suite of licensed applications (such as Abaqus, Amber, Ansys, and Gaussian) and many open source applications and libraries. Resource management and job scheduling on the cluster are handled by IBM Platform Load Sharing Facility (LSF).
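
To illustrate the compiler choices, the short OpenMP program below could be built with any of the three compiler families; the flags in the comments (-fopenmp for GNU, -mp for Portland Group, -qopenmp for Intel) are typical but version-dependent, so treat them as assumptions and check the compiler documentation on the cluster.

/* Small OpenMP example: threads on one node cooperatively compute a harmonic sum.
 * Typical (version-dependent) build lines, shown only as assumptions:
 *   gcc  -fopenmp sum.c -o sum   (GNU)
 *   pgcc -mp      sum.c -o sum   (Portland Group)
 *   icc  -qopenmp sum.c -o sum   (Intel)
 */
#include <omp.h>
#include <stdio.h>

int main(void)
{
    const int n = 1000000;
    double sum = 0.0;

    /* Each thread accumulates a private partial sum; the reduction combines them. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= n; i++)
        sum += 1.0 / (double)i;

    printf("Harmonic sum H(%d) is about %f (up to %d threads)\n",
           n, sum, omp_get_max_threads());
    return 0;
}

The resulting executable would then be submitted to LSF (for example, with bsub and a batch script that requests cores, wall time, and a queue); the exact queue names and resource limits depend on the cluster’s configuration.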

A group of computational scientists supports use of the HPC resources. They assist with issues running jobs; code porting, optimization, and debugging; consulting; and collaboration on research projects. A limited amount of computational scientist effort is available to all HPC service users as consulting. Projects that require more extensive computational scientist effort are expected to include direct contract or grant support for the collaboration.

The HPC cluster is located in secure campus data center space that is shared with university enterprise systems. Redundant power and cooling systems are provided for the data center using a UPS with a backup diesel generator and an onsite emergency chiller. The data center is monitored 24 hours a day, 7 days a week by on-campus operators. Data center access is controlled by card readers and monitored by video surveillance. The physical security of the data center complies with NIST 800-171.

ECE HYDRA Farm

Provide details.

CSC ARC Cluster

The ARC cluster has 1,728 cores on 108 compute nodes integrated by Advanced HPC. All machines are two-way SMPs with AMD Opteron 6128 (Magny-Cours) processors, with 8 cores per socket (16 cores per node). For details, including how to get access, see http://moss.csc.ncsu.edu/~mueller/cluster/arc/