My research is at the intersection of Parallel & Distributed Computing and Data-driven Science and Engineering, including:
The Virtual Data Collaboratory (NSF Data Infrastructure Building Blocks) is a federated data cyberinfrastructure designed to drive data-intensive, interdisciplinary, and collaborative research and to enable data-driven science and engineering discoveries. VDC accomplishes this by providing researchers, educators, and entrepreneurs with seamless access to data and tools across a broad range of disciplines and scientific domains, as well as across institutional and geographic boundaries. VDC is federated and coordinated across three geographically distributed Rutgers University campuses in New Jersey and multiple campuses in Pennsylvania and New York. Central to the VDC vision are three infrastructural innovations: (1) a regional Science DMZ network that provides services for efficient and transparent access to data and computing capabilities; (2) an expandable and scalable architecture for data-centric infrastructure federation; and (3) a data services layer that supports research workflows, uses cutting-edge semantic web technologies, supports interdisciplinary research, expands access, and increases the impact of data science. VDC builds on and integrates existing data repositories, including NSF-funded repositories (e.g., OOI), and has the potential to leverage national ACI investments and interoperate with multiple data repositories.
This project develops a real-time processing system capable of handling a large mix of sensor observations, with a focus on automating the detection of natural hazard events using machine learning as the events are occurring. A four-organization collaboration (UNAVCO, University of Colorado, University of Oregon, and Rutgers University) is developing a data framework for generalized real-time streaming analytics and machine learning for geoscience and hazards research. This work will support rapid analysis and understanding of data associated with hazardous events (earthquakes, volcanic eruptions, tsunamis). GeoSCIFramework proposes an innovative approach that looks at the world from a fly's-eye perspective: a composite of thousands of harmonized, high-rate, real-time GNSS, seismic, pressure, and other sensors will continuously stream data into an integrated framework, combined with a background of satellite radar time series generated at a 100-meter pixel level across the globe and made available through XSEDE.
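As a rough illustration of the streaming-detection idea behind GeoSCIFramework (not the project's actual code), the sketch below flags samples in a sensor stream that deviate sharply from a rolling baseline; the window size, threshold, and synthetic GNSS-like data are assumptions for illustration only. In the real framework, a cheap trigger like this would sit in front of heavier machine-learning models.

```python
# Minimal sketch (assumptions, not GeoSCIFramework code): flag candidate hazard
# events in a real-time sensor stream with a rolling z-score test.
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, Iterator
import random

def detect_events(samples: Iterable[float], window: int = 100,
                  threshold: float = 5.0) -> Iterator[int]:
    """Yield indices of samples that deviate strongly from the recent past."""
    history: deque = deque(maxlen=window)
    for i, x in enumerate(samples):
        if len(history) == window:
            mu, sigma = mean(history), pstdev(history)
            if sigma > 0 and abs(x - mu) > threshold * sigma:
                yield i  # candidate event: hand off to heavier ML models downstream
        history.append(x)

# Synthetic GNSS-like displacement stream with a sudden offset at sample 500.
random.seed(0)
stream = [random.gauss(0.0, 0.01) for _ in range(500)] + [0.5] * 50
print(list(detect_events(stream))[:5])  # indices at and after 500
```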
Dynamo will allow atmospheric scientists and hydrologists to improve short- and long-term weather forecasts, and will help the oceanographic community better understand key processes such as ocean-atmosphere exchange and turbulent mixing, both of which have a direct impact on society. The Dynamo project develops innovative network-centric algorithms, policies, and mechanisms to enable programmable, on-demand access to high-bandwidth, configurable network paths from scientific data repositories to national cyberinfrastructure facilities, helping satisfy the data, computational, and storage requirements of science workflows. This will enable researchers to test new algorithms and models in real time with live streaming data, which is currently not possible in many scientific domains. Through enhanced interactions between Pegasus, the network-centric platform, and new network-aware workflow scheduling algorithms, science workflows will benefit from workflow automation and data management over dynamically provisioned infrastructure. The system will transparently map application-level network Quality of Service expectations to actions on programmable software-defined infrastructure (a simple illustration of this mapping follows the project link below).
https://sites.google.com/view/dynamo-nsf/
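The sketch below illustrates one piece of the Dynamo idea described above: translating an application-level QoS expectation (move a given data volume before a workflow deadline) into a bandwidth reservation request for a dynamically provisioned path. The class names, endpoints, and headroom factor are my own assumptions for illustration; they are not Dynamo's or Pegasus's actual interfaces.

```python
# Illustrative sketch only: map an application-level QoS expectation to a
# provisioning request for a dynamically allocated network path.
from dataclasses import dataclass

@dataclass
class TransferRequest:
    src: str
    dst: str
    volume_gb: float      # data to move
    deadline_s: float     # workflow-stage deadline

@dataclass
class PathReservation:
    src: str
    dst: str
    bandwidth_gbps: float

def plan_reservation(req: TransferRequest, headroom: float = 1.2) -> PathReservation:
    """Translate a deadline into a minimum bandwidth reservation (with headroom)."""
    required_gbps = (req.volume_gb * 8) / req.deadline_s
    return PathReservation(req.src, req.dst, round(required_gbps * headroom, 2))

# Hypothetical endpoints, for illustration only.
req = TransferRequest("data-repo.example.org", "hpc.example.edu",
                      volume_gb=500, deadline_s=1800)
print(plan_reservation(req))  # ~2.67 Gbps requested for a 30-minute window
```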
NSF funded an experimental green datacenter called Parasol, which has previously demonstrated that the combination of green design and intelligent software management can lead to significant reductions in energy consumption, carbon emissions, and cost. Running real experiments in live conditions on Parasol led to findings that were not possible in simulation. The enhanced version of the project updates energy sources, network technologies, and management software: this research seeks to upgrade Parasol with current and next-generation power-efficient servers, improve network connectivity and integrate software-defined networking (SDN) and Wi-Fi capabilities, increase solar energy generation capacity, add a low-emission fuel cell power source, diversify energy storage, and improve the cooling system to advance green computing.
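As a hedged illustration of the kind of intelligent software management mentioned above (not Parasol's actual scheduler), the sketch below greedily places deferrable batch jobs into the forecast hours with the most unused solar capacity, implicitly falling back to grid power for any shortfall. The forecast values and the policy are assumptions for illustration.

```python
# Minimal sketch of a green-aware batch placement policy (assumptions only).
def place_jobs(job_power_kw: list, solar_forecast_kw: list) -> dict:
    """Greedily assign each job to the hour with the most unused solar power.

    Returns a map of job index -> hour; jobs that cannot be fully covered by
    solar are still placed, implicitly drawing the remainder from the grid.
    """
    free = solar_forecast_kw[:]           # remaining solar capacity per hour
    placement = {}
    for j, p in sorted(enumerate(job_power_kw), key=lambda t: -t[1]):
        hour = max(range(len(free)), key=lambda h: free[h])
        placement[j] = hour
        free[hour] -= p                   # may go negative -> grid makes up the gap
    return placement

# Example: three deferrable jobs and a 6-hour solar forecast peaking at midday.
print(place_jobs([3.0, 1.5, 2.0], [0.0, 1.0, 4.0, 5.0, 3.0, 0.5]))
```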
The SUSTAM associate team focuses on the joint design of a multi-criteria orchestration framework dealing with resources, data, and energy management in a sustainable way. The SUSTAM associate team enables a long-term collaboration between the Inria Avalon team and the Rutgers Discovery Informatics Institute (RDI2).
ENVRI-FAIR connects the ESFRI Cluster of Environmental Research Infrastructures (ENVRI) to the European Open Science Cloud (EOSC). Participating research infrastructures (RIs) of the environmental domain cover the subdomains Atmosphere, Marine, Solid Earth, and Biodiversity/Ecosystems, and thus the Earth system in its full complexity. The overarching goal is that, by the end of the project, all participating RIs have built a set of FAIR data services that enhances the efficiency and productivity of researchers, supports innovation, enables data- and knowledge-based decisions, and connects the ENVRI Cluster to the EOSC. This goal is reached by: (1) well-defined community policies and standards for all steps of the data life cycle, aligned with wider European policies as well as with international developments; (2) sustainable, transparent, and auditable data services at each participating RI, for each step of the data life cycle, compliant with the FAIR principles; (3) a focus on implementing prototypes for testing pre-production services at each RI, with the catalogue of prepared services defined for each RI independently, depending on its maturity; and (4) exposing the complete set of thematic data services and tools provided by the ENVRI cluster under the EOSC catalogue of services. Coordinated by Forschungszentrum Jülich GmbH, Germany.
EMSO-Link is a 3-year project underpinning the long-term sustainability of EMSO ERIC, the pan-European distributed Research Infrastructure (RI) composed of fixed-point, open-ocean observatories for the study and monitoring of European seas. EMSO pursues the long-term objective of becoming part of the upcoming European Ocean Observing System (EOOS), which is expected to integrate multiple platforms and data systems, including other ERICs, to achieve the first sustained, standardized, and permanent observatory network of the European seas. EMSO ERIC coordinates access to the facilities and supports the management of data streams from EMSO observatories.
EMSO-Link will accelerate the establishment of EMSO ERIC governance rules and procedures and will facilitate the coordination of EMSO infrastructure construction, operation, extension and maintenance. Coordinated by EMSO ERIC, Italy.
Although the ocean is a fundamental part of the global system providing a wealth of resources, there are fundamental gaps in ocean observing and forecasting systems, limiting our capacity in Europe to sustainably manage the ocean and its resources. Ocean observing is “big science” and cannot be solved by individual nations; it is necessary to ensure high-level integration for coordinated observations of the ocean that can be sustained in the long term. EuroSea brings together key European actors of ocean observation and forecasting with key end users of ocean observations, responding to the Future of the Seas and Oceans Flagship Initiative. Our vision is a truly interdisciplinary ocean observing system that delivers the essential ocean information needed for the wellbeing, blue growth, and sustainable management of the ocean. EuroSea will strengthen the European and Global Ocean Observing Systems (EOOS and GOOS) and support their partners. EuroSea will increase the technology readiness levels (TRLs) of critical components of ocean observation systems and tools, and in particular the TRL of the integrated ocean observing system. EuroSea will improve European and international coordination; the design of the observing system adapted to European needs; in situ observing networks; data delivery; the integration of remote and in situ data; and forecasting capability. EuroSea will work towards integrating individual observing elements into an integrated observing system, and will connect end users with the operators of the observing system and information providers. EuroSea will demonstrate the utility of the European Ocean Observing System through three demonstration activities focused on operational services, ocean health, and climate, where a dialogue between actors in the ocean observing system will guide the development of the services, including market replication and innovation supporting the development of the blue economy.
The GreenHPC initiative at Rutgers is a research and educational initiative addressing problems at the intersection of energy efficiency, scalable computing, and high-performance computing. Key focus areas include (1) energy efficiency of scientific data analysis pipelines at scale, (2) in-situ data analytics and co-processing at extreme scales, and (3) application-aware cross-layer power management for high-performance computing systems; focus area (3) is illustrated by the sketch below.
GreenHPC also acts as a forum for researchers and the educational community to exchange ideas and experiences on energy efficiency, by disseminating research results, supporting educational activities at different levels (PhD, MS, undergraduate REU, K-12), and organizing editorial and event activities.
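The sketch below illustrates application-aware cross-layer power management in a deliberately simplified form: a runtime picks a CPU frequency for each application phase by minimizing predicted energy under a toy analytical model. The frequencies, power model, and phase-sensitivity parameter are assumptions, not results from the GreenHPC work.

```python
# Illustrative sketch of application-aware, cross-layer power management:
# the application layer exposes how frequency-sensitive its current phase is,
# and the runtime maps that hint to a hardware knob (here, a core frequency).
def pick_frequency(cpu_boundness: float,
                   freqs_ghz=(1.2, 1.8, 2.4),
                   p_static_w: float = 60.0,
                   p_dyn_w: float = 80.0) -> float:
    """cpu_boundness in [0, 1]: fraction of the phase that scales with frequency."""
    f_max = max(freqs_ghz)

    def energy(f: float) -> float:
        time = cpu_boundness * (f_max / f) + (1.0 - cpu_boundness)  # normalized runtime
        power = p_static_w + p_dyn_w * (f / f_max) ** 3             # static + cubic dynamic
        return power * time

    return min(freqs_ghz, key=energy)

print(pick_frequency(0.9))  # compute-bound phase -> keeps frequency up (1.8 GHz)
print(pick_frequency(0.1))  # memory-bound phase -> drops to 1.2 GHz to save energy
```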
The overarching goal of this project is to develop mechanisms and techniques based on novel software and technologies to accelerate medical image processing algorithms and enable their execution at scale on high performance and distributed computing systems. In collaboration with the Center for Biomedical Imaging & Informatics at Rutgers Robert Wood Johnson Medical School and the Cancer Institute of New Jersey.
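As a minimal, assumption-laden illustration of scaling per-image analysis (not the project's actual pipeline), the sketch below tiles a large image and processes the tiles in parallel with a process pool; the tile size and the toy "analysis" function stand in for real operations such as segmentation or feature extraction.

```python
# Minimal sketch: tile a large image and analyze the tiles in parallel,
# the basic pattern for scaling per-image processing across cores or nodes.
from concurrent.futures import ProcessPoolExecutor

def analyze_tile(tile: list) -> float:
    """Stand-in for a real operation (e.g., segmentation): mean intensity."""
    pixels = [p for row in tile for p in row]
    return sum(pixels) / len(pixels)

def tiles(image: list, size: int):
    """Yield square tiles of the image, row-major."""
    for r in range(0, len(image), size):
        for c in range(0, len(image[0]), size):
            yield [row[c:c + size] for row in image[r:r + size]]

if __name__ == "__main__":
    # Synthetic 512x512 "image" with 8-bit intensities.
    image = [[(r * c) % 256 for c in range(512)] for r in range(512)]
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(analyze_tile, tiles(image, 128)))
    print(len(results), "tiles analyzed; first tile mean =", round(results[0], 2))
```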
The NSF OOI is an integrated infrastructure project composed of science-driven platforms and sensor systems that measure physical, chemical, geological and biological properties and processes from the seafloor to the air-sea interface. The OOI network was designed to address critical science-driven questions that will lead to a better understanding and management of our oceans, enhancing our capabilities to address critical issues such as climate change, ecosystem variability, ocean acidification, and carbon cycling. OOI serves data from 57 stable platforms and 31 mobile assets, carrying 1227 instruments (~850 deployed), providing over 25,000 science data sets and over 100,000 scientific and engineering data products. The OOI has been built with an expectation of operation for 25 years.
Architected, delivered, and operated a robust cyberinfrastructure (CI) system for the NSF Ocean Observatories Initiative with established processes based on best practices. The system has provided, and continues to provide, extremely high uptime and quality of service; has collected and curated half a petabyte of data; has served over 150 million user requests; and has delivered over 100 TB of data to users in over 100 distinct countries across the globe. This CI architecture is being used as a model by other NSF facilities.
Architected and deployed Caliburn, the largest supercomputer in New Jersey, at Rutgers, ranked #2 among Big Ten universities and #8 among US academic institutions (June 2016 Top500 list). With over 23,000 cores, Caliburn can perform over 800 trillion floating-point operations per second, and it was among the first clusters to use the Intel Omni-Path fabric and to equip its compute nodes with NVMe (non-volatile memory express) devices, making Caliburn a unique asset. Led a team of IT engineers for the system's operation and user support, and delivered hundreds of millions of computing hours to researchers and students across New Jersey, driving research and innovation in all areas of science, engineering, and medicine.
Caliburn report: https://rdi2.rutgers.edu/sites/default/files/inline-files/Caliburn_ACI_Report_201906.pdf
CAPER is a unique and flexible instrument funded by The National Science Foundation that combines high performance Intel Xeon processors with a complete deep memory hierarchy, latest generation co-processors, high performance network interconnect and powerful system power instrumentation. This hardware configuration is unprecedented in its flexibility and adaptability as it can combine multiple components into a smaller set of nodes to reproduce specific configurations. This platform also mirrors key architectural characteristics of high-end systems, such as XSEDE’s Stampede system at TACC, and provides several unique features to support critical research goals such as software/hardware co-design. CAPER provides a platform to validate models and investigate important aspects of data-centric and energy efficiency research, including application-aware cross-layer power management, energy/performance tradeoffs of data-centric workflows, software/hardware co-design for in-situ data analytics and thermal implications of proactive virtual machine management.
In March 2013, a joint team of researchers from the Rutgers Discovery Informatics Institute and the Computational Physics and Mechanics Laboratory at Iowa State University launched a large-scale computational experiment to gather the most comprehensive information to date on the effects of pillars on microfluidic channel flow. The experiment is unique in that it demonstrated that a single user, operating entirely in user space, can federate multiple geographically distributed and heterogeneous HPC resources to obtain a platform with cloud-like capabilities able to solve large-scale computational engineering problems. This work was presented as part of the Uber-Cloud Experiment (Round 2, Team 53). Further details of the experiment, results, and findings can be found at http://nsfcac.rutgers.edu/CometCloud/uff/.
The mission of co-design within the Center for Exascale Simulation of Combustion in Turbulence (ExaCT) is to absorb the sweeping changes necessary for exascale computing into software and to ensure that the hardware is developed to meet the requirements of real-world combustion computations. The group developed cutting-edge statistical and information technologies to bring quantitative rigor and efficiency to scientific investigations. My research focused on the analysis and exploration of data, the collection and organization of data, and decisions based on data, including scientific data management and analytics for in situ uncertainty quantification and topological analysis, and on exploring hardware tradeoffs with combustion proxy applications (e.g., S3D and AMR codes) representing the workload of the exascale ecosystem (a minimal in situ analysis sketch follows the project link below). The group included Oak Ridge National Laboratory, Sandia National Laboratories, Lawrence Livermore National Laboratory, Los Alamos National Laboratory, Pacific Northwest National Laboratory, and the University of Utah.
http://exactcodesign.org
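The sketch below illustrates the in situ analysis idea in a deliberately simplified form: statistics are accumulated while a toy "simulation" produces data, so the full field never needs to be written out for post hoc analysis. The stand-in solver and the choice of moments are assumptions, not S3D or the ExaCT analysis stack.

```python
# Illustrative sketch of in situ analytics: consume each field as it is
# produced and keep only running statistics (Welford's online algorithm),
# avoiding the I/O cost of writing every time step to disk.
import math
import random

def simulation_steps(n_steps: int, n_cells: int):
    """Stand-in for a combustion solver: yields one field per time step."""
    random.seed(1)
    for _ in range(n_steps):
        yield [random.gauss(1500.0, 75.0) for _ in range(n_cells)]  # e.g., temperature

count, mean, m2 = 0, 0.0, 0.0
for field in simulation_steps(n_steps=50, n_cells=10_000):
    for x in field:                 # in situ: analyze data where it is produced
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)

print(f"mean = {mean:.1f} K, std = {math.sqrt(m2 / count):.1f} K over {count} samples")
```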
The overarching goal of this project is to understand science and engineering application formulations that are meaningful in a hybrid federated cyberinfrastructure that includes clouds, and to explore programming and middleware support that can enable these applications, including application formulations, programming models, abstractions and systems, and middleware stacks and services. Outcomes of these efforts include cloud-based cyberinfrastructure federation models and contributions to CometCloud, an autonomic cloud computing engine (http://cometcloud.org), e.g., a system for accelerating asynchronous replica exchange on large-scale distributed heterogeneous HPC resources, presented as part of the IEEE International Scalable Computing Challenge (SCALE 2012).
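The sketch below illustrates the pull-based master/worker pattern that underlies this kind of federation: workers on heterogeneous resources pull tasks from a shared task space as they become free, so faster sites naturally complete more work. It is a single-process, thread-based illustration under my own assumptions, not CometCloud's actual tuple-space implementation.

```python
# Minimal sketch of pull-based task farming across federated resources.
import queue
import threading
import time
from collections import Counter

tasks: queue.Queue = queue.Queue()
results: queue.Queue = queue.Queue()

def worker(site: str, task_time_s: float) -> None:
    """Each 'site' pulls work whenever it is free; no central assignment needed."""
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return
        time.sleep(task_time_s)          # stand-in for running one replica/simulation
        results.put((site, task, task * task))

for t in range(20):
    tasks.put(t)

threads = [threading.Thread(target=worker, args=(f"site-{i}", 0.01 * (i + 1)))
           for i in range(3)]
for th in threads:
    th.start()
for th in threads:
    th.join()

done = [results.get() for _ in range(results.qsize())]
print(Counter(site for site, _, _ in done))  # the fastest site finishes the most tasks
```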
The I/UCRC focuses on multi-university research on improving the design and engineering of systems that are capable of running themselves, adapting their resources and operations to current workloads, and anticipating the needs of their users. The project aims at improving the hardware, networks and storage, middleware, service, and information layers used by modern industry.
Research on architectures, compilers, operating systems, tools and algorithms.
Funding: Spanish Ministry of Science and Technology (CICYT), TIN2007-60625
Principal Investigator: Mateo Valero
MareIncognito is a cooperative project between IBM and the Barcelona Supercomputing Center (BSC) targeting the design of relevant technologies on the way towards exascale. The initial challenge of the project was to study the potential design of a system based on a next generation of Cell processors. Even so, the approaches pursued are general purpose, applicable to a wide range of accelerator and homogeneous multicores and holistically addressing a large number of components relevant in the design of such systems. Contributions to the work package on load balancing.
XtreemOS is a grid operating system based on Linux. Its main distinguishing feature is that it provides for grids what a traditional operating system offers for a single computer: hardware transparency and secure resource sharing between different users. It thus simplifies the work of users by giving them the illusion of using a traditional computer, while removing the burden of the complex resource management issues of a typical grid environment. When a user runs an application on XtreemOS, the operating system automatically finds all resources necessary for the execution, configures the user's credentials on the selected resources, and starts the application (a simplified sketch of this resource-matching step follows the funding line below). Contributions to WP3.3: application execution management.
Funding: EU IST-FP6, grant agreement ID 033576
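As a simplified illustration of the application execution management idea (not the actual WP3.3 implementation), the sketch below performs the basic resource-matching step: filter the nodes that satisfy a job's requirements and pick the least-loaded one before launching. The node data and the selection rule are assumptions.

```python
# Illustrative sketch of resource matching for application execution management.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    cores: int
    mem_gb: int
    load: float  # 0.0 (idle) .. 1.0 (saturated)

def select_node(nodes: list, cores: int, mem_gb: int) -> Optional[Node]:
    """Return the least-loaded node that satisfies the job's requirements."""
    candidates = [n for n in nodes if n.cores >= cores and n.mem_gb >= mem_gb]
    return min(candidates, key=lambda n: n.load, default=None)

# Hypothetical grid nodes, for illustration only.
grid = [Node("paris-03", 16, 64, 0.7), Node("rennes-11", 8, 32, 0.2),
        Node("bcn-05", 32, 128, 0.4)]
print(select_node(grid, cores=8, mem_gb=32))  # -> rennes-11 (least-loaded fit)
```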
LA Grid is an international multi-disciplinary research community and virtual computing grid enabling institutions and industry to extend beyond their individual reach to facilitate collaborative IT research, education and workforce development. LA Grid is the first-ever comprehensive computing grid to link faculty, students, and researchers from institutions across the United States, Latin America and Spain to collaborate on complex industry applications for business and societal needs in the context of healthcare, life sciences and disaster mitigation. Contributions to the Meta-scheduling and workflow project in partnership with IBM. Further information can be found at http://lagrid.fiu.edu.
The CoreGRID Network of Excellence (NoE) aims at strengthening and advancing scientific and technological excellence in the area of Grid and Peer-to-Peer technologies. To achieve this objective, the Network brings together a critical mass of well-established researchers (161 permanent researchers and 164 PhD students) from forty-one institutions who have constructed an ambitious joint program of activities. This joint program of activity is structured around six complementary research areas that have been selected on the basis of their strategic importance, their research challenges and the recognized European expertise to develop next generation Grid middleware. The Network is operated as a European Research Laboratory (known as the CoreGRID Research Laboratory) having six institutes mapped to the areas that have been identified in the joint program of activity. Contributions to the institute on resource management and scheduling. Further information can be found at http://coregrid.ercim.eu.
Funding: EU FP6, grant agreement ID 004265 (budget: €8.2M)
This is a national initiative to coordinate the different activities regarding Grid technologies undertaken at research institutions and universities across Spain.
Funding: Spanish Ministry of Science and Technology, TIN2002-12422-E / TIN2005-25849-E
The ‘Pan-European Research Infrastructure on High Performance Computing for the Science of the 21st Century’ (HPC-Europa) project, an Integrated Infrastructure Initiative, was established to provide the European research community with advanced computational services in an integrated way. To this end, the EU-funded project team concentrated on delivering a wide range of services as well as access to first-class high-performance computing (HPC) platforms and an advanced computational environment. Developed a single point of access to a federated grid infrastructure composed of high-performance computing systems across Europe as part of JRA2 (http://www.hpc-europa.org).
Funding: EU FP6-2002-INFRASTRUCTURES-1, grant agreement ID 506079 (budget: €13M)
This research investigated job scheduling and resource allocation policies for high-performance computing systems, distributed computing systems (i.e., grids and clouds), and the combination/federation of both types of systems. Meta-scheduling and meta-brokering policies were developed to enable the interoperability of computing systems. A cross-layer approach, from the meta-scheduling policy definition to the performance characterization of high-performance systems and applications, enables an efficient allocation of resources while delivering higher quality of service to the users (e.g., reduced response time).
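The sketch below captures the core meta-scheduling idea in a deliberately minimal form: for each job, choose the system whose estimated response time (predicted queue wait plus runtime) is lowest. The systems, wait estimates, and speed factors are illustrative assumptions, not the actual policies developed in this research.

```python
# Minimal sketch of a meta-scheduling policy: pick the system that minimizes
# estimated response time (queue wait + runtime on that system).
from dataclasses import dataclass

@dataclass
class System:
    name: str
    est_wait_s: float      # predicted queue wait
    speed_factor: float    # relative to a reference machine (higher = faster)

def meta_schedule(base_runtime_s: float, systems: list) -> System:
    """Return the system with the lowest estimated response time for this job."""
    return min(systems, key=lambda s: s.est_wait_s + base_runtime_s / s.speed_factor)

# Hypothetical systems, for illustration only.
systems = [System("hpc-cluster", est_wait_s=3600, speed_factor=2.0),
           System("campus-grid", est_wait_s=300, speed_factor=1.0),
           System("cloud", est_wait_s=60, speed_factor=0.8)]
print(meta_schedule(base_runtime_s=7200, systems=systems).name)
# -> hpc-cluster: its shorter runtime outweighs the longer queue wait
```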
With Avirtek and University of Arizona.
With FifthGen Corp.
With Intel Corp.
With Xerox Corp.
With IBM T.J. Watson Research Center.
With IBM T.J. Watson Research Center.