Victor Dudarev, Alfred Ludwig
The accumulation, management, and processing of research data require a robust and sustainable IT infrastructure that can flexibly adapt to evolving scientific requirements and heterogeneous data types. At the same time, such an infrastructure must ensure reliable long-term operation in compliance with the requirements of the German Research Foundation (DFG) and the generally accepted FAIR principles (Findable, Accessible, Interoperable, Reusable).
Within CRC/TRR 247, the INF project therefore focuses on establishing and maintaining a dedicated software and hardware infrastructure that supports continuous research data management, ensures data integrity and availability, and provides sufficient redundancy and disaster recovery capabilities.
During the first funding period of CRC/TRR 247, no dedicated INF project existed. At that time, research data management was based on a Django-based RDMS hosted on a virtual machine at the University of Duisburg–Essen (UDE). In autumn 2022, this system became unavailable following a successful cyberattack on the IT infrastructure of UDE and was effectively lost (see, e.g., contemporary reports in the press 1, 2).
Only after nearly one year and repeated requests was the IT service of UDE able to provide files of the virtual machine that had been damaged as a result of the attack (see the screenshot of the e-mail dated 08.11.2023).

Although the image was damaged and the virtual machine no longer booted due to operating system corruption, the concerted efforts of the INF project made it possible to extract the most recent state of the research databases and associated documents as they existed at the time of the attack. The recovered information was fully restored and subsequently migrated to the MatInf system, which was already under active development and operation by the INF project on its own dedicated infrastructure.
This incident highlighted several critical lessons that directly informed the design and implementation of the current infrastructure:
Against this background, the INF project pursues the goal of sustainable development and efficient long-term operation of a robust research data infrastructure for CRC/TRR 247. The concrete implementation of this strategy is described in the following sections of this report.
In accordance with the second funding period (FP2) proposal, a dedicated hardware infrastructure has been established and continuously expanded during FP2 in order to ensure infrastructural flexibility for the development, operation, and maintenance of information systems for research data acquisition, processing, and storage.
All core components are based on enterprise-grade server hardware provided by DELL and are operated within data centers of Ruhr University Bochum (RUB).
The central component of the infrastructure is a 2-unit DELL PowerEdge R750 server, acquired at the end of 2022 and installed in the IC Ost server room. The system is primarily used to host virtual machines supporting the core data infrastructure of CRC/TRR 247. Disk storage capacity has been expanded in subsequent years in response to increasing data volumes.
The server runs a Hyper-V hypervisor and hosts, in particular, a virtual machine that ensures the continuous 24/7 operation of the web-based Research Data Management System (RDMS) of CRC/TRR 247, including all associated services: crc247.mdi.ruhr-uni-bochum.de
This system represents the primary operational backbone of the INF data infrastructure.

Current hardware configuration (DELL PowerEdge R750):
- 2 × Intel Xeon Gold 6326 CPUs (16 cores each, 2.90 GHz)
- 512 GB DDR4 RAM (3200 MHz)
- BOSS controller, RAID 1: 240 GB SATA SSD (2 × M.2 Micron 240 GB)
- PERC H755 controller with:
• RAID 1: 3.6 TB SATA SSD (2 × 2.5" SKhynix 3.6 TB)
• RAID 5: 44 TB SATA HDD (4 × 3.5" Seagate 16 TB)
• RAID 5: 110 TB SATA HDD (6 × 3.5" WD Gold 24 TB)
- 2 × 10 Gbit LAN
- iDRAC 9
- Host operating system: Windows Server 2022 Standard
For additional local redundancy, an external WD My Book Duo USB storage device is directly connected to the main server. This device is used exclusively for storing backup copies of databases and research data and provides an additional protection layer independent of the internal disk subsystems of the server.

Current hardware configuration (WD My Book Duo):
- 22 TB (RAID 0: 2 × 11 TB)
To ensure disaster resilience and geographical redundancy, a dedicated backup server (DELL PowerEdge R6615, 1 unit) was acquired at the end of 2023. The system is located in the central RUB data center and operates in a different IP subnet and physical environment than the main server. Storage capacity and main memory have been expanded in subsequent years.
The physical separation of the backup server from the main infrastructure ensures that, in the event of a complete failure or loss of the primary data center, essential services and data can be restored and put into operation using the backup infrastructure.

Current hardware configuration (DELL PowerEdge R6615):
- 1 × AMD EPYC 9124 CPU (16 cores, 3.0 GHz)
- 320 GB DDR5 RAM (4800 MHz; 4 × 16 GB + 4 × 64 GB Micron)
- BOSS controller, RAID 1: 480 GB SATA SSD (2 × M.2 SKhynix 480 GB)
- RAID 5: 60 TB SATA HDD (4 × 3.5" WDC 22 TB)
- 2 × 10 Gbit LAN
- iDRAC 9
- Host operating system: Windows Server 2022 Standard
In order to support computationally intensive research tasks, a 2-unit DELL PowerEdge R770 compute server was acquired at the end of 2025 and installed in the IC Ost server room. The system includes an NVIDIA H100 NVL accelerator and is intended for advanced numerical simulations and data-driven research.
Typical use cases include:
The acquisition of the required software licenses is planned to be financed from INF project funds in the first half of 2026.

Current hardware configuration (DELL PowerEdge R770):
- 2 × Intel Xeon 6515P CPUs (16 cores each, 2.3 GHz)
- 256 GB DDR5 RAM (6400 MHz; 16 × Hynix 16 GB)
- BOSS controller, RAID 1: 480 GB SATA SSD (2 × M.2 Micron 480 GB)
- 1.92 TB NVMe SSD (Samsung)
- NVIDIA H100 NVL GPU (94 GB HBM)
- 2 × 10 Gbit LAN
- iDRAC 10
- Operating systems:
• Windows Server 2025 Standard (180-day trial; license purchase planned for Q1 2026)
• Linux Mint 22.2
High operational reliability is ensured through multiple layers of hardware redundancy and monitoring:
All infrastructure components are operated within secure data centers of Ruhr University Bochum and are distributed across two geographically separated locations:
This deployment strategy provides both physical security and resilience against site-specific failures.
To ensure long-term data preservation and operational continuity, a multi-stage, automated backup strategy has been implemented. This strategy ensures the existence of multiple, temporally and geographically distributed backup copies.

On the main virtual machine (WebVM) hosted on the main server (see Section 1.1), automated backup tasks are executed via the Windows Task Scheduler and dedicated scripts:
The path \\host.mdi.ruhr-uni-bochum.de\Shared_Backup resides on the external backup storage connected to the main server (see Section 1.2).
As a result, an up-to-date copy of both databases and research documents is always available on an independent storage medium, even in the event of a complete failure of the main server.
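The scheduled backup step can be sketched as follows. This is a minimal illustration, not the actual scripts: the retention count and directory layout are assumptions, and on the WebVM such logic would be registered as a Windows Task Scheduler job writing to the Shared_Backup network share.

```python
import datetime
import shutil
from pathlib import Path

def run_backup(source: Path, backup_root: Path, keep: int = 14) -> Path:
    """Copy `source` to a timestamped snapshot folder under `backup_root`
    and prune the oldest snapshots beyond the retention limit."""
    stamp = datetime.datetime.now().strftime("%Y-%m-%d_%H%M%S")
    target = backup_root / f"backup_{stamp}"
    shutil.copytree(source, target)  # fails loudly if the target share is unreachable
    # Snapshot names sort chronologically, so keep only the newest `keep` entries.
    for old in sorted(backup_root.glob("backup_*"))[:-keep]:
        shutil.rmtree(old)
    return target
```

Timestamped snapshots make every nightly run independent, so a corrupted export never overwrites the previous good copy.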
The backup server (Section 1.3) performs a weekly synchronization every Sunday at 11:00, copying the entire directory \\host.mdi.ruhr-uni-bochum.de\Shared_Backup from the external storage to the local directory E:\Shared_Backup on the backup server, which is itself based on a RAID 5 storage system.
If this synchronization fails, an automatic email notification is sent to the system administrators. The incident is then analyzed using the corresponding task logs, ensuring traceability and timely remediation.
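The mirror-and-notify logic of the weekly task can be sketched as below. The administrator address is a hypothetical placeholder, and for clarity the sketch composes the notification text instead of actually sending mail; a full mirror via `copytree` stands in for an incremental tool such as robocopy or rsync.

```python
import shutil
from pathlib import Path
from typing import Optional

ADMIN_EMAIL = "rdms-admin@example.org"  # hypothetical address for illustration

def weekly_sync(source: Path, target: Path) -> Optional[str]:
    """Mirror `source` into `target`; on failure, return the text of the
    notification e-mail that would be sent to the system administrators."""
    try:
        if target.exists():
            shutil.rmtree(target)   # simple full mirror for illustration
        shutil.copytree(source, target)
        return None                 # success: no notification needed
    except OSError as exc:
        return (f"To: {ADMIN_EMAIL}\n"
                f"Subject: Shared_Backup sync FAILED\n\n"
                f"Weekly synchronization raised: {exc}\n"
                f"Please inspect the corresponding task logs.")
```

Returning an explicit failure message keeps the error path testable; in production the same text would be handed to an SMTP client.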
In addition to automated backups of databases and documents, a full manual backup of the main virtual machine image hosted on the main server (see Section 1.1) is performed approximately twice per year. The VM image is copied both to the external disk (see Section 1.2) and to the backup server (see Section 1.3).
Since the backup server operates an active Hyper-V role and hosts a preconfigured virtual machine environment, the entire system can be restored and brought back into operation even in the event of a complete loss of the infrastructure located in the IC Ost data center.
To provide a flexible and sustainable solution for research data management, CRC/TRR 247 employs the MatInf RDMS, a system developed within the INF project. Developing and maintaining an in-house solution enables the consortium to fully meet its requirements for the acquisition, processing, exchange, and publication of research data. MatInf is an open-source system (MIT license) available at: https://gitlab.ruhr-uni-bochum.de/vic/infproject. Further information and full documentation are available at MatInf.pro.
Since March 2023, research data acquisition, processing, sharing, and publication within CRC/TRR 247 have been carried out on a dedicated MatInf tenant: https://crc247.mdi.ruhr-uni-bochum.de.
The system contains all research data accumulated during the two funding periods (8 years in total). Data from the previous RDMS were fully migrated to ensure continuity and long-term availability.
Data are managed using a flexible system of access rights. By default, data are accessible to all registered members of the consortium (User role). Objects associated with publicly available publications may be explicitly released as public, making them accessible to anonymous users. The documentation provides a detailed description of the access model.
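The default visibility rule described above can be illustrated with a minimal sketch; the class and function names are simplifications for this report, not MatInf's actual API or role model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RdmsObject:
    name: str
    is_public: bool = False  # explicitly released together with a publication

def can_read(obj: RdmsObject, role: Optional[str]) -> bool:
    """Registered consortium members (User role) see all data by default;
    anonymous visitors see only objects explicitly released as public."""
    if role is not None:      # any registered role
        return True
    return obj.is_public      # anonymous access
```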
The system currently has 170 registered users, representing all participating PIs and research groups of CRC/TRR 247. Detailed user statistics are available at https://crc247.mdi.ruhr-uni-bochum.de/report/users/

All information in MatInf is represented as typed objects. The list of supported object types is continuously evolving and currently comprises 202 types: https://crc247.mdi.ruhr-uni-bochum.de/adminobject.
Object creation has been continuous over both funding periods: https://crc247.mdi.ruhr-uni-bochum.de/report/objectsbymonth.


These figures reflect the central role of MatInf as the primary research data infrastructure for CRC/TRR 247.
Although the RDMS supports more than 200 object types, deep integration, which comprises standardization, automated validation, structured data extraction, and visualization, requires dedicated software components (as defined in the UDT Support API).
As of early 2026, deep integration has been completed for 13 major data types, covering 29 document formats. Supported types include:
This deep integration enables automated data handling, improved data quality, and rich visualization capabilities aligned with FAIR principles.
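The role of such deep-integration components can be illustrated with a hypothetical handler interface; the class names, method signatures, and the toy CSV handler are assumptions for this sketch, not the actual UDT Support API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Tuple

class UdtHandler(ABC):
    """Hypothetical shape of a deep-integration component: each supported
    data type supplies format validation and structured extraction."""

    formats: Tuple[str, ...] = ()  # file extensions this handler accepts

    @abstractmethod
    def validate(self, raw: bytes) -> bool: ...

    @abstractmethod
    def extract(self, raw: bytes) -> Dict[str, Any]:
        """Return standardized, machine-readable metadata and data."""

class CsvSpectrumHandler(UdtHandler):
    formats = (".csv",)

    def validate(self, raw: bytes) -> bool:
        return b"," in raw

    def extract(self, raw: bytes) -> Dict[str, Any]:
        rows = [line.split(",") for line in raw.decode().splitlines() if line]
        return {"columns": rows[0], "n_points": len(rows) - 1}
```

Registering one such handler per data type is what turns an opaque uploaded file into validated, queryable, and visualizable content.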
The Differential Tafel Analysis application is available for deeply integrated LSV data, including cyclic voltammograms with cycle tracking; an example dataset and the app interface are accessible in the system.
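The analysis behind such an app rests on the standard Tafel relation, in which the overpotential depends linearly on log10 of the current density. The following generic sketch recovers the Tafel slope from a synthetic LSV segment; it illustrates the textbook method, not the app's implementation.

```python
import numpy as np

def tafel_slope(eta_v: np.ndarray, j_a_cm2: np.ndarray) -> float:
    """Fit eta = a + b * log10|j| in the kinetic region and
    return the Tafel slope b in mV per decade."""
    log_j = np.log10(np.abs(j_a_cm2))
    b_v_per_dec = np.polyfit(log_j, eta_v, 1)[0]  # slope in V/decade
    return b_v_per_dec * 1000.0                   # convert to mV/decade

# Synthetic LSV segment obeying a 120 mV/decade Tafel law:
eta = np.linspace(0.20, 0.40, 50)        # overpotential (V)
j = 1e-6 * 10 ** (eta / 0.120)           # current density (A/cm^2)
print(round(tafel_slope(eta, j)))        # prints 120 (mV/decade)
```

Differential Tafel analysis extends this idea by evaluating the local slope d(eta)/d(log10|j|) along the curve rather than a single global fit.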
The machine-learning-based Continuous Properties Prediction System currently supports more than 40 regression algorithms, including standard scikit-learn methods and custom models: paris.matinf.pro.
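A model comparison of the kind such a prediction system performs can be sketched with scikit-learn: several regressors are cross-validated on composition-property data and the best scorer is selected. The data here are synthetic and the three models merely stand in for the 40+ supported algorithms.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic composition -> property data (3 elements, linear ground truth).
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 0.01, 200)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "forest": RandomForestRegressor(n_estimators=50, random_state=0),
}
# Mean 5-fold cross-validated R^2 per model; pick the best performer.
scores = {name: cross_val_score(m, X, y, cv=5, scoring="r2").mean()
          for name, m in models.items()}
best = max(scores, key=scores.get)
```

Cross-validated selection rather than a single train/test split keeps the chosen model honest on the small datasets typical of materials screening.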
During the second funding period (FP2) of CRC/TRR 247, the INF project has successfully established a dedicated, robust, and scalable software and hardware infrastructure that fully supports the consortium’s research data management needs. The MatInf RDMS, together with the underlying server landscape and backup strategy, fulfills modern requirements for reliability, security, and fault tolerance and provides a sustainable technical foundation for the long-term operation of research data services within CRC/TRR 247.
This infrastructure ensures:
As a result, the infrastructure created within the INF project not only guarantees secure daily operation but also offers strong prospects for future expansion and deeper integration with other research infrastructures. It provides a sustainable, flexible, and forward-looking basis for the management, sharing, and reuse of research data within CRC/TRR 247, thereby supporting the continuity, transparency, and reproducibility of the consortium’s scientific activities.
Last updated on 04.02.2026