Master's Thesis: Agentic AI Debug Support for Fault Injection in Hardware Test Environments
Introduction
IBM Infrastructure is a catalyst that makes the world work better because our clients demand it. Heterogeneous environments, the explosion of data, digital automation, and cybersecurity threats require hybrid cloud infrastructure that only IBM can provide.
Your ability to be creative, to think ahead, and to focus on innovation that matters is supported by our growth-minded culture as we continue to drive career development across our teams. Collaboration is key to IBM Infrastructure's success, as we bring together different business units and teams that balance their priorities in a way that best serves our clients' needs.
IBM's product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive.
Your role and responsibilities
Background and Motivation
Modern processor and hardware designs require robust mechanisms to detect, classify, and recover from soft errors. Traditional pre-silicon test plans are often extensive and resource-intensive, yet they adapt poorly to growing design complexity. Fault tolerance has become a key quality criterion, and fault injection techniques, whether targeted or statistical, are used to simulate errors and observe system behavior under stress. However, analyzing the results of these tests is time-consuming and requires deep technical expertise. This thesis explores AI-assisted debugging and fault analysis to improve fault coverage and streamline validation workflows.
Objectives
• Become familiar with existing simulation and fault injection environments.
• Extend a framework for statistical fault injection in simulation environments (e.g., Golden Run vs. Injected Run).
• Develop a concept for the automated analysis of fault injections.
• Build a fault database with classifications (e.g., Detected Error, Undetected Error but Testcase failure or Fatal Error, Sim-Environment Issue).
• Integrate AI/ML agents to analyze fault propagation, classify error types, and identify critical failure points.
• Evaluate the effectiveness of agent-based debugging in identifying undetected logic errors and environmental issues.
• Compare traditional and AI-enhanced methods in terms of coverage, efficiency, and actionable insights.
• Propose a scalable methodology for prioritizing test cases and reducing redundant validation effort.
• Optionally: Visualize fault distribution and impact.
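The Golden Run vs. Injected Run comparison and the fault classification named in the objectives can be pictured with a small sketch. It is a minimal illustration under stated assumptions and does not describe the existing framework. The record fields (`RunResult`, `checker_fired`, `fatal`, `sim_error`), the extra `MASKED` category, and the order in which the rules are checked are all hypothetical. Traces are modeled as plain string sequences, such as one entry per retired instruction.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence, Tuple


class FaultClass(Enum):
    # The first three labels follow the classification named in the objectives;
    # MASKED (no visible effect) is an added, hypothetical category.
    DETECTED = "Detected Error"
    UNDETECTED_FAILURE = "Undetected Error but Testcase failure or Fatal Error"
    SIM_ENV_ISSUE = "Sim-Environment Issue"
    MASKED = "Masked (no visible effect)"


@dataclass
class RunResult:
    """Hypothetical summary of one simulation run."""
    trace: Sequence[str]     # e.g. one entry per retired instruction
    testcase_passed: bool
    checker_fired: bool      # an error checker/monitor reported the fault
    fatal: bool = False      # run ended in a hang or fatal stop
    sim_error: bool = False  # simulator or testbench infrastructure problem


def first_divergence(golden: Sequence[str], injected: Sequence[str]) -> Optional[int]:
    """Index of the first trace entry where the runs differ, or None if identical."""
    for i, (g, f) in enumerate(zip(golden, injected)):
        if g != f:
            return i
    if len(golden) != len(injected):
        return min(len(golden), len(injected))  # one trace ended early
    return None


def classify(golden: RunResult, injected: RunResult) -> Tuple[FaultClass, Optional[int]]:
    """Compare an injected run with its golden run; the rule order is illustrative."""
    divergence = first_divergence(golden.trace, injected.trace)
    if injected.sim_error or not golden.testcase_passed:
        # A failing golden run makes the comparison meaningless.
        label = FaultClass.SIM_ENV_ISSUE
    elif injected.checker_fired:
        label = FaultClass.DETECTED
    elif injected.fatal or not injected.testcase_passed:
        label = FaultClass.UNDETECTED_FAILURE
    else:
        # The test passed and no checker fired. A non-None divergence here
        # may still deserve manual review.
        label = FaultClass.MASKED
    return label, divergence
```

Consider a golden trace `["a", "b", "c"]` and an injected trace `["a", "x", "c"]` whose testcase fails without any checker firing. For that pair, `classify` returns `(FaultClass.UNDETECTED_FAILURE, 1)`. Storing the divergence index next to the label gives a later analysis step, including an AI agent, a concrete starting point for tracing fault propagation.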
Research Questions
1. How can hardware test/simulation environments be supported and classified?
2. What information is relevant for AI-supported analysis?
3. How can an agent be trained to automatically detect and evaluate faults?
4. What design insights can be derived from the fault analysis?
Methodology
• Analyze existing test data (Golden vs. injected runs).
• Develop a prototype for fault classification using Python, TensorFlow, or PyTorch.
• Apply statistical methods to assess test coverage.
• Perform instruction-level simulation and trace analysis.
• Classify error types (e.g., logic undetected, monitor errors, environmental noise).
• Aggregate data and apply pattern recognition in failure analysis.
• Integrate with existing RAS (Reliability, Availability, Serviceability) flows.
• Collaborate with existing teams (e.g., Logic Design, Verification).
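Textbook formulas are enough to make the statistical steps of the methodology concrete. The sketch assumes that faults are drawn uniformly at random from a finite population of N candidate faults, for example fault sites multiplied by injection cycles. It uses two standard tools:

- the sample-size formula with finite-population correction, to choose how many injections to run;
- a Wilson score interval, to report a measured rate with a confidence bound (for example, the fraction of injections classified as Detected Error).

The function names and defaults are illustrative assumptions and are not part of any existing tool: z = 2.576 (about 99% confidence) and a worst-case p = 0.5 when the true rate is unknown.

```python
import math


def sample_size(population: int, margin: float, z: float = 2.576, p: float = 0.5) -> int:
    """Random injections needed so an estimated proportion lies within
    +/- margin of the true value at the confidence implied by z.

        n0 = z^2 * p * (1 - p) / margin^2
        n  = n0 / (1 + (n0 - 1) / population)   # finite-population correction
    """
    n0 = z * z * p * (1.0 - p) / (margin * margin)
    n = n0 / (1.0 + (n0 - 1.0) / population)
    return min(population, math.ceil(n))


def rate_estimate(hits: int, injections: int, z: float = 2.576):
    """Point estimate and Wilson score interval for hits / injections,
    e.g. the share of injections classified as Detected Error."""
    if injections <= 0:
        raise ValueError("need at least one injection")
    p_hat = hits / injections
    denom = 1.0 + z * z / injections
    center = (p_hat + z * z / (2.0 * injections)) / denom
    half = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / injections + z * z / (4.0 * injections * injections)
    )
    return p_hat, max(0.0, center - half), min(1.0, center + half)
```

Suppose the population holds 1,000,000 candidate faults and the target is a ±1% margin at 99% confidence. In that case, `sample_size` asks for about 16,300 injections, far fewer than exhaustive injection. Unlike the simple normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly for rates near 0 or 1, which is common for well-protected logic.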
Expected Outcomes
• A prototype tool or methodology for agentic debug support.
• A comparative study of fault injection strategies.
• Recommendations for test plan optimization in future chip projects.
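Test plan optimization, like the objective of prioritizing test cases and reducing redundant validation effort, could start from greedy "additional coverage" prioritization. This is a generic, well-known heuristic, shown here only as an illustration. The sketch assumes a hypothetical mapping from each testcase name to the set of fault IDs it detected, for example as exported from the fault database. Both the function name and the input shape are assumptions.

```python
from typing import Dict, List, Set, Tuple


def prioritize_tests(detects: Dict[str, Set[str]]) -> Tuple[List[str], List[str]]:
    """Greedy 'additional coverage' ordering.

    detects maps a testcase name to the set of fault IDs it detected.
    Returns (ordered, redundant): `ordered` repeatedly takes the testcase that
    detects the most not-yet-covered faults; `redundant` lists testcases that
    add no new detections once `ordered` has been run.
    """
    remaining = dict(detects)
    covered: Set[str] = set()
    ordered: List[str] = []
    while remaining:
        # Iterating in sorted order makes ties resolve alphabetically.
        best = max(sorted(remaining), key=lambda t: len(remaining[t] - covered))
        if not remaining[best] - covered:
            break  # no remaining testcase adds new coverage
        ordered.append(best)
        covered |= remaining.pop(best)
    return ordered, sorted(remaining)
```

As an example, take `{"tc_a": {"f1", "f2"}, "tc_b": {"f2", "f3", "f4"}, "tc_c": {"f1"}, "tc_d": {"f4"}}`. For this input, the function returns `(["tc_b", "tc_a"], ["tc_c", "tc_d"])`. Two caveats apply:

- A greedy order approximates a minimal covering set but is not guaranteed optimal.
- "Redundant" only holds with respect to the faults observed in the sampled campaign.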
Supervision and Environment
The thesis will be conducted in close collaboration with an interdisciplinary team. Access to real test data and existing tools will be provided. Results can directly contribute to ongoing projects.
Timeline
Planned start: flexible (e.g., Winter Semester 2025/26)
Duration: 6 months (full-time)
Required education
Bachelor's Degree
Required technical and professional expertise
- Knowledge of hardware design and verification (e.g., VHDL, Verilog, SystemVerilog).
- Experience with Python and machine learning.
- Interest in test automation and fault tolerance.
- Knowledge in digital design and computer architecture.
- Experience with simulation tools and scripting (e.g., Python, SystemVerilog).
- Familiarity with AI/ML concepts is a plus.
Location: Böblingen, Germany

ABOUT BUSINESS UNIT
IBM Systems helps IT leaders think differently about their infrastructure. IBM servers and storage are no longer inanimate: they can understand, reason, and learn, so our clients can innovate while avoiding IT issues. Our systems power the world's most important industries, and our clients are the architects of the future. Join us to help build our leading-edge technology portfolio designed for cognitive business and optimized for cloud computing.
YOUR LIFE @ IBM
In a world where technology never stands still, we understand that dedication to our clients' success, innovation that matters, and trust and personal responsibility in all our relationships live in what we do as IBMers as we strive to be the catalyst that makes the world work better.
Being an IBMer means you'll be able to learn and develop yourself and your career. You'll be encouraged to be courageous and experiment every day, all whilst having continuous trust and support in an environment where everyone can thrive, whatever their personal or professional background.
Our IBMers are growth-minded, always staying curious, open to feedback, and learning new information and skills to constantly transform themselves and our company. They are trusted to provide ongoing feedback to help other IBMers grow, and to collaborate with colleagues in a team-focused way that includes different perspectives to drive exceptional outcomes for our customers. The courage our IBMers have to make critical decisions every day is essential to IBM becoming the catalyst for progress, always embracing challenges with the resources they have to hand, a can-do attitude, and an outcome-focused approach in everything they do.
Are you ready to be an IBMer?
ABOUT IBM
IBM's greatest invention is the IBMer. We believe that through the application of intelligence, reason and science, we can improve business, society and the human condition, bringing the power of an open hybrid cloud and AI strategy to life for our clients and partners around the world.
Restlessly reinventing since 1911, we are not only one of the largest corporate organizations in the world, we're also one of the biggest technology and consulting employers, with many of the Fortune 50 companies relying on the IBM Cloud to run their business.
At IBM, we pride ourselves on being an early adopter of artificial intelligence, quantum computing and blockchain. Now it's time for you to join us on our journey to being a responsible technology innovator and a force for good in the world.
IBM is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.
OTHER RELEVANT JOB DETAILS
For additional information about location requirements, please discuss with the recruiter following submission of your application.
Perks and Benefits
Health and Wellness
Parental Benefits
Work Flexibility
Office Life and Perks
Vacation and Time Off
Financial and Retirement
Professional Development
Diversity and Inclusion