Raising FHIR Conversion Accuracy with GPT-4o

Ashkan Khodaverdi

5 min read

18 August, 2025

Software Engineer

How Prompt Engineering Increased Success Rate from 20% to 90%

Background

Healthcare software systems often need to comply with HL7 standards, particularly FHIR (Fast Healthcare Interoperability Resources). However, FHIR’s extensive documentation makes onboarding and integration time-consuming for developers. The goal of this project was to refine an existing open-source service we own and make it more reliable. This service uses AI to generate a converter layer that helps developers transform their software models into FHIR-compliant formats.

Initial Prompt and Its Challenges

I began by testing the current prompt designed to generate FHIR-compliant converter functions using GPT-4o. The prompt was intended to output a valid FHIR “Patient” resource from a custom model provided by the user. During evaluation, two main issues emerged:

TypeScript compile-time errors on fields such as “name” and “telecom”, caused by improper type inference or incompatible structures.
FHIR-specific validation errors, which repeatedly occurred:
- No empty arrays allowed
- No symbols other than “.” and “-” permitted in the “id” field.

These issues indicated that the prompt lacked precision and did not account for common FHIR validation constraints.
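The two recurring validator complaints can be checked locally before a resource is sent to a validator. The sketch below is illustrative only: `isValidFhirId` and `findEmptyArrays` are hypothetical helper names and are not part of the service. The pattern follows the FHIR `id` datatype, which allows letters, digits, “-” and “.” with a length of 1 to 64 characters.

```typescript
// FHIR's `id` datatype: letters, digits, "-" and ".", 1 to 64 characters.
const FHIR_ID_PATTERN = /^[A-Za-z0-9\-.]{1,64}$/;

// Hypothetical helper: true when the id satisfies the FHIR `id` pattern.
function isValidFhirId(id: string): boolean {
  return FHIR_ID_PATTERN.test(id);
}

// Hypothetical helper: returns the path of every empty array in a resource,
// e.g. "name" or "telecom[0].extension".
function findEmptyArrays(value: unknown, path: string = ""): string[] {
  const found: string[] = [];
  if (Array.isArray(value)) {
    if (value.length === 0) {
      found.push(path);
    }
    value.forEach((item, index) => {
      found.push(...findEmptyArrays(item, `${path}[${index}]`));
    });
  } else if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      const childPath = path ? `${path}.${key}` : key;
      found.push(...findEmptyArrays(record[key], childPath));
    }
  }
  return found;
}
```

An id such as "patient_01" is rejected by this check because of the underscore, while "patient-01" is accepted.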

Phase 1: Pruning and Restructuring

Key modifications included:

Switching to Chat Completion Mode for better control.
Tone adjustments: Removed conversational phrases like “Please” or “Imagine” and adopted a concise, instructional style.
Few-Shot Learning:
- Supplied a complex legacy object.
- Supplied the expected FHIR-compliant object.
- Instructed the model: “I want your output to include a ‘Person’ interface and a ‘convert’ function that transforms the input object into this format.”
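The few-shot structure described above could be assembled roughly as follows. This is a sketch under assumptions, not the actual prompt: the example objects are short stand-ins for the complex legacy object that was really used, the instruction wording is paraphrased, and `buildMessages` and `ChatMessage` are hypothetical names.

```typescript
// Minimal message shape used by chat-style completion APIs.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Hypothetical few-shot pair: a legacy object and the FHIR-compliant
// object expected from it (stand-ins, not the real examples).
const exampleLegacy = { userId: "u-1001", firstName: "Ada", lastName: "Lovelace", sex: "F" };
const exampleFhir = {
  resourceType: "Patient",
  id: "u-1001",
  name: [{ family: "Lovelace", given: ["Ada"] }],
  gender: "female",
};

// Builds the system message (instructions plus the few-shot pair) and the
// user message (the legacy object to convert).
function buildMessages(legacyObject: unknown): ChatMessage[] {
  const systemPrompt = [
    "Generate TypeScript that converts a legacy object into a FHIR-compliant object.",
    "Example legacy object:",
    JSON.stringify(exampleLegacy, null, 2),
    "Expected FHIR-compliant object:",
    JSON.stringify(exampleFhir, null, 2),
    'Output a "Person" interface and a "convert" function that transforms the input object into this format.',
  ].join("\n");

  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: JSON.stringify(legacyObject) },
  ];
}
```

Keeping the instructions and the example pair in the system message leaves the user message free to carry nothing but the object to convert.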

Phase 2: Testing and Validation

The revised prompt was tested in a realistic pipeline:

Entered the prompt as the system message in the chat completion API.
Submitted a legacy JSON object as the user message, and ran the thread.
The model generated a TypeScript “convert” function.
Ran the generated function using the user’s legacy object as its input.
Validated the output against FHIR using an external validator service.
Reviewed the validation report.

Additional improvements during testing included:

Data typing enforcement: Clear instructions specifying fields with type constraints, including valid string values from FHIR enumerations.
Post-processing logic: Added a hard-coded function that removes any empty arrays after conversion, and instructed the model to call it as the final step of the generated function.
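Both improvements can be illustrated in a few lines. This is a sketch, not the code shipped with the service: `toGender` and `removeEmptyArrays` are hypothetical names, and the legacy codes handled in `toGender` are assumptions. The four gender values are the codes FHIR defines for administrative gender.

```typescript
// FHIR's administrative-gender value set has exactly these four codes.
type AdministrativeGender = "male" | "female" | "other" | "unknown";

// Hypothetical mapping from an assumed legacy code to a FHIR enumeration value.
function toGender(legacy: string): AdministrativeGender {
  switch (legacy.trim().toLowerCase()) {
    case "m":
    case "male":
      return "male";
    case "f":
    case "female":
      return "female";
    case "o":
    case "other":
      return "other";
    default:
      return "unknown";
  }
}

// Recursively drops every property whose value is an empty array.
// Scope of this sketch: objects left empty after cleaning are kept as-is.
function removeEmptyArrays<T>(value: T): T {
  if (Array.isArray(value)) {
    return value.map((item) => removeEmptyArrays(item)) as unknown as T;
  }
  if (value !== null && typeof value === "object") {
    const record = value as unknown as Record<string, unknown>;
    const result: Record<string, unknown> = {};
    for (const key of Object.keys(record)) {
      const cleaned = removeEmptyArrays(record[key]);
      if (Array.isArray(cleaned) && cleaned.length === 0) {
        continue; // skip properties that hold an empty array
      }
      result[key] = cleaned;
    }
    return result as unknown as T;
  }
  return value;
}
```

Declaring the enumeration as a union type lets the TypeScript compiler reject values outside the FHIR value set, while the clean-up pass runs last so that no empty array reaches the validator.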

Evaluation Criteria

Each test case was assessed on:

Compilation: Does the output code compile in TypeScript?
FHIR Validation Warnings: Non-blocking issues from the validator.
FHIR Validation Errors: Blocking issues that fail the FHIR spec.
Pass/Fail: A case passes only if the validator reports zero FHIR validation errors.
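The verdict and the resulting pass rate can be expressed as a small function. This is a sketch with hypothetical names (`CaseResult`, `passes`, `passRate`). It also assumes that a case which fails to compile counts as a failure, since it produces no output to validate; the criteria above do not state this explicitly.

```typescript
// One record per test case, mirroring the criteria above.
interface CaseResult {
  compiles: boolean;
  validationWarnings: number; // non-blocking, does not affect the verdict
  validationErrors: number;   // blocking
}

// Assumption: a non-compiling case fails, because nothing can be validated.
function passes(result: CaseResult): boolean {
  return result.compiles && result.validationErrors === 0;
}

// Percentage of passing cases; returns 0 for an empty list.
function passRate(results: CaseResult[]): number {
  if (results.length === 0) {
    return 0;
  }
  const passed = results.filter(passes).length;
  return (passed * 100) / results.length;
}
```

Warnings are recorded but never change the outcome, so a case with warnings and no errors still passes.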

Test Cases

Ten legacy JSON objects were tested:

1 from the MIMIC open-source Patient dataset
1 from a Touchstone Life Care User object
8 synthetic samples created with ChatGPT

Result Summary

Prompt Version	Total Cases	Pass	Fail	Compile Errors	FHIR Errors	Warnings
Ver 0.1	10	2 (20%)	8 (80%)	13	3	13
Ver 0.2	10	9 (90%)	1 (10%)	0	1	0

Conclusion

After prompt optimization and testing, our service is now far more reliable. The new prompt increased the pass rate from 20% to 90%, eliminated all compile errors, reduced FHIR validator errors from 3 to 1, and removed all warnings.

This improvement demonstrates that few-shot prompting, combined with targeted instructions, is still highly effective for this type of problem. As a result, the Whitefox FHIR Converter can now reliably generate a conversion layer for any healthcare organization’s software, enabling seamless HL7 compliance in a fraction of the time.
