CDA FHIR mapping - hl7

I am trying to add support for sidecar applications in EHR platforms. I am taking a pure implementer's approach and building an intermediate representation (such as an XML document) for mapping CDA<--->FHIR, using SMART on FHIR as the reference implementation. The CDA I am trying to use is the Australian extension, eReferral (www.digitalhealth.gov.au/implementation-resources/clinical-documents/EP-0936-2012/NEHTA-0967-2012).
Is it possible to create such an intermediate representation using the SMART on FHIR (or any other FHIR) reference implementation? Has anyone else tried this?
While searching for actual implementations I came across these repos:
github.com/jmandel/sample_ccdas
github.com/amida-tech/fhir2ccda
github.com/amida-tech/cda-fhir
The FHIR group has some hand crafted examples. Are there any equivalent CDA examples for these FHIR resources?
I have read a couple of web articles and white papers about the challenges involved in the transform, such as:
David Hay's blog, which says a FHIR document "is like an object graph, rooted in the composition resource" - so is there an equivalent representation for CDA?
Rene Spronk's article about whether HL7 v3 is a message or a document - what are the implications for an implementer who has to handle and validate representations across both CDA and FHIR?
The Lantana Group position paper - "If or when FHIR can accommodate the full CDA use case, the future holds the promise of seamless integration and information sharing between clinical documents and APIs". Does this mean that a CDA<--->FHIR transform is not possible at this stage of the FHIR standard?
Apologies for cross-posting in both SO and the FHIR community forum: http://community.fhir.org/t/cda-fhir-mapping-implementations/211/1

CDA to FHIR is fairly straightforward and it looks like you've already found some repositories that do that.
FHIR to CDA is also fairly straightforward - assuming all you care about is something that is valid. You may need to adjust those libraries for use in the AU/NZ locales.
That said - Keith Boone has some blog posts about some of the challenges in mapping between the two, as there will always be quirks.
The biggest hurdle you will have is loss of fidelity. A CDA (at least a C-CDA here in the States) comes with so much HL7 v3 cruft that by turning it into FHIR, you're instantly not going to be able to re-create the original C-CDA from FHIR. The CCD template has evolved so much that you find documents that contain C32 elements, C-CDA 1.1 templateIds, and maybe someday C-CDA 2.1 templateIds. So while you can reasonably make a valid C-CDA out of a FHIR bundle, you will have to pick a specific implementation and version to target.
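To give a feel for what that mapping looks like in practice, here is a minimal sketch (not taken from any of the repositories above) that pulls demographics out of a CDA recordTarget and builds a FHIR Patient as a plain dict; the element paths assume a C-CDA-style header, so the AU eReferral templates may need different paths.
    # Rough sketch: map CDA recordTarget demographics to a FHIR Patient dict.
    # Element paths assume a C-CDA-style header; adjust for AU eReferral.
    from lxml import etree

    NS = {"hl7": "urn:hl7-org:v3"}

    def cda_to_fhir_patient(cda_path):
        doc = etree.parse(cda_path)
        role = doc.find(".//hl7:recordTarget/hl7:patientRole", NS)
        if role is None:
            raise ValueError("no recordTarget/patientRole found")

        patient = {"resourceType": "Patient", "identifier": [], "name": []}

        for ident in role.findall("hl7:id", NS):
            patient["identifier"].append({
                "system": "urn:oid:" + ident.get("root", ""),
                "value": ident.get("extension", ""),
            })

        for name in role.findall("hl7:patient/hl7:name", NS):
            given = [g.text for g in name.findall("hl7:given", NS) if g.text]
            family = name.findtext("hl7:family", default="", namespaces=NS)
            patient["name"].append({"family": family, "given": given})

        gender = role.find("hl7:patient/hl7:administrativeGenderCode", NS)
        if gender is not None:
            # HL7 v3 administrative gender codes -> FHIR (simplified)
            patient["gender"] = {"M": "male", "F": "female"}.get(gender.get("code"), "unknown")

        birth = role.find("hl7:patient/hl7:birthTime", NS)
        if birth is not None and birth.get("value"):
            v = birth.get("value")
            patient["birthDate"] = "-".join(filter(None, [v[0:4], v[4:6], v[6:8]]))

        return patient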
My startup https://www.redoxengine.com has a JSON version of CDA that you can see on the docs page. It's designed for simplicity, but can be mapped back and forth to FHIR resources.

Does it make sense to interrogate structured data using NLP?

I know that this question may not be suitable for SO, but please let it stay here for a while. Last time my question was moved to Cross Validated, it froze; no more views or feedback.
I came across a question that does not make much sense to me: how can IFC models be interrogated via NLP? Consider IFC models as semantically rich structured data. IFC defines an EXPRESS-based entity-relationship model consisting of entities organized into an object-based inheritance hierarchy. Examples of entities include building elements, geometry, and basic constructs.
How could NLP be used for this type of data? I don't see how NLP is relevant at all.
In general, I would suggest that using NLP techniques to "interrogate" already (quite formally) structured data like EXPRESS would be overkill at best and a time / maintenance sinkhole at worst. In general, the strengths of NLP (human language ambiguity resolution, coreference resolution, text summarization, textual entailment, etc.) are wholly unnecessary when you already have such an unambiguous encoding as this. If anything, you could imagine translating this schema directly into a Prolog application for direct logic queries, etc. (which is quite a different direction than NLP).
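To make that contrast concrete, here is a sketch of interrogating an IFC model with a plain structured query; it assumes the ifcopenshell library and a hypothetical file name, and a Prolog or SQL route would look similar in spirit. No NLP is involved.
    # Sketch: interrogating an IFC model with a direct structured query.
    # Assumes the ifcopenshell library and a local file named building.ifc
    # (both placeholders for your actual toolchain).
    import ifcopenshell

    model = ifcopenshell.open("building.ifc")

    # "Which walls are on storey 'Level 2'?" expressed as plain traversal of
    # the entity graph rather than as a natural-language question.
    for storey in model.by_type("IfcBuildingStorey"):
        if storey.Name != "Level 2":
            continue
        for rel in model.by_type("IfcRelContainedInSpatialStructure"):
            if rel.RelatingStructure == storey:
                for element in rel.RelatedElements:
                    if element.is_a("IfcWall"):
                        print(element.GlobalId, element.Name)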
I did some searches to try to find the references you may have been referring to. The only item I found was Extending Building Information Models Semiautomatically Using Semantic Natural Language Processing Techniques:
... the authors propose a new method for extending the IFC schema to incorporate CC-related information, in an objective and semiautomated manner. The method utilizes semantic natural language processing techniques and machine learning techniques to extract concepts from documents that are related to CC [compliance checking] (e.g., building codes) and match the extracted concepts to concepts in the IFC class hierarchy.
So in this example, at least, the authors are not "interrogating" the IFC schema with NLP, but rather using it to augment existing schemas with additional information extracted from human-readable text. This makes much more sense. If you want to post the actual URL or reference that contains the "NLP interrogation" phrase, I should be able to comment more specifically.
Edit:
The project grant abstract you referenced does not contain much in the way of details, but they have this sentence:
... The information embedded in the parametric 3D model is intended for facility or workplace management using appropriate software. However, this information also has the potential, when combined with IoT sensors and cognitive computing, to be utilised by healthcare professionals in Ambient Assisted Living (AAL) environments. This project will examine how as-constructed BIM models of healthcare facilities can be interrogated via natural language processing to support AAL. ...
I can only speculate on the following reason for possibly using an NLP framework for this purpose:
While BIM models include Industry Foundation Classes (IFCs) and aecXML, there are many dozens of other formats, many of them proprietary. Some are CAD-integrated and others are standalone. Rather than pay for many proprietary licenses (some of these enterprise products are quite expensive), and/or spend the time to develop proper structured query behavior for the various diverse file format specifications (which may not be publicly available in proprietary cases), the authors have chosen a more automated, general solution to extract the content they are looking for (which I assume must be textual or textual tags in nearly all cases). This would almost be akin to a search engine "scraping" websites and looking for key words or phrases and synonyms to them, etc. The upside is they don't have to explicitly code against all the different possible BIM file formats to get good coverage, nor pay out large sums of money. The downside is they open up new issues and considerations that come with NLP, including training, validation, supervision, etc. And NLP will never have the same level of accuracy you could obtain from a true structured query against a known schema.

ADT vs. CCDA data gap

We are developing a Provide and Register web service for CCDAs. Our vendor requires ADT as the patient registration portion. I can create a bare ADT message from the information provided to me in the CCDA in order to simplify the onboarding process (eliminate a dedicated ADT feed) and reduce the cost. BUT there are data elements (NK1, IN1, GT1) that are either not included in the CCDA or not as robust.
I wanted to know if there are any documented data gaps between these two messages (CCDA vs. ADT).
I wanted to get feedback on my approach.
I wanted to know the governance process for CCDA, as it makes sense to eventually include some of these ADT data points in the CCDA.
Thanks!
I don't think there is any specific documentation on data gaps between C-CDA and HL7 V2.x ADT messages. Generally it's fine to extract content from C-CDA and use that to construct an ADT message, but obviously you won't get everything. Governance is handled by the Structured Documents workgroup; anyone is welcome to join and submit change proposals.
Maybe you can find the additional information in the CDA section entries. C-CDA does not require, for example, a CDA document to contain an immunizations section with entries, but it does define how to include that information. If your CDA includes that information, that may be a good option.
Martí
Remember that CDA/C-CDAs are not a replacement for clinical or administrative messages. Your approach is fine, but StrucDoc may push back on adding content that is directed toward workflow concerns. CDAs are static objects; they are not intended to trigger action.
As Martí points out, consider what information is possible in the specific document you are using, or in the base C-CDA specification. As long as your document template does not exclude a base-specification section, that section can be included in an instance of that document template.
Without appropriate details it's hard to say for certain.
Does the system requiring ADT need encounters? In that case, you're going to need an encounters section from the CDA, which then needs to be turned into multiple A08s.
Do they just need demographics? That's probably do-able.
I would ask for specs around what event types they expect and what fields are required (or at least will bomb out on their side), and just go through the list against a sample C-CDA or two on your side.
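As a rough illustration of the "bare ADT from a C-CDA" approach, here is a sketch that lifts demographics from the recordTarget and emits a minimal pipe-delimited ADT^A04; the event type, field positions and facility names are placeholders to be matched to the vendor's spec, and the NK1/IN1/GT1 gap remains.
    # Sketch: build a bare ADT^A04 from C-CDA demographics. Event type,
    # field positions and facility names are placeholders; it also assumes
    # the header carries id/name/birthTime as a typical C-CDA does.
    from datetime import datetime
    from lxml import etree

    NS = {"hl7": "urn:hl7-org:v3"}

    def ccda_to_adt(ccda_path):
        doc = etree.parse(ccda_path)
        role = doc.find(".//hl7:recordTarget/hl7:patientRole", NS)

        mrn = role.find("hl7:id", NS)
        name = role.find("hl7:patient/hl7:name", NS)
        family = name.findtext("hl7:family", default="", namespaces=NS)
        given = name.findtext("hl7:given", default="", namespaces=NS)
        dob = role.find("hl7:patient/hl7:birthTime", NS)
        sex = role.find("hl7:patient/hl7:administrativeGenderCode", NS)

        now = datetime.now().strftime("%Y%m%d%H%M%S")
        msh = f"MSH|^~\\&|CCDA2ADT|MYFAC|VENDOR|VENDORFAC|{now}||ADT^A04|{now}|P|2.3"
        evn = f"EVN|A04|{now}"
        pid = (
            f"PID|1||{mrn.get('extension', '')}^^^{mrn.get('root', '')}||"
            f"{family}^{given}||{dob.get('value', '') if dob is not None else ''}|"
            f"{sex.get('code', '') if sex is not None else ''}"
        )
        # NK1, IN1 and GT1 are exactly the gap: they usually are not in the
        # C-CDA, so they are omitted here and would need another source.
        return "\r".join([msh, evn, pid])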

Are the HL7-FHIR, HL7 CDA, CIMI, openEHR and ISO13606 approaches aiming to solve the same health data exchange problems?

FHIR, CDA, 13606, CIMI, and openEHR all offer partial and overlapping approaches to 'solving health data exchange problems'. They each have strengths and weaknesses, and can work together as well as overlapping each other.
FHIR is an API exchange spec that's easy to adopt
CDA is a document format that's widely supported
CIMI is a community defining formal semantic models for content
openEHR provides agreed semantic models and an application infrastructure
13606 is for EHR extract exchange
CIMI is clearly an initiative to think about the content of archetypes and recurring patterns
FHIR is a specification for APIs, including a limited set of content models
openEHR is a community and an open-source specification with a standardised Reference Model, Archetype Object Model, data types and term list
CEN/ISO 13606 is a community using its formal, public(?), CEN- and ISO-standardised Reference Model, Archetype Object Model, data types and term list
The scopes of all of them overlap. The most overlap is between openEHR and 13606, and to a lesser extent with CIMI.
Two-level modeling paradigm: CIMI, openEHR and 13606 have a lot of interactions and all adhere to the two-level modeling paradigm.
Archetypes can be used by FHIR: CIMI is creating archetypes, as are the openEHR and 13606 communities.
I see a future for 13606 in the context of cloud EHRs, where the exact location of data is not always known, but what matters is how to get access to it.
13606 can provide a standard for interfacing to the cloud, and provide features such as queries and requests for detailed information, instead of precooked, general-purpose message formats like patient summaries, etc.
Erik, you write: "I do not understand why you call the openEHR specification proprietary (it is CC-BY-ND licenced and freely available online) and you call the ISO 13606 more open (it is copyrighted and behind a paywall)"
The point is that in the case of an ISO standard, third parties should not claim IP. You must pay for the information, and you may not distribute the copyrighted text, but you may use the information without the risk of being confronted with excessive claims afterwards.
There is a policy on ISO deliverables regarding patents, which gives assurance that you will not have to deal with excessive patent claims afterwards. For more information see:
http://isotc.iso.org/livelink/livelink/fetch/2000/2122/3770791/Common_Policy.htm?nodeid=6344764&vernum=-2
In summary: there can be IP claims on ISO deliverables, apart from the copyrighted text, but those claims must be handled in a non-discriminatory and reasonable way. So no excessive claims are possible.
In a patent-related legal case, the judge will find it important that the deliverable was published at ISO.
Two equitable defenses to patent infringement that may arise from a patent owner's delay in taking action are laches and equitable estoppel. A long delay gives rise to a presumption that it is unreasonable, inexcusable, and prejudicial. This is certainly true when it concerns an ISO deliverable.
This policy does not exist in the case of CC-BY-ND licenced work, which gives no such guarantee at all. The user of CC-BY-ND licenced work is not safe from claims.
Therefore it is important that AOM 2.0 be submitted to ISO. It can only be submitted to ISO in the context of the 13606 renewal. That is why the openEHR community, for its own sake, must work on a Reference Model-agnostic standard in all its parts, to help convince the ISO 13606 renewal committee to implement it.
AOM 1.4 has been an ISO standard for years, so we can be pretty sure that there is no hidden IP in it.
I would say the only standard whose aim is NOT to solve data exchange problems is openEHR.
openEHR defines a complete EHR platform architecture to manage clinical data structure definitions (archetypes, templates), including constraints and terminology/translations, manage clinical information (canonical information model), access clinical information (standard query language AQL), define rules for clinical decision support (standard rule language GDL), and define a service model (the REST API is close to being approved).
So looking at openEHR, it tries to solve all the interoperability problems that come before any data exchange but that are needed for exchanged data to be interpreted and used correctly; in short, openEHR enables interoperability but doesn't define how data is technically exchanged.
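For a sense of what that looks like from the outside, here is a sketch of running an AQL query over the openEHR REST API; the base URL is hypothetical, and the /query/aql path and response shape follow the published REST API spec, so check your server's actual routes and authentication.
    # Sketch: running an AQL query against an openEHR server over REST.
    # The base URL is hypothetical; the /query/aql path and response shape
    # follow the openEHR REST API spec (verify routes and auth on your server).
    import requests

    BASE = "https://openehr.example.org/rest/openehr/v1"

    aql = """
    SELECT c/uid/value, c/context/start_time/value
    FROM EHR e CONTAINS COMPOSITION c
    WHERE c/name/value = 'Referral'
    """

    resp = requests.post(f"{BASE}/query/aql", json={"q": aql})
    resp.raise_for_status()
    for row in resp.json().get("rows", []):
        print(row)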

How to implement OData federation for Application integration

I have to integrate various legacy applications with some newly introduced parts that are silos of information and have been built at different times with varying architectures. At times these applications may need to get data from another system, if it exists, and display it to the user within their own screens, based on business needs.
I was looking to see if it's possible to implement a generic federation engine that abstracts the aggregation of data from various other OData endpoints and provides a single version of truth.
A simplistic example could be as below.
I am not really looking to do an ETL here as that may introduce some data related side effects in terms of staleness etc.
Can someone share some ideas as to how this can be achieved, or point me to any article on the net that shows such a concept?
Regards
Kiran
Officially, the answer is to use either the reflection provider or a custom provider.
Support for multiple data sources (odata)
Allow me to expose entities from multiple sources
To decide between the two approaches, take a look at this article.
If you decide that you need to build a custom provider, the referenced article also contains links to a series of other articles that will help you through the learning process.
Your project seems non-trivial, so in addition I recommend looking at other resources like the WCF Data Services Toolkit to help you along.
By the way, from an architecture standpoint, I believe your idea is sound. Yes, you may have some domain logic behind OData endpoints, but I've always believed this logic should be thin as OData is primarily used as part of data access layers, much like SQL (as opposed to service layers which encapsulate more behavior in the traditional sense). Even if that thin logic requires your aggregator to get a little smart, it's likely that you'll always be able to get away with it using a custom provider.
That being said, if the aggregator itself encapsulates a lot of behavior (as opposed to simply aggregating and re-exposing raw data), you should consider using another protocol that is less data-oriented (but keep using the OData backends in that service). Since domain logic is normally heavily specific, there's very rarely a one-size-fits-all type of protocol, so you'd naturally have to design it yourself.
However, if the aggregated data is exposed mostly as-is or with essentially structural changes (little to no behavior besides assembling the raw data), I think using OData again for that central component is very appropriate.
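This is not a WCF custom provider, but to illustrate the aggregation idea at its simplest, here is a sketch of a client-side federator that queries two OData endpoints and merges the results on a shared key; the URLs, entity sets and field names are made up.
    # Sketch of client-side OData aggregation: query two endpoints, merge on
    # a shared key, and present one merged record. Endpoint URLs, entity sets
    # and field names are made up for illustration.
    import requests

    LEGACY = "https://legacy.example.org/odata"
    NEWAPP = "https://newapp.example.org/odata"

    def fetch(base, entity_set, filter_expr=None):
        params = {"$filter": filter_expr} if filter_expr else {}
        resp = requests.get(f"{base}/{entity_set}", params=params)
        resp.raise_for_status()
        body = resp.json()
        # OData v4 wraps results in {"value": [...]}; v2 uses {"d": {"results": [...]}}
        return body.get("value") or body.get("d", {}).get("results", [])

    def federated_customer(customer_id):
        """Merge the legacy and new views of a customer into one record."""
        key = f"CustomerId eq '{customer_id}'"
        merged = {}
        for record in fetch(LEGACY, "Customers", key) + fetch(NEWAPP, "Customers", key):
            merged.update(record)  # later sources win on conflicting fields
        return merged

    print(federated_customer("ALFKI"))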
Obviously, and as you can see in the comments to your question, not everybody would agree with all of this -- so as always, take it with a grain of salt.

Comparison of OData and Semantic Web/Linked Data

I'm trying to get my head around two very different approaches to data sharing: OData and Semantic Web/Linked Data. Is there a good comparison of the two?
As I understand it, OData combines syndication/CRUD (AtomPub), serialisation formats (XML, JSON), a data model, a query language, and some semantics/conventions governing use of those existing technologies. It's primarily intended for exposing data from one system so that others can consume it.
Linked Data is a data model, a rigorous commitment to URIs, an (optional?) serialisation format (RDF/XML), but (correct me if I'm wrong) doesn't say anything about transport, CRUD, etc. It seems intended to allow inferencing across lots of little chunks of data drawn from a wide variety of sources. (Not something of major importance to us right now - we would be synchronising large slabs of data between a small number of sources, and wanting to preserve provenance information).
I'm interested in technologies for sharing data between certain data management platforms, some of which I work on directly. OData seems more appealing as it's very straightforward to explain to developers: implement this API, follow that Atom standard, serialise the data like this. We're already doing something very similar for one platform: sharing XML-serialised data on an Atom feed, with URL parameters used to filter.
By contrast, my past experiences working with RDF have given me an impression of brittle, opaque (massive slabs of RDF/XML), inaccessible (using SPARQL vs SQL) technology - but perhaps I'm confusing the experience of working with a triplestore like Jena with simply exposing an existing database via a linked data API.
Any pointers, comments etc on the differences and similarities between these two approaches in terms of scope, technologies, ease, future potential etc would be great.
I think discussing this in depth is not really what Stackoverflow is meant for, but just to give you some pointers to interesting discussions about differences and overlap:
Oh - it is data on the Web
Microsoft, OData and RDF
One of the key differences seems to be that OData has no means to link data from different sources to each other. Essentially, you're still stuck in a silo.
It might also be interesting to check out various attempts to convert data between the two approaches. See, among others, http://answers.semanticweb.com/questions/1298/has-anyone-written-a-mapping-from-odata-to-rdf .
OData may be easier, but it's not better by any means. SPARQL and RDF (forget RDF/XML, better to look at Turtle) satisfy everything in OData while providing many more cutting-edge features such as:
Federation Extensions
Linked Data
Reasoning and Inference (for the more brave)
Equally, the software supporting the standards is actually quite sophisticated. Most people interested in OData generally come from a Microsoft background, so take a look at dotNetRdf
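To make the comparison concrete, here is a sketch of roughly the same question asked of an OData endpoint and of a SPARQL endpoint; both endpoints and all property names are hypothetical.
    # Sketch: roughly the same question asked via OData and via SPARQL.
    # Both endpoints and all property names here are hypothetical.
    import requests

    # OData: the URL conventions carry the filter and the projection.
    odata = requests.get(
        "https://odata.example.org/catalogue/Products",
        params={"$filter": "UnitPrice gt 50", "$select": "ProductName,UnitPrice"},
    )
    for p in odata.json().get("value", []):
        print(p["ProductName"], p["UnitPrice"])

    # SPARQL: the query travels in the request, and results can draw on any
    # graph the endpoint knows about, linked to other datasets via shared URIs.
    query = """
    SELECT ?name ?price WHERE {
      ?product a <http://example.org/schema/Product> ;
               <http://example.org/schema/name>      ?name ;
               <http://example.org/schema/unitPrice> ?price .
      FILTER (?price > 50)
    }
    """
    sparql = requests.get(
        "https://sparql.example.org/query",
        params={"query": query},
        headers={"Accept": "application/sparql-results+json"},
    )
    for row in sparql.json()["results"]["bindings"]:
        print(row["name"]["value"], row["price"]["value"])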
Here's a comparison matrix:
http://uoccou.wordpress.com/2011/02/17/linked-data-odata-gdata-datarss-comparison-matrix/
Unfortunately the table formatting is pretty horrible, but the content is useful.
