How we mitigated web plugin feature without republishing to webstore on production
A couple of months back, we encountered something very strange with our plugin integrated into an existing Electronic Health Record. The notion for our plugin was pretty simple :
Plugin traverses the DOM, finds a specific patient ID and triggers an API call to our backend servers.
Sounds pretty simple, right?
Wrong!
There is a simple assumption that we could not cater : Our traversal of the DOM assumed that the DOM structure of the EHR could not change. Also, we did not control the EHR!
What happend?
A seemingly simple day turned chaotic when all of sudden, we started getting customer calls about an outage from the plugin as the plugin patient ID detection was not working on production. The horrorsome thing was that we hadn't pushed any changes on production for the past 6 months.
What we found out? 😱
On deeper investigation, we found that a certain span element which had the patient ID had very peekingly been moved inside another div element. No visual changes, no logical changes. Just a simple shift, and all it took was to bring our plugin down.
The problem was not that the hotfix could be done on the code, but republishing the plugin could take another 3-5 days, essentially making our plugin obselete for those days.
Our simple solution 🙂
We shifted our entire DOM traversal logic on the backend and decided to create a waffle switch for the entire feature to turn it on or turn it off as per will. Our flow was pretty simple
Our react application loads as an iframe inside the EHR. This simple react application would pass backend data to the plugin using chrome's CustomEvent. The data would contain the waffle switch that determines whether the patient ID detection would work or not.
We would create a static configuration of the DOM traversal for patient ID detection as a JSON and store it in our database. This static configuration would be a JSON iike below
{ "version": "1.0", "name": "config-name", "entry": { "context": "document", "steps": [...] }, "result": { "onSuccess": [...], "onFailure": [...] }, "timing": { "waitFor": [...] } }Each element in the steps key would determine a DOM query with necessary information about the DOM element to be queried.
Each element in the same key would contain a next key, which would tell the plugin parser which element to query next in the flow.
The result key would be all the callback functions which would execute after the patient ID was successfully detected. This could include all the post success invocations and audit log triggers. This also gave us to ability to configure it based on clients need.
We also introduced a timing key, which would determine the retry logic of the patient ID detection, as each of our client's EHR's response timings were very different.
Thus by implementing this feature flow, we were able to create a backend driven frontend which would help us exercise plugin behavior change if any day the EHR's DOM structure changed. This also gave us to ability to store different configurations for different clients, and to turn off the feature for some time in case we wanted to recalibarate the feature.