Putting personal data to use

Impressions

The following is a collection of posts about individualized approaches to data analysis, highlighting some general considerations as well as specific applications.


2. User Interface

Arguably the greatest challenge to the individualized approach to data analysis is that, unfortunately, we cannot simply assign a statistician to each person to analyze his or her data.  As a result, we have no choice but to develop algorithms capable of analyzing and presenting the data back to each user in a manner that is understandable and meaningful.  These algorithms need a platform on which to run and interact with the user, or in other words, a user interface (UI).  This UI can consist of a website connected to a server, or an app for a mobile device, but either way, its development requires skills, and often separate computer languages, that are outside the purview of clinicians, clinician-scientists, or statisticians.  Software development follows a set of principles that we don't usually consider in the purely scientific world, and yet if the UI is not easy to understand and use, then people will not be able to analyze their own data on an individual level.  

In this part, we'll explore some of the broad concepts of UI development, with an aim toward building a system capable of delivering individualized medicine in a manner that is facile for both patients and providers.  Obviously, many of these subjects go far deeper and wider than any level we will touch, but the hope is that this can at least lay the groundwork for future reading if desired.  Ultimately, a successfully implemented iOS or Android app is going to require help from a professional software engineer or developer, but hopefully this section will provide enough information to get a non-CS expert off the ground.  

The data model

Most apps that collect user information are server-based, meaning that data is collected by the phone or some other wearable device and transmitted to a central location, where it can then be stored, analyzed, and returned to the user.  Users generally select a username and password so that the data can be transmitted securely and privately, and stored under that username so that they can access their data and/or results at a later date.  

An alternative approach, which our group in particular has been exploring in our PM App Lite, has been termed edge-based, meaning that all of the data collection, storage, and analysis takes place at the client end, generally on the user's iPhone.  A username and password are not needed (other than those needed to unlock the phone), and data is not transferred anywhere other than the user's device.  

The upside of a server-based model is that, because data is stored centrally, it is secure if the user loses his or her device, or switches companies, or does anything that could compromise the data already collected.  Another upside is that because everyone's data is being stored together, more advanced analytical algorithms, based on pooled samples, can be developed to improve estimates, remove noise, or allow comparison between individuals.  The downside is that there is a privacy risk with storing data in a central location.  Once data is compromised, it can be used for nefarious purposes by anyone who comes across it.  We have seen this issue broadly with the recent hacking of multiple financial servers and loss of social security numbers, identifying information, and credit cards.  
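To make the distinction concrete, here is a minimal Swift sketch of how the two models might be abstracted behind a common interface; the HealthDataStore protocol, the two store classes, and the example.com endpoint are all hypothetical and are not taken from PM App Lite or any real app:

```swift
import Foundation

// A hypothetical abstraction over where health data lives.
protocol HealthDataStore {
    func save(_ samples: [Double], for metric: String)
    func load(metric: String) -> [Double]
}

// Edge-based model: everything stays on the user's device.
final class OnDeviceStore: HealthDataStore {
    private var storage: [String: [Double]] = [:]

    func save(_ samples: [Double], for metric: String) {
        storage[metric, default: []].append(contentsOf: samples)
    }

    func load(metric: String) -> [Double] {
        storage[metric] ?? []
    }
}

// Server-based model: data is sent to (and fetched from) a central server.
// The URL and endpoint are placeholders for illustration only.
final class ServerStore: HealthDataStore {
    private let baseURL = URL(string: "https://example.com/api")!

    func save(_ samples: [Double], for metric: String) {
        var request = URLRequest(url: baseURL.appendingPathComponent(metric))
        request.httpMethod = "POST"
        request.httpBody = try? JSONEncoder().encode(samples)
        URLSession.shared.dataTask(with: request).resume()
    }

    func load(metric: String) -> [Double] {
        // In a real app this would be an asynchronous network call;
        // returning an empty array keeps the sketch simple.
        return []
    }
}
```

Because the rest of the app would talk only to the protocol, the choice between an edge-based and a server-based model becomes a swappable detail rather than a rewrite.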

Developmental concepts

Object-Oriented Programming

I like to think of data analysts and statisticians as falling along a spectrum of sophistication based on the platform they employ to perform their work.  At the lowest level are the Excel users, where there is no script involved and little to no documentation of the steps taken in conducting an analysis.  Because Excel is really not designed for complex data analysis, there are also few statistical functions available, although there are enough that many low-level researchers (i.e., medical students and residents) still use Excel for the majority of their research.   The next level up is the command-prompt approach.  Whether using the Terminal prompt for Unix commands (for example, to run the PLINK genetic analysis package) or the command line of a statistical package, such as Stata or SPSS, this approach offers a broader range of analytical functions and tools, but again without a script or formal documentation of the analysis.  (One suggestion from a former mentor was to keep a separate 'lab notebook' of my commands for documentation.)  Implicit in taking this step is that you can no longer just look at your dataset while performing the analysis.  For genetic data, this is relevant because the more than one million genetic variants tested on each person could crash your computer if you tried to open a file containing them in Excel.   

The next level is the scripting level, at which point you realize that an analysis can be run automatically from a 'do' file (e.g. Stata) or script.  Programs like R and SAS generally require a script, although there are IDEs (integrated development environments) that allow the command-line approach to some of these languages as well.  For me, this was a key breakthrough because I realized that by keeping a script, I could repeat an analysis easily on a new or updated dataset without needing to go back and type in the individual commands separately.  I also realized that by having my entire analysis in a script, I could leave it to run on my computer (or a server) and go off and do some other task and then come back with my data analysis complete.  

Scripting essentially opens the door to adding structure to our analysis code: first by writing a separate 'function' for any part of the analysis that we run repeatedly throughout the script, and then calling that function each time we need it (basically, one line of code rather than copying and pasting the entire block each time); then, later, by writing a separate file containing a group of functions that we can reuse across analyses.  These advances allow us to keep better track of our functions, but really, we're still essentially working on flat files of data, similar to what we would look at in an Excel spreadsheet.   We haven't changed the structure of the data at this point, simply cleaned up the code we use to analyze it.  
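Although my own scripts at this stage were in R and Stata, the idea is language-agnostic; here is a minimal Swift sketch (the summarize function and the sample data are invented for illustration) of replacing a repeated block of analysis code with a single reusable function:

```swift
import Foundation

// A step we might find ourselves repeating throughout a script:
// compute the mean and standard deviation of a set of measurements.
func summarize(_ values: [Double]) -> (mean: Double, sd: Double) {
    let mean = values.reduce(0, +) / Double(values.count)
    let variance = values.map { ($0 - mean) * ($0 - mean) }
                         .reduce(0, +) / Double(values.count - 1)
    return (mean, variance.squareRoot())
}

// Instead of copying and pasting the calculation for each variable,
// we now call the function once per variable.
let heartRates = [62.0, 75.0, 68.0, 80.0]
let stepCounts = [8200.0, 10400.0, 9100.0]
print(summarize(heartRates))
print(summarize(stepCounts))
```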

This all changes with object-oriented programming (OOP).  With OOP, we go deeper, to the very structure of the code and data we plan to analyze: we stop working with the actual data and start creating the blueprint into which that data will be placed, so that we can analyze it with tools that we designate.  For example, instead of just assigning a value, 42, to a given variable, x, as we might do in R, we create an abstraction (a class), X, and then create an object of X, x, to which we assign the value 42 when we initialize it.  If this makes your head spin, you're not alone.  For most of us who don't come from the world of computer science, OOP seems like a very bulky, non-intuitive way of writing code.  However, since many apps are written in OOP languages, such as Swift or Objective-C, we need to understand it if we want to use these languages for our UI.  Beyond that, understanding OOP is useful in understanding the machine-user interface when we start looking to develop complex models, where considerations about parallel processing or cloud processing are needed due to the size and complexity of the analysis.    
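Returning to the x = 42 example, here is roughly what the OOP version looks like in Swift; the class X comes from the example above, while the doubled() method and the rest are invented purely for illustration:

```swift
// In R we would simply write: x <- 42
// In an OOP language we first define the blueprint (the class)...
class X {
    var value: Int

    // ...and describe how an instance is initialized.
    init(value: Int) {
        self.value = value
    }

    // Tools for working with the data live alongside it as methods.
    func doubled() -> Int {
        return value * 2
    }
}

// Only then do we create an actual object of X and give it the value 42.
let x = X(value: 42)
print(x.value)     // 42
print(x.doubled()) // 84
```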

There are several conceptual models for using OOP to develop applications that are worth at least delving into briefly.  Each of these can be significantly expanded upon, through internet-based courses and posts, as well as formal training in computer science and engineering.  Where possible, we will link to some of these resources, but further understanding will likely require additional study.  We will review three broad conceptual models for app development using OOP: the model-view-controller (MVC) model, the SOLID principles, and test-driven development.  MVC and test-driven development do not necessarily require OOP for app creation, although since many apps use OOP languages, we will include them here.  

Model-View-Controller

Figure 1, from Patrick's Software blog, is one of several representations of the model-view-controller (MVC) architecture that can be found online, and it essentially describes the relationship between these three parts of an interactive application.  One can imagine these as the three starting classes that you might use when building an OOP app for an interactive website, in which one class controls the model, one controls the view, and one controls the connection between the two (the Controller class).  

Figure 1.  Illustration of the MVC model.  I like this representation because it shows how the Controller is the interface between the View and the Model, which interact with the user and the data/database, respectively.  (Note that other representations inaccurately show the Controller as directly interacting with the user, which is not really how MVC works.)  

Basically, the View is the part of the application that interacts with the user.  For a webpage, this is the HTML/CSS code that is shown in your browser; for a mobile app, this is what you see on the screen when you load the app.  For iOS app development (in Xcode), the View is largely manipulated separately using the Storyboard, which translates GUI manipulations into code.  The user can manipulate the View (e.g., pressing a button, entering data), which then interfaces with the Controller part (class), which translates that action into something meaningful for the app.  The Controller acts as the go-between for the View and the Model, which is where the data is manipulated, analyzed, retrieved, and/or stored in a database.  The Controller updates the Model, which, after analysis or retrieval of data, notifies the Controller, which then displays this information on the View.  
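To make the division of labor concrete, here is a stripped-down Swift sketch of the three pieces; the HeartRateModel, HeartRateView, and HeartRateController names and methods are hypothetical and are far simpler than a real UIKit app with UIViewController subclasses and Storyboard-connected outlets:

```swift
// Model: owns the data and the analysis.
struct HeartRateModel {
    private var readings: [Int] = []

    mutating func add(reading: Int) { readings.append(reading) }

    func averageHeartRate() -> Double {
        guard !readings.isEmpty else { return 0 }
        return Double(readings.reduce(0, +)) / Double(readings.count)
    }
}

// View: only knows how to display things to the user.
struct HeartRateView {
    func show(average: Double) {
        print("Your average heart rate is \(average) bpm")
    }
}

// Controller: the go-between. It receives user actions from the View,
// updates the Model, and pushes the results back to the View.
final class HeartRateController {
    private var model = HeartRateModel()
    private let view = HeartRateView()

    func userEnteredReading(_ bpm: Int) {
        model.add(reading: bpm)                        // update the Model
        view.show(average: model.averageHeartRate())   // refresh the View
    }
}

// Simulating the user entering two readings:
let controller = HeartRateController()
controller.userEnteredReading(72)
controller.userEnteredReading(80)
```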

This fairly straightforward architecture can be quite useful in app development, where (as we will discuss shortly) it helps to keep code organized so that it can be modified, upgraded, or replaced without significantly rewriting or crashing the app.  The simple question 'which part of the app does this function I'm developing belong to?' can go a long way toward preventing a situation where you decide to adopt a new analysis approach, write it under a View class, and then find that you cannot present the results of that analysis to the user.  

SOLID Principles

It took me a fair amount of effort to wrap my head around the SOLID principles, which at a high level describe a general strategy for encapsulating code to prevent the unintended consequences that changes applied to one part of the code can have on another.  Broadly, the goal of using SOLID is to allow you to build an app whose components can be upgraded or modified without crashing other parts of the code.  Think of the appliances in your house: if I want to upgrade my toaster, I want to be able to replace my old toaster with a new model without worrying about it preventing my refrigerator from working or my door from opening.  If this sounds simple, then that's the goal.  The SOLID principles themselves take some thought to grasp individually, but coming back to this big picture has helped me.  

The five SOLID principles are:

SRP - Single Responsibility Principle

OCP - Open-Closed Principle

LSP - Liskov Substitution Principle

ISP - Interface Segregation Principle

DIP - Dependency Inversion Principle

There are many online resources that do better or worse jobs of explaining the SOLID principles.  Among the ones I've used most often is this one, which includes code examples, although there are many excellent descriptions available for your coding language of choice.  As above, we won't go into extreme detail for each principle, but will provide a general example of each. 

1) The Single Responsibility Principle (SRP) basically says that a class should have one, and only one, responsibility.  I've also heard this phrased as 'a class should have only one reason to change,' indicating that a class can be extended to include other responsibilities (see below), but that if I go into my code to change the class, it should only be because the reason the class was created has changed.  The goal of SRP is to prevent you from writing a super-class with many characteristics and functions, any of which, if changed, could alter the behavior and overall responsibility of that class.  For our toaster, we want it only to have the function of heating items that are inserted.  If we designed a toaster that did this but also selected items for dinner at night, then we'd be in big trouble figuring out what to eat for dinner after we purchased a new toaster.
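A minimal Swift sketch of the toaster example, with hypothetical class names, might look like this:

```swift
// Violates SRP: two unrelated responsibilities, so two reasons to change.
class KitchenGadget {
    func toast(_ item: String) { print("Toasting \(item)") }
    func pickDinner() -> String { return "spaghetti" }
}

// Follows SRP: each class has one, and only one, responsibility.
class Toaster {
    func toast(_ item: String) { print("Toasting \(item)") }
}

class DinnerPlanner {
    func pickDinner() -> String { return "spaghetti" }
}
```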

2) The Open-Closed Principle (OCP) says that classes should be open for extension, but closed for modification.  As above, the idea here is to create simple classes designed for one purpose, and if I want my class to do some additional things beyond what it currently does, then I should extend it by writing a new class built on the older one.  For example, if I decide that I want my toaster to pick out dinner for me as well as toast my bread, then I should write a new class that subclasses the old 'toaster' to include this behavior.  Each language has its own mechanisms (in some, these are actually called 'protocols') for extending a class.  
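Continuing the toaster analogy in Swift (again with invented names), extension without modification might look something like this:

```swift
// The original class stays closed for modification...
class Toaster {
    func toast(_ item: String) { print("Toasting \(item)") }
}

// ...but open for extension: new behavior lives in a new type
// (or, in Swift, in an extension) rather than in edits to the original.
class DinnerPlanningToaster: Toaster {
    func pickDinner() -> String { return "spaghetti" }
}

let fancyToaster = DinnerPlanningToaster()
fancyToaster.toast("bread")        // inherited, unchanged behavior
print(fancyToaster.pickDinner())   // new, extended behavior
```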

3) The Liskov Substitution Principle (LSP) is based on a talk given by Barbara Liskov in 1987 in which she outlined the principle that good OOP code allows child classes to be substituted for their parent classes without breaking the program.  For our toaster example, this basically means that my kitchen should allow me to swap in the new toaster anywhere the old one was used to make toast, without the kitchen no longer working.  Of course, I wouldn't necessarily have the ability to have my toaster select dinner (more on this in a bit), but the original responsibility should be preserved.  
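In Swift, the substitution might be sketched like this (the names are again hypothetical):

```swift
class Toaster {
    func toast(_ item: String) { print("Toasting \(item)") }
}

class DinnerPlanningToaster: Toaster {
    func pickDinner() -> String { return "spaghetti" }
    // Crucially, toast() still behaves as the parent class promises.
}

// The 'kitchen' only asks for a Toaster...
func makeBreakfast(with toaster: Toaster) {
    toaster.toast("bread")
}

// ...so either the old or the new toaster can be plugged in
// without the kitchen breaking.
makeBreakfast(with: Toaster())
makeBreakfast(with: DinnerPlanningToaster())
```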

4) The Interface Segregation Principle (ISP) is based on the notion that classes should not be forced to depend on a single, large interface in order to work.  Instead, we want each class to interact with other classes through separate, smaller interfaces, so that it can be 'plugged in' without worrying about whether that connection meets criteria that are irrelevant to the task at hand.  For example, we want our toaster to be able to plug directly into the kitchen separately from other appliances, so that we can use it without worrying about whether those other appliances work or not.  If our outlet only worked by plugging into the stovetop, then we would need to worry about whether the stove was also working when we turned on the toaster.  
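A minimal Swift sketch of the idea, using small invented protocols rather than one large interface:

```swift
// One fat interface would force every appliance to implement everything.
// Instead, each capability gets its own small protocol.
protocol Toasting {
    func toast(_ item: String)
}

protocol DinnerPlanning {
    func pickDinner() -> String
}

// An appliance adopts only the interfaces relevant to it.
struct SimpleToaster: Toasting {
    func toast(_ item: String) { print("Toasting \(item)") }
}

struct SmartOven: Toasting, DinnerPlanning {
    func toast(_ item: String) { print("Toasting \(item)") }
    func pickDinner() -> String { return "roast chicken" }
}

// Code that only needs toasting depends only on Toasting,
// so it never cares whether dinner planning works.
func makeToast(using appliance: Toasting) {
    appliance.toast("bread")
}
```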

5) The Dependency Inversion Principle (DIP) was, for me, one of the toughest of the SOLID principles to wrap my head around.  It specifies that classes should depend on abstractions, not concretions, and it gets at the heart of what OOP does that non-OOP approaches do not, which is to create a blueprint for a system.  Unlike a toaster, which is a solid object, OOP designs the blueprint for multiple future toasters meeting certain specifications, and these rules are designed to guide the blueprint.  As above, if I want a toaster that selects my dinners for me, then I would want to extend the blueprint for toasters to include this additional class.  I wouldn't want to modify my prior toaster to have this function, because if I later changed my mind about having dinners selected, I would have modified my existing toaster irreversibly and couldn't even toast my food afterwards.  There are several coding strategies for making a program adhere to DIP, which are well beyond the scope of this post.  
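One common (though not the only) way to adhere to DIP is to depend on a protocol and inject the concrete implementation; here is a minimal Swift sketch with invented names:

```swift
// High-level code depends on an abstraction (a protocol)...
protocol Toasting {
    func toast(_ item: String)
}

// ...while concrete 'toasters' are interchangeable details.
struct BasicToaster: Toasting {
    func toast(_ item: String) { print("Toasting \(item)") }
}

struct SmartToaster: Toasting {
    func toast(_ item: String) { print("Toasting \(item) to a precise shade") }
}

// The Kitchen never names a concrete toaster; the dependency is injected.
final class Kitchen {
    private let toaster: Toasting

    init(toaster: Toasting) {
        self.toaster = toaster
    }

    func makeBreakfast() {
        toaster.toast("bagel")
    }
}

// Swapping the concrete toaster requires no change to Kitchen.
Kitchen(toaster: BasicToaster()).makeBreakfast()
Kitchen(toaster: SmartToaster()).makeBreakfast()
```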

Test-Driven Development

We will discuss test-driven development (TDD) here not because it is critically important to developing applications capable of delivering individualized medicine to people, but because it highlights a key component often overlooked by investigators focused primarily on data collection and analysis: function, that is, whether the software actually works.  If my R script crashes after I try to add a new analysis at the end of it, all it means for me as an analyst is that I need to go through, find the bug, and fix it so that it won't crash again.  If it later crashes during a separate analysis, again, I just find the bug and re-run the analysis.  My overall goal is to get my R script to run on the data that I intend to analyze; I don't care so much whether it works on someone else's data. 

On the other hand, if I'm using an app on my iPhone that suddenly crashes, I'm not going to go into the code and figure out what I did that made it crash.  I'm just going to stop using it (well, I might try to use it a couple of times first, and then give up if it keeps crashing...).  For an approach to work at scale, it needs to be 'bulletproof' in the sense that it should be tested and resistant to as many glitches as possible, knowing that most of these bugs don't arise from standard use of the app, but from the outlier or nonstandard use.  

There are many strategies for developing an app that is bulletproof, or at least one that can be tested for failure by a range of users, not only the developer.  TDD is one simple approach that seems like a lot of extra work, but it has been shown time and again to pay dividends down the road as the number of features and the complexity of an app expand over time.  The basic idea of TDD is that instead of starting out by writing code to complete the task we want performed, we start out by writing a test that our program will fail, and only then modify the program until the test passes.  These tests, called unit tests, are often contained in a separate program that can be run to check for bugs in the app.  The advantage of this approach is that by writing a separate set of tests at the same time that I'm writing my code, when I'm finished I will have not only the app but also a system for testing it.  Then, when I later expand the app to add or modify features, I will have a program to run to make sure that in the process I didn't break any existing functions; any such breakage would be flagged by the unit tests.  
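As a rough illustration, a unit test written with Apple's XCTest framework might look like the following; the average(of:) function and the test cases are invented for this sketch and are not from any particular app:

```swift
import XCTest

// The function under development: the tests below are written first
// and fail until average(of:) is implemented correctly.
func average(of readings: [Double]) -> Double {
    guard !readings.isEmpty else { return 0 }
    return readings.reduce(0, +) / Double(readings.count)
}

final class AnalysisTests: XCTestCase {
    func testAverageOfKnownReadings() {
        XCTAssertEqual(average(of: [60, 70, 80]), 70, accuracy: 0.001)
    }

    func testAverageOfEmptyInputDoesNotCrash() {
        // A deliberately 'nonstandard' input, the kind of outlier
        // use that tends to crash apps in the wild.
        XCTAssertEqual(average(of: []), 0, accuracy: 0.001)
    }
}
```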

Of course, in the real world, unit tests can be developed at any time, and I know of a programmer who was hired by a large company specifically to write unit tests for their existing code.  The argument for TDD is that debugging can take a long time, and by building testing into app development, we can ultimately shorten development time by removing much of the time needed for debugging.  In practice, I will say that applying TDD is not entirely intuitive, and oftentimes we're faced with the question of whether a given test is too simple to be worth writing (many of these errors are caught by the IDE anyway) or whether we've covered all possible bugs in our unit tests.  It can also be hard to resist pushing ahead on the app itself to develop a feature while leaving the testing behind, which essentially defeats the purpose.  It takes discipline and restraint to apply TDD, but most developers attest that it leads to better and more efficient code in the end.

Product Implementation (Consumer Engagement)

I once had a conversation with an employee of one of the wearable monitor companies in which the topic of designing better analytical algorithms for activity data came up.  She mentioned that her working group in the company had developed some exciting new things to do with the data and had met with the leadership to discuss how to implement these changes in the next generation of devices.   She said that after some consideration, the leadership told her that the new analytical approaches were exciting, but that the company had also done some market analysis of different device designs, the color yellow had polled particularly favorably, and so they were going to go in that direction instead.  

In this last section, we will discuss some of the issues surrounding product implementation, also called consumer engagement.  As a scientist and investigator, I find this subject among the most nauseating and frustrating to discuss, for reasons highlighted in the example above, but also because it delves into the world of consumer psychology and advertising, which many of us in the scientific world prefer to avoid.  Understanding what makes people buy product A and not product B would seem to be completely irrelevant to the world of scientific investigation, and yet if we ever want to develop an individualized approach to healthcare, it is imperative that our design be sound and appealing to a broad range of consumers.  

There are, of course, other potential advantages to venturing from scientific investigation into the world of product development beyond this ultimate goal.  For example, one can escape much of the rigid bureaucracy of grant writing and review, and the sources of funding are far larger and more efficient in the private or commercial sector than in the government sector (i.e., the NIH, which funds much of medical research).  A convincing white paper or elevator pitch can get you thousands to millions of dollars of venture capital investment if the idea is sound; an NIH grant application often runs to hundreds of pages, even for basic ideas, and often for far less money.  In fact, for some of us research cynics, the most 'fundable' ideas seem to be those that were exciting years ago, are simple enough to be presented to an aging reviewer in a manner that he or she can understand (i.e., leave out the details about deep learning networks), and are presented by someone who has already basically published on the same idea.  In the business world, a junior person with a good pitch can get funding for a brand-new idea with little to no preliminary data, as long as the target market is right.   Ah, the grass is always greener...

On the other hand, by stepping into the world of commercial app development, we enter a world of 'just-in-time' development of the 'minimum viable product' (more on this in part #5), a world where the latest approaches generally involve creating an inferior product and using consumer feedback to decide what direction to take it.  We may start out wanting to develop a user interface so that patients can apply individualized analytical approaches to their health and wellness, but if the early users decide they would rather we focus on connecting to their social media accounts, then the data analysis goes out the window.  In addition, validation and accuracy may be of little importance in the development of many entertainment monitors and applications (my Fitbit gives me credit for steps while riding around in a golf cart), but if we plan to use our approach to make real healthcare decisions, it will need to be validated in clinical studies before we 'rush to market.'  The FDA is increasingly recognizing this latter fact and is pushing to have its jurisdiction extended beyond standard medical devices and drugs into the world of app development.  Wearing my developer's hat, this move seems to stifle innovation, requiring more and more bureaucracy and regulation (i.e., paperwork) before we can even get to the point of testing an app in a consumer population.  On the other hand, were I a user, I would probably want to know whether the app I'm using has been fully validated and whether it could have adverse effects were I to change my lifestyle or medical treatment plan inappropriately.  We discuss this situation in the next section on clinical applications.  It is an evolving challenge for those of us in this space who seek to develop apps that are solidly designed and appealing to users, but also clinically valid and tested.  

Finally, we cannot complete this section without addressing the role of physicians and care providers in app development and usage.  We will discuss this area further in the next section, but it is worth mentioning here, as it is probably the number one reason for lack of adoption of medical apps by consumers.  As a physician, nothing makes me roll my eyes more than when a patient walks in and tells me that she's worried about her heart because of a reading she got from her wearable heart rate monitor.  The developer in me tries to explain that the device tends to record pulse rate rather than true heart rate, and often with excessive noise, so that the beat-to-beat heart rates it provides can be inaccurate and not necessarily reflective of a clinically meaningful change in heart rate.  However, more often than not, I find myself simply telling the patient that she should just ignore the readout, or at least consider it apart from the other medical data we use in making clinical decisions.  This is unfortunate, as there is probably some legitimately good information contained in the data from these monitors, but the methods for incorporating it are largely lacking.  Some of this is the fault of the companies that design them, which, in following the 'just-in-time' approach, are focused only on getting their product to market rather than validating its accuracy.  However, it is also the fault of the medical community, which hasn't yet figured out how to incorporate this information and often hasn't even taken steps toward figuring it out.  In the next section, we will discuss this situation in more detail.

Next section (#3 Clinical application)

 

 

 

Michael Rosenberg