Yaqian Han

Weill Cornell Medicine – M.S. in Biostatistics & Data Science

Introduction

When I began my internship at EMR Technical Solutions (EMRTS), I expected to work with healthcare data, but I didn’t anticipate how much ownership and independence I would be given from day one. This was the first time I worked on systems that people actually relied on everyday—not assignments, not class projects, but real tools powering real decisions.

Looking back, this internship not only strengthened my technical skills but also changed the way I think about data work. I gained a deeper appreciation for data engineering, especially the parts that are invisible to end users but absolutely essential for everything else to work. More importantly, I learned a lot about how I work under pressure, how I communicate when something is unclear, and how to handle projects that don’t have a neat, pre-defined answer.

My Responsibilities at EMRTS

Working with a 9M+ Provider Database

One of my main responsibilities was helping reorganize the healthcare provider database. Before this internship, I had never worked with a dataset anywhere near this size. When I ran my first query and waited several seconds for it to finish, I realized the scale of what I was dealing with.

Over time, I learned how redesigning linking tables, reorganizing relationships, and adding the right indexes could cut query times dramatically. Watching a query go from several seconds to under a second felt surprisingly rewarding. It was the moment I realized that database design isn’t just “backend stuff”—it directly affects how people experience a system.

Rebuilding the Search Interface

I also worked on improving EMRTS’s internal Django search interface. The original version crashed or lagged when results were large, so I added server-side pagination and optimized the filtering logic. This was my first time truly connecting backend data architecture with a user-facing tool.

Later, this experience gave me the confidence to build a separate Django interface for catalog
pricing (NASPO/Oklahoma catalog files). This was a project I scoped mostly on my own—everything from parsing messy price files to designing the search layout. It was the first time I felt like I created a tool from scratch that someone else could actually use.

Handling Historical Medicaid Data

Another major responsibility was cleaning and standardizing over 40 years of Medicaid expenditure and enrollment data. The data was messy—column names changed over time, formats were inconsistent, and certain fields disappeared entirely in some years.

It was frustrating at first. Nothing matched the way I expected. But eventually I learned how to build pipelines that could adapt to these inconsistencies. I became more patient and systematic, and I learned to accept that real-world public health data rarely looks like what we see in textbooks.

Forecasting Medicaid Trends

Once the dataset was cleaned, I built forecasting models (ARIMA and Prophet) to simulate long-term Medicaid utilization patterns. This part of the internship felt more familiar because it related to my academic training, but applying these models in a real policy context made it much more meaningful. Instead of focusing only on accuracy metrics, I had to think about what types of patterns or inflection points would matter to decision-makers.

Skills and Growth

Technical Growth

While I definitely improved my Python, SQL, and Django skills, the biggest technical lesson I
learned was the importance of structure.

At EMRTS, everything connected:

  • poorly structured data → unstable pipelines
  • unstable pipelines → wrong results
  • wrong results → poor decisions

It made me appreciate the “boring” parts of data work—documentation, naming conventions,
table structures—because I saw firsthand how they reduce chaos.

I tackled website design using Django, a framework previously unfamiliar to me. This involved handling server interactions as well as managing HTML files, providing a holistic view of web development. Although the zip code calculator project was a simple showcase, it is a good practical experience to start with.

Communication and Initiative

Something I didn’t expect was how much I would need to communicate. Working remotely meant I couldn’t casually ask someone a quick question, so I had to learn how to express uncertainty clearly and concisely.

I also became more proactive. If something didn’t make sense in the data, I didn’t just fix it—I learned to ask why it happened and whether it would affect downstream analysis. This shift in mindset is something I’m grateful for, because it made me feel more like an actual contributor, not just an intern waiting for instructions.

Confidence

Before this internship, I wasn’t sure if I could handle large-scale data engineering tasks. Now, I feel much more confident. I learned that I enjoy problem-solving at the structural level and that I can take on projects that require both analytical thinking and patient debugging.

Conclusion

This internship was a turning point for me—not because of any single project, but because of how much responsibility I was trusted with. I learned how to manage technically complex tasks, how to communicate my reasoning, and how to build tools that people actually use.

More importantly, I gained clarity about my future path. I discovered that I genuinely enjoy the intersection of healthcare, data engineering, and problem-solving. The experience at EMRTS made me more confident in pursuing a career that combines technical depth with real-world health impact.

I’m grateful for the opportunity and for the challenges that pushed me to grow, both technically and personally. This internship reinforced why I chose this field and gave me a stronger sense of direction as I move toward graduation and beyond.

Finally, I want to thank Verbus for his guidance and trust throughout my internship. His feedback and support made a real difference in my learning experience and helped me grow both technically and professionally.