Data science and machine learning professionals know how to seek answers in data: that’s probably the central pillar of their work. Things get murkier when we look at some of the thornier issues surrounding our data, from its built-in biases to the ways it can be leveraged for questionable ends.
As we enter the final stretch of the year, we invite our readers to explore some of these big-picture issues that have sparked crucial discussions in recent years, and are all but guaranteed to continue to shape the field in 2024 and beyond.
Our highlights this week dig into a broad range of topics, from the nature of data-backed knowledge itself to its application in specific fields like healthcare; we hope they inspire further reflection and draw new participants into these essential conversations.
- Bias, Toxicity, and Jailbreaking Large Language Models (LLMs)
The rapid rise and evolution of LLMs has made it difficult for practitioners to pause and take stock of their inherent risks. Rachel Draelos, MD, PhD’s detailed overview of recent research provides a timely look into their ability to perpetuate, and even exacerbate, bias and toxicity at scale.
- Philosophy and Data Science — Thinking Deeply About Data
How do major concepts in epistemology (like deductive and inductive reasoning, skepticism, and pragmatism) play into the work of data scientists? Jarom Hulet’s latest post examines the (sometimes unexpected) overlaps between the two fields.
- Cognitive Biases in Data Science: The Category-Size Bias
In a new series on cognitive biases, Maham Haroon unpacks the many ways in which our brains can lead us astray when analyzing and drawing insights from data. The first installment zooms in on the category-size bias, and explains how it can seep into the assumptions data scientists make in their everyday work.
- What Role Should AI Play in Healthcare?
The biases we’ve covered thus far can wreak havoc on models, businesses, and bottom lines. As Stephanie Kirmer stresses, though, they become even more acute in fields like healthcare, where life-and-death situations are common and “the risks of failure are so catastrophic.”
- A Requiem for the Transformer?
In a rapidly changing field, it’s tempting to think of a 6-year-old concept as essential and timeless. Transformers have been around since 2017 and have played an important role in the mainstream adoption of AI tools; as Salvatore Raieli points out, though, they too likely have a shelf life, and it’s perhaps a good time to ask what comes next.
Big questions are great, but mid-sized and compact ones are useful, too! Don’t miss some of our recent standouts on career changes, data engineering, and other timely topics:
- How can we make models forget information we no longer want them to keep? Evgeniya Sukhodolskaya presents a data-driven approach to machine “unlearning” for generative language models.
- Thinking of a role switch in the new year? Thu Vu recently shared a detailed roadmap for anyone interested in transitioning into data analytics.
- For cutting-edge research in the field of graph neural networks, look no further than Michael Bronstein’s latest article (with coauthors Ben Finkelshtein, Ismail Ceylan, and Xingyue Huang).
- If you’re a data engineer who dreads the process of backfilling, Xiaoxu Gao’s guide outlines efficient implementation methods that will help you streamline this workflow.
- New to scikit-learn? Yoann Mocquin recently launched a beginner-friendly series on sklearn that walks us through the popular ML library’s different modules and features.
- What would SQL for LLMs look like? Mariya Mansurova presents a detailed overview of Language Model Query Language, or LMQL, which allows users to combine multiple calls in one prompt, control outputs, and reduce costs.
Thank you for supporting the work of our authors! If you enjoy the articles you read on TDS, consider becoming a Friend of Medium Member: it’s a brand-new membership level that offers your favorite authors bigger rewards for their top-notch writing.
Until the next Variable,
TDS Editors
When Humans Need to Answer Tough Questions About Data was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.