As large language model-based workflows become both more sophisticated and more widespread, we’re seeing a growing number of novel approaches that help practitioners tailor (and improve) the models’ performance to specific projects and use cases. Many of our best-read articles in the past month zoomed in on this trend, with excellent guides for both novices and experienced users.
Our monthly highlights go beyond the exciting world of LLMs to explore other topics that remain top of mind for many data and ML professionals—from solidifying their math skills to streamlining error messages in Python. We hope you carve out some time over the next few days to discover (or revisit) some of our most popular articles from March. Let’s dive in!
Monthly Highlights
- Intro to DSPy: Goodbye Prompting, Hello Programming!
Few recent tools have generated as much excitement as DSPy, a powerful open-source framework for algorithmically optimizing prompts and weights. Leonie Monigatti brought her signature clarity and practical approach to this topic, and her beginner-friendly guide attracted the largest readership on TDS this month.
- How to Learn the Math Needed for Data Science
How much math knowledge should data scientists accumulate in order to do well in their jobs? The yearslong debate rages on, but for anyone who’s still in the process of building their fundamental skills, Egor Howell’s primer, which comes with ample resources and tips, is a great place to start.
- Why LLMs Are Not Good for Coding
AI-assisted programming is not exactly new, but talk about the imminent disappearance of developers has become a lot more common in the past year or so. Depending on your perspective, Andrea Valenzuela’s assessment of LLMs’ current limitations will be either sobering or comforting; testing ChatGPT’s abilities, she concludes that “it often struggles to generate efficient and high-quality code.”
- Visualize your RAG Data — Evaluate your Retrieval-Augmented Generation System with Ragas
Evaluating the performance of retrieval-augmented generation (RAG) systems is essential, but often tricky. In his TDS debut, Markus Stoll walks readers through the basics of working with Ragas, a framework that facilitates RAG pipeline evaluations, and pays particular attention to visualizing the results effectively.
- Intro to LLM Agents with Langchain: When RAG Is Not Enough
New to working with LLM agents? Follow along with Alex Honchar’s hands-on tutorial, which guides us through the steps of planning, building, and implementing agents by leveraging the power of LangChain’s LangSmith platform.
- Building Your First Desktop Application using PySide6 [A Data Scientist Edition]
For anyone in the mood for tinkering, but less passionate about LLMs, why not try a different type of project? Arunn Thevapalan presented a step-by-step guide to building a functional desktop app with PySide6, a skill that can prove useful for data professionals in a wide range of contexts, especially when sharing your work with other stakeholders is crucial.
- How to Generate Instruction Datasets from Any Documents for LLM Fine-Tuning
We’re not quite done with LLMs just yet! Collecting data for fine-tuning these models can be time-consuming and costly; as a potential workaround, Yanli Liu proposes an innovative approach: automating the creation of instruction datasets from various documents with the aid of Bonito, an open-source library.
- What I Learned in My First 3 Months as a Freelance Data Scientist
“It really comes down to this: I get to pick what I work on, when I work on it, and for whom I am working.” After a long data science career at a wide range of companies, CJ Sullivan decided to switch tracks and become a freelancer; her latest article offers insightful reflections and pragmatic pointers for anyone else who might be considering a similar transition.
- Say Goodbye to Confusing Python Error Messages
Spending less time debugging your code is a perennial goal for developers and data scientists alike. One element that can make a real difference on that front is working with clearer and more actionable error messages, something you can achieve by exploring Christopher Tao’s detailed guide on the open-source PrettyErrors library.
Our latest cohort of new authors
Every month, we’re thrilled to see a fresh group of authors join TDS, each sharing their own unique voice, knowledge, and experience with our community. If you’re looking for new writers to explore and follow, just browse the work of our latest additions, including Tahreem Rasul, Benoît Courty, Kabeer Akande, Riddhisha Prabhu, Markus Stoll, Davide Ghilardi, Dr. Leon Eversberg, Stephan Hausberg, Eden B., Volker Janz, Chris Taylor, Lior Sidi, Yuval Zukerman, Geoffrey Williams, Krzysztof K. Zdeb, Ryan O’Sullivan, Jimmy Wong, Thauri Dattadeen, Eric Frey, Bill Chambers, Tianyi Li, Marlon Hamm, Sebastian Bahr, Florent Pajot, Mark Chang, Pierre Lienhart, Thierry Jean, Tiddo Loos, G. Jay Kerns, Amirarsalan Rajabi, Hussein Jundi, Saikat Dutta, Nidhi Srinath, Ophelia P Johnson, Antonio Grandinetti, Vedant Jumle, Julia Winn, Dusko Pavlovic, Srijanie Dey, PhD, Melanie Hart Buehler, Siq Sun, Lukasz Kowejsza, Sandi Besen, Tula Masterman, Saar Berkovich, Maggie Ma, Georg Ruile, Ph.D., and Amine Raji, among others.
Thank you for supporting the work of our authors! If you’re feeling inspired to join their ranks, why not write your first post? We’d love to read it.
Until the next Variable,
TDS Team
Coding with LLMs, Learning Math, Data Science Freelancing, and Other March Must-Reads was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.