Author: CAT

  • What’s new in pandas 2.2

    Patrick Hoefler

    The most interesting things about the new release

    pandas 2.2 was released on January 22nd, 2024. Let’s look at what this release introduces and how it helps us improve our pandas workloads. It includes several changes that make for a better user experience.

    pandas 2.2 also brings a few improvements that rely on the Apache Arrow ecosystem. In addition, we added deprecations for changes that are necessary to make Copy-on-Write the default in pandas 3.0. Let’s dig into what this means for you by looking at the most important changes in detail.

    I am part of the pandas core team. I am an open source engineer for Coiled where I work on Dask, including improving the pandas integration.

    Improved PyArrow support

    We introduced PyArrow-backed DataFrames in pandas 2.0 and have continued to improve the integration since then, aiming for seamless use through the pandas API. pandas has accessors for certain dtypes that enable specialized operations, like the string accessor, which provides many string methods. Historically, lists and structs were represented as NumPy object dtype, which made working with them quite cumbersome. The Arrow dtype backend now enables tailored accessors for lists and structs, which makes working with these objects a lot easier.

    Let’s look at an example:

    import pandas as pd
    import pyarrow as pa

    # Each row holds a struct with two string fields.
    series = pd.Series(
        [
            {"project": "pandas", "version": "2.2.0"},
            {"project": "numpy", "version": "1.25.2"},
            {"project": "pyarrow", "version": "13.0.0"},
        ],
        dtype=pd.ArrowDtype(
            pa.struct([
                ("project", pa.string()),
                ("version", pa.string()),
            ])
        ),
    )

    This is a series that contains a dictionary in every row. Previously, this was only possible with NumPy object dtype and accessing elements from these rows required iterating over them. The struct accessor now enables direct access to certain attributes:

    series.struct.field("project")

    0     pandas
    1      numpy
    2    pyarrow
    Name: project, dtype: string[pyarrow]

    The next release will bring a CategoricalAccessor based on Arrow types.

    Integrating the Apache ADBC Driver

    Historically, pandas relied on SQLAlchemy to read data from a SQL database. This worked very reliably, but it was very slow. SQLAlchemy reads the data row-wise, while pandas has a columnar layout, which makes reading and moving the data into a DataFrame slower than necessary.

    The ADBC Driver from the Apache Arrow project enables users to read data in a columnar layout, which brings huge performance improvements. It reads the data and stores it in an Arrow table, which is then converted to a pandas DataFrame. You can make this conversion zero-copy if you set dtype_backend="pyarrow" for read_sql.

    Let’s look at an example:

    import adbc_driver_postgresql.dbapi as pg_dbapi
    import pandas as pd

    df = pd.DataFrame(
        [
            [1, 2, 3],
            [4, 5, 6],
        ],
        columns=['a', 'b', 'c']
    )
    uri = "postgresql://postgres:postgres@localhost/postgres"
    with pg_dbapi.connect(uri) as conn:
        df.to_sql("pandas_table", conn, index=False)

    # for round-tripping
    with pg_dbapi.connect(uri) as conn:
        df2 = pd.read_sql("pandas_table", conn)

    The ADBC Driver currently supports Postgres and SQLite. I would recommend that everyone who uses Postgres switch over to this driver: it is significantly faster and completely avoids round-tripping through Python objects, thus preserving the database types more reliably. This is the feature that I am personally most excited about.

    Adding case_when to the pandas API

    Coming from SQL to pandas, users often miss the case-when syntax, which provides an easy and clean way to create new columns conditionally. pandas 2.2 adds a new case_when method that is defined on a Series. It operates similarly to its SQL counterpart.

    Let’s look at an example:

    import pandas as pd

    df = pd.DataFrame(dict(a=[1, 2, 3], b=[4, 5, 6]))

    default = pd.Series('default', index=df.index)
    default.case_when(
        caselist=[
            (df.a == 1, 'first'),
            (df.a.gt(1) & df.b.eq(5), 'second'),
        ],
    )

    The method takes a list of (condition, replacement) pairs that are evaluated sequentially. In each row, the first condition that evaluates to True determines the new value, and rows where no condition matches keep the value from the Series the method was called on. The method should make it significantly easier for us to create conditional columns.

    Copy-on-Write

    Copy-on-Write was initially introduced in pandas 1.5.0. The mode will become the default behavior with 3.0, which is hopefully the next pandas release. This means that we have to get our code into a state where it is compliant with the Copy-on-Write rules. pandas 2.2 introduced deprecation warnings for operations that will change behavior.

    df = pd.DataFrame({"x": [1, 2, 3]})

    df["x"][df["x"] > 1] = 100

    This will now raise a FutureWarning.

    FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
    You are setting values through chained assignment. Currently this works in certain cases, but when
    using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to
    update the original DataFrame or Series, because the intermediate object on which we are setting
    values will behave as a copy. A typical example is when you are setting values in a column of a
    DataFrame, like:

    df["col"][row_indexer] = value

    Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and
    ensure this keeps updating the original `df`.
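
    Applied to the example above, the single-step assignment that the warning recommends looks roughly like this. It is a sketch of the suggested pattern, not a full migration recipe.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})

# One .loc call selects rows and column together, so the original
# DataFrame is updated and no intermediate object is involved.
df.loc[df["x"] > 1, "x"] = 100

print(df["x"].tolist())
```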

    I wrote an earlier post that goes into more detail about how you can migrate your code and what to expect. There is an additional warning mode for Copy-on-Write that will raise warnings for all cases that change behavior:

    pd.options.mode.copy_on_write = "warn"

    Most of those warnings are only noise for the majority of pandas users, which is the reason why they are hidden behind an option.

    pd.options.mode.copy_on_write = "warn"

    df = pd.DataFrame({"a": [1, 2, 3]})
    view = df["a"]
    view.iloc[0] = 100

    This will raise a lengthy warning explaining what will change:

    FutureWarning: Setting a value on a view: behaviour will change in pandas 3.0.
    You are mutating a Series or DataFrame object, and currently this mutation will
    also have effect on other Series or DataFrame objects that share data with this
    object. In pandas 3.0 (with Copy-on-Write), updating one Series or DataFrame object
    will never modify another.

    The short summary of this is: Updating view will never update df, no matter what operation is used. This is most likely not relevant for most users.
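
    The new behavior can be previewed by turning Copy-on-Write on explicitly. The following is a minimal sketch, assuming a pandas 2.x installation where the `mode.copy_on_write` option is still configurable; `option_context` is used here only so that the setting does not leak into other code.

```python
import pandas as pd

# Opt in to the behaviour that becomes the default in pandas 3.0,
# limited to this block.
with pd.option_context("mode.copy_on_write", True):
    df = pd.DataFrame({"a": [1, 2, 3]})
    view = df["a"]

    # Under Copy-on-Write this write triggers a copy of the shared data,
    # so only `view` changes.
    view.iloc[0] = 100

    print(view.tolist())
    print(df["a"].tolist())
```

    If the code relied on df changing as well, the update has to be written against df directly, for example with df.loc.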

    I would recommend enabling the mode and checking the warnings briefly, but not paying too much attention to them if you are comfortable that you are not relying on updating two different objects at once.

    I would recommend checking out the migration guide for Copy-on-Write that explains the necessary changes in more detail.

    Upgrading to the new version

    You can install the new pandas version with:

    pip install -U pandas

    Or:

    mamba install -c conda-forge pandas=2.2

    This will give you the new release in your environment.
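
    A quick way to confirm that the upgrade took effect is to inspect the installed version from Python. This is a small sketch; the 2.2 threshold simply mirrors the release discussed here, and the variable names are ours.

```python
from importlib.metadata import version

import pandas as pd

installed = version("pandas")

# Compare only the major and minor components, e.g. "2.2.0" -> (2, 2).
major, minor = (int(part) for part in installed.split(".")[:2])
is_new_enough = (major, minor) >= (2, 2)

# case_when was added in 2.2, so its presence doubles as a feature check.
has_case_when = hasattr(pd.Series, "case_when")

print(installed, is_new_enough, has_case_when)
```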

    Conclusion

    We’ve looked at a couple of changes that improve performance and user experience for certain aspects of pandas. The most exciting new features will come in pandas 3.0, where Copy-on-Write will be enabled by default.

    Thank you for reading. Feel free to reach out to share your thoughts and feedback.


    What’s new in pandas 2.2 was originally published in Towards Data Science on Medium.

  • Insta mules and crypto mixers: How tech is transforming money laundering

    Thomas Macaulay


    On the shallow surface of Instagram, Ramon “Hushpuppi” Abbas was a quintessential influencer. The flamboyant Nigerian portrayed a lavish lifestyle of private jets, luxury cars, and designer clothes. His glamorous adventures had earned him over 2 million followers and millions of dollars. But his posts concealed a darker reality. The 41-year-old was “one of the most prolific money launderers in the world,” according to the FBI. By his own admission, Hushpuppi conspired to launder over $300mn in just 18 months. One of his alleged clients was a certain Kim Jong Un — the supreme leader of North Korea. With his…

    This story continues at The Next Web

  • OpenAI and CommonSense Media team up to curate family-friendly GPTs

    Mariella Moon

    You will soon find a kid-friendly section inside OpenAI’s newly opened store for custom GPTs. The company has joined forces with Common Sense Media, a nonprofit organization that rates media and technology based on their suitability for children, to minimize the risks of AI use by teenagers. Together, they intend to create AI guidelines and educational materials for young people, their parents and their educators. The two organizations will also curate a collection of family-friendly GPTs in OpenAI’s GPT store based on Common Sense’s ratings, making it easy to see which ones are suitable for younger users. 

    “Together, Common Sense and OpenAI will work to make sure that AI has a positive impact on all teens and families,” James P. Steyer, founder and CEO of Common Sense Media, said in a statement. “Our guides and curation will be designed to educate families and educators about safe, responsible use of ChatGPT, so that we can collectively avoid any unintended consequences of this emerging technology.”

    According to Axios, the partnership was announced at Common Sense’s kids and family summit in San Francisco, where OpenAI CEO Sam Altman shot down the idea that AI is bad for kids and should be kept out of schools. “Humans are tool users and we better teach people to use the tools that are going to be out in the world,” he reportedly said. “To not teach people to use those would be a mistake.” The CEO also said that future high school seniors would be able to operate at a higher level of abstraction and could achieve more than their predecessors with the help of artificial intelligence.

    This article originally appeared on Engadget at https://www.engadget.com/openai-and-commonsense-media-team-up-to-curate-family-friendly-gpts-074228457.html?src=rss

  • Bitcoin tops $43k once again: Is it time to buy more Memeinator tokens?

    Hassan Maishera

    Key takeaways: Bitcoin is up by 3% in the last 24 hours and is now trading above $43k once again. Memeinator is set to conclude the 13th stage of its presale and has raised nearly $4 million so far.

    Bitcoin now trading above $43k: The cryptocurrency market has been bearish over the last two weeks. […]

    The post Bitcoin tops $43k once again: Is it time to buy more Memeinator tokens? appeared first on CoinJournal.

  • Polygon Labs proposes defi protocols as critical infrastructure in new regulatory framework

    Rony Roy

    Polygon Labs, in collaboration with Arktouros law firm, has put forward a new regulatory framework that proposes designating certain decentralized finance (defi) protocols as critical infrastructure crucial to the national and economic security of the US. The suggestion was published…

  • Maker generates $14m in revenue, Ethereum’s earnings surge

    Wahid Pessarlay

    Maker Protocol tops the list in terms of revenue generated over the past month as Total Value Locked in the protocol swells. According to data provided by Defi Llama, Maker Protocol generated $14.22 million in revenue, also collected…
