Tag: technews

Building a Random Forest by Hand in Python

Matt Sosna

A deep dive on a powerful and popular algorithm

Continue reading on Towards Data Science »

January 30, 2024
What’s new in pandas 2.2
Patrick Hoefler
What’s New in Pandas 2.2

The most interesting things about the new release

pandas 2.2 was released on January 22nd, 2024. Let’s take a look at what this release introduces and how it helps us improve our pandas workloads. It includes a number of changes that make for a better user experience.

pandas 2.2 brings a few more improvements that build on the Apache Arrow ecosystem. We also added deprecations for changes that are necessary to make Copy-on-Write the default in pandas 3.0. Let’s dig into what this means for you by looking at the most important changes in detail.

I am part of the pandas core team. I am an open source engineer for Coiled where I work on Dask, including improving the pandas integration.

Improved PyArrow support

We introduced PyArrow-backed DataFrames in pandas 2.0 and have continued to improve the integration since then so that it fits seamlessly into the pandas API. pandas has accessors for certain dtypes that enable specialized operations, such as the string accessor, which provides many string methods. Historically, lists and structs were represented as NumPy object dtype, which made working with them quite cumbersome. The Arrow dtype backend now enables tailored accessors for lists and structs, which makes working with these objects a lot easier.

Let’s look at an example:
```
import pandas as pd
import pyarrow as pa

series = pd.Series(
    [
        {"project": "pandas", "version": "2.2.0"},
        {"project": "numpy", "version": "1.25.2"},
        {"project": "pyarrow", "version": "13.0.0"},
    ],
    dtype=pd.ArrowDtype(
        pa.struct([
            ("project", pa.string()),
            ("version", pa.string()),
        ])
    ),
)
```
This is a series that contains a dictionary in every row. Previously, this was only possible with NumPy object dtype and accessing elements from these rows required iterating over them. The struct accessor now enables direct access to certain attributes:
```
series.struct.field("project")

0     pandas
1      numpy
2    pyarrow
Name: project, dtype: string[pyarrow]
```
The next release will bring a CategoricalAccessor based on Arrow types.

Integrating the Apache ADBC Driver

Historically, pandas relied on SQLAlchemy to read data from a SQL database. This worked very reliably, but it was very slow. SQLAlchemy reads the data row-wise, while pandas has a columnar layout, which makes reading and moving the data into a DataFrame slower than necessary.

The ADBC Driver from the Apache Arrow project enables users to read data in a columnar layout, which brings huge performance improvements. It reads the data and stores it in an Arrow table, which is then converted to a pandas DataFrame. You can make this conversion zero-copy if you set dtype_backend="pyarrow" for read_sql.

Let’s look at an example:
```
import adbc_driver_postgresql.dbapi as pg_dbapi
import pandas as pd

df = pd.DataFrame(
    [
        [1, 2, 3],
        [4, 5, 6],
    ],
    columns=['a', 'b', 'c']
)
uri = "postgresql://postgres:postgres@localhost/postgres"

# write the DataFrame to Postgres through the ADBC connection
with pg_dbapi.connect(uri) as conn:
    df.to_sql("pandas_table", conn, index=False)

# for round-tripping
with pg_dbapi.connect(uri) as conn:
    df2 = pd.read_sql("pandas_table", conn)
```
The ADBC Driver currently supports Postgres and SQLite. I would recommend that everyone who uses Postgres switch over to this driver: it is significantly faster and completely avoids round-tripping through Python objects, thus preserving the database types more reliably. This is the feature that I am personally most excited about.

Adding case_when to the pandas API

Coming from SQL to pandas, users often miss the case-when syntax that provides an easy and clean way to create new columns conditionally. pandas 2.2 adds a new case_when method that is defined on a Series. It operates similarly to its SQL counterpart.

Let’s look at an example:
```
import pandas as pd

df = pd.DataFrame(dict(a=[1, 2, 3], b=[4, 5, 6]))

# Series holding the fallback value for rows that match no condition
default = pd.Series('default', index=df.index)
default.case_when(
    caselist=[
        (df.a == 1, 'first'),
        (df.a.gt(1) & df.b.eq(5), 'second'),
    ],
)
```
The method takes a list of (condition, replacement) pairs that are evaluated sequentially. For each row, the replacement belonging to the first condition that evaluates to True is used; rows where no condition matches keep the value from the original Series. The method should make it significantly easier for us to create conditional columns.

Copy-on-Write

Copy-on-Write was initially introduced in pandas 1.5.0. The mode will become the default behavior with 3.0, which is hopefully the next pandas release. This means that we have to get our code into a state where it is compliant with the Copy-on-Write rules. pandas 2.2 introduced deprecation warnings for operations that will change behavior.
```
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})

df["x"][df["x"] > 1] = 100
```
This will now raise a FutureWarning.
```
FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when 
using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to 
update the original DataFrame or Series, because the intermediate object on which we are setting 
values will behave as a copy. A typical example is when you are setting values in a column of a 
DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and 
ensure this keeps updating the original `df`.
```
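
The replacement suggested by the warning can be sketched as follows, using the same small DataFrame as in the chained-assignment example. This is a minimal illustration rather than a full migration recipe.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})

# Select rows and column in a single .loc call so the assignment
# acts on df itself rather than on an intermediate object.
df.loc[df["x"] > 1, "x"] = 100

print(df["x"].tolist())
```

Because the row mask and the column label go into one indexing step, the result does not depend on whether Copy-on-Write is enabled.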
I wrote an earlier post that goes into more detail about how you can migrate your code and what to expect. There is an additional warning mode for Copy-on-Write that will raise warnings for all cases that change behavior:
```
pd.options.mode.copy_on_write = "warn"
```
Most of those warnings are only noise for the majority of pandas users, which is the reason why they are hidden behind an option.
```
pd.options.mode.copy_on_write = "warn"

df = pd.DataFrame({"a": [1, 2, 3]})
view = df["a"]
view.iloc[0] = 100
```
This will raise a lengthy warning explaining what will change:
```
FutureWarning: Setting a value on a view: behaviour will change in pandas 3.0.
You are mutating a Series or DataFrame object, and currently this mutation will
also have effect on other Series or DataFrame objects that share data with this
object. In pandas 3.0 (with Copy-on-Write), updating one Series or DataFrame object
will never modify another.
```
The short summary of this is: Updating view will never update df, no matter what operation is used. This is most likely not relevant for most users.

I would recommend enabling the mode and checking the warnings briefly, but not paying too much attention to them if you are comfortable that you are not relying on updating two different objects at once.

I would recommend checking out the migration guide for Copy-on-Write that explains the necessary changes in more detail.
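
To see the future behavior directly rather than through warnings, Copy-on-Write can also be switched on explicitly. The sketch below assumes a pandas version that still exposes the `mode.copy_on_write` option (1.5 and later) and scopes the setting with `pd.option_context` so that it does not leak into other code.

```python
import pandas as pd

# Enable Copy-on-Write only inside this block.
with pd.option_context("mode.copy_on_write", True):
    df = pd.DataFrame({"a": [1, 2, 3]})
    view = df["a"]
    view.iloc[0] = 100  # triggers a copy of the data behind `view`

    original = df["a"].tolist()
    updated = view.tolist()

print(original)
print(updated)
```

With the option enabled, only `view` reflects the new value and `df` keeps its original contents.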

Upgrading to the new version

You can install the new pandas version with:
```
pip install -U pandas
```
Or:
```
mamba install -c conda-forge pandas=2.2
```
This will give you the new release in your environment.

Conclusion

We’ve looked at a couple of changes that improve performance and user experience for certain aspects of pandas. The most exciting new features will come in pandas 3.0, where Copy-on-Write will be enabled by default.

Thank you for reading. Feel free to reach out to share your thoughts and feedback.

What’s new in pandas 2.2 was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.

January 30, 2024
Predicting Population Decline with Python

Lee Vaughan

Why Thanos should’ve snapped twice

Continue reading on Towards Data Science »

January 30, 2024
Insta mules and crypto mixers: How tech is transforming money laundering

Thomas Macaulay

On the shallow surface of Instagram, Ramon “Hushpuppi” Abbas was a quintessential influencer. The flamboyant Nigerian portrayed a lavish lifestyle of private jets, luxury cars, and designer clothes. His glamorous adventures had earned him over 2 million followers and millions of dollars. But his posts concealed a darker reality. The 41-year-old was “one of the most prolific money launderers in the world,” according to the FBI. By his own admission, Hushpuppi conspired to launder over $300mn in just 18 months. One of his alleged clients was a certain Kim Jong Un — the supreme leader of North Korea. With his…

This story continues at The Next Web

January 30, 2024
Prime Video ads are here, but is that enough to cancel your membership?

Going ad-free will cost you an extra $2.99 on top of your $14.99 monthly Prime membership. Are you subscription fatigued yet?

January 30, 2024
OpenAI and CommonSense Media team up to curate family-friendly GPTs

Mariella Moon

You will soon find a kid-friendly section inside OpenAI’s newly opened store for custom GPTs. The company has joined forces with Common Sense Media, a nonprofit organization that rates media and technology based on their suitability for children, to minimize the risks of AI use by teenagers. Together, they intend to create AI guidelines and educational materials for young people, their parents and their educators. The two organizations will also curate a collection of family-friendly GPTs in OpenAI’s GPT store based on Common Sense’s ratings, making it easy to see which ones are suitable for younger users.

“Together, Common Sense and OpenAI will work to make sure that AI has a positive impact on all teens and families,” James P. Steyer, founder and CEO of Common Sense Media, said in a statement. “Our guides and curation will be designed to educate families and educators about safe, responsible use of ChatGPT, so that we can collectively avoid any unintended consequences of this emerging technology.”

According to Axios, the partnership was announced at Common Sense’s kids and family summit in San Francisco, where OpenAI CEO Sam Altman shot down the idea that AI is bad for kids and should be kept out of schools. “Humans are tool users and we better teach people to use the tools that are going to be out in the world,” he reportedly said. “To not teach people to use those would be a mistake.” The CEO also said that future high school seniors would be able to operate at a higher level of abstraction and could achieve more than their predecessors with the help of artificial intelligence.

This article originally appeared on Engadget at https://www.engadget.com/openai-and-commonsense-media-team-up-to-curate-family-friendly-gpts-074228457.html?src=rss

January 30, 2024
Zoom unveils immersive app for Apple’s Vision Pro headset

Trevor Mogg

Zoom has just unveiled its videoconferencing app designed especially for Apple’s Vision Pro mixed-reality headset, which launches this Friday.

January 30, 2024
Forget Samsung’s Space Zoom – this OM System super telephoto zoom lens can shoot up to a staggering 2400mm

You can shoot handheld up to a staggering 2400mm with OM System’s new super telephoto zoom.

January 30, 2024
OM System OM-1 II is a refresh of one of the world’s best wildlife photography cameras

Improved AF and buffer performance plus neat new computational tricks in OM System’s second-gen flagship model.

January 30, 2024
4 Examples to Take Your PySpark Skills to Next Level

Soner Yıldırım

Get used to large-scale data processing with PySpark

Continue reading on Towards Data Science »

January 30, 2024
