Designing asynchronous pipelines for efficient data processing
Note. This article assumes that you are familiar with callbacks and promises and have a basic understanding of the asynchronous paradigm in JavaScript.
Introduction
The asynchronous mechanism is one of the most important concepts in JavaScript and programming in general. It allows a program to separately execute secondary tasks in the background without blocking the current thread from executing primary tasks. When a secondary task is completed, its result is returned and the program continues to run normally. In this context, such secondary tasks are called asynchronous.
Asynchronous tasks typically include making requests to external environments like databases, web APIs or operating systems. If the result of an asynchronous operation does not affect the logic of the main program, then instead of idly waiting for the task to complete, it is much better to use this time to continue executing primary tasks.
Nevertheless, sometimes the result of an asynchronous operation is used immediately in the next code lines. In such cases, the succeeding code lines should not be executed until the asynchronous operation is completed.
Note. Before getting to the main part of this article, I would like to provide the motivation for why asynchronicity is considered an important topic in Data Science and why I used JavaScript instead of Python to explain the async / await syntax.
# 01. Why care about asynchronicity in Data Science?
Data engineering, which mainly consists of designing robust and efficient data pipelines, is an inseparable part of Data Science. A typical data engineering task is making regular calls to APIs, databases, or other sources to retrieve data, process it, and store it somewhere.
Imagine a data source that encounters network issues and cannot return the requested data immediately. If we simply make the request to that service in code, we will have to wait for the response while doing nothing. Would it not be better to avoid wasting precious processor time and execute another function in the meantime? This is where the power of asynchronicity comes into play, which will be the central topic of this article!
# 02. Why JavaScript?
Python is undeniably the most popular choice today for creating Data Science applications. Nevertheless, JavaScript is another language with a huge ecosystem that serves various development purposes, including building web applications that process data retrieved from other services. As it turns out, asynchronicity plays one of the most fundamental roles in JavaScript.
Furthermore, compared to Python, JavaScript has richer built-in support for dealing with asynchronicity and usually serves as a better example to dive deeper into this topic.
Finally, Python has a similar async / await construction. Therefore, the information presented in this article about JavaScript is also transferable to Python for designing efficient data pipelines.
Asynchronous code in JavaScript
In the first versions of JavaScript, asynchronous code was mainly written with callbacks. Unfortunately, this led developers to a well-known problem named “callback hell”: asynchronous code written with raw callbacks often produced several levels of nested scopes that were extremely difficult to read. That is why promises were introduced; they became a native part of the language with ECMAScript 2015 (ES6).
// Example of the "callback hell" problem
functionOne(function () {
  functionTwo(function () {
    functionThree(function () {
      functionFour(function () {
        ...
      });
    });
  });
});
Promises provide a convenient interface for asynchronous code development. The Promise constructor takes an executor function, which runs immediately and typically starts an asynchronous operation. Until that operation completes, the promise is said to be in a pending state. Depending on whether the operation has completed successfully or not, the promise changes its state to either fulfilled or rejected, respectively. For the last two states, programmers can chain .then() and .catch() methods onto a promise to declare how the result of the asynchronous operation should be handled in each scenario.
Apart from that, a group of promises can be combined by using static methods like Promise.any(), Promise.all(), Promise.race(), etc.
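As a minimal sketch of the ideas above, the snippet below creates promises, handles both outcomes with .then() and .catch(), and combines several promises with Promise.all() and Promise.race(). The helpers delay and failAfter are hypothetical names introduced only for this illustration.

```javascript
// Hypothetical helper: fulfills with `value` after `ms` milliseconds.
function delay(ms, value) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(value), ms);
  });
}

// Hypothetical helper: rejects with an Error after `ms` milliseconds.
function failAfter(ms, message) {
  return new Promise((resolve, reject) => {
    setTimeout(() => reject(new Error(message)), ms);
  });
}

// A fulfilled promise is handled in .then(), a rejected one in .catch().
delay(20, 'done').then((value) => console.log(`Fulfilled with: ${value}`));
failAfter(20, 'boom').catch((error) => console.log(`Rejected with: ${error.message}`));

// Promise.all() waits for every promise and keeps the order of the input array.
const allResults = Promise.all([delay(30, 'a'), delay(10, 'b')]);

// Promise.race() settles as soon as the first of the promises settles.
const fastest = Promise.race([delay(30, 'slow'), delay(10, 'fast')]);

allResults.then((values) => console.log(values)); // values are 'a' and 'b', in that order
fastest.then((value) => console.log(value)); // 'fast'
```

Note that Promise.all() rejects as soon as any of its input promises rejects, so it is usually paired with a .catch() handler in real code.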
Shortcomings of promises
Although promises were a significant improvement over callbacks, they are still not ideal, for several reasons:
- Verbosity. Promises usually require writing a lot of boilerplate code. In some cases, creating a promise with simple functionality requires a few extra lines of code because of its verbose syntax.
- Readability. Having several tasks that depend on each other can lead to nesting promises one inside another. This infamous problem is very similar to “callback hell” and makes code difficult to read and maintain. Furthermore, when dealing with error handling, it is usually hard to follow the code logic when an error is propagated through several promise chains.
- Debugging. When checking the stack trace output, it might be challenging to identify the source of an error inside promises, as they do not usually provide clear error descriptions.
- Integration with legacy libraries. Many legacy JavaScript libraries were developed to work with raw callbacks, which makes them not easily compatible with promises. If code is written with promises, then additional adapter code has to be created to provide compatibility with old libraries.
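To illustrate the last point, here is a rough sketch of such an adapter. It assumes the legacy function follows the error-first callback convention common in Node.js; legacyReadValue and promisify are hypothetical names written for this example, not part of any particular library.

```javascript
// Hypothetical legacy function using an error-first callback.
function legacyReadValue(key, callback) {
  setTimeout(() => {
    if (key === 'missing') {
      callback(new Error(`No value for key: ${key}`));
    } else {
      callback(null, `value of ${key}`);
    }
  }, 10);
}

// Generic adapter: wraps an error-first callback function
// into a function that returns a promise instead.
function promisify(fn) {
  return (...args) =>
    new Promise((resolve, reject) => {
      fn(...args, (error, result) => {
        if (error) {
          reject(error);
        } else {
          resolve(result);
        }
      });
    });
}

const readValue = promisify(legacyReadValue);

readValue('user').then((value) => console.log(value)); // 'value of user'
readValue('missing').catch((error) => console.log(error.message));
```

In Node.js, the built-in util.promisify function serves the same purpose, so a hand-written adapter like this is mostly needed in other environments or for callbacks with a non-standard signature.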
Async / await
For the most part, the async / await construction was added to JavaScript as syntactic sugar over promises. As the name suggests, it introduces two new keywords:
- async is used before the function signature and marks the function as asynchronous. Such a function always returns a promise (even if a promise is not returned explicitly, the returned value is wrapped in one implicitly).
- await is used inside functions marked as async and is placed before asynchronous operations that return a promise. If a line of code contains the await keyword, then the following code lines inside the async function will not be executed until the returned promise is settled (either in the fulfilled or rejected state). This guarantees that lines depending on the result of the asynchronous operation will not run before that result is available.
– The await keyword can be used several times inside an async function.
– If await is used inside a regular function that is not marked as async, a SyntaxError will be thrown (the top level of an ES module is an exception, as await is also allowed there).
– The value of an await expression is the fulfilled value of the awaited promise.
The async / await usage example is demonstrated in the snippet below.
// Async / await example.
// The code snippet prints start and end words to the console.
function getPromise() {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      resolve('end');
    }, 1000);
  });
}

// since this function is marked as async, it will return a promise
async function printInformation() {
  console.log('start');
  const result = await getPromise();
  console.log(result); // this line will not be executed until the promise is resolved
}

// call the function to actually run the example
printInformation();
It is important to understand that await does not block the main JavaScript thread from execution. Instead, it only suspends the enclosing async function (while other program code outside the async function can be run).
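The following sketch makes this behaviour visible by recording the order in which lines run. The names wait, slowTask and log are hypothetical and exist only for this demonstration.

```javascript
const log = [];

// Hypothetical helper: a promise that fulfills after `ms` milliseconds.
function wait(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function slowTask() {
  log.push('task: start');
  await wait(50); // only slowTask is suspended here, not the whole program
  log.push('task: end');
}

// The async function runs synchronously up to its first await,
// then immediately returns a pending promise to the caller.
const finished = slowTask();
log.push('main: continues');

finished.then(() => console.log(log.join(' -> ')));
// task: start -> main: continues -> task: end
```

The 'main: continues' entry is recorded before 'task: end', which shows that code outside the async function keeps running while slowTask is suspended.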
Error handling
The async / await construction provides a standard way of handling errors with the try / catch keywords. To handle errors, wrap all the code that can potentially cause an error (including await expressions) in the try block and write the corresponding handling logic in the catch block.
In practice, error handling with try / catch blocks is easier and more readable than achieving the same in promises with .catch() rejection chaining.
// Error handling template inside an async function
async function functionOne() {
  try {
    ...
    const result = await functionTwo();
  } catch (error) {
    ...
  }
}
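A runnable variant of this template might look as follows. It is only a sketch: loadRecord is a hypothetical data source that rejects for negative ids, and returning null as a fallback is one possible design choice among many.

```javascript
// Hypothetical asynchronous data source that fails for negative ids.
function loadRecord(id) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      if (id < 0) {
        reject(new Error(`Invalid id: ${id}`));
      } else {
        resolve({ id, name: `record-${id}` });
      }
    }, 10);
  });
}

async function safeLoad(id) {
  try {
    // If the promise is rejected, await throws the rejection reason here.
    const record = await loadRecord(id);
    return record.name;
  } catch (error) {
    console.log(`Could not load the record: ${error.message}`);
    return null; // fallback value chosen for this example
  }
}

safeLoad(1).then((name) => console.log(name)); // 'record-1'
safeLoad(-1).then((name) => console.log(name)); // null
```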
Promises vs async / await
async / await is a great alternative to writing raw promise chains. It eliminates the aforementioned shortcomings of promises: code written with async / await is usually more readable and maintainable, and it is the preferred choice for most software engineers.
However, it would be incorrect to deny the importance of promises in JavaScript: in some situations, they are a better option, especially when working with functions returning a promise by default.
Code interchangeability
Let us look at the same code written with async / await and promises. We will assume that our program connects to a database and in case of an established connection it requests data about users to further display them in the UI.
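// Example of asynchronous requests handled by async / await
async function displayUsers() {
  try {
    ...
    const response = await connectToDatabase();
    ...
    const users = await getData(data);
    showUsers(users);
    ...
  } catch (error) {
    console.log(`An error occurred: ${error.message}`);
    ...
  }
}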
Both asynchronous requests can be easily wrapped by using the await syntax. At each of these two steps, the function will suspend its execution until the response is retrieved.
Since something can go wrong during asynchronous requests (broken connection, data inconsistency, etc.), we should wrap the whole code fragment into a try / catch block. If an error is caught, we print it to the console.
Now let us write the same code fragment with promises:
// Example of asynchronous requests handled by promises
function displayUsers() {
  ...
  connectToDatabase()
    .then((response) => {
      ...
      return getData(data);
    })
    .then((users) => {
      showUsers(users);
      ...
    })
    .catch((error) => {
      console.log(`An error occurred: ${error.message}`);
      ...
    });
}
This chained code looks more verbose and is harder to read. In addition, we can notice that every await statement was transformed into a corresponding .then() method and that the logic of the catch block is now located inside the .catch() method of the promise.
Following the same logic, any async / await code can be rewritten with promises. This demonstrates the fact that async / await is just syntactic sugar over promises.
Code written with async / await can be transformed into the promise syntax where each await declaration would correspond to a separate .then() method and exception handling would be performed in the .catch() method.
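Because the snippets above are templates with elided parts, here is a small self-contained sketch of the same correspondence. The stubs connectToDatabase and getData are hypothetical stand-ins that resolve with made-up values, not real database calls.

```javascript
// Hypothetical stubs standing in for real database calls.
const connectToDatabase = () => Promise.resolve({ connected: true });
const getData = (connection) =>
  connection.connected
    ? Promise.resolve(['Ann', 'Bob'])
    : Promise.reject(new Error('No connection'));

// async / await version
async function getUsersAsync() {
  try {
    const connection = await connectToDatabase();
    const users = await getData(connection);
    return users;
  } catch (error) {
    console.log(`An error occurred: ${error.message}`);
    return [];
  }
}

// Promise version: each await becomes a .then() step
// and the catch block becomes a .catch() handler.
function getUsersPromise() {
  return connectToDatabase()
    .then((connection) => getData(connection))
    .catch((error) => {
      console.log(`An error occurred: ${error.message}`);
      return [];
    });
}

getUsersAsync().then((users) => console.log(users.join(', '))); // Ann, Bob
getUsersPromise().then((users) => console.log(users.join(', '))); // Ann, Bob
```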
Fetch example
In this section, we will have a look at a real example of how async / await works.
We are going to use the REST countries API which provides demographic information for a requested country in the JSON format by the following URL address: https://restcountries.com/v3.1/name/$country.
Firstly, let us declare a function that will retrieve the main information from the JSON. We are interested in retrieving information regarding the country’s name, its capital, area and population. The JSON is returned in the form of an array where the first object contains all the necessary information. We can access the aforementioned properties by accessing the object’s keys with corresponding names.
// Extract the main fields from the API response (an array whose first item describes the country)
const retrieveInformation = function (data) {
  data = data[0];
  return {
    country: data["name"]["common"],
    capital: data["capital"][0],
    area: `${data["area"]} km`,
    population: `${data["population"]} people`
  };
};
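To see what the function returns without calling the API, it can be tried on a hand-written payload. The sketch below repeats the function so that it runs on its own; sampleResponse is a made-up object that only mimics the shape described above and does not contain real API data.

```javascript
const retrieveInformation = function (data) {
  const country = data[0];
  return {
    country: country["name"]["common"],
    capital: country["capital"][0],
    area: `${country["area"]} km`,
    population: `${country["population"]} people`
  };
};

// Made-up payload with the same shape as the API response (assumption for illustration).
const sampleResponse = [
  {
    name: { common: 'Exampleland' },
    capital: ['Sample City'],
    area: 1000,
    population: 5000
  }
];

const info = retrieveInformation(sampleResponse);
console.log(info.country); // Exampleland
console.log(info.capital); // Sample City
console.log(info.area); // 1000 km
console.log(info.population); // 5000 people
```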
Then we will use the fetch API to perform HTTP requests. Fetch is an asynchronous function which returns a promise. Since we immediately need the data returned by fetch, we must wait until the fetch finishes its job before executing the following code lines. To do that, we use the await keyword before fetch.
// Fetch example with async / await
const getCountryDescription = async function (country) {
  try {
    const response = await fetch(
      `https://restcountries.com/v3.1/name/${country}`
    );
    if (!response.ok) {
      throw new Error(`Bad HTTP status of the request (${response.status}).`);
    }
    const data = await response.json();
    console.log(retrieveInformation(data));
  } catch (error) {
    console.log(
      `An error occurred while processing the request.\nError message: ${error.message}`
    );
  }
};
Similarly, we place another await before the .json() method to parse the data which is used immediately after in the code. In case of a bad response status or inability to parse the data, an error is thrown which is then processed in the catch block.
For demonstration purposes, let us also rewrite the code snippet by using promises:
// Fetch example with promises
const getCountryDescription = function (country) {
  fetch(`https://restcountries.com/v3.1/name/${country}`)
    .then((response) => {
      if (!response.ok) {
        throw new Error(`Bad HTTP status of the request (${response.status}).`);
      }
      return response.json();
    })
    .then((data) => {
      console.log(retrieveInformation(data));
    })
    .catch((error) => {
      console.log(
        `An error occurred while processing the request. Error message: ${error.message}`
      );
    });
};
Calling either function with a provided country name will print its main information:
// The result of calling getCountryDescription("Argentina")
{
  country: 'Argentina',
  capital: 'Buenos Aires',
  area: '27804000 km',
  population: '45376763 people'
}
Conclusion
In this article, we have covered the async / await construction in JavaScript, which appeared in the language in 2017. Having appeared as an improvement over promises, it allows writing asynchronous code in a synchronous manner, eliminating nested code fragments. Its correct usage combined with promises results in a powerful blend, making the code as clean as possible.
Lastly, the information presented in this article about JavaScript is also valuable for Python, which has a similar async / await construction. Personally, I would recommend that anyone who wants to dive deeper into asynchronicity focus more on JavaScript than on Python. Being aware of the abundant tools that exist in JavaScript for developing asynchronous applications makes it easier to understand the same concepts in other programming languages.
Resources
All images unless otherwise noted are by the author.
Intuitive Explanation of Async / Await in JavaScript was originally published in Towards Data Science on Medium.