Blog

  • Dogecoin’s market dominance hits 1% – Thank you whales, but what’s next?

    Michael Nderitu

    Dogecoin bulls showed up in strength as the memecoin demonstrated its dominant position
    Whales have been accumulating, but exchange flows suggest that profit-taking is happening



  • Satoshi’s identity to be unveiled soon? What this means for Bitcoin

    Jacob Thomas

    A PR firm claimed it would unveil Satoshi Nakamoto’s identity on the 31st of October.
    Experts believe that uncovering Nakamoto could have wide-reaching effects on the crypto market.



  • XRP’s price continues to lag behind as funding rates change their tune

    Adewale Olarinde

    XRP declined over the last 24 hours and on the weekly charts too
    Market sentiment has been lower compared to other major assets



  • Minimum Viable MLE

    Lenix Carter

    Building a minimal production-ready sentiment analysis model


    What is a production-ready model?

    We hear a lot about productionized machine learning, but what does it really mean to have a model that can thrive in real-world applications? There are plenty of things that contribute to the efficacy of a machine learning model in production. For the sake of this article, we will focus on five of them:

    • Reproducibility
    • Monitoring
    • Testing
    • Automation
    • Version Control

    Serving Inferences

    The most important part of building a production-ready machine learning model is being able to access it.

    For this purpose, we build a FastAPI service that serves sentiment analysis responses. We use Pydantic to enforce the structure of the input and output. The model we use is the default sentiment analysis pipeline from Hugging Face’s transformers library, which lets us begin testing with a pre-trained model.

    # Filename: main.py
    from fastapi import FastAPI
    from pydantic import BaseModel
    from transformers import pipeline

    app = FastAPI()
    classifier = pipeline("sentiment-analysis")

    class TextInput(BaseModel):
        text: str

    class SentimentOutput(BaseModel):
        text: str
        sentiment: str
        score: float

    @app.post("/predict", response_model=SentimentOutput)
    async def predict_sentiment(input_data: TextInput):
        result = classifier(input_data.text)[0]
        return SentimentOutput(
            text=input_data.text,
            sentiment=result["label"],
            score=result["score"]
        )

    To ensure that our work is reproducible, we can use a requirements.txt file and pip.

    # Filename: requirements.txt
    # Note: This has all required packages for the final result.

    fastapi==0.68.1
    uvicorn==0.15.0
    transformers==4.30.0
    torch==2.0.0
    pydantic==1.10.0
    numpy==1.24.3
    sentencepiece==0.1.99
    protobuf==3.20.3
    prometheus-client==0.17.1

    To install the dependencies, create and activate a virtual environment, then run pip install -r requirements.txt.
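
    For example, on macOS or Linux, a minimal setup might look like this (the environment name venv is arbitrary):

    # Create and activate a virtual environment, then install dependencies
    python -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt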

    To host this API, simply run: uvicorn main:app --reload.

    Now you have an API that you can query using:

    curl -X POST "http://localhost:8000/predict" \
         -H "Content-Type: application/json" \
         -d '{"text": "I love using FastAPI!"}'

    or any API tool you wish (e.g., Postman). You should get a result back that includes the text query, the predicted sentiment, and the confidence of the prediction.
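
    With the default pipeline, the response should have roughly this shape (the labels come from the default pre-trained model; the exact score will vary):

    {
      "text": "I love using FastAPI!",
      "sentiment": "POSITIVE",
      "score": 0.9998
    }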

    We will be using GitHub for CI/CD later, so I would recommend initializing and using git in this directory.

    We now have a locally hosted machine learning inference API.

    Further Improving Reproducibility

    To make our code execute more consistently, we will use Docker. Docker runs applications in lightweight, isolated containers, similar to virtual machines. This isolation ensures that applications execute consistently across any computer with Docker installed, regardless of the underlying system.

    Firstly, set up Docker for your given operating system.

    # Filename: Dockerfile

    # Use the official Python 3.9 slim image as the base
    FROM python:3.9-slim

    # Set the working directory inside the container to /app
    WORKDIR /app

    # Copy the requirements.txt file to the working directory
    COPY requirements.txt .

    # Install the Python dependencies listed in requirements.txt
    RUN pip install -r requirements.txt

    # Copy the main application file (main.py) to the working directory
    COPY main.py .

    # Define the command to run the FastAPI application with Uvicorn
    CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

    At this point, you should have the directory as below.

    your-project/
    ├── Dockerfile
    ├── requirements.txt
    └── main.py

    Now, you can build the image and run this API using:

    # Build the Docker image
    docker build -t sentiment-api .

    # Run the container
    docker run -p 8000:8000 sentiment-api

    You should now be able to query just as you did before.

    curl -X POST "http://localhost:8000/predict" \
         -H "Content-Type: application/json" \
         -d '{"text": "I love using FastAPI!"}'

    We now have a containerized, locally hosted machine learning inference API.

    Adding Basic Monitoring

    In machine learning applications, monitoring is crucial for understanding model performance and ensuring it meets expected accuracy and efficiency. Tools like Prometheus help track metrics such as prediction latency, request counts, and model output distributions, enabling you to identify issues like model drift or resource bottlenecks. This proactive approach ensures that your ML models remain effective over time and can adapt to changing data or usage patterns. In our case, we are focused on prediction time, requests, and gathering information about our queries.

    # Filename: main.py
    from fastapi import FastAPI
    from pydantic import BaseModel
    from transformers import pipeline
    from prometheus_client import Counter, Histogram, start_http_server
    import time

    # Start the Prometheus metrics server on port 8001
    start_http_server(8001)

    app = FastAPI()
    classifier = pipeline("sentiment-analysis")

    # Metrics
    PREDICTION_TIME = Histogram('prediction_duration_seconds', 'Time spent processing prediction')
    REQUESTS = Counter('prediction_requests_total', 'Total requests')
    SENTIMENT_SCORE = Histogram('sentiment_score', 'Histogram of sentiment scores', buckets=[0.0, 0.25, 0.5, 0.75, 1.0])

    class TextInput(BaseModel):
        text: str

    class SentimentOutput(BaseModel):
        text: str
        sentiment: str
        score: float

    @app.post("/predict", response_model=SentimentOutput)
    async def predict_sentiment(input_data: TextInput):
        REQUESTS.inc()
        start_time = time.time()

        result = classifier(input_data.text)[0]

        score = result["score"]
        SENTIMENT_SCORE.observe(score)  # Record the sentiment score

        PREDICTION_TIME.observe(time.time() - start_time)

        return SentimentOutput(
            text=input_data.text,
            sentiment=result["label"],
            score=score
        )
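
    With the app running, the raw Prometheus metrics are exposed on port 8001 and can be inspected directly. If you run the containerized version, remember to also publish that port (e.g. docker run -p 8000:8000 -p 8001:8001 sentiment-api):

    # Inspect the metrics collected by prometheus_client
    curl http://localhost:8001/metrics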

    Utilizing a Custom Model

    While the process of building and fine-tuning a model is not the intent of this project, it is important to understand how a model can be added to this process.

    # Filename: train.py

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    from datasets import load_dataset
    from torch.utils.data import DataLoader

    def train_model():
        # Load dataset
        full_dataset = load_dataset("stanfordnlp/imdb", split="train")
        dataset = full_dataset.shuffle(seed=42).select(range(10000))

        model_name = "distilbert-base-uncased"
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

        optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

        # Use GPU if available
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        model.to(device)

        model.train()

        # Create a DataLoader for batching
        dataloader = DataLoader(dataset, batch_size=8, shuffle=True)

        # Training loop
        num_epochs = 3  # Set the number of epochs
        for epoch in range(num_epochs):
            total_loss = 0
            for batch in dataloader:
                inputs = tokenizer(batch["text"], truncation=True, padding=True, return_tensors="pt", max_length=512).to(device)
                labels = torch.tensor(batch["label"]).to(device)

                optimizer.zero_grad()
                outputs = model(**inputs, labels=labels)
                loss = outputs.loss

                loss.backward()
                optimizer.step()
                total_loss += loss.item()

            avg_loss = total_loss / len(dataloader)
            print(f"Epoch {epoch + 1}/{num_epochs}, Loss: {avg_loss:.4f}")

        # Save the model
        model.save_pretrained("./model/")
        tokenizer.save_pretrained("./model/")

        # Test the model with sample sentences
        test_sentences = [
            "This movie was fantastic!",
            "I absolutely hated this film.",
            "It was just okay, not great.",
            "An absolute masterpiece!",
            "Waste of time!",
            "A beautiful story and well acted.",
            "Not my type of movie.",
            "It could have been better.",
            "A thrilling adventure from start to finish!",
            "Very disappointing."
        ]

        # Switch model to evaluation mode
        model.eval()

        # Prepare tokenizer for test inputs
        inputs = tokenizer(test_sentences, truncation=True, padding=True, return_tensors="pt", max_length=512).to(device)

        with torch.no_grad():
            outputs = model(**inputs)
            predictions = torch.argmax(outputs.logits, dim=1)

        # Print predictions
        for sentence, prediction in zip(test_sentences, predictions):
            sentiment = "positive" if prediction.item() == 1 else "negative"
            print(f'Input: "{sentence}" -> Predicted sentiment: {sentiment}')

    # Call the function to train the model and test it
    train_model()
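
    To produce the ./model directory used in the next step, run the script directly. Training 10,000 reviews for three epochs on a CPU will be slow; a GPU is recommended.

    python train.py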

    To make sure that we can query the new model we have trained, we have to update a few of our existing files. For instance, in main.py we now load the model from ./model as a pretrained model. Additionally, for comparison’s sake, we now have two endpoints to use, /predict/naive and /predict/trained.

    # Filename: main.py

    from fastapi import FastAPI
    from pydantic import BaseModel
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    from transformers import pipeline
    from prometheus_client import Counter, Histogram, start_http_server
    import time

    # Start the Prometheus metrics server on port 8001
    start_http_server(8001)

    app = FastAPI()

    # Load the trained model and tokenizer from the local directory
    model_path = "./model"  # Path to your saved model
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    trained_model = AutoModelForSequenceClassification.from_pretrained(model_path)

    # Create pipelines
    naive_classifier = pipeline("sentiment-analysis", device=-1)
    trained_classifier = pipeline("sentiment-analysis", model=trained_model, tokenizer=tokenizer, device=-1)

    # Metrics
    PREDICTION_TIME = Histogram('prediction_duration_seconds', 'Time spent processing prediction')
    REQUESTS = Counter('prediction_requests_total', 'Total requests')
    SENTIMENT_SCORE = Histogram('sentiment_score', 'Histogram of sentiment scores', buckets=[0.0, 0.25, 0.5, 0.75, 1.0])

    class TextInput(BaseModel):
        text: str

    class SentimentOutput(BaseModel):
        text: str
        sentiment: str
        score: float

    @app.post("/predict/naive", response_model=SentimentOutput)
    async def predict_naive_sentiment(input_data: TextInput):
        REQUESTS.inc()
        start_time = time.time()

        result = naive_classifier(input_data.text)[0]

        score = result["score"]
        SENTIMENT_SCORE.observe(score)  # Record the sentiment score

        PREDICTION_TIME.observe(time.time() - start_time)

        return SentimentOutput(
            text=input_data.text,
            sentiment=result["label"],
            score=score
        )

    @app.post("/predict/trained", response_model=SentimentOutput)
    async def predict_trained_sentiment(input_data: TextInput):
        REQUESTS.inc()
        start_time = time.time()

        result = trained_classifier(input_data.text)[0]

        score = result["score"]
        SENTIMENT_SCORE.observe(score)  # Record the sentiment score

        PREDICTION_TIME.observe(time.time() - start_time)

        return SentimentOutput(
            text=input_data.text,
            sentiment=result["label"],
            score=score
        )
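
    You can now query each endpoint separately and compare the two models on the same input:

    curl -X POST "http://localhost:8000/predict/trained" \
         -H "Content-Type: application/json" \
         -d '{"text": "I love using FastAPI!"}'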

    We also must update our Dockerfile to include our model files.

    # Filename: Dockerfile
    FROM python:3.9-slim

    WORKDIR /app

    COPY requirements.txt .
    RUN pip install -r requirements.txt

    COPY main.py .
    COPY ./model ./model

    CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

    Importantly, if you are using git, make sure that you add the pytorch_model.bin file to Git LFS so that you can push to GitHub. Git LFS lets you apply version control to very large files.
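
    A minimal sketch of that setup, assuming the weights were saved under ./model as in train.py:

    # Track the large model weights with Git LFS before committing
    git lfs install
    git lfs track "model/pytorch_model.bin"
    git add .gitattributes model/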

    Adding Testing and CI/CD

    CI/CD and testing streamline the deployment of machine learning models by ensuring that code changes are automatically integrated, tested, and deployed, which reduces the risk of errors and enhances model reliability. This process promotes continuous improvement and faster iteration cycles, allowing teams to deliver high-quality, production-ready models more efficiently. Firstly, we create two very basic tests to ensure that our model is performing acceptably.

    # Filename: test_model.py

    import pytest
    from fastapi.testclient import TestClient
    from main import app

    client = TestClient(app)

    def test_positive_sentiment():
        response = client.post(
            "/predict/trained",
            json={"text": "This is amazing!"}
        )
        assert response.status_code == 200
        data = response.json()
        assert data["sentiment"] == "LABEL_1"
        assert data["score"] > 0.5


    def test_negative_sentiment():
        response = client.post(
            "/predict/trained",
            json={"text": "This is terrible!"}
        )
        assert response.status_code == 200
        data = response.json()
        assert data["sentiment"] == "LABEL_0"
        # The pipeline's score is the confidence of the predicted label,
        # so it should also be high for a confident negative prediction
        assert data["score"] > 0.5

    To test your code, you can simply run pytest or python -m pytest from the project directory; FastAPI’s TestClient runs the app in-process, so the API does not need to be running separately.

    Next, we add a CI/CD (continuous integration and continuous delivery) workflow that runs these tests automatically whenever we push to GitHub.

    # Filename: .github/workflows/ci_cd.yml

    name: CI/CD

    on: [push]

    jobs:
      test:
        runs-on: ubuntu-latest
        steps:
          - name: Checkout code
            uses: actions/checkout@v2
            with:
              lfs: true

          - name: Set up Python
            uses: actions/setup-python@v2
            with:
              python-version: '3.9'

          - name: Install dependencies
            run: |
              pip install -r requirements.txt
              pip install pytest httpx

          - name: Run tests
            run: pytest

    Our final project structure should appear as below.

    sentiment-analysis-project/
    ├── .github/
    │   └── workflows/
    │       └── ci_cd.yml
    ├── model/
    ├── test_model.py
    ├── main.py
    ├── Dockerfile
    ├── requirements.txt
    └── train.py

    Now, whenever we push to GitHub, it will run an automated process that checks out the code, sets up a Python 3.9 environment, installs dependencies, and runs our tests using pytest.

    Conclusion

    In this project, we’ve developed a production-ready sentiment analysis API that highlights key aspects of deploying machine learning models. While it doesn’t encompass every facet of the field, it provides a representative sampling of the essential tasks involved. By examining these components, I hope to clarify how concepts you may have encountered fit together in a practical setting.



  • Building a PubMed Dataset

    Diana Rozenshteyn

    Step-by-Step Instructions for Constructing a Dataset of PubMed-Listed Publications on Cardiovascular Disease Research


  • NYT Crossword: answers for Wednesday, October 30

    Sam Hill

    The New York Times crossword puzzle can be tough! If you’re stuck, we’re here to help with a list of today’s clues and answers.


  • NYT Mini Crossword today: puzzle answers for Wednesday, October 30

    Sam Hill

    The NYT Mini crossword might be a lot smaller than a normal crossword, but it isn’t easy. If you’re stuck with today’s crossword, we’ve got answers for you here.


  • NYT Connections: hints and answers for Thursday, October 31

    Sam Hill

    Connections is the new puzzle game from the New York Times, and it can be quite difficult. If you need a hand with solving today’s puzzle, we’re here to help.


  • NYT Strands today: hints, spangram and answers for Thursday, October 31

    Sam Hill

    Strands is a tricky take on the classic word search from NYT Games. If you’re stuck and cannot solve today’s puzzle, we’ve got help for you here.
