getscipapers




Table of Contents

  1. Description
  2. Prerequisites
  3. Installation
  4. Usage
  5. Running in GitHub Codespace
  6. Docker Container
  7. Remarks

Description


getscipapers is a Python package designed for searching and requesting scientific papers from multiple sources. This project is a work in progress and primarily intended for personal use. It is not a comprehensive solution for accessing scientific papers. Portions of the code were developed with assistance from GitHub Copilot.

Prerequisites


Installation


It is recommended to use a virtual environment to avoid conflicts with other Python packages. You can use venv or virtualenv. To set up the environment and install dependencies:

# Clone the repository
git clone https://github.com/hoanganhduc/getscipapers.git
cd getscipapers

# Create and activate a virtual environment (change the path if desired)
python -m venv ~/.getscipapers
source ~/.getscipapers/bin/activate

# Upgrade pip and install dependencies
pip install --upgrade pip
pip install build
pip install -r requirements.txt

# Build the distribution artifacts, then install the package in editable mode
python -m build
pip install -e .

# Clean up build artifacts
rm -rf build/ dist/ *.egg-info/
find . -type d -name __pycache__ -exec rm -rf {} +
find . -type f -name "*.pyc" -delete
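After the steps above, a quick sanity check can confirm that the virtual environment is active and that the command is reachable. The following is a minimal sketch: the function name check_install is hypothetical and not part of the package, and it only relies on the VIRTUAL_ENV variable that the activate script sets.

```shell
# Hypothetical helper (not part of getscipapers): verify the environment.
# Returns 1 if no virtual environment is active, 2 if the CLI is not on PATH.
check_install() {
    if [ -z "${VIRTUAL_ENV:-}" ]; then
        echo "No virtual environment is active" >&2
        return 1
    fi
    if ! command -v getscipapers >/dev/null 2>&1; then
        echo "getscipapers was not found on PATH" >&2
        return 2
    fi
    echo "getscipapers is available in $VIRTUAL_ENV"
}
```

Running check_install in the shell where you activated the environment should print the environment path if the installation succeeded.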

Usage


To use the Nexus Search database, start the IPFS daemon in one terminal. If this is your first time running IPFS, run ipfs init beforehand:

ipfs daemon

In another terminal, use the getscipapers command to search for and request scientific papers. For usage details, run:

getscipapers --help
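Because the daemon runs in a separate terminal, it can be useful to confirm that it is accepting connections before issuing commands that depend on it. The sketch below assumes the daemon listens for RPC API connections on its default address, 127.0.0.1:5001; the function names port_open and wait_for_ipfs are hypothetical, and the check needs bash because it uses the /dev/tcp redirection.

```shell
# Succeed if something accepts TCP connections on host ($1) and port ($2).
# Uses bash's /dev/tcp redirection, so it requires bash rather than plain sh.
port_open() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Poll for the port up to $3 times (default 10), sleeping one second
# after each failed attempt. Assumes the default API address 127.0.0.1:5001.
wait_for_ipfs() {
    local host="${1:-127.0.0.1}"
    local port="${2:-5001}"
    local attempts="${3:-10}"
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        if port_open "$host" "$port"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

For example, `wait_for_ipfs && getscipapers --help` runs the command only once the port is reachable. An open port does not guarantee the daemon is fully initialized, so treat this as a coarse check.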

Running getscipapers in GitHub Codespace


The fastest way to run getscipapers is via GitHub Codespaces. This provides a preconfigured environment, eliminating local setup. To use it:

  1. 🍴 Fork the repository to your GitHub account.
  2. 🔐 (Optional) Set up codespace secrets for your API keys and configurations. See .devcontainer/set-secrets.sh for an example using GitHub CLI.
  3. 💻 Create a new codespace from your forked repository. This automatically sets up the environment with all dependencies installed. You can also create the codespace with GitHub CLI; to target your fork, replace hoanganhduc with your GitHub username. For example:

    gh codespace create --repo hoanganhduc/getscipapers --branch master --machine basicLinux32gb
    

    ℹ️ The basicLinux32gb machine type provides 2 cores, 8GB RAM, and 32GB storage. See GitHub Codespaces documentation for more machine types such as standardLinux32gb, premiumLinux, and largePremiumLinux.

  4. 💻 Once the codespace is ready, open a terminal and run getscipapers commands directly.
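Since step 1 creates a fork, the gh command in step 3 usually needs your own account name in place of the upstream owner. As a small sketch, the hypothetical helper below prints the command for a given owner and machine type so you can review it before running it; it assumes the fork keeps the repository name getscipapers and the master branch.

```shell
# Hypothetical helper: print the gh command for a fork owned by $1.
# $2 is the machine type and defaults to basicLinux32gb.
# Assumes the fork keeps the name "getscipapers" and the "master" branch.
codespace_create_cmd() {
    local owner="$1"
    local machine="${2:-basicLinux32gb}"
    printf 'gh codespace create --repo %s/getscipapers --branch master --machine %s\n' \
        "$owner" "$machine"
}
```

Calling `codespace_create_cmd your-username standardLinux32gb` only prints the command; nothing is created until you run the printed line yourself.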

Docker Container for Running getscipapers


Overview


This guide explains how to use getscipapers inside a Docker container. The container includes all dependencies, so you can start downloading scientific papers immediately—no manual setup required.

Quick Start


1. Pull and Run the Prebuilt Image


To get started quickly, pull the latest image from GitHub Container Registry and run it:

docker pull ghcr.io/hoanganhduc/getscipapers:latest
docker run -it --rm -v "$(pwd)":/workspace ghcr.io/hoanganhduc/getscipapers:latest

This mounts your current directory to /workspace inside the container for easy file access.

2. Build and Run Locally


To build the image yourself:

docker build -t getscipapers .
docker run -it --rm -v "$(pwd)":/workspace getscipapers

3. Run in Detached Mode with Persistent Storage


To keep the container running in the background and ensure downloads and configuration persist:

docker run -d \
    --name getscipapers-container \
    --restart always \
    -v "$HOME/Downloads":/home/getscipaper/Downloads \
    -v "$HOME/.config/getscipapers":/home/getscipaper/.config/getscipapers \
    ghcr.io/hoanganhduc/getscipapers:latest

This setup saves downloaded papers and settings to your host machine. Adjust folder paths as needed.
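When a host path passed to -v does not exist, Docker creates it as a directory owned by root, which can leave the downloads and configuration folders unwritable for your user. One way to avoid this is to create the folders before starting the container. The helper name prepare_host_dirs below is hypothetical; the paths mirror the ones used in the command above.

```shell
# Hypothetical helper: create the host folders that the container mounts.
# $1 is the base directory and defaults to $HOME.
prepare_host_dirs() {
    local base="${1:-$HOME}"
    mkdir -p "$base/Downloads" "$base/.config/getscipapers"
}
```

Run `prepare_host_dirs` once before the docker run command; mkdir -p leaves existing folders untouched.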

Optional: Integrate with IPFS


To use IPFS with getscipapers, run an IPFS Kubo daemon in a separate container:

docker pull ipfs/kubo:latest
sudo ufw allow 4001
sudo ufw allow 8080
sudo ufw allow 5001

export ipfs_staging=$HOME/.ipfs
export ipfs_data=$HOME/.ipfs

docker run -d \
    --name ipfs_host \
    --restart always \
    -v "$ipfs_staging":/export \
    -v "$ipfs_data":/data/ipfs \
    -p 4001:4001 \
    -p 8080:8080 \
    -p 5001:5001 \
    ipfs/kubo:latest

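This starts the IPFS daemon with persistent storage and publishes three ports: 4001 (peer-to-peer swarm), 8080 (HTTP gateway), and 5001 (RPC API). The RPC API gives full control over the node, so avoid exposing port 5001 to untrusted networks. Adjust folder paths as needed.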

Running getscipapers Commands


To run getscipapers inside the container:

docker exec -it getscipapers-container getscipapers --help
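docker exec fails if the named container is not running, so a check beforehand can give a clearer error. The sketch below does an exact-name comparison on a list of container names read from standard input; the function name container_listed is hypothetical. It is kept separate from Docker itself so the matching logic can be tried without a running daemon.

```shell
# Hypothetical helper: read container names on stdin, one per line, and
# succeed only if one of them equals $1 exactly (no substring matches).
container_listed() {
    grep -Fxq -- "$1"
}

# Usage (requires Docker):
#   docker ps --format '{{.Names}}' | container_listed getscipapers-container
```

Unlike the name filter of docker ps, which also matches substrings, this comparison does not treat a container called getscipapers-container-old as a match.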

Optional: Create a Convenience Script


For easier access, create a script at ~/.local/bin/getscipapers:

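#!/bin/bash
# Forward all arguments to getscipapers inside the running container.
CONTAINER_NAME="getscipapers-container"

# Require at least one argument.
if [ $# -lt 1 ]; then
    echo "Usage: $0 <arguments...>" >&2
    exit 1
fi

COMMAND=("getscipapers" "$@")

# Check that the docker CLI is available.
if ! command -v docker &> /dev/null; then
    echo "Error: Docker is not installed" >&2
    exit 1
fi

# Check that the container is running (the name filter also matches substrings).
if ! docker ps -q -f name="$CONTAINER_NAME" | grep -q .; then
    echo "Error: Container '$CONTAINER_NAME' is not running" >&2
    exit 1
fi

# -i keeps stdin open so piped input reaches the command.
docker exec -i "$CONTAINER_NAME" "${COMMAND[@]}"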

Make it executable:

chmod +x ~/.local/bin/getscipapers

Now, provided ~/.local/bin is on your PATH, you can run getscipapers directly from your terminal:

getscipapers --help
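If the shell reports that the command is not found, the script directory is probably missing from PATH. The following sketch checks for that; the function name path_contains is hypothetical.

```shell
# Hypothetical helper: succeed if directory $1 is an entry in PATH.
# Wrapping both sides in colons makes the first and last entries match too.
path_contains() {
    case ":${PATH}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `path_contains "$HOME/.local/bin" || export PATH="$HOME/.local/bin:$PATH"` adds the directory for the current session only; to make it permanent, put the export line in your shell startup file.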

For more information, see the official documentation or repository.

Running Locally with Docker


You can run getscipapers locally using Docker without installing Python or dependencies on your system.

  1. 🐳 Ensure Docker is installed.
  2. ⬇️ Pull the latest image:
  docker pull ghcr.io/hoanganhduc/getscipapers:latest
  3. ▶️ Run the container, mounting a local directory for downloads or configuration:
  docker run --rm -it -v /path/to/local/dir:/data ghcr.io/hoanganhduc/getscipapers:latest --output /data

Replace /path/to/local/dir with your preferred local directory.

This setup allows you to use getscipapers in an isolated environment, keeping your files accessible on your host machine.

Remarks
