API Reference

Core Modules

Centralized configuration and credential utilities for getscipapers.

This module keeps runtime settings in one place to reduce the amount of cross-module global state. Functions that previously lived in getpapers.py now reside here so other modules can import and share a single source of truth for paths and credentials.

class getscipapers_hoanganhduc.configuration.Credentials(email='', elsevier_api_key='', wiley_tdm_token='', ieee_api_key='')[source]

Bases: object

Parameters:
  • email (str)

  • elsevier_api_key (str)

  • wiley_tdm_token (str)

  • ieee_api_key (str)

email: str = ''
elsevier_api_key: str = ''
wiley_tdm_token: str = ''
ieee_api_key: str = ''
normalized_email()[source]
Return type:

str

require_email()[source]
Return type:

str

to_dict()[source]
Return type:

Dict[str, str]
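The fields and methods listed above suggest a plain dataclass. The sketch below is a stand-in rather than the package's implementation: the class name `CredentialsSketch`, the strip-and-lowercase normalization, and the `ValueError` raised by `require_email` are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class CredentialsSketch:
    """Hypothetical stand-in mirroring the documented fields of Credentials."""

    email: str = ""
    elsevier_api_key: str = ""
    wiley_tdm_token: str = ""
    ieee_api_key: str = ""

    def normalized_email(self) -> str:
        # Assumption: normalization means trimming whitespace and lowercasing.
        return self.email.strip().lower()

    def require_email(self) -> str:
        # Assumption: a missing email is reported with ValueError.
        email = self.normalized_email()
        if not email:
            raise ValueError("an email address is required")
        return email

    def to_dict(self) -> Dict[str, str]:
        # One string entry per documented field.
        return asdict(self)


creds = CredentialsSketch(email="  Reader@Example.org ")
print(creds.require_email())
print(creds.to_dict())
```

Keeping the four values in one object is what lets other modules pass credentials around without reading module-level globals.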

getscipapers_hoanganhduc.configuration.ensure_directory_exists(path)[source]
Return type:

None

Parameters:

path (Path)

getscipapers_hoanganhduc.configuration.get_default_download_folder(create=False)[source]
Return type:

str

Parameters:

create (bool)

getscipapers_hoanganhduc.configuration.load_credentials(config_file=None, interactive=None, env_prefix='GETSCIPAPERS_', verbose=False)[source]
Return type:

Credentials

Parameters:
  • config_file (str | None)

  • interactive (bool | None)

  • env_prefix (str)

  • verbose (bool)

getscipapers_hoanganhduc.configuration.require_email(email=None)[source]
Return type:

str

Parameters:

email (str | None)

getscipapers_hoanganhduc.configuration.save_credentials(email=None, elsevier_api_key=None, wiley_tdm_token=None, ieee_api_key=None, config_file=None, verbose=False)[source]
Return type:

bool

Parameters:
  • email (str | None)

  • elsevier_api_key (str | None)

  • wiley_tdm_token (str | None)

  • ieee_api_key (str | None)

  • config_file (str | None)

  • verbose (bool)

Core search and retrieval workflow for getpapers CLI invocations.

This module coordinates searches across Nexus, CrossRef, Unpaywall, and publisher APIs, while handling caching, configuration, and output formatting. Functions here are designed for reuse by other modules (for example request.py) and are intentionally asynchronous-aware so they can run in concurrent contexts.

getscipapers_hoanganhduc.getpapers.vprint(*args, **kwargs)[source]
getscipapers_hoanganhduc.getpapers.ensure_directory_exists(path)[source]
Return type:

None

Parameters:

path (str)

getscipapers_hoanganhduc.getpapers.save_credentials(email=None, elsevier_api_key=None, wiley_tdm_token=None, ieee_api_key=None, config_file=None)[source]
Parameters:
  • email (str | None)

  • elsevier_api_key (str | None)

  • wiley_tdm_token (str | None)

  • ieee_api_key (str | None)

  • config_file (str | None)

getscipapers_hoanganhduc.getpapers.normalize_db_selection(db)[source]

Normalize the --db selection to a concrete list of services.

The CLI accepts comma-delimited strings or multiple --db flags. Any request containing "all" or no explicit services resolves to the full list defined in DB_CHOICES.

Return type:

list[str]

Parameters:

db (str | list[str] | tuple[str, ...] | None)
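The normalization rule described above can be sketched as follows. This is an illustration only: the contents of `DB_CHOICES_SKETCH` are hypothetical service names (the real `DB_CHOICES` list is not shown in this reference), and lowercasing and de-duplication are assumptions.

```python
from typing import List, Sequence, Union

# Hypothetical service names; the real DB_CHOICES list is not shown here.
DB_CHOICES_SKETCH = ["nexus", "scihub", "anna", "unpaywall"]


def normalize_db_selection_sketch(
    db: Union[str, Sequence[str], None],
) -> List[str]:
    """Resolve a --db value to a concrete list of services (illustrative)."""
    if db is None:
        return list(DB_CHOICES_SKETCH)
    # A single string and a list of repeated flags are handled the same way.
    raw_items = [db] if isinstance(db, str) else list(db)
    names: List[str] = []
    for item in raw_items:
        for part in item.split(","):
            name = part.strip().lower()
            if name and name not in names:
                names.append(name)
    # "all" anywhere, or nothing explicit, resolves to the full list.
    if not names or "all" in names:
        return list(DB_CHOICES_SKETCH)
    return names


print(normalize_db_selection_sketch("unpaywall, nexus"))
print(normalize_db_selection_sketch(["scihub", "all"]))
```

The sketch does not reject unknown service names; whether the real function validates them is not stated in this reference.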

getscipapers_hoanganhduc.getpapers.load_credentials(config_file=None, interactive=None, env_prefix='GETSCIPAPERS_')[source]
Parameters:
  • config_file (str | None)

  • interactive (bool | None)

  • env_prefix (str)

getscipapers_hoanganhduc.getpapers.fetch_crossref_data(doi)[source]

Fetch data from Crossref API for a given DOI. Returns the message part of the response if successful, None otherwise.

async getscipapers_hoanganhduc.getpapers.is_open_access_unpaywall(doi, email=None)[source]

Check if a DOI is open access using the Unpaywall API. Returns True if open access, False otherwise.

Return type:

bool

Parameters:
  • doi (str)

  • email (str | None)

getscipapers_hoanganhduc.getpapers.resolve_pii_to_doi(pii)[source]

Try to resolve a ScienceDirect PII to a DOI using Elsevier’s API. Returns DOI string if found, else None.

Return type:

str

Parameters:

pii (str)

getscipapers_hoanganhduc.getpapers.extract_mdpi_doi_from_url(url)[source]

Try to extract an MDPI DOI from a URL. Returns DOI string if found, else None.

Return type:

str

Parameters:

url (str)

getscipapers_hoanganhduc.getpapers.fetch_dois_from_url(url, doi_pattern)[source]

Fetch a URL and extract DOIs from its content. Returns a list with up to 3 valid DOIs found, or an empty list if none.

Return type:

list

Parameters:
  • url (str)

  • doi_pattern (str)

getscipapers_hoanganhduc.getpapers.is_valid_doi(doi)[source]

Check if a single DOI is valid using the DOI System Proxy Server REST API. Returns True if the DOI exists and resolves properly. Falls back to Crossref if the API doesn’t work.

Return type:

bool

Parameters:

doi (str)

getscipapers_hoanganhduc.getpapers.validate_dois(dois)[source]

Given a list of DOIs, return only those that are valid (resolve at doi.org or found in Crossref).

Return type:

list

Parameters:

dois (list)

getscipapers_hoanganhduc.getpapers.extract_isbns_from_text(text)[source]

Extract ISBN-13 (preferred) and ISBN-10 numbers from text content. Returns a list of (isbn, doi) tuples, preferring ISBN-13 if found, otherwise ISBN-10. Only includes valid ISBNs (according to Crossref) and their associated DOI(s) if available. If multiple DOIs are found for an ISBN, tries to extract the common DOI prefix (e.g., <common doi>.ch001, <common doi>.ch002). If the common prefix is not a valid DOI, returns None for DOI. Prints details with vprint. Only extracts ISBN-10 if no ISBN-13 is found.

Return type:

list

Parameters:

text (str)
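The common-prefix step mentioned above (chapter DOIs such as `<common doi>.ch001` and `<common doi>.ch002` collapsing to one book-level DOI) could look roughly like this. The helper name `common_doi_prefix` is hypothetical, and the sketch skips the validity check against Crossref that the documented function performs on the result.

```python
import os
from typing import List, Optional


def common_doi_prefix(dois: List[str]) -> Optional[str]:
    """Collapse chapter-level DOIs to a shared book-level candidate, if any."""
    if not dois:
        return None
    prefix = os.path.commonprefix(dois)
    if all(doi == prefix for doi in dois):
        return prefix  # every DOI is identical, nothing to trim
    registrant, slash, suffix = prefix.partition("/")
    if not slash or "." not in suffix:
        return None  # the DOIs diverge before any ".chNNN"-style segment
    # Cut back to the last "." so a partial segment such as ".ch00" is dropped.
    return f"{registrant}/{suffix.rsplit('.', 1)[0]}"


chapters = [
    "10.4018/978-1-5225-0000-0.ch001",
    "10.4018/978-1-5225-0000-0.ch002",
]
print(common_doi_prefix(chapters))
```

The candidate returned here would still need to be confirmed as a real DOI before being paired with the ISBN.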

getscipapers_hoanganhduc.getpapers.extract_dois_from_text(text)[source]

Extract DOI numbers from text content. Returns a list of unique, valid paper DOIs. Only keeps DOIs that resolve at https://doi.org/<doi> (HTTP 200, 301, 302). If no DOI is found, tries to extract ISBN and resolve to DOI.

Return type:

list

Parameters:

text (str)
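As a rough illustration of the text-scanning step, the sketch below collects unique DOI-like strings in order of appearance. The regular expression is a commonly used DOI shape and an assumption here, not the pattern used by the package; the network step (keeping only DOIs that resolve at https://doi.org/<doi> with HTTP 200, 301, or 302) and the ISBN fallback are deliberately omitted.

```python
import re
from typing import List

# Assumption: "10.", a 4-9 digit registrant code, "/", then a suffix that
# runs until whitespace, a quote, or an angle bracket.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"'<>]+", re.IGNORECASE)


def find_candidate_dois(text: str) -> List[str]:
    """Return unique DOI-like strings in order of first appearance."""
    seen: List[str] = []
    for match in DOI_PATTERN.findall(text):
        doi = match.rstrip(".,;:)]}")  # drop trailing sentence punctuation
        if doi not in seen:
            seen.append(doi)
    return seen


sample = (
    "See https://doi.org/10.1038/nature12373. "
    "Also doi:10.1038/nature12373, and (10.1000/xyz123)."
)
print(find_candidate_dois(sample))
```

Candidates found this way are only syntactically plausible, which is why the documented function validates each one before returning it.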

getscipapers_hoanganhduc.getpapers.extract_doi_from_title(title)[source]

Search Crossref for a given paper title and return the DOI if there is a unique match. If Crossref returns more than one matching item, return None.

Return type:

str

Parameters:

title (str)

getscipapers_hoanganhduc.getpapers.extract_dois_from_file(input_file)[source]

Extract DOI numbers from a text file and write them to a new file. Also tries to extract Elsevier PII numbers from the file name and resolve them to DOIs. Additionally attempts to extract ISBN numbers from the file name and resolve them to DOIs via Crossref. As a final fallback, use the file name (cleaned) as a title and try to extract a DOI via Crossref title search. Returns the list of extracted DOIs. Prints status messages with icons for better readability.

Parameters:

input_file (str)

getscipapers_hoanganhduc.getpapers.extract_text_from_pdf(pdf_file, max_pages=None)[source]

Extract text from a PDF file using PyMuPDF (pymupdf) if available, otherwise fall back to PyPDF2. Uses text blocks to intelligently preserve document structure including paragraphs and headings. Returns the extracted text as a string. If max_pages is specified, only extract up to the first N pages.

Return type:

str

Parameters:
  • pdf_file (str)

  • max_pages (int)

getscipapers_hoanganhduc.getpapers.extract_doi_from_pdf(pdf_file)[source]

Extract the most likely DOI found in a PDF file. If multiple DOIs are found, fetch the paper title from Crossref for each DOI, and check if a similar title exists in the first page of the PDF. Select the DOI whose title matches; if none match, select the first found. Also tries to extract Elsevier PII numbers from the file name and resolve them to DOIs. Only considers the first five pages of the PDF. Keeps newlines intact when extracting text from PDF pages. Prints more details for debug in verbose mode.

Fallback: if no DOI can be extracted from text or PII, try to extract ISBN(s) from the file name and resolve them to DOI(s) via Crossref (using extract_isbns_from_text).

Return type:

str

Parameters:

pdf_file (str)
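The selection rule described above (prefer the DOI whose Crossref title appears on the first page, otherwise take the first DOI found) can be sketched without any network access. The function name `pick_doi_by_title` is hypothetical, the titles are assumed to have been fetched already, and exact containment after normalization stands in for whatever similarity measure the package uses.

```python
import re
from typing import Dict, Optional


def _normalize(text: str) -> str:
    # Lowercase and collapse every run of non-alphanumerics to one space.
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def pick_doi_by_title(
    titles_by_doi: Dict[str, str], first_page: str
) -> Optional[str]:
    """Prefer the DOI whose title appears on the first page; else the first."""
    if not titles_by_doi:
        return None
    page = _normalize(first_page)
    for doi, title in titles_by_doi.items():
        wanted = _normalize(title)
        if wanted and wanted in page:
            return doi
    # No title matched: fall back to the first DOI that was found.
    return next(iter(titles_by_doi))


candidates = {
    "10.1000/cited": "An Older Cited Work",
    "10.1000/this": "Graph Reconfiguration: A Survey",
}
page_text = "Graph  Reconfiguration:\nA Survey\nJane Doe\nAbstract ..."
print(pick_doi_by_title(candidates, page_text))
```

Normalizing both sides makes the comparison tolerant of the line breaks and punctuation differences that PDF text extraction tends to introduce.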

async getscipapers_hoanganhduc.getpapers.search_documents(query, limit=1)[source]

Search for documents using StcGeck, Nexus bot, Crossref, and DOI REST API in order. Build a StcGeck-style document with all fields empty, and iteratively fill fields by searching each source in order. Return up to the requested limit of results. Always tries all sources before returning results. Prints important search steps with icons for better readability.

Parameters:
  • query (str)

  • limit (int)

async getscipapers_hoanganhduc.getpapers.search_with_nexus_bot(query, limit=1)[source]

Search for documents using the Nexus bot (functions imported from .nexus). Returns a list of ScoredDocument-like objects with a .document JSON string. Tries first without proxy, then with proxy if it fails.

Parameters:
  • query (str)

  • limit (int)

getscipapers_hoanganhduc.getpapers.convert_nexus_to_stc_format(nexus_item)[source]

Convert a Nexus bot result (raw dict) to a list of StcGeck compatible documents. Handles both search (multiple results) and DOI (single result) formats. Returns a list of dicts (one per result).

async getscipapers_hoanganhduc.getpapers.search_with_crossref(query, limit=1)[source]
Parameters:
  • query (str)

  • limit (int)

getscipapers_hoanganhduc.getpapers.convert_crossref_to_stc_format(crossref_item)[source]

Convert Crossref API result to StcGeck compatible format

getscipapers_hoanganhduc.getpapers.fetch_doi_rest_api(doi, params=None)[source]

Fetch DOI metadata using the DOI Proxy REST API. Returns the parsed JSON response, or None if not found/error.

Return type:

dict

Parameters:
  • doi (str)

  • params (dict)

getscipapers_hoanganhduc.getpapers.convert_doi_rest_to_stc_format(rest_data)[source]

Convert DOI REST API response to StcGeck compatible document format. Only fills fields available in the REST API response. Handles cases where ‘DESCRIPTION’, ‘EMAIL’, etc. may not be present.

Return type:

dict

Parameters:

rest_data (dict)

async getscipapers_hoanganhduc.getpapers.search_with_doi_rest_api(query, limit=1)[source]

Search for a DOI using the DOI REST API and convert to StcGeck format. Returns a list of ScoredDocument-like objects.

Parameters:
  • query (str)

  • limit (int)

getscipapers_hoanganhduc.getpapers.format_reference(document)[source]
async getscipapers_hoanganhduc.getpapers.search_and_print(query, limit)[source]
Parameters:
  • query (str)

  • limit (int)

getscipapers_hoanganhduc.getpapers.is_elsevier_doi(doi)[source]

Check if a DOI is published by Elsevier. First, try to fetch metadata from DOI REST API and check if publisher is Elsevier. If not available, fallback to prefix/domain check. Returns True if the DOI is published by Elsevier.

Return type:

bool

Parameters:

doi (str)

async getscipapers_hoanganhduc.getpapers.download_elsevier_pdf_by_doi(doi, download_folder='/home/runner/Downloads/getscipapers/getpapers', api_key=None)[source]

Try to download a PDF from Elsevier Full-Text API using DOI. Returns True if successful, else False.

Parameters:
  • doi (str)

  • download_folder (str)

  • api_key (str | None)

getscipapers_hoanganhduc.getpapers.is_wiley_doi(doi)[source]

Check if a DOI is published by Wiley. First, try to fetch metadata from DOI REST API and check if publisher is Wiley. If not available, fallback to prefix/domain check. Returns True if the DOI is published by Wiley.

Return type:

bool

Parameters:

doi (str)

async getscipapers_hoanganhduc.getpapers.download_wiley_pdf_by_doi(doi, download_folder='/home/runner/Downloads/getscipapers/getpapers', tdm_token=None)[source]

Attempt to download a PDF from Wiley using the DOI and Wiley-TDM-Client-Token. Returns True if successful, else False.

Return type:

bool

Parameters:
  • doi (str)

  • download_folder (str)

  • tdm_token (str | None)

getscipapers_hoanganhduc.getpapers.is_pmc_doi(doi)[source]

Check if a DOI is associated with PubMed Central (PMC). Returns True if the DOI can be found in PMC via NCBI E-utilities.

Return type:

bool

Parameters:

doi (str)

async getscipapers_hoanganhduc.getpapers.download_from_pmc(doi, download_folder='/home/runner/Downloads/getscipapers/getpapers')[source]

Download a PDF from PubMed Central (PMC) using the DOI. Returns True if successful, else False.

Return type:

bool

Parameters:
  • doi (str)

  • download_folder (str)

async getscipapers_hoanganhduc.getpapers.download_from_unpaywall(doi, download_folder='/home/runner/Downloads/getscipapers/getpapers', email=None)[source]

Download all possible open access PDFs for a DOI via Unpaywall. Each PDF is saved as <safe_doi>_unpaywall_file1.pdf, <safe_doi>_unpaywall_file2.pdf, etc. Returns True if at least one PDF was downloaded, else False. Always uses custom headers to bypass HTTP 418. If the DOI is from PMC, Elsevier or Wiley, try their API first.

Parameters:
  • doi (str)

  • download_folder (str)

  • email (str | None)
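The file naming scheme above can be illustrated with a small helper. The name `unpaywall_target_paths` is hypothetical, and the definition of a "safe" DOI used here (every character outside letters, digits, dot, underscore, and hyphen becomes an underscore) is an assumption; the reference only shows the `<safe_doi>_unpaywall_fileN.pdf` pattern.

```python
import re
from pathlib import Path
from typing import List


def unpaywall_target_paths(doi: str, download_folder: str, count: int) -> List[Path]:
    """Build the numbered target paths for `count` PDFs of one DOI."""
    # Assumption: characters outside [A-Za-z0-9._-] are replaced with "_".
    safe_doi = re.sub(r"[^A-Za-z0-9._-]", "_", doi)
    folder = Path(download_folder)
    return [
        folder / f"{safe_doi}_unpaywall_file{i}.pdf"
        for i in range(1, count + 1)
    ]


for path in unpaywall_target_paths("10.1038/nature12373", "downloads", 2):
    print(path.name)
```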

async getscipapers_hoanganhduc.getpapers.download_from_nexus(id, doi, download_folder='/home/runner/Downloads/getscipapers/getpapers')[source]
Parameters:
  • id (str)

  • doi (str)

  • download_folder (str)

async getscipapers_hoanganhduc.getpapers.download_from_nexus_bot(doi, download_folder='/home/runner/Downloads/getscipapers/getpapers')[source]

Download a PDF by DOI using the Nexus bot (via .nexus module). Returns True if successful, else False. Uses decide_proxy_usage function to determine whether to use proxy.

Parameters:
  • doi (str)

  • download_folder (str)

async getscipapers_hoanganhduc.getpapers.download_from_scihub(doi, download_folder='/home/runner/Downloads/getscipapers/getpapers')[source]
Parameters:
  • doi (str)

  • download_folder (str)

async getscipapers_hoanganhduc.getpapers.download_from_anna_archive(doi, download_folder='/home/runner/Downloads/getscipapers/getpapers')[source]
Parameters:
  • doi (str)

  • download_folder (str)

async getscipapers_hoanganhduc.getpapers.download_by_doi(doi, download_folder='/home/runner/Downloads/getscipapers/getpapers', db='all', no_download=False)[source]
Parameters:
  • doi (str)

  • download_folder (str)

  • db (str | list[str] | tuple[str, ...])

  • no_download (bool)

async getscipapers_hoanganhduc.getpapers.download_by_doi_list(doi_file, download_folder='/home/runner/Downloads/getscipapers/getpapers', db='all', no_download=False)[source]
Parameters:
  • doi_file (str)

  • download_folder (str)

  • db (str | list[str] | tuple[str, ...])

  • no_download (bool)

getscipapers_hoanganhduc.getpapers.print_default_paths()[source]

Print all default paths and configuration file locations used by the script.

async getscipapers_hoanganhduc.getpapers.main(argv=None)[source]
Parameters:

argv (list[str] | None)

Service Integrations

Async interactions with the Nexus Telegram bot.

The routines here handle authentication, command dispatch, and output parsing for the Nexus search bot. They are structured around Telethon event loops so they can be driven from the CLI without blocking other concurrent work.

getscipapers_hoanganhduc.nexus.setup_logging(log_file=None, verbose=False)[source]

Set up logging configuration

getscipapers_hoanganhduc.nexus.debug_print(message)[source]

Print debug message if verbose mode is enabled

getscipapers_hoanganhduc.nexus.info_print(message)[source]

Print info message

getscipapers_hoanganhduc.nexus.error_print(message)[source]

Print error message

getscipapers_hoanganhduc.nexus.get_file_paths()[source]

Get the appropriate file paths based on the operating system, using a single config dir for all except downloads.

getscipapers_hoanganhduc.nexus.get_free_proxies()[source]

Retrieve and store free proxies using the shared proxy helper.

getscipapers_hoanganhduc.nexus.test_proxy_speed(ip, port, timeout=10)[source]

Test proxy speed by making a simple HTTP request through the proxy

Parameters:
  • ip – Proxy IP address

  • port – Proxy port

  • timeout – Request timeout in seconds

Returns:

Response time in milliseconds (0 if failed)

getscipapers_hoanganhduc.nexus.load_proxy_config(proxy)[source]

Load proxy configuration from file or dict

async getscipapers_hoanganhduc.nexus.test_proxy_telegram_connection(proxy_config, timeout=10)[source]

Test if a proxy can successfully connect to Telegram. Based on the OONI probe methodology for testing Telegram connectivity.

async getscipapers_hoanganhduc.nexus.test_and_select_working_proxy()[source]

Test multiple proxies in parallel and select the first working one for Telegram

async getscipapers_hoanganhduc.nexus.test_telegram_connection(api_id, api_hash, phone_number, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None)[source]

Test connection to Telegram servers with comprehensive diagnostics

Parameters:
  • api_id – Your Telegram API ID

  • api_hash – Your Telegram API hash

  • phone_number – Your phone number (not used, kept for compatibility)

  • session_file – Name of the session file

  • proxy – Proxy configuration dict or file path

async getscipapers_hoanganhduc.nexus.decide_proxy_usage(api_id, api_hash, phone_number, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy_file='/home/runner/.config/getscipapers/nexus/proxy.json', print_result=True)[source]

Decide whether to use a proxy for the Telegram connection. If the connection works without a proxy, return None (no proxy). If not, try the default proxy file. If that fails, select a new proxy and try again.

Returns:

None if no proxy is needed, proxy_file if a proxy is needed, False if neither works.
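The three-way return value is easy to misread, because None and False are both falsy. The sketch below reproduces the decision order with injected probe functions; `decide_proxy_sketch`, `can_connect`, and `refresh_proxy` are hypothetical names, and no Telegram connection is made.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Union

Probe = Callable[[Optional[str]], Awaitable[bool]]


async def decide_proxy_sketch(
    can_connect: Probe,
    proxy_file: str,
    refresh_proxy: Callable[[str], Awaitable[bool]],
) -> Union[None, str, bool]:
    """Return None (direct works), proxy_file (proxy works), or False."""
    if await can_connect(None):        # 1. direct connection
        return None
    if await can_connect(proxy_file):  # 2. existing proxy file
        return proxy_file
    # 3. select a new proxy, then retry through it
    if await refresh_proxy(proxy_file) and await can_connect(proxy_file):
        return proxy_file
    return False


async def _demo() -> None:
    async def only_via_proxy(proxy: Optional[str]) -> bool:
        return proxy is not None

    async def no_refresh(path: str) -> bool:
        return False

    decision = await decide_proxy_sketch(only_via_proxy, "proxy.json", no_refresh)
    if decision is False:  # compare by identity: None also tests falsy
        print("no working connection")
    else:
        print("proxy:", decision)


asyncio.run(_demo())
```

Callers should test the result with `is False` and `is None` rather than truthiness, since both "no proxy needed" and "nothing works" evaluate as false.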

getscipapers_hoanganhduc.nexus.create_telegram_client(api_id, api_hash, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None)[source]

Create TelegramClient with or without proxy

getscipapers_hoanganhduc.nexus.extract_button_info(reply_markup)[source]

Extract button information from reply markup

getscipapers_hoanganhduc.nexus.create_message_handler(bot_entity)[source]

Create message handler for bot replies

async getscipapers_hoanganhduc.nexus.wait_for_reply(get_bot_reply, timeout=30)[source]

Wait for bot reply with timeout

async getscipapers_hoanganhduc.nexus.handle_search_message(get_bot_reply, set_bot_reply)[source]

Handle ‘searching…’ message and wait for actual result

async getscipapers_hoanganhduc.nexus.fetch_recent_messages(client, bot_entity, sent_message)[source]

Fetch recent messages from bot if no immediate reply

async getscipapers_hoanganhduc.nexus.click_callback_button(api_id, api_hash, phone_number, bot_username, message_id, button_data, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None)[source]

Click a callback button in a bot’s message

Parameters:
  • api_id – Your Telegram API ID

  • api_hash – Your Telegram API hash

  • phone_number – Your phone number (not used, kept for compatibility)

  • bot_username – Bot’s username

  • message_id – ID of the message containing the button

  • button_data – The callback data of the button to click

  • session_file – Name of the session file

  • proxy – Proxy configuration dict with keys: type, addr, port, username, password. Example: {'type': 'http', 'addr': '127.0.0.1', 'port': 8080} or {'type': 'socks5', 'addr': '127.0.0.1', 'port': 1080, 'username': 'user', 'password': 'pass'}. May also be a string path to a JSON file containing the proxy configuration.
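Since the proxy argument may be either a dict or a path to a JSON file, a caller-side helper along these lines can normalize both forms. This is a sketch, not the package's `load_proxy_config`: the name `read_proxy_config` is hypothetical, and treating type, addr, and port as required (with username and password optional) is an assumption drawn from the two examples above.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union

# Assumption: these three keys are mandatory; username/password are optional.
REQUIRED_KEYS = ("type", "addr", "port")


def read_proxy_config(proxy: Union[Dict[str, Any], str]) -> Dict[str, Any]:
    """Accept a proxy dict or a JSON file path and return a validated dict."""
    if isinstance(proxy, dict):
        config = dict(proxy)  # copy so the caller's dict is not modified
    else:
        config = json.loads(Path(proxy).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"proxy configuration is missing: {', '.join(missing)}")
    config["port"] = int(config["port"])  # tolerate "8080" as well as 8080
    return config


print(read_proxy_config({"type": "http", "addr": "127.0.0.1", "port": "8080"}))
```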

async getscipapers_hoanganhduc.nexus.send_message_to_bot(api_id, api_hash, phone_number, bot_username, message, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None, limit=None)[source]

Send a message from your user account to a Telegram bot and wait for its reply.

Parameters:
  • api_id – Your Telegram API ID (get from my.telegram.org)

  • api_hash – Your Telegram API hash

  • phone_number – Your phone number

  • bot_username – Bot’s username (e.g., ‘your_bot_name’)

  • message – Message text to send (search query or DOI)

  • session_file – Name of the session file to save/load

  • proxy – Proxy configuration dict or file path (see create_telegram_client)

  • limit – Maximum number of search results to fetch (default: 1 for DOI, 5 for search; can be set by user)

Returns:

{
    "ok": True if successful, False or "error" key otherwise,
    "sent_message": {
        "message_id": int,
        "date": float (timestamp),
        "text": str
    },
    "bot_reply": {
        "message_id": int,
        "date": float (timestamp),
        "text": str,  # reply text, possibly concatenated for search
        "buttons": list of dicts with button info (text, type, callback_data/url)
    }
}

If an error occurs, returns {"error": "…"}.

Return type:

dict
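A caller typically has to distinguish the error form from the success form before touching `bot_reply`. The sketch below shows one way to do that; the function name `summarize_bot_result` is hypothetical and the sample dictionaries contain invented placeholder values, not real bot output.

```python
from typing import Any, Dict, List


def summarize_bot_result(result: Dict[str, Any]) -> str:
    """Turn the documented result dictionary into a one-line summary."""
    # Both {"error": "..."} and a falsy "ok" count as failure.
    if "error" in result or not result.get("ok"):
        return f"failed: {result.get('error', 'unknown error')}"
    reply = result.get("bot_reply") or {}
    buttons: List[Dict[str, Any]] = reply.get("buttons") or []
    labels = ", ".join(b.get("text", "?") for b in buttons) or "none"
    return f"reply {reply.get('message_id')}: {len(buttons)} button(s) [{labels}]"


# Placeholder data shaped like the documented return value.
success = {
    "ok": True,
    "sent_message": {"message_id": 41, "date": 0.0, "text": "10.1038/nature12373"},
    "bot_reply": {
        "message_id": 42,
        "date": 0.0,
        "text": "placeholder reply",
        "buttons": [{"text": "Download", "type": "callback", "callback_data": "dl"}],
    },
}
print(summarize_bot_result(success))
print(summarize_bot_result({"error": "timeout"}))
```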

async getscipapers_hoanganhduc.nexus.create_session(api_id, api_hash, phone_number, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session')[source]

Create a new session file interactively

getscipapers_hoanganhduc.nexus.format_result(result)[source]

Format the result in a human-readable way

getscipapers_hoanganhduc.nexus.handle_single_search_result(bot_reply)[source]

Handle a single search result based on whether the first callback button contains “Request”

Parameters:

bot_reply – Dictionary containing bot reply with buttons

Returns:

Dictionary with action type and relevant information

async getscipapers_hoanganhduc.nexus.handle_button_click_logic(bot_reply, proxy=None)[source]

Handle button clicking based on button text - interactive prompts for user

Parameters:
  • bot_reply – Dictionary containing bot reply with buttons

  • proxy – Proxy configuration (same format as other functions)

Returns:

Dictionary with click result or None if no action needed

async getscipapers_hoanganhduc.nexus.download_telegram_file(client, message, download_path=None)[source]

Download a file from a Telegram message

Parameters:
  • client – TelegramClient instance

  • message – Telegram message containing the file

  • download_path – Path where to save the file (optional)

Returns:

Dictionary with download result

async getscipapers_hoanganhduc.nexus.handle_file_download_from_bot_reply(bot_reply, proxy=None)[source]

Handle file download from bot reply if it contains a document

Parameters:
  • bot_reply – Dictionary containing bot reply information

  • proxy – Proxy configuration (same format as other functions)

Returns:

Dictionary with download result or None if no file to download

getscipapers_hoanganhduc.nexus.get_input_with_timeout(prompt, timeout=30, default='y', keep_origin=False)[source]

Get user input with timeout, return default if timeout occurs

async getscipapers_hoanganhduc.nexus.load_credentials_from_file(credentials_path, print_result=True)[source]

Load API credentials from JSON file, validate, and prompt user if invalid or missing.

async getscipapers_hoanganhduc.nexus.test_credentials(api_id, api_hash, phone_number, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None)[source]

Test if the provided Telegram API credentials are correct by attempting to connect and authorize. Returns a dictionary with the result.

async getscipapers_hoanganhduc.nexus.setup_proxy_configuration(proxy_arg)[source]

Set up proxy configuration: load an existing one or find a new working proxy

async getscipapers_hoanganhduc.nexus.handle_request_button(button_text, callback_data, message_id, proxy_to_use)[source]

Handle request button click

async getscipapers_hoanganhduc.nexus.handle_download_button(button_text, callback_data, message_id, proxy_to_use)[source]

Handle download button click

getscipapers_hoanganhduc.nexus.extract_file_size_from_callback_data(callback_data)[source]

Extract file size information from callback data

Parameters:

callback_data – The callback data string that might contain file size info

Returns:

Dictionary with size information or None if not found

getscipapers_hoanganhduc.nexus.extract_file_size_from_button_text(button_text)[source]

Extract file size information from button text

Parameters:

button_text – The button text string that might contain file size info

Returns:

Dictionary with size information or None if not found
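Neither the button text format nor the keys of the returned dictionary are specified in this reference, so the following is only a guess at the general technique: find a number followed by a size unit and convert it to bytes. The name `parse_size_from_label`, the accepted units, the 1024-based multipliers, and the output keys are all assumptions.

```python
import re
from typing import Dict, Optional, Union

# Assumption: binary multiples (1 KB = 1024 bytes).
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}
_SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(KB|MB|GB|B)\b", re.IGNORECASE)


def parse_size_from_label(label: str) -> Optional[Dict[str, Union[float, int, str]]]:
    """Return size details found in a label, or None if there are none."""
    match = _SIZE_RE.search(label)
    if match is None:
        return None
    value = float(match.group(1))
    unit = match.group(2).upper()
    return {"value": value, "unit": unit, "bytes": int(value * _UNITS[unit])}


print(parse_size_from_label("Download (2.5 MB)"))
print(parse_size_from_label("Request"))
```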

async getscipapers_hoanganhduc.nexus.wait_and_download_file(click_result, proxy_to_use)[source]

Wait for file upload to Telegram and download it

async getscipapers_hoanganhduc.nexus.process_callback_buttons(bot_reply, proxy_to_use)[source]

Process callback buttons from bot reply

async getscipapers_hoanganhduc.nexus.get_latest_messages_from_bot(api_id, api_hash, bot_username, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', limit=10, proxy=None)[source]

Get the latest messages from a bot

Parameters:
  • api_id – Your Telegram API ID

  • api_hash – Your Telegram API hash

  • bot_username – Bot’s username

  • session_file – Name of the session file

  • limit – Maximum number of messages to retrieve (default: 10)

  • proxy – Proxy configuration dict or file path

Returns:

Dictionary with success status and messages list

async getscipapers_hoanganhduc.nexus.get_user_profile(api_id, api_hash, phone_number, bot_username, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None)[source]

Get user profile information from Nexus bot by sending /profile command

Parameters:
  • api_id – Your Telegram API ID

  • api_hash – Your Telegram API hash

  • phone_number – Your phone number (not used, kept for compatibility)

  • bot_username – Bot’s username

  • session_file – Name of the session file

  • proxy – Proxy configuration dict or file path

Returns:

Dictionary with user profile information or error

getscipapers_hoanganhduc.nexus.format_profile_result(profile_result)[source]

Format the profile result in a human-readable way

getscipapers_hoanganhduc.nexus.format_messages_result(messages_result)[source]

Format the messages result in a human-readable way

async getscipapers_hoanganhduc.nexus.fetch_and_display_recent_messages(api_id, api_hash, bot_username, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', limit=10, proxy=None, display=True)[source]

Fetch recent messages from a bot and optionally display them

Parameters:
  • api_id – Your Telegram API ID

  • api_hash – Your Telegram API hash

  • bot_username – Bot’s username

  • session_file – Name of the session file

  • limit – Maximum number of messages to retrieve (default: 10, max: 100)

  • proxy – Proxy configuration dict or file path

  • display – Whether to display formatted results (default: True)

Returns:

Dictionary with success status and messages list

async getscipapers_hoanganhduc.nexus.fetch_nexus_aaron_messages(api_id, api_hash, phone_number, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', limit=10, proxy=None, display=True)[source]

Fetch recent messages from the @nexus_aaron bot specifically

Parameters:
  • api_id – Your Telegram API ID

  • api_hash – Your Telegram API hash

  • phone_number – Your phone number (not used, kept for compatibility)

  • session_file – Name of the session file

  • limit – Maximum number of messages to retrieve (default: 10, max: 100)

  • proxy – Proxy configuration dict or file path

  • display – Whether to display formatted results (default: True)

Returns:

Dictionary with success status and messages list from @nexus_aaron

getscipapers_hoanganhduc.nexus.format_nexus_aaron_messages(messages_result)[source]

Format nexus_aaron messages with specialized formatting for research requests

getscipapers_hoanganhduc.nexus.get_publisher_name_from_doi(doi)[source]

Extract publisher name from DOI using Crossref API

Parameters:

doi – DOI string (e.g., “10.1038/nature12373”)

Returns:

Publisher name string or None if not found

getscipapers_hoanganhduc.nexus.parse_nexus_aaron_request(text)[source]

Parse a nexus_aaron request message to extract structured information

Parameters:

text – The raw message text from nexus_aaron

Returns:

Dictionary with parsed information

getscipapers_hoanganhduc.nexus.parse_nexus_aaron_upload(text)[source]

Parse a nexus_aaron upload/voting message to extract structured information

Parameters:

text – The raw message text from nexus_aaron upload

Returns:

Dictionary with parsed upload information

async getscipapers_hoanganhduc.nexus.upload_file_to_bot(api_id, api_hash, phone_number, bot_username, file_path, message='', session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None)[source]

Upload a file to a Telegram bot with optional message

Parameters:
  • api_id – Your Telegram API ID

  • api_hash – Your Telegram API hash

  • phone_number – Your phone number (not used, kept for compatibility)

  • bot_username – Bot’s username

  • file_path – Path to the file to upload

  • message – Optional message to send with the file (default: “”)

  • session_file – Name of the session file

  • proxy – Proxy configuration dict or file path

Returns:

Dictionary with upload result and bot reply

getscipapers_hoanganhduc.nexus.format_upload_result(upload_result)[source]

Format the upload result in a human-readable way

async getscipapers_hoanganhduc.nexus.upload_file_to_nexus_aaron(api_id, api_hash, phone_number, file_path, message='', session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None)[source]

Upload a file to the @nexus_aaron bot specifically

Parameters:
  • api_id – Your Telegram API ID

  • api_hash – Your Telegram API hash

  • phone_number – Your phone number (not used, kept for compatibility)

  • file_path – Path to the file to upload

  • message – Optional message to send with the file (default: “”)

  • session_file – Name of the session file

  • proxy – Proxy configuration dict or file path

Returns:

Dictionary with upload result and bot reply from @nexus_aaron

async getscipapers_hoanganhduc.nexus.simple_upload_to_nexus_aaron(file_path, verbose=False)[source]

Upload a file to the @nexus_aaron bot with minimal input. If the file is a PDF, try to extract the DOI using getpapers. If DOI extraction fails, prompt the user to enter a DOI manually (with timeout).

Parameters:
  • file_path (str) – Path to the file to upload.

  • verbose (bool) – If True, enable verbose output.

Returns:

Upload result.

Return type:

dict

getscipapers_hoanganhduc.nexus.format_nexus_aaron_upload_result(upload_result)[source]

Format the nexus_aaron upload result with specialized formatting

async getscipapers_hoanganhduc.nexus.list_and_reply_to_nexus_aaron_message(api_id, api_hash, phone_number, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', limit=10, proxy=None)[source]

List recent research request messages from @nexus_aaron, allow user to select one, and upload a file as reply

Parameters:
  • api_id – Your Telegram API ID

  • api_hash – Your Telegram API hash

  • phone_number – Your phone number (not used, kept for compatibility)

  • session_file – Name of the session file

  • limit – Maximum number of messages to retrieve (default: 10, max: 50)

  • proxy – Proxy configuration dict or file path

Returns:

Dictionary with operation result

getscipapers_hoanganhduc.nexus.format_list_and_reply_result(result)[source]

Format the list and reply result in a human-readable way

async getscipapers_hoanganhduc.nexus.check_doi_availability_on_nexus(api_id, api_hash, phone_number, bot_username, doi, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None, download=False)[source]

Check if a DOI is available on Nexus by sending it to the bot and analyzing the response

Parameters:
  • api_id – Your Telegram API ID

  • api_hash – Your Telegram API hash

  • phone_number – Your phone number (not used, kept for compatibility)

  • bot_username – Bot’s username

  • doi – DOI number to check (e.g., “10.1038/nature12373”)

  • session_file – Name of the session file

  • proxy – Proxy configuration dict or file path

  • download – If True, automatically download the paper if available (default: False)

Returns:

Dictionary with availability status and details, including download result if applicable

getscipapers_hoanganhduc.nexus.format_doi_availability_result(availability_result)[source]

Format the DOI availability result in a human-readable way

async getscipapers_hoanganhduc.nexus.batch_check_doi_availability(api_id, api_hash, phone_number, bot_username, doi_list, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None, delay=2, download=False)[source]

Check availability of multiple DOIs on Nexus with rate limiting and optional auto-download

Parameters:
  • api_id – Your Telegram API ID

  • api_hash – Your Telegram API hash

  • phone_number – Your phone number (not used, kept for compatibility)

  • bot_username – Bot’s username

  • doi_list – List of DOI strings to check

  • session_file – Name of the session file

  • proxy – Proxy configuration dict or file path

  • delay – Delay in seconds between requests to avoid rate limiting (default: 2)

  • download – If True, automatically download papers that are available (default: False)

Returns:

Dictionary with batch results including download information
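
The rate limiting described for batch_check_doi_availability can be pictured as a sequential loop that pauses between requests. The following is a minimal sketch under that assumption, not the package's implementation: check_many and fake_check are hypothetical names, and the stub replaces the real Telegram round trip.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def check_many(
    doi_list: List[str],
    check_one: Callable[[str], Awaitable[dict]],
    delay: float = 2,
) -> Dict[str, object]:
    """Check DOIs one at a time, sleeping `delay` seconds between requests."""
    results = []
    for index, doi in enumerate(doi_list):
        if index > 0:
            # Pause between requests, but not before the first one.
            await asyncio.sleep(delay)
        try:
            results.append(await check_one(doi))
        except Exception as exc:
            # Record the failure and continue so one DOI cannot abort the batch.
            results.append({"doi": doi, "error": str(exc)})
    return {"total": len(doi_list), "results": results}


async def fake_check(doi: str) -> dict:
    # Hypothetical stand-in for a real availability check.
    return {"doi": doi, "available": doi.startswith("10.")}


batch = asyncio.run(check_many(["10.1038/nature12373", "not-a-doi"], fake_check, delay=0))
```

Running the checks sequentially rather than with asyncio.gather is what makes the delay meaningful: concurrent requests would all be sent at once.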

getscipapers_hoanganhduc.nexus.format_batch_doi_results(batch_results)[source]

Format the batch DOI results in a human-readable way

async getscipapers_hoanganhduc.nexus.download_from_nexus_bot(doi, download_dir=None, bot_username=None)[source]

Download a paper from Nexus based on DOI

Parameters:
  • doi – DOI string to search and download (e.g., “10.1038/nature12373”)

  • download_dir – Target directory to save the file (optional, uses default if None)

  • bot_username – Bot username to use (optional, uses global BOT_USERNAME if None)

Returns:

Dictionary with download result and file information

getscipapers_hoanganhduc.nexus.format_download_from_nexus_bot_result(download_result)[source]

Format the download result in a human-readable way

async getscipapers_hoanganhduc.nexus.request_paper_by_doi(api_id, api_hash, phone_number, bot_username, doi, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None)[source]

Request a paper from Nexus by DOI. This will send the DOI to the bot, detect if a request is needed, and click the request button if available.

Parameters:
  • api_id – Telegram API ID

  • api_hash – Telegram API hash

  • phone_number – Your phone number (not used, kept for compatibility)

  • bot_username – Bot’s username

  • doi – DOI string to request (e.g., “10.1038/nature12373”)

  • session_file – Session file name

  • proxy – Proxy configuration dict or file path

Returns:

{
    “ok”: True if request sent, False or “error” otherwise,
    “doi”: <doi>,
    “request_sent”: True/False,
    “details”: …,
}

Return type:

dict

async getscipapers_hoanganhduc.nexus.batch_request_papers_by_doi(api_id, api_hash, phone_number, bot_username, doi_list, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None, delay=2)[source]

Request multiple papers from Nexus by DOI. For each DOI, sends the DOI to the bot, detects if a request is needed, and clicks the request button if available.

Parameters:
  • api_id – Telegram API ID

  • api_hash – Telegram API hash

  • phone_number – Your phone number (not used, kept for compatibility)

  • bot_username – Bot’s username

  • doi_list – List of DOI strings to request

  • session_file – Session file name

  • proxy – Proxy configuration dict or file path

  • delay – Delay in seconds between requests (default: 2)

Returns:

{
    “total”: int,
    “requested”: int,
    “skipped”: int,
    “errors”: int,
    “results”: list of per-DOI results
}

Return type:

dict
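
One way to read the summary returned by batch_request_papers_by_doi is as a tally over per-DOI results shaped like those of request_paper_by_doi. The sketch below is illustrative only. The mapping from a per-DOI result to "requested", "skipped", or "errors" is an assumption, and summarize_requests is a hypothetical helper, not part of the package.

```python
from typing import Dict, List


def summarize_requests(results: List[dict]) -> Dict[str, object]:
    """Tally per-DOI results into total/requested/skipped/errors counts."""
    summary: Dict[str, object] = {
        "total": len(results),
        "requested": 0,
        "skipped": 0,
        "errors": 0,
        "results": results,
    }
    for item in results:
        if item.get("ok") == "error":
            # Assumption: "ok" carries the string "error" when the request failed.
            summary["errors"] += 1
        elif item.get("request_sent"):
            summary["requested"] += 1
        else:
            # Assumption: anything neither failed nor requested counts as skipped.
            summary["skipped"] += 1
    return summary


sample = [
    {"ok": True, "doi": "10.1000/a", "request_sent": True},
    {"ok": False, "doi": "10.1000/b", "request_sent": False},
    {"ok": "error", "doi": "10.1000/c", "request_sent": False},
]
tally = summarize_requests(sample)
```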

async getscipapers_hoanganhduc.nexus.request_papers_by_doi_list(doi_list)[source]

Request one or more papers by DOI using the Nexus bot. Attempts a direct connection first and falls back to a proxy if needed.

Parameters:

doi_list (list) – List of DOI strings.

Returns:

Summary of request results.

Return type:

dict
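
The "direct first, proxy second" behaviour described for request_papers_by_doi_list can be sketched as a small retry wrapper. This is an assumption about the general pattern, not the package's code: with_proxy_fallback and flaky are hypothetical names, and the set of exceptions treated as connection failures is a guess.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def with_proxy_fallback(
    operation: Callable[[Optional[dict]], Awaitable[dict]],
    proxy: Optional[dict],
) -> dict:
    """Run `operation` without a proxy; retry once through `proxy` if the connection fails."""
    try:
        return await operation(None)
    except (OSError, asyncio.TimeoutError):
        # ConnectionError is a subclass of OSError, so refused or reset connections land here.
        if proxy is None:
            raise
        return await operation(proxy)


async def flaky(proxy: Optional[dict]) -> dict:
    # Hypothetical operation that only succeeds when a proxy is supplied.
    if proxy is None:
        raise ConnectionError("direct connection refused")
    return {"ok": True, "via": proxy["addr"]}


outcome = asyncio.run(with_proxy_fallback(flaky, {"addr": "127.0.0.1"}))
```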

getscipapers_hoanganhduc.nexus.print_default_paths()[source]

Print all default file and directory paths used by the script.

async getscipapers_hoanganhduc.nexus.main()[source]

Utility functions for querying the Library Genesis catalog.

These helpers scrape search results and fetch download links so they can be orchestrated by the higher-level request flows. Network and HTML parsing logic live here to keep the CLI modules focused on argument handling.

getscipapers_hoanganhduc.libgen.select_active_libgen_domain(mirrors=['libgen.li', 'libgen.vg', 'libgen.la', 'libgen.bz', 'libgen.gl'], timeout=3)[source]

Returns the first LibGen domain that responds to a simple GET request. Falls back to the default if none respond.
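
As a rough illustration of the mirror selection described above, the sketch below probes candidates in order and returns the first one that answers. It is not the library's implementation: first_active and http_probe are hypothetical names, the use of HTTPS and urllib is an assumption, and the fallback domain is passed in explicitly because the source does not say which one is the default.

```python
import urllib.request
from typing import Callable, Iterable


def http_probe(domain: str, timeout: float) -> bool:
    """Return True if a plain GET to the domain completes with a non-error status."""
    try:
        with urllib.request.urlopen(f"https://{domain}/", timeout=timeout) as response:
            return 200 <= response.status < 400
    except Exception:
        # Any failure (DNS, timeout, TLS, HTTP error) is treated as "not responding".
        return False


def first_active(
    mirrors: Iterable[str],
    default: str,
    timeout: float = 3,
    probe: Callable[[str, float], bool] = http_probe,
) -> str:
    """Return the first mirror for which `probe` succeeds, else `default`."""
    for domain in mirrors:
        if probe(domain, timeout):
            return domain
    return default
```

Passing the probe as a parameter keeps the selection logic testable without network access, for example first_active(mirrors, default, probe=lambda domain, timeout: domain == "b.example").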

getscipapers_hoanganhduc.libgen.get_default_download_folder()[source]

Returns the default Downloads folder path for the current OS. Creates the folder if it does not exist.

getscipapers_hoanganhduc.libgen.get_default_cache_dir()[source]

Returns the default cache directory for the current OS. Creates the folder if it does not exist.

getscipapers_hoanganhduc.libgen.search_libgen_by_doi(doi, limit=10)[source]

Search for documents on LibGen using a DOI number via the JSON API, and fetch additional details from the edition page. If found, also search Crossref to update missing or incorrect information if possible.

Parameters:
  • doi (str) – The DOI number to search for.

  • limit (int) – Maximum number of results to return.

Returns:

Matching documents with extra details, or empty dict if none found.

Return type:

dict

getscipapers_hoanganhduc.libgen.print_libgen_doi_result(result)[source]

Pretty-print the result of a LibGen DOI search using icons. Only print fields that have non-empty values. Formats ‘series’ to better display journal, volume, and issue if possible. If ‘pages’ looks like an article number (i.e., only a single number, not a range), display as ‘Article Number’.
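
The article-number rule mentioned above (a single number rather than a range) could be expressed as a one-line check. This is a sketch of that rule only. pages_label is a hypothetical helper, and the real function may recognise more formats.

```python
import re


def pages_label(pages: str) -> str:
    """Return 'Article Number' for a lone number and 'Pages' for anything else."""
    return "Article Number" if re.fullmatch(r"\d+", pages.strip()) else "Pages"
```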

getscipapers_hoanganhduc.libgen.download_libgen_paper_by_doi(doi, dest_folder=None, preferred_exts=None, verbose=False, print_result=True)[source]

Download the first available file for a given DOI from LibGen.

Parameters:
  • doi (str) – The DOI number to search and download.

  • dest_folder (str) – Folder to save the downloaded file. If None, uses default.

  • preferred_exts (list) – List of preferred file extensions (e.g., [“pdf”, “epub”]).

  • verbose (bool) – If True, print debug information.

  • print_result (bool) – If True, print download summary. If False, suppress output.

Returns:

File path if download succeeded, None otherwise.

Return type:

str or None

getscipapers_hoanganhduc.libgen.search_libgen_by_query(query, limit=10, object_type='f', curtab='f', verbose=False, sort_by_year=True, order_desc=True)[source]

Search for documents on LibGen using a query string by parsing the HTML results. If a DOI is found, also search Crossref to update missing or incorrect information if possible.

Parameters:
  • query (str) – The search query.

  • limit (int) – Maximum number of results to return.

  • object_type (str) – The object type parameter for LibGen (default “f”).

  • curtab (str) – The curtab parameter for LibGen (default “f”).

  • verbose (bool) – If True, print debug information.

  • sort_by_year (bool) – If True, sort results by year.

  • order_desc (bool) – If True, sort descending (newest first).

Returns:

List of matching documents (dicts), or empty list if none found.

Return type:

list
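
The effect of sort_by_year and order_desc can be illustrated with a small sorting helper. The sketch assumes each result dict stores its year under a "year" key and that entries without a usable year are placed last. Both points are assumptions, and order_by_year is a hypothetical name.

```python
from typing import List, Optional


def order_by_year(results: List[dict], order_desc: bool = True) -> List[dict]:
    """Sort results by year, keeping entries without a numeric year at the end."""

    def year_of(item: dict) -> Optional[int]:
        text = str(item.get("year", "")).strip()
        return int(text) if text.isdecimal() else None

    dated = [item for item in results if year_of(item) is not None]
    undated = [item for item in results if year_of(item) is None]
    dated.sort(key=year_of, reverse=order_desc)
    return dated + undated


sample_results = [
    {"title": "a", "year": "2001"},
    {"title": "b"},
    {"title": "c", "year": 2020},
]
newest_first = [item["title"] for item in order_by_year(sample_results)]
oldest_first = [item["title"] for item in order_by_year(sample_results, order_desc=False)]
```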

getscipapers_hoanganhduc.libgen.print_libgen_query_results(results)[source]

Pretty-print the results of a LibGen query search using icons and numbering. Handles ‘series’ text for journal/volume/issue, and prints ‘pages’ as article number if appropriate.

getscipapers_hoanganhduc.libgen.interactive_libgen_download(query, limit=10, preferred_exts=None, dest_folder=None, verbose=False)[source]

Search LibGen for a query, print results, and interactively ask user which to download. User can select a single index or a range (e.g., 2-4). Tries all available mirrors for each selected result until download succeeds or all fail. At the end, prints a summary of successful and failed downloads. If verbose is False, only the summary is printed.
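
Accepting either a single index or a range such as 2-4 comes down to a small parsing step. The following sketch shows one possible reading. It assumes 1-based, inclusive indices and silently drops out-of-range values. parse_selection is a hypothetical helper, not the package's function.

```python
from typing import List


def parse_selection(text: str, count: int) -> List[int]:
    """Turn '3' or '2-4' (1-based, inclusive) into a list of valid result indices."""
    text = text.strip()
    if "-" in text:
        start_text, end_text = text.split("-", 1)
    else:
        start_text = end_text = text
    try:
        start, end = int(start_text), int(end_text)
    except ValueError:
        # Not a number or a well-formed range: select nothing.
        return []
    if start > end:
        start, end = end, start
    return [index for index in range(start, end + 1) if 1 <= index <= count]
```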

getscipapers_hoanganhduc.libgen.fetch_libgen_edition_info(libgen_id, verbose=False)[source]

Fetch extra info from edition.php for a given LibGen ID.

Parameters:
  • libgen_id (str) – The LibGen edition ID.

  • verbose (bool) – If True, print debug info.

Returns:

Extracted info dictionary, or empty dict if not found.

Return type:

dict

getscipapers_hoanganhduc.libgen.file_md5sum(path)[source]
getscipapers_hoanganhduc.libgen.is_file_on_libgen(md5sum, verbose=False)[source]

Check if a file with the given md5sum already exists in LibGen.

Parameters:
  • md5sum (str) – The md5sum of the file.

  • verbose (bool) – If True, print debug info.

Returns:

The file URL if it exists, else None.

Return type:

str or None
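
The md5sum argument is presumably the hexadecimal MD5 digest of the file contents, as the name file_md5sum suggests. A minimal sketch of computing such a digest with the standard library follows. md5_of_file is a hypothetical name, and reading in chunks is a design choice to avoid loading large files into memory.

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Small demonstration on a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
sample_digest = md5_of_file(tmp.name)
os.remove(tmp.name)
```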

getscipapers_hoanganhduc.libgen.upload_file_to_libgen_ftp(filepath, username='anonymous', password='', verbose=False)[source]

Upload a file to ftp://ftp.libgen.bz/upload and return the file URL if successful. Before uploading, check if the file (by md5sum) already exists in LibGen.

Parameters:
  • filepath (str) – Path to the file to upload.

  • username (str) – FTP username (default: ‘anonymous’).

  • password (str) – FTP password (default: ‘’).

  • verbose (bool) – If True, print debug info.

Returns:

The URL of the uploaded file if successful, else None.

Return type:

str or None

getscipapers_hoanganhduc.libgen.create_chrome_driver(headless=True, extra_prefs=None)[source]

Create and return a Selenium Chrome WebDriver with default user data directory and options.

getscipapers_hoanganhduc.libgen.selenium_libgen_login(username='genesis', password='upload', headless=True, verbose=False)[source]

Open Chrome with Selenium, load http://libgen.li/librarian.php, find and follow the login link if present, and log in with phpBB forum settings. Checks “remember me” and “hide my online status this session” before logging in. If already logged in (detected by the presence of the upload form), skips the login step.

getscipapers_hoanganhduc.libgen.selenium_libgen_upload(local_file_path, bib_id, username='genesis', password='upload', headless=True, verbose=False)[source]

Upload a local file to http://libgen.li/librarian.php after logging in with Selenium. Fills the FTP path in the upload form and clicks the Upload button. After upload, finds the bibliography search form, selects the appropriate source (crossref for DOI, goodreads for ISBN), fills the bib_id in the bibliography search input, and clicks the Search button. Then waits for a while and clicks the Register button.

Parameters:
  • local_file_path (str) – Path to the local file to upload.

  • bib_id (str) – DOI or ISBN to associate with the upload.

  • username (str) – LibGen username (default: ‘genesis’).

  • password (str) – LibGen password (default: ‘upload’).

  • headless (bool) – Run browser in headless mode.

  • verbose (bool) – Print debug info.

Returns:

True if upload succeeded, False otherwise.

Return type:

bool
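
Choosing the bibliography source from bib_id (crossref for a DOI, goodreads for an ISBN) requires telling the two identifier types apart. The sketch below shows one plausible heuristic. The patterns are assumptions, and bibliography_source is a hypothetical helper rather than the package's logic.

```python
import re


def bibliography_source(bib_id: str) -> str:
    """Return 'crossref' for a DOI-like string and 'goodreads' for an ISBN-like one."""
    cleaned = bib_id.strip()
    # DOIs start with a "10." prefix followed by a registrant code and a slash.
    if re.match(r"^10\.\d{4,9}/\S+$", cleaned):
        return "crossref"
    # ISBN-10 (last character may be X) or ISBN-13, ignoring hyphens and spaces.
    digits = re.sub(r"[\s-]", "", cleaned)
    if re.fullmatch(r"\d{9}[\dXx]|\d{13}", digits):
        return "goodreads"
    raise ValueError(f"not recognised as a DOI or ISBN: {bib_id!r}")
```

The check is purely syntactic: it does not validate ISBN check digits or confirm that the DOI resolves.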

getscipapers_hoanganhduc.libgen.upload_and_register_to_libgen(filepath, verbose=False, headless=True)[source]

Upload and register a file to LibGen using Selenium automation. Tries to extract DOI or ISBN from the file name. If found, registers the file with that ID. If not found, uploads to FTP only (not registered in LibGen database). If the file is a PDF, tries to extract DOI from the PDF using getpapers.extract_doi_from_pdf.

Parameters:
  • filepath (str) – Path to the file to upload.

  • verbose (bool) – Enable verbose/debug output.

  • headless (bool) – Run browser in headless mode.

Returns:

URL of the uploaded file if successful, else None.

Return type:

str or None

getscipapers_hoanganhduc.libgen.main()[source]

Integration helpers for Z-Library via the third-party Zlibrary API.

These utilities provide a thin wrapper around the upstream client to keep the rest of the codebase consistent with other source modules. Configuration helpers mirror the patterns used elsewhere in the package for clarity.

getscipapers_hoanganhduc.zlib.get_default_config_dir()[source]
getscipapers_hoanganhduc.zlib.get_default_download_dir()[source]
getscipapers_hoanganhduc.zlib.save_credentials(email=None, password=None)[source]
getscipapers_hoanganhduc.zlib.load_credentials(credentials_path=None)[source]

Load credentials from the given path or the default config file. Returns a list: [email, password]. If credentials_path is specified, load from it and save to default location if different. If not specified but default config exists, load default config. If neither exists, prompt user to input and save.
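
The lookup order described above (explicit path, then default config, then interactive prompt) can be summarised as a small decision function. This is a sketch of the ordering only. credential_source and the file names are hypothetical, and treating a missing explicit path like an unspecified one is an assumption.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def credential_source(credentials_path: Optional[str], default_path: str) -> str:
    """Return which source would be used: 'explicit', 'default', or 'prompt'."""
    if credentials_path is not None and Path(credentials_path).is_file():
        return "explicit"
    if Path(default_path).is_file():
        return "default"
    return "prompt"


# Demonstration with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    default_file = os.path.join(folder, "default.json")
    explicit_file = os.path.join(folder, "explicit.json")
    missing_everywhere = credential_source(None, default_file)
    Path(default_file).write_text("{}")
    default_only = credential_source(None, default_file)
    Path(explicit_file).write_text("{}")
    explicit_wins = credential_source(explicit_file, default_file)
```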

getscipapers_hoanganhduc.zlib.prompt_and_save_credentials()[source]

Prompt the user to input Z-library email and password. If the input is different from the saved credentials, save to default config location. If no response after 30 seconds, quit. Also sets global EMAIL and PASSWORD after user input.

getscipapers_hoanganhduc.zlib.search_zlibrary_books(query, limit=20, email=None, password=None, sort_by_year=True)[source]

Search for books in Z-library using the Zlibrary-API wrapper.

Parameters:
  • query (str) – The search query (book title, author, etc.).

  • limit (int) – Number of results to return.

  • email (str, optional) – Z-library email for login.

  • password (str, optional) – Z-library password for login.

  • sort_by_year (bool) – If True, sort results by year (descending).

Returns:

List of book results (dicts), or empty list if none found.

Return type:

list

getscipapers_hoanganhduc.zlib.print_book_details(book)[source]

Print detailed information about a book result in a human-readable format.

getscipapers_hoanganhduc.zlib.get_profile(email=None, password=None)[source]

Get the user’s Z-library profile information.

Get most popular books (optionally for a specific language).

getscipapers_hoanganhduc.zlib.get_recently()[source]

Get recently added books.

Get user recommended books.

getscipapers_hoanganhduc.zlib.get_user_saved(email=None, password=None, order=None, page=None, limit=20)[source]

Get books saved by the user.

getscipapers_hoanganhduc.zlib.get_user_downloaded(email=None, password=None, order=None, page=None, limit=None)[source]

Get books downloaded by the user.

getscipapers_hoanganhduc.zlib.get_book_info(bookid, hashid, language=None)[source]

Get detailed info for a book.

getscipapers_hoanganhduc.zlib.download_book(book, email=None, password=None, download_dir=None)[source]

Download a book using the Zlibrary API.

getscipapers_hoanganhduc.zlib.is_logged_in(email=None, password=None)[source]

Check if the user is logged in.

getscipapers_hoanganhduc.zlib.interactive_login_search_download(query=None, download_dir=None, limit=20, sort_by_year=True)[source]

Login, search, print results, and allow user to select (single or range) books to download. Optionally takes a search query, a download directory, a limit on number of results, and sort_by_year.

getscipapers_hoanganhduc.zlib.main()[source]