API Reference
Core Modules
Centralized configuration and credential utilities for getscipapers.
This module keeps runtime settings in one place to reduce the amount of
cross-module global state. Functions that previously lived in
getpapers.py now reside here so other modules can import and share a
single source of truth for paths and credentials.
- class getscipapers_hoanganhduc.configuration.Credentials(email='', elsevier_api_key='', wiley_tdm_token='', ieee_api_key='')[source]
Bases: object
- Parameters:
email (str)
elsevier_api_key (str)
wiley_tdm_token (str)
ieee_api_key (str)
- email: str = ''
- elsevier_api_key: str = ''
- wiley_tdm_token: str = ''
- ieee_api_key: str = ''
- getscipapers_hoanganhduc.configuration.ensure_directory_exists(path)[source]
- Return type:
None
- Parameters:
path (Path)
- getscipapers_hoanganhduc.configuration.get_default_download_folder(create=False)[source]
- Return type:
str
- Parameters:
create (bool)
- getscipapers_hoanganhduc.configuration.load_credentials(config_file=None, interactive=None, env_prefix='GETSCIPAPERS_', verbose=False)[source]
- Return type:
- Parameters:
config_file (str | None)
interactive (bool | None)
env_prefix (str)
verbose (bool)
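load_credentials accepts an env_prefix (default 'GETSCIPAPERS_'). The sketch below shows one plausible way such a prefix could map onto the Credentials fields. It assumes each variable is named as the prefix plus the upper-cased field name (for example GETSCIPAPERS_ELSEVIER_API_KEY). That naming scheme and the helper credentials_from_env are hypothetical, not the package's documented API, and the sketch ignores config files and interactive prompts.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass
class Credentials:
    # Mirrors the fields documented for configuration.Credentials.
    email: str = ""
    elsevier_api_key: str = ""
    wiley_tdm_token: str = ""
    ieee_api_key: str = ""


def credentials_from_env(
    env_prefix: str = "GETSCIPAPERS_",
    environ: Optional[Mapping[str, str]] = None,
) -> Credentials:
    """Hypothetical helper: read <PREFIX><FIELD_NAME_UPPER> for each field."""
    env = os.environ if environ is None else environ
    values = {}
    for f in fields(Credentials):
        # Assumed naming scheme, e.g. GETSCIPAPERS_IEEE_API_KEY.
        value = env.get(env_prefix + f.name.upper(), "").strip()
        if value:
            values[f.name] = value
    return Credentials(**values)
```

Passing environ explicitly keeps the helper testable without touching the real process environment; unset variables fall back to the dataclass defaults of ''.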
- getscipapers_hoanganhduc.configuration.require_email(email=None)[source]
- Return type:
str
- Parameters:
email (str | None)
- getscipapers_hoanganhduc.configuration.save_credentials(email=None, elsevier_api_key=None, wiley_tdm_token=None, ieee_api_key=None, config_file=None, verbose=False)[source]
- Return type:
bool
- Parameters:
email (str | None)
elsevier_api_key (str | None)
wiley_tdm_token (str | None)
ieee_api_key (str | None)
config_file (str | None)
verbose (bool)
Core search and retrieval workflow for getpapers CLI invocations.
This module coordinates searches across Nexus, CrossRef, Unpaywall, and
publisher APIs, while handling caching, configuration, and output formatting.
Functions here are designed for reuse by other modules (for example
request.py) and are intentionally asynchronous-aware so they can run in
concurrent contexts.
- getscipapers_hoanganhduc.getpapers.ensure_directory_exists(path)[source]
- Return type:
None
- Parameters:
path (str)
- getscipapers_hoanganhduc.getpapers.save_credentials(email=None, elsevier_api_key=None, wiley_tdm_token=None, ieee_api_key=None, config_file=None)[source]
- Parameters:
email (str | None)
elsevier_api_key (str | None)
wiley_tdm_token (str | None)
ieee_api_key (str | None)
config_file (str | None)
- getscipapers_hoanganhduc.getpapers.normalize_db_selection(db)[source]
Normalize the --db selection to a concrete list of services. The CLI accepts comma-delimited strings or multiple --db flags. Any request containing "all", or no explicit services, resolves to the full list defined in DB_CHOICES.
- Return type:
list[str]
- Parameters:
db (str | list[str] | tuple[str, ...] | None)
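The normalization rule above can be sketched as follows. The DB_CHOICES values here are hypothetical placeholders (the real list lives in getpapers), and lower-casing and de-duplicating the names are assumptions; the sketch does not reject unknown service names.

```python
from typing import List, Sequence, Union

# Hypothetical service list; the real DB_CHOICES in getpapers may differ.
DB_CHOICES = ["nexus", "scihub", "anna", "unpaywall"]


def normalize_db_selection(db: Union[str, Sequence[str], None]) -> List[str]:
    """Sketch: flatten comma-delimited strings and repeated --db values."""
    if db is None:
        return list(DB_CHOICES)
    items = [db] if isinstance(db, str) else list(db)
    selected: List[str] = []
    for item in items:
        for part in item.split(","):
            name = part.strip().lower()
            if name and name not in selected:
                selected.append(name)
    # "all" anywhere, or nothing explicit, expands to every service.
    if not selected or "all" in selected:
        return list(DB_CHOICES)
    return selected
```

For example, normalize_db_selection(["nexus", "unpaywall,nexus"]) keeps the first-seen order and yields ["nexus", "unpaywall"].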
- getscipapers_hoanganhduc.getpapers.load_credentials(config_file=None, interactive=None, env_prefix='GETSCIPAPERS_')[source]
- Parameters:
config_file (str | None)
interactive (bool | None)
env_prefix (str)
- getscipapers_hoanganhduc.getpapers.fetch_crossref_data(doi)[source]
Fetch data from Crossref API for a given DOI. Returns the message part of the response if successful, None otherwise.
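Crossref's public REST API serves a single work at https://api.crossref.org/works/{doi} and wraps the record in a JSON envelope whose "message" field holds the metadata. A minimal sketch of that lookup, assuming plain urllib rather than whatever HTTP client the package actually uses; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_work_url(doi: str) -> str:
    # Keep "/" unescaped; other reserved characters are percent-encoded.
    return CROSSREF_WORKS + urllib.parse.quote(doi.strip(), safe="/")


def extract_message(payload: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the "message" object of a Crossref response, or None."""
    if payload.get("status") != "ok":
        return None
    message = payload.get("message")
    return message if isinstance(message, dict) else None


def fetch_crossref_message(doi: str, timeout: float = 10.0) -> Optional[Dict[str, Any]]:
    try:
        with urllib.request.urlopen(crossref_work_url(doi), timeout=timeout) as resp:
            return extract_message(json.load(resp))
    except (OSError, ValueError):
        # Network errors, HTTP errors (e.g. 404 for unknown DOIs) and bad JSON.
        return None
```

Splitting URL building and envelope parsing from the network call keeps the parsing logic testable offline.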
- async getscipapers_hoanganhduc.getpapers.is_open_access_unpaywall(doi, email=None)[source]
Check if a DOI is open access using the Unpaywall API. Returns True if open access, False otherwise.
- Return type:
bool
- Parameters:
doi (str)
email (str | None)
- getscipapers_hoanganhduc.getpapers.resolve_pii_to_doi(pii)[source]
Try to resolve a ScienceDirect PII to a DOI using Elsevier’s API. Returns the DOI string if found, otherwise None.
- Return type:
str
- Parameters:
pii (str)
- getscipapers_hoanganhduc.getpapers.extract_mdpi_doi_from_url(url)[source]
Try to extract an MDPI DOI from a URL. Returns the DOI string if found, otherwise None.
- Return type:
str
- Parameters:
url (str)
- getscipapers_hoanganhduc.getpapers.fetch_dois_from_url(url, doi_pattern)[source]
Fetch a URL and extract DOIs from its content. Returns a list with up to 3 valid DOIs found, or an empty list if none.
- Return type:
list
- Parameters:
url (str)
doi_pattern (str)
- getscipapers_hoanganhduc.getpapers.is_valid_doi(doi)[source]
Check whether a single DOI is valid using the DOI System Proxy Server REST API. Returns True if the DOI exists and resolves properly. Falls back to Crossref if the REST API is unavailable.
- Return type:
bool
- Parameters:
doi (str)
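The DOI Proxy Server REST API answers https://doi.org/api/handles/{doi} with a JSON body whose responseCode is 1 when the handle exists (100 means not found). The sketch below mirrors the "proxy first, Crossref as fallback" order described above. fetch_handle and in_crossref are hypothetical injected callables (in practice an HTTP GET returning parsed JSON and a Crossref lookup), and treating a missing or unreadable answer as "API unavailable" is an assumption.

```python
import urllib.parse
from typing import Any, Callable, Dict, Optional

# responseCode 1 = handle found (100 = handle not found).
HANDLE_FOUND = 1

HandleFetcher = Callable[[str], Optional[Dict[str, Any]]]


def doi_handle_url(doi: str) -> str:
    return "https://doi.org/api/handles/" + urllib.parse.quote(doi.strip(), safe="/")


def is_valid_doi_sketch(
    doi: str,
    fetch_handle: HandleFetcher,
    in_crossref: Callable[[str], bool],
) -> bool:
    """Ask the DOI proxy first; use Crossref only if the proxy gives no usable answer."""
    try:
        payload = fetch_handle(doi_handle_url(doi))
    except OSError:
        payload = None  # network error: treat the proxy as unavailable
    if isinstance(payload, dict) and "responseCode" in payload:
        return payload["responseCode"] == HANDLE_FOUND
    return in_crossref(doi)
```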
- getscipapers_hoanganhduc.getpapers.validate_dois(dois)[source]
Given a list of DOIs, return only those that are valid (resolve at doi.org or found in Crossref).
- Return type:
list
- Parameters:
dois (list)
- getscipapers_hoanganhduc.getpapers.extract_isbns_from_text(text)[source]
Extract ISBN-13 (preferred) and ISBN-10 numbers from text content. Returns a list of (isbn, doi) tuples, preferring ISBN-13 if found, otherwise ISBN-10. Only includes valid ISBNs (according to Crossref) and their associated DOI(s) if available. If multiple DOIs are found for an ISBN, tries to extract the common DOI prefix (e.g., <common doi>.ch001, <common doi>.ch002). If the common prefix is not a valid DOI, returns None for DOI. Prints details with vprint. Only extracts ISBN-10 if no ISBN-13 is found.
- Return type:
list
- Parameters:
text (str)
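The "common DOI prefix" step for books whose chapters carry DOIs like <common doi>.ch001 and <common doi>.ch002 could look roughly like this. It is a heuristic sketch tuned only to the .chNNN pattern mentioned above; the helper name is hypothetical, is_valid stands in for the package's Crossref-based check, and the example DOIs in the usage note are illustrative.

```python
import os.path
import re
from typing import Callable, List, Optional

# Matches a partial chapter label left over after taking the common prefix,
# e.g. the ".ch00" in "bk-2009-1234.ch00" (from ".ch001" / ".ch002").
_CHAPTER_TAIL = re.compile(r"\.(?:ch?\d*)?$")


def common_doi_prefix(dois: List[str], is_valid: Callable[[str], bool]) -> Optional[str]:
    """Sketch: derive a book-level DOI from chapter DOIs sharing one stem."""
    if not dois:
        return None
    if len(dois) == 1:
        return dois[0]
    # DOIs are case-insensitive, so compare them lower-cased.
    prefix = os.path.commonprefix([d.strip().lower() for d in dois])
    registrant, sep, suffix = prefix.partition("/")
    if not sep or not suffix:
        return None  # no shared registrant prefix plus suffix start
    suffix = _CHAPTER_TAIL.sub("", suffix).rstrip("._-")
    if not suffix:
        return None
    candidate = f"{registrant}/{suffix}"
    # Mirrors the documented rule: an invalid common prefix yields None.
    return candidate if is_valid(candidate) else None
```

With the illustrative inputs "10.1021/bk-2009-1234.ch001" and "10.1021/bk-2009-1234.ch002", the candidate is "10.1021/bk-2009-1234", returned only if is_valid accepts it.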
- getscipapers_hoanganhduc.getpapers.extract_dois_from_text(text)[source]
Extract DOI numbers from text content. Returns a list of unique, valid paper DOIs. Only keeps DOIs that resolve at https://doi.org/<doi> (HTTP 200, 301, 302). If no DOI is found, tries to extract ISBN and resolve to DOI.
- Return type:
list
- Parameters:
text (str)
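Before any network validation, candidate DOIs have to be pulled out of free text. A common approach, sketched here, is the regular expression Crossref suggests for modern DOIs, followed by trimming trailing punctuation and case-insensitive de-duplication. Whether extract_dois_from_text uses this exact pattern is not documented, and the sketch omits the doi.org resolution check (HTTP 200, 301, 302) and the ISBN fallback.

```python
import re
from typing import List

# Pattern suggested by Crossref for modern DOIs (case-insensitive).
DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def find_doi_candidates(text: str) -> List[str]:
    seen = set()
    found: List[str] = []
    for match in DOI_RE.finditer(text):
        doi = match.group(0).rstrip(".,;:")
        # Drop closing parentheses that were not opened inside the DOI itself.
        while doi.endswith(")") and doi.count(")") > doi.count("("):
            doi = doi[:-1].rstrip(".,;:")
        key = doi.lower()  # DOIs are case-insensitive
        if key not in seen:
            seen.add(key)
            found.append(doi)
    return found
```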
- getscipapers_hoanganhduc.getpapers.extract_doi_from_title(title)[source]
Search Crossref for a given paper title and return the DOI if there is a unique match. If Crossref returns more than one matching item, return None.
- Return type:
str
- Parameters:
title (str)
- getscipapers_hoanganhduc.getpapers.extract_dois_from_file(input_file)[source]
Extract DOI numbers from a text file and write them to a new file. Also tries to extract Elsevier PII numbers from the file name and resolve them to DOIs. Additionally attempts to extract ISBN numbers from the file name and resolve them to DOIs via Crossref. As a final fallback, use the file name (cleaned) as a title and try to extract a DOI via Crossref title search. Returns the list of extracted DOIs. Prints status messages with icons for better readability.
- Parameters:
input_file (str)
- getscipapers_hoanganhduc.getpapers.extract_text_from_pdf(pdf_file, max_pages=None)[source]
Extract text from a PDF file using PyMuPDF (pymupdf) if available, otherwise fall back to PyPDF2. Uses text blocks to intelligently preserve document structure including paragraphs and headings. Returns the extracted text as a string. If max_pages is specified, only extract up to the first N pages.
- Return type:
str
- Parameters:
pdf_file (str)
max_pages (int)
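PyMuPDF's page.get_text("blocks") returns tuples of the form (x0, y0, x1, y1, text, block_no, block_type), where block_type 0 is text and 1 is an image. A minimal sketch of turning those blocks into paragraph-preserving text follows. It is kept as a pure function so it runs without PyMuPDF installed; the ordering rule (top-to-bottom, then left-to-right) is an assumption, not necessarily what extract_text_from_pdf does.

```python
from typing import Iterable, List, Tuple

# Shape of one entry from PyMuPDF's page.get_text("blocks"):
# (x0, y0, x1, y1, text, block_no, block_type), block_type 0 = text, 1 = image.
Block = Tuple[float, float, float, float, str, int, int]


def join_text_blocks(blocks: Iterable[Block]) -> str:
    """Join text blocks top-to-bottom, left-to-right, one paragraph per block."""
    text_blocks: List[Block] = [b for b in blocks if b[6] == 0]
    text_blocks.sort(key=lambda b: (b[1], b[0]))  # y0 first, then x0
    paragraphs = [b[4].strip() for b in text_blocks if b[4].strip()]
    return "\n\n".join(paragraphs)
```

With PyMuPDF installed, the blocks for each page would come from page.get_text("blocks") on a document opened with pymupdf.open(pdf_file) (older releases import the package as fitz), and max_pages would simply stop the page loop early. Sorting purely by (y0, x0) interleaves two-column layouts row by row, a known limitation of this simple ordering.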
- getscipapers_hoanganhduc.getpapers.extract_doi_from_pdf(pdf_file)[source]
Extract the most likely DOI found in a PDF file. If multiple DOIs are found, fetch the paper title from Crossref for each DOI, and check if a similar title exists in the first page of the PDF. Select the DOI whose title matches; if none match, select the first found. Also tries to extract Elsevier PII numbers from the file name and resolve them to DOIs. Only considers the first five pages of the PDF. Keeps newlines intact when extracting text from PDF pages. Prints additional debugging details in verbose mode.
Fallback: if no DOI can be extracted from text or PII, try to extract ISBN(s) from the file name and resolve them to DOI(s) via Crossref (using extract_isbns_from_text).
- Return type:
str
- Parameters:
pdf_file (str)
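The "similar title on the first page" rule leaves the similarity measure unspecified. One possible choice, sketched below as an assumption, scores each candidate by the longest contiguous run of its title words found on the page (via difflib) and accepts it above a hypothetical 0.85 threshold; otherwise the first DOI wins, as documented. The titles mapping stands in for the Crossref title lookups.

```python
import difflib
import re
from typing import Dict, List, Optional


def _words(text: str) -> List[str]:
    # Lower-case word tokens, so line breaks and punctuation are ignored.
    return re.findall(r"[a-z0-9]+", text.lower())


def title_match_score(title: str, page_text: str) -> float:
    """Fraction of the title's words found as one contiguous run in the page."""
    t, p = _words(title), _words(page_text)
    if not t or not p:
        return 0.0
    # autojunk=False: frequent words like "the" must not be discarded as junk.
    sm = difflib.SequenceMatcher(None, t, p, autojunk=False)
    match = sm.find_longest_match(0, len(t), 0, len(p))
    return match.size / len(t)


def pick_doi_by_title(
    dois: List[str], titles: Dict[str, str], first_page_text: str, threshold: float = 0.85
) -> Optional[str]:
    """Return the first DOI whose title appears on the page, else the first DOI."""
    if not dois:
        return None
    for doi in dois:
        if title_match_score(titles.get(doi, ""), first_page_text) >= threshold:
            return doi
    return dois[0]
```

Tokenizing into words makes the match insensitive to the line breaks that PDF titles often contain.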
- async getscipapers_hoanganhduc.getpapers.search_documents(query, limit=1)[source]
Search for documents using StcGeck, Nexus bot, Crossref, and DOI REST API in order. Build a StcGeck-style document with all fields empty, and iteratively fill fields by searching each source in order. Return up to the requested limit of results. Always tries all sources before returning results. Prints important search steps with icons for better readability.
- Parameters:
query (str)
limit (int)
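The field-filling strategy of search_documents (start from an empty StcGeck-style document, query every source in order, fill only fields that are still empty) can be sketched with asyncio. The field subset and the source callables are hypothetical stand-ins for the StcGeck, Nexus bot, Crossref, and DOI REST API lookups, and the limit handling is omitted.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Source = Callable[[str], Awaitable[Dict[str, Any]]]


def empty_document() -> Dict[str, Any]:
    # Hypothetical subset of fields; the real StcGeck schema is larger.
    return {"doi": "", "title": "", "authors": [], "year": None, "publisher": ""}


def _is_empty(value: Any) -> bool:
    return value is None or value == "" or value == [] or value == {}


async def fill_document(query: str, sources: List[Source]) -> Dict[str, Any]:
    """Query every source in priority order; earlier sources win per field."""
    doc = empty_document()
    for source in sources:
        try:
            found = await source(query)
        except Exception:
            continue  # one failing source should not abort the search
        for key, value in (found or {}).items():
            if key in doc and _is_empty(doc[key]) and not _is_empty(value):
                doc[key] = value
    return doc
```

Because a later source never overwrites a filled field, the order of the sources list encodes their priority.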
- async getscipapers_hoanganhduc.getpapers.search_with_nexus_bot(query, limit=1)[source]
Search for documents using the Nexus bot (functions imported from .nexus). Returns a list of ScoredDocument-like objects with a .document JSON string. Tries first without proxy, then with proxy if it fails.
- Parameters:
query (str)
limit (int)
- getscipapers_hoanganhduc.getpapers.convert_nexus_to_stc_format(nexus_item)[source]
Convert a Nexus bot result (raw dict) to a list of StcGeck compatible documents. Handles both search (multiple results) and DOI (single result) formats. Returns a list of dicts (one per result).
- async getscipapers_hoanganhduc.getpapers.search_with_crossref(query, limit=1)[source]
- Parameters:
query (str)
limit (int)
- getscipapers_hoanganhduc.getpapers.convert_crossref_to_stc_format(crossref_item)[source]
Convert a Crossref API result to a StcGeck-compatible format.
- getscipapers_hoanganhduc.getpapers.fetch_doi_rest_api(doi, params=None)[source]
Fetch DOI metadata using the DOI Proxy REST API. Returns the parsed JSON response, or None if not found/error.
- Return type:
dict
- Parameters:
doi (str)
params (dict)
- getscipapers_hoanganhduc.getpapers.convert_doi_rest_to_stc_format(rest_data)[source]
Convert DOI REST API response to StcGeck compatible document format. Only fills fields available in the REST API response. Handles cases where ‘DESCRIPTION’, ‘EMAIL’, etc. may not be present.
- Return type:
dict
- Parameters:
rest_data (dict)
- async getscipapers_hoanganhduc.getpapers.search_with_doi_rest_api(query, limit=1)[source]
Search for a DOI using the DOI REST API and convert to StcGeck format. Returns a list of ScoredDocument-like objects.
- Parameters:
query (str)
limit (int)
- async getscipapers_hoanganhduc.getpapers.search_and_print(query, limit)[source]
- Parameters:
query (str)
limit (int)
- getscipapers_hoanganhduc.getpapers.is_elsevier_doi(doi)[source]
Check if a DOI is published by Elsevier. First, try to fetch metadata from DOI REST API and check if publisher is Elsevier. If not available, fallback to prefix/domain check. Returns True if the DOI is published by Elsevier.
- Return type:
bool
- Parameters:
doi (str)
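The "metadata first, prefix as fallback" check can be sketched generically. The prefix table is illustrative only (10.1016 for Elsevier; 10.1002 and 10.1111 for Wiley, which is_wiley_doi below presumably covers the same way); publishers own more prefixes than listed. lookup_publisher is a hypothetical callable standing in for the DOI REST API query, and the domain check is omitted.

```python
from typing import Callable, Optional

# Illustrative prefixes only; publishers own more prefixes than listed here.
PUBLISHER_PREFIXES = {
    "elsevier": ("10.1016/",),
    "wiley": ("10.1002/", "10.1111/"),
}


def is_publisher_doi(
    doi: str,
    publisher: str,
    lookup_publisher: Callable[[str], Optional[str]],
) -> bool:
    """Sketch: trust registry metadata first, then fall back to the DOI prefix."""
    try:
        name = lookup_publisher(doi)
    except OSError:
        name = None  # metadata service unreachable
    if name:
        return publisher.lower() in name.lower()
    prefixes = PUBLISHER_PREFIXES.get(publisher.lower(), ())
    return doi.strip().lower().startswith(prefixes)
```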
- async getscipapers_hoanganhduc.getpapers.download_elsevier_pdf_by_doi(doi, download_folder='/home/runner/Downloads/getscipapers/getpapers', api_key=None)[source]
Try to download a PDF from Elsevier Full-Text API using DOI. Returns True if successful, else False.
- Parameters:
doi (str)
download_folder (str)
api_key (str | None)
- getscipapers_hoanganhduc.getpapers.is_wiley_doi(doi)[source]
Check if a DOI is published by Wiley. First, try to fetch metadata from DOI REST API and check if publisher is Wiley. If not available, fallback to prefix/domain check. Returns True if the DOI is published by Wiley.
- Return type:
bool
- Parameters:
doi (str)
- async getscipapers_hoanganhduc.getpapers.download_wiley_pdf_by_doi(doi, download_folder='/home/runner/Downloads/getscipapers/getpapers', tdm_token=None)[source]
Attempt to download a PDF from Wiley using the DOI and Wiley-TDM-Client-Token. Returns True if successful, else False.
- Return type:
bool
- Parameters:
doi (str)
download_folder (str)
tdm_token (str | None)
- getscipapers_hoanganhduc.getpapers.is_pmc_doi(doi)[source]
Check if a DOI is associated with PubMed Central (PMC). Returns True if the DOI can be found in PMC via NCBI E-utilities.
- Return type:
bool
- Parameters:
doi (str)
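A PMC lookup through NCBI E-utilities typically goes through esearch.fcgi, whose JSON output reports the number of hits as a string under esearchresult.count. The sketch builds such a query and reads the count. Using the "[doi]" field tag in the search term is an assumption, and the helper names are hypothetical.

```python
import urllib.parse
from typing import Any, Dict

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pmc_esearch_url(doi: str) -> str:
    # Assumption: PMC accepts a "[doi]" field tag in the search term.
    params = {"db": "pmc", "term": f"{doi}[doi]", "retmode": "json"}
    return ESEARCH + "?" + urllib.parse.urlencode(params)


def pmc_hit_count(payload: Dict[str, Any]) -> int:
    # esearch JSON reports the count as a string, e.g. {"count": "1"}.
    try:
        return int(payload["esearchresult"]["count"])
    except (KeyError, TypeError, ValueError):
        return 0
```

A DOI would then count as "in PMC" when pmc_hit_count of the parsed response is greater than zero.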
- async getscipapers_hoanganhduc.getpapers.download_from_pmc(doi, download_folder='/home/runner/Downloads/getscipapers/getpapers')[source]
Download a PDF from PubMed Central (PMC) using the DOI. Returns True if successful, else False.
- Return type:
bool
- Parameters:
doi (str)
download_folder (str)
- async getscipapers_hoanganhduc.getpapers.download_from_unpaywall(doi, download_folder='/home/runner/Downloads/getscipapers/getpapers', email=None)[source]
Download all available open access PDFs for a DOI via Unpaywall. Each PDF is saved as <safe_doi>_unpaywall_file1.pdf, <safe_doi>_unpaywall_file2.pdf, etc. Returns True if at least one PDF was downloaded, else False. Always sends custom request headers to avoid HTTP 418 responses. If the DOI is from PMC, Elsevier or Wiley, the corresponding API is tried first.
- Parameters:
doi (str)
download_folder (str)
email (str | None)
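Unpaywall's v2 API (https://api.unpaywall.org/v2/{doi}?email=...) returns an is_oa flag and an oa_locations list whose entries may carry a url_for_pdf. The sketch plans the downloads described above by pairing each distinct PDF URL with a <safe_doi>_unpaywall_fileN.pdf name. The safe_doi sanitizer is hypothetical (the package's actual rule may differ), and the HTTP download and custom headers are left out.

```python
import re
import urllib.parse
from typing import Any, Dict, List, Tuple


def unpaywall_url(doi: str, email: str) -> str:
    # Unpaywall v2 requires an email query parameter on every request.
    query = urllib.parse.urlencode({"email": email})
    return f"https://api.unpaywall.org/v2/{urllib.parse.quote(doi, safe='/')}?{query}"


def safe_doi(doi: str) -> str:
    # Hypothetical sanitizer: replace anything outside [A-Za-z0-9._-] with "_".
    return re.sub(r"[^A-Za-z0-9._-]+", "_", doi.strip())


def planned_downloads(doi: str, record: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Pair each distinct OA PDF URL with a <safe_doi>_unpaywall_fileN.pdf name."""
    if record.get("is_oa") is not True:
        return []
    urls: List[str] = []
    for loc in record.get("oa_locations") or []:
        url = (loc or {}).get("url_for_pdf")
        if url and url not in urls:
            urls.append(url)
    stem = safe_doi(doi)
    return [(url, f"{stem}_unpaywall_file{i}.pdf") for i, url in enumerate(urls, 1)]
```

The same is_oa flag is what an open access check such as is_open_access_unpaywall would read from the response.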
- async getscipapers_hoanganhduc.getpapers.download_from_nexus(id, doi, download_folder='/home/runner/Downloads/getscipapers/getpapers')[source]
- Parameters:
id (str)
doi (str)
download_folder (str)
- async getscipapers_hoanganhduc.getpapers.download_from_nexus_bot(doi, download_folder='/home/runner/Downloads/getscipapers/getpapers')[source]
Download a PDF by DOI using the Nexus bot (via .nexus module). Returns True if successful, else False. Uses decide_proxy_usage function to determine whether to use proxy.
- Parameters:
doi (str)
download_folder (str)
- async getscipapers_hoanganhduc.getpapers.download_from_scihub(doi, download_folder='/home/runner/Downloads/getscipapers/getpapers')[source]
- Parameters:
doi (str)
download_folder (str)
- async getscipapers_hoanganhduc.getpapers.download_from_anna_archive(doi, download_folder='/home/runner/Downloads/getscipapers/getpapers')[source]
- Parameters:
doi (str)
download_folder (str)
- async getscipapers_hoanganhduc.getpapers.download_by_doi(doi, download_folder='/home/runner/Downloads/getscipapers/getpapers', db='all', no_download=False)[source]
- Parameters:
doi (str)
download_folder (str)
db (str | list[str] | tuple[str, ...])
no_download (bool)
- async getscipapers_hoanganhduc.getpapers.download_by_doi_list(doi_file, download_folder='/home/runner/Downloads/getscipapers/getpapers', db='all', no_download=False)[source]
- Parameters:
doi_file (str)
download_folder (str)
db (str | list[str] | tuple[str, ...])
no_download (bool)
Service Integrations
Async interactions with the Nexus Telegram bot.
The routines here handle authentication, command dispatch, and output parsing
for the Nexus search bot. They are structured around Telethon event loops
so they can be driven from the CLI without blocking other concurrent work.
- getscipapers_hoanganhduc.nexus.setup_logging(log_file=None, verbose=False)[source]
Set up the logging configuration.
- getscipapers_hoanganhduc.nexus.debug_print(message)[source]
Print debug message if verbose mode is enabled
- getscipapers_hoanganhduc.nexus.get_file_paths()[source]
Get the appropriate file paths based on the operating system, using a single config dir for all except downloads.
- getscipapers_hoanganhduc.nexus.get_free_proxies()[source]
Retrieve and store free proxies using the shared proxy helper.
- getscipapers_hoanganhduc.nexus.test_proxy_speed(ip, port, timeout=10)[source]
Test proxy speed by making a simple HTTP request through the proxy
- Parameters:
ip – Proxy IP address
port – Proxy port
timeout – Request timeout in seconds
- Returns:
Response time in milliseconds (0 if failed)
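Timing one request through a proxy, with 0 signalling failure as documented, can be sketched with the standard library. The probe URL is a hypothetical placeholder (the URL the package actually requests is not documented), and the sketch assumes an HTTP proxy rather than SOCKS.

```python
import http.client
import time
import urllib.request

# Hypothetical probe target; the URL the package actually requests is not documented.
PROBE_URL = "http://example.com/"


def proxy_response_ms(ip: str, port: int, timeout: float = 10, url: str = PROBE_URL) -> float:
    """Time one HTTP GET through an HTTP proxy; return 0 if anything fails."""
    proxy = f"http://{ip}:{port}"
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    start = time.perf_counter()
    try:
        with opener.open(url, timeout=timeout) as resp:
            resp.read(1024)  # wait until the first bytes of the body arrive
    except (OSError, ValueError, http.client.HTTPException):
        return 0.0
    return (time.perf_counter() - start) * 1000.0
```

Returning 0 on failure keeps the "faster is better" sort simple, but callers must filter out zeros before ranking proxies.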
- getscipapers_hoanganhduc.nexus.load_proxy_config(proxy)[source]
Load proxy configuration from file or dict
- async getscipapers_hoanganhduc.nexus.test_proxy_telegram_connection(proxy_config, timeout=10)[source]
Test whether a proxy can connect to Telegram, following the OONI probe methodology for testing Telegram connectivity.
- async getscipapers_hoanganhduc.nexus.test_and_select_working_proxy()[source]
Test multiple proxies in parallel and select the first working one for Telegram
- async getscipapers_hoanganhduc.nexus.test_telegram_connection(api_id, api_hash, phone_number, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None)[source]
Test connection to Telegram servers with comprehensive diagnostics
- Parameters:
api_id – Your Telegram API ID
api_hash – Your Telegram API hash
phone_number – Your phone number (not used, kept for compatibility)
session_file – Name of the session file
proxy – Proxy configuration dict or file path
- async getscipapers_hoanganhduc.nexus.decide_proxy_usage(api_id, api_hash, phone_number, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy_file='/home/runner/.config/getscipapers/nexus/proxy.json', print_result=True)[source]
Decide whether to use a proxy for the Telegram connection. If the connection works without a proxy, return None (no proxy). If not, try the default proxy file. If that fails, select a new proxy and try again.
- Returns:
None if no proxy is needed, proxy_file if a proxy is needed, False if neither works.
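The three-way decision order above can be sketched as an async function. probe and find_new_proxy are hypothetical injected coroutines: probe(None) tests a direct connection, probe(path) tests the proxy stored in a file, and find_new_proxy() is assumed to write a freshly selected proxy into that file.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Union

Probe = Callable[[Optional[str]], Awaitable[bool]]


async def decide_proxy(
    probe: Probe,
    proxy_file: str,
    find_new_proxy: Callable[[], Awaitable[bool]],
) -> Union[None, str, bool]:
    """Sketch of the documented order: direct, saved proxy, then a new proxy."""
    if await probe(None):
        return None  # direct connection works: no proxy needed
    if await probe(proxy_file):
        return proxy_file  # the saved proxy works
    if await find_new_proxy() and await probe(proxy_file):
        return proxy_file  # a newly selected proxy works
    return False  # neither works
```

Since None and False are both falsy, callers have to distinguish "no proxy needed" from "nothing works" with `is None` / `is False` rather than a plain truth test.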
- getscipapers_hoanganhduc.nexus.create_telegram_client(api_id, api_hash, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None)[source]
Create TelegramClient with or without proxy
- getscipapers_hoanganhduc.nexus.extract_button_info(reply_markup)[source]
Extract button information from reply markup
- getscipapers_hoanganhduc.nexus.create_message_handler(bot_entity)[source]
Create message handler for bot replies
- async getscipapers_hoanganhduc.nexus.wait_for_reply(get_bot_reply, timeout=30)[source]
Wait for bot reply with timeout
- async getscipapers_hoanganhduc.nexus.handle_search_message(get_bot_reply, set_bot_reply)[source]
Handle ‘searching…’ message and wait for actual result
- async getscipapers_hoanganhduc.nexus.fetch_recent_messages(client, bot_entity, sent_message)[source]
Fetch recent messages from bot if no immediate reply
- async getscipapers_hoanganhduc.nexus.click_callback_button(api_id, api_hash, phone_number, bot_username, message_id, button_data, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None)[source]
Click a callback button in a bot’s message
- Parameters:
api_id – Your Telegram API ID
api_hash – Your Telegram API hash
phone_number – Your phone number (not used, kept for compatibility)
bot_username – Bot’s username
message_id – ID of the message containing the button
button_data – The callback data of the button to click
session_file – Name of the session file
proxy – Proxy configuration dict with keys: type, addr, port, username, password Example: {‘type’: ‘http’, ‘addr’: ‘127.0.0.1’, ‘port’: 8080} or {‘type’: ‘socks5’, ‘addr’: ‘127.0.0.1’, ‘port’: 1080, ‘username’: ‘user’, ‘password’: ‘pass’} or string path to JSON file containing proxy configuration
- async getscipapers_hoanganhduc.nexus.send_message_to_bot(api_id, api_hash, phone_number, bot_username, message, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None, limit=None)[source]
Send a message from your user account to a Telegram bot and wait for its reply.
- Parameters:
api_id – Your Telegram API ID (get from my.telegram.org)
api_hash – Your Telegram API hash
phone_number – Your phone number
bot_username – Bot’s username (e.g., ‘your_bot_name’)
message – Message text to send (search query or DOI)
session_file – Name of the session file to save/load
proxy – Proxy configuration dict or file path (see create_telegram_client)
limit – Maximum number of search results to fetch (default: 1 for DOI, 5 for search; can be set by user)
- Returns:
{
"ok": True if successful, False or "error" key otherwise,
"sent_message": {"message_id": int, "date": float (timestamp), "text": str},
"bot_reply": {"message_id": int, "date": float (timestamp), "text": str (reply text, possibly concatenated for search), "buttons": list of dicts with button info (text, type, callback_data/url)}
}
If an error occurs, returns {"error": "…"}.
- Return type:
dict
- async getscipapers_hoanganhduc.nexus.create_session(api_id, api_hash, phone_number, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session')[source]
Create a new session file interactively
- getscipapers_hoanganhduc.nexus.format_result(result)[source]
Format the result in a human-readable way
- getscipapers_hoanganhduc.nexus.handle_single_search_result(bot_reply)[source]
Handle a single search result based on whether the first callback button contains “Request”
- Parameters:
bot_reply – Dictionary containing bot reply with buttons
- Returns:
Dictionary with action type and relevant information
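The decision rule (does the first callback button contain "Request"?) can be sketched against the bot_reply shape documented for send_message_to_bot, whose buttons carry text, type, and callback_data or url. The "callback" type value and the action labels in the returned dictionary are assumptions, as is treating every non-request button as a download.

```python
from typing import Any, Dict, List


def classify_single_result(bot_reply: Dict[str, Any]) -> Dict[str, Any]:
    """Pick an action from the first callback button of a single search result."""
    buttons: List[Dict[str, Any]] = bot_reply.get("buttons") or []
    # Assumption: callback buttons carry type == "callback".
    callbacks = [b for b in buttons if b.get("type") == "callback"]
    if not callbacks:
        return {"action": "none"}
    first = callbacks[0]
    text = first.get("text", "")
    # Hypothetical action labels: "request" vs "download".
    action = "request" if "Request" in text else "download"
    return {"action": action, "button_text": text, "callback_data": first.get("callback_data")}
```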
- async getscipapers_hoanganhduc.nexus.handle_button_click_logic(bot_reply, proxy=None)[source]
Handle button clicks based on the button text, prompting the user interactively.
- Parameters:
bot_reply – Dictionary containing bot reply with buttons
proxy – Proxy configuration (same format as other functions)
- Returns:
Dictionary with click result or None if no action needed
- async getscipapers_hoanganhduc.nexus.download_telegram_file(client, message, download_path=None)[source]
Download a file from a Telegram message
- Parameters:
client – TelegramClient instance
message – Telegram message containing the file
download_path – Path where to save the file (optional)
- Returns:
Dictionary with download result
- async getscipapers_hoanganhduc.nexus.handle_file_download_from_bot_reply(bot_reply, proxy=None)[source]
Handle file download from bot reply if it contains a document
- Parameters:
bot_reply – Dictionary containing bot reply information
proxy – Proxy configuration (same format as other functions)
- Returns:
Dictionary with download result or None if no file to download
- getscipapers_hoanganhduc.nexus.get_input_with_timeout(prompt, timeout=30, default='y', keep_origin=False)[source]
Get user input with a timeout; return the default value if the timeout expires.
- async getscipapers_hoanganhduc.nexus.load_credentials_from_file(credentials_path, print_result=True)[source]
Load API credentials from JSON file, validate, and prompt user if invalid or missing.
- async getscipapers_hoanganhduc.nexus.test_credentials(api_id, api_hash, phone_number, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None)[source]
Test if the provided Telegram API credentials are correct by attempting to connect and authorize. Returns a dictionary with the result.
- async getscipapers_hoanganhduc.nexus.setup_proxy_configuration(proxy_arg)[source]
Set up the proxy configuration: load an existing one or find a new working proxy.
- async getscipapers_hoanganhduc.nexus.handle_request_button(button_text, callback_data, message_id, proxy_to_use)[source]
Handle request button click
- async getscipapers_hoanganhduc.nexus.handle_download_button(button_text, callback_data, message_id, proxy_to_use)[source]
Handle download button click
- getscipapers_hoanganhduc.nexus.extract_file_size_from_callback_data(callback_data)[source]
Extract file size information from callback data
- Parameters:
callback_data – The callback data string that might contain file size info
- Returns:
Dictionary with size information or None if not found
- getscipapers_hoanganhduc.nexus.extract_file_size_from_button_text(button_text)[source]
Extract file size information from button text
- Parameters:
button_text – The button text string that might contain file size info
- Returns:
Dictionary with size information or None if not found
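Size extraction from button text can be sketched with a regular expression that finds the first "<number> <unit>" pair. The accepted units, the 1024-based multipliers, and the keys of the returned dictionary are all assumptions; the package's actual button format and result structure are not documented here.

```python
import re
from typing import Dict, Optional, Union

# "<number> <unit>", allowing thousands separators such as "1,024 KB".
_SIZE_RE = re.compile(r"\b(\d[\d,]*(?:\.\d+)?)\s*(KB|MB|GB|B)\b", re.IGNORECASE)
# Assumption: binary (1024-based) units.
_FACTORS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def size_from_text(text: Optional[str]) -> Optional[Dict[str, Union[float, str, int]]]:
    """Return the first file size mentioned in the text, or None."""
    match = _SIZE_RE.search(text or "")
    if not match:
        return None
    value = float(match.group(1).replace(",", ""))
    unit = match.group(2).upper()
    return {"value": value, "unit": unit, "bytes": int(value * _FACTORS[unit])}
```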
- async getscipapers_hoanganhduc.nexus.wait_and_download_file(click_result, proxy_to_use)[source]
Wait for file upload to Telegram and download it
- async getscipapers_hoanganhduc.nexus.process_callback_buttons(bot_reply, proxy_to_use)[source]
Process callback buttons from bot reply
- async getscipapers_hoanganhduc.nexus.get_latest_messages_from_bot(api_id, api_hash, bot_username, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', limit=10, proxy=None)[source]
Get the latest messages from a bot
- Parameters:
api_id – Your Telegram API ID
api_hash – Your Telegram API hash
bot_username – Bot’s username
session_file – Name of the session file
limit – Maximum number of messages to retrieve (default: 10)
proxy – Proxy configuration dict or file path
- Returns:
Dictionary with success status and messages list
- async getscipapers_hoanganhduc.nexus.get_user_profile(api_id, api_hash, phone_number, bot_username, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None)[source]
Get user profile information from Nexus bot by sending /profile command
- Parameters:
api_id – Your Telegram API ID
api_hash – Your Telegram API hash
phone_number – Your phone number (not used, kept for compatibility)
bot_username – Bot’s username
session_file – Name of the session file
proxy – Proxy configuration dict or file path
- Returns:
Dictionary with user profile information or error
- getscipapers_hoanganhduc.nexus.format_profile_result(profile_result)[source]
Format the profile result in a human-readable way
- getscipapers_hoanganhduc.nexus.format_messages_result(messages_result)[source]
Format the messages result in a human-readable way
- async getscipapers_hoanganhduc.nexus.fetch_and_display_recent_messages(api_id, api_hash, bot_username, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', limit=10, proxy=None, display=True)[source]
Fetch recent messages from a bot and optionally display them
- Parameters:
api_id – Your Telegram API ID
api_hash – Your Telegram API hash
bot_username – Bot’s username
session_file – Name of the session file
limit – Maximum number of messages to retrieve (default: 10, max: 100)
proxy – Proxy configuration dict or file path
display – Whether to display formatted results (default: True)
- Returns:
Dictionary with success status and messages list
- async getscipapers_hoanganhduc.nexus.fetch_nexus_aaron_messages(api_id, api_hash, phone_number, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', limit=10, proxy=None, display=True)[source]
Fetch recent messages from the @nexus_aaron bot specifically
- Parameters:
api_id – Your Telegram API ID
api_hash – Your Telegram API hash
phone_number – Your phone number (not used, kept for compatibility)
session_file – Name of the session file
limit – Maximum number of messages to retrieve (default: 10, max: 100)
proxy – Proxy configuration dict or file path
display – Whether to display formatted results (default: True)
- Returns:
Dictionary with success status and messages list from @nexus_aaron
- getscipapers_hoanganhduc.nexus.format_nexus_aaron_messages(messages_result)[source]
Format nexus_aaron messages with specialized formatting for research requests
- getscipapers_hoanganhduc.nexus.get_publisher_name_from_doi(doi)[source]
Extract publisher name from DOI using Crossref API
- Parameters:
doi – DOI string (e.g., “10.1038/nature12373”)
- Returns:
Publisher name string or None if not found
- getscipapers_hoanganhduc.nexus.parse_nexus_aaron_request(text)[source]
Parse a nexus_aaron request message to extract structured information
- Parameters:
text – The raw message text from nexus_aaron
- Returns:
Dictionary with parsed information
- getscipapers_hoanganhduc.nexus.parse_nexus_aaron_upload(text)[source]
Parse a nexus_aaron upload/voting message to extract structured information
- Parameters:
text – The raw message text from nexus_aaron upload
- Returns:
Dictionary with parsed upload information
- async getscipapers_hoanganhduc.nexus.upload_file_to_bot(api_id, api_hash, phone_number, bot_username, file_path, message='', session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None)[source]
Upload a file to a Telegram bot with optional message
- Parameters:
api_id – Your Telegram API ID
api_hash – Your Telegram API hash
phone_number – Your phone number (not used, kept for compatibility)
bot_username – Bot’s username
file_path – Path to the file to upload
message – Optional message to send with the file (default: “”)
session_file – Name of the session file
proxy – Proxy configuration dict or file path
- Returns:
Dictionary with upload result and bot reply
- getscipapers_hoanganhduc.nexus.format_upload_result(upload_result)[source]
Format the upload result in a human-readable way
- async getscipapers_hoanganhduc.nexus.upload_file_to_nexus_aaron(api_id, api_hash, phone_number, file_path, message='', session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None)[source]
Upload a file to the @nexus_aaron bot specifically
- Parameters:
api_id – Your Telegram API ID
api_hash – Your Telegram API hash
phone_number – Your phone number (not used, kept for compatibility)
file_path – Path to the file to upload
message – Optional message to send with the file (default: “”)
session_file – Name of the session file
proxy – Proxy configuration dict or file path
- Returns:
Dictionary with upload result and bot reply from @nexus_aaron
- async getscipapers_hoanganhduc.nexus.simple_upload_to_nexus_aaron(file_path, verbose=False)[source]
Upload a file to the @nexus_aaron bot with minimal input. If the file is a PDF, try to extract the DOI using getpapers. If DOI extraction fails, prompt the user to enter a DOI manually (with a timeout).
- Parameters:
file_path (str) – Path to the file to upload.
verbose (bool) – If True, enable verbose output.
- Returns:
Upload result.
- Return type:
dict
- getscipapers_hoanganhduc.nexus.format_nexus_aaron_upload_result(upload_result)[source]
Format the nexus_aaron upload result with specialized formatting
- async getscipapers_hoanganhduc.nexus.list_and_reply_to_nexus_aaron_message(api_id, api_hash, phone_number, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', limit=10, proxy=None)[source]
List recent research request messages from @nexus_aaron, allow user to select one, and upload a file as reply
- Parameters:
api_id – Your Telegram API ID
api_hash – Your Telegram API hash
phone_number – Your phone number (not used, kept for compatibility)
session_file – Name of the session file
limit – Maximum number of messages to retrieve (default: 10, max: 50)
proxy – Proxy configuration dict or file path
- Returns:
Dictionary with operation result
- getscipapers_hoanganhduc.nexus.format_list_and_reply_result(result)[source]
Format the list and reply result in a human-readable way
- async getscipapers_hoanganhduc.nexus.check_doi_availability_on_nexus(api_id, api_hash, phone_number, bot_username, doi, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None, download=False)[source]
Check if a DOI is available on Nexus by sending it to the bot and analyzing the response
- Parameters:
api_id – Your Telegram API ID
api_hash – Your Telegram API hash
phone_number – Your phone number (not used, kept for compatibility)
bot_username – Bot’s username
doi – DOI number to check (e.g., “10.1038/nature12373”)
session_file – Name of the session file
proxy – Proxy configuration dict or file path
download – If True, automatically download the paper if available (default: False)
- Returns:
Dictionary with availability status and details, including download result if applicable
- getscipapers_hoanganhduc.nexus.format_doi_availability_result(availability_result)[source]
Format the DOI availability result in a human-readable way
- async getscipapers_hoanganhduc.nexus.batch_check_doi_availability(api_id, api_hash, phone_number, bot_username, doi_list, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None, delay=2, download=False)[source]
Check availability of multiple DOIs on Nexus with rate limiting and optional auto-download
- Parameters:
api_id – Your Telegram API ID
api_hash – Your Telegram API hash
phone_number – Your phone number (not used, kept for compatibility)
bot_username – Bot’s username
doi_list – List of DOI strings to check
session_file – Name of the session file
proxy – Proxy configuration dict or file path
delay – Delay in seconds between requests to avoid rate limiting (default: 2)
download – If True, automatically download papers that are available (default: False)
- Returns:
Dictionary with batch results including download information
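The rate-limited batch loop can be sketched as a sequential async function that sleeps delay seconds between checks and records failures instead of aborting. check is a hypothetical coroutine standing in for check_doi_availability_on_nexus, and the "available", "total", and "results" keys are assumptions about the result shape.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Checker = Callable[[str], Awaitable[Dict[str, Any]]]


async def batch_check(doi_list: List[str], check: Checker, delay: float = 2) -> Dict[str, Any]:
    """Sketch: check DOIs one at a time, sleeping `delay` seconds between calls."""
    results: List[Dict[str, Any]] = []
    for i, doi in enumerate(doi_list):
        if i and delay > 0:
            await asyncio.sleep(delay)  # simple rate limiting between requests
        try:
            result = await check(doi)
        except Exception as exc:  # keep going; record the failure
            result = {"error": str(exc)}
        results.append({"doi": doi, **result})
    available = sum(1 for r in results if r.get("available"))
    return {"total": len(results), "available": available, "results": results}
```

The loop is deliberately sequential rather than using asyncio.gather: the point of delay is to space requests to the bot out, which concurrent dispatch would defeat.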
- getscipapers_hoanganhduc.nexus.format_batch_doi_results(batch_results)[source]
Format the batch DOI results in a human-readable way.
- async getscipapers_hoanganhduc.nexus.download_from_nexus_bot(doi, download_dir=None, bot_username=None)[source]
Download a paper from Nexus by DOI.
- Parameters:
doi – DOI to search for and download (e.g., “10.1038/nature12373”)
download_dir – Target directory to save the file (optional, uses default if None)
bot_username – Bot username to use (optional, uses global BOT_USERNAME if None)
- Returns:
Dictionary with download result and file information
- getscipapers_hoanganhduc.nexus.format_download_from_nexus_bot_result(download_result)[source]
Format the download result in a human-readable way.
- async getscipapers_hoanganhduc.nexus.request_paper_by_doi(api_id, api_hash, phone_number, bot_username, doi, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None)[source]
Request a paper from Nexus by DOI. This will send the DOI to the bot, detect if a request is needed, and click the request button if available.
- Parameters:
api_id – Telegram API ID
api_hash – Telegram API hash
phone_number – Your phone number (not used, kept for compatibility)
bot_username – Username of the Nexus Telegram bot
doi – DOI string to request (e.g., “10.1038/nature12373”)
session_file – Path to the Telegram session file
proxy – Proxy configuration dict or file path
- Returns:
A dictionary of the form:
{
“ok”: True if the request was sent, otherwise False or “error”,
“doi”: <doi>,
“request_sent”: True/False,
“details”: …,
}
- Return type:
dict
- async getscipapers_hoanganhduc.nexus.batch_request_papers_by_doi(api_id, api_hash, phone_number, bot_username, doi_list, session_file='/home/runner/.config/getscipapers/nexus/telegram_session.session', proxy=None, delay=2)[source]
Request multiple papers from Nexus by DOI. For each DOI, send it to the bot, detect whether a request is needed, and click the request button if available.
- Parameters:
api_id – Telegram API ID
api_hash – Telegram API hash
phone_number – Your phone number (not used, kept for compatibility)
bot_username – Username of the Nexus Telegram bot
doi_list – List of DOI strings to request
session_file – Path to the Telegram session file
proxy – Proxy configuration dict or file path
delay – Delay in seconds between requests (default: 2)
- Returns:
A dictionary of the form:
{
“total”: int,
“requested”: int,
“skipped”: int,
“errors”: int,
“results”: list of per-DOI results,
}
- Return type:
dict
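One plausible way to derive the summary counts from the per-DOI results is sketched below. The counting rules are assumptions, not taken from the source: an “ok” value of “error” is counted as an error, a true “request_sent” as a request, and anything else as skipped. summarize_requests is a hypothetical helper.

```python
from typing import Dict, List


def summarize_requests(results: List[Dict]) -> Dict:
    """Tally per-DOI request results into a batch summary (assumed rules)."""
    summary = {
        "total": len(results),
        "requested": 0,
        "skipped": 0,
        "errors": 0,
        "results": results,
    }
    for item in results:
        if item.get("ok") == "error":
            summary["errors"] += 1       # assumption: "error" marks a failure
        elif item.get("request_sent"):
            summary["requested"] += 1    # the request button was clicked
        else:
            summary["skipped"] += 1      # assumption: no request was needed
    return summary
```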
- async getscipapers_hoanganhduc.nexus.request_papers_by_doi_list(doi_list)[source]
Request one or more papers by DOI using the Nexus bot. Attempts a direct connection first and falls back to a proxy if needed.
- Parameters:
doi_list (list) – List of DOI strings.
- Returns:
Summary of request results.
- Return type:
dict
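The direct-then-proxy fallback can be sketched as follows, assuming the connection attempt is an async callable that accepts an optional proxy and raises OSError (or asyncio.TimeoutError) when the direct route fails. with_proxy_fallback, flaky_attempt and the proxy dict layout are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional


async def with_proxy_fallback(
    attempt: Callable[[Optional[Any]], Awaitable[Dict]],
    proxy: Any,
) -> Dict:
    """Run attempt(None) for a direct connection; on failure retry with the proxy."""
    try:
        return await attempt(None)
    except (OSError, asyncio.TimeoutError) as direct_error:
        # The direct route failed; retry once through the configured proxy.
        result = await attempt(proxy)
        result.setdefault("fallback_reason", str(direct_error))
        return result


async def flaky_attempt(proxy: Optional[Any]) -> Dict:
    # Hypothetical offline stand-in: the direct route is blocked, the proxy works.
    if proxy is None:
        raise ConnectionError("direct connection blocked")
    return {"ok": True, "via": proxy}


outcome = asyncio.run(
    with_proxy_fallback(flaky_attempt, {"addr": "127.0.0.1", "port": 1080})
)
```

Only connection-level errors trigger the retry, so a logical failure (for example, a DOI the bot rejects) is returned as-is instead of being repeated through the proxy.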
- getscipapers_hoanganhduc.nexus.print_default_paths()[source]
Print all default file and directory paths used by the script.
Utility functions for querying the Library Genesis catalog.
These helpers scrape search results and fetch download links so they can be orchestrated by the higher-level request flows. Network and HTML parsing logic live here to keep the CLI modules focused on argument handling.
- getscipapers_hoanganhduc.libgen.select_active_libgen_domain(mirrors=['libgen.li', 'libgen.vg', 'libgen.la', 'libgen.bz', 'libgen.gl'], timeout=3)[source]
Returns the first LibGen domain that responds to a simple GET request. Falls back to the default if none respond.
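A minimal standard-library sketch of first-responder mirror selection. It assumes mirrors are probed over HTTPS at their root path and that the fallback is a fixed domain; both are assumptions, and http_ok and first_responsive are hypothetical names. The probe is a parameter so it can be swapped out in tests.

```python
import http.client
import urllib.request
from typing import Callable, Iterable


def http_ok(domain: str, timeout: float) -> bool:
    """Return True if a GET to https://<domain>/ succeeds within `timeout` seconds."""
    try:
        with urllib.request.urlopen(f"https://{domain}/", timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def first_responsive(
    mirrors: Iterable[str],
    timeout: float = 3,
    probe: Callable[[str, float], bool] = http_ok,
    default: str = "libgen.li",  # assumed fallback domain
) -> str:
    """Return the first mirror whose probe succeeds, else `default`."""
    for domain in mirrors:
        if probe(domain, timeout):
            return domain
    return default
```

Probing in list order means the worst case costs one timeout per unreachable mirror, which is why a short timeout (3 seconds by default) matters.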
- getscipapers_hoanganhduc.libgen.get_default_download_folder()[source]
Returns the default Downloads folder path for the current OS. Creates the folder if it does not exist.
- getscipapers_hoanganhduc.libgen.get_default_cache_dir()[source]
Returns the default cache directory for the current OS. Creates the folder if it does not exist.
- getscipapers_hoanganhduc.libgen.search_libgen_by_doi(doi, limit=10)[source]
Search LibGen for documents matching a DOI via the JSON API, and fetch additional details from the edition page. When a match is found, also query Crossref to fill in missing or incorrect metadata where possible.
- Parameters:
doi (str) – The DOI to search for.
limit (int) – Maximum number of results to return.
- Returns:
Matching documents with extra details, or an empty dict if none are found.
- Return type:
dict
- getscipapers_hoanganhduc.libgen.print_libgen_doi_result(result)[source]
Pretty-print the result of a LibGen DOI search using icons. Only print fields that have non-empty values. Formats ‘series’ to better display journal, volume, and issue if possible. If ‘pages’ looks like an article number (i.e., only a single number, not a range), display as ‘Article Number’.
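The article-number rule for “pages” can be expressed as a small check: a value made only of digits is shown as an article number, anything else (such as a range) as pages. This is a hedged sketch; pages_label is a hypothetical helper and omits the icons the real printer uses.

```python
import re


def pages_label(pages: str) -> str:
    """Return a display line, treating a lone number as an article number."""
    value = pages.strip()
    if re.fullmatch(r"\d+", value):
        # A single number with no range is assumed to be an article number.
        return f"Article Number: {value}"
    return f"Pages: {value}"
```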
- getscipapers_hoanganhduc.libgen.download_libgen_paper_by_doi(doi, dest_folder=None, preferred_exts=None, verbose=False, print_result=True)[source]
Download the first available file for a given DOI from LibGen.
- Parameters:
doi (str) – The DOI to search for and download.
dest_folder (str) – Folder in which to save the downloaded file. If None, the default download folder is used.
preferred_exts (list) – List of preferred file extensions (e.g., [“pdf”, “epub”]).
verbose (bool) – If True, print debug information.
print_result (bool) – If True, print download summary. If False, suppress output.
- Returns:
File path if download succeeded, None otherwise.
- Return type:
str or None
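The effect of preferred_exts can be sketched as a ranking over the available files: the first extension in the list that matches wins, and the first available file is used when nothing matches. The file-entry shape (an “extension” key) and the fallback rule are assumptions; pick_file is hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def pick_file(
    files: List[Dict], preferred_exts: Optional[Sequence[str]] = None
) -> Optional[Dict]:
    """Pick the file whose extension appears earliest in preferred_exts."""
    if not files:
        return None
    for ext in preferred_exts or []:
        wanted = ext.lower().lstrip(".")
        for entry in files:
            if str(entry.get("extension", "")).lower() == wanted:
                return entry
    # No preference given, or nothing matched: fall back to the first file.
    return files[0]
```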
- getscipapers_hoanganhduc.libgen.search_libgen_by_query(query, limit=10, object_type='f', curtab='f', verbose=False, sort_by_year=True, order_desc=True)[source]
Search LibGen with a query string by parsing the HTML results page. For results that include a DOI, also query Crossref to fill in missing or incorrect metadata where possible.
- Parameters:
query (str) – The search query.
limit (int) – Maximum number of results to return.
object_type (str) – The object type parameter for LibGen (default “f”).
curtab (str) – The curtab parameter for LibGen (default “f”).
verbose (bool) – If True, print debug information.
sort_by_year (bool) – If True, sort results by year.
order_desc (bool) – If True, sort descending (newest first).
- Returns:
List of matching documents (dicts), or empty list if none found.
- Return type:
list
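The sort_by_year and order_desc options can be sketched as follows, assuming each result carries a “year” field that may be missing or empty. Placing entries without a usable year last is an assumption, and sort_by_year here is a hypothetical standalone function.

```python
from typing import Dict, List


def sort_by_year(results: List[Dict], order_desc: bool = True) -> List[Dict]:
    """Sort results by their 'year' field; entries without a usable year go last."""

    def year_of(item: Dict) -> int:
        text = str(item.get("year", "")).strip()
        return int(text[:4]) if text[:4].isdigit() else -1

    known = [r for r in results if year_of(r) >= 0]
    unknown = [r for r in results if year_of(r) < 0]
    return sorted(known, key=year_of, reverse=order_desc) + unknown
```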
- getscipapers_hoanganhduc.libgen.print_libgen_query_results(results)[source]
Pretty-print the results of a LibGen query search using icons and numbering. Handles ‘series’ text for journal/volume/issue, and prints ‘pages’ as article number if appropriate.
- getscipapers_hoanganhduc.libgen.interactive_libgen_download(query, limit=10, preferred_exts=None, dest_folder=None, verbose=False)[source]
Search LibGen for a query, print the results, and interactively ask the user which ones to download. The user can select a single index or a range (e.g., 2-4). For each selected result, all available mirrors are tried until the download succeeds or every mirror fails. At the end, a summary of successful and failed downloads is printed. If verbose is False, only the summary is printed.
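Parsing a selection such as “3” or “2-4” can be sketched as below. The sketch assumes 1-based indices matching the printed numbering and accepts reversed ranges; parse_selection is a hypothetical helper, not the module's code.

```python
from typing import List


def parse_selection(text: str, max_index: int) -> List[int]:
    """Parse '3' or '2-4' into 1-based indices within 1..max_index."""
    text = text.strip()
    if "-" in text:
        start_text, end_text = text.split("-", 1)
        start, end = int(start_text), int(end_text)
        if start > end:
            start, end = end, start  # accept reversed ranges such as "4-2"
        indices = list(range(start, end + 1))
    else:
        indices = [int(text)]
    if any(i < 1 or i > max_index for i in indices):
        raise ValueError(f"selection must be between 1 and {max_index}")
    return indices
```

Non-numeric input also raises ValueError (from int), so a caller can handle every invalid selection with a single except clause.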
- getscipapers_hoanganhduc.libgen.fetch_libgen_edition_info(libgen_id, verbose=False)[source]
Fetch extra info from edition.php for a given LibGen ID.
- Parameters:
libgen_id (str) – The LibGen edition ID.
verbose (bool) – If True, print debug info.
- Returns:
Extracted info dictionary, or empty dict if not found.
- Return type:
dict
- getscipapers_hoanganhduc.libgen.is_file_on_libgen(md5sum, verbose=False)[source]
Check whether a file with the given MD5 checksum already exists in LibGen.
- Parameters:
md5sum (str) – The MD5 checksum of the file.
verbose (bool) – If True, print debug info.
- Returns:
The file URL if it exists, else None.
- Return type:
str or None
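The md5sum argument is the hex MD5 digest of the file's contents. A minimal way to compute it with hashlib, reading the file in chunks so large files are not loaded into memory at once, is sketched below; file_md5 is a hypothetical helper.

```python
import hashlib
import os
import tempfile


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage: hash a small temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_digest = file_md5(tmp.name)
os.remove(tmp.name)
```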
- getscipapers_hoanganhduc.libgen.upload_file_to_libgen_ftp(filepath, username='anonymous', password='', verbose=False)[source]
Upload a file to ftp://ftp.libgen.bz/upload and return the file URL if successful. Before uploading, check whether the file (identified by its MD5 checksum) already exists in LibGen.
- Parameters:
filepath (str) – Path to the file to upload.
username (str) – FTP username (default: ‘anonymous’).
password (str) – FTP password (default: ‘’).
verbose (bool) – If True, print debug info.
- Returns:
The URL of the uploaded file if successful, else None.
- Return type:
str or None
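The check-then-upload flow can be sketched with the standard ftplib module. This assumes the server accepts STOR in an “upload” directory and that the resulting URL has the form ftp://<host>/upload/<filename>; both are assumptions, as are the helper names. The existence check is passed in as a callable standing in for is_file_on_libgen, and ftp_factory lets tests substitute a fake FTP client.

```python
import ftplib
import hashlib
import os
from typing import Callable, Optional


def upload_if_missing(
    filepath: str,
    already_on_libgen: Callable[[str], Optional[str]],
    host: str = "ftp.libgen.bz",
    username: str = "anonymous",
    password: str = "",
    ftp_factory: Callable[[str], ftplib.FTP] = ftplib.FTP,
) -> Optional[str]:
    """Upload filepath to the FTP upload folder unless its MD5 is already known."""
    with open(filepath, "rb") as handle:
        md5sum = hashlib.md5(handle.read()).hexdigest()
    existing = already_on_libgen(md5sum)
    if existing:
        return existing  # already on LibGen: skip the upload
    name = os.path.basename(filepath)
    try:
        with ftp_factory(host) as ftp:
            ftp.login(username, password)
            ftp.cwd("upload")
            with open(filepath, "rb") as handle:
                ftp.storbinary(f"STOR {name}", handle)
    except ftplib.all_errors:
        return None  # connection, login or transfer failure
    return f"ftp://{host}/upload/{name}"  # assumed URL layout
```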
- getscipapers_hoanganhduc.libgen.create_chrome_driver(headless=True, extra_prefs=None)[source]
Create and return a Selenium Chrome WebDriver with the default user data directory and options.
- getscipapers_hoanganhduc.libgen.selenium_libgen_login(username='genesis', password='upload', headless=True, verbose=False)[source]
Open Chrome with Selenium, load http://libgen.li/librarian.php, find and follow the login link if present, and log in through the phpBB forum login form. Checks “remember me” and “hide my online status this session” before logging in. If already logged in (detected by the presence of the upload form), the login step is skipped.
- getscipapers_hoanganhduc.libgen.selenium_libgen_upload(local_file_path, bib_id, username='genesis', password='upload', headless=True, verbose=False)[source]
Upload a local file to http://libgen.li/librarian.php after logging in with Selenium. Fills in the FTP path in the upload form and clicks the Upload button. After the upload, finds the bibliography search form, selects the appropriate source (crossref for a DOI, goodreads for an ISBN), enters bib_id in the bibliography search input, and clicks the Search button. Finally, waits briefly and clicks the Register button.
- Parameters:
local_file_path (str) – Path to the local file to upload.
bib_id (str) – DOI or ISBN to associate with the upload.
username (str) – LibGen username (default: ‘genesis’).
password (str) – LibGen password (default: ‘upload’).
headless (bool) – Run browser in headless mode.
verbose (bool) – Print debug info.
- Returns:
True if upload succeeded, False otherwise.
- Return type:
bool
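Choosing between the crossref and goodreads sources depends on whether bib_id looks like a DOI or an ISBN. The sketch below uses common shape heuristics (a DOI starts with “10.” and a registrant code followed by “/”; an ISBN is 10 or 13 characters once spaces and hyphens are removed). These rules are assumptions, not the module's own, and no ISBN checksum is validated; bibliography_source is hypothetical.

```python
import re
from typing import Optional


def bibliography_source(bib_id: str) -> Optional[str]:
    """Return 'crossref' for a DOI, 'goodreads' for an ISBN, or None otherwise."""
    value = bib_id.strip()
    if re.match(r"^10\.\d{4,9}/\S+$", value):
        return "crossref"
    digits = re.sub(r"[\s-]", "", value)
    if re.fullmatch(r"(?:\d{9}[\dXx]|\d{13})", digits):
        return "goodreads"
    return None
```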
- getscipapers_hoanganhduc.libgen.upload_and_register_to_libgen(filepath, verbose=False, headless=True)[source]
Upload a file to LibGen and register it using Selenium automation. Tries to extract a DOI or ISBN from the file name; if one is found, the file is registered with that ID. If none is found, the file is only uploaded to FTP and is not registered in the LibGen database. If the file is a PDF, it also tries to extract a DOI from the PDF using getpapers.extract_doi_from_pdf.
- Parameters:
filepath (str) – Path to the file to upload.
verbose (bool) – Enable verbose/debug output.
headless (bool) – Run browser in headless mode.
- Returns:
URL of the uploaded file if successful, else None.
- Return type:
str or None
Integration helpers for Z-Library via the third-party Zlibrary API.
These utilities provide a thin wrapper around the upstream client to keep the rest of the codebase consistent with other source modules. Configuration helpers mirror the patterns used elsewhere in the package for clarity.
- getscipapers_hoanganhduc.zlib.load_credentials(credentials_path=None)[source]
Load credentials from the given path or from the default config file and return them as a list [email, password]. If credentials_path is specified, load from it and also save the credentials to the default location if the two paths differ. If it is not specified but the default config exists, load the default config. If neither exists, prompt the user for credentials and save them.
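The lookup order in load_credentials (explicit path, then default config, then prompt) can be sketched as follows. The sketch assumes a JSON file with “email” and “password” keys; the real config format is not documented here. resolve_credentials is hypothetical, and the prompt is passed in so the flow can be exercised without a terminal.

```python
import json
import os
from typing import Callable, List, Optional


def resolve_credentials(
    credentials_path: Optional[str],
    default_path: str,
    prompt: Callable[[], List[str]],
) -> List[str]:
    """Return [email, password]: explicit file, then default file, then prompt."""

    def read(path: str) -> List[str]:
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
        return [data.get("email", ""), data.get("password", "")]

    def write(path: str, creds: List[str]) -> None:
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "w", encoding="utf-8") as handle:
            json.dump({"email": creds[0], "password": creds[1]}, handle)

    if credentials_path:
        creds = read(credentials_path)
        if os.path.abspath(credentials_path) != os.path.abspath(default_path):
            write(default_path, creds)  # remember them at the default location
        return creds
    if os.path.exists(default_path):
        return read(default_path)
    creds = prompt()  # nothing on disk: ask the user, then persist
    write(default_path, creds)
    return creds
```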
- getscipapers_hoanganhduc.zlib.prompt_and_save_credentials()[source]
Prompt the user for a Z-Library email and password. If the input differs from the saved credentials, save it to the default config location. If no response is received within 30 seconds, quit. Also sets the global EMAIL and PASSWORD after user input.
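A 30-second input timeout can be approximated portably by reading in a daemon thread and waiting on it with a deadline. This is a hedged sketch rather than the module's approach; read_with_timeout is hypothetical, and the reader is a parameter so that input() can be replaced in tests.

```python
import threading
from typing import Callable, List, Optional


def read_with_timeout(reader: Callable[[], str], timeout: float = 30) -> Optional[str]:
    """Call reader() in a daemon thread; return None if it exceeds `timeout`."""
    answer: List[str] = []

    def worker() -> None:
        try:
            answer.append(reader())
        except EOFError:
            pass  # stdin closed: treat it like no response

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    return answer[0] if answer else None
```

In use, the reader would be something like lambda: input("Z-Library email: "), and a None result would trigger the quit. A thread blocked in input() cannot be cancelled, so marking it as a daemon keeps it from holding the process open after the caller exits.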
- getscipapers_hoanganhduc.zlib.search_zlibrary_books(query, limit=20, email=None, password=None, sort_by_year=True)[source]
Search Z-Library for books using the Zlibrary-API wrapper.
- Parameters:
query (str) – The search query (book title, author, etc.).
limit (int) – Number of results to return.
email (str, optional) – Z-Library email for login.
password (str, optional) – Z-Library password for login.
sort_by_year (bool) – If True, sort results by year (descending).
- Returns:
List of book results (dicts), or empty list if none found.
- Return type:
list
- getscipapers_hoanganhduc.zlib.print_book_details(book)[source]
Print detailed information about a book result in a human-readable format.
- getscipapers_hoanganhduc.zlib.get_profile(email=None, password=None)[source]
Get the user’s Z-Library profile information.
- getscipapers_hoanganhduc.zlib.get_most_popular(language=None)[source]
Get the most popular books, optionally for a specific language.
- getscipapers_hoanganhduc.zlib.get_user_recommended(email=None, password=None)[source]
Get books recommended for the user.
- getscipapers_hoanganhduc.zlib.get_user_saved(email=None, password=None, order=None, page=None, limit=20)[source]
Get books saved by the user.
- getscipapers_hoanganhduc.zlib.get_user_downloaded(email=None, password=None, order=None, page=None, limit=None)[source]
Get books downloaded by the user.
- getscipapers_hoanganhduc.zlib.get_book_info(bookid, hashid, language=None)[source]
Get detailed information about a book.
- getscipapers_hoanganhduc.zlib.download_book(book, email=None, password=None, download_dir=None)[source]
Download a book using the Zlibrary API.
- getscipapers_hoanganhduc.zlib.is_logged_in(email=None, password=None)[source]
Check if the user is logged in.
- getscipapers_hoanganhduc.zlib.interactive_login_search_download(query=None, download_dir=None, limit=20, sort_by_year=True)[source]
Log in, search, print the results, and let the user select books to download (a single index or a range). Optionally takes a search query, a download directory, a limit on the number of results, and sort_by_year.