
LangChain – ScraperAPI

Give your AI agent the ability to browse websites and search Google and Amazon in just a few lines of code.

The langchain-scraperapi package adds three ready-to-use LangChain tools backed by the ScraperAPI service:

Tool class                     Use it to
ScraperAPITool                 Grab the HTML/text/markdown of any web page
ScraperAPIGoogleSearchTool     Get structured Google Search SERP data
ScraperAPIAmazonSearchTool     Get structured Amazon product-search data

Overview

Integration details

Package                 Serializable    JS support    Package latest
langchain-scraperapi    ❌              ❌            v0.1.0

Setup

Install the langchain-scraperapi package.

%pip install -U langchain-scraperapi

Credentials

Create an account at https://www.scraperapi.com/ and get an API key.

import os
os.environ["SCRAPERAPI_API_KEY"] = "your-api-key"

Features

1. ScraperAPITool – browse any website

Invoke the raw ScraperAPI endpoint and get HTML, rendered DOM, text, or markdown.

Invocation arguments

  • url (required) – target page URL
  • Optional (mirror ScraperAPI query params):
    • output_format: "text" | "markdown" (default returns raw HTML)
    • country_code: e.g. "us", "de"
    • device_type: "desktop" | "mobile"
    • premium: bool – use premium proxies
    • render: bool – run JS before returning HTML
    • keep_headers: bool – include response headers

For the complete set of modifiers, see the ScraperAPI request-customisation docs.

from langchain_scraperapi.tools import ScraperAPITool

# Instantiate
tool = ScraperAPITool()

# Direct invoke
html_text = tool.invoke(
    {
        "url": "https://langchain.com",
        "output_format": "markdown",
        "render": True,
    }
)
print(html_text[:300], "…")
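Because only url is required, the optional arguments listed above can be assembled conditionally before invoking the tool. A minimal sketch of a payload-building helper (hypothetical, not part of the package) that silently drops unset or unknown options:

```python
# Hypothetical helper: build an invocation payload for ScraperAPITool,
# keeping only the optional parameters that were actually set.
def build_scrape_request(url, **options):
    allowed = {"output_format", "country_code", "device_type",
               "premium", "render", "keep_headers"}
    payload = {"url": url}
    payload.update({k: v for k, v in options.items()
                    if k in allowed and v is not None})
    return payload

payload = build_scrape_request(
    "https://langchain.com", output_format="text", render=True, premium=None
)
print(payload)
# {'url': 'https://langchain.com', 'output_format': 'text', 'render': True}
```

The resulting dict can be passed straight to tool.invoke(payload).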

2. ScraperAPIGoogleSearchTool – structured Google Search

Structured SERP data via /structured/google/search.

Invocation arguments

  • query (required) – natural-language search string
  • Optional – country_code, tld, uule, hl, gl, ie, oe, start, num
  • output_format: "json" (default) or "csv"

from langchain_scraperapi.tools import ScraperAPIGoogleSearchTool

google_search = ScraperAPIGoogleSearchTool()

results = google_search.invoke(
    {
        "query": "what is langchain",
        "num": 20,
        "output_format": "json",
    }
)
print(results)
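With output_format="json", the payload can be post-processed in plain Python. A minimal sketch, assuming the structured endpoint returns an organic_results list of dicts with title and link keys (key names are assumptions based on ScraperAPI's structured-data docs; verify against a live response):

```python
# Sample shaped like a structured Google response (assumed schema).
sample = {
    "organic_results": [
        {"title": "LangChain", "link": "https://www.langchain.com/"},
        {"title": "Introduction | LangChain", "link": "https://python.langchain.com/"},
    ]
}

def top_links(serp, n=5):
    """Return up to n (title, link) pairs from a SERP payload."""
    return [(r["title"], r["link"]) for r in serp.get("organic_results", [])[:n]]

print(top_links(sample, n=1))
# [('LangChain', 'https://www.langchain.com/')]
```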

3. ScraperAPIAmazonSearchTool – structured Amazon search

Structured product results via /structured/amazon/search.

Invocation arguments

  • query (required) – product search terms
  • Optional – country_code, tld, page
  • output_format: "json" (default) or "csv"

from langchain_scraperapi.tools import ScraperAPIAmazonSearchTool

amazon_search = ScraperAPIAmazonSearchTool()

products = amazon_search.invoke(
    {
        "query": "noise cancelling headphones",
        "tld": "co.uk",
        "page": 2,
    }
)
print(products)
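The JSON payload can likewise be filtered client-side. A sketch, assuming each product dict carries name and price fields (hypothetical schema; check a real response before relying on these keys):

```python
# Sample shaped like a structured Amazon response (assumed schema).
products = {
    "results": [
        {"name": "Headphones A", "price": 79.99},
        {"name": "Headphones B", "price": 199.00},
    ]
}

def under_budget(payload, max_price):
    """Return product names priced at or below max_price."""
    return [p["name"] for p in payload.get("results", []) if p["price"] <= max_price]

print(under_budget(products, 100))
# ['Headphones A']
```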

Example: Make an AI agent that can browse the web

Here is an example of using the tools in an AI agent. The ScraperAPITool gives the AI the ability to browse any website, summarize articles, and follow links to navigate between pages.

import os

from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
from langchain_scraperapi.tools import ScraperAPITool

os.environ["SCRAPERAPI_API_KEY"] = "your-api-key"
os.environ["OPENAI_API_KEY"] = "your-api-key"

tools = [ScraperAPITool(output_format="markdown")]
llm = ChatOpenAI(model="gpt-4o", temperature=0)

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant that can browse websites for users. "
            "When asked to browse a website or a link, do so with the "
            "ScraperAPITool, then provide information based on the website "
            "according to the user's needs.",
        ),
        ("human", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)

agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

response = agent_executor.invoke(
    {"input": "can you browse hacker news and summarize the first website"}
)
print(response["output"])

Further reading

The ScraperAPI documentation describes additional parameters for customising requests; the LangChain wrappers pass these parameters through directly.

