fix: pass file name to unstructured-api (langchain-ai#742)
fix langchain-ai#738

LarchLiu authored Apr 11, 2023
1 parent 706a209 commit 0c43852
Showing 3 changed files with 51 additions and 5 deletions.
41 changes: 41 additions & 0 deletions examples/src/document_loaders/example_data/notion.md
@@ -0,0 +1,41 @@
# Testing the notion markdownloader

# 🦜️🔗 LangChain.js

⚡ Building applications with LLMs through composability ⚡

**Production Support:** As you move your LangChains into production, we'd love to offer more comprehensive support.
Please fill out [this form](https://forms.gle/57d8AmXBYp8PP8tZA) and we'll set up a dedicated support Slack channel.

## Quick Install

`yarn add langchain`

```typescript
import { OpenAI } from "langchain/llms/openai";
```

## 🤔 What is this?

Large language models (LLMs) are emerging as a transformative technology, enabling
developers to build applications that they previously could not.
But using these LLMs in isolation is often not enough to
create a truly powerful app - the real power comes when you can combine them with other sources of computation or knowledge.

This library is aimed at assisting in the development of those types of applications.

## Relationship with Python LangChain

This is built to integrate as seamlessly as possible with the [LangChain Python package](https://github.com/hwchase17/langchain). Specifically, this means all objects (prompts, LLMs, chains, etc) are designed in a way where they can be serialized and shared between languages.

The [LangChainHub](https://github.com/hwchase17/langchain-hub) is a central place for the serialized versions of these prompts, chains, and agents.

## 📖 Documentation

For full documentation of prompts, chains, agents and more, please see [here](https://hwchase17.github.io/langchainjs/docs/overview).

## 💁 Contributing

As an open source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infra, or better documentation.

Check out [our contributing guidelines](CONTRIBUTING.md) for instructions on how to contribute.
2 changes: 1 addition & 1 deletion examples/src/document_loaders/unstructured.ts
@@ -3,7 +3,7 @@ import { UnstructuredLoader } from "langchain/document_loaders/fs/unstructured";
 export const run = async () => {
   const loader = new UnstructuredLoader(
     "http://localhost:8000/general/v0/general",
-    "langchain/src/document_loaders/tests/example_data/example.txt"
+    "src/document_loaders/example_data/notion.md"
   );
   const docs = await loader.load();
   console.log({ docs });
13 changes: 9 additions & 4 deletions langchain/src/document_loaders/fs/unstructured.ts
@@ -1,3 +1,5 @@
+import type { basename as BasenameT } from "node:path";
+import type { readFile as ReaFileT } from "node:fs/promises";
 import { getEnv } from "../../util/env.js";
 import { Document } from "../../document.js";
 import { BaseDocumentLoader } from "../base.js";
@@ -20,15 +22,16 @@ export class UnstructuredLoader extends BaseDocumentLoader {
   }

   async _partition() {
-    const { readFile } = await this.imports();
+    const { readFile, basename } = await this.imports();

     const buffer = await readFile(this.filePath);
+    const fileName = basename(this.filePath);

     // I'm aware this reads the file into memory first, but we have lots of work
     // to do on then consuming Documents in a streaming fashion anyway, so not
     // worried about this for now.
     const formData = new FormData();
-    formData.append("files", new Blob([buffer]));
+    formData.append("files", new Blob([buffer]), fileName);

     const response = await fetch(this.webPath, {
       method: "POST",
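
The effect of the third `append` argument in this hunk can be checked in isolation. The following is a minimal sketch, not the loader's code: `buildUploadForm` and `uploadedFileName` are hypothetical helpers introduced for illustration, and the sketch assumes a runtime with global `FormData` and `Blob` (Node 18+ or a browser). It compares a form built with the file's base name against one built the pre-fix way.

```typescript
import { basename } from "node:path";

// Hypothetical helper (not part of LangChain): builds the multipart form
// the way the patched _partition() does, attaching the file's base name.
function buildUploadForm(filePath: string, contents: Blob): FormData {
  const formData = new FormData();
  formData.append("files", contents, basename(filePath));
  return formData;
}

// Hypothetical helper: returns the file name recorded on the "files" entry.
function uploadedFileName(formData: FormData): string | undefined {
  const entry = formData.get("files");
  return entry === null || typeof entry === "string" ? undefined : entry.name;
}

const named = buildUploadForm(
  "src/document_loaders/example_data/notion.md",
  new Blob(["# Title"])
);

// For comparison: the pre-fix call, with no file name argument.
const unnamed = new FormData();
unnamed.append("files", new Blob(["# Title"]));

console.log(uploadedFileName(named)); // "notion.md"
console.log(uploadedFileName(unnamed)); // "blob", the default name for a bare Blob
```

Without the explicit name, the multipart part is labelled with the generic default `blob`, so the receiving server never sees the original file name or its extension.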
@@ -73,11 +76,13 @@ export class UnstructuredLoader extends BaseDocumentLoader {
   }

   async imports(): Promise<{
-    readFile: typeof import("node:fs/promises")["readFile"];
+    readFile: typeof ReaFileT;
+    basename: typeof BasenameT;
   }> {
     try {
       const { readFile } = await import("node:fs/promises");
-      return { readFile };
+      const { basename } = await import("node:path");
+      return { readFile, basename };
     } catch (e) {
       console.error(e);
       throw new Error(
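
The `imports()` change pairs a type-only import with a lazy `await import(...)`. Below is a small stand-alone sketch of that pattern, under stated assumptions: `loadPathHelpers` and `demo` are hypothetical names, and the error message is illustrative because the loader's own message is cut off in the diff above.

```typescript
import type { basename as BasenameT } from "node:path";

// Hypothetical stand-alone version of the loader's imports() pattern:
// the Node-only module is loaded lazily at call time, and the return type
// is described through a type-only import, which is erased at compile time.
async function loadPathHelpers(): Promise<{ basename: typeof BasenameT }> {
  try {
    const { basename } = await import("node:path");
    return { basename };
  } catch (e) {
    // Illustrative message only; not the text used by the real loader.
    throw new Error(`node:path is unavailable in this environment: ${e}`);
  }
}

// Resolves the base name of the example file used elsewhere in this commit.
async function demo(): Promise<string> {
  const { basename } = await loadPathHelpers();
  return basename("src/document_loaders/example_data/notion.md");
}

demo().then((name) => console.log(name)); // "notion.md"
```

Because `import type` leaves no runtime import behind, the module is only loaded when the function is called, which keeps the file importable in environments where `node:path` does not exist.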