How It Works

  1. Upload: Upload your CSV file to Storytell. For detailed instructions on uploading content, see our Uploading Content guide.
  2. Processing: Storytell processes the first 1,000 rows of the CSV file.
  3. Story Tile™ Generation: Each row is converted into a Story Tile™.
  4. AI Analysis: The Storytell LLM analyzes the Story Tiles™.
  5. Insight Extraction: Query your data using SmartChat™.

Your uploaded CSV files are easily accessible in the “Stored Assets” section, and you can start querying your data immediately after upload.
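The 1,000-row processing limit can be illustrated with a minimal standard-library sketch. This is not Storytell’s actual pipeline; it simply shows how a CSV can be parsed while keeping only the first 1,000 data rows:

```python
import csv
import io
from itertools import islice

MAX_ROWS = 1000  # matches the documented processing limit

def read_limited(csv_text: str, limit: int = MAX_ROWS) -> list[dict]:
    """Parse CSV text, keeping at most `limit` data rows (header excluded)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return list(islice(reader, limit))

sample = "Name,Composition\nMercury,Rocky\nVenus,Rocky\n"
rows = read_limited(sample)
```

Rows beyond the limit are simply never materialized, which keeps parsing fast even for very large files.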

Generating Story Tiles™

Story Tiles™ are clusters of related concepts from your data. For example, let’s say you have a CSV file with the following information:

| Name    | Composition | Distance from the Sun (AU) | Orbital Period (years) | Diameter (km) |
|---------|-------------|----------------------------|------------------------|---------------|
| Mercury | Rocky       | 0.39                       | 0.24                   | 4879          |
| Venus   | Rocky       | 0.72                       | 0.62                   | 12104         |
| Earth   | Rocky       | 1                          | 1                      | 12742         |
| Mars    | Rocky       | 1.52                       | 1.88                   | 6779          |
| Jupiter | Gas Giant   | 5.2                        | 11.86                  | 139820        |
| Saturn  | Gas Giant   | 9.58                       | 29.46                  | 116460        |
| Uranus  | Ice Giant   | 19.22                      | 84.01                  | 50724         |
| Neptune | Ice Giant   | 30.05                      | 164.79                 | 49244         |
| Moon    | Rocky       | N/A                        | 0.07                   | 3474          |
| Titan   | Icy         | N/A                        | 15.94                  | 5149          |

Here’s what a Story Tile™ would look like:

“Titan” is an “Icy” celestial body located “N/A” AU from the Sun with an orbital period of “15.94” years. It has a diameter of “5149” kilometers.

This transformation enables our AI to understand relationships and context within your data, making it possible to answer complex queries.
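The row-to-sentence transformation can be sketched as follows. The field names and sentence template come from the planets example above; the function itself is a hypothetical illustration, not Storytell’s implementation:

```python
def to_story_tile(row: dict) -> str:
    """Render one CSV row as a Story Tile(TM)-style sentence.

    Illustrative sketch: field names follow the planets example above.
    """
    comp = row["Composition"]
    article = "an" if comp[:1].lower() in "aeiou" else "a"
    return (
        f'"{row["Name"]}" is {article} "{comp}" celestial body located '
        f'"{row["Distance from the Sun (AU)"]}" AU from the Sun with an '
        f'orbital period of "{row["Orbital Period (years)"]}" years. '
        f'It has a diameter of "{row["Diameter (km)"]}" kilometers.'
    )

titan = {
    "Name": "Titan",
    "Composition": "Icy",
    "Distance from the Sun (AU)": "N/A",
    "Orbital Period (years)": "15.94",
    "Diameter (km)": "5149",
}
tile = to_story_tile(titan)
```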

Querying Your Data

With Storytell, you can ask questions about your CSV data and receive clear, insightful answers through SmartChat™. For example, when you ask about Titan’s orbital period, Storytell provides a precise answer based on the processed data.

Verifying Accuracy

Storytell’s responses are based on the data you provide. You can always verify an answer by checking the original CSV file and confirming that Storytell’s response matches your source data.

Technical Considerations

  • Processing limited to first 1,000 rows for speed and efficiency
  • Secure, isolated environments for data privacy
  • Scalable architecture for concurrent processing

Handling Structured vs. Semi-Structured CSV Files

The Storytell process involves classifying CSV file content as either “structured” or “semi-structured” to determine the appropriate processing strategy. This classification is crucial for handling CSV files that do not conform to traditional tabular formats, ensuring that data is processed accurately and efficiently.

Structured vs. Semi-Structured Data

Structured Data

Structured data in CSV files typically includes a clear header row followed by consistent data rows. Each column represents a specific data attribute, and each row contains data entries corresponding to these attributes. This format is straightforward to process using standard CSV parsing techniques.

Semi-Structured Data

Semi-structured data, on the other hand, may not have a consistent structure. These CSV files might lack headers, have inconsistent columns, or contain data that resembles reports rather than traditional tables. Such files require a different approach to ensure accurate data extraction and processing.

Process for Classifying CSV Content

  1. Initial Inspection: Automatically inspect the CSV file to determine if it contains a header row and consistent data rows.
  2. Classification:
    • Structured: If the file has a clear header and consistent data rows, classify it as structured.
    • Semi-Structured: If the file lacks a header or has inconsistent columns, classify it as semi-structured.
  3. Prompt Selection: Based on the classification, select the appropriate prompt for processing:
    • For structured data, use standard CSV processing prompts.
    • For semi-structured data:
      • Use prompts designed to handle variability and generate data “chunks.”
      • Refine prompts so that the LLM does not drop data points.
      • Include a “source” identifier in each chunk to facilitate data searchability.
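The classification step above can be sketched with a simple heuristic. This is an illustrative approximation, not Storytell’s actual classifier: it treats a file as structured only if it has a plausible header row and a consistent column count.

```python
import csv
import io

def classify_csv(csv_text: str) -> str:
    """Classify CSV content as 'structured' or 'semi-structured'.

    Simplified heuristic sketch: 'structured' means a plausible header
    row (non-empty, unique cells) plus data rows with a consistent
    column count; anything else falls back to 'semi-structured'.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if len(rows) < 2:
        return "semi-structured"
    header, data = rows[0], rows[1:]
    has_header = all(h.strip() for h in header) and len(set(header)) == len(header)
    consistent = all(len(r) == len(header) for r in data)
    return "structured" if has_header and consistent else "semi-structured"

label = classify_csv("Name,Composition\nMercury,Rocky\n")
```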

XLS to CSV Conversion

The process of converting an XLS file to CSV involves transforming each worksheet (or tab) within the XLS file into a separate CSV file. This conversion is particularly useful for handling data in a more accessible and standardized format, as CSV files are widely supported across various platforms and applications.

Conversion Process

  1. File Structure:
    • An XLS file, such as MyFile.xls, may contain multiple worksheets. For example, it could have tabs named 1Q24, 2Q24, 3Q24, etc.
    • Each tab represents a separate dataset that needs to be converted into its own CSV file.
  2. Conversion Output:
    • Each tab in the XLS file is converted into a CSV file. The naming convention for these files follows the pattern: MyFile - tab <TabName>.csv. For instance:
      • MyFile - tab 1Q24.csv
      • MyFile - tab 2Q24.csv
      • And so on for each tab.
  3. Validation:
    • The conversion process is validated locally using simple XLSX files to ensure accuracy and reliability.
    • Even with large Excel files containing numerous tabs and extensive data, the conversion should handle the data without truncation.
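The per-tab conversion and its naming convention can be sketched as follows. This is a standard-library illustration: a real converter would read the workbook with a library such as openpyxl, whereas here the sheets are passed in as an in-memory dict.

```python
import csv
import io

def sheet_csv_name(workbook_name: str, tab_name: str) -> str:
    """Build the documented output filename: '<File> - tab <TabName>.csv'."""
    stem = workbook_name.rsplit(".", 1)[0]
    return f"{stem} - tab {tab_name}.csv"

def convert_workbook(name: str, sheets: dict[str, list[list[str]]]) -> dict[str, str]:
    """Turn each worksheet (tab name -> rows) into a separate CSV payload.

    Sketch only: sheets arrive as an in-memory dict rather than being
    read from an actual XLS file.
    """
    out = {}
    for tab, rows in sheets.items():
        buf = io.StringIO()
        csv.writer(buf).writerows(rows)
        out[sheet_csv_name(name, tab)] = buf.getvalue()
    return out

files = convert_workbook("MyFile.xls", {"1Q24": [["Name", "Revenue"], ["A", "10"]]})
```

Each tab becomes its own CSV, so downstream processing can treat every dataset independently.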

Handling Different Data Structures

  1. Tabular Data:
    • The conversion process is optimized for tabular data, where data is organized in a consistent row and column format.
    • This structure is straightforward to convert as each row in the tab becomes a row in the CSV file.
  2. Non-Tabular Data:
    • Some XLS files may contain non-tabular data, such as reports or files with inconsistent column counts.
    • For these files, the conversion process may require additional handling to ensure data integrity.
  3. Challenges and Solutions:
    • Varied Headers: Files with inconsistent headers can confuse the conversion process. Enhancements to the classifier can help identify and handle these cases.
    • Single Column Data: If a tab contains only a single column, it may be classified as unstructured data, requiring adjustments in the conversion approach.
    • Fallback Mechanism: For files that do not conform to standard tabular formats, a fallback mechanism to a semi-structured approach can be implemented.
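The fallback decision described above can be sketched as a simple strategy picker. This is an illustrative heuristic, not the production mechanism: single-column tabs and tabs with inconsistent column counts fall back to the semi-structured approach.

```python
def choose_strategy(rows: list[list[str]]) -> str:
    """Pick a processing strategy for one converted tab (sketch).

    Single-column data and inconsistent column counts fall back to
    the semi-structured path; everything else is treated as tabular.
    """
    widths = {len(r) for r in rows}
    if widths == {1}:
        return "semi-structured"  # single-column data
    if len(widths) != 1:
        return "semi-structured"  # inconsistent column counts -> fallback
    return "tabular"
```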

Future Enhancements

  • Support for larger datasets (beyond 1,000 rows)
  • Advanced data type detection and custom Story Tile™ generation
