Working with CSV files
Storytell transforms CSV data into Story Tiles™, enabling powerful insights and analysis through our advanced LLM processing pipeline.
How It Works
- Upload: Upload your CSV file to Storytell. For detailed instructions on uploading content, see our Uploading Content guide.
- Processing: Storytell processes the first 1,000 rows of the CSV file.
- Story Tile™ Generation: Each row is converted into a Story Tile™.
- AI Analysis: Storytell's LLM analyzes the Story Tiles™.
- Insight Extraction: Query your data using SmartChat™.
Your uploaded CSV files are accessible in the “Stored Assets” section, and you can start querying your data immediately after upload.
Generating Story Tiles™
Story Tiles™ are clusters of related concepts from your data. For example, let’s say you have a CSV file with the following information:
| Name | Composition | Distance from the Sun (AU) | Orbital Period (years) | Diameter (km) |
| --- | --- | --- | --- | --- |
| Mercury | Rocky | 0.39 | 0.24 | 4879 |
| Venus | Rocky | 0.72 | 0.62 | 12104 |
| Earth | Rocky | 1 | 1 | 12742 |
| Mars | Rocky | 1.52 | 1.88 | 6779 |
| Jupiter | Gas Giant | 5.2 | 11.86 | 139820 |
| Saturn | Gas Giant | 9.58 | 29.46 | 116460 |
| Uranus | Ice Giant | 19.22 | 84.01 | 50724 |
| Neptune | Ice Giant | 30.05 | 164.79 | 49244 |
| Moon | Rocky | N/A | 0.074 | 3474 |
| Titan | Icy | N/A | 15.94 | 5149 |
Here’s a sketch of what a Story Tile™ for one of these rows might look like:
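```python
# Illustrative only — the actual Story Tile™ schema is internal to Storytell.
# This sketches how a single CSV row might be captured as a tile, with a
# hypothetical "source" identifier tying it back to the uploaded file.
mercury_tile = {
    "source": "planets.csv, row 1",  # placeholder identifier, not a real field name
    "content": {
        "Name": "Mercury",
        "Composition": "Rocky",
        "Distance from the Sun (AU)": 0.39,
        "Orbital Period (years)": 0.24,
        "Diameter (km)": 4879,
    },
}
```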
This transformation enables our AI to understand relationships and context within your data, making it possible to answer complex queries.
Querying Your Data
With Storytell, you can ask questions about your CSV data and receive clear, insightful answers through SmartChat™. For example, if you ask about Titan’s orbital period, Storytell returns a precise answer based on the processed data (15.94 years, per the sample table above).
Verifying Accuracy
Storytell’s responses are based on the data you provide, so you can always verify an answer by checking the original CSV file and confirming that it matches the source data.
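If you’d rather spot-check programmatically than open the file by hand, a few lines of Python can do the lookup. This is only a local convenience sketch; the file name is a placeholder, and the column names match the sample table above.

```python
import csv

# Look up Titan's orbital period in the original file to verify the answer.
# "planets.csv" is a placeholder name for the uploaded sample file.
with open("planets.csv", newline="") as f:
    for row in csv.DictReader(f):
        if row["Name"] == "Titan":
            print(row["Orbital Period (years)"])  # expected: 15.94
```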
Technical Considerations
- Processing limited to first 1,000 rows for speed and efficiency
- Secure, isolated environments for data privacy
- Scalable architecture for concurrent processing
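Because only the first 1,000 rows are processed, it can be worth checking that your most important rows fall within that window, or trimming the file before upload. Below is a minimal local sketch using Python's standard csv module; the 1,000-row limit comes from this guide, and the file names are placeholders.

```python
import csv

ROW_LIMIT = 1_000  # Storytell processes only the first 1,000 rows

def trim_csv(src_path: str, dst_path: str, limit: int = ROW_LIMIT) -> int:
    """Copy the header plus at most `limit` data rows into a new CSV."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)  # keep the header row, if any
        if header is None:
            return 0                 # empty file: nothing to write
        writer.writerow(header)
        kept = 0
        for row in reader:
            if kept >= limit:
                break                # anything past the limit is dropped
            writer.writerow(row)
            kept += 1
    return kept

# Example (placeholder file names):
# kept = trim_csv("planets_full.csv", "planets_first_1000.csv")
# print(f"Wrote {kept} data rows")
```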
Handling Structured vs. Semi-Structured CSV Files
Storytell classifies CSV content as either “structured” or “semi-structured” to determine the appropriate processing strategy. This classification is crucial for CSV files that do not conform to traditional tabular formats, ensuring their data is still processed accurately and efficiently.
Structured vs. Semi-Structured Data
Structured Data
Structured data in CSV files typically includes a clear header row followed by consistent data rows. Each column represents a specific data attribute, and each row contains data entries corresponding to these attributes. This format is straightforward to process using standard CSV parsing techniques.
Semi-Structured Data
Semi-structured data, on the other hand, may not have a consistent structure. These CSV files might lack headers, have inconsistent columns, or contain data that resembles reports rather than traditional tables. Such files require a different approach to ensure accurate data extraction and processing.
Process for Classifying CSV Content
- Initial Inspection: Automatically inspect the CSV file to determine if it contains a header row and consistent data rows.
- Classification:
  - Structured: If the file has a clear header and consistent data rows, classify it as structured.
  - Semi-Structured: If the file lacks a header or has inconsistent columns, classify it as semi-structured.
- Prompt Selection: Based on the classification, select the appropriate prompt for processing:
  - For structured data, use standard CSV processing prompts.
  - For semi-structured data:
    - Use prompts designed to handle variability and generate data “chunks.”
    - Ensure that the LLM does not drop data points by refining prompts.
    - Include a “source” identifier in each chunk to facilitate data searchability.
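The exact classifier Storytell uses isn’t documented here, but the sketch below shows one way the inspection and classification steps could work: treat a file as structured when its first row looks like a header and every sampled row has the same column count, and fall back to semi-structured otherwise. The function name and heuristics are illustrative assumptions, not Storytell’s implementation.

```python
import csv

def classify_csv(path: str, sample_rows: int = 50) -> str:
    """Roughly classify a CSV as 'structured' or 'semi-structured'.

    Heuristics (illustrative only):
      - structured: the first row looks like a header (no numeric cells)
        and every sampled data row has the same number of columns
      - semi-structured: anything else (missing header, ragged rows,
        report-style content)
    """
    with open(path, newline="") as f:
        rows = []
        for i, row in enumerate(csv.reader(f)):
            rows.append(row)
            if i >= sample_rows:
                break

    if not rows:
        return "semi-structured"

    header, data = rows[0], rows[1:]

    def is_numeric(value: str) -> bool:
        try:
            float(value)
            return True
        except ValueError:
            return False

    header_looks_like_header = bool(header) and not any(is_numeric(c) for c in header)
    consistent_columns = all(len(r) == len(header) for r in data)

    if header_looks_like_header and consistent_columns:
        return "structured"
    return "semi-structured"
```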
XLS to CSV Conversion
The process of converting an XLS file to CSV involves transforming each worksheet (or tab) within the XLS file into a separate CSV file. This conversion is particularly useful for handling data in a more accessible and standardized format, as CSV files are widely supported across various platforms and applications.
Conversion Process
- File Structure:
  - An XLS file, such as `MyFile.xls`, may contain multiple worksheets. For example, it could have tabs named `1Q24`, `2Q24`, `3Q24`, etc.
  - Each tab represents a separate dataset that needs to be converted into its own CSV file.
- Conversion Output:
  - Each tab in the XLS file is converted into a CSV file. The naming convention for these files follows the pattern `MyFile - tab <TabName>.csv`. For instance:
    - `MyFile - tab 1Q24.csv`
    - `MyFile - tab 2Q24.csv`
    - And so on for each tab.
- Validation:
  - The conversion process is validated locally using simple XLSX files to ensure accuracy and reliability.
  - Even with large Excel files containing numerous tabs and extensive data, the conversion should handle the data without truncation.
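Storytell performs this conversion internally, but if you want to reproduce the same per-tab split locally, pandas can read every worksheet and write one CSV per tab. The sketch below is an assumption-level example (it needs openpyxl for .xlsx, or xlrd for legacy .xls); the file names are placeholders that follow the naming convention described above.

```python
from pathlib import Path

import pandas as pd  # requires openpyxl for .xlsx, or xlrd for legacy .xls

def excel_to_csvs(excel_path: str, out_dir: str = ".") -> list[Path]:
    """Write each worksheet of an Excel workbook to its own CSV file,
    named '<FileStem> - tab <TabName>.csv'."""
    stem = Path(excel_path).stem                          # e.g. "MyFile"
    sheets = pd.read_excel(excel_path, sheet_name=None)   # dict: tab name -> DataFrame
    written = []
    for tab_name, frame in sheets.items():
        out_path = Path(out_dir) / f"{stem} - tab {tab_name}.csv"
        frame.to_csv(out_path, index=False)
        written.append(out_path)
    return written

# Example (placeholder file name):
# for p in excel_to_csvs("MyFile.xls"):
#     print(p)  # MyFile - tab 1Q24.csv, MyFile - tab 2Q24.csv, ...
```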
Handling Different Data Structures
- Tabular Data:
  - The conversion process is optimized for tabular data, where data is organized in a consistent row and column format.
  - This structure is straightforward to convert as each row in the tab becomes a row in the CSV file.
- Non-Tabular Data:
  - Some XLS files may contain non-tabular data, such as reports or files with inconsistent column counts.
  - For these files, the conversion process may require additional handling to ensure data integrity.
- Challenges and Solutions:
  - Varied Headers: Files with inconsistent headers can confuse the conversion process. Enhancements to the classifier can help identify and handle these cases.
  - Single Column Data: If a tab contains only a single column, it may be classified as unstructured data, requiring adjustments in the conversion approach.
  - Fallback Mechanism: For files that do not conform to standard tabular formats, a fallback mechanism to a semi-structured approach can be implemented.
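The fallback mechanism itself isn’t specified in detail here; the sketch below illustrates one way it could behave, keeping normal tabular handling when column counts are consistent and otherwise emitting semi-structured “chunks” that carry a “source” identifier (as described earlier) so no data points are dropped. All names and heuristics are illustrative.

```python
import csv

def rows_to_chunks(csv_path: str) -> list[dict]:
    """Fallback for non-tabular CSVs: wrap each row in a 'chunk' that keeps
    a source identifier, so nothing is dropped even when columns are ragged."""
    chunks = []
    with open(csv_path, newline="") as f:
        for line_number, row in enumerate(csv.reader(f), start=1):
            text = ", ".join(cell for cell in row if cell.strip())
            if not text:
                continue  # skip blank lines, keep everything else
            chunks.append({
                "source": f"{csv_path}:line {line_number}",  # hypothetical identifier
                "text": text,
            })
    return chunks

def convert_with_fallback(csv_path: str):
    """Use normal tabular handling when column counts are consistent,
    otherwise fall back to semi-structured chunks."""
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    widths = {len(r) for r in rows if r}
    if len(widths) == 1:
        return rows                      # consistent table: keep it as rows
    return rows_to_chunks(csv_path)      # ragged or empty: fall back to chunks
```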
Future Enhancements
- Support for larger datasets (beyond 1,000 rows)
- Advanced data type detection and custom Story Tile™ generation