
added scripts to convert txt to db; readme update

Jeff Tang 1 year ago
Parent
Commit
70c49b6874
4 changed files with 1390 additions and 1 deletion
  1. llama-demo-apps/README.md (+5 -1)
  2. llama-demo-apps/csv2db.py (+38 -0)
  3. llama-demo-apps/nba.txt (+1294 -0)
  4. llama-demo-apps/txt2csv.py (+53 -0)

+ 5 - 1
llama-demo-apps/README.md

@@ -41,7 +41,11 @@ To run Llama2 in Google Colab using [llama-cpp-python](https://github.com/abetle
 This demo app uses Llama2 to return a text summary of a YouTube video. It shows how to retrieve the caption of a YouTube video and how to ask Llama to summarize the content in four different ways, from the simplest naive way that works for short text to more advanced methods of using LangChain's map_reduce and refine to overcome the 4096 limit of Llama's max input token size.
 
 ## [NBA2023-24](StructuredLlama.ipynb): Ask Llama2 about Structured Data
-This demo app shows how to use LangChain and Llama2 to let users ask questions about **structured** data stored in a SQL DB. As the 2023-24 NBA season is around the corner, we use the NBA roster info saved in a SQLite DB to show you how to ask Llama2 questions about your favorite teams or players.
+This demo app shows how to use LangChain and Llama2 to let users ask questions about **structured** data stored in a SQL DB. As the 2023-24 NBA season is around the corner, we use the NBA roster info saved in a SQLite DB to show you how to ask Llama2 questions about your favorite teams or players. To convert the roster info in `nba.txt` (scraped from the web) into the SQLite database `nba_roster.db` used in the notebook, run the commands below:
+```
+python txt2csv.py
+python csv2db.py
+```
 
 ## [BreakingNews](BreakingNews.ipynb): Ask Llama2 about Live Data
 This demo app shows how to perform live data augmented generation tasks with Llama2 and [LlamaIndex](https://github.com/run-llama/llama_index), another leading open-source framework for building LLM apps: it uses the [You.com search API](https://documentation.you.com/quickstart) to get breaking news and asks Llama2 about them.
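
After running the two commands added to the README, one quick way to confirm that `nba_roster.db` was populated is to count the rows per team. This is a minimal sketch, assuming the `nba_roster` table and column names defined in `csv2db.py`. The helper name `players_per_team` and the sample rows are hypothetical, and an in-memory database stands in for the real file:

```python
import sqlite3


def players_per_team(conn):
    """Return {team: player_count} from the nba_roster table."""
    rows = conn.execute(
        "SELECT Team, COUNT(*) FROM nba_roster GROUP BY Team ORDER BY Team"
    ).fetchall()
    return dict(rows)


# Stand-in for sqlite3.connect('nba_roster.db'); same schema as csv2db.py.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE nba_roster (
           Team TEXT, NAME TEXT, Jersey TEXT, POS TEXT, AGE INT,
           HT TEXT, WT TEXT, COLLEGE TEXT, SALARY TEXT)"""
)
# Made-up placeholder rows, not real roster data.
conn.executemany(
    "INSERT INTO nba_roster VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("Team A", "Player One", "1", "G", 25, "6' 3\"", "200 lbs", "--", "--"),
        ("Team A", "Player Two", "NA", "F", 30, "6' 8\"", "220 lbs", "--", "--"),
        ("Team B", "Player Three", "7", "C", 22, "7' 0\"", "250 lbs", "--", "--"),
    ],
)
counts = players_per_team(conn)
print(counts)
conn.close()
```

Against the real database, replacing `":memory:"` with `'nba_roster.db'` and dropping the `CREATE TABLE` and `INSERT` statements should be enough.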

+ 38 - 0
llama-demo-apps/csv2db.py

@@ -0,0 +1,38 @@
+import sqlite3
+import csv
+
+# Define the input CSV file and the SQLite database file
+input_csv = 'nba_roster.csv'
+database_file = 'nba_roster.db'
+
+# Connect to the SQLite database
+conn = sqlite3.connect(database_file)
+cursor = conn.cursor()
+
+# Create a table to store the data
+cursor.execute('''CREATE TABLE IF NOT EXISTS nba_roster (
+                    Team TEXT,
+                    NAME TEXT,
+                    Jersey TEXT,
+                    POS TEXT,
+                    AGE INT,
+                    HT TEXT,
+                    WT TEXT,
+                    COLLEGE TEXT,
+                    SALARY TEXT
+                )''')
+
+# Read data from the CSV file and insert it into the SQLite table
+with open(input_csv, 'r', newline='') as csvfile:
+    csv_reader = csv.reader(csvfile)
+    next(csv_reader)  # Skip the header row
+    
+    for row in csv_reader:
+        cursor.execute('INSERT INTO nba_roster VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)', row)
+
+# Commit the changes and close the database connection
+conn.commit()
+conn.close()
+
+print(f'Data from {input_csv} has been successfully imported into {database_file}')
+
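
As written, `csv2db.py` appends to `nba_roster` on every run, so running it twice would store each player twice. One possible variant, a sketch rather than the commit's code, recreates the table before inserting and uses `executemany`. The function name `import_roster`, the drop-and-recreate policy, and the skipping of rows with the wrong number of fields are all assumptions:

```python
import csv
import sqlite3

COLUMNS = ["Team", "NAME", "Jersey", "POS", "AGE", "HT", "WT", "COLLEGE", "SALARY"]


def import_roster(csv_path, db_path):
    """Load the roster CSV into a freshly created nba_roster table.

    Returns the number of rows inserted.
    """
    with open(csv_path, "r", newline="") as csvfile:
        reader = csv.reader(csvfile)
        next(reader, None)  # skip the header row, if any
        # Keep only rows with exactly one value per column.
        rows = [row for row in reader if len(row) == len(COLUMNS)]

    conn = sqlite3.connect(db_path)
    try:
        # Assumption: rebuild the table each run so re-imports do not duplicate rows.
        conn.execute("DROP TABLE IF EXISTS nba_roster")
        conn.execute(
            """CREATE TABLE nba_roster (
                   Team TEXT, NAME TEXT, Jersey TEXT, POS TEXT, AGE INT,
                   HT TEXT, WT TEXT, COLLEGE TEXT, SALARY TEXT)"""
        )
        conn.executemany(
            "INSERT INTO nba_roster VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
        )
        conn.commit()
    finally:
        conn.close()
    return len(rows)
```

Calling `import_roster('nba_roster.csv', 'nba_roster.db')` would mirror the file names used in the script. Dropping the table is only appropriate if nothing else writes to it.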

+ 1294 - 0
llama-demo-apps/nba.txt

File diff suppressed because it is too large


+ 53 - 0
llama-demo-apps/txt2csv.py

@@ -0,0 +1,53 @@
+import csv
+
+# Define the input and output file names
+input_file = 'nba.txt'
+output_file = 'nba_roster.csv'
+
+# Initialize lists to store data
+roster_data = []
+current_team = None
+
+# Open the input file
+with open(input_file, 'r') as file:
+    for line in file:
+        # Remove leading and trailing whitespaces from the line
+        line = line.strip()
+        
+        # Check if the line starts with 'https', skip it
+        if line.startswith('https'):
+            continue
+        
+        # Check if the line contains the team name
+        if 'Roster' in line:
+            current_team = line.split(' Roster ')[0]
+        elif line and "NAME" not in line:  # Skip empty lines and header lines
+            # Split the line using tabs as the delimiter
+            player_info = line.split('\t')
+            
+            # Remove any numbers from the player's name and set Jersey accordingly
+            name = ''.join([c for c in player_info[0] if not c.isdigit()])
+            jersey = ''.join([c for c in player_info[0] if c.isdigit()])
+            
+            # If no number found, set Jersey to "NA"
+            if not jersey:
+                jersey = "NA"
+            
+            # Append the team name, name, and jersey to the player's data
+            player_info = [current_team, name, jersey] + player_info[1:]
+            
+            # Append the player's data to the roster_data list
+            roster_data.append(player_info)
+
+# Write the data to a CSV file
+with open(output_file, 'w', newline='') as csvfile:
+    writer = csv.writer(csvfile)
+    
+    # Write the header row
+    writer.writerow(['Team', 'NAME', 'Jersey', 'POS', 'AGE', 'HT', 'WT', 'COLLEGE', 'SALARY'])
+    
+    # Write the player data
+    writer.writerows(roster_data)
+
+print(f'Conversion completed. Data saved to {output_file}')
+
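
The name and jersey step in `txt2csv.py` relies on the first tab-separated field being the player's name with the jersey digits fused onto it. Here is a small sketch of that step in isolation. The function name `split_name_and_jersey` and the sample strings are hypothetical, and the `.strip()` on the name is an addition that is not in the script:

```python
def split_name_and_jersey(field):
    """Split a fused 'name + jersey digits' field into (name, jersey)."""
    # Every non-digit character belongs to the name.
    name = "".join(c for c in field if not c.isdigit()).strip()
    # Every digit character belongs to the jersey number.
    jersey = "".join(c for c in field if c.isdigit())
    # Fall back to "NA" when no digits are present, as the script does.
    return name, jersey or "NA"


print(split_name_and_jersey("Sample Player12"))
print(split_name_and_jersey("Sample Player"))
```

Because the split is done per character, a name that itself contains digits would lose them to the jersey field.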