Analyzing Personal Finance Data with a Local AI Model: Free, Offline, and Private

Published: 2024-07-23 07:32:03

The first half of 2024 is over, so I decided to review my finances, cut my spending, and put together a better financial plan.

Pasting my data straight into a tool like ChatGPT would hand my personal financial data to a third party, and I didn't want to do that.

So I built an AI financial-analysis assistant that runs locally on my own computer.

It runs entirely on my machine, needs no network connection once the models are downloaded, and is free.

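The assistant first imports my financial data. It then analyzes my income and expenses and provides personalized financial planning suggestions that fit my life goals.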

Below, I walk through building this local AI finance assistant from scratch.

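Disclaimer: This article is for learning purposes only and is not personal financial or investment advice. All views are the author's own.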

Overview

Goals and Architecture

The app uses Streamlit for the user interface. At its core, LangChain calls open-source large models that run locally in Ollama.

The app uses capable open-source models: Mistral for text and LLaVA, a vision-language model. Together they give the app multimodal capabilities.

Through the prompts, the model acts as a "professional financial planner" that gives individuals financial advice.

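My goals for the project were:
• Categorize the transaction data.
• Compute total income, expenses, and savings, and visualize how they change over time.
• Use the multimodal model to read the charts and spot patterns in my finances.
• Generate personalized investment suggestions based on my lifestyle.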

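(Note: this article assumes some programming experience. Participants in the second cohort of our offline AI workshop can use the approach we provided there for a smoother setup.)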

First, let's look at the tools we need.

Tools

Ollama: currently one of the best and simplest tools for running open-source large models.

It supports open-source models including Llama 2, Mistral, and LLaVA. Every model you can download is listed at ollama.ai/library. Ollama can be installed on macOS, Windows, and Linux.

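LangChain: an open-source framework for building applications around large language models. It greatly simplifies designing and developing AI applications and makes it easier to build more advanced use cases on top of them.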

It integrates well with the open-source models served by Ollama.
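Under the hood, LangChain's Ollama wrapper talks to the local Ollama server over HTTP, by default at http://localhost:11434. This helps explain why nothing leaves your machine. The sketch below only builds the request so you can inspect it, and nothing is sent. It assumes Ollama's /api/generate endpoint with its model, prompt, and stream fields. The function name build_generate_request is hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address (assumed unchanged)


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming request to Ollama's /api/generate endpoint."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_generate_request("mistral", "Say hello in one word.")
print(req.full_url, req.get_method())  # http://localhost:11434/api/generate POST

# To actually send it (requires a running Ollama server):
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

The request targets localhost only, so the prompt and your data stay on your computer.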

Streamlit: an open-source framework for quickly building and sharing data apps. A small amount of Python code is enough to create a web-based user interface.

Streamlit works well both for quick prototyping and for building complex data dashboards.

Step 1: Install the Tools and Prepare the Financial Data

Install Ollama

Go to the Ollama download page, choose the version for your operating system, then download and install it.

Download: https://ollama.com/download

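After installing Ollama, open a terminal and run the commands below. On a Mac, search for Terminal. On Windows, search for cmd.

Each command does the following:
• ollama serve starts the local Ollama server. It keeps running, so give it its own terminal window. Skip it if the Ollama desktop app is already running.
• ollama pull downloads a model to your computer. This project needs Mistral and LLaVA.
• ollama run is optional. It opens an interactive chat with a model so you can check that it works.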

ollama serve
ollama pull mistral
ollama pull llava
ollama run mistral
ollama run llava
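You may want to confirm from a script that both models were downloaded. Ollama's /api/tags endpoint lists the locally available models. This sketch assumes the response looks like {"models": [{"name": "mistral:latest", ...}]}. The sample dict stands in for a real response, so the example runs without a server. missing_models and fetch_tags are hypothetical helper names.

```python
import json
import urllib.request


def missing_models(tags_response: dict, required: list[str]) -> list[str]:
    """Return the required model names that do not appear in an /api/tags response."""
    # Ollama model names carry a tag such as ":latest"; compare on the part before the colon
    available = {m["name"].split(":")[0] for m in tags_response.get("models", [])}
    return [name for name in required if name not in available]


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Query a running Ollama server (not called below, so this sketch runs offline)."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return json.loads(resp.read())


sample_tags = {"models": [{"name": "mistral:latest"}]}  # illustrative response, not real output
print(missing_models(sample_tags, ["mistral", "llava"]))  # ['llava']
```

With the server running, you would call missing_models(fetch_tags(), ["mistral", "llava"]) and pull any model it reports.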

Prepare the Dataset

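Instead of my personal financial data, I'll use synthetic data: 1,000 transactions that I generated with ChatGPT. You can use your own real transaction data instead.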

Here is a prompt you can use to generate test data:

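Generate a dataset of financial transactions for a young finance professional living in Europe, with 1,000 transactions from January 2022 to December 2023. Make sure income and expenses are evenly distributed across categories. The dataset should include the following four columns:
Date: the transaction date (format: YYYY-MM-DD)
Name/Description: a unique, detailed description of each transaction (e.g., "Salary deposit", "Monthly rent payment", "Dinner with friends at a restaurant")
Expense/Income: clearly states whether the transaction is an expense ("Expense") or income ("Income")
Amount(EUR): the transaction amount in euros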

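The generated dataset should contain these four columns. The code in the following steps reads exactly these header names.

• Date: the transaction date
• Name/Description: a short description of the transaction, which the model uses to categorize it by its nature
• Expense/Income: whether the row is income or an expense
• Amount(EUR): the transaction amount in euros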
(Figure: a sample of the generated transaction data)
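Before uploading, it can help to check that the file really has the columns and values the later code expects. The header names come from the columns that Step 2 and Step 3 read. The date format follows the generation prompt. This is a standard-library sketch, and validate_transactions is a hypothetical name.

```python
import csv
import io
from datetime import datetime

REQUIRED_COLUMNS = ["Date", "Name/Description", "Expense/Income", "Amount(EUR)"]


def validate_transactions(csv_text: str) -> list[str]:
    """Return a list of problems found in the CSV text (an empty list means it looks usable)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            datetime.strptime(row["Date"], "%Y-%m-%d")
        except (ValueError, TypeError):  # TypeError: the field is missing in this row
            problems.append(f"line {line_no}: bad date {row['Date']!r}")
        if row["Expense/Income"] not in ("Expense", "Income"):
            problems.append(f"line {line_no}: Expense/Income must be 'Expense' or 'Income'")
        try:
            float(row["Amount(EUR)"])
        except (ValueError, TypeError):
            problems.append(f"line {line_no}: amount is not a number")
    return problems


sample_csv = (
    "Date,Name/Description,Expense/Income,Amount(EUR)\n"
    "2022-01-01,Salary deposit,Income,3200\n"
    "2022-01-03,Monthly rent payment,Expense,1200\n"
)
print(validate_transactions(sample_csv))  # []
```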

Install Dependencies

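Now install LangChain's community package and Streamlit. Also install matplotlib, which the dashboard in Step 3 uses for its charts. pandas is installed along with Streamlit.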

pip install langchain-community
pip install streamlit
pip install matplotlib

Step 2: Upload and Process the Data

Upload the Data

Create a new Python file named Upload.py and add the following code.

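It does the following:

• Imports the required libraries
• Initializes the model used to categorize transactions
• Defines the categories. They cover many kinds of income and expenses, which helps the model categorize transactions accurately.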

import os

import pandas as pd
import streamlit as st
from langchain_community.llms import Ollama

# The categorized results are written to the data/ folder, so make sure it exists
os.makedirs("data", exist_ok=True)

llm = Ollama(model="mistral")

categories = [
    "Salary/Wages", "Investment Income", "Freelance Income", "Business Revenue", "Rental Income",
    "Housing", "Utilities", "Groceries", "Transportation", "Insurance", "Healthcare",
    "Entertainment", "Personal Care", "Education", "Savings/Investments", "Loans/Debt",
    "Taxes", "Childcare", "Gifts/Donations", "Dining Out", "Travel", "Shopping",
    "Subscriptions", "Pet Care", "Home Improvement", "Clothing", "Tech/Gadgets",
    "Fitness/Sports",
]
categories_string = ",".join(categories)

Build the Transaction Categorization Function

1. Categorize transactions

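Write a categorize_transactions function that takes a comma-separated string of transaction names. We use prompt engineering to steer the model's output. The prompt includes the transaction names and asks the model to assign each one a category from the predefined list.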

Once we have the model's output, we organize it into a structured pandas DataFrame.

def categorize_transactions(transaction_names, llm):
    prompt = f"""Categorize the following transactions.
Remember that each category must be chosen from this list, picking the most relevant one based on the transaction's main purpose or nature: {categories_string}.

The output format must always be: transaction name - category. For example: Spotify #2 - Entertainment, Basic Fit Amsterdam Nld #3 - Fitness/Sports

Here are the transactions to categorize: {transaction_names}
"""
    print(prompt)

    filtered_response = []
    attempts = 0
    # Retry if the LLM output is not in the expected format (give up after 3 attempts)
    while len(filtered_response) < 2 and attempts < 3:
        attempts += 1
        response = llm.invoke(prompt).split("\n")
        print(response)
        # Keep only lines that look like "transaction - category" pairs
        filtered_response = [item for item in response if "-" in item]
        print(filtered_response)

    # Put the results in a DataFrame
    categories_df = pd.DataFrame({"Transaction vs category": filtered_response})

    # Align the input names with the model's lines; pad with None if the model returned extra lines
    size_dif = len(categories_df) - len(transaction_names.split(","))
    if size_dif >= 0:
        categories_df["Transaction"] = transaction_names.split(",") + [None] * size_dif
    else:
        categories_df["Transaction"] = transaction_names.split(",")[: len(categories_df)]

    # The category is the text after the last "-", without surrounding spaces
    categories_df["Category"] = (
        categories_df["Transaction vs category"].str.split("-").str[-1].str.strip()
    )
    return categories_df
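categorize_transactions pairs the input names with the model's lines by position. If the model skips or reorders a line, categories can shift onto the wrong transactions. The sketch below is a more defensive alternative. It assumes the model follows the "name - category" format from the prompt. It reads both sides of each line and drops categories that are not in the predefined list. parse_categorized_lines is a hypothetical helper that is not part of the original code, and the category list is shortened for the example.

```python
import re

CATEGORIES = ["Groceries", "Entertainment", "Fitness/Sports", "Housing"]  # shortened for the example


def parse_categorized_lines(response_text: str, categories: list[str]) -> dict[str, str]:
    """Map transaction name -> category from lines shaped like '1. Spotify #2 - Entertainment'."""
    allowed = {c.lower(): c for c in categories}
    result = {}
    for line in response_text.splitlines():
        if "-" not in line:
            continue
        # Split on the LAST hyphen so names such as "Coca-Cola" keep their own hyphen
        name, category = line.rsplit("-", 1)
        name = re.sub(r"^\s*\d+\.\s*", "", name).strip()  # drop list numbering like "1. "
        category = category.strip().strip(".")
        if name and category.lower() in allowed:
            result[name] = allowed[category.lower()]
    return result


reply = (
    "1. Spotify #2 - Entertainment\n"
    "2. Basic Fit Amsterdam Nld #3 - Fitness/Sports\n"
    "Here are your results:\n"
    "3. Coca-Cola - Groceries\n"
    "4. Mystery - Lottery\n"
)
print(parse_categorized_lines(reply, CATEGORIES))
```

A mapping like this could then be merged on Name/Description directly, so the result no longer depends on the order of the model's lines.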

2. Create the data processing function

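Create a process_data function that takes the uploaded data and categorizes the unique transaction names with categorize_transactions, 30 at a time. It then merges the categories back into the original DataFrame, which it also saves to the data folder for the analysis step.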

def hop(start, stop, step):
    # Yield batch boundaries: start, start + step, ..., and finally stop
    for i in range(start, stop, step):
        yield i
    yield stop


def process_data(df: pd.DataFrame):
    unique_transactions = df["Name/Description"].unique()
    index_list = list(hop(0, len(unique_transactions), 30))

    # Initialise the categories_df_all DataFrame
    categories_df_all = pd.DataFrame()

    # Categorize the transactions batch by batch
    for i in range(0, len(index_list) - 1):
        print(f"Looping: {i}")
        transaction_names = unique_transactions[index_list[i] : index_list[i + 1]]
        transaction_names = ",".join(transaction_names)
        categories_df = categorize_transactions(transaction_names, llm)
        categories_df_all = pd.concat([categories_df_all, categories_df], ignore_index=True)

    # Further clean the data:
    # Drop NA values
    categories_df_all = categories_df_all.dropna()
    # Remove numbering such as "1. " from the Transaction column
    categories_df_all["Transaction"] = (
        categories_df_all["Transaction"].str.replace(r"\d+\.\s?", "", regex=True).str.strip()
    )

    new_df = pd.merge(
        df,
        categories_df_all,
        left_on="Name/Description",
        right_on="Transaction",
        how="left",
    )
    # uploaded_file is the module-level Streamlit upload widget defined below
    new_df.to_csv(f"data/{uploaded_file.name}_categorized.csv", index=False)
    return new_df
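process_data sends the unique transaction names to the model in batches of 30 so that each prompt stays short. hop only produces the batch boundaries. The same batching can be written with plain slicing, as in the standard-library sketch below. batch_unique is a hypothetical name. dict.fromkeys removes duplicates while keeping first-seen order, as Series.unique() does.

```python
def hop(start, stop, step):
    """Batch boundaries as used by process_data: range(start, stop, step) followed by stop."""
    for i in range(start, stop, step):
        yield i
    yield stop


def batch_unique(names: list[str], batch_size: int = 30) -> list[list[str]]:
    """Deduplicate names (keeping first-seen order) and split them into batches."""
    unique_names = list(dict.fromkeys(names))
    return [unique_names[i : i + batch_size] for i in range(0, len(unique_names), batch_size)]


names = [f"Shop {i % 70}" for i in range(200)]  # 200 rows, 70 distinct names
batches = batch_unique(names)
print([len(b) for b in batches])  # [30, 30, 10]
print(list(hop(0, 70, 30)))       # [0, 30, 60, 70] -> the same three slices
```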

3. Create the Streamlit web app

First, set the web app's title and add a file upload widget so that users can upload their financial data.

st.title("Load your financial data here")
# Only CSV is accepted, because the file is read with pd.read_csv below
uploaded_file = st.file_uploader("Upload your financial data", type=["csv"])

4. Process the uploaded data

Once a file is uploaded, read it into a pandas DataFrame and call process_data to categorize the transactions.

if uploaded_file:
    with st.spinner("Processing data..."):
        file_details = {"FileName": uploaded_file.name, "FileType": uploaded_file.type}
        df = pd.read_csv(uploaded_file)
        df = process_data(df)
        st.markdown("Data processed: OK")

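5. Run the Streamlit app

From the folder that contains Upload.py, start the app with:

streamlit run Upload.py

Streamlit opens the app in your browser (by default at http://localhost:8501), showing the title and the file upload widget.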

Step 3: Analyze the Financial Data

Once Mistral has categorized all the transactions, you can start the financial analysis.

This involves three steps:

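1. Quantitative analysis: to get a full picture of your finances, first calculate income and expenses and identify where most of your money goes.

2. Visualization: plot the transaction data to reveal trends.

3. Qualitative analysis: feed the key financial figures and the charts back to the models. LLaVA reads the charts, and Mistral writes the assessment. Carefully designed prompts guide the models' qualitative assessment of your financial situation.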

Quantitative Analysis

First, create a new Python file named Finance_Dashboard.py, import the required libraries, and initialize the Ollama models.

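import os

import matplotlib.pyplot as plt
import pandas as pd
import streamlit as st
from langchain_community.llms import Ollama

llm_llava = Ollama(model="llava")  # multimodal model that reads the chart images
llm = Ollama(model="mistral")  # text model that writes the analysis and advice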

Next, create a function named financial_analysis to analyze the transactions.

def financial_analysis(data: pd.DataFrame):
    key_figures = {}

    # Average yearly total income and total expenses (sum per year, then mean across years)
    yearly_income = data.loc[data['Expense/Income'] == 'Income'].groupby('Year')['Amount(EUR)'].sum().mean()
    yearly_expenses = data.loc[data['Expense/Income'] == 'Expense'].groupby('Year')['Amount(EUR)'].sum().mean()

    # Identify the top expense categories
    top_expenses = (
        data.loc[data['Expense/Income'] == 'Expense']
        .groupby('Category')['Amount(EUR)']
        .sum()
        .sort_values(ascending=False)
    )

    # Calculate average monthly income and expenses
    monthly_income = (
        data.loc[data['Expense/Income'] == 'Income']
        .groupby(data['Date'].dt.to_period('M'))['Amount(EUR)']
        .sum()
        .mean()
    )
    monthly_expenses = (
        data.loc[data['Expense/Income'] == 'Expense']
        .groupby(data['Date'].dt.to_period('M'))['Amount(EUR)']
        .sum()
        .mean()
    )

    # Determine the savings rate
    savings = yearly_income - yearly_expenses
    savings_rate = (savings / yearly_income) * 100 if yearly_income > 0 else 0

    key_figures['Average Annual Income'] = f"€{yearly_income:,.2f}"
    key_figures['Average Annual Expenses'] = f"€{yearly_expenses:,.2f}"
    key_figures['Annual Savings Rate'] = f"{savings_rate:.2f}%"
    key_figures['Top Expense Categories'] = {
        category: f"€{amount:,.2f}" for category, amount in top_expenses.head().items()
    }
    key_figures['Average Monthly Income'] = f"€{monthly_income:,.2f}"
    key_figures['Average Monthly Expenses'] = f"€{monthly_expenses:,.2f}"
    return key_figures

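This function calculates the average annual and monthly income and expenses and the savings rate, and it identifies the top expense categories.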
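Here is a quick check of the arithmetic in financial_analysis. The yearly figures are per-year totals averaged over the years in the data. The savings rate is average annual income minus average annual expenses, divided by average annual income, times 100. The standard-library sketch below reproduces that calculation on a few made-up transactions. The numbers are illustrative, not the article's data.

```python
from collections import defaultdict
from statistics import mean

# (date, kind, amount) tuples; made-up values for illustration
transactions = [
    ("2022-01-31", "Income", 3000.0),
    ("2022-02-05", "Expense", 1800.0),
    ("2023-01-31", "Income", 3400.0),
    ("2023-03-10", "Expense", 2200.0),
]


def yearly_average(rows, kind):
    """Sum one kind of amount per year, then average across years (like groupby('Year').sum().mean())."""
    totals = defaultdict(float)
    for date, k, amount in rows:
        if k == kind:
            totals[date[:4]] += amount
    return mean(totals.values()) if totals else 0.0


income = yearly_average(transactions, "Income")     # (3000 + 3400) / 2 = 3200
expenses = yearly_average(transactions, "Expense")  # (1800 + 2200) / 2 = 2000
savings_rate = (income - expenses) / income * 100 if income > 0 else 0
print(f"{savings_rate:.2f}%")  # 37.50%
```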

Visualization

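Here we visualize the transaction data with four charts: income vs. expenses over time, the monthly savings rate, income sources, and spending by category. Each chart is also saved as a PNG in the data folder so that LLaVA can analyze it in Step 4.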

def plot_income_vs_expense_over_time(df):
    # Income vs Expense Over time
    st.markdown("1. Income vs Expense Over time")
    income_expense_summary = (
        df.groupby(["YearMonth", "Expense/Income"])["Amount(EUR)"]
        .sum()
        .unstack()
        .fillna(0)
    )
    income_expense_summary.plot(kind="bar", figsize=(10, 8))
    plt.title("Income vs Expenses Over Time")
    plt.ylabel("Amount (EUR)")
    plt.xlabel("Month")
    fig = plt.gcf()  # the figure pandas just created
    fig.savefig("data/income_vs_expense_over_time.png", bbox_inches="tight")
    st.pyplot(fig)
    plt.close(fig)  # release the figure once Streamlit has rendered it


def plot_saving_rate_trend(data: pd.DataFrame):
    st.markdown("2. Monthly Saving Rate Trend")
    monthly_data = data.groupby(['YearMonth', 'Expense/Income'])['Amount(EUR)'].sum().unstack().fillna(0)
    monthly_data['Savings Rate'] = (monthly_data['Income'] - monthly_data['Expense']) / monthly_data['Income'] * 100
    fig, ax = plt.subplots()
    monthly_data['Savings Rate'].plot(ax=ax)
    ax.set_xlabel('Month')
    ax.set_ylabel('Savings Rate (%)')
    fig.savefig("data/saving_rate_over_time.png", bbox_inches="tight")
    st.pyplot(fig)
    plt.close(fig)


def plot_income_source_analysis(data: pd.DataFrame):
    st.markdown("3. Income Sources Analysis")
    income_sources = data[data['Expense/Income'] == 'Income'].groupby('Category')['Amount(EUR)'].sum()
    income_sources.plot(kind="pie", figsize=(10, 8), autopct="%1.1f%%", startangle=140)
    plt.title("Income Sources Analysis")
    plt.ylabel("")  # Hide the y-label as it's unnecessary for pie charts
    fig = plt.gcf()
    fig.savefig("data/income_source_analysis.png", bbox_inches="tight")
    st.pyplot(fig)
    plt.close(fig)


def plot_category_wise_spending_analysis(data: pd.DataFrame):
    st.markdown("4. Category-wise Spending Analysis")
    expenses_by_category = data[data['Expense/Income'] == 'Expense'].groupby('Category')['Amount(EUR)'].sum()
    expenses_by_category.plot(kind="pie", figsize=(10, 8), autopct="%1.1f%%", startangle=140)
    plt.title("Expenses Analysis")
    plt.ylabel("")  # Hide the y-label as it's unnecessary for pie charts
    fig = plt.gcf()
    fig.savefig("data/expense_category_analysis.png", bbox_inches="tight")
    st.pyplot(fig)
    plt.close(fig)
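plot_saving_rate_trend divides by each month's income. A month with expenses but no income therefore yields an infinite or undefined rate, and data with no Income rows at all raises a KeyError. One way to guard against this is to skip months without income. In the pandas version, the equivalent is to replace zero income with NaN before dividing. The plain-Python sketch below shows the idea, and monthly_savings_rate is a hypothetical helper.

```python
from collections import defaultdict


def monthly_savings_rate(rows):
    """Return {YYYY-MM: savings rate in %}, skipping months that have no income."""
    income = defaultdict(float)
    expense = defaultdict(float)
    for date, kind, amount in rows:
        month = date[:7]  # "YYYY-MM", the same bucket as Date.dt.to_period("M")
        if kind == "Income":
            income[month] += amount
        elif kind == "Expense":
            expense[month] += amount
    return {
        month: (income[month] - expense.get(month, 0.0)) / income[month] * 100
        for month in sorted(set(income) | set(expense))
        if income.get(month, 0.0) > 0
    }


rows = [
    ("2022-01-31", "Income", 2000.0),
    ("2022-01-15", "Expense", 1500.0),
    ("2022-02-10", "Expense", 300.0),  # no income in February, so the month is skipped
]
print(monthly_savings_rate(rows))  # {'2022-01': 25.0}
```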

Load the Financial Data

# Read every CSV in the data folder (the categorized files written by Upload.py)
total_df = pd.DataFrame()
for root, dirs, files in os.walk("data"):
    for file in files:
        if file.endswith(".csv"):
            df = pd.read_csv(os.path.join(root, file))
            total_df = pd.concat([total_df, df], ignore_index=True)

# Parse dates and add the month and year columns used by the analysis and the charts
total_df["Date"] = pd.to_datetime(total_df["Date"])
total_df["YearMonth"] = total_df["Date"].dt.to_period("M")
total_df["Year"] = total_df["Date"].dt.year

This reads the transactions from the CSV files and prepares them for analysis.
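The loader walks the whole data folder and concatenates every .csv it finds. Any other CSV placed there, such as a raw uncategorized export, would be counted twice. A narrower variant collects only the files with the _categorized.csv suffix that process_data writes. The sketch below uses pathlib and the standard csv module. load_categorized_rows is a hypothetical name, and the temporary files exist only for the demonstration.

```python
import csv
import tempfile
from pathlib import Path


def load_categorized_rows(data_dir: str = "data") -> list[dict]:
    """Read every *_categorized.csv under data_dir (recursively) into a list of row dicts."""
    rows = []
    for path in sorted(Path(data_dir).rglob("*_categorized.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            rows.extend(csv.DictReader(f))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    header = "Date,Name/Description,Expense/Income,Amount(EUR),Category\n"
    Path(tmp, "jan.csv_categorized.csv").write_text(
        header + "2022-01-01,Salary deposit,Income,3200,Salary/Wages\n", encoding="utf-8"
    )
    Path(tmp, "raw.csv").write_text(header, encoding="utf-8")  # ignored: no _categorized suffix
    loaded = load_categorized_rows(tmp)
print(len(loaded), loaded[0]["Category"])  # 1 Salary/Wages
```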

Set Up the Streamlit Dashboard

st.title("My Local AI Finance Insighter")
st.markdown(
    "**A personalized and secure approach to analyzing financial data, providing insights and recommendations tailored to individual needs.**"
)

analysis_results = financial_analysis(total_df)

results_str = ""
# Loop through the dictionary
for key, value in analysis_results.items():
    if isinstance(value, dict):
        # If the value is another dictionary, further iterate to get sub-keys and values
        sub_results = ', '.join([f"{sub_key}: {sub_value}" for sub_key, sub_value in value.items()])
        results_str += f"{key}: {sub_results}\n"
    else:
        # For direct key-value pairs, simply concatenate
        results_str += f"{key}: {value}\n"

st.subheader("Yearly Figures")
col1, col2, col3 = st.columns(3)
col1.metric(label="Average Annual Income", value=analysis_results['Average Annual Income'])
col2.metric(label="Average Annual Expenses", value=analysis_results['Average Annual Expenses'])
col3.metric(label="Savings Rate", value=analysis_results['Annual Savings Rate'])

# Display average monthly figures
st.subheader("Average Monthly Figures")
col1, col2 = st.columns(2)
col1.metric(label="Average Monthly Income", value=analysis_results['Average Monthly Income'])
col2.metric(label="Average Monthly Expenses", value=analysis_results['Average Monthly Expenses'])

# Display top expense categories in a table
st.subheader("Top Expense Categories")
expenses_df = pd.DataFrame(list(analysis_results['Top Expense Categories'].items()), columns=['Category', 'Amount'])
st.table(expenses_df)

with st.container():
    col1, col2 = st.columns(2)
    with col1:
        plot_income_vs_expense_over_time(total_df)
    with col2:
        plot_saving_rate_trend(total_df)

with st.container():
    col3, col4 = st.columns(2)
    with col3:
        plot_income_source_analysis(total_df)
    with col4:
        plot_category_wise_spending_analysis(total_df)

This code gives the dashboard a title, displays the analysis results, and calls the plotting functions.
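The loop that builds results_str turns the nested analysis_results dictionary into plain "key: value" lines, because that string is later pasted into the prompt for Mistral. Pulled out into a function, the same logic looks like this. The function name and the sample figures are illustrative.

```python
def figures_to_text(figures: dict) -> str:
    """Flatten {key: value or {sub_key: sub_value}} into one 'key: value' line per entry."""
    lines = []
    for key, value in figures.items():
        if isinstance(value, dict):
            # Nested dicts (e.g. top expense categories) become "sub_key: sub_value, ..."
            value = ", ".join(f"{sub_key}: {sub_value}" for sub_key, sub_value in value.items())
        lines.append(f"{key}: {value}")
    return "".join(f"{line}\n" for line in lines)


sample_figures = {
    "Annual Savings Rate": "37.50%",
    "Top Expense Categories": {"Housing": "€14,400.00", "Groceries": "€3,100.00"},
}
print(figures_to_text(sample_figures))
```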

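Start the dashboard with the command below. The page shows the yearly and monthly figures, the top expense categories table, and the four charts.

streamlit run Finance_Dashboard.py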

Step 4: Generate Financial Advice

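Finally, we pass the quantitative results and the charts to the models to generate personalized financial advice. LLaVA describes each saved chart, and Mistral combines those descriptions with the key figures. Add the following code to the end of Finance_Dashboard.py.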


import base64

with st.spinner("Generating reports ..."):
    total_response = ""
    # Ask LLaVA to interpret every chart saved in the data folder. The chart is attached as a
    # base64 image; naming the file path in the prompt alone would not let the model see it.
    for root, dirs, files in os.walk("data"):
        for file in files:
            if file.endswith(".png"):
                with open(os.path.join(root, file), "rb") as image_file:
                    image_b64 = base64.b64encode(image_file.read()).decode("utf-8")
                response = llm_llava.bind(images=[image_b64]).invoke(
                    "Act as an expert finance planner and analyse this chart. "
                    "You should give your insights extracted from the image and key figures you see from the image."
                )
                total_response += response + "\n"

    total_response += f"\nHere are the user key financial figures : {results_str}"

    st.write("---------------")
    st.markdown("**Finance analysis and budget planner**")
    summary = llm.invoke(
        f"You are a helpful and expert finance planner. Based on the following analysis: {total_response}, "
        "make a summary of the financial status of the user and suggest tips on savings. Highlight categories "
        "where the user can potentially reduce expenses and suggest an ideal savings rate based on their income "
        "and goals. Tailor these suggestions to fit the user's lifestyle and financial objectives. Use a friendly tone."
    )
    st.write(summary)

    st.write("---------------")
    st.markdown("**Investment tips**")
    # Use the user's investment goals and risk tolerance if another part of the app stored them
    # in the session state (not shown in this article); otherwise the tips stay generic
    if "user_answers_str" in st.session_state:
        user_investment_answer = st.session_state.user_answers_str
    else:
        user_investment_answer = ""
    investment_tips = llm.invoke(
        "You are a helpful and expert finance planner. Based on the user's risk tolerance and investment goals, "
        "provide an overview of suitable investment options. Discuss the basics of stocks, bonds, mutual funds, "
        "ETFs, and other investment vehicles that align with their profile. Explain the importance of "
        "diversification and the role of risk management in investing. Offer to guide them through setting up a "
        "diversified investment portfolio, suggesting steps to get started based on their current financial "
        "situation. Use a friendly tone. Below are the user's investment objective and risk tolerance : "
        f"{user_investment_answer}"
    )
    st.write(investment_tips)
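LLaVA only sees a chart if the image bytes travel with the request. Mentioning a file name in the prompt text does not attach the file. Ollama's /api/generate endpoint accepts images as a list of base64 strings in its images field. The LangChain community Ollama wrapper can forward them via bind(images=[...]), as shown in LangChain's multimodal Ollama example. The sketch below shows only the encoding step and the resulting request body. Nothing is sent. encode_image and llava_payload are hypothetical names, and the "image" is placeholder bytes.

```python
import base64
import json
import tempfile
from pathlib import Path


def encode_image(path: str) -> str:
    """Return the file's bytes as a base64 string, the form Ollama expects for images."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def llava_payload(prompt: str, image_paths: list[str]) -> str:
    """JSON body for a non-streaming /api/generate call that attaches the images."""
    return json.dumps({
        "model": "llava",
        "prompt": prompt,
        "images": [encode_image(p) for p in image_paths],
        "stream": False,
    })


with tempfile.TemporaryDirectory() as tmp:
    chart = Path(tmp, "chart.png")
    chart.write_bytes(b"\x89PNG fake bytes")  # placeholder bytes, not a real chart
    body = json.loads(llava_payload("Describe this chart.", [str(chart)]))
print(sorted(body))  # ['images', 'model', 'prompt', 'stream']
```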

The report is well structured and coherent, but it is longer than I expected. To get more concise output, we can refine the prompts further.
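One way to rein in the length is to state the output format and a size limit explicitly in the prompt. The template below is only a sketch of that idea. The wording, the limits, and the name build_summary_prompt are assumptions, not a tested final prompt. Local models do not always respect word limits, so treat the numbers as a nudge rather than a guarantee. The template embeds the same kind of analysis text that the summary prompt above sends to Mistral.

```python
def build_summary_prompt(analysis: str, max_bullets: int = 5, max_words: int = 200) -> str:
    """Summary prompt with explicit format and length limits."""
    return (
        "You are a helpful and expert finance planner.\n"
        f"Based on the following analysis:\n{analysis}\n\n"
        f"Summarize the user's financial status in at most {max_bullets} bullet points "
        f"and no more than {max_words} words in total. "
        "Name the expense categories where spending could be reduced and suggest a target savings rate. "
        "Use a friendly tone and do not repeat the input figures verbatim."
    )


prompt = build_summary_prompt("Average Annual Income: €38,400.00\nAnnual Savings Rate: 37.50%\n")
print(prompt)
```

In the dashboard, this would replace the inline summary prompt, for example summary = llm.invoke(build_summary_prompt(total_response)).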