
Installing textrank4zh on Python 3 for word segmentation, keyword extraction, and summarization fails with: AttributeError: module 'networkx' has no attribute 'from_numpy_matrix'

1. Install textrank4zh

pip install textrank4zh

2. Test


from textrank4zh import TextRank4Keyword, TextRank4Sentence

text = "过去两天,国内生成式人工智能服务领域热闹极了:阿里云推出“通义千问”大模型;商汤科技“日日新”、昆仑万维“天工”大模型、有赞“加我智能”在同一天发布;360基于大模型开发的人工智能产品矩阵“360智脑”率先落地搜索场景……再加上百度已发布的“文心一言”,国内互联网巨头们在3月许下的诺言正在一一兑现。"
tr4w = TextRank4Keyword()

# window is the co-occurrence window size used to build edges between words
tr4w.analyze(text=text, lower=True, window=2, vertex_source="all_filters")

print('关键词:')  # "Keywords:"
for item in tr4w.get_keywords(20, word_min_len=2):
    print(item.word, item.weight)

print()
print('关键短语:')  # "Key phrases:"
# min_occur_num is the minimum number of times a phrase must occur in the text
for phrase in tr4w.get_keyphrases(keywords_num=20, min_occur_num=2):
    print(phrase)

tr4s = TextRank4Sentence()
tr4s.analyze(text=text, lower=True, source='all_filters')

print()
print('摘要:')  # "Summary:"
for item in tr4s.get_key_sentences(num=3):
    # index is the sentence's position in the text; weight is its score
    print(item.index, item.weight, item.sentence)

Running the script produces the following output:

C:\workspace>python create_tag.py
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\osystem\AppData\Local\Temp\jieba.cache
Loading model cost 0.837 seconds.
Prefix dict has been built successfully.
关键词:
人工智能 0.06974899614411177
推出 0.050498553906766885
科技 0.050498553906766885
搜索 0.050498553906766885
服务 0.0382373567280689
产品 0.0382373567280689
模型 0.036220973395951234
开发 0.03540654295435084
发布 0.03460207612456747
加上 0.03460207612456747
百度 0.03460207612456747
文心 0.03460207612456747
互联网 0.03460207612456747
巨头 0.03460207612456747
诺言 0.03460207612456747
正在 0.03460207612456747
通义 0.026653837233467786
商汤 0.026653837233467786
日日 0.026653837233467786
落地 0.026653837233467786

关键短语:

摘要:
1 0.2610907441134144 商汤科技“日日新”、昆仑万维“天工”大模型、有赞“加我智能”在同一天发布
0 0.25544560910201897 过去两天,国内生成式人工智能服务领域热闹极了:阿里云推出“通义千问”大模型
2 0.2480067018897252 360基于大模型开发的人工智能产品矩阵“360智脑”率先落地搜索场景

Common problems:

1. AttributeError: module 'networkx' has no attribute 'from_numpy_matrix'

Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\osystem\AppData\Local\Temp\jieba.cache
Loading model cost 0.805 seconds.
Prefix dict has been built successfully.
Traceback (most recent call last):
  File "create_tag.py", line 14, in <module>
    tr4w.analyze(text=text, lower=True, window=2, vertex_source="all_filters")  # py2中text必须是utf8编码的str或者unicode对象,py3中必须是utf8编码的bytes或者str对象
  File "C:\Users\osystem\AppData\Local\Programs\Python\Python38\lib\site-packages\textrank4zh\TextRank4Keyword.py", line 93, in analyze
    self.keywords = util.sort_words(_vertex_source, _edge_source, window = window, pagerank_config = pagerank_config)
  File "C:\Users\osystem\AppData\Local\Programs\Python\Python38\lib\site-packages\textrank4zh\util.py", line 160, in sort_words
    nx_graph = nx.from_numpy_matrix(graph)
AttributeError: module 'networkx' has no attribute 'from_numpy_matrix'

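Installing textrank4zh in step 1 automatically pulled in networkx 3.1, which no longer provides from_numpy_matrix, whereas the version textrank4zh works with is 1.9.1. The fix is to downgrade networkx: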

C:\workspace>pip3 install networkx==1.9.1
Collecting networkx==1.9.1
  Downloading networkx-1.9.1-py2.py3-none-any.whl (1.2 MB)
     ---------------------------------------- 1.2/1.2 MB 2.2 MB/s eta 0:00:00
Collecting decorator>=3.4.0 (from networkx==1.9.1)
  Downloading decorator-5.1.1-py3-none-any.whl (9.1 kB)
Installing collected packages: decorator, networkx
  Attempting uninstall: networkx
    Found existing installation: networkx 3.1
    Uninstalling networkx-3.1:
      Successfully uninstalled networkx-3.1
Successfully installed decorator-5.1.1 networkx-1.9.1
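
To confirm which networkx release is active and whether it still exposes the function textrank4zh calls, a small check along these lines may help. It is only a sketch: installed_version and missing_attributes are illustrative helper names, not part of pip, networkx, or textrank4zh, and it assumes Python 3.8 or later for importlib.metadata.

```python
import importlib
from importlib import metadata  # in the standard library since Python 3.8


def installed_version(package_name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def missing_attributes(module_name, required_names):
    """Import a module and return the required names it does not define."""
    module = importlib.import_module(module_name)
    return [name for name in required_names if not hasattr(module, name)]


def main():
    # Reading the metadata does not import the package, so this line works
    # even when the package itself fails to import.
    print("networkx version:", installed_version("networkx"))
    try:
        missing = missing_attributes("networkx", ["from_numpy_matrix"])
    except ImportError as exc:
        # An old networkx release may fail to import on a newer Python.
        print("networkx could not be imported:", exc)
        return
    if missing:
        print("networkx is missing:", ", ".join(missing))
    else:
        print("networkx provides from_numpy_matrix")


if __name__ == "__main__":
    main()
```

The version is read from package metadata rather than from the imported module, so the script still reports something useful if the import itself raises.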

2. ImportError: cannot import name 'escape' from 'cgi'

Traceback (most recent call last):
  File "create_tag.py", line 9, in <module>
    from textrank4zh import TextRank4Keyword, TextRank4Sentence
  File "C:\Users\osystem\AppData\Local\Programs\Python\Python38\lib\site-packages\textrank4zh\__init__.py", line 3, in <module>
    from .TextRank4Keyword import TextRank4Keyword
  File "C:\Users\osystem\AppData\Local\Programs\Python\Python38\lib\site-packages\textrank4zh\TextRank4Keyword.py", line 10, in <module>
    import networkx as nx
  File "C:\Users\osystem\AppData\Local\Programs\Python\Python38\lib\site-packages\networkx\__init__.py", line 76, in <module>
    import networkx.readwrite
  File "C:\Users\osystem\AppData\Local\Programs\Python\Python38\lib\site-packages\networkx\readwrite\__init__.py", line 14, in <module>
    from networkx.readwrite.gml import *
  File "C:\Users\osystem\AppData\Local\Programs\Python\Python38\lib\site-packages\networkx\readwrite\gml.py", line 39, in <module>
    from cgi import escape
ImportError: cannot import name 'escape' from 'cgi' (C:\Users\osystem\AppData\Local\Programs\Python\Python38\lib\cgi.py)

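After the downgrade, running the script fails again, this time while importing networkx: cgi.escape was removed in Python 3.8, so the import in networkx's gml.py has to be pointed at a different module. The path to gml.py is shown on the third line from the bottom of the traceback above.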

In that file, change "from cgi import escape" to "from html import escape".
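
The same edit can be scripted instead of made by hand. The following is a minimal sketch, assuming the file is UTF-8 text and contains the import exactly as quoted above; patch_escape_import is a hypothetical helper name, and the path to gml.py has to be taken from your own traceback.

```python
import sys
from pathlib import Path

OLD_IMPORT = "from cgi import escape"
NEW_IMPORT = "from html import escape"


def patch_escape_import(path):
    """Swap the cgi import for the html one; return True if the file was changed."""
    source_file = Path(path)
    text = source_file.read_text(encoding="utf-8")
    if OLD_IMPORT not in text:
        return False  # already patched, or a different networkx release
    # Keep a copy of the original next to it, e.g. gml.py.bak
    backup = source_file.with_name(source_file.name + ".bak")
    backup.write_text(text, encoding="utf-8")
    source_file.write_text(text.replace(OLD_IMPORT, NEW_IMPORT), encoding="utf-8")
    return True


if __name__ == "__main__" and len(sys.argv) == 2:
    # Pass the gml.py path from the traceback as the only argument
    changed = patch_escape_import(sys.argv[1])
    print("patched" if changed else "nothing to change")
```

One behavioral difference to keep in mind: html.escape also escapes quote characters by default (quote=True), while the old cgi.escape did not unless asked to.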

License: this article is released under the Creative Commons Attribution 4.0 International license [BY-NC-SA].
Article link: https://macsishu.com/python3-textrank4zh