❶ How do I count the number of words in an English sentence with Python and print it?
Hi,
The code and test output were attached as screenshots (not reproduced here).
Note: the result variable in the red box of the screenshot can be omitted; it was only there to check that the split came out correctly.
Hope this helps; feel free to follow up.
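Since the original answer's code was a screenshot, here is a minimal sketch of the usual approach (the sentence is a made-up example):

```python
# Count the words in an English sentence by splitting on whitespace.
sentence = "Life is short you need Python"
words = sentence.split()  # split() with no argument collapses repeated spaces
result = len(words)
print(result)  # prints 6
```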
❷ How do I count the number of each part of speech in a text with Python?
If you just want to count how many times a particular word appears in a text, loop over the text as you read it, add num += 1 on each match, and print num at the end.
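A minimal sketch of the loop-and-increment approach the answer describes (the text and target word are made-up examples; real part-of-speech counting would need a POS tagger such as jieba.posseg, which is not shown here):

```python
# Count how many times a target word appears in a text: one match -> num += 1.
text = "the cat sat on the mat and the cat slept"
target = "cat"
num = 0
for word in text.split():
    if word == target:
        num += 1
print(num)  # prints 2
```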
❸ Counting word frequency with Python
def statistics(astr):
    # Split the line on tabs, then de-duplicate while preserving order.
    slist = astr.split("\t")
    alist = []
    for token in slist:
        if token not in alist:
            alist.append(token)
    # Strip the trailing newline from the last token.
    alist[-1] = alist[-1].replace("\n", "")
    return alist

if __name__ == "__main__":
    code_doc = {}
    with open("test_data.txt", "r", encoding="utf-8") as fs:
        for ln in fs.readlines():
            for t in statistics(ln):
                if t not in code_doc:
                    code_doc[t] = 1
                else:
                    code_doc[t] += 1
    for key in code_doc:
        print(key + " " + str(code_doc[key]))
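The same per-line de-duplicated count can be written more concisely with the standard library's collections.Counter; a sketch using made-up in-memory lines standing in for test_data.txt:

```python
from collections import Counter

# Sample lines standing in for the contents of test_data.txt (made-up data).
lines = ["a\tb\ta\n", "a\tc\n"]
code_doc = Counter()
for ln in lines:
    # De-duplicate tokens within each line, matching statistics() above.
    code_doc.update(set(ln.rstrip("\n").split("\t")))
for key in code_doc:
    print(key, code_doc[key])
```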
❹ How do I count the number of words in a string with Python?
If you mean a string of words separated by spaces and you want the word frequencies, use a list and a dictionary.
For example, with input like this: this one ok this one two three go end at end

dic1 = {}
n = input().split()
for i in n:
    if i in dic1:
        dic1[i] += 1
    else:
        dic1[i] = 1
print(dic1)
❺ Python code that counts how many times a word appears
First use split() to break the input into a list, data.
Then the list method data.count('aa') tells you how many occurrences of 'aa' there are.
Try writing it out yourself.
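The two steps above, sketched with a made-up input string:

```python
s = "aa bb aa cc aa"
data = s.split()         # split on whitespace into a list of words
print(data.count('aa'))  # prints 3
```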
❻ A question about counting words with Python and jieba
I can't see the earlier code, but judging from the code that follows, counts is not a set but a dict object.
If counts was initialized the way shown earlier, you can see that counts is of type dict.
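A typical initialization that produces a dict (not a set) is the get()-based counting idiom commonly paired with jieba; a sketch with made-up tokens standing in for jieba.lcut output:

```python
# words would normally come from jieba.lcut(text); made-up tokens here.
words = ["python", "jieba", "python", "counts"]
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1  # dict lookup with a default of 0
print(type(counts))       # <class 'dict'> -- a dict, not a set
print(counts["python"])   # prints 2
```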
❼ How to count word frequencies with Python
Code:
passage="""Editor』s Note: Looking through VOA's listener mail, we came across a letter that asked a simple question. "What do Americans think about China?" We all care about the perceptions of others. It helps us better understand who we are. VOA Reporter Michael Lipin begins a series providing some answers to our listener's question. His assignment: present a clearer picture of what Americans think about their chief world rival, and what drives those perceptions.
Two common American attitudes toward China can be identified from the latest U.S. public opinion surveys published by Gallup and Pew Research Center in the past year.
First, most of the Americans surveyed have unfavorable opinions of China as a whole, but do not view the country as a threat toward the United States at the present time.
Second, most survey respondents expect China to pose an economic and military threat to the United States in the future, with more Americans worried about the perceived economic threat than the military one.
Most Americans view China unfavorably
To understand why most Americans appear to have negative feelings about China, analysts interviewed by VOA say a variety of factors should be considered. Primary among them is a lack of familiarity.
"Most Americans do not have a strong interest in foreign affairs, Chinese or otherwise," says Robert Daly, director of the Kissinger Institute on China and the United States at the Washington-based Wilson Center.
Many of those Americans also have never traveled to China, in part because of the distance and expense. "That means that like most human beings, they take short cuts to understanding China," Daly says.
Rather than make the effort to regularly consume a wide range of U.S. media reports about China, analysts say many Americans base their views on widely-publicized major events in China's recent history."""
passage = (passage.replace(",", " ").replace(".", " ").replace(":", " ")
           .replace("』", "'").replace('"', " ")
           .replace("?", " ").replace("!", " "))  # turn punctuation into spaces
passagelist = passage.split(" ")  # split into individual tokens
pc = passagelist.copy()  # work on a copy
for i in range(len(pc)):
    pi = pc[i]  # current token
    if pi.count(" ") == len(pi):  # if it is empty or all spaces
        passagelist.remove(pi)  # drop it
worddict = {}
for j in range(len(passagelist)):
    pj = passagelist[j]  # current word
    if pj not in worddict:  # not counted yet
        worddict[pj] = 1  # add it with a count of 1
    else:  # already counted
        worddict[pj] += 1  # increment the count
worddictlist = list(worddict.keys())  # all the words
worddictlist.sort()  # alphabetical order (but case causes problems)
worddict2 = {}
for k in worddictlist:
    worddict2[k] = worddict[k]  # the sorted dictionary
print("Word\t\tCount")
for m in worddict2:  # print each entry
    tabs = (23 - len(m)) // 8  # tab padding based on word length; if pasting into a spreadsheet, change this line to tabs = 2
    print("%s%s%d" % (m, "\t" * tabs, worddict[m]))
Note: the bold part is the passage you want to count; replace it with your own text. The output I get looks like this:
American 1
Americans 9
Center 2
China 10
China's 1
Chinese 1
Daly 2
Editor's 1
First 1
Gallup 1
His 1
Institute 1
It 1
Kissinger 1
Lipin 1
Looking 1
Many 1
Michael 1
Most 2
Note 1
Pew 1
Primary 1
Rather 1
Reporter 1
Research 1
Robert 1
S 2
Second 1
States 3
That 1
To 1
Two 1
U 2
United 3
VOA 2
VOA's 1
Washington-based1
We 1
What 1
Wilson 1
a 10
about 6
across 1
affairs 1
all 1
also 1
among 1
an 1
analysts 2
and 5
answers 1
appear 1
are 1
as 2
asked 1
assignment 1
at 2
attitudes 1
base 1
be 2
because 1
begins 1
beings 1
better 1
but 1
by 2
came 1
can 1
care 1
chief 1
clearer 1
common 1
considered 1
consume 1
country 1
cuts 1
director 1
distance 1
do 3
drives 1
economic 2
effort 1
events 1
expect 1
expense 1
factors 1
familiarity 1
feelings 1
foreign 1
from 1
future 1
have 4
helps 1
history 1
human 1
identified 1
in 5
interest 1
interviewed 1
is 1
lack 1
latest 1
letter 1
like 1
listener 1
listener's 1
mail 1
major 1
make 1
many 1
means 1
media 1
military 2
more 1
most 4
negative 1
never 1
not 2
of 10
on 2
one 1
opinion 1
opinions 1
or 1
others 1
otherwise 1
our 1
part 1
past 1
perceived 1
perceptions 2
picture 1
pose 1
present 2
providing 1
public 1
published 1
question 2
range 1
recent 1
regularly 1
reports 1
respondents 1
rival 1
say 2
says 2
series 1
short 1
should 1
simple 1
some 1
strong 1
survey 1
surveyed 1
surveys 1
take 1
than 2
that 2
the 16
their 2
them 1
they 1
think 2
those 2
threat 3
through 1
time 1
to 7
toward 2
traveled 1
understand 2
understanding 1
unfavorable 1
unfavorably 1
us 1
variety 1
view 2
views 1
we 2
what 2
who 1
whole 1
why 1
wide 1
widely-publicized1
with 1
world 1
worried 1
year 1
(The columns should be aligned; it gets messy from here on.)
Note: limitations that are currently hard to fix
1. Case: the script cannot tell which words must be capitalized and which are merely capitalized at the start of a sentence.
2. 's: a possessive such as "China's" can currently only be counted as part of a single word.
3. Sorting: it is hard to sort the output by number of occurrences.
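For reference, the first and third limitations can be worked around with str.lower() and Counter.most_common(); a sketch that is not part of the original answer (the text is a made-up example):

```python
import re
from collections import Counter

text = "To be or not to be. That is the question!"
# Lower-case everything so 'To' and 'to' are counted together (limitation 1),
# and keep apostrophes inside tokens so possessives stay one word (limitation 2).
words = re.findall(r"[a-z']+", text.lower())
# most_common() returns (word, count) pairs sorted by descending frequency
# (limitation 3).
for word, count in Counter(words).most_common():
    print(word, count)
```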