Search This Blog

Count number of each word in a string in SQL

Question asked by a user of SQL & MSBI Groups-I have a paragraph, I have taken paragraph as a string in sql then I want how many words are repeated, and how many times the words are repeated?

Eg-Ram name is Ram because of lord Ram since Ram Father is devoted to lord Ram.

Output should be words_Repeated Word_Count as below

Solution-I have written the below code that will count each alphabet in the string.Secondly it will also handle the comma and full Stop in the word for eg if a word contains comma or full stop it will remove the (Comma & full stop) and count it as a word as example below.
declare @str varchar(200) = 'Ram name is Ram because of lord Ram, since Ram Father is devoted to lord Ram.'
declare @temptable table(word varchar(10), word_count int, length int)
declare @value varchar(10)
declare @index int

while (LEN(@str) > 0)
begin
set @index = CHARINDEX(' ',@str,1)
if (@index = 0)
begin
set @value =Replace(@str,'.','')
if exists (select * from @temptable where word=@value)
update @temptable set word_count=word_count+1 where word=@value
else
insert into @temptable select @value,1,LEN(@value)
break;
end
else
set @value = LEFT(@str,charIndex(' ',@str,1))
begin
set @value=left(@str,charindex(' ',@str,1))
IF @value LIKE '%,%' OR @value LIKE '%.%'
      begin
        SET @VALUE=left(@value,len(@value)-1)
      end
end

if exists (select * from @temptable where word=@value)
update @temptable set word_count=word_count+1 where word=@value
else
insert into @temptable select @value,1,LEN(@value)

set @str= RIGHT(@str, Len(@str)-CharIndex(' ',@str,1))
end
select * from @temptable

Output


If you have any issue regarding SQL & BI where you are stuck write to me in the comment box I will try to help you out.
       

4 comments:

  1. Hi Rakesh,

    Glad to see this blog. one flaw i observed here is the last word 'Ram.' is not inserted in the table.

    This is just because of last word in string (i.e Ram.) is not being inserted or counted or updated because of using Break statement when index is 0.

    I used it as following

    declare @str varchar(200) = 'Ram name is Ram because of lord Ram since Ram Father is devoted to lord Ram.'
    declare @temptable table(word varchar(10), count int, leng int)
    declare @value varchar(10)
    declare @index int

    while (LEN(@str) > 0)
    begin
    set @index = CHARINDEX(' ',@str,1)
    if (@index = 0)
    begin
    set @value =@str
    if exists (select * from @temptable where word=@value)
    update @temptable set count=count+1 where word=@value
    else
    insert into @temptable select @value,1,LEN(@value)

    break;
    end
    else
    set @value = LEFT(@str,charIndex(' ',@str,1))


    if exists (select * from @temptable where word=@value)
    update @temptable set count=count+1 where word=@value
    else
    insert into @temptable select @value,1,LEN(@value)

    set @str= RIGHT(@str, Len(@str)-CharIndex(' ',@str,1))
    end
    select * from @temptable



    2) Other Limitation is "there should not be space next to last word. If there is space, query gets looped all the way".


    ReplyDelete
  2. hi rakesh
    (SELECT distinct 'RIG1_HEADERREPORT_StringTable' as TableName, '1' as RigNumber, min(a.DateAndTime) as startdate,max(a1.DateAndTime) as Enddate, min(a.Val) as SerialNumber ,min(b1.Val) as ItemId,min(b2.Val) as FormatNumber,min(b3.Val) as OperatorId,
    --convert(varchar(5),DATEDIFF(s, B.DateAndTime - A.DateAndTime)/3600)+'':''+convert(varchar(5),DATEDIFF(s, B.DateAndTime - A.DateAndTime)%3600/60)+'':''+convert(varchar(5),(DATEDIFF(s, B.DateAndTime - A.DateAndTime)%60)) as Duration
    (SELECT CONVERT(TIME, DATEADD(s, SUM(( DATEPART(hh, (B.DateAndTime - A.DateAndTime)) * 3600 ) + ( DATEPART(mi, (B.DateAndTime - A.DateAndTime)) * 60 ) + DATEPART(ss, (B.DateAndTime - A.DateAndTime))), 0)) AS total_time
    FROM (SELECT DateAndTime, ROW_NUMBER() OVER (ORDER BY DateAndTime ASC) AS S FROM RIG1_HEADERREPORT_StringTable WHERE Marker='S' and TagIndex=2 AND a.Val=Val) AS A
    INNER JOIN
    (SELECT DateAndTime, ROW_NUMBER() OVER (ORDER BY DateAndTime ASC) AS E FROM RIG1_HEADERREPORT_StringTable WHERE Marker='E' and TagIndex=2 AND a.Val=Val) AS B
    ON B.E=A.S) AS Duration
    FROM RIG1_HEADERREPORT_StringTable AS a
    Left join
    RIG1_HEADERREPORT_StringTable a1 on a.Val=a1.Val
    Left join
    RIG1_HEADERREPORT_StringTable b1 on a.DateAndTime=b1.DateAndTime
    Left join
    RIG1_HEADERREPORT_StringTable b2 on a.DateAndTime=b2.DateAndTime
    Left join
    RIG1_HEADERREPORT_FloatTable b3 on a.DateAndTime=b3.DateAndTime
    where a.TagIndex=2 and a.Marker='S' and a1.TagIndex=2 and a1.Marker='E' and b1.TagIndex=1 and b2.TagIndex=4 and b3.TagIndex=0 and b3.Marker='S'
    group by a.Val)


    in this query duration need to get HH:MM:SS but its calculating upto 24 hours.

    if the days difffrence is more then 2 days then i need to get hours calculation how?? pls help me

    ReplyDelete
  3. Baron Betting Tips and Prediction from TABO 온라인카지노 온라인카지노 dafabet link dafabet link 6674NHL Quick Picks - Online Casino

    ReplyDelete