In addition to Solution 1
- Picking up on your statement
Quote:
May data will grow according to time
There is an article at sqlservercentral that discusses the performance of various
string splitting alternatives[
^]
- Picking up on Maciej spotting the full stop in the sentence this
post[
^] suggests a nice method of getting rid of all not alpha characters
- I frequently use the following function to split data as it is reasonably peformant - Note it's not my own work and unfortunately I can't find the original at the moment(if I do, or someone else does then I will accredit the work here)
CREATE FUNCTION [dbo].[fnSplitString]
(
@string NVARCHAR(MAX),
@delimiter CHAR(1)
)
RETURNS @output TABLE(splitdata NVARCHAR(MAX)
)
BEGIN
DECLARE @start INT, @end INT
SELECT @start = 1, @end = CHARINDEX(@delimiter, @string)
WHILE @start < LEN(@string) + 1 BEGIN
IF @end = 0
SET @end = LEN(@string) + 1
INSERT INTO @output (splitdata)
VALUES(SUBSTRING(@string, @start, @end - @start))
SET @start = @end + 1
SET @end = CHARINDEX(@delimiter, @string, @start)
END
RETURN
END
- I went down the route of using temporary tables which might not be best for large data tables. I'm including my efforts here only to demonstrate the calls to the functions I've suggested (It also demonstrates an alternative to using CURSOR if you ever get tempted down that route)
create table #temp
(
rownum int,
datacol varchar(max)
) ON [PRIMARY]
insert into #temp
SELECT rownum = ROW_NUMBER() OVER (order by id), datacol
from topwords
DECLARE @wordlist table (word varchar(max))
declare @maxi int
SELECT @maxi = COUNT(*) FROM topwords
DECLARE @loopCount int = 1
DECLARE @txt varchar(max)
WHILE @loopCount <= @maxi
BEGIN
SELECT @txt = datacol from #temp1 WHERE rownum = @loopCount
INSERT INTO @wordlist SELECT [dbo].fnRemoveNonAlpha(splitdata) FROM [dbo].fnSplitString(@txt,' ')
SET @loopCount = @loopCount + 1
END
select word, count(*) from @wordlist
group by word
order by count(*) desc