1

Topic: Multiplying a 3x3 filter convolution kernel with an array

There is a 3x3 convolution kernel and an image, represented as an array of integer pixel values. The convolution kernel is defined like this:

    // composite convolution kernels:
    //
    // convolution kernel H =
    // ...... | 1, 0, 1 |
    // src x  | 0, 0, 0 |
    // ...... |-1, 0,-1 |
    //
    // convolution kernel V =
    // ...... | 1, 0,-1 |
    // src x  | 0, 0, 0 |
    // ...... | 1, 0,-1 |
    //
    // convolution kernel = kernel H + kernel V

A plain C implementation already exists; now I am trying to port it to SSE code.

    for (int inc = 0; inc < height - 2; inc++) {
        // load 16 px's from each of the 3 lines
        str1_16pxs = _mm_loadu_si128((__m128i *)(src_all_str));
        str2_16pxs = _mm_loadu_si128((__m128i *)(src2_all_str));
        str3_16pxs = _mm_loadu_si128((__m128i *)(src3_all_str));

        // unpack the first 8 px's to 16 bits
        str1_16pxs_pack1st_8to16 = _mm_cvtepu8_epi16(str1_16pxs);
        str2_16pxs_pack1st_8to16 = _mm_cvtepu8_epi16(str2_16pxs);
        str3_16pxs_pack1st_8to16 = _mm_cvtepu8_epi16(str3_16pxs);

        // --!
        // here the convolution for the first 8 px's is done
        // ... the code should be inserted here!!!!
        // --

        // sum the 1st 8to16 registers vertically
        sum1_str12_vert_16pxs_pack1st_8to16 = _mm_add_epi16(str1_16pxs_pack1st_8to16, str2_16pxs_pack1st_8to16);
        sum1_str123_vert_16pxs_pack1st_8to16 = _mm_add_epi16(sum1_str12_vert_16pxs_pack1st_8to16, str3_16pxs_pack1st_8to16);

        for (int jnc = 0; jnc < (width >> 4); jnc++) {
            str1_16pxs_plus_8pxs = _mm_srli_si128(str1_16pxs, 8);
            str2_16pxs_plus_8pxs = _mm_srli_si128(str2_16pxs, 8);
            str3_16pxs_plus_8pxs = _mm_srli_si128(str3_16pxs, 8);

            // unpack the 2nd 8to16 registers (+8 px's)
            str1_16pxs_pack2nd_8to16 = _mm_cvtepu8_epi16(str1_16pxs_plus_8pxs);
            str2_16pxs_pack2nd_8to16 = _mm_cvtepu8_epi16(str2_16pxs_plus_8pxs);
            str3_16pxs_pack2nd_8to16 = _mm_cvtepu8_epi16(str3_16pxs_plus_8pxs);

            // --!
            // do the convolution for the remaining 8 px's, and so on up to the end of the line
            // ... the code should be inserted here!!!!
            // --

            // sum the 8to16 registers vertically
            sum1_str12_vert_16pxs_pack2nd_8to16 = _mm_add_epi16(str1_16pxs_pack2nd_8to16, str2_16pxs_pack2nd_8to16);
            sum1_str123_vert_16pxs_pack2nd_8to16 = _mm_add_epi16(sum1_str12_vert_16pxs_pack2nd_8to16, str3_16pxs_pack2nd_8to16);

            // --! load the next 16 px's
            src_all_str += 16;
            src2_all_str += 16;
            src3_all_str += 16;
            // ...
            _mm_store_si128((__m128i *)(dst_all_str), res);
            dst_all_str += 8;
        } // for (jnc)
    } // for (inc)

I honestly do not know how to do the multiplication for a 3x3 kernel convolution with SSE along a line. I would be very grateful if someone could show me.
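For reference, one possible way to fill the "code should be inserted here" gaps is sketched below. It is not the asker's C implementation and not any library's code, only a minimal sketch under these assumptions: the kernel is applied as drawn, without flipping; H and V are summed element-wise into one kernel (by linearity this gives the same result as adding the two filter outputs); the output is a (width-2) x (height-2) array of 16-bit values with no border handling; and every result fits in 16 bits. The names `kernel_sum`, `tap_sum`, `conv3x3_scalar` and `conv3x3_sse2` are hypothetical. Widening is done with `_mm_unpacklo_epi8` against zero instead of `_mm_cvtepu8_epi16`, so the sketch needs only SSE2.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Kernels H and V as drawn in the post, stored row-major. */
const int16_t KERNEL_H[9] = { 1, 0, 1,   0, 0, 0,  -1, 0, -1 };
const int16_t KERNEL_V[9] = { 1, 0, -1,  0, 0, 0,   1, 0, -1 };

/* Element-wise sum of two 3x3 kernels (assumed meaning of "H + V"). */
void kernel_sum(const int16_t a[9], const int16_t b[9], int16_t out[9])
{
    for (int i = 0; i < 9; i++)
        out[i] = (int16_t)(a[i] + b[i]);
}

/* Weighted sum of the 3x3 neighbourhood whose top-left pixel is (x, y). */
int tap_sum(const uint8_t *src, int width, int x, int y, const int16_t k[9])
{
    int acc = 0;
    for (int ky = 0; ky < 3; ky++)
        for (int kx = 0; kx < 3; kx++)
            acc += k[ky * 3 + kx] * src[(y + ky) * width + (x + kx)];
    return acc;
}

/* Scalar reference version. */
void conv3x3_scalar(const uint8_t *src, int width, int height,
                    const int16_t k[9], int16_t *dst)
{
    for (int y = 0; y < height - 2; y++)
        for (int x = 0; x < width - 2; x++)
            dst[y * (width - 2) + x] = (int16_t)tap_sum(src, width, x, y, k);
}

/* SSE2 version: 8 output pixels per iteration, one multiply-add per tap. */
void conv3x3_sse2(const uint8_t *src, int width, int height,
                  const int16_t k[9], int16_t *dst)
{
    assert(width >= 3 && height >= 3);
    const int out_w = width - 2;
    const __m128i zero = _mm_setzero_si128();

    for (int y = 0; y < height - 2; y++) {
        int x = 0;
        /* x + 10 <= width keeps every 8-byte load inside the current row. */
        for (; x + 10 <= width; x += 8) {
            __m128i acc = zero;
            for (int ky = 0; ky < 3; ky++) {
                for (int kx = 0; kx < 3; kx++) {
                    int16_t c = k[ky * 3 + kx];
                    if (c == 0)
                        continue; /* zero taps contribute nothing */
                    const uint8_t *p = src + (y + ky) * width + x + kx;
                    /* Load 8 bytes (unaligned is fine) and widen to 8 x 16 bit. */
                    __m128i px8  = _mm_loadl_epi64((const __m128i *)p);
                    __m128i px16 = _mm_unpacklo_epi8(px8, zero);
                    /* Multiply by the broadcast coefficient and accumulate. */
                    acc = _mm_add_epi16(acc,
                              _mm_mullo_epi16(px16, _mm_set1_epi16(c)));
                }
            }
            _mm_storeu_si128((__m128i *)(dst + y * out_w + x), acc);
        }
        /* Scalar tail for the last few output pixels of the row. */
        for (; x < out_w; x++)
            dst[y * out_w + x] = (int16_t)tap_sum(src, width, x, y, k);
    }
}
```

With the two kernels from the post the element-wise sum has only two non-zero taps (2 in the top-left corner and -2 in the bottom-right one), so the inner loop performs just two loads per 8 pixels. Reusing the three already-loaded row registers with byte shifts, as the posted loop does, is a further optimisation that this sketch leaves out for clarity.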

2

Re: Multiplying a 3x3 filter convolution kernel with an array

This kind of thing is better done on ...

3

Re: Multiplying a 3x3 filter convolution kernel with an array

Hello, T4r4sB, you wrote:
TB> This kind of thing is better done on ...

... and it also does not work on computers without Nvidia hardware, which means on a whole lot of them. In that case OpenCL is a much better choice, since its coverage is far wider: CPU, iGPU, Nvidia GPU, AMD GPU. The Nvidia-only route is suitable only for specific cases.

4

Re: Multiplying a 3x3 filter convolution kernel with an array

Hello, MartinEden, you wrote:
ME> I honestly do not know how to do the multiplication for a 3x3 kernel convolution with SSE along a line.
ME> I would be very grateful if someone could show me.

I do not know the answer, because I use a ready-made library filter, which itself already uses SSE internally.

5

Re: Multiplying a 3x3 filter convolution kernel with an array

Hello, Nuzhny, you wrote:
N> Hello, MartinEden, you wrote:
ME>> I honestly do not know how to do the multiplication for a 3x3 kernel convolution with SSE along a line.
ME>> I would be very grateful if someone could show me.
N> I do not know the answer, because I use a ready-made library filter, which itself already uses SSE internally.

Good afternoon :) The problem is solved, thanks to everyone :) I went through the source code and looked at the filter there. I am speechless, the guys definitely put in the effort :)