1

Topic: Handling multi-valued features in machine learning

If the value of a certain feature is, for each object, an array of categories, which dimensionality reduction method should I choose? Usually one to three values are picked and stored as id, cat_1, cat_2, but the remaining values are then lost. I would not want to inflate the dimensionality either, since an object can have 20 values or none at all.

2

Re: Handling multi-valued features in machine learning

And if the categories also come in "groups", it is a complete circus.

dad> If the value of a certain feature is, for each object, an array of categories, which dimensionality reduction method should I choose?
dad> Usually one to three values are picked and stored as id, cat_1, cat_2, but the remaining values are then lost. I would not want to inflate the dimensionality either,
dad> since an object can have 20 values or none at all.

3

Re: Handling multi-valued features in machine learning

Hello, dad, you wrote:

dad> If the value of a certain feature is, for each object, an array of categories, which dimensionality reduction method should I choose?
dad> Usually one to three values are picked and stored as id, cat_1, cat_2, but the remaining values are then lost. I would not want to inflate the dimensionality either,
dad> since an object can have 20 values or none at all.

Introduce weights for the categories and sum them; you get a scalar. New features can be built from such values. The weights can be fitted the same way as for neural networks.

4

Re: Handling multi-valued features in machine learning

_> Introduce weights for the categories and sum them; you get a scalar. New features can be built from such values. The weights can be fitted the same way as for neural networks.

Geometrically, does this reduce to a vector product? And how do I choose the weights? Suppose the feature is "topic". There are about 200 topics, divided into 10 groups. The value of the feature is an array that can contain both individual topics and groups of topics.

5

Re: Handling multi-valued features in machine learning

Hello, dad, you wrote:

dad> Geometrically, does this reduce to a vector product?

To a scalar (dot) product.

dad> And how do I choose the weights?

Either by eyeballing, or by search.

dad> Suppose the feature is "topic".
dad> There are about 200 topics, divided into 10 groups.
dad> The value of the feature is an array that can contain both individual topics and groups of topics.

S1 = w11*t1 + w12*t2 + ...
S2 = w21*t1 + w22*t2 + ...

For example, for technical topics w = 1 and otherwise 0, or for topics related to some area w is between 0 and 1 depending on the strength of the connection. Then S[i] is the relation to ...

6

Re: Handling multi-valued features in machine learning

dad> If the value of a certain feature is, for each object, an array of categories, which dimensionality reduction method should I choose?
dad> Usually one to three values are picked and stored as id, cat_1, cat_2, but the remaining values are then lost. I would not want to inflate the dimensionality either,

Factor analysis?

7

Re: Handling multi-valued features in machine learning

LVV> Factor analysis?

That is for correlation between different features. And performance is still an open question. I need to collapse the set of values of a single feature into one or two values. For example, "topic" contains 10 groups of topics with 20 topics in each group. An object can have an unlimited number of topics (both groups and specific topics).

8

Re: Handling multi-valued features in machine learning

_> S1 = w11*t1 + w12*t2 + ...
_> S2 = w21*t1 + w22*t2 + ...
_> For example, for technical topics w = 1 and otherwise 0,
_> or for topics related to some area w is between 0 and 1 depending on the strength of the connection.
_> Then S[i] is the relation to ...

I do not understand yet. Judging by the formula, you take the product of the weights and the vector of topics. Look, suppose there are these objects:

1.theme = [t1, t2, tg3]
2.theme = [t4, tg5]

I need to collapse the values of the attribute theme into one, two, or n values. Roughly speaking: theme_1, theme_2. In my opinion, attaching weights to topics distorts the domain.

9

Re: Handling multi-valued features in machine learning

dad> That is for correlation between different features. And performance is still an open question.

Well, that is exactly the point: strongly correlated features are merged into one factor, so the overall dimensionality decreases.

10

Re: Handling multi-valued features in machine learning

Hello, LaptevVV, you wrote:

dad>> That is for correlation between different features. And performance is still an open question.
LVV> Well, that is exactly the point: strongly correlated features are merged into one factor, so the overall dimensionality decreases.

Here there is only one feature, and it has several values. An array. Perhaps some simpler statistical technique could be used?

11

Re: Handling multi-valued features in machine learning

Hello, dad, you wrote:

dad> I do not understand yet. Judging by the formula, you take the product of the weights and the vector of topics.
dad> Look, suppose there are these objects:
dad> 1.theme = [t1, t2, tg3]
dad> 2.theme = [t4, tg5]
dad> I need to collapse the values of the attribute theme into one, two, or n values.
dad> Roughly speaking: theme_1, theme_2.

Roughly:

w1: [t1:0.3, t2:0.3, tg3:0.6, t4:0.02, tg5:0.02]
w2: [t1:0.1, t2:0.1, tg3:-0.2, t4:0.6, tg5:0.4]
S(O1) = {(O1.theme, w1), (O1.theme, w2)} ≈ {1, 0}
S(O2) = {(O2.theme, w1), (O2.theme, w2)} ≈ {0, 1}

dad> In my opinion, attaching weights to topics distorts the domain.

So the weights are exactly what you search for. Clustering, principal component analysis, or simply an FCNN: there are a great many methods.
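
The dot products in this example can be checked with a few lines of Python. This is only a sketch: the function name project is made up for illustration, the weights are the ones from the example (taking the last key of w1 to be tg5), and an object's theme array is treated as a 0/1 indicator vector over the topics.

```python
# Weights from the example; in practice they would be fitted, not typed in by hand.
W1 = {"t1": 0.3, "t2": 0.3, "tg3": 0.6, "t4": 0.02, "tg5": 0.02}
W2 = {"t1": 0.1, "t2": 0.1, "tg3": -0.2, "t4": 0.6, "tg5": 0.4}


def project(theme, weight_vectors):
    """Collapse a variable-length list of topics into len(weight_vectors) scalars.

    Each scalar is the dot product of the object's 0/1 topic indicator
    vector with one weight vector. Topics without a weight contribute 0.
    """
    unique = set(theme)  # a topic listed twice counts once
    return [sum(w.get(t, 0.0) for t in unique) for w in weight_vectors]


o1 = project(["t1", "t2", "tg3"], [W1, W2])  # close to [1.2, 0.0]
o2 = project(["t4", "tg5"], [W1, W2])        # close to [0.04, 1.0]
empty = project([], [W1, W2])                # an object with no topics gives zeros
```

The output always has a fixed length (one value per weight vector), whether the object has 20 topics or none, which is the property asked for in the first post.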

12

Re: Handling multi-valued features in machine learning

Hello, dad, you wrote:

dad> If the value of a certain feature is, for each object, an array of categories, which dimensionality reduction method should I choose?
dad> Usually one to three values are picked and stored as id, cat_1, cat_2, but the remaining values are then lost. I would not want to inflate the dimensionality either,
dad> since an object can have 20 values or none at all.

I would consider the following options:

1. Principal Component Analysis or one of its variants. This is precisely a technique for reducing the dimensionality of a feature matrix. That is, you can represent the categories as a large matrix, apply PCA to it, obtain the principal components, and use them for training. By choosing the number of principal components, and thereby reducing the dimensionality of the feature space, you control how much information is lost.
2. Manually discard features that have low variance or are strongly correlated with other features.
3. Check whether it makes sense to reduce the dimensionality at all, or whether the model copes with it on its own. For example, models such as xgboost, decision trees, and lasso regression are, in my opinion, fairly good at getting rid of unnecessary features.
4. If you are going to use neural networks, consider an autoencoder.
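
A rough illustration of option 1 using only the Python standard library. The helper names multi_hot and leading_component, the toy topic lists, and the use of power iteration to find the first principal axis are all assumptions made for this sketch; in practice a library PCA implementation would normally be used.

```python
import math
import random


def multi_hot(rows, vocab):
    """Encode variable-length category lists as fixed-width 0/1 rows."""
    return [[1.0 if c in set(row) else 0.0 for c in vocab] for row in rows]


def leading_component(X, iters=200, seed=0):
    """Return (axis, scores): the first principal axis and each row's projection."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]  # centered data
    cov = [[sum(r[i] * r[j] for r in Xc) / n for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]  # random start vector
    for _ in range(iters):
        # Power iteration: repeatedly multiply by the covariance and renormalize.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate data with no variance at all
            break
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(d)) for r in Xc]
    return v, scores


vocab = ["t1", "t2", "t3", "t4", "t5"]
rows = [["t1", "t2"], ["t1", "t2", "t3"], ["t4", "t5"], ["t4"], []]
axis, scores = leading_component(multi_hot(rows, vocab))
```

Each object, whatever the length of its category array, ends up with one number in scores. More components would give more numbers per object, at the cost of a larger fixed width.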

13

Re: Handling multi-valued features in machine learning

dad>> In my opinion, attaching weights to topics distorts the domain.
_> So the weights are exactly what you search for. Clustering, principal component analysis, or simply an FCNN: there are a great many methods.

I did not understand the part about searching for the weights. I have a training sample of objects (for a network or a regression, for example), and there are no preferences among the values of the attribute, so weights cannot be assigned. Clustering is clear: split the array of attribute values into two or three clusters. Although, from the domain's point of view, the category values do not form a distribution.

14

Re: Handling multi-valued features in machine learning

J> I would consider the following options:

I have looked at principal component analysis. The point is that it is applied to a matrix over a set of attributes, whereas I need to reduce the set of values of a single attribute to one or two values. Right now the first two values of the array are simply taken and split into two model attributes, cat1 and cat2. In my opinion this is not very good for training.

15

Re: Handling multi-valued features in machine learning

Hello, dad, you wrote:

dad>>> In my opinion, attaching weights to topics distorts the domain.
_>> So the weights are exactly what you search for. Clustering, principal component analysis, or simply an FCNN: there are a great many methods.
dad> I did not understand the part about searching for the weights. I have a training sample of objects (for a network or a regression, for example),
dad> and there are no preferences among the values of the attribute, so weights cannot be assigned.

You have already been given the advice: a network compresses M features into N << M and then expands them back into M, so as to reproduce the inputs as closely as possible.
https://habrahabr.ru/post/331382/

dad> Clustering is clear: split the array of attribute values into two or three clusters. Although, from the domain's point of view,
dad> the category values do not form a distribution.

For that you also need to introduce a distance measure.

16

Re: Handling multi-valued features in machine learning

_> You have already been given the advice: a network compresses M features into N << M and then expands them back into M, so as to reproduce the inputs as closely as possible.
_> https://habrahabr.ru/post/331382/

Python for this is simply overkill, just to turn a variable-size array into two values. And it has to happen millions of times per second. I need something as simple as an axe. Something like a median, but for categorical features.

dad>> Clustering is clear: split the array of attribute values into two or three clusters. Although, from the domain's point of view,
dad>> the category values do not form a distribution.
_> For that you also need to introduce a distance measure.

Distance from what to what?

17

Re: Handling multi-valued features in machine learning

_> You have already been given the advice: a network compresses M features into N << M and then expands them back into M, so as to reproduce the inputs as closely as possible.
_> https://habrahabr.ru/post/331382/

I think I understand now. You suggest separately training on the sample to obtain weights for the "topics", and then, when preparing data for the "main" training, using these weights to collapse the attribute arrays?

18

Re: Handling multi-valued features in machine learning

Hello, dad, you wrote:

_>> You have already been given the advice: a network compresses M features into N << M and then expands them back into M, so as to reproduce the inputs as closely as possible.
_>> https://habrahabr.ru/post/331382/
dad> I think I understand now. You suggest separately training on the sample to obtain weights for the "topics", and then, when preparing data for the "main" training, using these weights to collapse the attribute arrays?

Yes. How exactly you obtain the weights is not essential.

19

Re: Handling multi-valued features in machine learning

Hello, dad, you wrote:

dad> Distance from what to what?

For clustering.

20

Re: Handling multi-valued features in machine learning

Hello, Jeffrey, you wrote:

J> 3. Check whether it makes sense to reduce the dimensionality at all, or whether the model copes with it on its own. For example, models such as xgboost, decision trees, and lasso regression are, in my opinion, fairly good at getting rid of unnecessary features.

In that case, go straight to CatBoost from Yandex. It will work better for categorical features.

21

Re: Handling multi-valued features in machine learning

N> In that case, go straight to CatBoost from Yandex. It will work better for categorical features.

That is something else entirely. It handles categorical features itself, i.e. it takes that data preparation problem off the user. My task is different: for each object, reduce the values of one attribute from a variable-length array to a fixed one or two.

22

Re: Handling multi-valued features in machine learning

Hello, kov_serg, you wrote:

dad>> Distance from what to what?
_> For clustering.

That is exactly the problem: it is like comparing pears with pigs. What is the distance between a shopping topic and an education topic, or between amateur radio as a hobby and symphonic music? Suppose we could analyze combinations of topics on a sample and identify the strong combinations, but samples that contain all the topics simply do not exist. And where is the guarantee that the sample is representative?
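
For what it is worth, one possible way to get a distance out of such combinations is to count how often topics occur together in the sample. The sketch below uses the Jaccard distance, which is my assumption and not something proposed in the thread. The function name jaccard_distances and the toy sample are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def jaccard_distances(samples):
    """Distance between two topics: 1 - (objects with both) / (objects with either)."""
    occurs = Counter()    # number of objects containing each topic
    together = Counter()  # number of objects containing each pair of topics
    for topics in samples:
        unique = sorted(set(topics))
        occurs.update(unique)
        together.update(combinations(unique, 2))
    dist = {}
    for a, b in combinations(sorted(occurs), 2):
        both = together[(a, b)]
        either = occurs[a] + occurs[b] - both
        dist[(a, b)] = 1.0 - both / either
    return dist


# Hypothetical sample: each inner list is the topic array of one object.
sample = [
    ["shopping", "education"],
    ["shopping", "education", "radio"],
    ["radio", "music"],
    ["music"],
]
dist = jaccard_distances(sample)
```

Topics that always appear together get distance 0 and topics that never co-occur get distance 1. A topic that is absent from the sample gets no distance at all, so the objection about representativeness still stands.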

23

Re: Handling multi-valued features in machine learning

Hello, dad, you wrote:

dad> Suppose we could analyze combinations of topics on a sample and identify the strong combinations, but
dad> samples that contain all the topics simply do not exist. And where is the guarantee that the sample is representative?

And what is there at all? In terms of the initial data, I mean.

24

Re: Handling multi-valued features in machine learning

dad>> Suppose we could analyze combinations of topics on a sample and identify the strong combinations, but
dad>> samples that contain all the topics simply do not exist. And where is the guarantee that the sample is representative?
N> And what is there at all? In terms of the initial data, I mean.

There is a sample, but it changes. We take users U who perform a certain action A. Each user has a bunch of features P (p1..pn), and those are all fine. But one feature, "topic", is a variable-length array. Based on this sample, we need to estimate the probability of action A for new users. A classic problem. The question is how to collapse the values of the "topic" feature into one or two features.

25

Re: Handling multi-valued features in machine learning

dad>> Distance from what to what?
_> For clustering.

About the autoencoder. It works out in an interesting way: build a three-layer codec, sizing the inner layer to the required number of variables and the input and output layers to the total number of "topics". Train it with backpropagation, feeding 0/1 values on the input for different combinations. Then, for new objects, read the values off the hidden layer.
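
A minimal sketch of that idea in plain Python, to make the data flow concrete. Everything specific here is an assumption made for illustration: the class name TinyAutoencoder, sigmoid units, the cross-entropy output delta, the learning rate, the number of epochs, and the toy topic lists. A real setup with about 200 topics would use a proper library.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyAutoencoder:
    """n_topics -> n_hidden -> n_topics with sigmoid units, trained by plain SGD."""

    def __init__(self, n_topics, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_topics)]
                      for _ in range(n_hidden)]
        self.b_enc = [0.0] * n_hidden
        self.w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                      for _ in range(n_topics)]
        self.b_dec = [0.0] * n_topics

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w_enc, self.b_enc)]

    def decode(self, h):
        return [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b)
                for row, b in zip(self.w_dec, self.b_dec)]

    def train_step(self, x, lr=0.5):
        h = self.encode(x)
        y = self.decode(h)
        # Sigmoid output with cross-entropy loss: the output delta is y - x.
        d_out = [yi - xi for yi, xi in zip(y, x)]
        # Backpropagate through the decoder weights before they are updated.
        d_hid = [hj * (1.0 - hj) * sum(d_out[i] * self.w_dec[i][j]
                                       for i in range(len(x)))
                 for j, hj in enumerate(h)]
        for i, di in enumerate(d_out):
            for j, hj in enumerate(h):
                self.w_dec[i][j] -= lr * di * hj
            self.b_dec[i] -= lr * di
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w_enc[j][i] -= lr * dj * xi
            self.b_enc[j] -= lr * dj
        return sum(d * d for d in d_out)  # squared reconstruction error, for monitoring


topics = ["t1", "t2", "tg3", "t4", "tg5"]
objects = [["t1", "t2", "tg3"], ["t4", "tg5"], ["t1", "t2"], ["t4"]]
data = [[1.0 if t in obj else 0.0 for t in topics] for obj in objects]

net = TinyAutoencoder(n_topics=len(topics), n_hidden=2)
first = sum(net.train_step(x) for x in data)  # error during the first epoch
for _ in range(2000):
    last = sum(net.train_step(x) for x in data)
code = net.encode(data[0])  # two values read off the hidden layer
```

Once trained, only encode is needed at prediction time, and that is one small matrix-vector product per object, so the weights could be exported and applied outside Python if throughput matters.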