lionel0822 (junior member)

[Help] Abstract translation (communications, computer science, machine learning)
The main access control mechanisms in use today are DAC (Discretionary Access Control), MAC (Mandatory Access Control), and RBAC (Role-Based Access Control). This paper proposes a new approach: using machine learning algorithms to build a model that configures access permissions automatically. In recent years, researchers in machine learning have increasingly focused on how the raw data sets are processed, because when feature engineering extracts as much of the hidden information and as many of the hidden features as possible from the raw data, the same learning algorithms achieve better results. This paper generates many new combinations of data sets and feature sets from the raw data and introduces several machine learning algorithms: logistic regression, gradient boosted decision trees, and random forests. These three algorithms are trained on the data-set/feature-set combinations to produce many classifier models. Finally, building on these typical classifiers, several commonly used ensemble learning algorithms are studied, and two of them are applied to combine the classifiers.

Specifically, the main contributions of this paper are as follows:

(1) Four new data sets and five new feature sets are generated from the raw data. Several mathematical transformations are introduced and selectively applied to the data sets and feature sets. In particular, in building the greedy data set, a greedy forward feature selection algorithm is used to pick the optimal subset from the large pool of candidate features.

(2) Logistic regression, gradient boosted decision trees, and random forests are introduced and trained on different training sets, and 14 typical classifier models are selected (five logistic regression, four gradient boosted decision tree, and five random forest models). The AUC (Area Under Curve) scores of the logistic regression models range from 0.9109 to 0.9196, those of the gradient boosted decision tree models from 0.8756 to 0.9079, and those of the random forest models from 0.8782 to 0.9047. The three algorithms are also trained separately on three data sets and the resulting models compared, demonstrating that feature engineering is essential for single classification models. Logistic regression performs well on training sets that include the greedy data set, while gradient boosted decision trees and random forests perform well on training sets that include the tuples data set. Overall, the logistic regression models perform best on some training sets.

(3) On top of these classifiers, the paper introduces the voting and stacked generalization ensemble learning algorithms and uses them to combine the 14 typical classifier models. The voting ensemble reaches an AUC of 0.9244, an improvement of 0.0048 over the best of the 14 individual classifiers, and the stacked generalization ensemble reaches 0.9247, an improvement of 0.0051. The experiments show that ensemble learning improves the classification ability of the final model.
至尊木蟲 (well-known writer)
At present, the main access control mechanisms are DAC (Discretionary Access Control), MAC (Mandatory Access Control), and RBAC (Role-Based Access Control). This paper introduces a new method that uses machine learning algorithms to build a model for the automatic configuration of access permissions. Recently, more and more scholars in the machine learning field have focused on processing the original data sets, because when feature engineering methods uncover more of the data and features hidden in the original sets, the same machine learning algorithms yield better results. Many new combinations of data sets and feature sets were created from the original data. This paper introduces several machine learning algorithms, including logistic regression, gradient boosted decision trees, and random forests, and uses the three of them on the data-set/feature-set combinations to produce many classification models. Several commonly used ensemble learning algorithms are then studied on top of these classification models, and two of them are used to combine the classifiers.

Specifically, the main contributions of this paper are as follows:

(1) Four new data sets and five new feature sets were generated from the original data. Several mathematical treatment methods were introduced and selectively applied to the data sets and feature sets; in particular, in the construction of the greedy data sets, a greedy forward feature selection algorithm was used to choose the optimal subset from the large pool of candidate features.

(2) Logistic regression, gradient boosted decision trees, and random forests were introduced and trained on different training sets, and 14 typical classification models were selected (five logistic regression, four gradient boosted decision tree, and five random forest models). The AUC (Area Under Curve) ranges for logistic regression, gradient boosted decision trees, and random forests were 0.9109–0.9196, 0.8756–0.9079, and 0.8782–0.9047, respectively. The three algorithms were then trained separately on three data sets and the performance of each model compared, demonstrating that feature engineering is essential for single classification models. Logistic regression showed good performance when trained with the greedy data sets, while gradient boosted decision trees and random forests performed better when trained with the tuples data sets. Generally speaking, however, logistic regression performed best on some of the training sets.

(3) This paper introduces the voting and stacked generalization ensemble learning algorithms on top of the above classification models and uses them to integrate the 14 typical classifiers. The voting ensemble reached an AUC of 0.9244, 0.0048 higher than the best of the 14 individual classifiers, while the stacked generalization ensemble reached 0.9247, an improvement of 0.0051. The results show that ensemble learning improves the classification ability of the final model.
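For readers less familiar with the greedy forward selection mentioned in contribution (1), here is a minimal sketch of the idea in scikit-learn terms. The paper's actual data and stopping rule are not given, so the synthetic data, the 3-fold CV, and the "stop when AUC no longer improves" criterion are all illustrative assumptions.

```python
# Hypothetical sketch of greedy forward feature selection: starting from an
# empty set, repeatedly add the single feature that most improves CV AUC,
# and stop once no remaining feature helps. Data is synthetic.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

selected = []                      # indices of features chosen so far
remaining = list(range(X.shape[1]))
best_score = 0.0

while remaining:
    # Score every candidate extension of the current subset.
    scores = []
    for f in remaining:
        cols = selected + [f]
        score = cross_val_score(LogisticRegression(max_iter=1000),
                                X[:, cols], y, cv=3,
                                scoring="roc_auc").mean()
        scores.append((score, f))
    score, f = max(scores)
    if score <= best_score:        # no candidate improves CV AUC: stop
        break
    best_score, selected = score, selected + [f]
    remaining.remove(f)

print(selected, round(best_score, 4))
```

The greedy choice makes the search linear in the number of features per round instead of exponential, at the cost of possibly missing interacting feature pairs.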
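Contribution (2) compares the three model families by AUC. A minimal sketch of that comparison, assuming scikit-learn: the synthetic data below stands in for the paper's data sets, and the hyperparameters are library defaults, not the paper's.

```python
# Train the three model families named in the abstract and compare them
# by AUC on a held-out split. Data and settings are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "gbdt": GradientBoostingClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
}

aucs = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # AUC is computed from predicted probabilities of the positive class.
    aucs[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

print(aucs)
```

Repeating this loop over each data-set/feature-set combination is what yields the grid of 14 classifier models whose AUC ranges the abstract reports.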
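Contribution (3) combines the classifiers with voting and stacked generalization. A hedged sketch using scikit-learn's built-in ensembles; the base models, meta-learner, and data below are assumptions, not the paper's exact 14-model setup.

```python
# Soft voting averages the base models' predicted probabilities; stacking
# trains a meta-learner (here logistic regression) on out-of-fold base
# predictions. Data and model choices are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("gbdt", GradientBoostingClassifier(random_state=0)),
    ("rf", RandomForestClassifier(random_state=0)),
]

# voting="soft" is required so the ensemble exposes predict_proba for AUC.
voting = VotingClassifier(base, voting="soft").fit(X_tr, y_tr)
stacking = StackingClassifier(
    base, final_estimator=LogisticRegression()).fit(X_tr, y_tr)

for name, model in [("voting", voting), ("stacking", stacking)]:
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(name, round(auc, 4))
```

As in the abstract, the expected pattern is a small AUC gain over the best single base model, with stacking typically edging out plain voting because the meta-learner weights the base models instead of averaging them uniformly.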
