AIOPS
Concepts
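- XDR(Extended Detection and Response): integrates detection and response across the network, endpoint, cloud, and other layers
  - EDR(Endpoint Detection and Response): focuses mainly on endpoint-level threats and response
  - NDR(Network Detection and Response): focuses mainly on network-level threats and response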
- SIEM(Security Information and Event Management)
  - Splunk
  - Elastic SIEM
  - LogRhythm
- DXL(Data Exchange Layer): a protocol for communication between security products
- SOC(Security Operations Center)
- IDS(Intrusion Detection System)
  - HIDS (Host-Based Intrusion Detection System)
  - FIM (File Integrity Monitoring)
  - NIDS (Network-Based IDS)
  - Signature-based IDS (Knowledge-based IDS)
  - Anomaly-based IDS
- NTA(Network Traffic Analysis)
- DLP(Data Loss Prevention)
  - EDLP(Endpoint-based DLP)
  - NDLP(Network-based DLP)
- NAC(Network Access Control): network admission control; ensures that only devices meeting the required conditions can access the network
Projects
- OpenDXL
- OpenXDR
- OpenEDR
- OpenSOC
- GrayLog
- OSSIM
- Security Onion
- Apache Metron
- IDS
- Sigma
- OpenSearch: Using Security Analytics
- MSTIC: msticpy is a library for InfoSec investigation and hunting in Jupyter Notebooks
Anomaly Detection
Log-based
Log Parser
Projects
Researchers
References
Machine Learning
Libraries
Time series
Libraries
DLP
Document Classification
Articles
Datasets
Models
NLI(Natural Language Inference)
Text Similarity
- Brown Clustering
LLM Deployment
- Venus
- Qpilot
- hunyuan
Tencent Logs Data
- Report in-container logs to CLS:
- https://kubernetes.woa.com/v4/projects/prjzcdvg/cls/
- http://wiki.kubernetes.woa.com/quickstart/log_service.html
- Configure CLS to forward logs to CKafka
Applications
UEBA
Solution
- feature engineering -> machine learning anomaly detection -> llm detection and explanations
- llm labelling -> supervised anomaly detection -> llm explanations
- llm labelling -> select anomaly detection model and parameters -> detection -> llm(weaker?) explanations
- llm fine tuning
- general rule-based model alternative
- llm classification
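As a rough illustration of the first pipeline above (feature engineering -> anomaly detection -> llm explanations), here is a minimal sketch using only the Python standard library. The event field `user`, the per-user event count feature, the modified z-score detector, the 3.5 threshold, and the prompt wording are all assumptions made for illustration. The notes do not say which features, detector, or prompt are actually used, and the call to an LLM backend is left out.

```python
from collections import Counter
from statistics import median


def build_features(events):
    """Feature engineering: number of events per user (hypothetical feature)."""
    return Counter(e["user"] for e in events)


def detect_anomalies(features, threshold=3.5):
    """Flag users whose count deviates strongly from the median.

    Uses the modified z-score, 0.6745 * (x - median) / MAD.
    The threshold is an assumed default, not a tuned value.
    """
    counts = list(features.values())
    med = median(counts)
    # Fall back to 1.0 when all deviations are zero to avoid division by zero.
    mad = median(abs(c - med) for c in counts) or 1.0
    scores = {u: 0.6745 * (c - med) / mad for u, c in features.items()}
    return {u: s for u, s in scores.items() if abs(s) > threshold}


def build_prompt(user, score, count):
    """Build the text that would be sent to an LLM for an explanation."""
    return (
        "You are a security analyst. "
        f"User '{user}' produced {count} events in the window "
        f"(modified z-score {score:.1f}). "
        "Explain whether this looks risky and why."
    )


# Toy data: three ordinary users and one unusually active user.
events = (
    [{"user": "alice"}] * 2
    + [{"user": "bob"}] * 3
    + [{"user": "carol"}] * 2
    + [{"user": "dave"}] * 40
)
features = build_features(events)
anomalies = detect_anomalies(features)
prompts = {u: build_prompt(u, s, features[u]) for u, s in anomalies.items()}
```

Only the flagged users produce prompts, which keeps the number of LLM calls proportional to the number of anomalies rather than the number of events.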
Backends
- Venus: ChatGLM, LLaMA
- Qpilot: ChatGPT
- Hunyuan: hunyuan-13B, hunyuan-176B
Evaluation
- llm model comparison
- llm vs. rule-based
- params tuning
  - temperature
  - top_p
- use prompt to control risk level
- use llm to write rules
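For the temperature and top_p items above, the sketch below shows the commonly used definitions: temperature rescales the logits before the softmax, and top_p (nucleus sampling) keeps only the most likely tokens whose cumulative probability reaches the cutoff. This is a toy model of the sampling step under those assumptions. How the backends listed in these notes implement the two parameters is not stated here, and the function names are hypothetical.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits divided by the temperature.

    Lower temperature concentrates probability on the top token.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of top tokens whose mass reaches top_p.

    Returns {token_index: renormalized_probability}.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


logits = [2.0, 1.0, 0.0]
base = apply_temperature(logits, 1.0)
sharp = apply_temperature(logits, 0.5)
nucleus = top_p_filter(base, 0.9)
```

With these toy logits, lowering the temperature raises the probability of the top token, and a top_p of 0.9 drops the least likely token before renormalizing.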
Difficulties
- deployment
- token limits
- model incompetence
  - unexpected answer
  - wrong answer
  - inconsistent answer
  - good answer with bad explanation
- costs/rate limit
- data shift/evolution
- llm hyperparameters (context length, time range)
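One way to work within the token limit and context-length items above is to split log lines into chunks that each fit a token budget and to send one chunk per request. The sketch below assumes a rough heuristic of about four characters per token. A real tokenizer for the chosen backend would give different counts, and both helper names are hypothetical.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate (assumed heuristic, not a real tokenizer)."""
    return max(1, -(-len(text) // chars_per_token))  # ceiling division


def chunk_by_budget(lines, max_tokens):
    """Group consecutive lines so each chunk stays within max_tokens.

    A single line larger than the budget still gets its own chunk.
    """
    chunks, current, used = [], [], 0
    for line in lines:
        cost = estimate_tokens(line)
        if current and used + cost > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(line)
        used += cost
    if current:
        chunks.append(current)
    return chunks


# Five log lines of 40 characters each, about 10 estimated tokens per line.
log_lines = ["x" * 40 for _ in range(5)]
chunks = chunk_by_budget(log_lines, max_tokens=25)
```

Because the lines stay in order, each chunk also corresponds to a contiguous time range when the input is sorted by timestamp, so the budget indirectly bounds the time range seen per request.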
Costs
- average prompt tokens:
- private deployment
TODO
- rule-based labeling
- model selection
- gpt4 labeling
- data anonymization
- mock abnormal event
- compromised accounts
- insider threats
- account sharing
- bot
- account lockout
- dormant accounts
- data breaches
- Data generalization
- CoT prompts
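The data anonymization and data generalization items above could be approached as in the following minimal sketch, which replaces user names with keyed hashes and coarsens source IPs to their /24 network before events are sent to an external model. The event fields (`user`, `src_ip`), the `user_` prefix, the 12-character truncation, and the placeholder key are assumptions for illustration, not a scheme described in these notes.

```python
import hashlib
import hmac
import ipaddress

# Placeholder key (assumption); a real deployment would load a secret securely.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(value, key=SECRET_KEY):
    """Map a value to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:12]


def generalize_ip(ip, prefix=24):
    """Replace an IP address with its enclosing network, e.g. a /24."""
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network)


def anonymize_event(event):
    """Return a copy of the event with user and source IP masked."""
    return {
        **event,
        "user": pseudonymize(event["user"]),
        "src_ip": generalize_ip(event["src_ip"]),
    }


event = {"user": "alice", "src_ip": "10.1.2.3", "action": "login_failed"}
masked = anonymize_event(event)
```

The same user always maps to the same pseudonym under a given key, so per-user behaviour can still be analysed after masking.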