Here is a list of the components needed for successful machine learning research and development, with examples of popular libraries and tools of each type:
- Linear algebra:
Machine learning developers need data structures such as vectors, matrices, and tensors, with compact syntax and hardware-accelerated operations on them. Examples in other languages: NumPy, MATLAB, the R standard library, and Torch.
- Probability theory:
All kinds of random data generation: random numbers and collections of them, probability distributions, permutations, shuffling of collections, weighted sampling, and so on. Examples: NumPy and the R standard library.
- Data input-output:
In machine learning, we are usually most interested in parsing and saving data in the following formats: plain text, tabular files such as CSV, databases such as SQL, and internet formats such as JSON, XML, and HTML, plus web scraping. There are also many domain-specific formats.
- Data wrangling:
Table-like data structures and data engineering tools: dataset cleaning, querying, splitting, merging, shuffling, and so on. Examples: Pandas, dplyr.
- Data analysis/statistics:
Descriptive statistics, hypothesis testing, and all kinds of statistical tools. Examples: the R standard library and many CRAN packages.
- Visualization:
Statistical data visualization (not pie charts): graph visualization, histograms, mosaic plots, heat maps, dendrograms, 3D surfaces, spatial and multidimensional data visualization, and interactive visualization. Examples: Matplotlib, Seaborn, Bokeh, ggplot2, ggmap, Graphviz, D3.js.
- Symbolic computations:
Automatic differentiation. Examples: SymPy, Theano, Autograd.
- Machine learning packages:
Machine learning algorithms and solvers. Examples: scikit-learn, Keras, XGBoost, e1071, and caret.
- Interactive prototyping environment:
Jupyter, RStudio, MATLAB, and iTorch.
Excerpt from: Oleksandr Sosnovshchenko. “Machine Learning with Swift: Artificial Intelligence for iOS.” Apple Books.
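To make a few of the listed components concrete, here is a minimal sketch that is not taken from the book. It uses only the Python standard library rather than the packages named above, so it is a toy stand-in for NumPy or Pandas and nothing more. The CSV data, the function names (`load_rows`, `dot`, `summarize`, `weighted_sample`), and the fixed seed are all assumptions made for illustration.

```python
import csv
import io
import json
import random
import statistics

# Hypothetical toy data for illustration; not from the source text.
CSV_TEXT = "height,weight\n1.60,55\n1.75,72\n1.82,80\n"


def load_rows(text):
    """Data input: parse CSV text into a list of dicts with float values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: float(value) for key, value in row.items()} for row in reader]


def dot(u, v):
    """Linear algebra: dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def summarize(values):
    """Descriptive statistics: mean and sample standard deviation."""
    return {"mean": statistics.mean(values), "stdev": statistics.stdev(values)}


def weighted_sample(items, weights, k, seed=0):
    """Probability: draw k items with replacement, proportionally to weights."""
    rng = random.Random(seed)  # seeded generator so runs are repeatable
    return rng.choices(items, weights=weights, k=k)


rows = load_rows(CSV_TEXT)
heights = [row["height"] for row in rows]
weights = [row["weight"] for row in rows]

print(dot(heights, weights))           # linear algebra
print(json.dumps(summarize(weights)))  # statistics, serialized as JSON output
print(weighted_sample(["a", "b", "c"], [0.0, 1.0, 0.0], k=3))  # weighted sampling
```

Each helper maps to one category in the list: data input-output, linear algebra, statistics, and probability. Real projects would normally rely on the hardware-accelerated libraries mentioned above instead of plain Python lists.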