英文原文地址 "Memory optimization for feeds on Android" 讀後感 在Java中HashSet只能存放繼承自Objcet的對象,這中情況下“基本數據類型”轉化為繼承自Object的( 、`Long`等)會產生很多中間Object對象,占用過多的記憶體,從而引發垃 ...
Original article (in English): Memory optimization for feeds on Android

My takeaway

In Java, a HashSet can only hold objects (types that extend Object). In this situation, converting "primitive types" into their Object counterparts (Integer, Long, and so on) produces a large number of intermediate objects, which take up too much memory, trigger garbage collection, and make the app stutter.

The original article follows.
Millions of people use Facebook on Android devices every day, scrolling through News Feed, Profile, Events, Pages, and Groups to interact with the people and information they care about. All of these different feed types are powered by a platform created by the Android Feed Platform team, so any optimization we make to the Feed platform could potentially improve performance across our app. We focus on scroll performance, as we want people to have a smooth experience when scrolling through their feeds.
To help us achieve that, we have several automatic tools that run performance tests on the Feed platform, across different scenarios and on different devices, measuring how our code performs in runtime, memory use, frame rate, and more. One of those tools, Traceview, showed a relatively high number of calls to the Long.valueOf() function, which led to objects accumulating in memory and causing the app to stall. This post describes the problem, the potential solutions we weighed, and the set of optimizations we made to improve the Feed platform.
The downside of convenience
After noticing the high number of calls to the Long.valueOf() function in one of our method profiling reports from Traceview, we ran further tests that confirmed that as we scrolled through News Feed, there were an unexpectedly high number of invocations for this method.
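For readers who want to reproduce this kind of measurement, one way to capture a trace that Traceview can open is to wrap the scroll scenario in Android's method-tracing calls. This is only a sketch under that assumption; the trace name and the runScrollScenario callback are placeholders, not Facebook's actual test harness.

```java
import android.os.Debug;

// Sketch only: wrap a scroll scenario in method tracing so the resulting
// .trace file can be inspected in Traceview. The trace name and the
// runScrollScenario callback are placeholders.
public class FeedScrollTracer {
    public static void traceScroll(Runnable runScrollScenario) {
        Debug.startMethodTracing("feed_scroll"); // writes feed_scroll.trace
        try {
            runScrollScenario.run();             // e.g. programmatically fling the feed list
        } finally {
            Debug.stopMethodTracing();
        }
    }
}
```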
When we looked at the stacktrace, we found that the function was not being called explicitly from Facebook's code but implicitly by code inserted by the compiler. This function was called when assigning a primitive long value where a Long object is expected. Java supports both Object and primitive representations of the simple types (e.g., integer, long, character) and provides a way to seamlessly convert between them. This feature is called autoboxing, because it "boxes" a primitive type into a corresponding Object type. While that's a convenient development feature, it creates new objects unbeknownst to the developer.
And these objects add up.
In this heap dump taken from a sample app, Long objects have a noticeable presence; while each object is not big by itself, there are so many that they occupy a large portion of the app's memory in the heap. This is especially problematic for devices running the Dalvik runtime. Unlike ART, which is the newer Android runtime environment, Dalvik doesn't have a generational garbage collection, known to be more optimal for handling many small objects. As we scroll through News Feed and the number of objects grows, the garbage collection will cause the app to pause and sweep unused objects from the memory. The more objects that accumulate, the more frequently the garbage collector will have to pause the app, causing it to stutter and stall and making for a poor user experience.
Luckily, tools such as Traceview and Allocation Tracker can help find where these calls are made. After reviewing the origins of these autoboxing occurrences, we found that the majority of them were done while inserting long values into a HashSet&lt;Long&gt;. (We use that data structure to store the hashes of News Feed stories and to check, by hash, whether a given story is already in the set.) HashSet gives us fast lookups while the user scrolls through News Feed. But HashSet can only store objects, not primitive types, while the value we compute and store is a primitive long, so every setStories.put(lStoryHash) call inevitably converts the long into a Long.
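A sketch of that membership check might look like the following (using Set's contains and add; the names setStories and lStoryHash follow the text above, and the hash value is made up):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the story-set membership check described above.
Set<Long> setStories = new HashSet<>();

long lStoryHash = 0x5DEECE66DL;           // the story's hash, a primitive long (value made up)

if (!setStories.contains(lStoryHash)) {   // lStoryHash is boxed to a Long for the lookup
    setStories.add(lStoryHash);           // and boxed again when stored as a key
}
```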
As a solution, a Set implementation for primitive types can be used, but that turned out not to be as straightforward as we expected.
Existing solutions
There are a few existing Java libraries that provide a Set implementation for primitive types. Almost all of these libraries were created more than 10 years ago, when the only Java running on mobile devices was J2ME. So, to determine viability, we needed to test them under Dalvik/ART and ensure they could perform well on more constrained mobile devices. We created a small testing framework to help compare these libraries with the existing HashSet and with one another. The results showed that a few of these libraries had a faster runtime than HashSet, and with fewer Long objects, but they still internally allocated a lot of objects. As an example, TLongHashSet, part of the Trove library, allocated about 2 MB worth of objects when tested with 1,000 items.
Testing other libraries, such as PCJ and Colt, showed similar results.
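For context, code that uses one of these primitive-set libraries looks roughly like this. This is just a sketch; TLongHashSet is Trove's class, and the package path shown assumes Trove 3.x:

```java
import gnu.trove.set.hash.TLongHashSet;   // Trove 3.x package path (assumption)

// Sketch: Trove stores the primitive longs directly, so no Long boxing occurs,
// but the library still allocates internal bookkeeping objects.
TLongHashSet ids = new TLongHashSet();
ids.add(42L);
boolean present = ids.contains(42L);      // true
```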
It was possible that the existing solutions didn't fit our needs. We considered whether we could instead create a new Set implementation and optimize it for Android. Looking inside Java's HashSet, there's a relatively simple implementation using a single HashMap to do the heavy lifting.
```java
public class HashSet<E> extends AbstractSet<E> implements Set<E>, ... {

    transient HashMap<E, HashSet<E>> backingMap;

    ...

    @Override public boolean add(E object) {
        return backingMap.put(object, this) == null;
    }

    @Override public boolean contains(Object object) {
        return backingMap.containsKey(object);
    }

    ...
}
```
Adding a new item to the HashSet means adding it to the internal HashMap, where the object is the key and the HashSet's instance is the value. To check object membership, HashSet checks whether its internal HashMap contains the object as a key. An alternative to HashSet could be implemented using an Android-optimized map and the same principles.
Introducing LongArraySet
You may already be familiar with LongSparseArray, a class in the Android Support Library that acts as a map using a primitive long as a key. Example usage:
```java
LongSparseArray<String> longSparseArray = new LongSparseArray<>();
longSparseArray.put(3L, "Data");
String data = longSparseArray.get(3L); // the value of data is "Data"
```
LongSparseArray works differently than HashMap, though. When calling mapHashmap.get(KEY5), this is how the value is found in the HashMap:
When retrieving a value using a key on a HashMap, it's accessing the value in the array using a hash of the key as the index, a direct access in O(1). Making the same call on a LongSparseArray looks like this:
LongSparseArray searches a sorted keys array for the key's value using a binary search, an operation with a runtime of O(log N). The index of the key in the array is then used to find the value in the values array.
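The position found by that binary search is visible through indexOfKey(); here is a small illustration (the keys and values are arbitrary):

```java
// Keys are kept sorted internally as [5, 9, 20]; indexOfKey() reports the
// position located by the binary search described above.
LongSparseArray<String> stories = new LongSparseArray<>();
stories.put(20L, "story-20");
stories.put(5L, "story-5");
stories.put(9L, "story-9");

int index = stories.indexOfKey(9L);     // 1
String value = stories.valueAt(index);  // "story-9"
```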
HashMap allocates one big array in an effort to avoid collisions, which would slow the search down. LongSparseArray allocates two small arrays, making its memory footprint smaller. But to support its search algorithm, LongSparseArray needs to allocate its internal arrays in consecutive memory blocks. Adding more items will require allocating new arrays when there's no more space in the current ones. The way LongSparseArray works makes it less ideal when holding more than 1,000 items, where these differences have a more significant impact on performance. (You can learn more about LongSparseArray in the official documentation and by watching this short video by Google.)
Since the LongSparseArray's keys are of a primitive long type, we can create a data structure with the same approach as HashSet, but with a LongSparseArray as the internal map instead of HashMap.
And so LongArraySet was created.
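The post doesn't list the class at this point, but a minimal sketch of the idea, assuming android.util.LongSparseArray (androidx.collection offers an equivalent) and mirroring HashSet's backing-map trick with a sentinel value, could look like this. It is not Facebook's actual implementation, which also handles iteration, clearing, and more:

```java
import android.util.LongSparseArray;

// Minimal sketch of the idea, not Facebook's actual LongArraySet: a set of
// primitive longs backed by a LongSparseArray instead of a HashMap.
public class LongArraySet {
    private static final Object PRESENT = new Object();   // sentinel value, plays the role of "this" in HashSet
    private final LongSparseArray<Object> backingMap = new LongSparseArray<>();

    public void add(long value) {
        backingMap.put(value, PRESENT);    // the primitive key is never boxed
    }

    public boolean contains(long value) {
        return backingMap.get(value) != null;
    }

    public void remove(long value) {
        backingMap.delete(value);
    }

    public int size() {
        return backingMap.size();
    }
}
```

With this shape, a contains(long) call never goes through Long.valueOf(), which is exactly the allocation described earlier.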
The new data structure looked promising, but the first rule of optimization is "always measure." By using the same testing framework from earlier, we compared the new data structure with HashSet. Each data structure was tested by adding X number of items, checking the existence of each item, and later removing all of them. We ran tests using different numbers of items (X=10, X=100, X=1,000 ...) and averaged the time it took to complete each operation per item.
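A rough sketch of what such a per-operation timing loop can look like is shown below; the item count, key generation, and use of System.nanoTime() are assumptions, and a serious comparison would also need warm-up runs and repeated iterations:

```java
import java.util.HashSet;
import java.util.Set;

// Rough sketch of the add / contains / remove timing described above.
long[] items = new long[1000];
for (int i = 0; i < items.length; i++) {
    items[i] = i * 31L + 7L;              // arbitrary distinct keys
}

Set<Long> set = new HashSet<>();

long start = System.nanoTime();
for (long item : items) {
    set.add(item);
}
long addNanosPerItem = (System.nanoTime() - start) / items.length;

start = System.nanoTime();
for (long item : items) {
    set.contains(item);
}
long containsNanosPerItem = (System.nanoTime() - start) / items.length;

start = System.nanoTime();
for (long item : items) {
    set.remove(item);
}
long removeNanosPerItem = (System.nanoTime() - start) / items.length;
```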
The runtime results (time shown is in nanoseconds):
We saw a runtime improvement for the contains and remove methods using the new data structure. Also, as the number of items increased in the array set, it took more time to add new items. That's consistent with what we already knew about LongSparseArray: it doesn't perform as well as HashMap when the number of items is more than 1,000. In our use cases, we're dealing with only hundreds of items, so this is a trade-off we're willing to make.
We also saw a big improvement related to memory. In reviewing the heap dumps and Allocation Tracker reports, we noticed a decrease in object allocations. Here's a side-by-side allocation report for the HashSet and LongArraySet implementations, when adding 1,000 items for 20 iterations:
In addition to avoiding all the Long object allocations, LongSparseArray was more memory-efficient internally, with about 30 percent fewer allocations in this scenario.
Conclusion
By understanding how other data structures work, we were able to create a more optimized data structure for our needs. The less the garbage collector has to work, the lower the likelihood of dropped frames. Using the new LongArraySet class, and a similar IntArraySet for the primitive int data type, we were able to cut down a significant number of allocations in our entire app.
This case study demonstrates the importance of measuring our options to ensure that we were introducing an improvement. We also accepted that while this solution is not perfect for all use cases, since this implementation is slower for very large data sets, it was an acceptable trade-off for us to optimize our code.
You can find the source code for the two data structures here. We are excited to continue working on challenges and optimizing our Feed platform and to share our solutions with the community.
If that link doesn't open for you, the source code is also available here:
http://blog.csdn.net/xiaxl/article/details/72730959

My English isn't great, so if you spot any mistakes in my rendering of the article, please point them out and I will fix them. Thanks, everyone.